Effects of eye movement modeling examples on adaptive expertise in medical image diagnosis


Computers & Education 113 (2017) 212–225

Journal homepage: www.elsevier.com/locate/compedu

Andreas Gegenfurtner a,*, Erno Lehtinen b, Halszka Jarodzka c,d, Roger Säljö e

a Technische Hochschule Deggendorf, Germany
b University of Turku, Finland
c Open University of the Netherlands, The Netherlands
d Humanities Lab, Lund University, Sweden
e University of Gothenburg, Sweden

Article info

Article history: Received 7 March 2016; received in revised form 29 May 2017; accepted 1 June 2017; available online 3 June 2017.

Keywords: Adaptive expertise; Eye movement modeling example (EMME); Mixed methods; Medical image diagnosis; Transfer

Abstract

Research indicates that expert performance is domain specific and hardly transfers to novel tasks or domains. However, due to technological changes in dynamic work settings, experts sometimes need to adapt and transfer their skills to new task affordances. The present mixed method study investigates whether eye movement modeling examples (EMME) can promote adaptive expertise in medical image diagnosis. Performance, eye tracking, and think-aloud protocol data were obtained from nine medical experts and fourteen medical students. Participants interpreted dynamic visualizations before (baseline) and after (retention, transfer) viewing an expert model's eye movements. Findings indicate that studying eye movement modeling examples had positive effects on performance, task-relevant fixations, and the use of cognitive and metacognitive comprehension strategies. Effects were stronger for the retention than for the transfer task. Medical experts benefitted more from the modeling examples than did medical students. Directions for future research and implications for related domains are discussed. © 2017 Elsevier Ltd. All rights reserved.

* Corresponding author. Technische Hochschule Deggendorf, Dieter-Görlitz-Platz 1, 94469 Deggendorf, Germany. E-mail address: [email protected] (A. Gegenfurtner). http://dx.doi.org/10.1016/j.compedu.2017.06.001. 0360-1315/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Research on expertise has shown that expert performance is domain specific and hardly transfers to novel tasks or other domains (Bertram, Helle, Kaakinen, & Svedström, 2013; Gegenfurtner & Szulewski, 2016; Jaarsma, Jarodzka, Nap, van Merriënboer, & Boshuizen, 2015; Litchfield & Donovan, 2016). However, due to technological changes in dynamic work settings, experts face situations in which they need to adapt and transfer their skills to new task affordances (Gegenfurtner & Seppänen, 2013; Lehtinen, Hakkarainen, & Palonen, 2014). The present study investigates whether an eye movement modeling example (EMME; Jarodzka, Van Gog, Dorr, Scheiter, & Gerjets, 2013; Mason, Pluchino, & Tornatora, 2015; Seppänen & Gegenfurtner, 2012; Van Gog, Jarodzka, Scheiter, Gerjets, & Paas, 2009) can promote adaptive expertise.

1.1. Adaptive expertise

Expertise is often defined as being specific to the domain in which it has developed. Studies show that experts are more accurate in domain-specific task performance than novices (Ericsson, 2004; Gegenfurtner, Lehtinen, & Säljö, 2011; Jarodzka,

Jaarsma, & Boshuizen, 2015; Krupinski, 2010; Szulewski, Gegenfurtner, Howes, Sivilotti, & Van Merriënboer, 2017). However, if domains or their constitutive elements like artifacts, work practices, or routines change, then experts need to adapt their knowledge and skills to the changing affordances. This adaptive expertise can be defined as the ability to modify expert routines to changing tasks in a domain (Gegenfurtner, 2013; Hatano & Inagaki, 1986). This transfer of expertise is not always successful and may be compromised by cognitive biases (Feltovich, Spiro, & Coulson, 1997; Gegenfurtner & Seppänen, 2013); thus, experts may benefit from instructional guidance. However, little research exists on how experts can be instructionally guided when they are confronted with novel, unfamiliar situations. This is a relevant question in many technology-rich professional arenas. As an example, the present study addresses this gap in the context of visual expertise in medicine.

Several recent studies illuminate visual expertise in medicine (Gegenfurtner, Siewiorek, Lehtinen, & Säljö, 2013; Gegenfurtner, Kok, Van Geel, De Bruin, & Sorger, in press; Gruber, Jansen, Marienhagen, & Altenmüller, 2010; Jarodzka, Holmqvist, & Gruber, 2017; Norman, Eva, Brooks, & Hamstra, 2006; Kok, 2016; see also Gegenfurtner & Van Merriënboer, in press). For example, Balslev et al. (2012) demonstrated that clinicians with higher levels of expertise ignore task-redundant information in patient video cases more often than participants with lower levels of expertise. Wilson and colleagues (2010) highlighted experts' strategic considerations to selectively allocate attentional resources to task-relevant information. Litchfield and Donovan (2016) showed that experts are less biased toward distracting information than novices are. Although these studies are highly informative about routine expertise in medicine, they are limited in informing us about adaptive expertise.
The medical domain is very apt for studying adaptive expertise because technologies for producing medical visualizations are dynamic work artifacts that change, with new kinds of visualizations currently introduced to the medical workplace at a high pace (Gegenfurtner, Nivala, Lehtinen, & Säljö, 2009; Helle et al., 2011). An example of a newly introduced imaging technology is the combination of computer tomography (CT) and positron emission tomography (PET), which creates a new kind of fusion picture, PET/CT (Gegenfurtner & Seppänen, 2013; Seppänen, 2008). CT is a typical technology in radiology that visualizes human anatomy. In contrast, PET is a typical technology in nuclear medicine that visualizes metabolism and the functional processes of the body. Experts in these two domains, radiology and nuclear medicine, need to show transformative agency (Damşa, Froehlich, & Gegenfurtner, in press) and adapt their skills to the novel task affordances around PET/CT visualizations (Gegenfurtner & Seppänen, 2013). This is a classic scenario of adaptive expertise. However, little is known about how we can instructionally guide experts to adapt.

Evidence suggests that a transfer of expertise is possible in powerful learning environments. Specifically, Feltovich et al. (1997) indicated that medical experts may be able to flexibly adapt their biomedical knowledge to highly atypical cases. Hatano and Inagaki (1986) as well as Schwartz, Bransford, and Sears (2005) expressed similar conclusions, arguing that instructionally guided adaptive expertise (or “preparation for future learning”) is possible in an “optimal adaptability corridor” (Schwartz et al., 2005). Building on this research, the present study aims to test whether transfer of expertise in the comprehension of visualizations can be promoted with modeling examples and if prior expertise in other fields of medicine mediates the effects of the intervention.
Several theories explain the processes that underlie expertise in the comprehension of visualizations. First, the information-reduction hypothesis (Haider & Frensch, 1999) focuses on the learned selectivity of information processing. This theory suggests that expertise optimizes the amount of processed information by neglecting task-redundant information and actively focusing on task-relevant information. Second, the theory of long-term working memory (Ericsson & Kintsch, 1995) focuses on changes in memory structures. This theory assumes that expertise extends the capacities for information processing owing to the acquisition of retrieval structures. If it is true that medical expertise increases the selective allocation of attentional resources and speeds the retrieval of knowledge stored in long-term memory, then these changes should be reflected in recordings of eye movements and think-aloud protocols.

1.2. Eye movement modeling examples

Modeling examples provide a solution procedure to a given problem and demonstrate the processes underlying task completion, which can then be observed by the learner (Collins & Kapur, 2014; Jarodzka et al., 2013; Mason et al., 2015; Van Gog et al., 2009). In visual tasks, some of these underlying mental processes include the allocation of attentional resources to task-relevant information. Therefore, one approach used in the research on modeling examples is to replay the gaze of experts. By replaying the eye movements of experts, learners can observe where, for how long, and in which order experts fixate on information that is relevant to solving the task. Evidence suggests that eye movement modeling examples (EMME) are effective in guiding novices' attention and thought (Boekhout, Van Gog, Van de Wiel, Gerards-Last, & Geraets, 2010; Jarodzka et al., 2012, 2013; Kundel, Nodine, & Krupinski, 1990; Litchfield, Ball, Donovan, Manning, & Crawford, 2010; Mason et al., 2015; Nalanagula, Greenstein, & Gramopadhye, 2006; Seppänen & Gegenfurtner, 2012; Velichkovsky, 1995). However, Van Gog et al. (2009) found that EMME had detrimental effects on novice learning; these authors recommended testing the effectiveness of EMME with perceptually more complex tasks and under conditions of information transience (when task-relevant information appears and disappears). Following these recommendations, the task in the present study is perceptually complex because the visualization is three-dimensional and user-controlled. The task also includes transient information because the visualization is dynamic. The present study focuses not only on naïve novices but also includes a group of experts. This also allows testing whether eye movement modeling examples induce an expertise reversal effect (Chen, Kalyuga, & Sweller, 2017). Before we list the hypotheses of the study, the following section discusses instructional characteristics of eye movement modeling examples.


1.3. Instructional characteristics of eye movement modeling examples

What are the instructional characteristics of eye movement modeling examples? Based on the cognitive theory of multimedia learning (Mayer, 2014) and the cognitive apprenticeship framework (Collins & Kapur, 2014), several characteristics can be identified. First, EMME can promote cognitive comprehension strategies. Cognitive comprehension strategies are processes of selecting, organizing, and integrating material that we attend to (Mayer, 2014). For example, when students in medical education programs learn how to diagnose cancer of plasma cells from positron emission tomographs (Seppänen, 2008), they can infer how an expert radiologist selects relevant parts of the presented material, such as areas of the tomograph where malignant cells are likely to be detected. Students can infer how the expert organizes each detected detail through visual processing in their working memory into a pictorial model (Mayer, 2014); in this example, a mental representation of the presented patient case. Once this mental representation is organized, students can then infer how the expert integrates the organized material with prior knowledge, such as the clinical knowledge of previously experienced patient cases or the declarative biomedical knowledge of disease schemata. When gaze replay is combined with verbal explanations, students can also develop accurate technical terminology and minimize the use of vernacular jargon during diagnosis. In summary, and based on these considerations, EMMEs are expected to have positive effects on cognitive comprehension strategies in the interpretation of visualizations.

Second, eye movement modeling examples can promote metacognitive comprehension strategies. Metacognitive comprehension strategies include heuristic strategies, control strategies, and learning strategies (Collins & Kapur, 2014).
Specifically, with reference to the previous example of diagnosing cancer, heuristic strategies are approaches used by experts for the successful completion of a task, such as moving through the tomograph systematically from head to toe. When they observe experts, students can understand not only these “tricks of the trade” (Collins & Kapur, 2014) but also identify the circumstances when their use is effective. Control strategies are approaches used by experts to monitor the process of task completion, such as assessing how to proceed in a difficult case. When observing experts, novice learners can understand when and how to self-assess and remedy current comprehension difficulties in task completion. Learning strategies are approaches for learning new concepts, facts, and procedures. An example is seeking help when facing problem situations. In summary, and based on these considerations, eye movement modeling examples are expected to have positive effects on metacognitive comprehension strategies in the interpretation of visualizations.

1.4. The present study In the current research, we compared diagnostic performance by medical experts and novices in interpreting PET/CT visualizations before and after exposure to eye movement modeling examples. Participants were experts in PET or experts in CT, but not experts in PET/CT. The task involved diagnosing three patient cases that were displayed in the unfamiliar PET/CT mode: one before watching the modeling example (baseline task); one after watching the modeling example, which was identical to the modeled case (retention task); and, in addition, one case afterwards that was different and more difficult than the modeled case (transfer task). We assumed that diagnostic performance would be higher after the modeling example than before it (Hypothesis 1). We also assumed that watching eye movement modeling examples is effective in promoting cognitive (Hypothesis 2) and metacognitive comprehension strategies (Hypothesis 3), as indicated by eye movements and think-aloud verbal reports. To more deeply examine the effects of modeling examples on how experts diagnose unfamiliar visualizations, we contrasted the medical experts' performance with medical students’ performance and compared their relative improvement. The novices were thus an important reference group for understanding whether experts in unfamiliar tasks maintain high levels of performance. We assumed that expertise in other fields of medicine has prepared the expert participants for future learning and they would thus benefit more from the eye movement modeling example than the novice participants (Hypothesis 4).

2. Method

2.1. Participants

Participants in the study included 23 individuals (11 women, 12 men, M_age = 30.74 years, SD = 10.56) at two levels of expertise. The expert group consisted of nine medical professionals (one woman, eight men, M_age = 40.22 years, SD = 9.32) selected from the radiology and nuclear medicine departments of a university hospital. All experts were peer nominated, board certified, interpreted PET or CT visualizations on a daily basis, and had their diagnostic accuracy tested prior to the experiment (M = 0.85, SD = 0.04). The novice group consisted of 14 first-, second-, and third-year students of medicine (ten women, four men, M_age = 22.75 years, SD = 2.63); none of the students reported prior knowledge in interpreting medical visualizations. All participants were asked to bring corrective eyewear if needed, resulting in normal or corrected-to-normal vision for all participants. Anonymity and confidentiality were guaranteed for all participants, with written informed consent obtained prior to the experiment.


2.2. Design

The study followed a 3 × 3 mixed-method design with a between-subject factor expertise (PET experts vs. CT experts vs. novices) and a within-subject factor case (baseline vs. retention vs. transfer) and employed quantitative analyses of eye movements, time-on-task, and diagnostic performance as well as qualitative and quantitative analyses of verbal data. The independent variable was task (baseline, retention, transfer). The dependent variables were diagnostic performance (accuracy, sensitivity, specificity, time-on-task), eye movements (number of fixations and fixation duration on task-relevant and task-redundant information), and think-aloud verbal reports (verbalizations on technology, cognitive comprehension, metacognitive comprehension, and solution). The following paragraphs describe each element of the study design in greater detail: the measures, material, and the experimental procedure.

2.2.1. Measures

The experiment used the following measures: eye movements, think-aloud protocols, time-on-task, and diagnostic performance. Each measure is specified below.

2.2.1.1. Eye movements

Eye movements were recorded with the Tobii T60XL remote eye tracking system with a temporal resolution of 60 Hz and were analyzed with Studio 2.0.3 software (www.tobii.com). A fixation was defined with a velocity of 35 msec and a distance threshold of 35 pixels. Even though participants dealt with dynamic stimuli, fixation parameters were chosen because all motion was on a fixed spot and no motion across the screen was present, making smooth pursuit (specific eye movements that occur when following a moving object; Holmqvist et al., 2011) very unlikely. Each part of the visualization that contained information relevant for the diagnosis was defined as an area of interest. Areas of interest (AOIs) were created manually. As the visualization was dynamic and user-controlled, AOIs were transient and of varying size.
The total eye movement recording was thus segmented, with the length of each segment determined by the maximum amount of time for which AOIs were visible within each segment. Data for each AOI were aggregated to determine (1) the number of fixations on task-relevant areas, (2) the number of fixations on task-redundant areas, (3) fixation duration on task-relevant areas, and (4) fixation duration on task-redundant areas. These eye movement parameters were selected because they indicate the processes of information reduction and selective attention allocation (Haider & Frensch, 1999; Holmqvist et al., 2011).

2.2.1.2. Think-aloud protocols

Concurrent verbal reports of thinking aloud during task engagement were recorded with the Tobii T60XL, using a standard microphone attached to the stimulus PC. Two trained coders coded the protocols using the NVivo 9.1 software. Codes were segmented following Strijbos, Martens, Prins, and Jochems' (2006) alternative unit of analysis and segmentation procedure. Interrater agreement was Cohen's κ = 0.88. Table 1 presents the coding scheme and coding examples.

Table 1. Coding scheme and coding examples of verbal data.

| Category | Code | Protocol segment |
|---|---|---|
| Technology dimension | | |
| Interacting with the visualization | T-IV | “Next let's move to these axillary eh axial images and let's get that larger again” |
| Commenting on the visualization | T-CV | “So I just have to think then which way around these images were” |
| Cognitive comprehension dimension | | |
| Selecting data | | |
| Selecting relevant features in technical terms | S-RT | “Now I see this high uptake of tracer here” |
| Selecting relevant features in vernacular terms | S-RV | “There is some red thing here” |
| Selecting irrelevant features in technical terms | S-IT | “This is the ventricle wall of the upper stomach” |
| Selecting irrelevant features in vernacular terms | S-IV | “This looks like the stomach” |
| Organizing data | | |
| Organizing relevant features | O-RF | “This is not the heart because heart is not seen so this has to be tumor also” |
| Organizing irrelevant features | O-IF | “Those could be the bowel, those black spaces” |
| Integrating data with prior knowledge | | |
| Retrieving biomedical/declarative knowledge | I-RB | “Why would the tumor look like that” |
| Retrieving clinical/experiential knowledge | I-RC | “It's too big compared to what I have seen in other patients before” |
| Metacognitive comprehension dimension | | |
| Using heuristic strategies | M-HS | “In the usual way we'll start from the transaxial slices looking from the top downward” |
| Using control strategies | M-CS | “I would hesitate to claim that this is cancer. It could be. But I'm not sure” |
| Using learning strategies | M-LS | “I guess this is where I would call XX to ask him what he thinks about this” |
| Problem solution dimension | | |
| Stating a correct problem solution | P-SC | “Mediastinum is positive for cancer both sides” |
| Stating an incorrect problem solution | P-SI | “This is prostate cancer” |

Specifically, the coding scheme includes 15 codes covering four general dimensions: (1) a technology dimension, which reflected
verbalizations of interacting with and commenting on the visualization tool; (2) a dimension of cognitive comprehension, which reflected verbalizations of selecting data, organizing data, and integrating data with long-term memory; (3) a dimension of metacognitive comprehension, which reflected verbalizations of using heuristic, control, or learning strategies; and (4) a solution dimension, which reflected verbalizations of correct or incorrect problem solutions (Collins & Kapur, 2014; Haider & Frensch, 1999; Mayer, 2014).

2.2.1.3. Time-on-task

The total time that participants spent on the task was automatically recorded with the Tobii T60XL.

2.2.1.4. Diagnostic performance

Diagnostic performance was assessed via the participants' written diagnoses. Findings in the diagnoses were classified as true positive (TP; diagnosing an abnormal feature as abnormal), true negative (TN; diagnosing a normal feature as normal), false positive (FP; thinking a feature is abnormal when it is not), and false negative (FN; thinking a feature is normal when it is not). Interrater reliability of coding a random subset of 10% of the diagnoses was high (Cohen's κ = 0.90), so one rater continued to code the remaining diagnoses (McHugh, 2012). Standard formulas of receiver operating characteristics analysis were used to calculate the accuracy, sensitivity, and specificity of the diagnoses (Fawcett, 2006). As Fawcett (2006, p. 861) notes: “Given a classifier and an instance, there are four possible outcomes. If the instance is positive and it is classified as positive, it is counted as a true positive; if it is classified as negative, it is counted as a false negative. If the instance is negative and it is classified as negative, it is counted as a true negative; if it is classified as positive, it is counted as a false positive.” Following the formulae reported in Fawcett (2006), accuracy reflects the sum of true positives and true negatives divided by the sum of all positives and negatives.

\[ \text{Accuracy} = \frac{TP + TN}{P + N} \]

Sensitivity reflects the number of true positives divided by the sum of true positives and false negatives.

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \]

Specificity reflects the number of true negatives divided by the sum of false positives and true negatives.

\[ \text{Specificity} = \frac{TN}{FP + TN} \]
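The three formulas can be illustrated with a minimal Python sketch. The `Findings` container and the counts in `example` are hypothetical and serve only to show the arithmetic; they are not data from the study. The sketch assumes that P = TP + FN and N = FP + TN, following the definitions quoted from Fawcett (2006).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Findings:
    """Counts of classified findings in one written diagnosis (hypothetical container)."""
    tp: int  # abnormal feature diagnosed as abnormal
    tn: int  # normal feature diagnosed as normal
    fp: int  # normal feature diagnosed as abnormal
    fn: int  # abnormal feature diagnosed as normal


def accuracy(f: Findings) -> float:
    # (TP + TN) / (P + N), with P = TP + FN and N = FP + TN
    return (f.tp + f.tn) / (f.tp + f.fn + f.fp + f.tn)


def sensitivity(f: Findings) -> float:
    # TP / (TP + FN)
    return f.tp / (f.tp + f.fn)


def specificity(f: Findings) -> float:
    # TN / (FP + TN)
    return f.tn / (f.fp + f.tn)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = Findings(tp=3, tn=4, fp=1, fn=2)
print(accuracy(example))     # 0.7
print(sensitivity(example))  # 0.6
print(specificity(example))  # 0.8
```

The sketch does not guard against empty denominators; a case without any abnormal features, for instance, would leave sensitivity undefined and would need separate handling.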

2.2.2. Material

2.2.2.1. Patient cases

A medical expert who did not participate in the study selected three patient cases from a validated test bank of anonymized patient cases in a university hospital. Diagnoses of these patient cases were available in the form of institutional reports obtained and proved at core biopsy and excision, so the selected stimulus material reflected representative task material under controlled laboratory conditions. Each patient case contained 550 static, two-dimensional, whole-body scans of the patient (275 PET scans and 275 CT scans) that together formed one dynamic, three-dimensional PET/CT visualization of the human anatomy and its functions. Fig. 1 displays a screenshot of the PET/CT visualization. Patient cases were displayed with the Carimas 2.0 image analysis software in a DICOM (Digital Imaging and Communications in Medicine) standard used in hospitals worldwide, sized 1920 × 1200 pixels, on a 24″ TFT monitor. The Carimas software displays PET/CT patient cases in four views: the transaxial view was always on the observer's bottom left side, with the coronal view on the top left, and the sagittal view on the top right, and a merged view on the bottom left, as shown in Fig. 1.

2.2.2.2. Eye movement modeling example

The eye movement modeling example consisted of a digital video with a duration of 158 s. The expert was asked to model a typical procedure of how to go about solving tasks of this kind. The digital video captured the model's eye movements, concurrent think-aloud protocols, and screen actions (key strokes and mouse clicks). The video was created with the replay mode of the Tobii T60XL eye tracking system (www.tobii.com), using a temporal resolution of 60 Hz. Following the Tobii default settings, a fixation was defined with a velocity of 35 msec and a distance threshold of 35 pixels.
The replay mode represented eye movements as red dots; participants were told that these red dots indicated the model's eye movements in real time and that the dots became bigger the longer the expert looked at a particular spot. Fig. 2 shows an example screenshot of the eye movement modeling example. 2.2.3. Procedure Before beginning the actual experiment, participants performed three practice trials in thinking aloud and one practice trial with an additional PET/CT patient case to assure compliance with the task instructions. The experimental setup was tested with four pilot participants; minor revisions in instructions and practice tasks were reflected in the final version of the procedure.


Fig. 1. A screenshot of the PET/CT visualization.

Fig. 2. A screenshot of the eye movement modeling example.

The experiment was run in individual sessions of approximately 30 min. The participants’ task was to diagnose the patient cases, that is, to visually inspect PET/CT visualizations and to render an interpretation (Balslev et al., 2012; Morita et al., 2008). After having diagnosed the first patient case (baseline task), participants watched a modeling example of an expert model diagnosing a second patient case (learning phase). After studying the modeling example, participants diagnosed the same case on their own (retention task) and then a third case to estimate the transfer of the modeled processes (transfer task). Participants’ eye movements were captured during the baseline, the retention, and the transfer task. At the beginning of each task, the eye tracking system was adjusted to the individual features of the participant based on a nine-point calibration. Participants were seated approximately 60 cm from the display and received the following instruction: “You will see a patient

case. Diagnose the patient case the same way as you do in your normal work. There is no time limitation. Remember to keep your head in a stable position. Please keep thinking out loud while you do the diagnosis. When you are ready to continue, press the SPACE bar.” Participants started the trials by pressing the space bar. Participants stopped the task by uttering “ready” when they thought they had reached a diagnosis. At the end of each trial, participants were asked to provide a written diagnosis. Participants were unaware as to whether or not there were any abnormalities or as to how many abnormalities were present on each case.

3. Results

An alpha level of 0.05 was used for the statistical tests. Each section presents between-group and within-group analyses. The between-group analyses are 2 (group) × 3 (task) mixed ANOVAs followed by pairwise comparisons. The within-group analyses are t-tests of paired samples. Results are presented for diagnostic performance, time-on-task, eye movements, and think-aloud protocols.

3.1. Diagnostic performance

Table 2 presents mean and standard deviation estimates of diagnostic performance measures by task and participant group. Diagnostic performance was measured in terms of accuracy, sensitivity, and specificity.

3.1.1. Accuracy

A 2 (group) × 3 (task) mixed ANOVA showed a significant main effect of task, F(2,42) = 16.44, p < 0.001. Bonferroni corrected post hoc tests showed that accuracy in the retention task was significantly higher than in both the baseline and transfer tasks (both ps = 0.001). There was a significant Group × Task interaction, F(2,42) = 4.66, p = 0.015, η² = 0.956. This effect indicates that accuracy in the three tasks significantly differed between experts and novices.
This interaction effect also signals that the change in accuracy was different for the two groups; more precisely, among experts, there was an increase from baseline to both the retention task and the transfer task whereas among novices there was only an increase from baseline to the retention task but, importantly, not to the transfer task. Fig. 3 visualizes this interaction effect in the accuracy estimates. Pairwise comparisons showed that experts had significantly higher accuracy than novices in the baseline, F(1,21) = 49.94, p < 0.001, Cohen's d = 2.77, retention, F(1,21) = 23.75, p < 0.001, Cohen's d = 2.00, and transfer tasks, F(1,21) = 85.29, p < 0.001, Cohen's d = 3.78. Within-group t-test estimates demonstrate that accuracy increased from baseline to retention task for experts, t(8) = 2.89, p = 0.02, Cohen's d = 0.55, and for novices, t(13) = 5.50, p < 0.001, Cohen's d = 1.94.

3.1.2. Sensitivity

There was a main effect of task, F(2,42) = 17.26, p < 0.001.

3.1.3. Specificity

There was a main effect of task, F(2,42) = 3.16, p = 0.05, and a significant Group × Task interaction, F(2,42) = 4.10, p < 0.05, η² = 0.956. This effect indicates that specificity in the three tasks significantly differed in experts and novices. Experts decreased their specificity from baseline to retention and then increased their specificity from baseline to transfer, whereas novices improved from baseline to both retention and transfer. Fig. 4 visualizes this interaction effect in the specificity estimates. Pairwise comparisons showed that experts had significantly higher specificity than novices in the baseline task, F(1,21) = 34.46, p < 0.001, Cohen's d = 2.35, the retention task, F(1,21) = 7.21, p < 0.001, Cohen's d = 1.09, and the transfer task, F(1,21) = 54.93, p < 0.001, Cohen's d = 3.08.

3.2. Time-on-task

The time expert participants spent on task completion was 362.33 s (SD = 148.44) for the baseline task, 402.33 s (SD = 273.71) for the retention task, and 376.00 s (SD = 201.77) for the transfer task. The time novice participants spent on task completion was 432.79 s (SD = 243.26) for the baseline task, 364.93 s (SD = 197.49) for the retention task, and 310.85 s (SD = 142.94) for the transfer task.

Table 2. Mean (and standard deviation) of diagnostic performance by participant group and task.

| Measure | Experts: Baseline task | Experts: Retention task | Experts: Transfer task | Novices: Baseline task | Novices: Retention task | Novices: Transfer task |
|---|---|---|---|---|---|---|
| Accuracy | 0.67 (0.24) | 0.79 (0.19) | 0.72 (0.19) | 0.16 (0.10) | 0.43 (0.17) | 0.12 (0.12) |
| Sensitivity | 0.38 (0.12) | 0.71 (0.22) | 0.59 (0.32) | 0.16 (0.11) | 0.37 (0.15) | 0.19 (0.19) |
| Specificity | 0.68 (0.32) | 0.56 (0.46) | 0.83 (0.25) | 0.07 (0.18) | 0.15 (0.27) | 0.12 (0.21) |
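As a worked check, the Cohen's d values reported for accuracy can be approximated from the means and standard deviations in Table 2. The article does not state which formula was used; the sketch below assumes that the mean difference is divided by the root mean square of the two standard deviations, an assumption that happens to match the reported values to two decimals. The function name `cohens_d` is illustrative.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Standardized mean difference (b minus a).

    Assumption: the standardizer is the root mean square of the two
    standard deviations; the article does not state its formula.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


# Accuracy means and standard deviations as reported in Table 2.
experts_gain = cohens_d(0.67, 0.24, 0.79, 0.19)  # experts, baseline -> retention
novices_gain = cohens_d(0.16, 0.10, 0.43, 0.17)  # novices, baseline -> retention
baseline_gap = cohens_d(0.16, 0.10, 0.67, 0.24)  # novices vs. experts at baseline

print(round(experts_gain, 2))  # 0.55
print(round(novices_gain, 2))  # 1.94
print(round(baseline_gap, 2))  # 2.77
```

Other conventions, such as a sample-size-weighted pooled standard deviation or a standardizer based on difference scores for paired data, would give different values, so the agreement should be read as a plausibility check only.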


Fig. 3. Accuracy by participant group and task. (Plot of accuracy, axis from 0 to 0.9, across the baseline, retention, and transfer tasks for experts and novices.)

Fig. 4. Specificity by participant group and task. (Plot of specificity, axis from 0 to 0.9, across the baseline, retention, and transfer tasks for experts and novices.)
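
For readers less familiar with the three performance measures, the sketch below shows how accuracy, sensitivity, and specificity are conventionally derived from a confusion matrix. It assumes the standard definitions; this section does not state how the measures were operationalized, so the authors' scoring may differ. The function name `diagnostic_measures` and all counts are hypothetical.

```python
def diagnostic_measures(tp, fp, tn, fn):
    """Standard confusion-matrix definitions (an assumption, see lead-in).

    tp: abnormal findings correctly reported
    fp: normal structures wrongly reported as abnormal
    tn: normal structures correctly left unreported
    fn: abnormal findings that were missed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all decisions that are correct
        "sensitivity": tp / (tp + fn),      # share of abnormal findings detected
        "specificity": tn / (tn + fp),      # share of normal structures cleared
    }


# Hypothetical counts for one participant on one task.
example = diagnostic_measures(tp=6, fp=2, tn=8, fn=4)
print(example)
```

The split matters for reading Figs. 3 and 4: a participant can report more findings and thereby raise sensitivity while lowering specificity, which is one way the two measures can move in opposite directions.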

3.3. Eye movements
Table 3 presents means and standard deviations of eye movements by task and participant group. Eye movements were measured as number of fixations on task-relevant areas, number of fixations on task-redundant areas, fixation duration on task-relevant areas, and fixation duration on task-redundant areas. Task-redundant areas are labeled task-irrelevant in Table 3.

Table 3
Mean (and standard deviation) of eye movements by participant group and task.

                                              Experts                                            Novices
                                              Baseline         Retention        Transfer         Baseline         Retention        Transfer
Number of fixations on task-relevant areas    253.22 (126.19)  614.33 (278.92)  494.88 (248.40)  232.71 (97.00)   345.82 (207.51)  320.50 (140.01)
Number of fixations on task-irrelevant areas  87.00 (56.24)    36.00 (20.59)    68.75 (23.61)    127.93 (96.76)   63.10 (48.36)    116.67 (94.00)
Fixation duration on task-relevant areas      191.85 (81.36)   349.11 (196.63)  292.99 (131.97)  181.90 (170.80)  156.72 (87.35)   177.06 (131.52)
Fixation duration on task-irrelevant areas    36.44 (32.26)    22.83 (31.50)    29.20 (11.79)    55.50 (49.95)    22.11 (17.12)    45.81 (39.66)
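
One way to read Table 3 is to express the task-relevant fixations as a share of all fixations. The Python sketch below does this with the group means from the table. The share is an illustrative summary that the authors do not report, it is a ratio of means rather than a mean of per-participant ratios, and the function name `relevant_share` is hypothetical.

```python
# Mean number of fixations from Table 3: (task-relevant, task-irrelevant).
fixation_counts = {
    "experts": {
        "baseline": (253.22, 87.00),
        "retention": (614.33, 36.00),
        "transfer": (494.88, 68.75),
    },
    "novices": {
        "baseline": (232.71, 127.93),
        "retention": (345.82, 63.10),
        "transfer": (320.50, 116.67),
    },
}


def relevant_share(relevant, irrelevant):
    """Share of fixations that landed on task-relevant areas."""
    return relevant / (relevant + irrelevant)


# Share per group and task, rounded to two decimals.
shares = {
    group: {task: round(relevant_share(*counts), 2) for task, counts in tasks.items()}
    for group, tasks in fixation_counts.items()
}
print(shares)
```

With these means the share for experts is 0.74 at baseline, 0.94 in the retention task, and 0.88 in the transfer task; for novices it is 0.65, 0.85, and 0.73. This descriptive pattern is in line with the inferential results in Sections 3.3.1 and 3.3.2, but it carries no significance test of its own.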


3.3.1. Number of fixations on task-relevant areas
A 2 (group) × 3 (task) mixed ANOVA showed a significant main effect of task, F(2,42) = 16.15, p < 0.001. Bonferroni-corrected post hoc tests showed that the number of fixations in the baseline task was significantly smaller than in both the retention and transfer tasks (both ps < 0.01). Experts had more fixations on task-relevant information than novices in the retention task, F(1,21) = 6.10, p < 0.03, Cohen's d = 1.09, and marginally more fixations in the transfer task, F(1,21) = 5.52, p < 0.06, Cohen's d = 0.87. The findings also indicate that, after the instruction, experts fixated task-relevant information more frequently, t_retention(8) = 3.51, p < 0.01, Cohen's d = 1.67, and t_transfer(8) = 3.07, p < 0.02, Cohen's d = 1.23. Similarly, the novices increased the number of task-relevant fixations, t_retention(13) = 2.86, p < 0.02, Cohen's d = 0.70, and t_transfer(13) = 2.82, p < 0.02, Cohen's d = 0.73.

3.3.2. Number of fixations on task-redundant areas
There was a significant main effect of task, F(2,42) = 8.49, p = 0.001. Bonferroni-corrected post hoc tests showed that the number of fixations on task-redundant areas in the retention task was significantly smaller than in both the baseline and transfer tasks (both ps < 0.02).

3.3.3. Fixation duration on task-relevant areas
There was a significant main effect of task, F(2,42) = 8.33, p = 0.001. Bonferroni-corrected post hoc tests showed that fixation duration on task-relevant areas in the baseline task was significantly shorter than in both the retention and transfer tasks (both ps < 0.01).

3.3.4. Fixation duration on task-redundant areas
There was a significant main effect of task, F(2,42) = 4.44, p = 0.02. Bonferroni-corrected post hoc tests showed that fixation duration on task-redundant areas in the retention task was significantly shorter than in both the baseline and transfer tasks (both ps < 0.05).

3.4. Think-aloud protocols
Table 4 presents means and standard deviations of the concurrent think-aloud protocols by task and participant group. Thinking aloud was coded in four main dimensions: technology, cognitive comprehension, metacognitive comprehension, and solution. To contextualize the numerical estimates of verbal reports with descriptive examples of what the participants uttered, we added excerpts from the protocol data in each dimension.

Table 4
Mean (and standard deviation) of verbal protocols by participant group and task.

                                                          Experts                                   Novices
                                                          Baseline      Retention     Transfer      Baseline      Retention     Transfer
Technology dimension
  Interacting with the visualization                      2.17 (1.68)   6.67 (5.24)   5.80 (4.74)   1.99 (1.97)   6.71 (6.50)   6.61 (6.25)
  Commenting on the visualization                         1.44 (1.47)   2.89 (1.97)   2.42 (1.56)   0.93 (1.44)   2.14 (2.71)   3.01 (3.68)
Dimension of cognitive comprehension
  Selecting data
    Selecting relevant information in technical terms     0.60 (0.72)   2.11 (2.32)   1.73 (1.90)   0.14 (0.36)   0.14 (0.36)   0.42 (0.75)
    Selecting relevant information in vernacular terms    0.67 (1.21)   1.11 (1.17)   0.91 (1.20)   0.43 (1.60)   0.71 (1.86)   0.80 (1.53)
    Selecting redundant information in technical terms    5.62 (7.04)   3.78 (5.02)   3.49 (3.21)   8.07 (8.41)   5.86 (6.00)   5.68 (6.04)
    Selecting redundant information in vernacular terms   1.56 (4.17)   0.89 (2.32)   0.78 (2.33)   3.13 (3.25)   1.93 (2.50)   2.41 (2.57)
  Organizing data
    Organizing relevant information                       3.60 (2.72)   8.67 (5.72)   6.87 (5.37)   1.29 (1.51)   3.07 (3.45)   2.16 (2.65)
    Organizing redundant information                      9.40 (9.62)   9.22 (10.65)  9.92 (10.16)  8.34 (2.96)   8.86 (3.51)   8.15 (3.77)
  Integrating data with prior knowledge
    Retrieving biomedical knowledge                       1.53 (1.71)   1.44 (1.88)   1.74 (1.69)   0.69 (1.40)   0.86 (1.75)   0.42 (0.74)
    Retrieving clinical knowledge                         0.25 (0.58)   0.44 (1.01)   0.64 (0.99)   0.04 (0.15)   0.29 (0.47)   0.06 (0.21)
Dimension of metacognitive comprehension
  Using heuristic strategies                              0.98 (0.96)   2.44 (1.74)   2.28 (1.78)   0.30 (0.54)   1.00 (1.36)   0.81 (1.12)
  Using control strategies                                3.30 (3.38)   8.22 (6.87)   7.15 (6.02)   1.67 (1.94)   3.86 (4.19)   2.65 (2.29)
  Using learning strategies                               0.36 (0.87)   0.78 (1.39)   0.89 (1.70)   0.14 (0.28)   0.14 (0.36)   0.20 (0.41)
Solution dimension
  Stating a correct solution                              5.33 (4.02)   7.22 (4.35)   7.56 (5.20)   0.29 (0.67)   0.36 (0.84)   0.08 (0.29)
  Stating an incorrect solution                           6.16 (8.93)   3.56 (5.73)   4.10 (6.21)   3.03 (3.82)   1.93 (2.40)   2.17 (2.67)

3.4.1. Technology dimension
The technology dimension was coded in two categories: interacting with the visualization and commenting on the visualization. On interacting with the visualization, a 2 (group) × 3 (task) mixed ANOVA showed a main effect of the task, F(2,42) = 23.33, p < 0.001. Expert participant E1 and novice participant N2 provide examples of what participants uttered while interacting with the visualization. Both examples are from the retention task. E1 described a way to navigate through the image: “Let's start with seeing the sagittal image,” then “Next let's move to these axillary eh axial images and let's get that larger again,” and “Now we go downwards.” E3 also talked about zooming in and out of the visualization, for example, “I'll zoom it to see it better” and “I'll took a look at it from this view.” Novice participant N2 uttered thoughts while interacting with the visualization that were similar to E1's, such as, “We have to go upwards first” and, after zooming in on a specific aspect, “So if we take a closer look from here.” In both instances, the use of a collective “we” is remarkable: we navigate rather than I navigate; let us see rather than let me see. Participants including E4 and N3 tended not to speak of themselves in the first person singular, but used a plural form when talking about how they navigated through the PET/CT visualization.
On commenting on the visualization, there was a main effect of the task, F(2,42) = 15.35, p < 0.001. Novices uttered more thoughts about how the visualizations had been produced. For example, N7 wondered, “Which directions have these images been taken from?” and “So I just have to think then which way around these images were.”

3.4.2. Cognitive comprehension dimension
The cognitive comprehension dimension included three subcategories: selecting data, organizing data, and integrating data with prior knowledge. We start with the results for selecting data. First, on selecting relevant data in technical terms, a 2 (group) × 3 (task) mixed ANOVA showed a main effect of the task, F(2,42) = 15.70, p < 0.001, and a significant Group × Task interaction, F(2,42) = 13.87, p < 0.001.
Experts selected more relevant data in technical terms than novices in all tasks, F_baseline(1,21) = 6.70, p < 0.02, Cohen's d = 0.81; F_retention(1,21) = 9.99, p < 0.01, Cohen's d = 1.19; F_transfer(1,21) = 5.52, p < 0.03, Cohen's d = 0.91. When compared with the baseline task, experts selected relevant data more often in technical terms in the retention task, t(8) = 2.50, p < 0.04, Cohen's d = 0.88, and the transfer task, t(8) = 2.42, p < 0.01, Cohen's d = 0.79. However, this effect decreased from the retention to the transfer task, t(8) = 2.74, p < 0.03, Cohen's d = 0.18. Second, on selecting relevant data in vernacular terms, there was a main effect of the task, F(2,42) = 4.81, p = 0.01. Third, on selecting redundant data in technical terms, there was a main effect of the task, F(2,42) = 12.60, p < 0.001.
For expert and novice participants, saying what they saw seemed to help them understand the present patient case. For example, E9 said, “There I see on the other side in the groin there is a larger probably active area” and “That stomach looks wild though.” For novice participants, describing what they saw seemed to help them convince themselves of certain visual aspects. Examples included, “And then there is this kind of lump on the other side” followed by “some kind of mass here in this spot, it looks red” and “There is some red thing here.” Because their knowledge base was minimally developed, novices had problems integrating their detected visual information with biomedical or clinical knowledge and remained on a bottom-up level by describing what they saw.
Expert participants also described what they detected; for example, E8 said, “There is a bit of marker aggregation,” “I'll look at the pleural fluid,” “This is the ventricle wall of the upper stomach,” and “I'll take another look once more of the lower abdomen.”
First, on organizing relevant data, there was a significant main effect of the task, F(2,42) = 39.69, p < 0.001, and a significant Group × Task interaction, F(2,42) = 9.66, p < 0.001. Experts organized relevant information more often than novices for the baseline task, F(1,21) = 6.91, p < 0.02, Cohen's d = 1.05, the retention task, F(1,21) = 8.64, p < 0.01, Cohen's d = 1.19, and the transfer task, F(1,21) = 7.92, p < 0.01, Cohen's d = 1.11. Second, on organizing redundant information, there was a significant Group × Task interaction, F(2,42) = 24.23, p < 0.001. After watching the modeling example, experts organized relevant information more often than before, t_retention(8) = 4.98, p < 0.001, Cohen's d = 1.13, and t_transfer(8) = 3.57, p < 0.01, Cohen's d = 0.77. Surprisingly, experts organized more redundant information in the transfer task than in the baseline task, t(8) = 2.93, p < 0.02, Cohen's d = 0.05. The novice participants also organized relevant information more often than before the instruction, t_retention(13) = 3.35, p < 0.01, Cohen's d = 0.67, and t_transfer(13) = 2.43, p < 0.03, Cohen's d = 0.40, with a stronger effect in the retention than in the transfer task, t(13) = 2.78, p < 0.02, Cohen's d = 0.30. Novices organized redundant information significantly more often in the retention task than in the transfer task, t(13) = 2.77, p < 0.02, Cohen's d = 0.20. For example, N1 uttered, “Those could be the bowel, those black spaces.” Expert participants organized selected information into organic features.
For example, E2 said, “Perhaps a ruptured bowel,” “These black spaces are probably air in the abdominal cavity,” and “This is not the heart because [the] heart is not seen so this has to be tumor also.” Expert participants also uttered what detected information could indicate, such as E6: “This aggregation is probably an artefact, apparently an artefact from the dental fillings.”
On integrating data with biomedical knowledge, there was a nonsignificant main effect of task, F < 1, and a significant Group × Task interaction, F(2,42) = 5.19, p = 0.01. On integrating data with clinical knowledge, there was a significant main effect of task, F(2,42) = 5.65, p < 0.01, and a significant Group × Task interaction, F(2,42) = 5.07, p = 0.01. Expert participants verbalized the retrieval of biomedical knowledge and clinical knowledge from long-term memory more often than the novice participants in the transfer task, F_biomedical(1,21) = 6.74, p < 0.02, Cohen's d = 1.01, and F_clinical(1,21) = 4.67, p < 0.05, Cohen's d = 0.81. Utterances of the novice participants indicated their difficulty in combining what they saw with what they knew. For example, N7 wondered, “So what could be such that it affects only one kidney, so that there's some activity there …” and “What could be the underlying cause?” Novices also tried to remember how anatomical features were represented in tomographic renderings, as indicated by N9 uttering, “What did the female genital anatomy look like again, or could this be male?” In contrast, expert participants combined biomedical knowledge with incoming visual information.
For example, E5 uttered, “Probably it's so that there is so much pleural fluid, it causes that there is atelectase in the lungs and there really it appears to be on both sides.” Experts also used their experiential knowledge as a prompt for diagnosing, such as E1: “It's too big compared to what I have seen in other patients before,” “because usually I see the liver quite free of this,” and “I can guess based on my experience that this patient is not live, or did not live long after this examination, because it's not possible to survive this situation.”

3.4.3. Metacognitive comprehension dimension
The metacognitive comprehension dimension was measured via heuristic strategies, control strategies, and learning strategies. First, concerning heuristic strategies, there was a significant main effect of task, F(2,42) = 23.48, p = 0.001. There was also a significant Task × Group interaction, F(2,42) = 8.07, p = 0.001. Pairwise comparisons showed that experts verbalized the use of heuristic strategies more often than novices for the baseline task, F(1,21) = 4.86, p < 0.04, Cohen's d = 0.87, the retention task, F(1,21) = 4.98, p < 0.04, Cohen's d = 0.92, and the transfer task, F(1,21) = 5.90, p < 0.03, Cohen's d = 0.99. After watching the modeling example, experts verbalized the use of heuristic strategies more often than before, t_retention(8) = 4.06, p < 0.01, Cohen's d = 1.04, and t_transfer(8) = 2.46, p < 0.05, Cohen's d = 0.91. The novices also verbalized the use of heuristic strategies more often than before the instruction, t_retention(13) = 2.86, p < 0.02, Cohen's d = 0.68, and t_transfer(13) = 2.62, p < 0.02, Cohen's d = 0.58. Experts' heuristic strategies involved applying to the unfamiliar PET/CT visualizations the same strategies that had been effective in familiar PET and CT visualizations. For example, E3 said, “We're looking at fusion images, but if this was a CT image, in the usual way we'll start from the transaxial slices looking from the top downwards” and “Let's try to start as usual from those coronal plane pictures.”
E4 uttered, “And we begin from top downwards” and “Of course I am a bit old-fashioned radiologist, so I start looking at the coronal pictures.” Novices tended to navigate systematically through the slides only after watching the modeling example, in which the expert model mentioned the benefits of a systematic top-to-bottom search routine. For example, N3 said, “First, I will scan through the whole picture to get an overview, and I will start at the head the top,” and N6 said, “OK, now I will do as the experts do, and go from head to feet, just to see.”
Second, concerning control strategies, there was a significant main effect of the task, F(2,42) = 27.33, p = 0.001. There was also a significant Task × Group interaction, F(2,42) = 3.12, p = 0.05. Experts verbalized the use of control strategies more often after than before the modeling example, t_retention(8) = 4.07, p < 0.01, Cohen's d = 0.91, and t_transfer(8) = 4.12, p < 0.01, Cohen's d = 0.79. This effect decreased from the retention to the transfer task, t(8) = 3.00, p < 0.02, Cohen's d = 0.17. The novice participants also verbalized more control strategies after than before the instruction, t_retention(13) = 3.62, p < 0.01, Cohen's d = 0.67, and t_transfer(13) = 3.18, p < 0.01, Cohen's d = 0.46. Examples of experts' use of control strategies include E5, who said, “It probably isn't normal either. Or it is maybe. I don't know. To make sure, we better go back to the abdominal cavity and check this option.” E5 also said, “I would hesitate to claim that this is cancer. It could be.
But I'm not sure.” Novices also tried to gather additional information to back up tentative diagnoses, such as N12 uttering, “Let's forget that for the moment and look at that bone there in between again so that we can make sure.”
Third, learning strategies were infrequently uttered; when they were, they included help-seeking strategies such as E1 (expert), “I guess this is where I would call X to ask him what he thinks about this,” or N10 (novice), “Now I would consult Professor Y to give us a hint.”

3.4.4. Solution dimension
For correct solutions, results indicate a significant main effect of the task, F(2,42) = 20.04, p < 0.001, and a significant Task × Group interaction, F(2,42) = 18.64, p < 0.001. Pairwise comparisons showed that expert participants verbalized correct solutions more often than the novice participants in the baseline task, F(1,21) = 21.68, p < 0.001, Cohen's d = 1.75, the retention task, F(1,21) = 33.73, p < 0.001, Cohen's d = 2.19, and the transfer task, F(1,21) = 29.53, p < 0.001, Cohen's d = 2.03. After the modeling example, experts verbalized more correct solutions than before, t_retention(8) = 7.50, p < 0.001, Cohen's d = 0.45, and t_transfer(8) = 5.51, p = 0.001, Cohen's d = 0.48.
For incorrect solutions, results showed a significant main effect of the task, F(2,42) = 11.05, p < 0.001, and a significant Task × Group interaction, F(2,42) = 6.32, p < 0.01. Pairwise comparisons indicated that experts decreased the number of verbalizations of incorrect solutions from the baseline to the retention task, t(8) = 2.32, p < 0.05, Cohen's d = 0.41. The novice participants benefitted from the modeling example because they reduced the number of verbalized incorrect solutions after the instruction, t_retention(13) = 2.88, p < 0.02, Cohen's d = 0.35, and t_transfer(13) = 2.64, p < 0.05, Cohen's d = 0.26. The effect decreased from the retention to the transfer task, t(13) = 2.82, p < 0.02, Cohen's d = 0.10. Examples of the protocol data contextualize these numerical estimates.
For example, E5 uttered, “Mediastinum is positive for cancer on both sides.” Other experts provided similarly short and precise diagnostic findings. Novices also uttered final diagnoses, but with less confidence and less precision, such as N3: “There is something in the intestines, in my opinion, some malignancy.” Although this diagnosis is correct, the verbal data indicate that experts uttered their diagnoses in a qualitatively different manner than novices.

4. Discussion
The present study investigated whether eye movement modeling examples (Collins & Kapur, 2014; Jarodzka et al., 2012; Seppänen & Gegenfurtner, 2012; Van Gog et al., 2009) can promote adaptive expertise in medical image diagnosis. Results in Tables 2–4 show that, when confronted with unfamiliar visualizations, medical experts had higher levels of diagnostic performance, more efficient eye movements, and more task-relevant verbalizations after watching the modeling example. A second result indicates that eye movement modeling examples had a more positive effect for the expert participants than for the novice participants. Several conclusions on the effectiveness of eye movement modeling examples for promoting the transfer of skills to novel task affordances can be articulated.
First, the performance data showed that, after watching the eye movement modeling example, experts had higher estimates of diagnostic accuracy and sensitivity. This improvement tended to be stronger for the retention task than for the transfer task, supporting Hypothesis 1. The fact that experts had higher performance rates in the retention than in the transfer task can be explained with cognitive biases (Feltovich et al., 1997; Gegenfurtner & Seppänen, 2013), suggesting that the higher level of familiarity with the material presented in the retention task promoted accuracy and sensitivity. Eye movement modeling did not significantly promote specificity of the diagnoses.
Second, the fixation data indicated that eye movement modeling promoted gaze efficiency. In particular, experts had more fixations of longer duration on task-relevant areas. These effects were stronger for the retention task than for the transfer task (Mayer & Wittrock, 1996). The fixation data indicated that watching the modeling example tended to direct attention toward information relevant for the diagnosis (Haider & Frensch, 1999), supporting Hypothesis 2.
Third, the protocol data suggested that eye movement modeling promoted task-relevant verbalizations. This effect was strongest for metacognitive comprehension strategies. Specifically, medical experts tended to verbalize the use of heuristic and control strategies more frequently after watching the modeling example than before. This finding indicates that “tricks of the trade” (Collins & Kapur, 2014) and careful self-evaluation transferred from the routine to novel visualizations.
Moreover, experts selected relevant information more often in technical terms, organized relevant information into mental models more frequently, and retrieved more prior knowledge (Ericsson & Kintsch, 1995; Mayer, 2014). However, across tasks, selected information was more often redundant than relevant. This finding may suggest a boundary condition of the effectiveness of eye movement modeling examples. Together, the protocol data confirm Hypothesis 3.
Finally, the results showed that expert participants benefitted more from the modeling example than the novice participants. This supports Hypothesis 4. More specifically, as evidenced by the mean estimates in Tables 2–4, experts had higher diagnostic performance, a more efficient gaze, and more task-relevant verbalizations than the novices. Though in line with previous expertise research (Balslev et al., 2012; Bertram et al., 2013; Gegenfurtner et al., 2013, 2011; Jaarsma et al., 2015; Jarodzka, Scheiter, Gerjets, & Van Gog, 2010; Kok, 2016; Litchfield & Donovan, 2016), and thus unsurprising at first glance, these between-group comparisons are interesting because they indicate that medical experts were able to adapt their superior skills to unfamiliar modes of visualization (Chen et al., 2017). Experts tended to improve their skills when instructionally guided. In line with Schwartz et al.’s (2005) concept of “preparation for future learning”, the data in Table 2 reveal a pattern of results, particularly in the development of accuracy and specificity. EMME led to improved performance for both groups on the task that was similar to the one used in training, but only experts learned to apply this knowledge to the novel task as well under these conditions. The significant Group × Task interaction effect reflects this pattern.
In other words, previous knowledge and experience have prepared experts to benefit more from EMME, which is seen in the retention task (where both novice and expert performance improves) and in their ability to apply learned skills in novel tasks (where, in contrast to the experts, novice performance returns to the pretest level). These performance measures are enriched by the eye movement measures: here, the most important finding is that the eye tracking data give evidence that experts benefit more from EMME, in that it helps them focus on relevant aspects of the visual material. One explanation for why experts benefitted more than students in this study might be that the modeling example's explanations were well matched to the expert group. Thus, as evidenced by the findings reported here, eye movement modeling examples seem to be a useful tool for skills improvement (Gegenfurtner, Vauras, Gruber, & Festner, 2010; Jarodzka et al., 2013; Kundel et al., 1990; Litchfield et al., 2010) and transfer (Gegenfurtner, Könings, Kosmajac, & Gebhardt, 2016; Quesada-Pallarès & Gegenfurtner, 2015; Segers & Gegenfurtner, 2013; Van Gog et al., 2009) in the medical disciplines.
A limitation of this study is the relatively small sample size. On the other hand, small samples are typically found in eye tracking-based expertise research (Bertram et al., 2013; Gegenfurtner et al., 2011; Jaarsma et al., 2015), largely because of the data-intensive nature of eye tracking and because experts are rare by definition. Nonetheless, future studies may want to aim for larger samples in the same or a different domain to replicate the findings presented here. Overall, these results support Hypothesis 4.
The use of technology-based expert modeling in powerful learning environments may also have potential for domains beyond medicine.
Some professions tend to rely on the diagnosis of technology-mediated visual material, including education and training in aviation security, biology, meteorology, physiotherapy, the military, music, and sports (Al Lily et al., in press; Boekhout et al., 2010; Ericsson, 2004; Gegenfurtner et al., 2011; Gegenfurtner & Szulewski, 2016; Gruber et al., 2010; Jarodzka et al., 2010; Nalanagula et al., 2006; Velichkovsky, 1995). In all these cases, the support of novice processes in visual diagnosis could involve the modeling of eye movements in simulation-based (Gegenfurtner, Quesada-Pallarès, & Knogler, 2014; Siewiorek, Gegenfurtner, Lainema, Saarinen, & Lehtinen, 2013) or game-based (Siewiorek & Gegenfurtner, 2010; Torbeyns, Lehtinen, & Elen, 2015) learning environments to aid the detection process and the replaying of expert verbal reports to enhance conceptual processing and inference generation. As shown in the present study, the combination of eye movements and verbal explanations tends to be promising because they provide complementary evidence of covert mental processes experts employ during task completion (Gegenfurtner et al., 2017; Jaarsma, Jarodzka, Nap, van Merriënboer, & Boshuizen, 2014) and may therefore cover any omission in verbalizations of problem solving that are typical in the verbal reports of experts (Ericsson, 2004; Gegenfurtner & Van Merriënboer, in press).


5. Conclusion
As noted at the outset, due to technological change in dynamic work settings, experts sometimes need to adapt and transform their skills to new task affordances. The present study investigated whether EMME (Jarodzka et al., 2013; Mason et al., 2015; Seppänen & Gegenfurtner, 2012; Van Gog et al., 2009) can promote adaptive expertise in medical image diagnosis. The findings reported here signal that experts tended to benefit from the modeling example more than did novices. Overall, the study suggests that visual instructional guidance in technological learning environments has the potential to play a role in fostering the transfer of expert skills to new and unfamiliar task affordances.

Funding
This research was funded with a grant from the Academy of Finland (8128766) awarded to Prof. Dr. Roger Säljö, University of Gothenburg.

References Al Lily, A., Foland, J., Stoloff, D., Gogus, A., Erguvan, I. D., Awshar, M. T., et al. (2017). Academic domains as political battlegrounds: A global enquiry by 99 academics in the fields of education and technology. Information Development, 33(3), 270e288. http://dx.doi.org/10.1177/0266666916646415. Balslev, T., Jarodzka, H., Holmqvist, K., De Grave, W., Muijtjens, A., Eika, B., et al. (2012). Visual expertise in paediatric neurology. European Journal of Paediatric Neurology, 16, 161e166. http://dx.doi.org/10.1016/j.ejpn.2011.07.004. € m, E. (2013). The effect of expertise on eye movements in medical image perception. PLoS One, 8, e66169. http:// Bertram, R., Helle, L., Kaakinen, J., & Svedstro dx.doi.org/10.1371/journal.pone.0066169. Boekhout, P., Van Gog, T., Van de Wiel, M. W. J., Gerards-Last, D., & Geraets, J. (2010). Example-based learning: Effects of model expertise in relation to student expertise. British Journal of Educational Psychology, 80, 557e566. http://dx.doi.org/10.1348/000709910X497130. Chen, O., Kalyuga, S., & Sweller, J. (2017). The expertise reversal effect is a variant of the more general element interactivity effect. Educational Psychology Review, 29, 393e405. http://dx.doi.org/10.1007/s10648-016-9359-1. Collins, A., & Kapur, M. (2014). Cognitive apprenticeship. In R. K. Sawyer (Ed.), Cambridge handbook of the learning sciences (2nd ed., pp. 109e127). New York: Cambridge University Press. Dams¸a, C. I., Froehlich, D. E., & Gegenfurtner, A. (2017). Reflections on empirical and methodological accounts of agency at work. In M. Goller, & S. Paloniemi (Eds.), Agency at work: An agentic perspective on professional learning and development. New York: Springer (in press). Ericsson, K. A. (2004). Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Academic Medicine, 79, S70eS81. http://dx.doi.org/10.1097/00001888-200410001-00022. Ericsson, K. A., & Kintsch, W. (1995). 
Long-term working memory. Psychological Review, 102, 211–245. http://dx.doi.org/10.1037/0033-295X.102.2.211.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874. http://dx.doi.org/10.1016/j.patrec.2005.10.010.
Feltovich, P. J., Spiro, R., & Coulson, R. (1997). Issues of expert flexibility in contexts characterized by complexity and change. In P. J. Feltovich, K. M. Ford, & R. R. Hoffman (Eds.), Expertise in context: Human and machine (pp. 125–146). Menlo Park, CA: MIT Press.
Gegenfurtner, A. (2013). Transitions of expertise. In J. Seifried, & E. Wuttke (Eds.), Transitions in vocational education (pp. 305–319). Opladen: Budrich.
Gegenfurtner, A., Kok, E., Van Geel, K., De Bruin, A., Jarodzka, H., Szulewski, A., et al. (2017). The challenges of studying visual expertise in medical image diagnosis. Medical Education, 51, 97–104. http://dx.doi.org/10.1111/medu.13205.
Gegenfurtner, A., Kok, E. M., Van Geel, K., De Bruin, A. B. H., & Sorger, B. (in press). Neural correlates of visual perceptual expertise: Evidence from cognitive neuroscience using functional neuroimaging. Frontline Learning Research.
Gegenfurtner, A., Könings, K. D., Kosmajac, N., & Gebhardt, M. (2016). Voluntary or mandatory training participation as a moderator in the relationship between goal orientations and transfer of training. International Journal of Training and Development, 20, 290–301. http://dx.doi.org/10.1111/ijtd.12089.
Gegenfurtner, A., Lehtinen, E., & Säljö, R. (2011). Expertise differences in the comprehension of visualizations: A meta-analysis of eye-tracking research in professional domains. Educational Psychology Review, 23, 523–552. http://dx.doi.org/10.1007/s10648-011-9174-7.
Gegenfurtner, A., Nivala, M., Säljö, R., & Lehtinen, E. (2009). Capturing individual and institutional change: Exploring horizontal versus vertical transitions in technology-rich environments. In U. Cress, V. Dimitrova, & M. Specht (Eds.), Learning in the synergy of multiple disciplines. Lecture notes in computer science (pp. 676–681). Berlin: Springer. http://dx.doi.org/10.1007/978-3-642-04636-0_67.
Gegenfurtner, A., Quesada-Pallarès, C., & Knogler, M. (2014). Digital simulation-based training: A meta-analysis. British Journal of Educational Technology, 45, 1097–1114. http://dx.doi.org/10.1111/bjet.12188.
Gegenfurtner, A., & Seppänen, M. (2013). Transfer of expertise: An eye-tracking and think-aloud study using dynamic medical visualizations. Computers & Education, 63, 393–403. http://dx.doi.org/10.1016/j.compedu.2012.12.021.
Gegenfurtner, A., Siewiorek, A., Lehtinen, E., & Säljö, R. (2013). Assessing the quality of expertise differences in the comprehension of medical visualizations. Vocations and Learning, 6, 37–54. http://dx.doi.org/10.1007/s12186-012-9088-7.
Gegenfurtner, A., & Szulewski, A. (2016). Visual expertise and the quiet eye in sports – comment on Vickers. Current Issues in Sport Science, 1, 108. http://dx.doi.org/10.15203/CISS_2016.108.
Gegenfurtner, A., & Van Merriënboer, J. J. G. (2017). Methodologies for studying visual expertise. Frontline Learning Research (in press).
Gegenfurtner, A., Vauras, M., Gruber, H., & Festner, D. (2010). Motivation to transfer revisited. In K. Gomez, L. Lyons, & J. Radinsky (Eds.), Learning in the disciplines: ICLS2010 proceedings (Vol. 1, pp. 452–459). Chicago, IL: International Society of the Learning Sciences.
Gruber, H., Jansen, P., Marienhagen, J., & Altenmüller, E. (2010). Adaptations during the acquisition of expertise. Talent Development & Excellence, 2, 3–15.
Haider, H., & Frensch, P. A. (1999). Eye movement during skill acquisition: More evidence for the information reduction hypothesis. Journal of Experimental Psychology: Learning, Memory, & Cognition, 25, 172–190. http://dx.doi.org/10.1037/0278-7393.25.1.172.
Hatano, G., & Inagaki, K. (1986). Two courses of expertise. In H. Stevenson, H. Asuma, & K. Hakuta (Eds.), Child development and education in Japan (pp. 262–272). San Francisco: Freeman.
Helle, L., Nivala, M., Kronqvist, P., Gegenfurtner, A., Björk, P., & Säljö, R. (2011). Traditional microscopy instruction versus process-oriented virtual microscopy instruction: A naturalistic experiment with control group. Diagnostic Pathology, 6, S81–S89. http://dx.doi.org/10.1186/1746-1596-6-S1-S8.
Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & Van de Weijer, J. (2011). Eye tracking: A comprehensive guide to methods and measures. Oxford: Oxford University Press.
Jaarsma, T., Jarodzka, H., Nap, M., van Merriënboer, J. J. G., & Boshuizen, H. P. A. (2014). Expertise under the microscope: Processing histopathological slides. Medical Education, 48, 292–300. http://dx.doi.org/10.1111/medu.12385.
Jaarsma, T., Jarodzka, H., Nap, M., van Merriënboer, J. J. G., & Boshuizen, H. P. A. (2015). Expertise in clinical pathology: Combining the visual and cognitive perspective. Advances in Health Sciences Education, 20, 1086–1106. http://dx.doi.org/10.1007/s10459-015-9589-x.
Jarodzka, H., Balslev, T., Holmqvist, K., Nyström, M., Scheiter, K., Gerjets, P., et al. (2012). Conveying clinical reasoning based on visual observation via eye-movement modeling examples. Instructional Science, 40, 813–827. http://dx.doi.org/10.1007/s11251-012-9218-5.

Jarodzka, H., Holmqvist, K., & Gruber, H. (2017). Eye tracking in educational science: Theoretical frameworks and research agendas. Journal of Eye Movement Research, 10, 1–18. http://dx.doi.org/10.16910/jemr.10.1.3.
Jarodzka, H., Jaarsma, T., & Boshuizen, H. P. A. (2015). In my mind: How situation awareness can facilitate expert performance and foster learning. Medical Education, 49, 854–856. http://dx.doi.org/10.1111/medu.12791.
Jarodzka, H., Scheiter, K., Gerjets, P., & Van Gog, T. (2010). In the eyes of the beholder: How experts and novices interpret dynamic stimuli. Learning and Instruction, 20, 146–154. http://dx.doi.org/10.1016/j.learninstruc.2009.02.019.
Jarodzka, H., Van Gog, T., Dorr, M., Scheiter, K., & Gerjets, P. (2013). Learning to see: Guiding students' attention via a model's eye movements fosters learning. Learning and Instruction, 25, 62–70. http://dx.doi.org/10.1016/j.learninstruc.2012.11.004.
Kok, E. (2016). Developing visual expertise: From shades of grey to diagnostic reasoning in radiology. Maastricht: Maastricht University.
Krupinski, E. A. (2010). Current perspectives in medical image perception. Attention, Perception, & Psychophysics, 72, 1205–1217. http://dx.doi.org/10.3758/APP.72.5.1205.
Kundel, H. L., Nodine, C. F., & Krupinski, E. A. (1990). Computer-displayed eye position as a visual aid to pulmonary nodule interpretation. Investigative Radiology, 25, 890–896.
Lehtinen, E., Hakkarainen, K., & Palonen, T. (2014). Understanding learning for the professions: How theories of learning explain coping with rapid change. In S. Billett, C. Harteis, & H. Gruber (Eds.), International handbook of research in professional practice-based learning (pp. 199–224). New York: Springer.
Litchfield, D., Ball, L. J., Donovan, T., Manning, D. J., & Crawford, T. (2010). Viewing another person's eye movements improves identification of pulmonary nodules in chest X-ray inspection. Journal of Experimental Psychology: Applied, 16, 251–262. http://dx.doi.org/10.1037/a0020082.
Litchfield, D., & Donovan, T. (2016). Worth a quick look? Initial scene previews can guide eye movements as a function of domain-specific expertise but can also have unforeseen costs. Journal of Experimental Psychology: Human Perception & Performance, 42, 982–994. http://dx.doi.org/10.1037/xhp0000202.
Mason, L., Pluchino, P., & Tornatora, M. C. (2015). Eye-movement modeling of integrative reading of an illustrated text: Effects on processing and learning. Contemporary Educational Psychology, 41, 172–187. http://dx.doi.org/10.1016/j.cedpsych.2015.01.004.
Mayer, R. E. (2014). Cognitive theory of multimedia learning. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (2nd ed., pp. 43–71). New York: Cambridge University Press.
Mayer, R. E., & Wittrock, M. C. (1996). Problem-solving transfer. In D. C. Berliner, & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 47–62). New York: Macmillan.
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22, 276–282. http://dx.doi.org/10.11613/BM.2012.031.
Morita, J., Miwa, K., Kitasaka, T., Mori, K., Suenaga, Y., Iwano, S., et al. (2008). Interactions of perceptual and conceptual processing: Expertise in medical image diagnosing. International Journal of Human-Computer Studies, 66, 370–390. http://dx.doi.org/10.1016/j.ijhcs.2007.11.004.
Nalanagula, D., Greenstein, J. S., & Gramopadhye, A. K. (2006). Evaluation of the effect of feedforward training displays of search strategy on visual search performance. International Journal of Industrial Ergonomics, 36, 289–300. http://dx.doi.org/10.1016/j.ergon.2005.11.008.
Norman, G., Eva, K., Brooks, L., & Hamstra, S. (2006). Expertise in medicine and surgery. In K. A. Ericsson, N. Charness, P. J. Feltovich, & R. R. Hoffman (Eds.), The Cambridge handbook of expertise and expert performance (pp. 339–354). Cambridge: Cambridge University Press.
Quesada-Pallarès, C., & Gegenfurtner, A. (2015). Toward a unified model of motivation for training transfer: A phase perspective. Zeitschrift für Erziehungswissenschaft, 18, 107–121. http://dx.doi.org/10.1007/s11618-014-0604-4.
Schwartz, D. L., Bransford, J. D., & Sears, D. L. (2005). Efficiency and innovation in transfer. In J. Mestre (Ed.), Transfer of learning from a modern multidisciplinary perspective (pp. 1–51). Charlotte, NC: Information Age Publishing.
Segers, M., & Gegenfurtner, A. (2013). Transfer of training: New conceptualizations through integrated research perspectives. Educational Research Review, 8, 1–4. http://dx.doi.org/10.1016/j.edurev.2012.11.007.
Seppänen, M. (2008). Modern imaging of multiple myeloma. Acta Radiologica, 5, 487–488. http://dx.doi.org/10.1080/02841850802113172.
Seppänen, M., & Gegenfurtner, A. (2012). Seeing through a teacher's eyes improves students' imaging interpretation. Medical Education, 46, 1113–1114. http://dx.doi.org/10.1111/medu.12041.
Siewiorek, A., & Gegenfurtner, A. (2010). Leading to win: The influence of leadership style on team performance during a computer game training. In K. Gomez, L. Lyons, & J. Radinsky (Eds.), Learning in the disciplines (Vol. 1, pp. 524–531). Chicago, IL: International Society of the Learning Sciences.
Siewiorek, A., Gegenfurtner, A., Lainema, T., Saarinen, E., & Lehtinen, E. (2013). The effects of computer-simulation game training on participants' opinions on leadership styles. British Journal of Educational Technology, 44, 1012–1035. http://dx.doi.org/10.1111/bjet.12084.
Strijbos, J. W., Martens, R. L., Prins, F. J., & Jochems, W. M. G. (2006). Content analysis: What are they talking about? Computers & Education, 46, 29–48. http://dx.doi.org/10.1016/j.compedu.2005.04.002.
Szulewski, A., Gegenfurtner, A., Howes, D., Sivilotti, M., & Van Merriënboer, J. J. G. (2017). Measuring physician cognitive load: Validity evidence for a physiologic and a psychometric tool. Advances in Health Sciences Education. http://dx.doi.org/10.1007/s10459-016-9725-2 (in press).
Torbeyns, J., Lehtinen, E., & Elen, J. (2015). Describing and studying domain-specific serious games. Introduction. In J. Torbeyns, E. Lehtinen, & J. Elen (Eds.), Describing and studying domain-specific serious games (pp. 1–6). New York: Springer.
Van Gog, T., Jarodzka, H., Scheiter, K., Gerjets, P., & Paas, F. (2009). Attention guidance during example study via the model's eye movements. Computers in Human Behavior, 25, 785–791. http://dx.doi.org/10.1016/j.chb.2009.02.007.
Velichkovsky, B. M. (1995). Communicating attention: Gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3, 199–224. http://dx.doi.org/10.1075/pc.3.2.02vel.
Wilson, M., McGrath, J., Vine, S., Brewer, J., Defriend, D., & Masters, R. (2010). Psychomotor control in a virtual laparoscopic surgery training environment: Gaze control parameters differentiate novices from experts. Surgical Endoscopy, 24, 2458–2464. http://dx.doi.org/10.1007/s00464-010-0986-1.