Exploring the Relationship between Mental Workload, Variation in Performance and Physiological Parameters


13th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems
Aug. 30 - Sept. 2, 2016. Kyoto, Japan

Available online at www.sciencedirect.com

ScienceDirect

IFAC-PapersOnLine 49-19 (2016) 591–596

A. Marinescu*. S. Sharples**. A. C. Ritchie**. T. Sánchez López***. M. McDowell***. H. Morvan*.

*Institute of Aerospace Technology, University of Nottingham, Jubilee Campus, Nottingham, UK
**Faculty of Engineering, University of Nottingham, University Park, Nottingham, UK
***Airbus Group Innovations UK, Data Analysis and Interaction, Newport, UK

Abstract: Real time non-intrusive mental workload level estimation may lead to significant improvements in the design and operation of future flight decks, reducing the cognitive demand on pilots. This study explores the relationship between mental workload, variation of performance and objective physiological measures. The study presented was performed in laboratory conditions and required participants to perform a custom-designed tracking task with elements of mental arithmetic that imposed varying levels of mental workload. The data collected consisted of: physiological measurements (heart inter-beat intervals, breathing rate, facial thermography); subjective ratings of workload from the participants (ISA and NASA-TLX); and the performance measured within the task. Initial results suggest facial thermography as a good candidate for non-intrusive mental workload measurements as temperature variations on some areas, for example the nose, appear to relate well to the changes in mental workload level as measured by subjective ratings as well as performance measures on performing the task.

Keywords: mental workload, pilot performance, facial-thermography, physiological measures

2405-8963 © 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Peer review under responsibility of International Federation of Automatic Control. 10.1016/j.ifacol.2016.10.618

1. INTRODUCTION

Since the 1980s, passenger air traffic has doubled every 15 years and it is expected to double again by 2034, with 70% of the traffic relying on the already existing network (Airbus, 2015). Near future air transport challenges such as increased air traffic, the need for more efficient routes or free flight raise new issues from the human factors perspective. The pilot of the future will have to operate in a more congested airspace, aided by more complex technology.

The goal of the human factors practitioner, as stated by the International Ergonomics Association (as of 2014), is to “…optimise human well-being and overall system performance...” by “contributing to the design and evaluation of tasks, jobs, products, environments and systems in order to make them compatible with the needs, abilities and limitations of people” (J. R. Wilson & Sharples, 2015).

One of the challenges posed is evaluating the impact that these changes will have on the operator and on the whole system in terms of performance, and taking the appropriate measures so that demands fit capabilities. This study explores techniques for measuring the mental demand imposed on participants using non-invasive physiological measurements.

One of the known factors influencing human performance is mental workload. The reason for using this construct in this study is that it has been suggested to have a strong relationship with human performance, the current consensus being that both high and low levels of mental workload influence performance negatively (Sharples & Megaw, 2015). In the end, this research may shed some more light on what we actually need to measure and what is a good indicator of human performance.

2. MENTAL WORKLOAD ASSESSMENT TECHNIQUES

The various mental workload assessment techniques that have been used over time can be classified into two main categories: subjective (questionnaires and rating scales) and objective (stand-alone performance measures, secondary task measures, or physiological data), each having their own advantages and disadvantages.

The main advantage of the subjective measures is that they are simpler to deploy and some of them can be used with little intrusion while running the study (e.g. the instantaneous self-assessment workload scale, ISA (Brennen, 1997)); the main disadvantages are subjectiveness, reliance on memory and sometimes intrusiveness.

The objective measures category contains types of measures that are thought to be influenced by variations of mental workload. Stand-alone performance measures are designed to observe the degradation of performance and are assumed to reflect workload. Secondary task measures are one of the most widely used techniques and involve measuring reserve capacity by introducing a secondary task while the participant is performing the primary task. The decrease in performance of the secondary task is defined as a measure of workload. The main disadvantage of this method is that it is often artificial and intrusive (Wickens, Liu, & Becker p. 338-339).

Physiological measures are based on measuring the human body’s responses to stimuli. Since some of the responses are controlled by the ANS (autonomic nervous system) and are involuntary, they could offer insight into how different situations are perceived. The ANS has a sympathetic division, whose activity increases alertness and metabolic activities in order to prepare the body for an emergency situation, and a parasympathetic division responsible for the rest-and-digest activities, conserving and restoring body energy (Tortora & Derrickson, 2009).

There is a long history of using heart rate and heart rate variability as physiological measures for inferring the level of mental workload. It has been shown that as the mental demands of an inflight activity increased, the power in the mid frequency band of the heart rate variability decreased (Tattersall & Hockey, 1995). Another study on fighter pilots found heart rate data was a better measure to distinguish between sorties than heart rate variability, as heart rate variability could only distinguish between two basic states, leading researchers to believe that the two measures are sensitive to different aspects of the subject’s environment (G. F. Wilson, 1992).

Previous studies that have looked at inferring the level of mental workload by using facial thermography have shown a strong correlation of workload with the decrease in nose temperature. Ora and Duffy (2007) used a simulator driving task together with a mental arithmetic loading task to increase the mental workload while measuring nose and forehead temperature; a further study was performed in a real car driving situation. They demonstrated that there is a high correlation between the change in nose temperature and the subjective ratings for mental workload, while the forehead temperature remained mostly constant (Ora & Duffy, 2007). Another study in a ship simulator showed that nasal temperature and heart rate variability are good indices for effective navigation, and also connected the measures to the variation of mental workload (Murai, Hayashi, Okazaki, & Stone, 2008). Thermal imaging of the forehead, nose, eyes, cheeks and chin during a cognitive stress test was able to classify mental workload into three levels with 81% accuracy (Stemberger, Allison, & Schnell, 2010).

This study investigates the correlation between heart, respiratory, pupil diameter and facial thermography data and subjective and objective measures of mental workload with the aim of increasing the resolution of the technique.

3. STUDY DESIGN

In order to explore the relationship between mental workload, variation of performance and objective physiological parameters, a specific computer based task was designed to impose different levels of mental demand on the participant.

3.1. The Task

The task consisted of a computer game with 3 main levels of two types, each level having 13 sublevels (45 seconds each) of varying difficulty. Compared to a type 1 level, the type 2 level was designed to be more mentally demanding. The general structure of the game in terms of variation of demand was inspired by the task structure used by Sharples, Edwards, & Balfe (2012).

During each of the levels, the participant was presented with moving coloured balls on a black background. The movement of the balls gave the impression that they were falling from the top of the screen and the task was to aim and shoot the target balls using a joystick. For the type 1 level (Fig. 2), the target balls were red, while for the type 2 level (Fig. 3) the target balls were the balls that had odd numbers written on them, introducing an additional cognitive element with the intent of increasing mental demand. Each of the levels was made up of 13 sublevels, each presenting the participant with a constant number of target balls on the screen at any time. The number of balls per sublevel was varied as presented in Fig. 1 in order to control the level of demand.

Fig. 1. Description of levels

The position of the joystick was indicated by a red circular cursor that turned green once it was within range of the target balls and the participant could make a successful shot (Fig. 2). When the target balls reached the yellow line, the yellow line moved down; as the height of the yellow line at the end of each level influenced the score of the participant, one of the aims was to shoot the target as soon as possible so that the yellow line remained as high as possible on the screen. This rule forced the participants to adopt a bottom-up strategy in shooting the target balls, in an attempt to keep the demand constant across the participants.

The yellow line was brought up by a small increment each time the participant shot a target ball and was pushed down by the same small increment when the participant missed the ball or shot a non-target ball. This rule discouraged the strategy of shooting continuously to increase the chances of success and also offered the participant the hope that even if the yellow line was very low on the screen there was still a chance of bringing it back up. The yellow line also never disappeared beyond the bottom of the screen, which was the lowest position to which the balls could move it.

Fig. 2. Type 1 Level Screenshot


The green lines on either side of the screen represent the highest level to which another participant has managed to bring the yellow line at the end of the current level. These features were intended to motivate the participant by showing their performance in comparison to the others.
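The yellow-line scoring rule described above can be sketched in a few lines of code. This is only an illustrative model under stated assumptions: the class name `YellowLine`, the normalised height scale (0.0 for the bottom of the screen, 1.0 for the top), the starting height, the step size and the clamp at the top of the screen are all hypothetical, since the paper gives only the qualitative rule.

```python
from dataclasses import dataclass


@dataclass
class YellowLine:
    """Toy model of the yellow line; all numeric values are hypothetical."""
    height: float = 0.5  # assumed normalised height: 0.0 = bottom, 1.0 = top
    step: float = 0.02   # assumed size of the "small increment"

    def _move(self, delta: float) -> None:
        # The line never goes below the bottom of the screen (stated in the
        # text); clamping at the top is an assumption of this sketch.
        self.height = min(1.0, max(0.0, self.height + delta))

    def on_shot(self, hit_target: bool) -> None:
        # A successful shot on a target ball raises the line by one step;
        # a miss or a shot on a non-target ball lowers it by the same step.
        self._move(self.step if hit_target else -self.step)

    def on_target_reaches_line(self) -> None:
        # Target balls that reach the line push it down. Using the same
        # step size here is an assumption of this sketch.
        self._move(-self.step)


line = YellowLine()
for hit in (True, False, False, True, True):
    line.on_shot(hit)
print(round(line.height, 2))
```

Because hits and misses move the line by the same amount, the net change from shooting is proportional to hits minus misses, which is why firing continuously without aiming does not pay off.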


Fig. 3. Type 2 Level Screenshot

3.2. Participants

Fourteen students and staff from the University of Nottingham took part in the study (11 men and 3 women; mean age = 28.3 years; SD = 4.88; range = 21-38). Each participant was presented with an information sheet and consent form, stating that they were over 18 years old, had no pre-existing heart related condition and had no skin conditions or allergies that could prevent them from wearing the heart rate chest strap.

3.3. The Sensors

The Zephyr BioHarness 3 chest strap was used for measuring posture, heart and breathing activity. The device outputs raw ECG data at a sampling rate of 1000 Hz and also a processed version of the raw signal including the R-R intervals and heart rate (Medtronic, Annapolis, USA).

For eye-tracking, the RED 250 eye tracker was used in standalone configuration, measuring pupil diameter and gaze data at 60 Hz (SensoMotoric Instruments, Teltow, Germany).

The FLIR SC7000 thermal IR camera with a spectral range of 3-5 µm was employed for monitoring the facial thermal features of the participants. The camera has a resolution of 640x512 and was used at a sampling frequency of 50 Hz. The camera offers a noise equivalent differential temperature of less than 25 mK (FLIR Systems, Wilsonville, Oregon, USA).

3.4. Procedure for collecting the data

The aim of the study was explained to each participant and participants were invited to read the information sheet and consent form, which they had to sign if they agreed with the conditions. The participants were asked to relax for two minutes in order to record a baseline of the heart activity. For the thermal camera, the images recorded before the start of the task were considered as a baseline. The eye tracker was recalibrated before the participant started each level of the game.

After each sublevel lasting 45 seconds the participant was prompted by a voice in the task for the ISA rating. At the same time the room temperature and humidity were recorded. At the end of each main level the participant was shown the score they had achieved in comparison to all the other participants, in an attempt to motivate them to score higher. At the end of each main level the participants also filled in a NASA-TLX questionnaire for a subjective assessment of workload.

4. RESULTS

4.1. Subjective Data

The data from 2 participants was discarded due to data recording problems, while the data from 2 other participants was set aside for the moment due to difficulties in tracking the facial features. Data from the remaining 10 participants is presented below.

The ISA scale, developed primarily as a subjective measure of mental workload for air traffic controllers, was used in this situation due to its low level of intrusion on the primary task. Figure 4 below shows the average ratings given by the participants for each of the sublevels. These ratings show that there was a perceived change in workload in accordance with the change in demand imposed by the task and shown in Fig. 1.

Fig. 4. Average ISA ratings across the 39 sub-levels

The NASA-TLX multidimensional scale for evaluating workload was used after each of the 3 levels, during the break in-between them. NASA-TLX consists of six subscales: mental demand, physical demand, temporal demand, frustration, effort and performance; combined with a weighting scheme tailored to the individual, the scale offers a measure of overall workload (Hart & Staveland, 1988). The results are presented in Fig. 5 below.

Fig. 5. NASA-TLX mean subscale values for each level
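For readers unfamiliar with how the NASA-TLX weighting scheme yields a single overall workload value, the standard weighted procedure can be sketched as follows. The sketch assumes the usual form of the instrument: each subscale is rated on a 0-100 scale and each weight is the number of times that subscale was chosen across the 15 pairwise comparisons, so the weights sum to 15. The function name and the example numbers are hypothetical and are not data from this study.

```python
SUBSCALES = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "performance",
    "effort",
    "frustration",
)


def nasa_tlx_overall(ratings: dict, weights: dict) -> float:
    """Weighted NASA-TLX score: sum(rating * weight) / 15.

    ratings: subscale -> rating on a 0-100 scale.
    weights: subscale -> tally from the 15 pairwise comparisons (0-5 each).
    """
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("ratings and weights must cover all six subscales")
    if sum(weights.values()) != 15:
        raise ValueError("weights from 15 pairwise comparisons must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Illustrative (made-up) ratings and weights for one participant and level.
example_ratings = {
    "mental demand": 70, "physical demand": 20, "temporal demand": 60,
    "performance": 40, "effort": 65, "frustration": 30,
}
example_weights = {
    "mental demand": 5, "physical demand": 0, "temporal demand": 3,
    "performance": 2, "effort": 4, "frustration": 1,
}
print(nasa_tlx_overall(example_ratings, example_weights))  # 60.0
```

A subscale with weight 0, such as physical demand in this example, contributes nothing to the overall score, which is how the scheme tailors the measure to what each individual considers relevant to the task.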


Although the average rating on the mental demand NASA-TLX scale was higher for stage 2 than for stages 1 and 3, there is no statistically significant difference between any of the 3 stages as determined by a one way ANOVA (F(2,27) = 2.67, p = 0.087).

4.2. Task Performance Data

Task performance was classified in terms of how high the participants were able to maintain the yellow line during the game and split into 3 classes: low, medium and high performance. The variation of performance for all participants is shown in Fig. 6 and is in accordance with expectations.
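As a worked illustration of the one way ANOVA reported above for the NASA-TLX mental demand ratings, the F statistic and its degrees of freedom can be computed as in the sketch below. The reported F(2,27) is consistent with 3 stages and 10 participants, since 3 groups give 3 - 1 = 2 between-group degrees of freedom and 30 ratings give 30 - 3 = 27 within-group degrees of freedom. The function name and the ratings in the example are invented for illustration and are not the study data; the p-value is omitted because it requires the F distribution, which a statistics package would normally supply.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up mental demand ratings: 3 stages x 10 participants.
stage_1 = [55, 60, 45, 70, 50, 65, 40, 60, 55, 50]
stage_2 = [65, 70, 55, 75, 60, 70, 50, 65, 60, 65]
stage_3 = [60, 55, 50, 70, 55, 60, 45, 60, 50, 55]

f_stat, df_b, df_w = one_way_anova([stage_1, stage_2, stage_3])
print(df_b, df_w)  # 2 27
```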

Fig. 8. Breathing rate for each sublevel across participants When performing a one way ANOVA (F(2,5530)=24.67, p<0.005) on the mean breathing rate over 30 second intervals with 90% overlap, low performance intervals were shown to be significantly different from the high and medium. With regard to the ISA subjective ratings, performing a one way ANOVA (F(4,389)=2.08, p=0.08) on the mean breathin rate over the 45 seconds sublevel intervals, grouped by ISA ratings, showed that there was no significat difference in the means of the groups. 4.4. Pupil diameter data

Fig. 6. Mean performance level

Figure 9 shows the averaged pupil diameter between the right and left eyes for each sublevel. Performing a one way ANOVA (F(2,5530)=88.64, p<0.005) on the mean pupil diameter over 30 second intervals with 90% overlap, showed that the means of the groups defined by performance levels were significantly different while for the mean pupil diameter over the 45 seconds sublevel intervals grouped by ISA ratings, the ANOVA (F(4,389)=6.16, p<0.01) test showed that there were significant differences between the means of sublevels rated 1, 2 and 3, 5, while the means of sublevels rated 4 were significantly different from the ones rated 1.

The variation of performance also shows a strong negative correlation with the ISA ratings (R(37) = -0.739 with P<0.01), indicating that as the level of mental workload increased, the performance decreased.

4.3. Physiological Data Heart and Breathing Rate Data The mean R-R intervals (inter-beat intervals) (Fig. 7) and the breathing rate (Fig. 8) and are presented below.

Fig. 9. Average pupil diameter for each sublevel across participants Fig. 7. Average R-R intervals for each sublevel across participants

4.5. Thermal Data In order to extract the thermal data from the images, a feature tracking algorithm was deployed, splitting the face into regions of interest. For each frame, the temperature was extracted from inside the large circular points, from along the lines and from inside some of the triangular areas. Features from below the nose were not tracked due to the difficulty imposed by facial hair in some of the participants, which might present a challenge in a real life application as well.

A one-way ANOVA (F(2,5530)=6.7, p=0.0012) was performed on the mean of the R-R data over 30-second intervals with 90% overlap, grouped by performance, showing a significant difference in means between the medium performance and high performance intervals. A one-way ANOVA (F(4,389)=2.12, p=0.07) performed on the mean R-R data over the 45-second sublevel intervals, grouped by ISA ratings, showed no significant difference in the means of the groups.
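The F statistics quoted in this section follow the standard one-way ANOVA decomposition into between-group and within-group variance. In the usual notation, F(2,5530) means two between-group degrees of freedom (three performance groups) and 5530 within-group degrees of freedom (5533 intervals in total). The sketch below shows the computation on made-up numbers; the function name and data are assumptions, and no p-value is computed.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical interval means (in seconds) grouped by performance level.
low = [0.81, 0.79, 0.83, 0.80]
medium = [0.84, 0.86, 0.85, 0.83]
high = [0.88, 0.87, 0.90, 0.89]
f_stat, df_b, df_w = one_way_anova([low, medium, high])
```

A significant F only indicates that at least one group mean differs; identifying which pairs differ, as the text does for medium versus high performance, requires a follow-up pairwise comparison.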

Figure 10 below shows the regions of interest that were tracked.


Fig. 10. Thermal Image with tracked elements

Figure 11 presents the average temperature for each of the areas for Sublevels 1, 7 and 13 (columns) and Levels 1, 2 and 3 (rows) for one of the participants. It can be seen that, for example, the nose temperature tends to decrease for Sublevel 7, which is supposed to be the most demanding one, and then tends to increase for Sublevel 13, where the level of demand is the same as for Sublevel 1. It can also be noticed that the nose temperature for Sublevel 1 increases from Level 1 to Level 3.

Fig. 11. Temperature variation across levels

5. DISCUSSION

The facial thermography indicated that there was a reduction in temperature when the participants were attempting a harder level. The increase in temperature at sublevel 1 in level 3 compared to sublevel 1 in level 1 is most probably due to thermal energy accumulated over the duration of the trial. Of the physiological measures studied, thermography and pupil diameter were found to track the ISA ratings and the performance levels most closely, although for the thermal data there is some ‘inertia’ in the system, which can be attributed to the time taken for tissues to heat up and cool down in response to blood flow and breathing rate. Although the R-R intervals and breathing rate proved not to have significantly different means when grouped by ISA ratings or performance, they do show significant mean differences when grouped by level; the same cannot be said about pupil diameter. Future studies will concentrate on collecting more data and using machine learning techniques for non-intrusive real-time classification of workload levels.

6. CONCLUSIONS

Some of the physiological parameters (pupil diameter and facial thermography) appear to be more sensitive to variations in subjective mental workload levels and performance measures, while others (R-R intervals and breathing rate) seem to show more significant changes between levels, probably being influenced by overall mental workload (rated using NASA-TLX for each of the levels) and possibly by other factors such as fatigue. Further data analysis will focus on merging all physiological measures to provide a classification of mental workload levels based on the subjective ISA ratings.

Acknowledgments

The authors would like to thank the European Union for funding this research from the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007-2013) under REA grant agreement no 608322, and the Data Analysis and Interaction research team from Airbus Group Innovations UK for their support.

REFERENCES

Airbus. (2015). Flying By Numbers, 1–27.
Brennen, J. (1997). Instantaneous self-assessment of workload technique (ISA), (1992).
Hart, S. G., California, M. F., & Staveland, L. E. (1988). Development of NASA TLX.
Murai, K., Hayashi, Y., Okazaki, T., & Stone, L. C. (2008). Evaluation of ship navigator’s mental workload using nasal temperature and heart rate variability. 2008 IEEE International Conference on Systems, Man and Cybernetics, 1528–1533.
Ora, C. K. L., & Duffyb, V. G. (2007). Development of a facial skin temperature-based methodology for nonintrusive mental workload measurement, 7, 83–94.
Sharples, S., Edwards, T., & Balfe, N. (2012). Inferring cognitive state from observed interaction. In Proceedings of the 4th AHFE International Conference.
Sharples, S., & Megaw, T. (2015). Definition and measurement of human workload. In J. R. Wilson & S. Sharples (Eds.), Evaluation of Human Work.
Stemberger, J., Allison, R. S., & Schnell, T. (2010). Thermal Imaging as a Way to Classify Cognitive Workload. 2010 Canadian Conference on Computer and Robot Vision, 231–238.
Tattersall, A. J., & Hockey, G. R. (1995). Level of operator control and changes in heart rate variability during simulated flight maintenance. Human Factors.
Tortora, G. J., & Derrickson, B. H. (2009). Principles of Anatomy and Physiology.
Wickens, C. D., D, L. J., Liu, Y., & Becker, S. E. G. (n.d.). An Introduction to Human Factors Engineering (Second Edi).
Wilson, G. F. (1992). Applied use of cardiac and respiration measures: practical considerations and precautions. Biological Psychology, 34(2-3), 163–78.
Wilson, J. R., & Sharples, S. (2015). Methods in the Understanding of Human Factors. In J. R. Wilson & S. Sharples (Eds.), Evaluation of Human Work.
