Multivariate workload evaluation combining physiological and subjective measures


International Journal of Psychophysiology 40 (2001) 233–238

Shinji Miyake*
Department of Environmental Management II, School of Health Sciences, University of Occupational and Environmental Health, Japan, 1-1 Iseigaoka, Yahatanishiku, Kitakyushu 807-8555, Japan
Received 6 January 2000; received in revised form 6 June 2000; accepted 15 June 2000

Abstract

This paper proposes a way to integrate different parameters into one index and reports the results obtained with this newly developed index. The multivariate workload evaluation index, which integrates physiological parameters and one subjective parameter through Principal Components Analysis, was proposed to characterize task-specific responses and individual differences in response patterns to mental tasks. Three different types of mental tasks were performed by 12 male participants. Heart rate variability (HRV), finger plethysmogram amplitude and perspiration were used as physiological parameters. Three of the six subscales of the NASA Task Load Index (NASA-TLX), mental demand, temporal demand and effort, were used as subjective scores. These parameters were standardized within each participant and then combined. With this method, workload could be assessed simultaneously from two different aspects, physiological and subjective. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Principal components analysis; HRV; Plethysmogram; Perspiration; NASA-TLX; Mental workload; Subjective

* Corresponding author. Tel.: +81-93-691-7151; fax: +81-93-691-2694. E-mail address: [email protected] (S. Miyake).

0167-8760/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0167-8760(00)00191-4

1. Introduction

Developing an assessment technique for mental workload is an important issue in ergonomics. The International Organization for Standardization (ISO) is attempting to standardize a workload measurement method in which several physiological indices are assigned to several effects induced by mental workload (fatigue, monotony, satiation and vigilance) (ISO, 1998). However, if these physiological parameters could be integrated into one synthesized index with variably weighted coefficients, it might not be necessary to change measures for different mental workload effects.

On the other hand, sensitivity to a mental task differs from person to person, so the physiological responses induced by the same task may also differ between individuals. This is the individual difference problem (Turner, 1994). Furthermore, the physiological response pattern differs from task to task; for example, the response induced by a mental arithmetic task differs from that induced by a mirror drawing task (Saab and Schneiderman, 1993; Miyake, 1997). Such individual differences and task-specific response patterns must therefore be taken into account in workload research. One approach to these problems is to record and analyze several physiological (and subjective) responses with different attributes and to integrate them in a way that reflects individual differences in physiological sensitivity and task-specific responses. The purpose of this study was to investigate a new method of mental workload assessment in which multiple physiological parameters and subjective indices are integrated into one index through multivariate analysis.

2. Method

Three different kinds of mental tasks were used: a six-piece wooden puzzle (P), a two-dimensional compensatory tracking task with first-order control (T), and a numerical logical task (L). The T and L tasks each had three levels of difficulty, high (H), medium (M) and low (L), abbreviated TH, TM and TL for the tracking task and LH, LM and LL for the logical task. In the P task, participants were instructed to make a simple silhouette pattern (a cross) using all six wooden pieces (Cross Puzzle II, D.1 Products) within 8 min. This task requires pattern recognition ability. The T task required participants to keep an airplane icon target inside a central circular gunsight area using a joystick controller (Flight Stick Pro, CH Products). Task difficulty was controlled by the target speed (average speeds for H, M and L were 8.0, 3.4 and 1.4 pixel/unit time, respectively) and by the size of the target moving area (350 × 330, 167 × 166 and 86 × 83 pixels for H, M and L, respectively). The tracking task was similar to a simple reaction time task, although some precognition (prediction) may have been necessary. The logical task was nearly identical to the Minesweeper game in Microsoft Windows®. Participants were told to guess whether there was a mine in a grid cell and to click the 'safe' cells. Unlike the Windows game, the task did not end when a participant hit a mine. The more difficult the level, the more mines there were. This logical task may have required short-term memory and logical inference. The task duration was 4 min for each difficulty level of the T and L tasks. The tasks were ordered from more difficult to easier, and the same fixed order was applied to all participants, i.e. P, TH, LH, TM, LM, TL and LL (Miyake, 1996). With this ordering, the task difficulty factor nullified the training effect and emphasized the differences in difficulty levels among tasks. The computer tasks (T and L) were programmed in QuickBASIC 4.5 (Microsoft) and run on an MS-DOS 486 computer with a 15-inch CRT display (640 × 400 pixels).

Participants were 12 male university students aged 18.9 to 25.9 years (mean 22.7 years). All participants gave informed consent before the experiments and received the same payment regardless of their task performance.

Three physiological measures were acquired during all experimental blocks, including the 5-min rest periods before (PRE) and after (POST) the tasks: (1) ln LF/HF (the natural logarithm of the LF to HF ratio¹) as a cardiovascular parameter; (2) the photoelectric plethysmogram amplitude measured on the left index finger as a peripheral blood vessel activity parameter; and (3) the amount of perspiration from a sweat rate meter attached to the left thumb as a non-vascular autonomic nervous system parameter. Participants were instructed to synchronize their respiration with a computer-generated tone during the rest and task periods to reduce the effect of irregular respiration on the HRV power spectral components (Grossman et al., 1991; Hayano et al., 1994). Respiration was recorded by a strain gauge around the chest to monitor its regularity and to identify the respiratory sinus arrhythmia component in the HRV power spectrum.

The HRV spectral analysis and the other physiological data analyses were performed as follows. The first 30 s of each block were excluded to reject the transient, relatively large signal drift frequently observed just after a task starts. A 200-s ECG recorded from the CM5 lead (Ellestad, 1986) was sampled at 1 kHz in each block. Near-DC components (< 0.05 Hz) of the R-R interval series were removed by a digital filter, and equidistant R-R interval data were obtained by resampling a spline-interpolated trendgram at 2 Hz. A 10th-order autoregressive (AR) spectral analysis was applied (Miyake et al., 1994). The LF and HF components were extracted by the spectral decomposition method, and ln LF/HF was selected for the Principal Components Analysis (PCA) procedure. A 200-s plethysmogram recorded by a photoelectric plethysmograph (Nihon-Koden MLV-2301) was sampled at 1 kHz, and 10-ms interval data were obtained by resampling every tenth point (coarse graining). Baseline fluctuation was removed by a 100-point moving average (equivalent to a 0.443-Hz high-pass filter), and the root mean square (RMS) value was calculated as the average beat component amplitude.

¹ LF is the low-frequency component of HRV, at approximately 0.10 Hz, which primarily reflects baroreceptor-mediated regulation of blood pressure. HF is the relatively high-frequency component, at approximately 0.25 Hz, which corresponds to the respiration frequency among the HRV spectral components.
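The HRV and plethysmogram processing steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: linear interpolation replaces the spline, the near-DC (< 0.05 Hz) filter is reduced to mean removal, LF and HF power are obtained by integrating the AR spectrum over conventional bands (0.04–0.15 Hz and 0.15–0.40 Hz) rather than by the spectral decomposition method, and coarse graining is taken to mean keeping every tenth sample. All function names are hypothetical.

```python
import numpy as np


def resample_rr(r_peak_times_s, fs_out=2.0):
    """Equidistant R-R interval series (s) from R-peak times (s).

    Assumption: linear interpolation (np.interp) stands in for the
    spline-interpolated trendgram described in the text.
    """
    t = np.asarray(r_peak_times_s, dtype=float)
    rr = np.diff(t)                      # R-R intervals in seconds
    t_rr = t[1:]                         # stamp each interval at its closing R peak
    grid = np.arange(t_rr[0], t_rr[-1], 1.0 / fs_out)
    return np.interp(grid, t_rr, rr)


def ar_psd(x, order=10, fs=2.0, n_freq=2048):
    """AR power spectrum estimated with the Yule-Walker equations."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                     # mean removal only (assumption)
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = r[1:], with R[i, j] = r[|i - j|]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])     # innovation (driving noise) variance
    f = np.linspace(0.0, fs / 2.0, n_freq)
    lags = np.arange(1, order + 1)
    transfer = 1.0 - np.exp(-2j * np.pi * np.outer(f, lags) / fs) @ a
    return f, sigma2 / (fs * np.abs(transfer) ** 2)


def ln_lf_hf(f, psd, lf_band=(0.04, 0.15), hf_band=(0.15, 0.40)):
    """Natural log of LF/HF power by band integration (assumed band limits)."""
    df = f[1] - f[0]
    lf = psd[(f >= lf_band[0]) & (f < lf_band[1])].sum() * df
    hf = psd[(f >= hf_band[0]) & (f < hf_band[1])].sum() * df
    return float(np.log(lf / hf))


def pleth_rms(raw_1khz, step=10, win=100):
    """RMS of the beat component of a 1-kHz plethysmogram."""
    x = np.asarray(raw_1khz, dtype=float)[::step]   # 1 kHz -> 100 Hz (10-ms data)
    baseline = np.convolve(x, np.ones(win) / win, mode="same")
    # Drop the edges, where the moving-average window is truncated
    ac = (x - baseline)[win // 2 : len(x) - win // 2]
    return float(np.sqrt(np.mean(ac ** 2)))
```

Applied to one 200-s block, `ln_lf_hf(*ar_psd(resample_rr(r_peaks)))` would yield one ln LF/HF value and `pleth_rms(raw)` one amplitude value, where `r_peaks` and `raw` are hypothetical inputs from an R-peak detector and the plethysmograph.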
A small capsule containing a highly accurate static-capacitance moisture sensor and a temperature sensor was attached to the skin surface of the left thumb, and the perspiration signal was measured by the direct capsule method (Suzuken Kenz-Perspiro OSS-100). The perspiration signal was sampled at 100 Hz, and its average amplitude was used as an index of the amount of perspiration.

The subjective workload score was obtained with the NASA Task Load Index (TLX) (Hart and Staveland, 1988), which contains six subscales: Mental Demand (MD), Physical Demand (PD), Temporal Demand (TD), Own Performance (OP), Effort (EF) and Frustration level (FR). The NASA-TLX rating window appeared automatically on the computer screen when each task period expired, and participants then rated their subjective workload with the mouse cursor.

Two participants were excluded from subsequent analyses. In one, a clear HF component was not detected because his respiration rate in one block was so slow that the HF component was contaminated by the LF component. The other showed highly irregular respiration in one block despite the breathing instructions described above, so no clear HF component was found in his HRV power spectrum.

All parameters were standardized within participants. Physiological workload evaluation (PWE) scores were calculated by PCA of the three physiological parameters described above. Two subjective scores were calculated: the Weighted Workload (WWL) score and the average of the MD, TD and EF subscales of the NASA-TLX, labeled the TLX-MTE (Mental, Temporal and Effort) score. The TLX-MTE score was designed with reference to the Subjective Workload Assessment Technique (Reid and Nygren, 1988), which contains only three dimensions: Time Load, Mental Effort Load and Psychological Stress Load. Time Load appears to be identical to TD, and Mental Effort Load is similar to MD and EF. Psychological Stress Load may be equivalent to FR in the NASA-TLX; however, FR was not included in the TLX-MTE score. FR was excluded because the purpose of this new subjective scale was to reflect subjective feelings during the task, and it was assumed that the frustration level rated after the task might be greatly affected by the task result, i.e. success or failure.
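The two subjective scores differ only in how the six subscale ratings are combined. The sketch below assumes the standard NASA-TLX weighting, in which each subscale's weight is the number of times it is chosen in the 15 paired comparisons, so the weights sum to 15; the function names and the dictionary layout are hypothetical.

```python
from itertools import combinations

SUBSCALES = ("MD", "PD", "TD", "OP", "EF", "FR")


def tlx_weights(pair_choices):
    """Tally NASA-TLX weights from the 15 paired comparisons.

    pair_choices maps each pair (in SUBSCALES order) to the subscale the
    participant judged more relevant to the task's workload.
    """
    weights = dict.fromkeys(SUBSCALES, 0)
    for pair in combinations(SUBSCALES, 2):      # 6 choose 2 = 15 pairs
        winner = pair_choices[pair]
        if winner not in pair:
            raise ValueError(f"{winner!r} is not a member of {pair}")
        weights[winner] += 1
    return weights                               # weights sum to 15


def wwl(ratings, weights):
    """Weighted workload: weight-averaged subscale ratings."""
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15.0


def tlx_mte(ratings):
    """TLX-MTE: unweighted mean of MD, TD and EF (PD, OP and FR are ignored)."""
    return (ratings["MD"] + ratings["TD"] + ratings["EF"]) / 3.0
```

Because WWL weights every subscale, a high post-task FR or OP rating can raise WWL while leaving TLX-MTE unchanged, which is the distinction the text draws.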


The one subjective score (TLX-MTE) and the three physiological parameters were analyzed by PCA, and the first principal component scores were obtained as multivariate workload evaluation (MWE) scores for each participant. The MWE score for the j-th task of the k-th participant was:

$$\mathrm{MWE}_{jk} = W_{1k}P_{1jk} + W_{2k}P_{2jk} + W_{3k}P_{3jk} + W_{4k}S_{1jk} \tag{1}$$

where $W_{ik}$ is the principal component coefficient, $P_{1jk}$ is ln LF/HF, $P_{2jk}$ is plethysmogram amplitude, $P_{3jk}$ is perspiration and $S_{1jk}$ is the TLX-MTE score. For each participant, the mean and standard deviation of the MWE scores across all tasks were 0 and 1, respectively. Repeated measures ANOVAs (GLM procedure, SPSS) were carried out, with the Greenhouse–Geisser correction for violations of sphericity applied where appropriate. Significant main effects were followed up with Student–Newman–Keuls tests, and the significance level was set at P < 0.05.
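Eq. (1) can be illustrated with a short NumPy sketch for a single participant: each parameter is standardized across that participant's tasks, PCA is run on the standardized data, and the coefficients of the first principal component serve as the participant-specific weights $W_{ik}$. The original analysis was done in SPSS; the sign convention (orienting the component so that TLX-MTE loads positively), the final rescaling and the name `mwe_scores` are assumptions.

```python
import numpy as np


def mwe_scores(X, anchor_col=3):
    """First-principal-component (MWE) scores for one participant.

    X: array of shape (n_tasks, 4) with columns P1 = ln LF/HF,
    P2 = plethysmogram amplitude, P3 = perspiration, S1 = TLX-MTE.
    Returns the weights W_1k..W_4k and the MWE score of each task.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each parameter within this participant
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA on standardized data = eigendecomposition of the correlation matrix
    C = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    w = eigvecs[:, np.argmax(eigvals)]
    # Eigenvector sign is arbitrary; assumption: orient so S1 loads positively
    if w[anchor_col] < 0:
        w = -w
    scores = Z @ w
    # Rescale so the MWE scores have mean 0 and SD 1 across tasks
    return w, (scores - scores.mean()) / scores.std(ddof=1)
```

Calling `mwe_scores` separately for each participant gives each person a different weight vector, which is the individually based weighting discussed in the Conclusions. With only seven tasks per participant the correlation matrix rests on very few observations, so the weights should be read cautiously.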

3. Results

There was a significant effect of task on the NASA-TLX WWL scores calculated from all six subscales (F(6,54) = 8.44, ε = 0.582, P < 0.001; Fig. 1), on PWE scores (F(8,72) = 3.32, ε = 0.358, P < 0.05; Fig. 2) and on MWE scores (F(6,54) = 8.184, ε = 0.537, P < 0.001; Fig. 3). The task effect on TLX-MTE scores failed to reach significance (F(6,54), ε = 0.629, P = 0.052). Fig. 1 shows that WWL scores were significantly lower in TL than in the other tasks and significantly higher in P than in LL.

A significant positive correlation between PWE scores and TLX-MTE scores was found in three participants (P < 0.05 in two participants and P < 0.01 in one). The average Zr (Fisher's z-transformed r) across all participants was 0.4096, which was not significantly different from zero. Two participants showed a significant correlation (P < 0.05) between PWE scores and WWL scores; here the average Zr was 0.2182, which was also not significantly different from zero. Thus, the correlation between PWE and TLX-MTE scores was slightly higher than that between PWE and WWL scores. Furthermore, PWE and TLX-MTE scores differed significantly between P and TM, whereas WWL scores did not. The MWE score in the P task was significantly higher than in the other tasks (Fig. 3), and MWE scores also differed significantly between TM and LH.

Fig. 1. Subjective workload by means of the NASA-TLX WWL and TLX-MTE scores. The scores were standardized within each participant and averaged across participants (n = 10).
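The average Zr values above are obtained by Fisher-transforming each participant's correlation, z = artanh(r), and averaging the z values. The sketch below assumes the test against zero was a one-sample t-test across participants; the source does not state which test was used, and the function name is hypothetical.

```python
import numpy as np


def fisher_z_summary(pairs):
    """Per-participant Fisher z of Pearson r, their mean, and a one-sample t.

    pairs: iterable of (x, y) arrays, one pair per participant, e.g. that
    participant's PWE and TLX-MTE scores across tasks.
    """
    zs = np.array([np.arctanh(np.corrcoef(x, y)[0, 1]) for x, y in pairs])
    mean_z = zs.mean()
    # Assumption: one-sample t statistic against zero across participants
    t_stat = mean_z / (zs.std(ddof=1) / np.sqrt(zs.size))
    return zs, mean_z, t_stat
```

`np.tanh(mean_z)` converts the average back to the correlation scale, and if SciPy is available `scipy.stats.ttest_1samp(zs, 0.0)` gives the matching p-value. A participant with r = ±1 would produce an infinite z, so such cases need separate handling.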

Fig. 2. Physiological workload evaluation (PWE) scores composed of ln LF/HF, finger plethysmogram amplitude and perspiration amount.


Fig. 3. Multivariate workload evaluation (MWE) scores composed of ln LF/HF, fingertip plethysmogram amplitude, perspiration amount and the NASA-TLX MTE scores by means of principal components analysis (PCA).

4. Conclusions

The original NASA-TLX WWL scores corresponded well to the difficulty level of the tracking task. However, WWL scores in P, TH and LH were almost the same, so WWL did not differentiate among these three tasks. In contrast, the TLX-MTE scores, the average of the MD, TD and EF subscales, were relatively low for the tracking tasks and showed no correlation with task difficulty level. However, the pattern of TLX-MTE scores across all tasks was similar to that of the PWE scores, which were composed of three physiological parameters: HRV, plethysmogram and perspiration. Therefore, the TLX-MTE scores, rather than the WWL scores, were integrated with the three physiological parameters in the multivariate evaluation (cf. Eq. (1)). These results suggest that the PD, OP and FR subscales of the NASA-TLX did not covary much with the physiological responses recorded during task performance, which may be one reason for the discrepancy between physiological parameters and subjective workload evaluations based on WWL.

Two sample cases illustrate this point. If a participant completed a very complex and delicate task, such as assembling a scale model ship, at the end of the task period, he or she would probably have a feeling of accomplishment, so his or her rating on the own performance (OP) scale of the NASA-TLX might be low. Conversely, if at the very end of the task the participant dropped the model ship and it broke into pieces, he or she might feel very depressed and frustrated, and his or her rated workload might be very high. However, the physiological responses recorded during the task period would be identical in both cases, because the participant performed the task in the same manner apart from the accident, and an accident at the very end of the task could not affect the responses recorded during it. Thus, even for the same task, subjective workload scores rated after the task may be greatly affected by the task result, whereas physiological responses recorded during the task are not. Feelings of achievement or of one's performance are important in evaluating workload; however, as described above, the correlation between such feelings and the physiological responses during the task may be low.

The MWE scores were relative parameters within participants because they were calculated from scores standardized within each participant. This means that the weight coefficients $W_{ik}$ in Eq. (1) used to calculate the MWE scores differed from participant to participant. This individually based multivariate workload evaluation method seems useful for workload research (Furuta et al., 1997) because the sensitivity of physiological parameters to a given workload differs between participants. This procedure, in which the weights are determined for each individual according to his or her responses, is very similar to the calculation of the WWL score in the NASA-TLX. The NASA-TLX does not, of course, use PCA; however, before the weighted average (WWL) of the six subscale scores is calculated, the weights for the subscales are obtained from paired comparisons of the subscales for each participant. Thus, the WWL weights also reflect individual differences in workload evaluation. When several different kinds of physiological parameters are obtained and are to be combined into one single value using weight coefficients that differ among individuals, PCA is a useful approach. Furthermore, this method can integrate subjective measures as well. The MWE score calculated in this study was composed of three physiological parameters and one subjective parameter. However, the important point here is the method of calculation (PCA on standardized parameters), not the score itself. That is, any physiological and subjective parameters can be integrated by this method, and the MWE method can assess workload objectively (physiologically) and subjectively at the same time.

The MWE method proposed in this study seems useful for the evaluation of work stress (ISO, 1991) during performance tasks. The tasks employed in this study were a puzzle and PC-based games and may be far removed from work in the field. However, highly controlled laboratory studies must be performed before a new method is taken into the field. Furthermore, it is necessary to employ simple 'laboratory' tasks whose attributes or resource demands clearly differ from each other. In either case, further experimental investigation is needed to examine the validity and reliability of the MWE method.

Acknowledgements

This work was supported by the Ministry of Education, Sciences, Sports and Culture under Grant-in-Aid for Scientific Research (C) No. 06670393. The author would like to thank Wolfram Boucsein, Jay Miller and the anonymous reviewers for their very helpful comments and suggestions on earlier versions of this paper.

References

Ellestad, M.H., 1986. Stress Testing: Principles and Practice, 3rd edn. F.A. Davis Co., Philadelphia, pp. 129–135.
Furuta, T., Miyakawa, T., Kubota, R., Ikeda, K., Miyake, S., Osaki, H., 1997. Experiment on human factors: development of method for workload evaluation. Proc. 1997 Fall Meeting At. Energy Soc. Japan, 320 (in Japanese).
Grossman, P., Karemaker, J., Wieling, W., 1991. Prediction of tonic parasympathetic cardiac control using respiratory sinus arrhythmia: the need for respiratory control. Psychophysiology 28, 201–216.
Hart, S.G., Staveland, L., 1988. Development of NASA task load index (TLX): results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (Eds.), Human Mental Workload. Elsevier Science Publishers B.V., Amsterdam, pp. 139–183.
Hayano, J., Mukai, S., Sakakibara, M., Okada, A., Takata, K., Fujinami, T., 1994. Effects of respiratory interval on vagal modulation of heart rate. Am. J. Physiol. Heart Circ. Physiol. 267, H33–H40.
ISO 10075, 1991. Ergonomics principles related to mental work-load: general terms and definitions. ISO, Geneva.
ISO 10075, 1998. Ergonomics principles related to mental work-load, Part 3: measurement and assessment of mental work-load [unpublished internal Working Draft (WD)].
Miyake, S., Akatsu, J., Sato, N., Kumashiro, M., 1994. Heart rate variability as a mental workload index: a methodological proposal for autoregressive power spectral analysis. Proc. 12th Triennial Congr. IEA 6, 417.
Miyake, S., 1996. Psychophysiological responses induced by different mental tasks: comparison between perspiration, heart rate variability and T-wave amplitude. Psychophysiol. Ergonomics 1, 45–47.
Miyake, S., 1997. Factors influencing mental workload indexes. J. UOEH 19, 313–325.
Reid, G.B., Nygren, T.E., 1988. The subjective workload assessment technique: a scaling procedure for measuring mental workload. In: Hancock, P.A., Meshkati, N. (Eds.), Human Mental Workload. Elsevier Science Publishers B.V., Amsterdam, pp. 185–218.
Saab, P.G., Schneiderman, N., 1993. Biobehavioral stressors, laboratory investigations, and the risk of hypertension. In: Blascovich, J., Katkin, E.S. (Eds.), Cardiovascular Reactivity to Psychological Stress and Disease. American Psychological Association, Washington, DC, pp. 49–82.
Turner, J.R., 1994. Cardiovascular Reactivity and Stress. Plenum, New York, pp. 51–53, 71–89.