Proactive mental fatigue detection of traffic control operators using bagged trees and gaze-bin analysis


Advanced Engineering Informatics 42 (2019) 100987


Full length article


Fan Li a,b, Chun-Hsien Chen a, Gangyan Xu c,d,⁎, Li Pheng Khoo a, Yisi Liu e

a School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore
b Maritime Institute, Nanyang Technological University, Singapore
c Center of Urban Emergency Management and Traffic Safety, School of Architecture, Harbin Institute of Technology, Shenzhen, China
d Intelligent Transportation Lab, Shenzhen Key Laboratory of Urban Planning and Decision Making, Harbin Institute of Technology, Shenzhen, China
e Fraunhofer IDM@NTU Center, Singapore

ARTICLE INFO

ABSTRACT

Keywords: Human fatigue; Bagged tree; Eye movement; Bin analysis; Time window

Most existing eye movement-based fatigue detectors use statistical analyses of fixations, saccades, and blinks as inputs. However, these parameters require long recording times and depend heavily on the eye tracker. To facilitate proactive detection of mental fatigue, we introduce a complementary fatigue indicator, named gaze-bin analysis, which simply presents the eye-tracking data as histograms. A method that uses the gaze-bin analysis as the input of semisupervised bagged trees was developed. A case study in a vessel traffic service center demonstrated that this approach can alleviate the burden of manual labeling as well as improve the performance of the fatigue detection model. In addition, the results show that the approach achieves an accuracy of 89%, outperforming other methods. Overall, this study provides a complementary indicator for detecting mental fatigue and enables the application of a low-sampling-rate eye tracker in the traffic control center.

1. Introduction

Traffic control refers to surveilling, controlling, and managing traffic by monitoring real-time traffic data and providing instructions or advice to traffic operators [1–4]. It aims to maintain the flow and safety of traffic, and it has been implemented in almost all modes of transport, including air traffic management, freeway traffic management, vessel traffic service, and railway traffic control. The main operations of traffic control operators (TCOs) involve passive monitoring [5–7], which can easily induce mental fatigue, leading to high chances of disastrous outcomes for public safety [8–10]. Mental fatigue has recently been regarded as among the top 10 safety issues, causing around 20% of traffic accidents [11].

Potentially, the risk of mental fatigue can be reduced by proactively monitoring fatigued TCOs using eye movements. With technological improvements, eye movements can be tracked remotely using contactless eye trackers [10]. Hence, eye movements can be measured unobtrusively and continuously during extended cognitive tasks. In addition, changes in eye movements can be promptly captured to identify and understand the interactions between users and human interface devices [12]. Consequently, eye movement-based fatigue detectors have recently received great attention [13–16], with researchers attempting to use eye-tracking data to detect real-time human fatigue while watching videos, driving, and performing surgical operations [16–19].

Existing methods of eye movement-based mental fatigue detection have focused on using eye movements recorded over an extended period to generate descriptive fatigue indicators, such as the mean, standard deviation, and median of fixations and saccades [18]. However, these indicators impede the ability to warn fatigued subjects in a timely manner, and they depend heavily on the selected parsing method and the quality of the eye-tracking data [20]. This has given rise to many contradictory results in studies of eye movement-based mental fatigue detection. For example, although many studies have suggested that saccadic parameters vary with the fatigue level [21–23], Saito [24] did not find any significant quantitative changes in saccadic eye movements in five hours of eye-tracking tasks. Furthermore, fixation and saccade parameters are task-dependent [25] and easily affected by interface design [26]. Hence, it is challenging to use them to detect fatigue.

Therefore, instead of focusing on fixations and saccades, it would be better to capture information from eye-tracking data recorded over a

Corresponding author. E-mail address: [email protected] (G. Xu).

https://doi.org/10.1016/j.aei.2019.100987 Received 14 January 2019; Received in revised form 4 August 2019; Accepted 2 September 2019 1474-0346/ © 2019 Elsevier Ltd. All rights reserved.


short period of time, which poses several challenges. First, splitting eye-tracking data recorded over a long period into short-period data introduces overlapping data, which can affect the accuracy of the fatigue detection model. Second, labeling eye-tracking data recorded over a short period is labor intensive. Emerging studies in eye movement-based fatigue detection adopt manual labeling of eye-tracking data with subjective fatigue scales; in this work, however, the amount of short-period data involved makes manual labeling impractical. Third, there is a relatively large variance in eye movements [16]. Consequently, existing eye-tracking fatigue detectors produce a great number of false alarms [27], inducing "cry wolf" effects, that is, asking for assistance when it is not needed. Fourth, it is difficult to determine the period of the time window. In existing studies, the time window used for eye movement-based human fatigue detection ranges from 8 s to 30 min. The time window period is believed to significantly affect the performance of human fatigue models, such as their detection accuracy and time delays [28]; however, little attention has been paid to determining the appropriate time window.

To address these challenges and achieve the aim of proactively and noninvasively monitoring the mental fatigue of TCOs, a method that extends bagged trees with semisupervised training to gaze-bin analysis is proposed. Bin analysis, which refers to counting how many values fall into a specific interval, can accurately present the distribution of numerical data [29]. The bagged trees method adopts the concept of assembling multiple decision trees, and it performs well in analyzing data with substantial classification noise [30]. Semisupervised training enables both labeled and unlabeled training data to be used. A case study was conducted in a vessel traffic service center to illustrate the proposed method. The results show that the proposed method outperformed other methods and achieved an accuracy of 89% using 10 s of eye-tracking data.

The paper is organized as follows. Section 2 reviews the existing eye movement-based detectors of human fatigue to identify the methods and parameters that are commonly used, and describes the development of bagged trees and bin analysis. Section 3 states the problem of using gaze-bin analysis to discriminate human fatigue. Section 4 describes the approach suggested for detecting human fatigue using gaze-bin analysis. The case study in the vessel traffic service center and the performance of the proposed approach are presented in Section 5, together with the effects of the model characteristics and a comparison of the proposed method with other classical methods. Section 6 summarizes the contributions and limitations of this work.

2. Related works

This section provides a review of eye movement-based human fatigue detection from two aspects: eye movement-based indicators and algorithms. In addition, the development and applications of bagged trees and bin analysis are examined to provide a theoretical background for the following sections.

2.1. Eye movement-based human fatigue detection

As early as 1996 [31], researchers found that eye movement parameters were potential indicators of human fatigue. In general, eye movement parameters include blinks, percentage of eye closure (PERCLOS), saccades, and fixations. Blink-related parameters are the most commonly utilized indicators [19,21,32]. In the fatigued state, operators show a high blink rate. Du et al. [33] investigated the possibility of using the blink rate variance to discriminate human fatigue. Jin et al. [34] and Ahlstrom et al. [32] detected fatigue using the blink frequency. Besides blinks, PERCLOS has been widely used to detect human fatigue [35]. PERCLOS commonly increases with increasing human fatigue. Hartley et al. [27] suggested that PERCLOS is among the best ocular measures for assessing fatigue. However, both blink-related parameters and PERCLOS are of limited value [35]. A high blink rate and large PERCLOS lead to visual information loss and a high possibility of human error; hence, by the time they indicate fatigue, it is too late to reduce its risk. Other commonly used parameters are saccadic parameters [21–23,36]. Finke et al. [22] pointed out that saccadic parameters such as saccade amplitude, saccade rate, and saccade duration, and especially saccade peak velocity, are reliable indicators of human fatigue. Di Stasi et al. found that the saccade peak velocity decreases significantly with increasing human fatigue [35]. All the parameters mentioned above depend heavily on the parsing method and the eye tracker [20]. For example, only an eye tracker with a high sampling frequency can record the saccade peak velocity. These drawbacks make it challenging to realize eye movement-based human fatigue detection using fixation and saccade parameters.

Recently, emerging studies have found that it is possible to link eye movements to human fatigue by using machine learning techniques [19,37]. Yamada and Kobayashi [19] adopted the support vector machine to detect mental fatigue: they recorded the eye movements of participants who were watching videos and then used gaze allocations as inputs of the support vector machine to measure fatigue. Zhu et al. [37] applied convolutional neural networks to fatigue detection using blinks, slow eye movements, and rapid eye movements. The most commonly used technique is the support vector machine, which was first proposed in 1995 [38] and has shown good performance in the detection and recognition of text, speech, and even credit. Nevertheless, the performance of the support vector machine in eye movement-based human fatigue detection is still far from satisfactory (around 70%) [19], possibly because the large classification noise is overlooked. Hence, we propose to address these problems using the following theories.

2.2. Bin analysis

Data binning refers to segmenting data into several intervals and replacing the values within one interval with a representative value [39]. The interval is called a "bin." This is a data pre-processing technique used to reduce the number of values to analyze [40]. It has been widely used in many fields, such as visual diagnosis, genome architecture, and metabolomics data analysis [39–41]. In general, the representative value of a given interval is its central value. In this study, the probability of the values falling within a given interval is used as the representative value. In this way, a histogram can be established, which can accurately represent the distribution of continuous data [29], with B = (p_1, p_2, …, p_N), where N is the number of intervals and p_n is the probability of the values falling within the nth interval. The entropy, which indicates the state of a system, can be calculated following the definition of Shannon [42].

2.3. The bagged trees

The bagged predictor was first proposed in 1996 [30]. It is a method for manipulating the training data to generate multiple predictors and aggregating the results of those predictors. It has been successfully applied to construction material classification and diesel-electric locomotive trains [43,44]. The bagged trees approach assembles multiple trees so as to improve model performance [30]. A learning system, such as the decision-tree algorithm C4.5, can be used to construct a predictor by taking a bootstrap sample from the training data. The size of the bootstrap sample is the same as the size of the training data, but some instances are duplicated and some are absent. Multiple predictors can be constructed by repeatedly generating bootstrap samples from the original training data and training on them. In general, the number of repetitions can be fixed or determined by cross-validation [45]. To classify an instance x, all the constructed predictors indicate a class for x, and x is assigned to the class k that receives the most votes [46].


Fig. 1. Representation of eye-tracking data.

The concept of assembling multiple trees is not new; both boosted trees and random forests make predictions based on multiple trees [47]. Compared with boosted trees and random forests, bagged trees perform much better in analyzing data with substantial classification noise [47]. Breiman [30] indicated that 'If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy'. This advantage makes bagging very suitable for eye movement-based human fatigue detection [19].
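To make the bootstrap-and-vote idea concrete, the following Python sketch implements bagging with a trivial one-dimensional threshold "stump" standing in for the C4.5/CART base learner; the toy data and all names are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter

def train_stump(sample):
    # Fit a 1-D threshold classifier by exhaustively picking the
    # threshold that minimizes training error on the bootstrap sample.
    best = (None, 0, 0, len(sample) + 1)  # (thr, left_label, right_label, err)
    for thr, _ in sample:
        left = [y for x, y in sample if x <= thr]
        right = [y for x, y in sample if x > thr]
        l_lab = Counter(left).most_common(1)[0][0] if left else 0
        r_lab = Counter(right).most_common(1)[0][0] if right else 0
        err = sum(y != (l_lab if x <= thr else r_lab) for x, y in sample)
        if err < best[3]:
            best = (thr, l_lab, r_lab, err)
    thr, l_lab, r_lab, _ = best
    return lambda x: l_lab if x <= thr else r_lab

def bagged_predict(train, A, x, seed=0):
    # Bagging: A bootstrap samples (same size as the training set,
    # drawn with replacement), one predictor each, then a majority vote.
    rng = random.Random(seed)
    votes = []
    for _ in range(A):
        boot = [rng.choice(train) for _ in range(len(train))]
        votes.append(train_stump(boot)(x))
    return Counter(votes).most_common(1)[0][0]

# Toy data: a gaze-velocity-like feature, label 1 ("fatigue") for slow gaze.
train = [(0.1, 1), (0.2, 1), (0.3, 1), (0.4, 1),
         (1.1, 0), (1.2, 0), (1.4, 0), (1.6, 0)]
print(bagged_predict(train, A=25, x=0.25))
print(bagged_predict(train, A=25, x=1.5))
```

Because each predictor sees a perturbed copy of the training set, a few mislabeled instances sway only a minority of the votes, which is why bagging tolerates classification noise better than a single tree.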

3. Gaze-bin analysis-based indicators of human fatigue

In this section, we first introduce the definition of the gaze-bin analysis. Based on this definition, the research problem is described.

3.1. Gaze-bin analysis

Fig. 1(a) shows the basic concept of the existing parameters. Previous studies generally focused on first identifying fixations and saccades and then generating statistical parameters from them. Different from existing studies, a novel research problem, gaze-bin analysis, is investigated. Basically, there are eye-tracking data E = {X, Y, T}, where X and Y refer to the x and y coordinates of the gaze points and T is the set of timestamps. The gaze velocity of each gaze point can be calculated by determining the distance between two consecutive gaze points and multiplying by the sampling frequency (f) of the eye tracker:

v_t = f × sqrt((x_t − x_{t−1})² + (y_t − y_{t−1})²), t ∈ [2, T], x ∈ X, y ∈ Y (1)

Hence, the eye-tracking data are represented as E = {V, T}, where V is the set of gaze velocity data. First, the eye-tracking data were split into short-period data packets, and then bin analysis was conducted. The time-series eye movement data were split into N packets based on the time window and the shift window. The time window (Tw) denotes the period of each packet. The shift window (Sw) refers to the length by which the time window moves forward. T refers to the length of the eye-tracking data recording time.

N = (T − Tw)/Sw + 1 (2)

M = Tw × f (3)

In this way, the eye-tracking data E = {V, T} are presented as E = {D_1, D_2, …, D_N}. M refers to the number of gaze points belonging to a data packet, while f is the sampling frequency of the eye tracker.

The basic idea of gaze-bin analysis is shown in Fig. 1(b). Each eye-tracking data packet is presented with p, the gaze velocity probability vector. The bin analysis is extended to gaze velocity analysis by describing each data packet as a histogram of all velocities at which the eye travelled during the time window (Tw). The gaze velocity data belonging to the data packet are transformed into a discrete probability mass function and an entropy:

p = (p(v_1), p(v_2), p(v_3), …, p(v_B), e) (4)

∑_{b=1}^{B} p(v_b) = 1 (5)

where p(v_b) is the probability that the gaze velocity is larger than v_{b−1} and smaller than v_b, and

v_b = b × V_max/(B − 1) (6)

e = −∑_{b=1}^{B} p(v_b) × log(p(v_b)) (7)

where V_max is the maximum velocity of the fixations, B is the number of bins, and e is the entropy of the gaze velocities belonging to the time window. Because the velocities of saccades are much faster than those of fixations and the amount of time spent on saccades is substantially shorter than that spent on fixations, just one bin (bin B) was set for the saccades.

3.2. Problem statement

This study proposes a novel analysis of short-period eye-tracking data instead of the statistical analysis of fixations and saccades, which can be conducted using only long-recorded eye-tracking data. In this way, a new problem of using gaze-bin analysis to discriminate human fatigue in TCCs is introduced. The problem is stated as follows: given a set of eye-tracking data E = {p_1, …, p_m} and a set of fatigue levels FL = {fl_1, …, fl_j}, the aim is to (1) train a fatigue model FM and (2) predict the human fatigue level from short-period eye-tracking data, fl = FM{p}. This problem is based on the following assumptions:

Assumption 1. The dimension of FL is much smaller than the number of data packets belonging to E. In other words, j ≪ m.

Assumption 2. Eye-tracking data collected from the same participant are subject to the same distribution.

Assumption 3. There are some overlaps among data packets, p_d ∩ p_{d+1} ≠ ∅, where {p_d, p_{d+1}} ⊆ E.

Compared with existing human fatigue detection studies, which

Advanced Engineering Informatics 42 (2019) 100987

F. Li, et al.

focus on fixations and saccades from eye-tracking data recorded over an extended period, the proposed solution of using gaze-bin analysis is expected to capture features from short-period eye-tracking data. Moreover, the gaze-bin analysis is independent of event detection methods and eye trackers. Hence, the gaze-bin analysis is expected to reduce the contradictory results caused by eye trackers and event detection methods.
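To make the construction above concrete, the following Python sketch follows Eqs. (1)–(7): gaze velocities from (x, y) samples, splitting into overlapping time-window packets, and converting each packet into a bin-probability vector plus entropy, with the last bin reserved for fast, saccade-like velocities. The sampling rate, window sizes, V_max, and the toy gaze track are illustrative assumptions.

```python
import math

def gaze_velocities(xs, ys, f):
    # Eq. (1): point-to-point gaze speed scaled by the sampling frequency f.
    return [f * math.hypot(xs[t] - xs[t - 1], ys[t] - ys[t - 1])
            for t in range(1, len(xs))]

def split_packets(v, f, tw, sw):
    # Eqs. (2)-(3): M = tw * f samples per packet, shifted by sw seconds.
    m, step = int(tw * f), int(sw * f)
    return [v[i:i + m] for i in range(0, len(v) - m + 1, step)]

def gaze_bin_features(packet, v_max, B):
    # Eqs. (4)-(7): B - 1 equal-width bins up to v_max, one extra bin for
    # faster (saccade-like) velocities, plus the Shannon entropy.
    counts = [0] * B
    width = v_max / (B - 1)
    for v in packet:
        counts[min(int(v / width), B - 1)] += 1
    p = [c / len(packet) for c in counts]
    e = sum(pb * math.log(1 / pb) for pb in p if pb > 0)
    return p + [e]

# Toy 2 s track sampled at 10 Hz: a slow drift, then one fast jump.
f = 10
xs = [0.01 * t for t in range(15)] + [1.0 + 0.01 * t for t in range(5)]
ys = [0.0] * 20
v = gaze_velocities(xs, ys, f)
packets = split_packets(v, f, tw=1.0, sw=0.5)
features = [gaze_bin_features(pkt, v_max=1.0, B=4) for pkt in packets]
print(len(packets), [round(x, 3) for x in features[0]])
```

The first packet contains only slow drift, so its probability mass sits in one bin and its entropy is zero; the second packet includes the jump, which lands in the saccade bin B.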

4. Semisupervised bagged trees for human fatigue detection

This section presents an innovative method to address the challenges summarized in Section 1 for gaze-bin analysis-based human fatigue detection. In general, training data and test data are randomly selected. Nevertheless, simple random selection is not applicable in this study because of the overlapping data packets. Hence, the proactive selection of training and testing data is introduced first. Then, a semisupervised bagged trees method for training the fatigue model using labeled and unlabeled data is discussed. Lastly, the procedure of human fatigue detection is stated.

4.1. Proactive selection of training and testing data

In total, N data packets were utilized for training and testing the bagged trees model. First, 80% of the data packets were selected for training, and the remaining data packets were used for testing. To avoid the overfitting caused by overlapping data packets, the training data and testing data were split before training, and the overlapping data packets were deleted, as shown in Fig. 2. A random number s was generated using MATLAB R2018a. To prevent s from falling outside of the index range, s was set to be larger than No and smaller than 0.8N − No − 1, where No is the number of overlapping data packets. The data packets numbered s to s + 0.2N were utilized as testing data, while the data packets numbered s − No to s − 1 and s + 0.2N + 1 to s + 0.2N + No were deleted. After deleting the overlapping data, the remaining data were used to train the model. In this way, the training data set Tr = {D_1, …, D_{s−No−1}, D_{s+0.2N+No+1}, …, D_N} and the testing data set Ts = {D_s, …, D_{s+0.2N}} were obtained.

4.2. Semisupervised training with decision tree

Given the set of eye-tracking data Tr = {D_1, …, D_{s−No−1}, D_{s+0.2N+No+1}, …, D_N} and a set of fatigue levels FL, the number of data packets in Tr is much larger than the dimension of FL. {Tr, FL} can be presented as {Dl, Du}, where Dl denotes the set of labeled eye-tracking data and Du denotes the set of unlabeled eye-tracking data. Essentially, {Dl, Du} = {d_1^l, d_2^l, …, d_j^l, d_{j+1}^u, …, d_m^u}, where j is the number of labeled data packets and m is the total number of data packets. Hence, a labeling vector FL = {fl_{j+1}, …, fl_m}^T is generated for the unlabeled data set.

First, a decision tree was trained using the labeled data set Dl, and this tree was used to generate the labeling vector FL = {fl_{j+1}, …, fl_m}^T for the unlabeled data set. A subset Dusub whose predicted labels had high classification confidence was selected and labeled. Then, the decision tree was retrained using the data set {Dl, Dusub}, and a new subset was reselected from the unlabeled data set. The procedure was repeated until it reached a stopping condition. The C4.5 algorithm was used to build the decision trees because it is the most well-known algorithm for building decision trees, and it has been successfully used in many fields [48]. Algorithm 1 presents the main structure of the semisupervised training algorithm. The output training data set can be represented as Tr = {d_1^l, …, d_m^l}. The time complexity of Algorithm 1 is O(T·B·|Dl|²), as the time complexity of C4.5 is O(B·|Dl|²) [49].

Algorithm 1: Semisupervised training algorithm
Input: labeled data set Dl, unlabeled data set Du
T: number of iterations; C: prediction confidence threshold
t = 1
while t < T
    Bt ← DecisionTree(Dl)
    FL ← Bt(Du)
    select Dusub whose p(FLsub) > C
    FL ← {FL + FLsub}
    Dl ← {Dl + Dusub}
    Du ← {Du − Dusub}
    t = t + 1
end
Output: the final training data set Dl and a labeling vector FL

4.3. Bagged trees-enabled human fatigue detection

To build A trees for bagging, A copies of the data set, BT = {BT_1, BT_2, …, BT_A}, were generated from the training data set Tr using sampling with replacement. Based on each data set BT_a, a decision tree F^a was built using the classification and regression tree (CART) algorithm. Given the training data set, the decision tree F^a divides E into K feature subspaces. On each subspace, the same prediction is made for all data packets belonging to the subspace. Pro_kF, the estimated probability of class F on the kth subspace, is the proportion of training observations in the kth subspace that are from the Fth class. The method aims to find the F^a that minimizes the Gini index [50]:

Gini = ∑_{F=0}^{1} Pro_kF (1 − Pro_kF) (8)

The final prediction for a given observation p is calculated using Eq. (9) [51]:

F_bag(p) = (1/A) ∑_{a=1}^{A} F^a(p) (9)

Out-of-bag errors are typically used to evaluate bagged models. Although out-of-bag error estimation is particularly convenient, it is not suitable for this study because of the overlapping data packets. As illustrated in Fig. 2, each data packet overlaps with several other data packets, making out-of-bag observation too complex. Hence, in this study, a k-fold cross-validation method was extended. The

Fig. 2. Generating data for training and testing.


aforementioned testing data set was used to test the bagged model. For each fold l, the accuracy Accuracy_l can be obtained from Eq. (10):

Accuracy_l = 1 − (1/0.2N) ∑_{r=s}^{s+0.2N} |FL_r − F_bag(P_r)| (10)
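The overlap-aware data selection described in Section 4.1 can be sketched as follows: a contiguous 20% test slice is drawn, and the No packets flanking it on each side are excluded from training, mirroring Fig. 2. The concrete N and No values are taken from the 10 s setting for illustration only.

```python
import random

def split_packets_for_eval(N, No, test_frac=0.2, seed=1):
    """Pick a contiguous test block and drop packets overlapping it.

    Packets are identified by index 0..N-1; adjacent packets share samples,
    so the No packets flanking the test block are excluded from training.
    """
    n_test = int(test_frac * N)
    rng = random.Random(seed)
    s = rng.randint(No, N - n_test - No - 1)    # keep s inside the valid index range
    test = list(range(s, s + n_test))
    drop = set(range(s - No, s + n_test + No))  # test block plus its overlap margins
    train = [i for i in range(N) if i not in drop]
    return train, test

train, test = split_packets_for_eval(N=476, No=6)
print(len(test), len(train))
```

Deleting the 2·No margin packets guarantees that no training packet shares raw gaze samples with any test packet, which is the overfitting risk the random split alone would not remove.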

In addition to accuracy, sensitivity and specificity were tested. Accuracy measures the percentage of correctly classified observations, while sensitivity, also known as the true positive rate, measures the proportion of actual positives that are correctly identified as positive. In contrast, specificity measures the proportion of actual negatives that are correctly identified as negative. In this study, "alert" is defined as positive, and "fatigue" is defined as negative. The time complexity of the bagged trees is O(A·B·m·log(m)²).

5. Case study

This section describes the data collection and the performance of the proposed method. The effects of the model characteristics, including the time window, bin number, and input features, are examined and presented, and the proposed method is compared with other commonly used methods, such as decision tree, linear regression, and support vector machine methods.
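With "alert" as the positive class and "fatigue" as the negative class, the three evaluation metrics reduce to ratios of confusion-matrix counts; a minimal sketch with made-up labels:

```python
def confusion_metrics(y_true, y_pred, positive="alert"):
    # Count the four confusion-matrix cells for a binary problem.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),  # correctly classified share
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
    }

# Illustrative labels only.
y_true = ["alert", "alert", "alert", "fatigue",
          "fatigue", "fatigue", "fatigue", "fatigue"]
y_pred = ["alert", "alert", "fatigue", "fatigue",
          "fatigue", "fatigue", "fatigue", "alert"]
m = confusion_metrics(y_true, y_pred)
print(m)
```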

Fig. 4. Experiment settings.

Table 1
The number of data packets with different time windows (for each participant).

5.1. Data collection and model construction of VTS operators

Eight vessel traffic service operators (VTSOs), seven males and one female, all with normal vision, were recruited for this study. Their ages ranged from 30 to 50, with an average of 42 years. All of them worked the morning shift, which runs from 7:30 to 15:30. None of them suffered from sleep disorders. The study was approved by the institutional review board.

The data collection phase consisted of four sessions (Fig. 3). The participants were asked to conduct their daily work and to complete a 5-minute Mackworth clock test after every two hours of work. As shown in Fig. 4, the Mackworth clock test requires participants to monitor a dot moving around a circle and to press the space bar immediately when the dot jumps more than usual. The participants were further instructed to complete the Samn-Perelli Fatigue Scale (a seven-point scale, where '1' means alert and '7' means almost asleep) before and after the Mackworth clock test. Their eye movements were recorded using a Tobii X3-120. A total of 160 min (8 participants × 5 min × 4 sessions) of eye-movement data from the eight VTSOs were collected. The human fatigue data were summarized from the results of the Samn-Perelli Fatigue Scale and the Mackworth clock test.

Time window (seconds)    N       No    N test
4                        1196    6     239
6                        796     6     159
8                        596     6     119
10                       476     6     95

The time window ranged from 4 to 10 s. The shift window was set to a quarter of the time window. According to Eq. (6), the number of overlapping data packets is 6. The number of data packets N varies with the length of the time window, as shown in Table 1.

In addition to the time window, the value of the bin number required investigation because it can affect the performance of the human fatigue detection model. Specifically, if B is too large, it may induce data redundancy and low computing efficiency; if B is too small, the information in the gaze velocity data cannot be well captured by the bin analysis. In this study, B was set to 6, 8, 10, 12, 14, and 16.

Algorithm 1 was applied to label the data packets with human fatigue levels. First, following the methods used in [34], we labeled a part of the data packets with a binary fatigue state based on observations of the response times from the Mackworth clock test and the subjective ratings from the Samn-Perelli Fatigue Scale. The manually labeled data packets were the eye-tracking data packets recorded at the moments that subjects reacted to the Mackworth clock test. 20

Fig. 3. The procedures of data collection.


observations were obtained from every 5-min Mackworth clock test. Hence, we labeled 80 data packets for each participant. To minimize the effects of individual differences on performance in the Mackworth clock test, the response time data were normalized to the range 0–1. The reported fatigue level for the 20 observations was set as the mean of the two subjective scores obtained before and after the Mackworth clock test. If a participant reported a fatigue level above '4' and had a normalized response time larger than 0.7, the participant was considered to be in a fatigued state. If a participant reported a fatigue level below '3' and had a normalized response time shorter than 0.5, the participant was considered alert. Finally, the rest of the training data were labeled with Algorithm 1, with the number of iterations set to 1000 and the prediction confidence threshold set to 0.8. In this way, fatigue labels were collected for all training data packets.

A repeated-measures analysis of variance (ANOVA) was conducted to test the differences in the observations of all models. For factors that did not satisfy Mauchly's test of sphericity, a Greenhouse-Geisser correction was applied. The significance level in this study is 0.05.
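The labeling scheme above (manual seed labels, then confidence-gated self-training as in Algorithm 1) can be sketched as follows. The nearest-centroid learner and its distance-ratio "confidence" are simple stand-ins for the paper's decision tree and class probability, and the one-dimensional packets are illustrative.

```python
def self_train(labeled, unlabeled, C=0.8, T=1000):
    """Iteratively label high-confidence packets (cf. Algorithm 1).

    labeled:   list of (x, y) with y in {0: alert, 1: fatigue}
    unlabeled: list of x
    """
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(T):
        if not unlabeled:
            break
        # "Train": class centroids of the current labeled set.
        means = {}
        for cls in (0, 1):
            vals = [x for x, y in labeled if y == cls]
            means[cls] = sum(vals) / len(vals)
        accepted = []
        for x in unlabeled:
            d0, d1 = abs(x - means[0]), abs(x - means[1])
            conf = max(d0, d1) / (d0 + d1)  # 1.0 = unambiguous, 0.5 = boundary
            if conf > C:
                accepted.append((x, 0 if d0 < d1 else 1))
        if not accepted:
            break  # stopping condition: nothing confident enough to add
        labeled += accepted
        taken = {a for a, _ in accepted}
        unlabeled = [x for x in unlabeled if x not in taken]
    return labeled, unlabeled

seeds = [(0.1, 0), (0.2, 0), (1.8, 1), (1.9, 1)]  # manually labeled packets
pool = [0.15, 0.3, 1.0, 1.7, 1.85]                # unlabeled packets
labeled, leftover = self_train(seeds, pool, C=0.8)
print(len(labeled), leftover)
```

Packets far from the class boundary are absorbed in the first rounds, while the ambiguous packet near the middle never clears the confidence threshold and stays unlabeled, which is exactly the behavior the threshold C is meant to enforce.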

Fig. 5. Model performance versus time window.

5.2. Performance of the human fatigue detection models

This section presents the effects of the two main variables, the bin number (6 levels) and the time window size (4 levels), on the bagged tree model performance in three aspects: accuracy, sensitivity, and specificity. MATLAB 2017a was used to generate the fatigue models. Bayesian optimization was used to preliminarily select the optimal hyperparameters for the bagged tree models. After a preliminary study, the number of trees was set to 1500, and the minimum leaf size was set to 1. For each model, its performance was the mean value obtained from 10-fold cross-validation.

For accuracy, neither the time window nor the bin number satisfied Mauchly's test of sphericity (time window: p = 0.000 < 0.05; bin number: p = 0.001 < 0.05). Hence, a Greenhouse-Geisser correction was applied. The results show that the time window had a significant effect on the accuracy (F1.452, 10.161 = 13.262, p = 0.002 < 0.05). However, the effect of the bin number on the accuracy was not significant (F2.149, 15.046 = 0.206, p = 0.958 > 0.05). A post hoc test indicated that the differences between any two time window sizes were significant, except for the difference between the 6-second and 8-second time windows. The accuracy increases with the time window.

For sensitivity, neither the time window nor the bin number satisfied Mauchly's test of sphericity (time window: p = 0.000 < 0.05; bin number: p = 0.03 < 0.05). Hence, a Greenhouse-Geisser correction was applied. The results show that neither the time window nor the bin number had a significant effect on the sensitivity (bin number: F2.371, 16.597 = 0.589, p = 0.708 > 0.05; time window: F1.222, 8.56 = 0.799, p = 0.519 > 0.05).

For specificity, both the time window and the bin number satisfied Mauchly's test of sphericity (time window: p = 0.126 > 0.05; bin number: p = 0.117 > 0.05). The results show that the time window had a significant effect on the specificity (F3, 21 = 22.9, p = 0.002 < 0.05). A post hoc test indicated that the differences between any two time window sizes were significant. However, the effect of the bin number on the specificity was not significant (F3, 21 = 0.507, p = 0.769 > 0.05).

It can be deduced that the time window has no significant effect on detecting the "alert" state, while it has significant effects on detecting the "fatigue" state. In other words, using longer periods to summarize the data makes the fatigue state easier to detect: with an increase in the time window, the human fatigue detection model can detect the "fatigue" state with higher accuracy and specificity. Fig. 5 shows the performance of the human fatigue detection models across different time windows. However, the main effects of the bin number on model performance are not significant; hence, the performance of a human fatigue detection model is not sensitive to the bin number. Fig. 6 shows

Fig. 6. Model performance versus bin number.

the line graph of the model performance across different bin numbers. The best performance (accuracy = 0.90; sensitivity = 0.81; specificity = 0.90) was achieved when the time window was set to 10 s and the bin number was set to 8. Fig. 7 shows the cloud graph of human fatigue detection accuracy with different bin numbers and time windows. As shown in Fig. 6, a longer time window can yield better performance, whereas a larger bin number shows no significant improvement. The interaction effect of the bin number and the time window was not significant (accuracy: F15, 105 = 1.097, p = 0.368; sensitivity: F15, 105 = 1.178, p = 0.301; specificity: F15, 105 = 0.989, p = 0.472). In the following sections, the bin number is fixed at 8.

5.3. Comparison with other common fatigue indicators

This section compares the performance of four types of feature

Fig. 7. Effects of bin number and time window on detection accuracy.


Table 2
Feature combinations used as model input (G = gaze-bin analysis, F = fixation data, S = saccade data, C = combination data).

Eye movement parameters               G   F   S   C
Mean fixation duration [33,15,52]         ✓       ✓
Mean fixation velocity [53]               ✓       ✓
Mean fixation stability [53]              ✓       ✓
Fixation count [34]                       ✓       ✓
Saccade count [28,33]                         ✓   ✓
Mean saccade peak velocity [22]               ✓   ✓
Mean saccade velocity [54]                    ✓   ✓
Mean saccade amplitude [52]                   ✓   ✓
Mean saccade duration [33]                    ✓   ✓
Gaze velocity probability             ✓
Entropy                               ✓
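The two gaze-bin parameters listed in Table 2, the gaze velocity probability of each bin and its entropy, amount to a per-window histogram computation. A minimal numpy sketch follows; the function name, bin count, and synthetic window are illustrative assumptions, not the paper's code:

```python
import numpy as np

def gaze_bin_features(gaze_velocity, n_bins=8):
    """Summarize one time window of gaze velocities as bin
    probabilities plus the Shannon entropy of that histogram."""
    counts, _ = np.histogram(gaze_velocity, bins=n_bins)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return np.append(probs, entropy)  # n_bins + 1 features per window

# Example: one window of synthetic gaze velocities (deg/s).
window = np.abs(np.random.default_rng(1).normal(20, 10, size=300))
features = gaze_bin_features(window, n_bins=8)
```

Each window thus yields n_bins + 1 features regardless of how many samples the eye tracker delivers, which is what allows a low-sampling-rate tracker to be used.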

Table 2 shows the parameters of each feature combination. Specifically, the gaze-bin analysis comprises two types of parameters: the velocity probability of each bin and the entropy. For fixations and saccades, many statistical parameters can be generated from the raw eye movement data; thus, several frequently used fixation and saccade parameters were selected based on a review of the extant literature.

For accuracy, both the feature combination and the time window satisfied Mauchly's test of sphericity (feature combination: p = 0.455; time window: p = 0.600). The main effects of the feature combination (F(3, 21) = 5.806, p = 0.005) and the time window (F(3, 21) = 52.405, p < 0.001) on accuracy were significant. A post hoc test indicated that the gaze-bin analysis achieved significantly higher accuracy than the fixation data and the saccade data, whereas the differences among the fixation, saccade, and combination data were not significant.

For sensitivity, the feature combination did not satisfy Mauchly's test of sphericity (p = 0.002), so a Greenhouse-Geisser correction was applied; its main effect on sensitivity was not significant (F(1.346, 9.422) = 2.662, p = 0.131). The time window satisfied Mauchly's test (p = 0.094) and had a significant effect on sensitivity (F(3, 21) = 4.433, p = 0.015).

For specificity, both the feature combination and the time window satisfied Mauchly's test of sphericity (feature combination: p = 0.362; time window: p = 0.863). The main effects of the feature combination (F(3, 21) = 3.677, p = 0.028) and the time window (F(3, 21) = 17.059, p < 0.001) on specificity were significant. A post hoc test suggested that the gaze-bin analysis achieved significantly higher specificity than the fixation data, whereas the differences among the fixation, saccade, and combination data were not significant.

Table 3 shows the performance of the fatigue models with the four kinds of inputs.

Table 3
Model performance (time window = 4, 6, 8, 10 s; bin number = 8).

Performance    Gaze-bin analysis   Fixation data   Saccade data   Combination data
Accuracy       0.84                0.77            0.79           0.81
Sensitivity    0.77                0.71            0.67           0.75
Specificity    0.85                0.75            0.78           0.79

The multiple comparisons show that the gaze-bin analysis achieves higher accuracy in human fatigue detection than the other inputs, with no significant differences in performance among the fixation, saccade, and combination data. The results suggest that fixation data can contribute to human fatigue detection, but that combining fixation data with saccade data does not greatly improve detection accuracy. This finding can be explained in two ways. First, the combination of fixation and saccade data introduces too many input features, which impairs the performance of the bagged tree. Second, fixations and saccades are likely correlated, so the combined features carry redundant information. There was no interaction effect of the time window and the feature combination (accuracy: F(9, 63) = 1.782, p = 0.89; sensitivity: F(9, 63) = 0.820, p = 0.600; specificity: F(9, 63) = 0.665, p = 0.737). For all types of inputs, model performance increased with the time window, as shown in Fig. 8.

5.4. Comparison with other methods

The proposed approach was compared with three classical methods: linear regression, decision tree, and support vector machine. For all three performance measures, neither the methods factor nor the time window satisfied Mauchly's test of sphericity, so a Greenhouse-Geisser correction was applied to all tests. Table 4 shows the accuracy of the different methods using the gaze-bin analysis as inputs. The detection method had a significant effect on accuracy (F(1.210, 8.473) = 19.004, p = 0.002); multiple comparisons show that the bagged trees outperformed the other three methods in terms of accuracy. No significant effects of the method on sensitivity (F(1.023, 7.160) = 3.369, p = 0.108) or specificity (F(1.058, 7.403) = 0.471, p = 0.705) were found; hence, the bagged trees did not significantly improve detection of the "fatigue" state or the "alert" state alone. The time window affected specificity (F(1.39, 9.733) = 11.3, p = 0.005) but not accuracy (F(1.139, 7.795) = 1.569, p = 0.251) or sensitivity (F(1.118, 7.286) = 1.941, p = 0.154). The model's specificity increased with the time window, suggesting that longer periods of eye-movement data improve the model's ability to detect a fatigued state.
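The method comparison above can be sketched with scikit-learn, scoring each classifier by mean 10-fold cross-validation accuracy as in Table 4. This is an illustrative analogue of the MATLAB setup, not the authors' code: the feature matrix and labels are placeholders, logistic regression stands in for the paper's "linear regression" classifier (an assumption), and the bagged ensemble uses 100 trees instead of the paper's 1500 to keep the sketch fast:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Placeholder gaze-bin features: 8 bin probabilities + entropy per window.
X = rng.normal(size=(200, 9))
y = rng.integers(0, 2, size=200)  # 0 = "alert", 1 = "fatigue"

# BaggingClassifier's default base learner is a decision tree with
# min_samples_leaf=1, matching the paper's minimum leaf size of 1.
methods = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "linear model": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "bagged trees": BaggingClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 10-fold cross-validation for each method.
results = {name: cross_val_score(clf, X, y, cv=10).mean()
           for name, clf in methods.items()}
```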
The interaction effects of the time window and the methods on accuracy (F(9, 63) = 2.329, p = 0.025) and sensitivity (F(9, 63) = 2.691, p = 0.010) were significant, whereas the interaction effect on specificity was not (F(9, 63) = 0.755, p = 0.64). Compared with the results of Section 5.2, the time-window effects on the bagged trees are stronger than those on the other three methods, which is why no main effect of the time window on accuracy was found here (see Fig. 9).

5.5. Discussion


We ran Algorithm 1 and the bagged trees on a ThinkPad X1 Carbon with an 8th-generation Intel Core processor; the processing time for both was under 1 s, so the proposed method can provide timely fatigue detection. The results in Sections 5.2–5.4 confirmed that the proposed method provides relatively better performance than existing methods in this case study, and the result is expected to extend to TCC operations. Specifically, we collected


eye-tracking data from the Mackworth clock test, which is a monitoring task; monitoring is the most frequent operation of TCOs. Moreover, in this case study, we controlled the task and interface-design factors to guarantee that changes in the eye-tracking patterns were caused by human fatigue. Furthermore, the eye-tracking patterns related to gaze velocity are relatively task-independent. As mentioned in Section 1, fixations and saccades are highly related to tasks and interface design [55], whereas gaze velocity is relatively task-independent and involuntary. For example, fixations include microsaccades, which are produced involuntarily and are task-independent; in general, microsaccades are identified by analyzing gaze velocity. Commonly used features such as fixation durations and fixation counts carry no information about these involuntary eye-tracking patterns, whereas the proposed indicator, gaze-bin analysis, captures it by focusing on gaze velocity. Considering these three aspects, the gaze-bin analysis is expected to be a task-independent fatigue indicator that can be applied in TCC operations.

There are also some limitations in this case study. First, a Tobii X3-120 eye tracker, which cannot record blinks, was used to collect the eye-tracking data; thus, we did not compare the gaze-bin analysis with blink-related parameters. Second, the eye-tracking data were collected during the Mackworth clock test, and the eye movements in real traffic control operations would differ from those in the test. Hence, the usability of the proposed method in TCCs needs more field studies.

Fig. 8. Effects of feature combination and time window on the performance of human fatigue detection.

Table 4
Model performance (bin number = 12; time window = 4, 6, 8, 10 s).

Performance    Decision tree   Linear regression   Support vector machine   Bagged trees
Accuracy       0.76            0.71                0.70                     0.84
Sensitivity    0.65            0.63                0.68                     0.75
Specificity    0.82            0.81                0.81                     0.85

Fig. 9. Effects of methods and time window on the performance of human fatigue detection.

6. Conclusion

The performance of TCOs is usually impaired by mental fatigue,


resulting in considerable threats to public safety. To minimize these risks, an innovative approach to noninvasively detecting the mental fatigue of TCOs using gaze-bin analysis was proposed, and a case study was conducted to test its effectiveness and performance.

This research makes several contributions. First, it offers an alternative way to detect mental fatigue in a timely manner: instead of using brain dynamics, this work explored the possibility of using short-period eye movement data, which unlocks the critical bottleneck of time delay in detecting human fatigue. Second, we proposed gaze-bin analysis as a complementary human fatigue indicator. Gaze-bin analysis enables the use of an eye tracker with a low sampling rate, which can significantly reduce the cost and improve the usability of an eye movement-based fatigue detector. Third, this study developed an innovative method that uses gaze-bin analysis as the input of semisupervised bagged trees; this method outperformed commonly used machine learning methods such as the decision tree and the support vector machine. Lastly, the effects of the time window and the bin number on model performance were tested, providing reference and guidance for future studies.

One limitation of this study is that the proposed method cannot clearly distinguish medium-level fatigue from alertness; it can only warn fatigued users, and it cannot be applied in fields that require operators to remain fully alert. Moreover, to guarantee model performance, training data are required for each user. Future studies should therefore develop a general model that can detect more levels of human fatigue. Finally, because the case study was conducted with the Mackworth clock test, the usability of the proposed method needs further study in real traffic control operations.

Declaration of Competing Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Proactive Mental Fatigue Detection of Traffic Control Operators using Bagged Trees and Gaze-bin Analysis".

Acknowledgement

This research was supported by the Singapore Maritime Institute Research Project (SMI-2014-MA-06), the National Natural Science Foundation of China (71804034), and the Research Fund of the Shenzhen Science and Technology Innovation Committee (JCYJ20180306171958907). The authors would like to thank all the VTS operators and students who participated in this study.

References

[1] G. Praetorius, Safety within the Vessel Traffic Service (VTS) Domain: Understanding the Role of the VTS for Safety within Maritime Traffic Management, 2012.
[2] M. Härmä, et al., The effect of an irregular shift system on sleepiness at work in train drivers and railway traffic controllers, J. Sleep Res. 11 (2) (2002) 141–151.
[3] X. Hou, et al., EEG-based human factors evaluation of conflict resolution aid and tactile user interface in future Air Traffic Control systems, Advances in Human Aspects of Transportation, Springer, 2017, pp. 885–897.
[4] F. Li, et al., A user requirement-driven approach incorporating TRIZ and QFD for designing a smart vessel alarm system to reduce alarm fatigue, J. Navigation (2019) 1–21.
[5] U. Metzger, R. Parasuraman, The role of the air traffic controller in future air traffic management: An empirical study of active control versus passive monitoring, Hum. Factors 43 (4) (2001) 519–528.
[6] G. Xu, et al., Toward resilient vessel traffic service: A sociotechnical perspective, Transdisciplinary Eng.: A Paradigm Shift (2015) 829.
[7] G. Praetorius, E. Hollnagel, J. Dahlman, Modelling vessel traffic service to understand resilience in everyday operations, Reliab. Eng. Syst. Saf. 141 (2015) 10–21.
[8] S.E. Lerman, et al., Fatigue risk management in the workplace, J. Occup. Environ. Med. 54 (2) (2012) 231–258.
[9] S.J. Ray, J. Teizer, Real-time construction worker posture analysis for ergonomics training, Adv. Eng. Inf. 26 (2) (2012) 439–455.
[10] F. Li, et al., Hybrid data-driven vigilance model in traffic control center using eye-tracking data and context data, Adv. Eng. Inf. 42 (2019) 100940.
[11] B. Roets, J. Christiaens, Shift work, fatigue, and human error: An empirical analysis of railway traffic control, J. Transportation Saf. Security (2017) 1–18.
[12] P. Weill-Tessier, H. Gellersen, Correlation between gaze and hovers during decision-making interaction, Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, ACM, 2018.
[13] I.P. Bodala, et al., Measuring vigilance decrement using computer vision assisted eye tracking in dynamic naturalistic environments, 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2017.
[14] I.P. Bodala, et al., EEG and eye tracking demonstrate vigilance enhancement with challenge integration, Front. Hum. Neurosci. 10 (2016) 273.
[15] D. Cazzoli, et al., Eye movements discriminate fatigue due to chronotypical factors and time spent on task – a double dissociation, PLoS ONE 9 (1) (2014) e87146.
[16] L.L. Di Stasi, et al., Saccadic eye movement metrics reflect surgical residents' fatigue, Ann. Surg. 259 (4) (2014) 824–829.
[17] Q. Ji, X. Yang, Real-time eye, gaze, and face pose tracking for monitoring driver vigilance, Real-Time Imaging 8 (5) (2002) 357–377.
[18] R.A. McKinley, et al., Evaluation of eye metrics as a detector of fatigue, Hum. Factors 53 (4) (2011) 403–414.
[19] Y. Yamada, M. Kobayashi, Detecting mental fatigue from eye-tracking data gathered while watching video, Conference on Artificial Intelligence in Medicine in Europe, Springer, 2017.
[20] M. Nyström, K. Holmqvist, An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data, Behav. Res. Methods 42 (1) (2010) 188–204.
[21] R. Schleicher, et al., Blinks and saccades as indicators of fatigue in sleepiness warnings: looking tired? Ergonomics 51 (7) (2008) 982–1010.
[22] C. Finke, et al., Dynamics of saccade parameters in multiple sclerosis patients with fatigue, J. Neurol. 259 (12) (2012) 2656–2663.
[23] K. Hirvonen, et al., Improving the saccade peak velocity measurement for detecting fatigue, J. Neurosci. Methods 187 (2) (2010) 199–206.
[24] S. Saito, Does fatigue exist in a quantitative measurement of eye movements? Ergonomics 35 (5–6) (1992) 607–615.
[25] P. Qvarfordt, M. Lee, Gaze patterns during remote presentations while listening and speaking, Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, ACM, 2018.
[26] S. Vidyapu, V.V. Saradhi, S. Bhattacharya, Fixation-indices based correlation between text and image visual features of webpages, Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, ACM, 2018.
[27] L. Hartley, et al., Review of fatigue detection and prediction technologies, National Road Transport Commission, 2000.
[28] Y. Liang, M.L. Reyes, J.D. Lee, Real-time detection of driver cognitive distraction using support vector machines, IEEE Trans. Intell. Transp. Syst. 8 (2) (2007) 340–350.
[29] K. Pearson, Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material, Philos. Trans. Royal Soc. London 186 (Part I) (1895) 343–424.
[30] L. Breiman, Bagging predictors, Mach. Learning 24 (2) (1996) 123–140.
[31] T. Morris, J.C. Miller, Electrooculographic and performance indices of fatigue during simulated flight, Biol. Psychol. 42 (3) (1996) 343–360.
[32] U. Ahlstrom, F.J. Friedman-Berg, Using eye movement activity as a correlate of cognitive workload, Int. J. Ind. Ergon. 36 (7) (2006) 623–636.
[33] L.-H. Du, et al., Detecting driving fatigue with multimodal deep learning, 2017 8th International IEEE/EMBS Conference on Neural Engineering (NER), IEEE, 2017.
[34] L. Jin, et al., Driver sleepiness detection system based on eye movements variables, Adv. Mech. Eng. 5 (2013) 648431.
[35] L.L. Di Stasi, et al., Towards a driver fatigue test based on the saccadic main sequence: A partial validation by subjective report data, Transportation Res. Part C: Emerging Technol. 21 (1) (2012) 122–133.
[36] V. Renata, et al., Investigation on the correlation between eye movement and reaction time under mental fatigue influence, 2018 International Conference on Cyberworlds (CW), IEEE, 2018.
[37] X. Zhu, et al., EOG-based drowsiness detection using convolutional neural networks, IJCNN, 2014.
[38] S.R. Sain, The Nature of Statistical Learning Theory, Taylor & Francis Group, 1996.
[39] D.G. Saunders, et al., Two-dimensional data binning for the analysis of genome architecture in filamentous plant pathogens and other eukaryotes, Plant-Pathogen Interactions, Springer, 2014, pp. 29–51.
[40] R.A. Davis, et al., Adaptive binning: An improved binning method for metabolomics data using the undecimated wavelet transform, Chemom. Intelligent Lab. Syst. 85 (1) (2007) 144–154.
[41] M. Lavielle, K. Bleakley, Automatic data binning for improved visual diagnosis of pharmacometric models, J. Pharmacokinet Pharmacodyn. 38 (6) (2011) 861–871.
[42] C.E. Shannon, W. Weaver, A.W. Burks, The Mathematical Theory of Communication, 1951.
[43] C.-Y. Zhang, et al., Data-driven train operation models based on data mining and driving experience for the diesel-electric locomotive, Adv. Eng. Inf. 30 (3) (2016) 553–563.
[44] H. Son, et al., Classification of major construction materials in construction environments using ensemble classifiers, Adv. Eng. Inf. 28 (1) (2014) 1–10.
[45] J.R. Quinlan, Bagging, boosting, and C4.5, AAAI/IAAI, vol. 1, 1996.
[46] G. De'Ath, Boosted trees for ecological modeling and prediction, Ecology 88 (1) (2007) 243–251.
[47] T.G. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, Mach. Learning 40 (2) (2000) 139–157.
[48] J.R. Quinlan, C4.5: Programs for Machine Learning, Elsevier, 2014.
[49] M. Mihelčić, et al., A framework for redescription set construction, Expert Syst. Appl. 68 (2017) 196–215.
[50] J.L. Gastwirth, The estimation of the Lorenz curve and Gini index, Rev. Econ. Stat. (1972) 306–316.
[51] L. Breiman, Bagging predictors, Mach. Learning 24 (2) (1996) 123–140.
[52] N.M. Moacdieh, N. Sarter, Using eye tracking to detect the effects of clutter on visual search in real time, IEEE Trans. Hum.-Mach. Syst. 47 (6) (2017) 896–902.
[53] E. Abdulin, O. Komogortsev, User eye fatigue detection via eye movement behavior, Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, ACM, 2015.
[54] Y. Morad, et al., Ocular parameters as an objective tool for the assessment of truck drivers fatigue, Accid. Anal. Prev. 41 (4) (2009) 856–860.
[55] S. D'Angelo, D. Gergle, An eye for design: gaze visualizations for remote collaborative work, Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, ACM, 2018.