GSR-based distracted driving identification using discrete & continuous decomposition and wavelet packet transform


Smart Health 14 (2019) 100085


Omid Dehzangi a, Vaishali Sahu b, Vikas Rajendra b, Mojtaba Taherisadr b,*

a Rockefeller Neuroscience Institute, West Virginia University, Morgantown, 26505, USA
b Department of Electrical and Computer Engineering, University of Michigan - Dearborn, Dearborn, 48126, USA

Keywords: Galvanic skin response; Distracted driving; Physiological identification; Continuous/discrete decomposition; Wavelet packet transform; Feature selection

Abstract

Distracted driving is considered one of the main factors contributing to fatalities on the road. Detecting distraction early while driving and alerting the driver can therefore help reduce accidents. While previous approaches such as camera-based systems show acceptable detection results, they suffer from distraction detection latency and the risk of privacy violation. Physiological signals, on the other hand, allow earlier and more privacy-preserving detection of distraction; however, technologies that achieve high accuracy with such signals are often expensive and highly intrusive to drivers. In this study, we propose using a minimally intrusive wearable physiological sensor (available on smartwatches) that quantifies skin conductance (SC), also known as galvanic skin response (GSR), to characterize and identify distraction during naturalistic driving. Fifteen driver subjects participated in our experiments, driving normally and then driving while engaged in a secondary task, and their GSR data were recorded and labeled accordingly. We employ two deconvolution techniques, i) continuous decomposition analysis (CDA) and ii) discrete decomposition analysis (DDA), on the raw SC signal to decompose it into its phasic and tonic components. We train various classifiers on the resulting feature matrices in order to separate data associated with normal driving from the distracted scenarios. Our results show that an average 10-fold cross-validation accuracy of 92.2% is achieved using the ensemble bagged classifier on the extracted 74-dimensional feature space. The high dimensionality of the original feature space incurs a high computational cost and might also introduce the curse of dimensionality in the generalization phase. Therefore, we evaluate various feature selection methods to eliminate irrelevant dimensions in the feature space and create a subspace relevant to the task of distraction identification. Improved accuracies of 92.9% and 93.5% are achieved by employing only 10-D and 15-D feature spaces, respectively, obtained with embedded random forest feature selection and classified with the ensemble bagged classifier.

1. Introduction

Distracted driving is one of the major factors leading to fatalities on the road. Based on the National Highway Traffic Safety Administration (NHTSA) reports, the number of lives lost to fatal road crashes is consistently increasing (Administration et al., 2013). Road accidents leading to fatality are predicted to be the ninth leading cause of death globally and are estimated to become the seventh leading cause of death by 2030 (Organization, 2015). Distraction while driving is mainly caused by performing secondary tasks such as

* Corresponding author. E-mail address: [email protected] (M. Taherisadr). https://doi.org/10.1016/j.smhl.2019.100085


having an engaging conversation with passengers, eating, drinking, or using one of the several in-vehicle information systems (IVISs) such as cell phones, navigation systems, and the radio. Helping drivers use IVISs without compromising safety is an important challenge (Wang, Jiang, Xia, & Cao, 2010). Distraction can be categorized into two main groups: visual distraction and cognitive distraction. Cognitive distraction corresponds to the mind being off the primary task of driving, whereas visual distraction occurs when the driver has his/her eyes off the road (Victor, 2005). Among the various secondary tasks that divert the driver's attention from driving, higher distraction has been observed to be induced by phone conversations and by using the phone to send text messages. Much importance is given to these scenarios since they cause cognitive and visual distraction, respectively, and divert the driver's attention from the driving task to a great extent (Liang, Reyes, & Lee, 2007). Developing a warning system that detects and mitigates the driver's inattention state in real time when distraction occurs is one way to address this problem (Donmez, Boyle, & Lee, 2003). There has been a surge of research efforts to identify distraction while driving among the research community, industry, and government agencies (Administration et al., 2013; Box, 2009). Several investigations were performed based on tracking the driver's eye gaze (Liu, Yang, Huang, & Lin, 2015), eyelid movement (Metz, Schömig, & Krüger, 2011), lane tracking, and video cameras that capture the driver's behavior to identify the driver's inattention state. Although these techniques achieved good performance, they suffer from issues such as privacy concerns and delayed identification and response, since distraction is detected only after it is visually noticeable. To overcome these limitations, there is a need for early, reliable, and privacy-preserving distraction detection systems, which can be achieved via continuous monitoring of physiological signals such as the electroencephalogram (EEG) and electrocardiogram (ECG). While EEG has demonstrated state-of-the-art results that are comprehensive and reliable (Dehzangi & Taherisadr, 2019; Kim, Jeong, Jung, Park, & Jung, 2013; Taherisadr & Dehzangi, 2019), the complexity of setup and data analysis is a major limitation, making such systems expensive and intrusive to implement (Wang, Zhang, Wu, Darvas, & Chaovalitwongse, 2015). Although ECG is comparatively easier to record than EEG (Taherisadr, Asnani, Galster, & Dehzangi, 2018), it faces similar issues of intrusive implementation and cannot be embedded in current consumer electronics such as smartwatches. Skin conductance (SC), also known as galvanic skin response (GSR), is the electrical conductance of the skin; it is a low-cost and robust physiological signal to measure and a minimally intrusive modality that can easily be recorded on the wrist and fingers (Ciabattoni et al., 2017; Dehzangi, Rajendra, & Taherisadr, 2018a; Dehzangi & Taherisadr, 2018; Rajendra & Dehzangi, 2017). SC is an excellent physiological indicator of cognitive and emotional state. Measuring skin conductivity can reveal changes in the sympathetic nervous system, which drives human behavior and the cognitive and emotional state at a subconscious level (Nourbakhsh, Wang, & Chen, 2013). The authors in (Ayata, Yaslan, & Kamaşak, 2016) used time-domain and empirical-mode-decomposition-based features extracted from galvanic skin responses (GSR) to perform emotion recognition.
They categorized valence and arousal and used several machine learning algorithms, such as random forest and decision tree, to study the relationship between the physiological signals and arousal and valence. In previous work (Liu, Fan, Zhang, & Gong, 2016), the authors proposed a human emotion recognition technique that selects GSR features automatically. A wavelet function was employed to de-noise the data, and the data were normalized to eliminate individual differences. They also employed covariance-based feature selection on the generated 30-dimensional feature space to improve recognition performance. A support vector machine (SVM) was used to classify human emotion and demonstrated an accuracy of 66.67%. The above investigations provide evident support for using the skin conductance signal to identify human emotion. However, very few investigations have evaluated cognitive workload or distraction during naturalistic driving experiments. The work in (Min et al., 2013) did not include a quick and reliable distraction detection technique, and the experiment was conducted using a driving simulator, which is another major limitation of that study. The authors in (Chen, Zhao, Ye, Zhang, & Zou, 2017) used several physiological signals, including the electrocardiogram (ECG), galvanic skin response (GSR), and respiration, to identify distraction while driving on the road. Spectral, time-domain, and wavelet multi-domain features were extracted from 10 s intervals of data. Optimal feature sets were selected by combining sparse Bayesian learning (SBL) and principal component analysis (PCA), and kernel-based classifiers were employed to identify stress while driving. The experimental results of that study suggest that physiological measures can be employed in in-vehicle intelligent systems to identify stress early and alert drivers on the road. In that study, the raw GSR signal was used in combination with other physiological measures. The authors in (Dehzangi, Sahu, Taherisadr, & Galster, 2018b) studied a multi-modal system to detect driver distraction. They collected motion signals (accelerometer and gyroscope), electrocardiogram (ECG), galvanic skin response, and CAN-Bus signals from 10 subjects and extracted a wide variety of features from the collected signals. Then, to improve the recognition accuracy, the multi-modal feature space was fused and evaluated; that study obtained an average accuracy of 99.1%. In our previous work (Dehzangi et al., 2018a; Rajendra & Dehzangi, 2017), GSR recorded during naturalistic driving was used to detect driver distraction. Two distracted states (i. using the phone for calling and ii. using the phone for sending and receiving text messages) were identified against the non-distracted state of driving. We employed continuous decomposition analysis on the raw GSR signal and extracted relevant temporal and spectral features. To eliminate redundancy and decrease the time complexity caused by the high-dimensional feature space, we employed feature selection using the support vector machine recursive feature elimination (SVM-RFE) method to select an optimized subset of features. The selected feature space demonstrated high identification accuracy. In this study, we introduce a skin conductance (SC)-based driver monitoring and intervention system to detect the effect of the secondary tasks of calling and texting during a set of naturalistic driving experiments. A wristband wearable device was used to record SC data from 15 subjects.
We then perform deconvolution of the original raw skin conductance signal into its phasic component using two techniques: i) continuous decomposition analysis (CDA) (Benedek & Kaernbach, 2010) and ii) discrete decomposition analysis (DDA) (Alexander et al., 2005). Skin conductance is essentially made up of two components: i) the skin conductance level, SCL (tonic component), and ii) the skin conductance responses, SCR (phasic component). The SCR component contains the most useful information for characterizing the signal. Segmentation into 4 s windows with 3 s overlap was applied to the phasic component of the SC signal. We then extracted several temporal measures from the decomposed phasic component (of both DDA and CDA), and spectral measures were extracted after applying wavelet packet decomposition to every window of the phasic data from CDA.


Fig. 1. Wireless galvanic skin response (GSR) wearable device on the right-hand wrist.

Fig. 2. Proposed driver monitoring and intervention system on the edge to detect the effect of secondary task of calling and texting during naturalistic driving experiment. Galvanic skin response (GSR); Continuous decomposition analysis (CDA); Discrete decomposition analysis (DDA); Feature selection using neighborhood component analysis (FSNCA).

Overall, three feature spaces were created: i) DDA measures, ii) CDA temporal measures, and iii) CDA spectral measures (using wavelet coefficients). We first investigated each feature space individually, then augmented the feature spaces and employed several state-of-the-art identification algorithms with 10-fold cross-validation to generate estimates of the prediction models' generalization accuracies. To cope with the curse of dimensionality caused by the high-dimensional augmented feature space, we employ several feature selection techniques.

2. Data acquisition

We employed our custom-designed wireless wearable data acquisition platform, which records multi-modal physiological signals, including skin conductance (SC), during driving on the road. The experiment was approved by the Institutional Review Board of the University of Michigan. A Ford Escape 2015 was provided to the participants, and their skin conductance signal was recorded at a sampling frequency of 50 Hz throughout the experiments. Fig. 1 shows the wireless galvanic skin response (GSR) wearable device used for data acquisition.


Fig. 3. Continuous decomposition analysis for the non-distracted state.

Fifteen healthy subjects in the age group of 20–40 years participated in this study. Only healthy male subjects were considered for the experiment to eliminate inconsistency. The data were recorded during three driving scenarios: i) normal driving (non-distracted), ii) driving while having an engaging phone conversation (cognitive distraction), and iii) driving while sending and receiving text messages (cognitive and visual distraction).

3. Methodology

The proposed methodology is shown in Fig. 2. In our investigation, we first applied two deconvolution techniques, continuous decomposition analysis (CDA) and discrete decomposition analysis (DDA), to the raw skin conductance signal $X_i$ to extract the phasic component $P_i$, where the index $i$ represents the subject id. Segmentation into 4 s windows with 3 s overlap was applied to the phasic component extracted by the CDA and DDA techniques. This is given by $\{P_{ij} = \rho_{ij} \mid j = 1, 2, \ldots, n\}$, where $n$ is the number of windows, and every window is represented by a feature vector $F = \{f^{1}_{ij}, f^{2}_{ij}, f^{3}_{ij}, \ldots, f^{n}_{ij}\}$. From each segment, temporal measures were extracted from the output of both the CDA and DDA techniques. We then employed wavelet packet transform (WPT) decomposition on every segment (of the CDA output only), as mentioned above. The WPT decomposition generated 32 wavelet sub-bands at level 5, and power and variance measures were extracted from every frequency sub-band. The extracted measures were augmented to form a 74-D space. Several identification algorithms were used to measure the accuracy on the original 74-D space; then, various feature selection techniques were employed to cope with the computational cost and possible curse of dimensionality and to improve the identification accuracy. In this section, we discuss the implementation of each step in detail.
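As a concrete illustration of the windowing step, the short Python sketch below slices a phasic signal into 4 s windows with 3 s overlap at the 50 Hz sampling rate reported in Section 2. The function and variable names are ours, and the snippet is only a minimal sketch of the segmentation described above, not the authors' implementation.

```python
import numpy as np

FS = 50                     # GSR sampling rate (Hz), as recorded in Section 2
WIN_S, OVERLAP_S = 4, 3     # 4 s windows with 3 s overlap, i.e. a 1 s hop

def segment(phasic, fs=FS, win_s=WIN_S, overlap_s=OVERLAP_S):
    """Slice the phasic component P_i into overlapping windows rho_ij."""
    win = win_s * fs
    hop = (win_s - overlap_s) * fs
    starts = range(0, len(phasic) - win + 1, hop)
    # shape: (number of windows n, samples per window)
    return np.stack([phasic[s:s + win] for s in starts])

# Example: a 60 s recording at 50 Hz yields (3000 - 200) // 50 + 1 = 57 windows of 200 samples.
# windows = segment(phasic_component)
```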

Fig. 4. Deconvolution for the distracted state.


Table 1. List of all extracted measures with description and formula.

DDA measures:
- Number of peaks: identifies the number of peaks in a window using the second-order derivative; $Q_2(\rho_{ij}) = 1.3\,Q_0(\rho_{ij}) + 1.1\,Q_1(\rho_{ij})$.
- Sum of amplitudes of peaks: summation of all the identified peaks' amplitudes; $\sum Q_2(\rho_{ij})$.

CDA time-domain measures:
- Mean: mean of a window; $\frac{1}{n}\sum_j \rho_{ij}$.
- Entropy: temporal distribution of signal energy in a given window; $H(\rho_{ij}) = -\sum_k P(\rho_{ij} = a_k)\,\log P(\rho_{ij} = a_k)$.
- Accumulated galvanic skin response: sum of phasic galvanic skin response values in a window over the total task time; $\frac{\sum \rho_{ij}}{T}$.
- Maximum value: maximum galvanic skin response value in a window; $\max(\rho_{ij})$.
- Katz feature: fractal dimension of the time-series signal in a window using the Katz algorithm; $D = \frac{\log(L/a)}{\log(d/a)} = \frac{\log(n)}{\log(n) + \log(d/L)}$.
- 3 auto-regressive coefficients: for an AR model of order $p$, the output is a linear combination of the past $p$ outputs plus white noise (we selected order 3); $y(n) = -\sum_{k=1}^{p} a(k)\,y(n-k) + x(n)$.

CDA spectral measures (per WPT sub-band):
- Power: sum of the absolute squares of the time-domain samples of a signal divided by the signal length; $P_{\rho_{ij}} = \frac{1}{T}\sum_{t=1}^{T} \rho_{ij}^2(t)$.
- Variance: expectation of the squared deviation of a variable from its mean; $\sigma^2 = \frac{1}{n}\sum_{j=1}^{n} (\rho_{ij} - \mu)^2$.
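The sketch below computes most of the Table 1 measures for a single 4 s window, with the level-5 WPT power and variance obtained via PyWavelets. It is a hedged illustration: the wavelet family (db2), the generic peak detector, the histogram-based entropy, and the omission of the Katz and AR(3) measures are our simplifying assumptions, not choices stated in the paper.

```python
import numpy as np
import pywt
from scipy.signal import find_peaks

def window_features(rho, task_time_s, wavelet="db2", level=5):
    """Extract Table 1 style measures from one window rho of the phasic signal."""
    feats = {}
    # DDA-style measures: peak count and summed peak amplitude
    # (a generic peak detector stands in for the second-order-derivative rule)
    peaks, _ = find_peaks(rho)
    feats["n_peaks"] = len(peaks)
    feats["sum_peak_amp"] = float(rho[peaks].sum()) if len(peaks) else 0.0
    # CDA time-domain measures
    feats["mean"] = float(rho.mean())
    hist, _ = np.histogram(rho, bins=16)
    p = hist / hist.sum()
    feats["entropy"] = float(-np.sum(p[p > 0] * np.log(p[p > 0])))
    feats["accumulated_gsr"] = float(rho.sum() / task_time_s)
    feats["max"] = float(rho.max())
    # Katz fractal dimension and AR(3) coefficients are omitted here for brevity.
    # CDA spectral measures: power and variance of each of the 32 level-5 WPT sub-bands
    wp = pywt.WaveletPacket(data=rho, wavelet=wavelet, mode="symmetric", maxlevel=level)
    for node in wp.get_level(level, order="freq"):
        c = np.asarray(node.data)
        feats[f"power_{node.path}"] = float(np.sum(c ** 2) / len(c))
        feats[f"var_{node.path}"] = float(np.var(c))
    return feats

# usage: feats = window_features(windows[0], task_time_s=600)   # hypothetical 10 min task
```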

3.1. Deconvolution analysis

We applied two deconvolution techniques, continuous decomposition analysis (CDA) and discrete decomposition analysis (DDA), to the raw skin conductance signal $X_i$ to decompose it into its phasic (SCR) component $P_i$ and tonic (SCL) component $T_i$. We consider the phasic component for our further analysis, as it carries the most discriminative information about the skin conductance data.

3.1.1. Continuous decomposition analysis

The skin conductance data are formed by the superposition of subsequent skin conductance responses (SCRs or GSRs). Because of this nature of the original signal, recovering the actual responses to event-related sympathetic activity becomes tedious. Applying a continuous decomposition technique that separates the raw signal into phasic and tonic components removes this limitation. Little useful information can be extracted from the tonic component, whereas the phasic component of the SC data is considered for further investigation, as it contains the actual event-related responses to sympathetic activity, predominantly in the form of distinct bursts of peaks over a zero baseline (Benedek & Kaernbach, 2010). The phasic component is extracted in three steps: deconvolution of the galvanic skin response (GSR) data, estimation of the tonic activity, and estimation of the phasic activity. A specific change in skin conductivity is caused by the secretion of sweat due to the activity of the sudomotor nerve. In mathematical terms, the sudomotor nerve activity can be treated as a driver, containing a sequence of distinct impulses/bursts, which triggers a particular impulse response (i.e., the SCRs). The outcome of this procedure can be described by the convolution of the driver with an impulse response function (IRF), which characterizes the shape of the impulse response over time (Benedek & Kaernbach, 2010):

$$P_i = \mathrm{Driver}_{P_i} * \mathrm{IRF} \tag{1}$$

Fig. 5. Wavelet packet decomposition of the SC signal, ranging from 0 to 25 Hz, into 5 levels of subspaces.


The phasic activity is accompanied by a gradually changing tonic activity, which gives:

$$X_i = T_i + P_i = T_i + \mathrm{Driver}_{P_i} * \mathrm{IRF} \tag{2}$$

The tonic activity also comprises a similar driver function convolved with the IRF. The skin conductance signal can then be written as:

$$X_i = (\mathrm{Driver}_{P_i} + \mathrm{Driver}_{T_i}) * \mathrm{IRF} \tag{3}$$

The deconvolution of the skin conductance data thus involves a phasic and a tonic fraction; by estimating one of them, the other can be determined easily:

$$X_i = \mathrm{Driver}_{X_i} * \mathrm{IRF} = (\mathrm{Driver}_{P_i} + \mathrm{Driver}_{T_i}) * \mathrm{IRF} \tag{4}$$
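To make the convolution model in Eqs. (1)-(4) concrete, the Python sketch below simulates a driver convolved with a biexponential IRF and recovers the driver with a regularized frequency-domain deconvolution. The time constants, function names, and synthetic example are illustrative assumptions; the actual analysis in this paper follows Benedek and Kaernbach's CDA, which also estimates the tonic component rather than assuming it is known.

```python
import numpy as np

FS = 50.0  # GSR sampling rate (Hz)

def bateman_irf(tau1=0.75, tau2=2.0, duration=10.0, fs=FS):
    """Biexponential impulse response exp(-t/tau2) - exp(-t/tau1), normalized to unit peak."""
    t = np.arange(0.0, duration, 1.0 / fs)
    h = np.exp(-t / tau2) - np.exp(-t / tau1)
    return h / h.max()

def convolve_driver(driver, irf):
    """Forward model of Eq. (3): SC = driver * IRF (discrete convolution)."""
    return np.convolve(driver, irf)[: len(driver)]

def estimate_driver(sc, irf, eps=1e-3):
    """Recover the driver of Eq. (4) by regularized deconvolution in the frequency domain."""
    n = len(sc)
    H = np.fft.rfft(irf, n)
    Y = np.fft.rfft(sc, n)
    return np.fft.irfft(Y * np.conj(H) / (np.abs(H) ** 2 + eps), n)

if __name__ == "__main__":
    # Synthetic example: three sudomotor bursts riding on a slowly drifting tonic level.
    t = np.arange(0.0, 60.0, 1.0 / FS)
    driver_true = np.zeros_like(t)
    driver_true[[500, 1400, 2200]] = [0.8, 1.2, 0.5]   # phasic impulses
    tonic = 2.0 + 0.01 * t                              # slowly varying SCL
    sc = tonic + convolve_driver(driver_true, bateman_irf())
    driver_est = estimate_driver(sc - tonic, bateman_irf())  # tonic assumed known here
    print("largest recovered impulses near samples:", np.sort(np.argsort(driver_est)[-3:]))
```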

Fig. 3 and Fig. 4 show the decomposition of the skin conductance signal for the two scenarios, normal and distracted. The topmost subplot in Fig. 3 shows the original skin conductance signal, the middle subplot shows the decomposed tonic driver, and the bottom subplot shows the phasic driver of the skin conductance signal. In Fig. 4, the topmost subplot shows the raw skin conductance signal, the middle subplot shows the signal after applying CDA, and the bottom subplot shows the SCR after applying DDA. It can be observed from Fig. 3 that, in the normal scenario, the phasic driver does not show impulse bursts (bursts of consecutive peaks), as the subject is not under heavy workload. In Fig. 4, a larger number of impulse bursts is observed, since the subject is undergoing both cognitive and visual stress. It is also observed that, after applying DDA, the peaks no longer overlap and appear as single peaks that represent distraction in a discrete fashion.

3.1.2. Discrete decomposition analysis

We can observe from the topmost subplot of Fig. 4 that the SCR peaks appear to overlap, which makes detecting distinct peaks tedious.

Fig. 6. T-Test analysis at sub-band level between normal and phone scenarios.

Fig. 7. T-Test analysis at sub-band level between normal and text scenarios.


Table 2. Feature selection model comparison.

Filter (example: ReliefF)
- Advantages: 1. Lower computational cost than wrappers. 2. Fast. 3. Provides good generalization.
- Disadvantage: No interaction with the learner.

Embedded (example: Random Forest)
- Advantages: 1. Interaction with the learner. 2. Lower computational cost than wrappers. 3. Captures correlation between features.
- Disadvantage: Learner-dependent selection.

Wrapper (example: FSCNCA)
- Advantages: 1. Interaction with the learner. 2. Captures correlation between features.
- Disadvantages: 1. Expensive. 2. Risk of overfitting. 3. Classifier-dependent selection.

We employ a technique called discrete decomposition analysis (DDA) (Alexander et al., 2005) to separate the individual SCR peaks. In this method, the skin conductance signal is converted into a time-series with a short time constant, and as a result, separated peaks are extracted from the estimated underlying driver signal (i.e., the SCR) (Alexander et al., 2005). The differential equation given below governs the skin conductance time-series:

$$P_i(t) = \gamma_0\gamma_1\,\frac{d^2 X_i(t)}{dt^2} + (\gamma_0 + \gamma_1)\,\frac{dX_i(t)}{dt} + X_i(t) \tag{5}$$

Here $\gamma_0$ and $\gamma_1$ are time constants, $X_i(t)$ is the skin conductance, and $P_i(t)$ is the driver function (SCR). At $t = 0$, an impulsive spike in the driver generates an SCR in the form of a biexponential function $e^{-t/\gamma_0} - e^{-t/\gamma_1}$. The convolution of the driver function $P_i(t)$ with this biexponential function defines the signal $X_i(t)$. Separated peaks are obtained by deconvolving the signal to obtain the driver. The bottom subplot of Fig. 4 shows the SC signal after applying DDA.

3.2. Segmentation

The data were segmented into 4 s windows with 3 s overlap. This was chosen to meet the short response-time requirement of a driver-alerting system on the edge.

3.3. Feature extraction

Features were extracted from each of the windows generated in the previous step. Several time-domain features were extracted from the decomposed phasic component of the skin conductance signal after applying both CDA and DDA. DDA was used to extract measures such as the number of peaks and the sum of the amplitudes of the peaks; these measures were chosen based on our observations from Fig. 4. CDA was used to extract several time-domain measures that represent the discriminative space better, such as the mean, entropy, accumulated galvanic skin response, maximum value, Katz feature, and auto-regressive coefficients. The output of CDA was also used to extract spectral measures using the wavelet packet transform, wherein we extract power and variance measures from each generated sub-band. The extracted features are listed in Table 1 along with their descriptions and formulas.

3.4. Wavelet packet transform

Wavelets are most frequently considered for analyzing physiological signals containing high-frequency noise components, as they provide good frequency resolution to localize low-frequency components and fine time resolution to resolve high-frequency components (Akay, 1997). The wavelet packet transform (WPT) was introduced by generalizing the link between multiresolution approximations and wavelets (Coifman, Meyer, & Wickerhauser, 1992). The WPT can be viewed as a tree of subspaces, with $\Omega_{0,0}$ the root node of the tree (the initial signal space). In general, every node of the tree can be denoted $\Omega_{p,q}$, with $p$ denoting the current level of decomposition and $q$ the sub-band index within that level. Each node of the tree is decomposed into two orthogonal subspaces: an approximation space $\Omega_{p,q} \to \Omega_{p+1,2q}$ and a detail subspace $\Omega_{p,q} \to \Omega_{p+1,2q+1}$ (Englehart, 1998). This is performed by splitting the orthogonal basis $\{\varphi_p(\rho_{ij} - 2^p q)\}_{q\in\mathbb{Z}}$ of $\Omega_{p,q}$ into two new orthogonal bases $\{\varphi_{p+1}(\rho_{ij} - 2^{p+1} q)\}_{q\in\mathbb{Z}}$ of $\Omega_{p+1,2q}$ and $\{\psi_{p+1}(\rho_{ij} - 2^{p+1} q)\}_{q\in\mathbb{Z}}$ of $\Omega_{p+1,2q+1}$ (Mallat, 2008), where $\varphi_{p,q}(\rho_{ij})$ and $\psi_{p,q}(\rho_{ij})$ are the scaling and wavelet functions, respectively, given in (Mallat, 2008) as

$$\varphi_{p,q}(\rho_{ij}) = \frac{1}{\sqrt{|2^p|}}\,\varphi\!\left(\frac{\rho_{ij} - 2^p q}{2^p}\right) \tag{6}$$

$$\psi_{p,q}(\rho_{ij}) = \frac{1}{\sqrt{|2^p|}}\,\psi\!\left(\frac{\rho_{ij} - 2^p q}{2^p}\right) \tag{7}$$


where $2^p$ is a scale parameter that measures the level of compression and $2^p q$ is a translation parameter that encodes the time location of the wavelet. Fig. 5 shows the WPT decomposition of the SC signal with 5 levels of decomposition.

3.5. WPT sub-band analysis

WPT was applied to the data of every window generated in the previous step, and power and variance measures were extracted from all the sub-bands at the 5th level of decomposition, as it provides the highest frequency resolution. We then employed a t-test to check the statistical difference between the non-distracted and distracted states. Two comparisons were considered: i) Normal (non-distracted state) vs. Phone (distracted state) and ii) Normal vs. Text (distracted state). Figs. 6 and 7 illustrate the results of the statistical t-test between the non-distracted and distracted states, with the resulting p-values color-mapped onto the wavelet decomposition trees; brighter colors represent lower p-values and vice versa. Observing the p-values for the frequency sub-bands at the 5th level of wavelet decomposition in Figs. 6 and 7, we can identify several sub-bands with low p-values, indicating a significant statistical difference between the non-distracted and distracted scenarios.

3.6. High-dimensional feature characterization and selection

The 'curse of dimensionality' is a phenomenon in which the amount of data required to produce a dependable output grows as the feature dimension increases, and the computational cost increases exponentially (Concha, Xu, & Piccardi, 2010). Feature selection is the process of removing redundant features while maintaining the performance of the classifier; one of its advantages is that it preserves the original feature space (Saeys, Inza, & Larrañaga, 2007). Feature selection methods fall into three models: filter methods, which select the best subsets of features by evaluating and ranking them independently of any classifier; wrapper methods, which use a classifier to evaluate which features are reliable; and embedded methods, which search for the best features while constructing the classifier (Tulum, Artuğ, & Bolat, 2013). Table 2 provides a brief comparison of the three models. The following feature selection models are evaluated in this study.

3.6.1. ReliefF

For each sampled instance, the M nearest neighbors with each class label are found: the nearest neighbors with the same class label as the instance are termed nearest hits, and the rest are termed nearest misses. These nearest hits and misses are evaluated to rank the features (Saeys et al., 2007). The ReliefF method is described in Algorithm 1.

Algorithm 1 ReliefF
Input: Training instances T (attribute vectors and class values).
Output: Vector W of estimated attribute qualities.
1. Set all initial weights W[A] = 0;
2. for i = 1 to m do  (m: number of sampled instances)
3.   Randomly select a training instance R_i;
4.   Find the M nearest hits H_j;
5.   for each class C != class(R_i) do
6.     From class C, find the M nearest misses M_j(C);
7.   for A = 1 to a do  (a: dimension of the feature vector)
8.     W[A] = W[A] - sum_{j=1..M} diff(A, R_i, H_j) / (m*M)
              + sum_{C != class(R_i)} [ P(C) / (1 - P(class(R_i))) ] * sum_{j=1..M} diff(A, R_i, M_j(C)) / (m*M);
9.   end;
10. end;
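A compact NumPy sketch of the neighbor-based weighting in Algorithm 1 is given below. It is a simplified batch version with our own defaults (Manhattan distance, min-max scaling, number of sampled instances); it illustrates the hit/miss weight update rather than reproducing the exact implementation used in the paper.

```python
import numpy as np

def relieff_weights(X, y, n_samples=100, k=10, seed=0):
    """Simplified ReliefF: reward features that differ on nearest misses
    and penalize features that differ on nearest hits (cf. Algorithm 1)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span            # scale so |diff| lies in [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    m = min(n_samples, n)
    W = np.zeros(d)
    for idx in rng.choice(n, size=m, replace=False):
        xi, yi = Xs[idx], y[idx]
        dist = np.abs(Xs - xi).sum(axis=1)     # Manhattan distance to every instance
        dist[idx] = np.inf                     # never pick the instance itself
        hits = np.where(y == yi)[0]
        hits = hits[np.argsort(dist[hits])[:k]]
        W -= np.abs(Xs[hits] - xi).mean(axis=0) / m
        for c in classes:                      # nearest misses of every other class
            if c == yi:
                continue
            miss = np.where(y == c)[0]
            miss = miss[np.argsort(dist[miss])[:k]]
            W += (prior[c] / (1.0 - prior[yi])) * np.abs(Xs[miss] - xi).mean(axis=0) / m
    return W

# usage: w = relieff_weights(features, labels); ranking = np.argsort(w)[::-1]
```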

3.6.2. Random forest feature selection method

Decision trees are built using samples of the original feature space, and averaging algorithms are applied over the ensemble to improve the accuracy (Bolón-Canedo, Sánchez-Maroño, & Alonso-Betanzos, 2013). The random forest model is described in Algorithm 2.

Algorithm 2 Random forest feature selection
Input: Training samples with feature vectors x_i and class labels y_i.
Output: Reduced feature vectors.
1. Choose the number of trees m to build;
2. for i = 1 to m do
3.   Select a bootstrap sample from the training instances;



4.   Train on the sample to build a tree;
5.   for each split do
6.     Choose k random features from the original feature set P;
7.     Choose the best feature among the k features and partition the data;
8.   end;
9.   Grow the tree fully (no pruning) until the stopping criteria are met;
10. end;
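In practice, the embedded random-forest selection used later in Section 4.3 can be approximated by ranking features with the impurity-based importances of a fitted forest, as sketched below with scikit-learn. The number of trees, the random seed, and the variable names are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def random_forest_select(X, y, n_keep=10, n_trees=100, seed=0):
    """Rank features by random-forest impurity importance and keep the top n_keep."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    ranking = np.argsort(forest.feature_importances_)[::-1]
    return ranking[:n_keep]

# usage on the augmented 74-D feature matrix (hypothetical variable names):
# keep = random_forest_select(features_74d, labels, n_keep=10)
# X_reduced = features_74d[:, keep]
```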

Fig. 8. CDA spectral- Classifier accuracy across subjects.

Fig. 9. CDA temporal- Classifier accuracy across subjects.

Fig. 10. DDA- Classifier accuracy across subjects.


3.6.3. Feature selection by neighborhood component analysis (FSCNCA)

Feature weights are learned using a diagonal adaptation of neighborhood component analysis (NCA). The classification accuracy is maximized through a linear transformation of the features, which NCA finds using a distance metric; a differentiable objective function is used to obtain the transformation matrix B. In this study we used the stochastic gradient descent (SGD) solver. First, the minimum classification loss is obtained by tuning the regularization parameter λ by cross-validation. The feature weights are then calculated by fitting the NCA model using the best λ value and the SGD solver, and a threshold is applied to the weights to obtain the feature subset (Yang, Wang, & Zuo, 2012). The FSCNCA procedure is described in Algorithm 3.

Fig. 11. CDA spectral + DDA features- Classifier accuracy across subjects.

Fig. 12. All features- Classifier accuracy across subjects.

Fig. 13. Prediction speed.


Table 3. Area under the ROC curve, considering normal driving as the positive class and texting and phone conversation while driving as the negative class. SVML: linear SVM; SVMQ: quadratic SVM; SVMG: Gaussian SVM; KNN: k-nearest neighbor; EB: ensemble bagged classifier.

CDA-Temporal + DDA features:
Subject   SVML  SVMQ  SVMG  KNN   EB
Sub 1     0.63  0.77  0.79  0.76  0.92
Sub 2     0.70  0.73  0.79  0.83  0.95
Sub 3     0.81  0.77  0.84  0.79  0.91
Sub 4     0.40  0.86  0.88  0.81  0.95
Sub 5     0.68  0.86  0.86  0.82  0.97
Sub 6     0.71  0.80  0.89  0.88  0.94
Sub 7     0.74  0.93  0.94  0.92  0.96
Sub 8     0.88  0.86  0.96  0.90  0.97
Sub 9     0.93  0.92  0.94  0.93  0.94
Sub 10    0.95  0.91  0.96  0.91  0.98
Sub 11    0.87  0.91  0.95  0.93  0.99
Sub 12    1.00  0.97  1.00  1.00  1.00
Sub 13    0.92  0.92  0.96  0.96  1.00
Sub 14    0.72  0.80  0.75  0.67  0.91
Sub 15    0.81  0.85  0.77  0.67  0.97

CDA-Spectral features:
Subject   SVML  SVMQ  SVMG  KNN   EB
Sub 1     0.90  0.90  0.94  0.92  0.93
Sub 2     1.00  0.99  1.00  0.99  1.00
Sub 3     0.97  0.98  0.96  0.98  0.98
Sub 4     1.00  0.99  1.00  0.99  1.00
Sub 5     0.87  0.94  0.91  0.93  0.98
Sub 6     0.94  0.96  0.99  0.97  0.98
Sub 7     1.00  0.99  1.00  1.00  1.00
Sub 8     0.97  0.96  0.98  0.99  0.99
Sub 9     0.95  0.95  0.96  0.96  0.99
Sub 10    1.00  1.00  1.00  1.00  1.00
Sub 11    0.91  0.96  0.99  0.99  0.98
Sub 12    1.00  1.00  1.00  1.00  1.00
Sub 13    0.99  1.00  1.00  1.00  1.00
Sub 14    0.90  0.94  0.97  0.97  0.96
Sub 15    0.94  0.97  0.98  0.99  0.99

All 74 features:
Subject   SVML  SVMQ  SVMG  KNN   EB
Sub 1     0.88  0.87  0.93  0.91  0.99
Sub 2     0.99  0.98  0.99  0.99  1.00
Sub 3     0.96  0.91  0.96  0.97  0.99
Sub 4     0.99  0.97  1.00  1.00  1.00
Sub 5     0.82  0.92  0.89  0.82  0.99
Sub 6     0.94  0.91  0.96  0.97  0.98
Sub 7     0.99  0.99  1.00  1.00  1.00
Sub 8     0.97  0.95  0.98  0.96  0.99
Sub 9     0.94  0.92  0.95  0.96  0.99
Sub 10    1.00  0.97  0.98  0.99  1.00
Sub 11    0.88  0.92  0.97  0.95  1.00
Sub 12    1.00  1.00  1.00  1.00  1.00
Sub 13    0.95  0.94  0.97  0.98  1.00
Sub 14    0.85  0.86  0.85  0.87  0.99
Sub 15    0.91  0.93  0.85  0.81  1.00


Algorithm 3 FSCNCA
Input: Training samples T with feature vectors x_i and class labels y_i; initial step length α, kernel width σ, regularization parameter λ, and small positive constant η.
Output: Feature weight vector W.
1. Initialize W^(0) = (1, 1, ..., 1), ε^(0) = -∞, t = 0;
2. repeat
3.   for i = 1, ..., N do
4.     Compute p_ij and p_i using W^(t):
         D_w(x_i, x_j) = Σ_{l=1..d} W_l^2 |x_il - x_jl|,
         p_ij = κ(D_w(x_i, x_j)) / Σ_{k≠i} κ(D_w(x_i, x_k)) for j ≠ i (p_ii = 0),
         p_i = Σ_j y_ij p_ij;
5.   for l = 1, ..., d do
         Δ_l = 2 ( (1/σ) Σ_i [ p_i Σ_{j≠i} p_ij |x_il - x_jl| - Σ_j y_ij p_ij |x_il - x_jl| ] - λ ) W_l;
6.   t = t + 1;
7.   W^(t) = W^(t-1) + α Δ;
8.   ε^(t) = ξ(W^(t-1));
9.   if ε^(t) > ε^(t-1) then α = 1.01 α else α = 0.4 α;
10. until |ε^(t) - ε^(t-1)| < η
11. W = W^(t); return W;

Here y_ij = 1 if y_i = y_j and 0 otherwise, κ(z) = exp(-z/σ) is the kernel, and ξ(·) denotes the regularized objective Σ_i p_i - λ Σ_l W_l^2.
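The sketch below is a simplified batch gradient-ascent version of the weight update Δ_l in Algorithm 3 (the FSCNCA used in this paper relies on an SGD solver with an adaptive step size). The hyperparameters, fixed learning rate, selection threshold, and names are our assumptions for illustration only.

```python
import numpy as np

def ncfs_weights(X, y, lam=1.0, sigma=1.0, lr=0.01, n_iter=200):
    """Learn per-feature weights by maximizing the regularized NCA objective
    (batch gradient ascent over the update Delta_l of Algorithm 3)."""
    n, d = X.shape
    w = np.ones(d)
    same = (y[:, None] == y[None, :]).astype(float)          # y_ij indicator
    absdiff = np.abs(X[:, None, :] - X[None, :, :])          # |x_il - x_jl|, shape (n, n, d)
    for _ in range(n_iter):
        D = (absdiff * w**2).sum(axis=2)                      # weighted L1 distances
        K = np.exp(-D / sigma)
        np.fill_diagonal(K, 0.0)
        P = K / K.sum(axis=1, keepdims=True)                  # reference probabilities p_ij
        p_i = (P * same).sum(axis=1)                          # per-sample leave-one-out accuracy
        term1 = (p_i[:, None] * (P[:, :, None] * absdiff).sum(axis=1)).sum(axis=0)
        term2 = ((P * same)[:, :, None] * absdiff).sum(axis=(0, 1))
        grad = 2.0 * w * ((term1 - term2) / sigma - lam)      # Delta_l of Algorithm 3
        w = w + lr * grad
    return w

# usage (illustrative 5% threshold on the weights, as in the thresholding step above):
# w = ncfs_weights(X_scaled, labels); selected = np.where(w > 0.05 * w.max())[0]
```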

3.7. Driver inattention identification

According to the 'no free lunch' theorem (Wolpert, 1996), no single classifier can be best for all kinds of datasets. In order to achieve robust driver inattention identification, we compared the performance of support vector machines (linear, quadratic, and Gaussian), kNN, and the ensemble bagged classifier. They are briefly discussed in this section.

3.7.1. Support vector machine (SVM)

The SVM is a supervised learning model that simultaneously minimizes the classification error and maximizes the geometric margin. It classifies datasets by finding the hyperplane that separates the data points of one class from those of another. In order to find the hyperplane, the original feature vectors $\vec{x}_i$ are mapped to a higher dimensional space using a kernel function. The kernel functions are as follows (Malzahn & Opper, 2005):

$$\text{Linear SVM:}\quad K(\vec{x}_i, \vec{x}_j) = \vec{x}_i^{T}\vec{x}_j \tag{8}$$

$$\text{Quadratic SVM:}\quad K(\vec{x}_i, \vec{x}_j) = \left(1 + \vec{x}_i^{T}\vec{x}_j\right)^{2} \tag{9}$$

$$\text{Gaussian SVM:}\quad K(\vec{x}_i, \vec{x}_j) = \exp\!\left(-\lVert \vec{x}_i - \vec{x}_j \rVert^{2} / 2\sigma^{2}\right) \tag{10}$$

3.7.2. k-Nearest neighbor (kNN)

kNN is a supervised classification model based on the Euclidean distance between a test sample and the training samples. The Euclidean distance between a sample $x_i$ with $p$ features and a sample $x_l$ is defined as:

$$d(x_i, x_l) = \sqrt{(x_{i1} - x_{l1})^2 + \cdots + (x_{ip} - x_{lp})^2} \tag{11}$$

For k-nearest neighbors, the predicted class label of a test sample x is set equal to the most frequent true class among the k nearest training samples (Peterson, 2009). In our study, k is set to 10, obtained using the method discussed in (Anava & Levy, 2016).

3.7.3. Ensemble bagged classifier (EB)

EB works on the principle of generating multiple datasets from the original dataset. Increasing the size of the training set reduces the variance of the prediction by tuning the prediction toward the expected output. A learning set L consists of data {(y_n, x_n), n = 1, ..., N}, where the y's are class labels. Using the learning set, a predictor φ(X, L) is formed. Repeated bootstrap samples {L^(B)} are drawn from L to form {φ(X, L^(B))}, and the EB prediction φ_B(x) is obtained by voting over {φ(X, L^(B))} (Breiman, 1996). In our study, the learner type is the decision tree, and the number of learners considered for 'bootstrap aggregating' is 30.

The performance of the classifiers is evaluated using accuracy, the area under the ROC curve (AUC), and the prediction speed. In order to compute these estimators, we measure the true positives ($t_p$), true negatives ($t_n$), false positives ($f_p$), and false negatives ($f_n$). The true positive rate is given by $TPR = \frac{t_p}{t_p + f_n}$ and the false positive rate by $FPR = \frac{f_p}{f_p + t_n}$.

$$\text{Accuracy} = \frac{t_p + t_n}{t_p + t_n + f_p + f_n} \tag{12}$$


$$AUC = \int_{-\infty}^{\infty} TPR(T)\left(-FPR'(T)\right)dT \tag{13}$$
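The classifier comparison and the 10-fold cross-validated accuracy estimates reported in Section 4 can be reproduced in spirit with scikit-learn, as in the sketch below. The polynomial-kernel parameters only approximate the quadratic kernel of Eq. (9), and the scaling step and variable names are our assumptions.

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

def compare_classifiers(X, y, folds=10):
    """10-fold cross-validated accuracy for the classifier family of Section 3.7."""
    models = {
        "SVM (linear)":    SVC(kernel="linear"),
        "SVM (quadratic)": SVC(kernel="poly", degree=2, coef0=1.0),  # approximates (1 + x.x')^2
        "SVM (Gaussian)":  SVC(kernel="rbf"),
        "kNN (k=10)":      KNeighborsClassifier(n_neighbors=10),
        # BaggingClassifier defaults to decision-tree base learners (30 of them here)
        "Ensemble bagged trees": BaggingClassifier(n_estimators=30),
    }
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)
        scores = cross_val_score(pipe, X, y, cv=folds, scoring="accuracy")
        print(f"{name:>22s}: {scores.mean():.3f} +/- {scores.std():.3f}")

# usage: compare_classifiers(features_74d, labels)   # hypothetical variable names
```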

4. Results

4.1. Individual feature subset analysis

The statistical median across all subjects is considered in order to overcome bias in the accuracy measurement due to any individual subject. The individual feature subsets are evaluated using different classifiers to determine which feature sets provide better classification of driver inattentiveness. As shown in Fig. 8, the CDA spectral feature subset provides an average accuracy of 85.6% using the ensemble bagged classifier (EB). A better accuracy of 92.2% is obtained using only the CDA temporal feature subset with EB, as shown in Fig. 9. From Fig. 10, it can be seen that the DDA feature subset provides the lowest average accuracy of 71%. When the classifiers are compared, EB provides the best accuracy for all the feature subsets; however, the prediction time required for EB, as shown in Fig. 13, is twice that of the remaining classifiers, so a trade-off between accuracy and prediction speed has to be considered (see Figs. 11 and 12).

4.2. Grouped feature subset analysis

When the individual feature subsets were considered, CDA temporal provided the best classification accuracy and DDA the lowest. Combining the CDA temporal and DDA features provides a better average accuracy of 93% compared with the CDA temporal feature set alone. Combining all the features yields an average accuracy of 92.8%, which is marginally lower than that of the combined CDA temporal and DDA features. For a better evaluation, the area under the ROC curve is computed to compare the feature sets. As indicated in Table 3, for subject 1 the classification accuracy using the CDA temporal and DDA features is higher than the classification accuracy with all 74 features, but the AUC is higher when all 74 features are evaluated. The table also indicates that the AUC is largest when the ensemble bagged classifier is used to classify driver inattentiveness using all 74 feature dimensions.

4.3. Evaluation using reduced feature sets

Feature selection using neighborhood component analysis (FSCNCA) was used to evaluate the quality of the features by computing the feature weights. As seen in Fig. 14, selecting more than 15 features for classification is redundant: the feature weight is positive for 15 features, and for the rest of the features the weight is reduced to 0. Hence, the performance of the classifiers is evaluated with the number of features reduced to between 5 and 15 using different feature selection techniques. As seen in Fig. 15, the random forest feature selection model provides, with only 10 features, a higher average accuracy than the average classification accuracy using all 74 features, for all the classifiers. An accuracy of 93.51% is achieved using 15 features selected by the random forest model together with the ensemble bagged classifier, which is comparable to the accuracy achieved with 74 features. Model complexity and the curse of dimensionality can thus be reduced with a smaller feature space while still maintaining the performance of the classifier.

5. Conclusion

In this study, we investigated the feasibility of using GSR to identify driver inattentiveness under naturalistic driving conditions.

Fig. 14. Feature selection criteria.


Fig. 15. Accuracy Metric: Comparison across classifiers and feature reduction methods with respect to no. of features.

A variety of features was extracted by employing the deconvolution techniques and obtaining wavelet and spectro-temporal measures. Feature selection models were adopted to search for the optimal feature set. We achieved an accuracy of approximately 94% by generating a compact 15-D discriminative space through feature extraction and selection analysis of the resulting space. Our proposed on-the-edge driver distraction detection provides evident results for GSR as a reliable indicator for driver inattentiveness identification.

References

Administration, N. H. T. S., et al. (2013). Official US government website for distracted driving.
Akay, M. (1997). Wavelet applications in medicine. IEEE Spectrum, 34, 50–56.
Alexander, D. M., Trengove, C., Johnston, P., Cooper, T., August, J., & Gordon, E. (2005). Separating individual skin conductance responses in a short interstimulus-interval paradigm. Journal of Neuroscience Methods, 146, 116–123.
Anava, O., & Levy, K. (2016). k*-nearest neighbors: From global to local. In Advances in neural information processing systems (pp. 4916–4924).
Ayata, D., Yaslan, Y., & Kamaşak, M. (2016). Emotion recognition via random forest and galvanic skin response: Comparison of time based feature sets, window sizes and wavelet approaches. In Medical technologies national congress (TIPTEKNO), 2016 (pp. 1–4). IEEE.
Benedek, M., & Kaernbach, C. (2010). A continuous measure of phasic electrodermal activity. Journal of Neuroscience Methods, 190, 80–91.
Bolón-Canedo, V., Sánchez-Maroño, N., & Alonso-Betanzos, A. (2013). A review of feature selection methods on synthetic data. Knowledge and Information Systems, 34, 483–519. https://doi.org/10.1007/s10115-012-0487-8.



Box, S. (2009). New data from Virginia Tech Transportation Institute provides insight into cell phone use and driving distraction.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.
Chen, L.-l., Zhao, Y., Ye, P.-f., Zhang, J., & Zou, J.-z. (2017). Detecting driving stress in physiological signals based on multimodal feature analysis and kernel classifiers. Expert Systems with Applications, 85, 279–291.
Ciabattoni, L., Ferracuti, F., Longhi, S., Pepa, L., Romeo, L., & Verdini, F. (2017). Real-time mental stress detection based on smartwatch. In Consumer electronics (ICCE), 2017 IEEE international conference on (pp. 110–111). IEEE.
Coifman, R. R., Meyer, Y., & Wickerhauser, V. (1992). Wavelet analysis and signal processing. In Wavelets and their applications. Citeseer.
Concha, O. P., Xu, R. Y. D., & Piccardi, M. (2010). Robust dimensionality reduction for human action recognition. In Proceedings of the 2010 international conference on digital image computing: Techniques and applications, DICTA '10 (pp. 349–356). Washington, DC, USA: IEEE Computer Society. https://doi.org/10.1109/DICTA.2010.66.
Dehzangi, O., Rajendra, V., & Taherisadr, M. (2018a). Wearable driver distraction identification on-the-road via continuous decomposition of galvanic skin responses. Sensors, 18, 503.
Dehzangi, O., Sahu, V., Taherisadr, M., & Galster, S. (2018b). Multi-modal system to detect on-the-road driver distraction. In 2018 21st international conference on intelligent transportation systems (ITSC) (pp. 2191–2196). IEEE.
Dehzangi, O., & Taherisadr, M. (2018). Driver distraction detection using mel cepstrum representation of galvanic skin responses and convolutional neural networks. In 2018 24th international conference on pattern recognition (ICPR) (pp. 1481–1486). IEEE.
Dehzangi, O., & Taherisadr, M. (2019). EEG based driver inattention identification via feature profiling and dimensionality reduction. In Advances in body area networks I (pp. 107–121). Springer.
Donmez, B., Boyle, L., & Lee, J. D. (2003). Taxonomy of mitigation strategies for driver distraction. In Proceedings of the human factors and ergonomics society annual meeting (Vol. 47, pp. 1865–1869). Los Angeles, CA: Sage Publications.
Englehart, K. (1998). Signal representation for classification of the transient myoelectric signal.
Kim, J. Y., Jeong, C. H., Jung, M. J., Park, J. H., & Jung, D. H. (2013). Highly reliable driving workload analysis using driver electroencephalogram (EEG) activities during driving. International Journal of Automotive Technology, 14, 965–970.
Liang, Y., Reyes, M. L., & Lee, J. D. (2007). Real-time detection of driver cognitive distraction using support vector machines. IEEE Transactions on Intelligent Transportation Systems, 8, 340–350.
Liu, M., Fan, D., Zhang, X., & Gong, X. (2016). Human emotion recognition based on galvanic skin response signal feature selection and SVM. In Smart city and systems engineering (ICSCSE), international conference on (pp. 157–160). IEEE.
Liu, T., Yang, Y., Huang, G.-B., & Lin, Z. (2015). Detection of drivers distraction using semi-supervised extreme learning machine. In Proceedings of ELM-2014, Vol. 2 (pp. 379–387). Springer.
Mallat, S. (2008). A wavelet tour of signal processing: The sparse way. Academic Press.
Malzahn, D., & Opper, M. (2005). A statistical physics approach for the analysis of machine learning algorithms on real data. Journal of Statistical Mechanics: Theory and Experiment, 2005, P11001. http://stacks.iop.org/1742-5468/2005/i=11/a=P11001.
Metz, B., Schömig, N., & Krüger, H.-P. (2011). Attention during visual secondary tasks in driving: Adaptation to the demands of the driving task. Transportation Research Part F: Traffic Psychology and Behaviour, 14, 369–380.
Min, B.-C., Seo, S. H., Kim, J. K., Kim, H.-S., Choi, M.-H., Kim, H.-J., et al. (2013). 1G-35 Changes of driving performance and skin conductance level of experienced taxi drivers due to distraction tasks. Japanese Journal of Ergonomics, 49, S556–S558.
Nourbakhsh, N., Wang, Y., & Chen, F. (2013). GSR and blink features for cognitive load classification. In IFIP conference on human-computer interaction (pp. 159–166). Springer.
Organization, W. H. (2015). World report on ageing and health. World Health Organization.
Peterson, L. E. (2009). K-nearest neighbor. Scholarpedia, 4, 1883. https://doi.org/10.4249/scholarpedia.1883.
Rajendra, V., & Dehzangi, O. (2017). Detection of distraction under naturalistic driving using galvanic skin responses. In Wearable and implantable body sensor networks (BSN), 2017 IEEE 14th international conference on (pp. 157–160). IEEE.
Saeys, Y., Inza, I., & Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23, 2507–2517.
Taherisadr, M., Asnani, P., Galster, S., & Dehzangi, O. (2018). ECG-based driver inattention identification during naturalistic driving using mel-frequency cepstrum 2D transform and convolutional neural networks. Smart Health, 9, 50–61.
Taherisadr, M., & Dehzangi, O. (2019). EEG-based driver distraction detection via game-theoretic-based channel selection. In Advances in body area networks I (pp. 93–105). Springer.
Tulum, G., Artuğ, N. T., & Bolat, B. (2013). Performance evaluation of feature selection algorithms on human activity classification. In 2013 IEEE INISTA (pp. 1–4). https://doi.org/10.1109/INISTA.2013.6577634.
Victor, T. (2005). Keeping eye and mind on the road. Ph.D. thesis. Acta Universitatis Upsaliensis.
Wang, W., Jiang, X., Xia, S., & Cao, Q. (2010). Incident tree model and incident tree analysis method for quantified risk assessment: An in-depth accident study in traffic operation. Safety Science, 48, 1248–1262.
Wang, S., Zhang, Y., Wu, C., Darvas, F., & Chaovalitwongse, W. A. (2015). Online prediction of driver distraction based on brain activity patterns. IEEE Transactions on Intelligent Transportation Systems, 16, 136–150.
Yang, W., Wang, K., & Zuo, W. (2012). Neighborhood component feature selection for high-dimensional data (Vol. 7, pp. 161–168).
