Biomedical Signal Processing and Control 24 (2016) 1–10
Computer-aided sleep staging using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and bootstrap aggregating

Ahnaf Rashik Hassan*, Mohammed Imamul Hassan Bhuiyan

Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
Article info

Article history: Received 3 May 2015; received in revised form 21 August 2015; accepted 1 September 2015.

Keywords: EEG; Sleep scoring; CEEMDAN; Bagging; Ensemble methods
Abstract

Computer-aided sleep staging based on single channel electroencephalogram (EEG) is a prerequisite for a feasible low-power wearable sleep monitoring system. It can also ease the burden on clinicians who analyze a high volume of data by making sleep scoring less onerous, time-consuming and error-prone. Most of the prior studies focus on multichannel EEG based methods, which hinder the aforementioned goals. Among the limited number of single-channel based methods, only a few yield good performance in automatic sleep staging. In this article, a single-channel EEG based method for sleep staging using the recently introduced Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Bootstrap Aggregating (Bagging) is proposed. At first, EEG signal segments are decomposed into intrinsic mode functions. Higher order statistical moments computed from these functions are used as features. Bagged decision trees are then employed to classify sleep stages. This is the first time that CEEMDAN is employed for automatic sleep staging. Experiments are carried out using the well-known Sleep-EDF database, and the results show that the proposed method is superior to the state-of-the-art methods in terms of accuracy. In addition, the proposed scheme gives high detection accuracy for sleep stages S1 and REM.
1. Introduction

Sleep is a rapidly reversible state that is characterized by loss of consciousness and reduced responsiveness to external stimuli [1]. In humans, sleep related ailments deteriorate patients' quality of life and are the second most common cause of complaints requiring medical attention [2]. The purpose of sleep staging is to classify sleep into its stages, which is essential for sleep disorder diagnosis and sleep research. Traditionally, all-night polysomnographic (PSG) recordings are visually scored by experts based on Rechtschaffen and Kales's (R&K) recommendations [3] or a newer guideline developed by the American Academy of Sleep Medicine (AASM) [4]. The major changes introduced by the AASM standard comprise new EEG derivations, the merging of stages S3 and S4 into N3, the abolition of the stage "movement time", the simplification of many context rules, and the recommendation of sampling rates and filter settings for PSG reporting and for user interfaces of computer-assisted sleep analysis [5]. According to the R&K standard, there are six sleep stages, namely Awake (AWA), Non-Rapid Eye Movement stages 1–4 (S1–S4) and Rapid Eye Movement (REM).
* Corresponding author. Tel.: +880 1927093150.
http://dx.doi.org/10.1016/j.bspc.2015.09.002
PSG recording involves a minimum of 11 channels, including electroencephalogram (EEG), electromyogram (EMG), electrooculogram (EOG), oxygen saturation (SpO2) and electrocardiogram (ECG). Manual scoring requires sleep scorers to apply visual pattern recognition to a huge amount of signal data; the classification of an 8-h (whole night) recording requires approximately 2–4 h [6], and the onerous and subjective nature of the process makes it error-prone. Currently, PSG examination requires the patient to undergo overnight sleep studies in a specially equipped sleep laboratory. As a result, the sleep quality of the subject can be reduced by the unfamiliar environment. An automatic sleep staging scheme can assist the clinical staff and speed up the diagnosis of various sleep disorders. Again, automatic sleep scoring with a reduced channel requirement is essential for a portable sleep quality evaluation device [7]. In this respect, the use of only EEG signals for computerized sleep scoring has been explored by various researchers with a view to a feasible sleep quality evaluation device. However, most of these methods use multiple EEG channels, which prohibits portability and wearability of the device [8]. An automatic sleep scoring algorithm using single channel EEG can ensure the wearability and portability of sleep quality evaluation at home. Moreover, excessive wire connections for PSG cause sleep disturbance, whereas automatic sleep scoring based on single channel EEG reduces the sleep disturbance caused by recording wires [8].
Various single- or multichannel-based methods for automated sleep scoring have been reported in the literature [9]. Agarwal and Gotman [10] segmented five-channel PSG data into quasi-stationary components, computed features such as amplitude, dominant rhythm, spindles and frequency-weighted energy, and used k-means clustering to classify sleep stages. Held et al. [11] presented a neuro-fuzzy classifier to classify the sleep stages of healthy infants from four-channel PSG recordings. Krakovská et al. [12] extracted many features, such as average amplitude, variance, spectral powers, coherence and fractal exponent, from data collected from six EEG channels, two EOG channels and one EMG channel, and classified them using quadratic discriminant analysis. Liang et al. [8] used multiscale entropy and autoregressive model parameters as features and linear discriminant analysis as the classifier for single-channel automatic sleep scoring. Ronzhina et al. [13] used the power spectral density of single channel EEG in an artificial neural network for sleep staging. In [14], six energy features obtained from single channel EEG were fed into an Elman neural network classifier for sleep classification. Zhu et al. [15] generated difference visibility graphs (VG) and horizontal VGs from single channel EEG signals and extracted nine features from them for classification using a support vector machine. Imtiaz and Rodriguez-Villegas [16] utilized the spectral edge frequency and the absolute and relative power of the signal for REM detection only, from single channel EEG. Koch et al. [17] put forward a Latent Dirichlet Allocation topic model based method using four-channel physiological signals (EEG and EOG) for sleep staging.

In this article, an automatic method for sleep stage classification using single channel EEG is proposed. At first, EEG signal segments are decomposed into intrinsic mode functions using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN). Higher order statistical moments, namely the mean, variance, skewness and kurtosis, are then computed from the intrinsic mode functions. As compared to its predecessors, Empirical Mode Decomposition (EMD) and Ensemble Empirical Mode Decomposition (EEMD), CEEMDAN gives an exact reconstruction of the EEG signal without mode mixing and with a better spectral separation of the mode functions. Moreover, like EMD and EEMD, it is data-driven and requires no prior basis function. Although EMD and EEMD have been used in various physiological signal analyses, CEEMDAN has been applied only recently for processing such signals [18]. Classification of sleep stages is performed using an ensemble learning based classifier, namely Bootstrap Aggregating (Bagging). It should be mentioned that, to the best of our knowledge, classification of sleep stages in the CEEMDAN domain is yet to be reported in the literature. In addition, the application of Bagging to sleep stage classification is introduced in this work.

The rest of the paper is organized as follows. Section 2 describes the experimental data used in this work. Section 3 gives a brief description of CEEMDAN along with its predecessors, EMD and EEMD; a description and statistical analyses of the features used are also provided. Bagging is described briefly in Section 4. The experimental results of this work are discussed in Section 5. Finally, some concluding remarks are given in Section 6.
2. Materials

The recordings used for the evaluation of the proposed scheme were obtained from Caucasian males and females (21–35 years old) who were not taking any medication. The data are available from the Sleep-EDF Database [20,21] of the PhysioNet data bank [19]. The first four recordings (marked sc*) were obtained in 1989 from ambulatory healthy volunteers during 24 hours of their normal daily life. The last four recordings (marked st*) were obtained in 1994 from subjects who had mild difficulty falling asleep but were otherwise healthy. They contain horizontal EOG and Fpz–Cz and Pz–Oz EEG data, each sampled at 100 Hz.
Table 1
Description of train and test epochs.

               AWA     S1    S2     S3    S4    REM   Total
Train epochs   4027    302   1810   336   313   804   7592
Test epochs    4028    302   1811   336   314   805   7596
The EEG signal from the Pz–Oz channel yields better classification performance than that from the Fpz–Cz channel [8,13,15,22], so the Pz–Oz channel EEG signal is used in our study. Experts have already scored each 30 s of EEG data and generated the hypnogram in accordance with the R&K recommendations [3]. The interval of each epoch in this study is therefore defined as 30 s, or (30 × 100 =) 3000 data points. Each epoch was scored by the expert scorers into one of eight classes: AWA, S1, S2, S3, S4, REM, MVT (Movement Time) and 'Unscored'. The entire data set is divided into two halves: the odd-numbered epochs are chosen as the training data and the rest are used as test data. Table 1 summarizes the number of epochs of the different classes used in this work.

3. Methods

3.1. Empirical Mode Decomposition

Empirical Mode Decomposition (EMD) [23] aims to generate a highly localized time-frequency estimate of a signal in a data-driven fashion by decomposing it into a finite sum of intrinsic mode functions (IMFs) or modes. Each mode must satisfy two conditions:

1. The number of extrema and the number of zero crossings must be the same or differ at most by one.
2. At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

EMD iteratively decomposes an N-point EEG epoch X into amplitude- and frequency-modulated oscillatory IMFs using the following steps:

1. Set $h_1 = X$.
2. Identify the local maxima and minima of $h_1$.
3. Obtain the envelope of the local maxima, $v_{\max}$, and that of the local minima, $v_{\min}$, using cubic spline interpolation.
4. Generate the local mean curve m from the upper and lower envelopes:
$m = \frac{v_{\max} + v_{\min}}{2}$   (1)
5. Compute $h_2$ by subtracting the local mean curve from $h_1$:
$h_2 = h_1 - m$   (2)
6. Repeat steps (2)–(5) until the difference between $h_{k+1}$ and $h_k$, SD(k), defined as follows, falls below a predefined value $\varepsilon$:
$SD(k) = \frac{\| h_{k+1} - h_k \|^2}{\| h_k \|^2} < \varepsilon$   (3)
where $\| \cdot \|$ is the Euclidean L2-norm.
7. Set $c_1 = h_k$ as the first mode.
8. Find the residue, $r_1 = X - c_1$. Steps (1)–(7) are known as sifting.
9. Substitute X in step (1) with $r_1$ and repeat steps (1)–(7) to find the rest of the IMFs $c_2, c_3, \ldots, c_L$.

Thus, the input signal can be decomposed into L IMFs until the residue becomes a monotonic function from which no further IMF can be extracted.
The input X can be reconstructed from all the IMFs as:
$X = \sum_{j=1}^{L} c_j + r_L$   (4)
where $r_L$ is the residue of the Lth iteration.
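To make the sifting procedure of Section 3.1 concrete, the following is a minimal NumPy/SciPy sketch, not the authors' implementation; the helper names (envelope_mean, sift_imf, emd), the stopping threshold and the cap on the number of IMFs are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def envelope_mean(h):
    """Mean of the upper and lower cubic-spline envelopes, Eq. (1)."""
    t = np.arange(len(h))
    maxima = np.where((h[1:-1] > h[:-2]) & (h[1:-1] > h[2:]))[0] + 1
    minima = np.where((h[1:-1] < h[:-2]) & (h[1:-1] < h[2:]))[0] + 1
    if len(maxima) < 4 or len(minima) < 4:
        return None                      # too few extrema to build envelopes
    v_max = CubicSpline(maxima, h[maxima])(t)
    v_min = CubicSpline(minima, h[minima])(t)
    return (v_max + v_min) / 2.0

def sift_imf(x, eps=0.2, max_iter=100):
    """Extract one IMF by repeated sifting, Eqs. (2)-(3)."""
    h = x.copy()
    for _ in range(max_iter):
        m = envelope_mean(h)
        if m is None:
            break
        h_new = h - m                                    # Eq. (2)
        sd = np.sum((h_new - h) ** 2) / np.sum(h ** 2)   # Eq. (3)
        h = h_new
        if sd < eps:
            break
    return h

def emd(x, max_imfs=7):
    """Decompose x into IMFs plus a residue, as in Eq. (4)."""
    imfs, residue = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        c = sift_imf(residue)
        imfs.append(c)
        residue = residue - c
        if envelope_mean(residue) is None:   # residue is (nearly) monotonic
            break
    return np.array(imfs), residue
```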
EMD and its variants, such as Bivariate EMD (BEMD) [24] and Multivariate EMD (MEMD) [25], are widely used for EEG and other physiological signal analysis. However, EMD and many of its extensions suffer from the mode mixing problem. Ensemble Empirical Mode Decomposition, a variation of EMD, resolves this problem.

3.2. Ensemble Empirical Mode Decomposition

A noise-assisted, adaptive data analysis based extension of EMD, called EEMD, was proposed in [26] with a view to eradicating the mode mixing problem. In EEMD, low-level random noise is added to the input, which is then decomposed by EMD. The process of applying EMD to the modified signal is referred to as one trial. The trial is repeated I times to obtain the final modes. The steps of EEMD are as follows:

1. Calculate $x^{i}(n) = x(n) + w^{i}(n)$, where $w^{i}(n)$ with i = 1, 2, ..., I represents different realizations of white Gaussian noise.
2. Compute the EMD of each signal $x^{i}(n)$ and find the corresponding modes $\mathrm{IMF}_{k}^{i}(n)$, where k = 1, 2, ..., K denotes the modes.
3. Find the modes of the EEMD method, denoted $\overline{\mathrm{IMF}}_{k}(n)$, by averaging $\mathrm{IMF}_{k}^{i}(n)$:
$\overline{\mathrm{IMF}}_{k}(n) = \frac{1}{I} \sum_{i=1}^{I} \mathrm{IMF}_{k}^{i}(n)$   (5)

$\overline{\mathrm{IMF}}_{k}(n)$ does not necessarily obey the aforementioned conditions required to be an IMF.

Fig. 1. $\widetilde{\mathrm{IMF}}_3$ and $\widetilde{\mathrm{IMF}}_5$ of CEEMDAN for various sleep stages. Note the variations of the IMFs in different sleep stages.
3.3. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise

Let us define an operator $E_j(\cdot)$ that produces the jth mode of EMD. If $w^{i}(n)$ with i = 1, 2, ..., I are different realizations of white Gaussian noise with zero mean and unit variance, and $\varepsilon_0$ is the standard deviation of the added white Gaussian noise, the CEEMDAN algorithm can be described as follows [27]:

1. Calculate $x(n) + \varepsilon_0 w^{i}(n)$.
2. Decompose the above I signals by EMD to obtain their first modes.
3. Compute the first mode $\widetilde{\mathrm{IMF}}_1$ of CEEMDAN as:
$\widetilde{\mathrm{IMF}}_1(n) = \frac{1}{I} \sum_{i=1}^{I} \mathrm{IMF}_1^{i}(n) = \overline{\mathrm{IMF}}_1(n)$   (6)
4. Find the first residue as:
$r_1(n) = x(n) - \widetilde{\mathrm{IMF}}_1(n)$   (7)
5. Decompose the realizations $r_1(n) + \varepsilon_1 E_1(w^{i}(n))$ with i = 1, 2, ..., I up to their first EMD mode, where $\varepsilon_k$ (k = 1 at this stage) is the standard deviation of the white Gaussian noise of the kth stage. $\widetilde{\mathrm{IMF}}_2(n)$ can then be found as:
$\widetilde{\mathrm{IMF}}_2(n) = \frac{1}{I} \sum_{i=1}^{I} E_1\bigl(r_1(n) + \varepsilon_1 E_1(w^{i}(n))\bigr)$   (8)
6. Compute the kth residue for k = 2, 3, ..., K as:
$r_k(n) = r_{k-1}(n) - \widetilde{\mathrm{IMF}}_k(n)$   (9)
7. Decompose the realizations $r_k(n) + \varepsilon_k E_k(w^{i}(n))$ with i = 1, 2, ..., I up to their first EMD mode and define the (k + 1)th mode as:
$\widetilde{\mathrm{IMF}}_{k+1}(n) = \frac{1}{I} \sum_{i=1}^{I} E_1\bigl(r_k(n) + \varepsilon_k E_k(w^{i}(n))\bigr)$   (10)
8. Go to step 6 for the next k.

Steps 6 to 8 are repeated until the residue becomes a monotonic function from which no further IMF can be extracted. Fig. 1 shows two such IMFs for various sleep stages. If K is the total number of modes and $r_K(n)$ is the final residue, the input $x(n)$ can be reconstructed from all the IMFs as:
$x(n) = \sum_{k=1}^{K} \widetilde{\mathrm{IMF}}_k(n) + r_K(n)$   (11)

In this work, the number of CEEMDAN IMFs used in our experiments is 7; in other words, the first seven CEEMDAN IMFs obtained from each EEG epoch are used for feature extraction. Various advantages of CEEMDAN have motivated its use in the proposed scheme instead of EMD or EEMD.
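As an illustration of how the first seven CEEMDAN IMFs of a 30-s epoch might be obtained in practice, the sketch below assumes the third-party PyEMD package (installed as EMD-signal); the package, its default parameters and the random stand-in epoch are assumptions for illustration only and are not part of the original study, which was implemented in MATLAB.

```python
import numpy as np
from PyEMD import CEEMDAN  # assumed third-party package: pip install EMD-signal

# One 30-s epoch = 3000 samples at 100 Hz; random data stands in for a real Sleep-EDF epoch.
epoch = np.random.randn(3000)

decomposer = CEEMDAN()          # default ensemble size and noise level
imfs = decomposer(epoch)        # rows are the CEEMDAN IMFs, highest frequency first
first_seven = imfs[:7]          # the paper uses the first seven IMFs for feature extraction
print(first_seven.shape)
```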
Table 2
Average values of the extracted features ($\mu$ in $\mu$V, $\sigma^2$ in $(\mu V)^2$, together with $\gamma$ and $\kappa$) computed from $\widetilde{\mathrm{IMF}}_1$–$\widetilde{\mathrm{IMF}}_4$, listed per sleep stage (S1, S2, S3, S4, AWA and REM).
EMD is data-driven, but it suffers from the mode mixing problem, which causes disparate oscillations to appear in the same mode or similar oscillations to appear in different modes. Although EEMD effectively resolves the mode mixing problem, a number of caveats make it unsuitable for practical implementation. Decomposing the same signal several times with EEMD can produce different numbers of modes [27], and if the signal is reconstructed after the application of EEMD, residual noise remains [27]. Moreover, EEMD is computationally expensive. CEEMDAN, on the other hand, provides an exact reconstruction of the original signal and a better spectral separation of the modes, and its computational cost is lower [27]. These factors motivated the use of CEEMDAN instead of EMD or EEMD in our proposed feature extraction scheme. It is worth mentioning that the performance of traditional time-frequency transforms, such as wavelet-transform-based methods, is affected by the choice of the best basis function, which is application-specific and may vary from case to case. CEEMDAN, in contrast, is data-driven and does not require a prior basis function, which makes it an attractive choice for processing highly nonlinear and non-stationary signals such as sleep-EEG.

3.4. Higher order statistical moment based features

For an M-point IMF $Z = \{z_1, z_2, z_3, \ldots, z_M\}$, the mean ($\mu$), variance ($\sigma^2$), skewness ($\gamma$) and kurtosis ($\kappa$) can be expressed mathematically as:
$\mu = \frac{1}{M} \sum_{i=1}^{M} z_i$   (12)
$\sigma^2 = \frac{1}{M} \sum_{i=1}^{M} (z_i - \mu)^2$   (13)
$\gamma = \frac{1}{M} \sum_{i=1}^{M} \left(\frac{z_i - \mu}{\sigma}\right)^3$   (14)
$\kappa = \frac{1}{M} \sum_{i=1}^{M} \left(\frac{z_i - \mu}{\sigma}\right)^4$   (15)
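A compact sketch of turning the IMFs of one epoch into the feature vector defined by Eqs. (12)–(15); the function name and the use of scipy.stats are illustrative choices, and with seven IMFs per epoch this yields a 28-dimensional vector.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def moment_features(imfs):
    """Return [mu, sigma^2, gamma, kappa] for each IMF, following Eqs. (12)-(15)."""
    feats = []
    for z in imfs:                                    # one row per IMF
        feats += [np.mean(z),                         # Eq. (12)
                  np.var(z),                          # Eq. (13)
                  skew(z),                            # Eq. (14)
                  kurtosis(z, fisher=False)]          # Eq. (15), plain (non-excess) kurtosis
    return np.asarray(feats)

# With seven CEEMDAN IMFs per 30-s epoch this gives a 28-dimensional feature vector.
```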
$\sigma^2$ signifies the dispersion of the values about the mean. $\gamma$, on the other hand, is a measure of the degree of asymmetry of a distribution around its mean: for a unimodal distribution, if the data are spread out more to the left of the mean than to the right, $\gamma$ is negative; if the data are spread out more to the right of the mean than to the left, $\gamma$ is positive. $\kappa$ describes the peakedness of a data distribution; higher values of $\kappa$ evince a higher, sharper peak, and lower values the opposite. Fig. 1 demonstrates the dissimilarity of the CEEMDAN IMFs among the various sleep stages, a fact confirmed by numerous visual inspections. The differences of a particular CEEMDAN IMF among the sleep stages, visible in Fig. 1, manifest the effectiveness of CEEMDAN. In this work we utilize higher order statistical features to capture this dissimilarity mathematically. Table 2 shows the averages of $\mu$, $\sigma^2$, $\gamma$ and $\kappa$ computed from $\widetilde{\mathrm{IMF}}_1$ to $\widetilde{\mathrm{IMF}}_4$ for the various stages of sleep; the average values are clearly distinguishable for the different stages. The box-whisker plots of Fig. 2 serve as a visual hypothesis test and also confirm that the statistical features capture the differences of Fig. 1 mathematically. Since, in general, the notches in Fig. 2 do not overlap, the selected set of features possesses good discriminatory capability.

Statistical hypothesis testing must be performed in the feature selection stage of any classification problem. This ensures that the chosen set of features has discriminatory capability among the various classes, and it helps ascertain whether that discriminatory capability is statistically significant. To assess whether the values of the features in the six classes differ significantly, we perform a nonparametric Kruskal–Wallis one-way analysis of variance test [28]. Unlike its parametric equivalent, one-way analysis of variance (ANOVA), the Kruskal–Wallis test does not require the assumption that the data samples are normally distributed.
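As a hypothetical illustration of this test, the snippet below applies SciPy's Kruskal–Wallis H-test to a single feature grouped by the six sleep stages; the helper name, the dictionary layout and the 0.05 threshold are assumptions mirroring the description above.

```python
from scipy.stats import kruskal

def feature_is_discriminative(feature_by_stage, alpha=0.05):
    """Kruskal-Wallis H-test of one feature across the six sleep-stage groups.

    feature_by_stage: dict mapping stage label -> 1-D array of feature values,
    e.g. {"AWA": ..., "S1": ..., "S2": ..., "S3": ..., "S4": ..., "REM": ...}.
    """
    groups = list(feature_by_stage.values())
    stat, p_value = kruskal(*groups)
    return p_value < alpha, p_value   # significant if p < 0.05
```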
Fig. 2. Box-whisker plots of (a) $\mu$, (b) $\sigma^2$, (c) $\gamma$, and (d) $\kappa$ of $\widetilde{\mathrm{IMF}}_1$ manifest the superior discriminatory capability and efficacy of the selected features.
Apart from the Kruskal–Wallis one-way analysis of variance, we also perform a multiple comparison test. Both of the aforementioned tests are carried out in MATLAB's Statistics Toolbox at the 95% confidence level; hence, a difference is statistically significant if p < α (= 0.05). All the features pass both hypothesis tests, indicating that their differences among the six stages of sleep are statistically significant.

4. Bagging

Ensemble learning is a process by which multiple classification models are strategically generated and combined to solve a particular machine learning problem. Bootstrap Aggregating, known as Bagging, is one of the earliest ensemble meta-algorithms, proposed by Breiman [29]. It is one of the most intuitive and simplest to implement, yet yields surprisingly good performance [30]. The whole process of Bagging is depicted as follows.

Input:
• Sequence of N training instances S = [(x_i, y_i)] of C classes with labels y_i ∈ ω, ω = {ω_1, ω_2, ..., ω_C} and i = 1, 2, 3, ..., N.
• Weak learning algorithm WeakLearn.
• Percent (or fraction) F to create bootstrapped training data.
• Integer T specifying the number of iterations.

Training: for t = 1, 2, 3, ..., T do
1. Take a bootstrapped replica S_t by randomly drawing F percent of S.
2. Invoke WeakLearn with S_t and receive the hypothesis (classifier) h_t.
3. Add h_t to the ensemble E.
end

Test: given an unlabeled instance x:
1. Evaluate the ensemble E = {h_1, h_2, ..., h_T} on x.
2. Let $v_{t,j} = 1$ if $h_t$ picks class $\omega_j$, and $v_{t,j} = 0$ otherwise.
3. Obtain the total vote received by each class:
$V_j = \sum_{t=1}^{T} v_{t,j}$, where j = 1, 2, ..., C.
4. Choose the class that receives the highest total vote as the final classification.

In Bagging, diversity is obtained by using bootstrapped replicas of the training data. At first, training data subsets are randomly drawn with replacement from the entire training data. Each training data subset S_t is then used to train a different predictor of the same type, yielding a hypothesis h_t; this predictor is denoted by WeakLearn in the algorithm above. Individual hypotheses are then combined by taking a majority vote of their decisions: for any given unlabeled instance, the class chosen by most classifiers is the ensemble decision. In this study, we use decision trees as our weak learning algorithm. Decision trees sequentially split the feature space into unique regions corresponding to the classes; they are multistage decision systems in which classes are sequentially rejected until a finally accepted class is reached. The ensemble of these weak decision trees, with low correlation, low bias but high variance, makes Bagging a strong classifier with both low bias and low variance.

As the steps above indicate, the performance of Bagging depends on the number of iterations T, which is also the total number of decision trees (WeakLearn). The choice of the number of decision trees T is crucial.
Table 3
Description of various classes.

Class   Sleep stages
6       AWA, S1, S2, S3, S4, REM
5       AWA, S1, S2, SWS (S3–S4), REM
4       AWA, S1–S2, SWS (S3–S4), REM
3       AWA, NREM (S1–S4), REM
2       AWA, Sleep (REM & NREM)
If T is too small, the classifier will yield poor performance; if it is too large, the classifier can be computationally very expensive. We explain how T is chosen in the next section.
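A minimal scikit-learn sketch of bagged decision trees on the moment features is given below; the original experiments use MATLAB's Statistics Toolbox, so the Python library, the placeholder random data and the hyperparameter values (e.g. T = 72 for 6-stage scoring) are only illustrative.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# X_*: (n_epochs, 28) moment features; y_*: stage labels 0-5.
# Random data stands in for the real Sleep-EDF features here.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(7592, 28)), rng.integers(0, 6, size=7592)
X_test, y_test = rng.normal(size=(7596, 28)), rng.integers(0, 6, size=7596)

bagger = BaggingClassifier(
    DecisionTreeClassifier(),   # the weak learner (WeakLearn)
    n_estimators=72,            # T, e.g. the value chosen for 6-stage scoring
    oob_score=True,             # keep an out-of-bag estimate (see Section 5.2)
)
bagger.fit(X_train, y_train)
print("OOB accuracy:", bagger.oob_score_)
print("Test accuracy:", bagger.score(X_test, y_test))
```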
5. Experimental results and discussions

Experiments are performed using the Sleep-EDF database to study the efficacy of the proposed algorithm. The numbers of training and test EEG epochs for Bagging are as given in Table 1. Note that 2- to 6-class sleep stage classification problems are considered in our experiments; these classes are described in Table 3. The five cases explicated in Table 3 are clinically relevant and consistent with the experimental settings of the state-of-the-art studies [8,15,22]. S3 and S4 are collectively called slow wave sleep, owing to the prominence of slow oscillations that are generated within the neocortex [31]; moreover, the newly proposed AASM guideline merges S3 and S4 into N3. Consequently, the study of the combination of S3 and S4, and hence of 5-stage classification, has particular significance. Again, stages S1 and S2 together are called shallow sleep [32], which is why 4-stage classification is considered in Table 3. The detection of REM and non-REM is essential for the diagnosis of certain sleep disorders, including narcolepsy and REM behavior disorder (RBD) [33]; this motivated the analysis of 3-stage classification in this work. In a sleep quality monitoring system, on the other hand, the duration of overnight sleep has to be determined in order to ascertain the sleep quality of the subject, which is why 2-stage classification is considered in Table 3. Besides, the choice of classes in Table 3 facilitates comparison of the performance of our method with previously published ones. The experiments are carried out using MATLAB 2013a on a computer with an Intel(R) Core(TM) i5-3470 3.2 GHz CPU and 4 GB of RAM. The results of the experiments, along with analyses and discussions, are presented in this section.
5.1. Evaluation criteria

The objective measures used to evaluate the performance of the proposed method are accuracy, sensitivity and specificity. They can be expressed by the following formulae:
$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$   (16)
$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$   (17)
$\mathrm{Specificity} = \frac{TN}{TN + FP}$   (18)
where TP is the number of positive class epochs identified correctly, TN is the number of negative class epochs classified correctly, FP is the number of negative class epochs identified incorrectly as positive class and FN is the number of positive class epochs identified incorrectly as negative class. Since automated sleep staging is a multi-class classification problem, we have calculated class-specific sensitivity and class-specific specificity in this study [34–36].
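For a multi-class confusion matrix such as those in Tables 6 and 7, class-specific sensitivity and specificity follow from a one-vs-rest reading of Eqs. (17) and (18); the helper below is an illustrative sketch, not the authors' code.

```python
import numpy as np

def class_specific_metrics(cm):
    """Per-class sensitivity and specificity from a confusion matrix.

    cm[i, j] = number of epochs of true class i assigned to class j
    (rows: expert scoring, columns: automatic classification)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)   # Eq. (17), one class vs. the rest
    specificity = tn / (tn + fp)   # Eq. (18)
    return sensitivity, specificity
```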
Fig. 3. Out-of-bag classification error as a function of the number of weak learners (decision trees).
5.2. Ensemble quality evaluation

One major advantage of Bagging is that the predictive power of the model can be evaluated using out-of-bag (oob) observations, without setting aside a test dataset. Computing the classification performance of the model on oob observations is useful for several reasons. First, it manifests the predictive power of the model. Second, it allows the optimal number of decision trees needed to obtain the best accuracy to be determined. Third, this approach is particularly helpful for assessing the efficacy of the model when little data are available. Fourth, it has been shown empirically that the oob estimate is as accurate as using a test set of the same size as the training set [29]. The oob error is an unbiased estimate of the true ensemble error and eliminates the need for cross-validation or a separate test dataset.

Fig. 3 depicts the oob classification error for 2- to 6-stage sleep classification; note that the oob error is computed using the training data set. It can be seen from Fig. 3 that the classification error is lowest when the numbers of decision trees are 72, 82, 172, 119 and 30 (indicated by arrows) for 6-stage to 2-stage sleep classification, respectively. Beyond these values, the changes in classification error are negligible, so these values are used for classifying the test data. Fig. 4 shows that the proposed method yields good accuracy for a wide range of values of the number of iterations and the minimum number of leaves of the decision tree, which confirms the robustness of the proposed scheme: the method does not work only for a few sporadic values of T but demonstrates high accuracy even for suboptimal choices of the number of decision trees. The low classification error for various numbers of decision trees seen in Fig. 3 corroborates this and indicates the robustness of the proposed algorithm.
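A sketch of how an oob error curve like that of Fig. 3 could be reproduced with bagged trees in scikit-learn; the library, the scanned range of tree counts and the placeholder training data are assumptions rather than the paper's MATLAB setup.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

# X_train, y_train: training features and labels (illustrative placeholders).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(7592, 28)), rng.integers(0, 6, size=7592)

tree_counts = list(range(10, 201, 10))
oob_errors = []
for n_trees in tree_counts:
    # Default base estimator is a decision tree, i.e. bagged trees.
    model = BaggingClassifier(n_estimators=n_trees, oob_score=True, random_state=0)
    model.fit(X_train, y_train)
    oob_errors.append(1.0 - model.oob_score_)   # oob classification error

best_T = tree_counts[int(np.argmin(oob_errors))]
print("Number of trees with lowest oob error:", best_T)
```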
Fig. 4. Variation of accuracy with the number of iterations and the minimum number of leaves of the decision tree. The high levels of accuracy over a wide range of values of these two parameters confirm the robustness of the proposed scheme.
5.3. Performance for various classifiers

The performance of the proposed method is first investigated for various classifiers, including Bagging, in order to establish the suitability of Bagging as compared to traditionally used classifiers. Table 4 reports the performance of these classifiers for our feature extraction scheme. Apart from Bagging, we used five classifiers: Naive Bayes, Discriminant Analysis (DA), a neural network, k-Nearest Neighbors (kNN) and Adaptive Boosting (AdaBoost). For a detailed review of these classifiers, [37] can be consulted. All experiments are carried out using MATLAB's Statistics Toolbox and Neural Network Toolbox.

For the neural network, a two-layer feed-forward network is used to classify the sleep stages. We employ the hyperbolic tangent sigmoid transfer function in both the hidden and the output layers of neurons. The number of neurons in the output layer equals the number of classes, and the hidden layer contains 20 neurons. We utilize the scaled conjugate gradient algorithm to train the network; as soon as the mean squared error (MSE) starts to increase during validation, training stops, with a view to avoiding overfitting. The training, validation and testing data are set to 60%, 5% and 35% of the entire dataset, respectively. For kNN, the accuracy of each class is found for k = 1, 2, 3, ..., 100, where k is the number of nearest neighbors, and the best results are reported in Table 4; in this work, the highest accuracy for kNN is obtained for k = 1. The Discriminant Analysis (DA) classification model is used to classify sleep stages with three popular discriminant functions: the linear (Fisher) discriminant, the quadratic discriminant and a Mahalanobis distance based function. The computational costs of DA, Bagging and Naive Bayes are very low as opposed to the rest of the classifiers.
Table 4
Performance of the proposed method for various classifiers. The highest accuracy values are indicated in boldface.

Classifier          6-Class   5-Class   4-Class   3-Class   2-Class
Naive Bayes         42.59%    74.57%    58.45%    67.76%    59.50%
DA (linear)         30.41%    34.60%    36.04%    35.78%    47.18%
DA (quadratic)      83.49%    87.42%    76.02%    73.92%    67.89%
DA (Mahalanobis)    77.06%    81.91%    85.58%    89.12%    99.08%
Neural network      78.66%    81.28%    85.77%    87.67%    97.76%
kNN                 42.99%    46.53%    47.84%    50.46%    52.56%
AdaBoost            86.89%    89.52%    91.48%    93.18%    98.29%
Bagging             86.89%    90.69%    92.14%    94.10%    99.48%
For linear DA, the model has the same covariance matrix for each class and only the means vary; for quadratic DA, both the means and the covariances of the classes vary. This could be why quadratic DA gives higher accuracy than linear DA and yields good accuracy for 6- and 5-stage classification. However, the Mahalanobis discriminant function performs better than both of the aforementioned discriminant functions for 2- to 4-stage classification, as Table 4 reveals. We can also see from Table 4 that, in general, the performance of the classifiers becomes worse as the number of classes increases, owing to the class imbalance problem. A close inspection of Table 4 further shows that ensemble learning based classifiers such as AdaBoost and Bagging give consistent and superior performance for 2- to 6-stage classification, which suggests that ensemble learning based classification models can be a promising choice for automated sleep staging. Nevertheless, Bagging emerges as the best classifier, outperforming the others in terms of accuracy.

5.4. Comparison with existing methods

For comparison, results obtained from the same dataset are first considered. Table 5 provides the accuracies of various methods that utilize the Sleep-EDF dataset.
Table 5
Performance comparison of various state-of-the-art methods that employ the Sleep-EDF dataset. Only the highest accuracy values of each method are reported. The highest accuracy values are indicated in boldface.

Authors                   6-Class   5-Class   4-Class   3-Class   2-Class
Berthomier et al. [22]    –         71.2%     74.5%     88.3%     95.4%
Hsu et al. [14]           –         87.20%    –         –         –
Liang et al. [8]          –         83.60%    –         –         –
Ronzhina et al. [13]      76.70%    –         81.42%    88.97%    96.90%
Zhu et al. [15]           87.50%    88.90%    89.30%    92.60%    97.90%
Proposed method           86.89%    90.69%    92.14%    94.10%    99.48%
Table 6
Confusion matrix for 5-stage sleep classification (rows: expert scoring; columns: automatic classification).

Expert        S1     S2     S3 and S4   AWA    REM    Sen (%)   Spe (%)
S1            142    55     1           32     72     47.02     99.03
S2            3      1673   45          54     36     92.38     94.80
S3 and S4     4      43     585         16     2      90        98.93
AWA           42     79     18          3838   51     95.28     96.03
REM           17     109    4           24     651    80.87     97.48
All the accuracies reported in this section and in Table 5 are the best values for a given method. It can be seen that the proposed method outperforms the others for 2- to 5-stage classification. For 6-stage classification, it gives an accuracy which exceeds that of all the other works except [15], and, as Table 5 shows, the difference in 6-stage classification accuracy between [15] and the proposed method is trivial.

Results of several prior studies that did not use the Sleep-EDF dataset are also reported for the sake of comparison. Krakovská and Mezeiová [12] reported 81% agreement with the hypnograms of experts for 5-stage classification. Fraiwan et al. [38] used Renyi's entropy based features extracted from various time-frequency distributions for sleep stage identification from single channel EEG. Charbonnier et al. [39] utilized multiple physiological signals (EEG, EMG and EOG) to devise a multichannel-based two-stage classification scheme. Fraiwan et al. [38] and Charbonnier et al. [39] reported 83% and 85.50% accuracy, respectively, for five-stage sleep classification using their own sleep data. Our algorithm yields better accuracy than all of the aforementioned works for 5-class classification. Berthomier et al. [22] proposed a fuzzy logic based iterative method to classify sleep stages from single channel EEG data; besides the Sleep-EDF dataset, they also used their own EEG recordings and reported 96%, 92.10%, 84.90% and 82.90% for 2- to 5-stage classification, respectively. Thus, the accuracy values obtained from both datasets in [22] are lower than those of our method. From 2- to 5-stage sleep classification, the accuracies of the proposed scheme are by far the best reported using more than 10,000 epochs from this public sleep EEG dataset.

5.5. Discussion

Here we discuss some of the advantages of the proposed method and the significance of the results. A major obstacle in automatic sleep staging is the classification of the sleep stages S1 and REM; in fact, they are almost indistinguishable by visual inspection. Although the works of [8] and [15] report high accuracy, they give a sensitivity of 28.45% and 15.8%, respectively, for S1 in 5-stage classification. On the other hand, as seen from the confusion matrix in Table 6, the proposed method gives a sensitivity value of 47.02%, which is much higher than those of [8] and [15]. This observation is further confirmed by the confusion matrix in Table 7, which shows a similar sensitivity value for 6-stage sleep classification. In addition, the proposed method provides 95.28% and 92.38% sensitivity for AWA and S2, respectively, in 5-stage classification, which are higher than those of [8] and [15].
Again, the class-specific specificity values of the proposed scheme for all of the stages of sleep are high, as Tables 6 and 7 show. The proposed method's S4 detection accuracy is slightly low, as Table 7 indicates. This could be due to the similarity of the brain rhythms of the S3 and S4 stages: the EEG recordings of both S3 and S4 are dominated by delta waves (0–4 Hz). This observation is corroborated by the fact that when S4 is combined with S3, the detection accuracy increases to 90%, as can be seen from Table 6.

Although the AASM published its new sleep scoring manual in 2007, the R&K standards are still in use in most sleep laboratories around the world; this is why many recent works in the literature continue to report the performance of their sleep scoring algorithms based on the R&K standards [15,40,41]. We have already mentioned that the major changes in the AASM manual from the R&K standards are the merging of stages S3 and S4 into N3, the simplification of many context rules, and the recommendation of sampling rates and filter settings for PSG reporting and for user interfaces of computer-assisted sleep analysis. Note that we have already reported classification results with stages S3 and S4 merged, as 5-stage classification. The 5-stage classification accuracy of the proposed scheme is relatively high (90.69%), as Table 5 shows, which indicates that the proposed scheme should yield good algorithmic performance under AASM scoring as well. Furthermore, while the definition of specifications for data acquisition, such as minimal sampling rates, filter settings and display resolutions, is required, most of these recommendations have been standard in many sleep laboratories for years and thus should not affect study results based on the R&K standards significantly [5]. As a result, it can be expected that the algorithmic performance of the automated sleep staging scheme propounded herein will be similar under both standards.

It is well known that human beings do not spend an equal amount of time in each stage of sleep. Consequently, sleep-EEG recordings do not contain an equal number of epochs for each of the six stages; in fact, the distribution of epochs among the stages is highly imbalanced, as a close observation of Table 1 confirms. More than half of the epochs belong to the AWA stage, while the epochs of S1, S3 and S4 are very few in number relative to the rest of the sleep stages. This introduces a class imbalance problem into sleep-EEG signal analysis.
Table 7
Confusion matrix for 6-stage sleep classification (rows: expert scoring; columns: automatic classification).

Expert   S1     S2     S3    S4    AWA    REM    Sen (%)   Spe (%)
S1       143    51     0     0     25     83     47.35     98.40
S2       1      1612   63    27    18     90     89.01     93.85
S3       6      28     263   11    26     2      78.27     98.14
S4       37     58     51    122   46     0      38.85     99.36
AWA      49     58     6     0     3829   86     95.06     95.16
REM      12     132    0     4     26     631    78.39     95.81
Class imbalance is a problem common to multifarious application domains. When instances of one class in a training data set vastly outnumber the instances of the other classes, traditional data mining algorithms tend to favor classifying examples as belonging to the overrepresented (majority) class and end up creating suboptimal classification models in the process [42]. Despite showing comparable or better performance than various state-of-the-art algorithms, the proposed automated sleep staging scheme is affected by the class imbalance problem. How novel variants of Bagging and other ensemble learning algorithms can counteract class imbalance can be an interesting topic of further research.
6. Conclusions

In this article, an automatic method for sleep stage classification using CEEMDAN and higher-order moments has been proposed. Statistical analyses have been carried out to justify the use of higher-order moments. Bagging has been employed to classify sleep stages, and the predictive power of Bagging and the choice of the optimal number of decision trees have been investigated. The performance of the proposed method has been studied using a number of classifiers including Bagging; the results show that, for the same features, Bagging is superior to the other classifiers in terms of accuracy. Next, the performance of the proposed method has been compared to those of state-of-the-art algorithms for 2- to 6-stage sleep classification. It has been demonstrated that, in general, our method provides better accuracy than the others. Additionally, it gives superior classification of the sleep stages S1 and REM as compared to several recently reported results. We would also like to mention that it uses single channel EEG, whereas most of the works available in the literature employ multichannel EEG signals [10,11] and sometimes multiple physiological signals [10,17,39]. While the former complicates automatic sleep staging, as discussed in the Introduction, the latter suffers from several drawbacks. First, it complicates the preparation procedure for the subject [8]. Second, it requires more electrodes, leading to more interference in the recording and thus degrading its quality. Because it is single-channel based, the proposed method is free from such limitations. Again, the fact that the proposed scheme uses only one channel and does not require any artifact or noise rejection scheme is a major advantage of the proposed method. We thus conclude, as the experimental results and analyses suggest, that the proposed algorithm is efficacious and that its performance makes this work an important step towards a fully automated sleep quality evaluation system.
References

[1] A. Arzi, L. Shedlesky, M. Ben-Shaul, K. Nasser, A. Oksenberg, I.S. Hairston, N. Sobel, Humans can learn new information during sleep, Nat. Neurosci. 15 (10) (2012) 1460–1465.
[2] M.W. Mahowald, C.H. Schenck, Insights from studying human sleep disorders, Nature 437 (7063) (2005) 1279–1285.
[3] A. Rechtschaffen, A. Kales, A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects, US Government Printing Office, Public Health Service, Washington, DC, 1968.
[4] C. Iber, S. Ancoli-Israel, A.L. Chesson, S.F. Quan, The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specification, American Academy of Sleep Medicine, Westchester, USA, 2005.
[5] D. Moser, P. Anderer, G. Gruber, S. Parapatics, E. Loretz, M. Boeck, G. Kloesch, E. Heller, A. Schmidt, H. Danker-Hopfe, B. Saletu, J. Zeitlhofer, G. Dorffner, Sleep classification according to AASM and Rechtschaffen & Kales: effects on sleep scoring parameters, Sleep 32 (2) (2009) 139–149.
[6] T. Penzel, R. Conradt, Computer based sleep recording and analysis, Sleep Med. Rev. 4 (2) (2000) 131–148.
[7] C.-T. Lin, L.-W. Ko, J.-C. Chiou, J.-R. Duann, R.-S. Huang, S.-F. Liang, T.-W. Chiu, T.-P. Jung, Noninvasive neural prostheses using mobile and wireless EEG, Proc. IEEE 96 (7) (2008) 1167–1183.
[8] S.-F. Liang, C.-E. Kuo, Y.-H. Hu, Y.-H. Pan, Y.-H. Wang, Automatic stage scoring of single-channel sleep EEG by using multiscale entropy and autoregressive models, IEEE Trans. Instrum. Meas. 61 (6) (2012) 1649–1657. [9] S. Motamedi-Fakhr, M. Moshrefi-Torbati, M. Hill, C.M. Hill, P.R. White, Signal processing techniques applied to human sleep EEG signals a review, Biomed. Signal Process. Control 10 (2014) 21–33. [10] R. Agarwal, J. Gotman, Computer-assisted sleep staging, IEEE Trans. Biomed. Eng. 48 (12) (2001) 1412–1423. [11] C. Held, J. Heiss, P. Estevez, C. Perez, M. Garrido, C. Algarin, P. Peirano, Extracting fuzzy rules from polysomnographic recordings for infant sleep classification, IEEE Trans. Biomed. Eng. 53 (10) (2006) 1954–1962. [12] A. Krakovská, K. Mezeiová, Automatic sleep scoring: a search for an optimal combination of measures, Artif. Intell. Med. 53 (1) (2011) 25–33. [13] M. Ronzhina, O. Janouˇsek, J. Koláˇrová, M. Nováková, P. Honzík, I. Provazník, Sleep scoring using artificial neural networks, Sleep Med. Rev. 16 (3) (2012) 251–263. [14] Y.L. Hsu, Y.T. Yang, J.S. Wang, C.Y. Hsu, Automatic sleep stage recurrent neural classifier using energy features of EEG signals, Neurocomputing 104 (0) (2013) 105–114. [15] G. Zhu, Y. Li, P. Wen, Analysis and classification of sleep stages based on difference visibility graphs from a single channel EEG signal, IEEE J. Biomed. Health Inf. PP (99) (2014), 1–1. [16] S. Imtiaz, E. Rodriguez-Villegas, A low computational cost algorithm for REM sleep detection using single channel EEG, Ann. Biomed. Eng. 42 (11) (2014) 2344–2359. [17] H. Koch, J.A. Christensen, R. Frandsen, M. Zoetmulder, L. Arvastson, S.R. Christensen, P. Jennum, H.B. Sorensen, Automatic sleep classification using a data-driven topic model reveals latent sleep states, J. Neurosci. Methods 235 (0) (2014) 130–137. [18] A. Humeau-Heurtier, P. Abraham, G. Mahe, Analysis of laser speckle contrast images variability using a novel empirical mode decomposition: Comparison of results with laser doppler flowmetry signals variability, IEEE Trans. Med. Imaging 34 (2) (2015) 618–627, http://dx.doi.org/10.1109/TMI.2014.2364079. [19] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioNet. PhysioToolkit, Components of a new research resource for complex physiologic signals, Circulation 101 (23) (June 2000) e215–e220. [20] B. Kemp, A. Zwinderman, B. Tuk, H. Kamphuisen, J. Oberye, Analysis of a sleepdependent neuronal feedback loop: the slow-wave microcontinuity of the EEG, IEEE Trans. Biomed. Eng. 47 (9) (2000) 1185–1194. [21] B. Kemp, The sleep-edf database online (2014). URL http://www.physionet.org/ physiobank/database/sleep-edf/. [22] C. Berthomier, X. Drouot, M. Herman-Stoca, P. Berthomier, J. Prado, D. BokarThire, O. Benoit, J. Mattout, M.-P. d’Ortho, Automatic analysis of single-channel sleep EEG: validation in healthy individuals, Sleep 30 (11) (2007) 1587–1595. [23] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.-C. Yen, C.C. Tung, H.H. Liu, The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. R. Soc. Lond. A: Math. Phys. Eng. Sci. 454 (1971) (1998) 903–995. [24] C. Park, D. Looney, P. Kidmose, M. Ungstrup, D. Mandic, Time-frequency analysis of EEG asymmetry using bivariate empirical mode decomposition, IEEE Trans. Neural Syst. Rehabil. Eng. 19 (4) (2011) 366–373. [25] G. Wang, C. Teng, K. Li, Z. 
Zhang, X. Yan, The removal of eog artifacts from EEG signals using independent component analysis and multivariate empirical mode decomposition, IEEE J. Biomed. Health Inform. PP (99) (2015), 1–1. [26] Z. Wu, N.E. Huang, Ensemble empirical mode decomposition: a noise-assisted data analysis method, Adv. Adapt. Data Anal. 01 (01) (2009) 1–41. [27] M. Torres, M. Colominas, G. Schlotthauer, P. Flandrin, A complete ensemble empirical mode decomposition with adaptive noise, in: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 4144–4147. [28] W.H. Kruskal, W.A. Wallis, Use of ranks in one-criterion variance analysis, Journal of the American Statistical Association 47 (260) (1952) 583–621. [29] L. Breiman, Bagging predictors, Mach. Learn. 24 (2) (1996) 123–140. [30] R. Polikar, Ensemble based systems in decision making, IEEE Circuits Syst. Mag. 6 (3) (2006) 21–45. [31] L. Marshall, H. Helgadottir, M. Molle, J. Born, Boosting slow oscillations during sleep potentiates memory, Nature 444 (7119) (2006) 610–613. [32] H. tieng Wu, R. Talmon, Y.-L. Lo, Assess sleep stage by modern signal processing techniques, IEEE Trans. Biomed. Eng. 62 (4) (2015) 1159–1168. [33] A. Iranzo, J.L. Molinuevo, J. Santamara, M. Serradell, M.J. Mart, F. Valldeoriola, E. Tolosa, Rapid-eye-movement sleep behaviour disorder as an early marker for a neurodegenerative disorder: a descriptive study, Lancet Neurol. 5 (7) (2006) 572–577. [34] Siuly, Y. Li, A novel statistical algorithm for multiclass EEG signal classification, Eng. Appl. Artif. Intell. 34 (2014) 154–167. [35] E.D. beyli, Decision support systems for time-varying biomedical signals: EEG signals classification, Expert Syst. Appl. 36 (2, Part 1) (2009) 2275–2284. [36] N. Fatma Guler, E. Ubeyli, Multiclass support vector machines for EEG-signals classification, IEEE Trans. Inf. Technol. Biomed. 11 (2) (2007) 117–126. [37] K.P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012. [38] L. Fraiwan, K. Lweesy, N. Khasawneh, H. Wenz, H. Dickhaus, Automated sleep stage identification system based on time-frequency analysis of a single EEG channel and random forest classifier, Comput. Methods Prog. Biomed. 108 (1) (2012) 10–19.
[39] S. Charbonnier, L. Zoubek, S. Lesecq, F. Chapotot, Self-evaluated automatic classifier as a decision-support tool for sleep/wake staging, Comput. Biol. Med. 41 (6) (2011) 380–389. [40] T. Kayikcioglu, M. Maleki, K. Eroglu, Fast and accurate PLS-based classification of EEG sleep using single channel data, Expert Syst. Appl. 42 (21) (2015) 7825–7830.
[41] X. Long, J. Foussier, P. Fonseca, R. Haakma, R.M. Aarts, Analyzing respiratory effort amplitude for automated sleep stage classification, Biomed. Signal Process. Control 14 (2014) 197–205. [42] C. Seiffert, T. Khoshgoftaar, J. Van Hulse, A. Napolitano, RUSBoost: a hybrid approach to alleviating class imbalance, IEEE Trans. Syst. Man Cybern. Part A: Syst. Humans 40 (1) (2010) 185–197.