Biomedical Signal Processing and Control 19 (2015) 102–114
Contents lists available at ScienceDirect
Biomedical Signal Processing and Control journal homepage: www.elsevier.com/locate/bspc
A permutation Lempel-Ziv complexity measure for EEG analysis Yang Bai, Zhenhu Liang, Xiaoli Li ∗ Institute of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China
a r t i c l e
i n f o
Article history: Received 6 November 2014 Received in revised form 31 March 2015 Accepted 2 April 2015 Keywords: Electroencephalography Permutation Lempel-Ziv complexity Anesthesia Epileptic seizure
a b s t r a c t Objective: In this study we develop a new complexity measure of time series by combining ordinal patterns and Lempel-Ziv complexity (LZC) for quantifying the dynamical changes of EEG. Methods: A neural mass model (NMM) was used to simulate EEG data and test the performance of the permutation Lempel-Ziv complexity (PLZC) in tracking the dynamical changes of signals against different white noise levels. Then, the PLZC was applied to real EEG data to investigate whether it was able to detect the different states of anesthesia and epileptic seizures. The Z-score model, two-way ANOVA and t-test were used to estimate the significance of the results. Results: PLZC could successfully track the dynamical changes of EEG series generated by the NMM. Compared with the other four classical LZC based methods, the PLZC was most robust against white noise. In real data analysis, PLZC was effective in differentiating the different anesthesia states and sensitive in detecting epileptic seizures. Conclusions: PLZC is simple, robust and effective for quantifying the dynamical changes of EEG. Significance: We suggest that PLZC is a potential nonlinear method for characterizing the changes in EEG signal. © 2015 Elsevier Ltd. All rights reserved.
1. Introduction Detection of dynamical changes of nonlinear time series is an important issue in physical, medical, engineering and biology sciences [1]. In mental and neurological disorders, structural changes in the electroencephalogram (EEG) frequently indicate an underlying pathological process. Accurately capturing the transitions from a normal to an abnormal state can be helpful for the diagnosis and treatment of underlying disease. For example, a reliable dynamical measurement based on EEG in anesthesia can give an important reference guide in assessing the anesthesia state of patients [2–4]. During the last decades, a number of methods have been proposed to detect dynamical changes of physiological systems, such as the permutation entropy [5], the Lyapunov exponent [6], the Hurst exponent [7], the correlation dimension [8] and the LempelZiv complexity (LZC) [9,10]. Particularly, LZC proposed by Lempel and Ziv [10] and its derivatives have found numerous applications in characterizing the randomness of biological signals, especially in EEG analysis [11–14]. LZC evaluates the randomness of finite data-series by measuring the number of distinct sub-strings and
∗ Corresponding author. Tel.: +86 10 5880 2032; fax: +86 10 5880 2032. E-mail address:
[email protected] (X. Li). http://dx.doi.org/10.1016/j.bspc.2015.04.002 1746-8094/© 2015 Elsevier Ltd. All rights reserved.
their rate of occurrence along a given sequence. Generally LZC is easily affected by artefacts and should not be used in distinguishing deterministic chaos from noise [15–17]. Hence, traditional LZC algorithm should be improved. Recently, an ordinal time series analysis method was proposed by Bandt and Pompe [18] termed permutation entropy. The permutation entropy measures the irregularity of non-stationary time series. It considers the order relations between the values but not the values themselves. It has shown several advantages, such as simplicity, robustness and low complexity in computation [18,19]. Facilitated by these advantages, permutation entropy [5,18,20,21] and its derivatives such as permutation conditional mutual information [22] and permutation transfer entropy [23] have been successfully used in understanding chaotic systems especially in physiological systems. These motivate us to explore whether a permutation-based LZC method, namely the permutation LempelZiv complexity (PLZC) could better detect the dynamical changes in nonlinear time series like EEG. In this study, the PLZC algorithm was derived; then the performance of the PLZC was tested using EEG data generated by a neural mass model. Through application on real anesthesia and epilepsy EEG data, we found that the PLZC was more capable of extracting the dynamic characteristics in EEG signals. Several statistical methods, like the Z-score model, two-way ANOVA test and t-test, were used to verify the significance of the results.
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
2. Methods 2.1. Lempel-Ziv complexity Lempel–Ziv complexity analysis is based on a coarse-graining: to estimate LZC, the signal {s(n)}, must be transformed into a finite sequence whose elements are only a few symbols [24], typically a binary one [25]. Conventionally, the signal s(i) is converted into a binary x(i) by a coarse-graining process. The coarse-graining process determines how much information can be retained from the original signal. In this paper, LZC measures based on four commonly used coarse-graining methods (mean: LZCmean , median: LZCmedian , mid-point: LZCmid-p , k-means: LZCk-means ) are calculated and compared with PLZC. In coarse-graining approaches, the original signal is converted to a 0–1 sequence by comparing the signal with threshold(s) determined by the corresponding method (mean, median, mid-point, k-means). Among these methods, LZCmean , LZCmedian and LZCmid-p only need one threshold Td , while the LZCk-means needs two threshold values. LZCmean
(a)
For each window, the mean value of the window time series is selected as Td [26]. The coarse graining can be expressed as,
x (i) =
0,
if s (i) < Td
1,
(1)
otherwise
LZCmedian
(b)
Similar to the LZCmean , the median value of the window time series is chosen as Td . The algorithm then subtracts Td from each data point (s (i) − Td ), we define the positive difference as 1, and the others as 0 [24]. LZCmid-p
(c)
The mid-point value of the window length is determined by the smallest value smin and the largest value smax , Td =
smin + smax 2
(2)
Then the subsequent step is the exactly the same as explained in Eq. (1). LZCk-means
(d)
This method is based on the grouping of data around centroids corresponding to points around which most of the data is agglomerated. In the initial iteration of the method, we set the two initial centroids as z1 (1) = sm + εsm and z2 (1) = sm − εsm . Following [27,28], we set ε = 0.005, and sm is the mean value of the data points of the signal. Then the distance between the centroid to each data point can be obtained,
D1i = s (i) − z1 (1)2 D2i = s (i) − z2 (1)2
103
motifs [3]. The number of possible motifs is m !, where m is the number of data points in each motif. As shown in Fig. 1, when m is equal to 3, there are three points in each motif and total six possible motifs can be obtained (Fig. 1(A)). The parameter determines the sample points spanned by each section of the motif. Fig. 1(B) shows different motifs for different lags ( = 1 and 2) under the given order (m = 3). In this way, we transformed the EEG data points into a finite symbolic sequence with possible elements of 1–6 (the corresponding motif number). In order to ensure that all data points are mapped into one of the possible permutations, when values were equal, we ordered by the index of the data (i.e. the index acted as the tie-breaker for equal values). For example, if x(i) = x(i + 1), we would regard it as x(i) < x(i + 1) for permutation. The most important parameters for this permutation procedure are m and . As showed in Fig. 1(C), different parameters would generate different symbolic sequences. The selection of these two parameters will be discussed in Section 3.2. Then, we calculated the complexity of the permutation sequence based on Lempel-Ziv complexity. There are generally two definitions for the complexity parsing. In this study, we used the primary LZC measure [10], which is most widely used in applications with two or more symbols. As shown in Fig. 2, the detailed permutation Lempel-Ziv complexity measure is calculated as follows: Block 1: Transform the signals into a finite sequence {x(n)} with the permutation procedure above-mentioned. In this way, the signals will be replaced by no more than m ! kinds of symbols representing permutation patterns. Block 2: Initialize the PLZC measure. Let S and Q denote the first and the second symbol, respectively, of the {x(n)} and set the complexity factor c(n) = 1. Block 3: Merge S and Q into SQ. SQ v denotes the string derived from SQ after the operation of deleting the last character of SQ. For example, S = x(1), x(2),..., x(i), Q = x(i + 1),..., x(i + j − 1), x(i + j), then SQ = x(1), x(2) v,..., x(i), x(i + 1), . . ., x(i + j − 1), x(i + j) and SQ v = x(1), x(2),..., x(i), x(i + 1), . . ., x(i + j − 1). Block 4: Judge whether Q has reached the last symbol of the sequence. If reached, proceed with normalization (below). Block 5: List the subsequences of SQ v and collect all of them in a vocabulary denoted as SQ vsub . Then, if Q belongs to SQ vsub , go to Block 6; otherwise, Q is a new sequence and go to Block 7. Block 6: Update Q by adding the next symbol to it and go to Block 3. Block 7: Update S x as SQ and set Q equal to the next symbol of {x(n)}. Meanwhile, the complexity factor c(n) is increased by one. And then go to block 3. Block 8: Now c(n) is the complexity of the symbol sequence {x(n)} which denotes the number of distinct patterns in the source sequence. The total number of subsequences present in {x(n)} has an upper bound [29], denoted as L(n): L(n) = c(n)[logm! {c(n)} + 1]
(5)
(3) Then, the PLZC output is the normalized c(n), defined as:
Now, the signal can be converted into a binary sequence:
x (i) =
1ifD1i < D2i 0ifD1i
≥
D2i
(4)
2.2. Permutation Lempel-Ziv complexity In this study, we adopted a method using order patterns to generate symbolic sequences for use with LZC. The permutation scheme is illustrated in Fig. 1. The patterns are also referred to as
PLZC =
c(n)[logm! c(n) + 1] n
(6)
where n denotes the total length of the symbol sequence. When n is very large, PLZC can be simplified as
PLZC =
c(n){logm! n} n
(7)
104
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
Fig. 1. Illustration of the permutation process with m = 3. (A) The possible motifs for m = 3. (B) Transform of a signal into a symbol sequence using permutation.
3. Simulation and results 3.1. Neural mass model In order to test the performance of PLZC, a neural mass model, known as the nonlinear lumped-parameter cerebral cortex (LPCC) model [30,31] was used in this study. The model is governed by six differential equations, as follows:
⎧ y˙ 0 (t) = y3 (t) ⎪ ⎪ ⎪ ⎪ ⎪ y˙ 3 (t) = AaS(y1 − y2 ) − 2ay3 (t) − a2 y0 (t) ⎪ ⎪ ⎪ ⎪ ⎨ y˙ 1 (t) = y4 (t) ⎪ y˙ 4 (t) = Aa{p(t) + C2 S[C1 y0 (t)]} − 2ay4 (t) − a2 y1 (t) ⎪ ⎪ ⎪ ⎪ ⎪ y˙ 2 (t) = y5 (t) ⎪ ⎪ ⎪ ⎩ 2
loop, while C3 C4 mean the average number of synaptic contacts in the inhibitory feedback loop. The extrinsic input p(t) represents a Gaussian white noise with assigned mean value and variance (mean (p) = 75, std (p) = 25), and it describes the overall density of action potentials coming from other regions. In this model, the differential equations were solved numerically using a fourth-fifth order Runge–Kutta algorithm and all the simulations were set to zero for initialization. In order to avoid transient behavior, we discarded the first 1000 points of the simulated signals. 3.2. Parameter selection
(8)
y˙ 5 (t) = Bb{C4 S[C3 y0 (t)]} − 2by5 (t) − b y2 (t)
Following [31,32], we set all the parameters in this model based on physiology. The parameters A and B modulated the balance of excitation and inhibition of the neurons, respectively. By altering the excitatory and inhibitory parameters, the model can produce signals that strongly resemble intracranial EEG recording [31]. The parameters a and b are membrane average time constant and dendritic tree average time delays, respectively. Parameters C1 C2 mean the average number of synaptic contacts in the excitatory feedback
The lag refers to the number of sample points spanned by each section of the motif. It gives the resultant fraction characteristics of the motifs. In this study, we chose the using an autocorrelation function (ACF). ACF calculates the cross-correlation of a signal with itself. When the ACF has decayed to e−1 of its peak value, the corresponding lag was regarded as the optimal value [31,33]. We use the neural mass model in illustrating the selection procedure. The simulated EEG signals are shown in Fig. 3(A) with sampling rate of 100 Hz. As shown in Fig. 3(B), when > =9 the ACF will be less than e−1 . Therefore, in this signal analysis, = 9 was recommended as the optimal value for the PLZC calculation. In practical applications, the selection of the lag is associated with sampling frequency. We resampled the simulation signal at different rates, and then calculated the corresponding using the ACF e−1 rule.
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
105
Fig. 2. The flowchart of the PLZC calculation.
Fig.3(C) gives the relationship between and the sampling rate. We can see that with increasing sampling rate, the corresponding lag value increased. This linear relation between the sampling frequency and the lag was also suggested in [22]. Hence, we suggested that the ACF should be calculated to select an appropriate lag before PLZC calculation, especially with different sampling rates. The order m means the number of data points that are included in the motif. In the traditional permutation process, m was usually recommended to be 3–7 [18,22]. When m < 3, there would be too few possible patterns and the permutation would do not make sense. While for m > 7, the PLZC computation will be too complex and expensive, because the coarse-graining procedure of PLZC generates m ! possible values. If we select a larger m, more symbols may be obtained and the PLZC will spend more time in computation. Furthermore, to ensure that every possible ordinal motif occurs in the signals of length N, the condition m ! ≤ N − (m − 1) must hold. Moreover, N should satisfy the condition: N » m ! + (m − 1) as described in [22,31,34] to avoid under sampling. So, in order to be sensitive to the instantaneous characteristic changes of nonlinear dynamic systems and also economize computation time, a large value of m and N should not be selected. We recommend m = 4 for N ≥ 1000, or m = 5forN ≥ 2000, based on our experience, as the value is usually smaller than m !. In this study, we choose a low dimension m = 4 and N = 1000 when calculating PLZC.
3.3. Simulation results In this study, we used the neural mass model to imitate epileptic activity, following the parameter settings of [31]. The excitatory parameter A was increased from 3.25 to 3.8 during the time interval 30–130s. The simulated EEG signals were sampled at 200 Hz and the main parameter values (Table 1) of this model were the standard values introduced in [35]. In this way, the model produced signals that were similar to EEG during transition from normal to epileptic activity. We tested whether the variation of the PLZC and the classical LZC methods as a function of time or certain time-varying parameters could accurately indicate interesting dynamical changes in the EEG time series. We partitioned the long time series into (overlapping or non-overlapping) blocks of data of short length N, and computed PLZC for each data block. The simulated EEG and the corresponding results are depicted in Fig. 4. As the excitatory parameter increased, simulated epileptic activity emerged in the signal. The spectrogram shows the transformation from normal state to epilepsy state (see in Fig. 4(C)). All the methods can track the state variation in EEG data, though they output different complexity values in the same state (shown in Fig. 4(D)). All of the indices were normalized between 0 and 1. As the state transformed, PLZC had the most distinct change (near 0.4), compared to the others. Correlation coefficients were measured
106
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
Fig. 3. The parameter selection for lag using the autocorrelation function (ACF). When the ACF decays to e−1 of its peak value, the corresponding lag is selected as for permutation. (A) Simulated EEG signal. (B) Selection for the simulated signal. (C) The relationship between and sampling rate.
between these indices and the excitatory parameter A. Statistics based on 50 runs showed that the PLZC had higher correlation coefficient (r) than the other methods (see in Fig. 4(E)) which suggest that PLZC was more sensitive and could track the dynamic changes more accurately. 3.4. Robustness to noise In order to test the robustness of PLZC, we added independent white noise to the simulated EEG signals and investigated PLZC’s performance against noise compared with the other four traditional Lempel-Ziv complexity methods. Simulated EEG series represent normal (A = 3.25 in Table 1) and epileptic states (A = 3.8 in Table 1) were generated by the neural mass model and Gaussian white noise generator using Matlab 2013B. For each state, 30 simulated series of 3 × 104 data points were generated and then augmented with Table 1 The main parameters of the neural mass model followed [35]. Parameter
Interpretation
Value
A
Average excitation synaptic gain Average inhibitory synaptic gain Membrane average time constant and dendritic tree average time delays Average number of synaptic contacts in the excitatory feedback loop Average number of synaptic contacts in the inhibitory feedback loop
(3.25 − 3.8)mV
B a, b
C1 , C2
C3 , C4
22 mV a = 100s−1 b = 50s−1
increasing levels of white noise. The signal-to-noise ratio (SNR) was set from −20 dB to 20 dB with step size one dB. All results were based on statistics computed from the 30 repetitions. T-test was used to test whether the complexity values calculated in these two states were significantly different. For all tests, statistical significance was set at the 0.05 level. We used this significance level as the standard for evaluating if a method could distinguish the different states. Firstly, we calculated the PLZC on Gaussian white noise and compared the results with results calculated from the normal (nonepileptic) state signals with different amounts of augmented white noise. As shown in Fig. 5, the PLZC output for noise-contaminated data was significantly different from noise until the SNR decreased to 2 dB. In contrast, the best performance of the other three measures was 4 dB. This indicates that PLZC is more robust to Gaussian white noise. As shown in Fig. 6, with increasing noise, it became more difficult to clearly distinguish normal and epileptic states for all methods. However, PLZC showed satisfying performance in distinctly differentiating the two states until the SNR decreased to −8 dB, whereas the limits for the other measures were in the range of −5 to 3 dB. The results demonstrated that the proposed method based on permutation was more robust to noise and measured the dynamical characteristics of EEG signals at different states better than the other four LZC-based indices. 4. Application to real EEG recordings
C1 = 135 C2 = 0.8C1
C3 = 0.25C1 C4 = 0.25C1
4.1. Application to anesthesia EEG recordings 4.1.1. EEG data collection and preprocessing Real EEG data were recorded at the No.1 hospital of Qinhuangdao, China. EEG signals were recorded during anesthesia from 12 patients (seven female) aged 20–65 years who underwent hysterectomy or cholecystectomy operations. Subjects classified
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
107
Fig. 4. The NMM simulation and results with different LZC methods. (A) Simulated EEG signal by neural mass model, including normal state and epileptic activity. (B) The corresponding parameter A in the model, modulated from 3.25 to 3.8 to change the excitation in this model. (C) The spectrogram of the simulated EEG. (D) The time course of the indices from the five methods calculated from the simulated EEG. (E) Correlation coefficient between the indices and parameter A.
as American Society of Anesthesiologists (ASA) I-II were enrolled. Approval was obtained from the ethics committee of No.1 hospital of Qinhuangdao, and all the subjects provided written informed consent. All EEG recordings were collected using Bio-Acquisition Systems (Bio-AMP8, Kangpu Medical, Huzhou, Zhejiang, China). The frequency band of the amplifier ranged from 0.5 to 100 Hz. All impedances were checked regularly throughout the surgical
procedure and kept below 5 k. All data during surgeries were recorded and stored on a laptop for off-line analysis. EEG signals were recorded using five electrodes fixed to the scalp with paste at positions Fp1, F7, Fp2, F8 and Fpz. Electrodes were placed in accordance with the international 10–20 system. The ground electrode position was Fpz. We used a bipolar montage to collect two EEG signals from four EEG electrodes (Fp1-F7 and Fp2-F8).
108
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
Fig. 5. Performance of the five methods in distinguishing the “pure” Gaussian white noise from simulated EEG data with different levels of noise. Thirty series of 3 × 104 data points of Gaussian white noise and normal state EEG data generated by the neural mass model were used. (A–E) Results of each method. We use the (p ≤ 0.05) significance level on the t − test as the criterion that indicates whether a method can successfully distinguish the simulated EEG data from noise.
Fig. 6. Performance of the five methods in distinguishing simulated EEG data of different clinical states under different levels of noise. The normal and epileptic EEG data were generated by the neural mass model. We also used the t − test (p < 0.05) to assess whether the methods can differentiate the two states in a statistically significantly manner.
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
109
Fig. 7. An EEG recording from one patient and corresponding complexity versus time. Deep anesthesia state is indicated using vertical line. (A) An EEG recording under anesthesia. (B) The spectrogram of the EEG signal. (C) The corresponding PLZC and LZC indices. (D) The Z-score values of the PLZC.
4.1.2. PLZC analysis PLZC and the other four LZC indexes were calculated on EEG data from all 12 patients with a windowing technique. The length of the moving window was 10fs and the overlap was half of the window length, where the fs was the sampling rate. Fig. 7 shows an EEG recording from one patient, its spectrogram and corresponding five normalized indices. The spectrogram reflects the changes in frequency contents, and anesthesia state transitions, awake-deep anesthesia-recovery, is clearly seen. PLZC of deep anesthesia was significantly lower than that of awake state, and PLZC increased
when the patient recovered from anesthesia. The LZCk-means results also had this tendency, but its variation was smaller. It is difficult to separate different anesthesia states for LZCmedian , LZCmean and LZCmid-p . Next, we tested whether the PLZC of the real EEG was significantly different from the PLZC of linearly filtered noise. In this test, the null hypothesis H0 is that the original data are linearly filtered noise. To assess a significant PLZC difference from this null hypothesis, we used an empirical random shuffling method to generate surrogate signals to estimate the Z-score under H0 [36,37].
110
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
Fig. 8. (A–E) The statistical box plots of PLZC and LZC in awake, deep anesthesia (deep) and recovery states. The notation *** , ** and * indicate significant difference between the different states at P < 0.001, P < 0.01 and P < 0.05, respectively, through one-way ANOVA. (F) Plots of variance of mean value for the five methods with the three clinical states.
The surrogate data procedure destroys the regularity of the components of a signal, and this way the signal would be more random, similar to Gaussian white noise. The Z-scores of the original data were calculated as PLZCnorm =
| PLZC-mean(PLZCshuff )|
(9)
std(PLZCshuff )
Where PLZCshuff represents the PLZC index of Nsurro surrogate signals under the null hypothesis, mean(PLZCshuff ) and std(PLZCshuff ) are the mean and standard deviation of PLZCshuff . The PLZCnorm expresses the number of standard deviations PLZC is away from the mean PLZC of the surrogate data. The 95% statistical threshold may be defined as the mean plus 2 standard deviations. Note that this statistical threshold based on the normality assumption may be verified using the Kolmogorov–Smirnov goodness-of-fit test. Assuming that PLZC is approximately normally distributed in the surrogate data ensemble, the null hypothesis can be rejected at the p < 0.05 level when PLZCnorm > 1.96. In the present study, an ensemble of 100 surrogate data sets were constructed from each original epoch. After analyzing the anesthesia data, results showed that all the Z-score values ranged from 13.7 to 101.6 for the 12 subjects. This means that the original data was significant different from white noise, and the null hypothesis (the original data are linearly filtered Gaussian noise) could be rejected in all subjects. The indices calculated from these 12 patients are given as mean ± std for different states in Table 2. One-way ANOVA was used to test if the indices among different clinical states differed Table 2 The PLZC and LZC indices at different anesthesia states (mean ± std).
PLZC LZCmedian LZCmean LZCk-means LZCmid-p
Awake
Deep anesthesia
Recovery
0.833 ± 0.018 0.541 ± 0.120 0.761 ± 0.222 0.956 ± 0.090 0.489 ± 0.505
0.690 ± 0.010 0.538 ± 0.155 0.628 ± 0.197 0.922 ± 0.102 0.467 ± 0.424
0.786 ± 0.113 0.638 ± 0.210 0.745 ± 0.256 0.961 ± 0.226 0.540 ± 0.630
significantly for each measurement. The notation *** , ** and * indicate significant at P < 0.001, P < 0.01 and P < 0.05, respectively. As shown by Boxplots for all indices in Fig. 8, for both the measurements, the mean value of indices during the deep anesthesia state were generally lower than awake state and recovery state, especially for PLZC (the variance was significant, both p < 0.001). Two-way ANOVA test followed by Bonferroni post hoc test was performed for these indices with three anesthesia states in order to check whether the distributions over the states were significantly different. The Levene test was used in advance to check whether the variances were homogeneous. We found that the different LZC procedures had significant interaction with the anesthesia states (F(8165)=7.65, p < 0.001). The post hoc test also claimed significant interaction between pairwise anesthesia states and pairwise LZC procedures. Based on these tests, we plotted the mean values of indices during the three states for all the measurements together for comparison (shown in Fig. 8(F)). We can see that the variances of PLZC between the two paired states (awake-deep anesthesia and deep anesthesia-recovery) were more prominent. This was also indicated by the partial eta squared (2 ) value, where in awake-deep anesthesia states partial 2 was 0.71 for PLZC, 0.55 for LZCmedian , 0.09 for LZCmean , 0.11 for LZCk-means and 0.02 for LZCmid-p , while in deep anesthesia-recovery states, the partial 2 was 0.66 for PLZC, 0.70 for LZCmedian , 0.32 for LZCmean , 0.09 for LZCk-means and 0.21 for LZCmid-p . In addition, PLZC indices had less overlap between the two paired states (shown in Fig. 8(A–E)). This suggests that the PLZC is useful for detecting different anesthesia states. The results show that PLZC advances our ability to differentiate anesthesia states based on EEG. 4.2. Application to epileptic seizure recordings 4.2.1. EEG data collection and preprocessing The data consists of a group of EEG recordings obtained from seven patients (three female) with absence epilepsy [38]. These patients were aged 8–21 years. The EEG data were recorded by a Neurofile NT digital video EEG system from scalp surface electrodes (International 10-20 System) at 256 Hz sampling rate, using
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
111
Fig. 9. An EEG recording of a patient during epileptic seizure. Seizure period is indicated using vertical line. (B) The spectrogram of the EEG signal, clearly reflecting the epileptic activity by its frequency contents. (C) The PLZC and LZC indices calculated from the EEG signal. (D) The Z-score test result.
a 16-bit analog-to-digital converter, and filtered within a frequency band of 0.5–35 Hz. These EEG recordings were approved by the ethics committee of Peking University People’s Hospital and the patients provided written informed consent that their clinical data may be used and published for research purposes. In this study, signal recorded from electrode C3 was used, and absence seizure EEG signals were selected with major artefacts excluded. The start times of seizures were identified by an epilepsy neurologist.
clearly show the alterations in frequency components of the EEG between normal state and the epileptic state. As we can see, the PLZC values decreased distinctly during the epileptic seizure and then increased to higher levels as the patient recovered to normal state. The corresponding Z-scores showed that the null hypothesis (the original data are linearly filtered Gaussian white noise) could be rejected in all subjects, with Z-scores ranging from 2.2 to 228
4.2.2. PLZC analysis To investigate the usability of PLZC in EEG signal analysis, we tested how well PLZC could detect epileptic seizures. We selected seven EEG recordings from seven different patients. These EEG data were 1 min in length, which included normal state and absence epileptic seizure. Fig. 9 shows an example EEG signal, its spectrogram and the corresponding indices. The changes of EEG are most economically presented in the form of the spectrogram. The colors
Table 3 The PLZC and LZC indices at normal state and epileptic state (mean ± std).
PLZC LZCmedian LZCmean LZCk-means LZCmid-p
Normal
Epileptic
0.884 ± 0.010 0.348 ± 0.026 0.422 ± 0.360 0.960 ± 0.131 0.385 ± 0.277
0.518 ± 0.252 0.255 ± 0.438 0.351 ± 0.420 0.946 ± 0.100x 0.329 ± 0.333
112
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
Fig. 10. (A–E) The box plots of PLZC and LZC in the normal and epileptic states. The notation *** , ** and * indicate significant difference between the different states at P < 0.001, P < 0.01 and P < 0.05 levels, respectively, through t-test. (F) Plots of variance of mean value for the five methods with the two epileptic states.
for these seven recordings. These results suggest that the PLZC was capable of tracking the epileptic state. In order to compare the ability of these indices in detecting epilepsy seizures, the indices in all seven recordings were collected and t-test analysis was performed between normal and epileptic state for each method. The indices in normal state and epileptic state are given in Table 3 in the form of mean ± std, and the corresponding boxplots are shown in Fig. 10. We can see that, for all the measurements, mean values of indices during epileptic state were lower than normal state. The difference between the two states were significant for PLZC and LZCmedian (p < 0.001), and for LZCmean (p < 0.05). Nevertheless, PLZC has less overlap between the two states, which suggests that PLZC can more reliably distinguish normal and epileptic states. In order to compare the capacity of these five methods in detecting the epileptic state, we performed the same statistical analysis as used in Section 4.1.2. The interaction between method and seizure state were significant (F(460)=10.8, p < 0.01). The post hoc test showed that, taken in pairs, every pair of methods also had significant interaction between the two seizure states. The mean values of indices during the two states are shown in Fig. 10(F) for all methods. The partial eta squared (2 ) value was 0.96 for PLZC, 0.79 for LZCmedian , 0.17 for LZCmean , 0.03 for LZCk-means and 0.50 for LZCmid-p . The results show that PLZC indices have more significant variance between the two seizure states. The smaller overlap and significant variation indicate that PLZC has a strong capacity for identifying epileptic activity and describing the dynamical characteristics in epileptic EEG data. 5. Discussion The Lempel-Ziv complexity measure is a useful approach for estimating the randomness of finite symbolic sequences. Larger LZC implies a greater chance of occurrence of new sequence patterns and thus more complex dynamical behavior. As the LZC algorithm is a nonlinear dynamical measure indicating the rate of appearance of new patterns in a time series, only the activities affecting the patterns of underlying system are considered by LZC. It does not matter whether the system is deterministic
or stochastic [39]. EEG has been recognized to result from a dynamical system which is neither purely chaotic nor stochastic. Generally, the normal EEG signal behaves in a disorganized manner, but in some states such as anesthesia, epileptic seizure and sleep, the EEG signals tend to be rhythmic. This change is always reflected by the complexity. Even though the LZC algorithm describes the EEG in the time domain instead of the frequency domain, it has been shown to be sensitive to changes in both the power spectrum and the amplitude distribution in the time domain [39]. Therefore, it is reasonable to characterize the dynamical changes of EEG signals using the LZC algorithm. LZC based algorithms have also been widely used in EEG signals analysis recently, especially in characterizing the EEG signals of some special brain states and several mental and neurological disorders. For example, [24] calculated LZC to detect levels of consciousness during deep anesthesia. [40] demonstrated that LZC is a reliable measure for early detection of seizure onset. LZC has been used to characterize the EEG present in many disorders, such as depression [39], Parkinson’s disease [41], schizophrenia [15], and Alzheimer’s disease [11,42]. The coarse-graining process in LZC is crucial, as it determines how much inherent information can be retained and will consequently influence its performance. There are many different coarse-graining procedures which have been proposed in EEG signals analysis [28,43]. However, these methods almost all focus on threshold value selection, such as the four classical coarse-graining procedures (K-Means, Mean, Mid-point and Median) we chose in this study. From our point of view, these threshold based LZC procedures have a limitation when characterizing EEG signals. As the binary sequence is simply constructed using a threshold value (Td ), this would result in a loss of information from the original signal. This lost information could relate to meaningful aspects of brain activity. In other words, the present coarse-graining procedures cannot precisely reflect dynamical changes in EEG signals. In general, for EEG signals, slow neural rhythms have greater amplitude than faster rhythms. This relation between the amplitude and frequency causes a particular interaction in these classical LZC coarse-graining procedures: high amplitude (low frequency) signals will likely be
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
assigned to different symbolic values, but low amplitude (high frequency) signals will likely be assigned to the same symbolic value [44], as low amplitudes will frequently oscillate far from the threshold. Hence, the classical LZC procedures have low sensitivity in assessing low amplitude changes in EEG signals. In this paper, we proposed a novel algorithm, PLZC, which combines permutation with Lempel-Ziv complexity, and explored the possibility of using the PLZC to differentiate clinical states based on EEG data. Unlike other LZC based methods, PLZC transforms the EEG signal into a symbolic sequence by a permutation procedure. Then, the complexity of the series is estimated by measuring the rate of distinct substrings recurring along the symbolic sequence. The classical LZC procedures reflect the magnitude of the signal points, while the PLZC reflects the mutual relation of the signal points (described as motifs) and the variation of these motifs indicates the changes of the signal itself. Changes in both high frequency and low frequency will alter these motifs, and variations in both will be present in the permutation. This property is especially useful in characterizing brain state changes based on EEG signals because changes in brain activities are always accompanied with EEG frequency changes. Hence, in the analysis of data from anesthesia and epilepsy in this study, the ability to better detect frequency changes may have contributed to PLZC’s better performance. Permutation entropy is also a permutation procedure based measurement used in many studies for EEG analysis [45,46]. Compared with entropy, complexity is a nonparametric, simpleto-calculate measure and can give reliable results for short signal segments. It does not require any assumption about the stationary of a time series [47]. These features are important in most experimental and clinical studies. Furthermore, recent research suggested that the complexity algorithm is more sensitive to the presence of high-frequency components (19–47 Hz) compared to the entropy based algorithm in assessing the clinical depth of sedation with EEG [17]. This is an important feature for an anesthesia monitor, because the change of depth of anesthesia is generally accompanied by high frequency changes. We will focus on the detailed comparison between PE and PLZC in our future work. 6. Conclusion The results of this study showed that PLZC was more robust and outperformed four classical LZC measures in detecting anesthesia states and epileptic seizures. Our findings demonstrated the possibility of using PLZC to replace conventional LZC procedures in analyzing the dynamical behavior of the brain in some special clinical states. The PLZC method is a promising tool for a robust nonlinear EEG signal analysis. In this paper we have only focused on EEG analysis in seizure detection and assessment of the depth of anesthesia. In the future, we will attempt to investigate whether PLZC has potential application in detecting other physiological information based on EEG data, such as the measurement of clinical pain intensity and identification of neurological disorder states. We expect that PLZC can be applied to the detection of certain brain disorders and can serve as an important tool for understanding brain function. Interest disclosure None of the authors have potential conflicts of interest to be disclosed. Acknowledgment This research is supported by the National Natural Science Fund for Distinguished Young Scholars of China (no. 61025019),
113
the National Natural Science Foundations of China (no. 61304247, 61203210), and the Natural Science Foundation of Hebei Province of China (F2014203127). Many thanks to Zheng Li for reviewing this paper and checking the language.
References [1] Y. Cao, W.W. Tung, J.B. Gao, V.A. Protopopescu, L.M. Hively, Detecting dynamical changes in time series using the permutation entropy, Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 70 (2004) 046217. [2] A. Anier, T. Lipping, V. Jantti, P. Puumala, A. Huotari, Entropy of the EEG in transition to burst suppression in deep anesthesia: surrogate analysis, in: Engineering in Medicine and Biology Society (EMBC), Annual International Conference of the IEEE, Buenos Aires, 2010, pp. 2790–2793. [3] E. Olofsen, J.W. Sleigh, A. Dahan, Permutation entropy of the electroencephalogram: a measure of anaesthetic drug effect, Br. J. Anaesth. 101 (2008) 810–821. [4] A. Schultz, M. Siedenberg, U. Grouven, T. Kneif, B. Schultz, Comparison of Narcotrend Index, Bispectral Index, spectral and entropy parameters during induction of propofol–remifentanil anaesthesia, J. Clin. Monit. Comput. 22 (2008) 103–111. [5] X.L. Li, S.Y. Cui, L.J.a. Voss, Using permutation entropy to measure the electroencephalographic effects of sevoflurane, Anesthesiology 109 (2008) 448–456. [6] A. Babloyantz, A. Destexhe, Is the normal heart a periodic oscillator? Biol. Cybern. 58 (1988) 203–211. [7] I. Simonsen, A. Hansen, O.M. Nes, Determination of the Hurst exponent by use of wavelet transforms, Phys. Rev. E 58 (1998) 2779–2787. [8] P. Grassberger, I. Procaccia, Characterization of strange attractors, Phys. Rev. Lett. 50 (1983) 346. [9] N. Radhakrishnan, B. Gangadhar, Estimating regularity in epileptic seizure time-series data, Eng. Med. Biol. Mag. IEEE 17 (1998) 89–94. [10] A. Lempel, J. Ziv, On the complexity of finite sequences, Inform. Theory IEEE Trans. 22 (1976) 75–81. [11] D. Abasolo, R. Hornero, C. Gomez, M. Garcia, M. Lopez, Analysis of EEG background activity in Alzheimer’s disease patients with Lempel-Ziv complexity and central tendency measure, Med. Eng. Phys. 28 (2006) 315–322. [12] M. Aboy, R. Hornero, D. Abasolo, D. Alvarez, Interpretation of the Lempel-Ziv complexity measure in the context of biomedical signal analysis, IEEE Trans. Biomed. Eng. 53 (2006) 2282–2288. [13] L. Huang, P. Yu, F. Ju, J. Cheng, Prediction of response to incision using the mutual information of electroencephalograms during anaesthesia, Med. Eng. Phys. 25 (2003) 321–327. [14] X.S. Zhang, R.J. Roy, Derived fuzzy knowledge model for estimating the depth of anesthesia, IEEE Trans. Biomed. Eng. 48 (2001) 312–323. [15] M. Sabeti, S. Katebi, R. Boostani, Entropy and complexity measures for EEG signal classification of schizophrenic and control participants, Artif. Intell. Med. 47 (2009) 263–274. [16] Abásolo D, James C J and Hornero R, 2007, Non-linear analysis of intracranial electroencephalogram recordings with approximate entropy and Lempel-Ziv complexity for epileptic seizure detection. In: Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE: IEEE) pp. 1953–6. [17] R. Ferenets, T. Lipping, A. Anier, V. Jantti, S. Melto, S. Hovilehto, Comparison of entropy and complexity measures for the assessment of depth of sedation, Biomed. Eng. IEEE Trans. 53 (2006) 1067–1077. [18] C. Bandt, B. Pompe, Permutation entropy: a natural complexity measure for time series, Phys. Rev. Lett. 88 (2002) 174102. [19] C. Bandt, Ordinal time series analysis, Ecol. Model. 182 (2005) 229–238. [20] A.A. Bruzzo, B. Gesierich, M. Santi, C.A. Tassinari, N. Birbaumer, G. Rubboli, Permutation entropy to detect vigilance changes and preictal states from scalp EEG in epileptic patients. A preliminary study, Neurol. Sci. 29 (2008) 3–9. [21] D. Li, X. Li, Z. Liang, L.J. Voss, J.W. Sleigh, Multiscale permutation entropy analysis of EEG recordings during sevoflurane anesthesia, J. Neural Eng. 7 (2010) 046010. [22] X.L. Li, G.X. Ouyang, Estimating coupling direction between neuronal populations with permutation conditional mutual information, Neuroimage 52 (2010) 497–507. [23] Z. Li, X. Li, Estimating temporal causal interaction between spike trains with permutation and transfer entropy, PloS One 8 (2013) e70894. [24] X.-S. Zhang, R.J. Roy, E.W. Jensen, EEG complexity as a measure of depth of anesthesia for patients, Biomed. Eng. IEEE Trans. 48 (2001) 1424–1433. [25] E. Estevez-Rams, R. Lora Serrano, B. Aragon Fernandez, I. Brito Reyes, On the non-randomness of maximum Lempel Ziv complexity sequences of finite size, Chaos 23 (2013) 023118. [26] H.X. Zhang, Y.S. Zhu, Z.M. Wang, Complexity measure and complexity rate information based detection of ventricular tachycardia and fibrillation, Med. Biol. Eng. Comput. 38 (2000) 553–557. [27] Y. Linde, A. Buzo, R.M. Gray, Algorithm for vector quantizer design, IEEE Trans. Commun. 28 (1980) 84–95. [28] Zhou S, Zhang Z and Gu J, 2011, Interpretation of coarse-graining of Lempel-Ziv complexity measure in ECG signal analysis Conference proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 2011, 2716–9.
114
Y. Bai et al. / Biomedical Signal Processing and Control 19 (2015) 102–114
[29] J. Hu, J. Gao, J.C. Príncipe, Analysis of biomedical signals by the Lempel-Ziv complexity: the effect of finite data size Biomedical Engineering, IEEE Trans. 53 (2006) 2606–2609. [30] F.L. Da Silva, A. Hoeks, H. Smits, L. Zetterberg, Model of brain rhythmic activity, Kybernetik 15 (1974) 27–37. [31] G. Ouyang, C. Dang, D.A. Richards, X. Li, Ordinal pattern based similarity analysis for EEG recordings, Clin. Neurophysiol.: Off. J. Int. Fed. Clin. Neurophysiol. 121 (2010) 694–703. [32] B.H. Jansen, V.G. Rit, Electroencephalogram and visual evoked potential generation in a mathematical model of coupled cortical columns, Biol. Cybern. 73 (1995) 357–366. [33] Shelhamer M., 2006, Nonlinear dynamics in physiology: World Scientific. [34] J.M. Amigó, M.B. Kennel, L. Kocarev, The permutation entropy rate equals the metric entropy rate for ergodic information sources and ergodic dynamical systems, Phys. D: Nonlinear Phenom. 210 (2005) 77–95. [35] F. Wendling, J.J. Bellanger, F. Bartolomei, P. Chauvel, Relevance of nonlinear lumped-parameter models in the analysis of depth-EEG epileptic signals, Biol. Cybern. 83 (2000) 367–378. [36] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, J.D. Farmer, Testing for nonlinearity in time-series—the method of surrogate, Data Phys. D 58 (1992) 77–94. [37] M. Palus, D. Hoyer, Detecting nonlinearity and phase synchronization with surrogate data, IEEE Eng. Med. Biol. Mag.: Q. Mag. Eng. Med. Biol. Soc. 17 (1998) 40–45. [38] G. Ouyang, J. Li, X. Liu, X. Li, Dynamic characteristics of absence EEG recordings with multiscale permutation entropy analysis, Epilepsy Res. 104 (2013) 246–252. [39] Y. Li, S. Tong, D. Liu, Y. Gai, X. Wang, J. Wang, Y. Qiu, Y. Zhu, Abnormal EEG complexity in patients with schizophrenia and depression, Clin. Neurophysiol.: Off. J. Int. Fed. Clin. Neurophysiol. 119 (2008) 1232–1241.
[40] C.C. Jouny, G.K. Bergey, Characterization of early partial seizure onset: frequency, complexity and entropy, Clin. Neurophysiol.: Off. J. Int. Fed. Clin. Neurophysiol. 123 (2012) 658–669. [41] C.C. Chen, Y.T. Hsu, H.L. Chan, S.M. Chiou, P.H. Tu, S.T. Lee, C.H. Tsai, C.S. Lu, P. Brown, Complexity of subthalamic 13–35 Hz oscillatory activity directly correlates with clinical impairment in patients with Parkinson’s disease, Exp. Neurol. 224 (2010) 234–240. [42] J. Dauwels, K. Srinivasan, M. Ramasubba Reddy, T. Musha, F.B. Vialatte, C. Latchoumane, J. Jeong, A. Cichocki, Slowing and loss of complexity in Alzheimer’s EEG: two sides of the same coin? Int. J. Alzheimer’s Dis. (2011) 539621. [43] R. Nagarajan, Quantifying physiological data with Lempel-Ziv complexity—certain issues, IEEE Trans. Biomed. Eng. 49 (2002) 1371–1373. [44] A.J. Ibanez-Molina, S. Iglesias-Parro, M.F. Soriano, J.I. Aznarte, Multiscale Lempel-Ziv complexity for EEG measures, Clin. Neurophysiol.: Off. J. Int. Fed. Clin. Neurophysiol. 126 (2015) 541–548. [45] N. Nicolaou, J. Georgiou, Detection of epileptic electroencephalogram based on permutation entropy and support vector machines, Expert Syst. Appl. 39 (2012) 202–209. [46] E. Ferlazzo, N. Mammone, V. Cianci, S. Gasparini, A. Gambardella, A. Labate, M.A. Latella, V. Sofia, M. Elia, F.C. Morabito, Permutation entropy of scalp EEG: A tool to investigate epilepsies: suggestions from absence epilepsies, Clin. Neurophysiol. 125 (2014) 13–20. [47] M. Talebinejad, A.D. Chan, A. Miri, A Lempel-Ziv complexity measure for muscle fatigue estimation, J. Electromyogr. Kinesiol.: Off. J. Int. Soc. Electrophysiol. Kinesiol. 21 (2011) 236–241.