Accepted Manuscript Refined generalized multiscale entropy analysis for physiological signals Yunxiao Liu, Jing Wang, Youfang Lin, Pengjian Shang
PII: S0378-4371(17)30780-X
DOI: http://dx.doi.org/10.1016/j.physa.2017.08.047
Reference: PHYSA 18493
To appear in: Physica A
Received date: 4 November 2016
Revised date: 14 June 2017

Please cite this article as: Y. Liu, J. Wang, Y. Lin, P. Shang, Refined generalized multiscale entropy analysis for physiological signals, Physica A (2017), http://dx.doi.org/10.1016/j.physa.2017.08.047

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights
RMSEσ2 provides accurate estimates of complexity and is suitable for short time series.
RMSEσ2 gives higher separability between groups of healthy and pathological states.
We discuss the effect of outliers and data loss on the estimation of complexity.
Refined Generalized Multiscale Entropy Analysis for Physiological Signals

Yunxiao Liu¹, Jing Wang¹,*, Youfang Lin¹, Pengjian Shang²

¹ Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, PR China
² Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing 100044, PR China
Abstract

Multiscale entropy analysis has become a prevalent complexity measurement and has been successfully applied in various fields. However, it takes into account only the information of mean values (first moment) in the coarse-graining procedure. Generalized multiscale entropy (MSEn), which uses higher moments to coarse-grain a time series, was subsequently proposed, and MSEσ2 has been implemented. However, MSEσ2 may sometimes yield an imprecise estimation of entropy or undefined entropy, and the statistical reliability of its sample entropy estimation decreases as the scale factor increases. To address this, we developed a refined model, RMSEσ2, to improve MSEσ2. Simulations on both white noise and 1/f noise show that RMSEσ2 provides higher entropy reliability and reduces the occurrence of undefined entropy, making it especially suitable for short time series. In addition, we discuss the effect of outliers, data loss and other concepts in signal processing on RMSEσ2 analysis. We apply the proposed model to evaluate the complexity of heartbeat interval time series derived from healthy young and elderly subjects, patients with congestive heart failure and patients with atrial fibrillation, and compare it with several popular complexity metrics. The results demonstrate that RMSEσ2-measured complexity a) decreases with aging and disease, and b) discriminates significantly between different physiological/pathological states, which may facilitate clinical application.

Keywords: multiscale entropy, complexity, heart rate, outlier, data loss
1. Introduction

Many physiological signals, such as cardiac inter-beat series and human gait time series, are recorded over long periods from complex human systems regulated by multiple underlying control mechanisms, and may carry important information about those systems. How to investigate or monitor the dynamical status of such systems from these one-dimensional time series has therefore attracted considerable attention from researchers in various fields. In particular, the complexity of physiological signals has been widely studied and shown to be a good indicator of healthy/pathological states [1, 2, 3, 4, 5, 6, 7]. Such complexity-related measures have potentially crucial applications in distinguishing time series generated either by different systems or by the same system under different conditions.
Traditional entropy-based metrics, such as Shannon entropy [8], approximate entropy [9] and sample entropy [10], have been proposed to quantify the regularity/irregularity of signals. They assign higher entropy values to more irregular/unpredictable signals and lower entropy values to more regular/predictable signals. However, there is no straightforward relationship between complexity and regularity/irregularity. When these entropy-based approaches are applied to physiologic signals [7, 11], they may give misleading results. For instance, they assign higher entropy values to heartbeat interval series from pathological subjects than to those from healthier subjects. Intuitively, complexity is related to “meaningful structure richness” [11], which in biology is associated with the ability of living systems to adapt to a changing environment. Therefore, a good complexity measure assigns higher values to the output signal of a healthier system with meaningful and rich structure, and lower values to both random and predictable dynamics.
To address the above issues, Costa et al. introduced multiscale entropy (MSE) analysis in 2002 [12], a complexity measurement that takes into account the multiple temporal scales of complex signals and computes the sample entropy across a range of scales. Their results [12] show that the complexity of RR interval time series in healthy states is significantly higher than in disease states, and that the complexity of healthy young subjects is higher than that of healthy elderly subjects, both of which support the general theory of “complexity loss” with disease and aging.
The MSE method has become a prevailing method for quantifying the complexity of time series. Over the past decades it has been applied successfully in different research fields, including biomedical time series [13, 14, 15, 16, 17], rainfall time series [18], traffic flow time series [19, 20], river flow time series [21], electroseismic time series [22], and so on. Meanwhile, the MSE method has been improved by many scholars to overcome different limitations. For example, the composite multiscale entropy (CMSE) was proposed to reduce the variance of estimated entropy values at large scales [23]; the refined CMSE (RCMSE) [24] was proposed to address undefined entropy; and the modified MSE (MMSE) [25] was proposed to overcome imprecise estimation of entropy and undefined entropy values for short time series. In addition, multivariable multiscale entropy [26] was introduced to analyze multivariable time series and has been applied in different fields [26, 27, 28].
The MSE algorithm consists mainly of the coarse-graining procedure and the sample entropy calculation for each coarse-grained time series. However, the coarse-graining procedure of MSE considers only the first moment (mean value) of the time series and ignores the information in higher moments (e.g., skewness, kurtosis). Recently, Costa et al. generalized the MSE method to a family of methods (MSEn) that use different moments in the coarse-graining procedure [29], where the subscript denotes the moment used to coarse-grain a time series, and implemented MSEσ2, which uses the second moment. Their results demonstrated that MSEσ2 can differentiate RR time series from healthy subjects and patients with congestive heart failure (CHF). Because the length of the coarse-grained series decreases as the scale factor increases, applying the method to short time series may yield an imprecise estimation of sample entropy or induce undefined entropy. To overcome these obstacles, we propose in this paper a refined MSEσ2 (RMSEσ2) analysis, inspired by Ref. [24], which considers all possible coarse-grained time series. In addition, we adjust the parameter selection scheme of Ref. [29].
The rest of the paper is organized as follows. In Section 2, we briefly review the SampEn, MSE and MSEσ2 algorithms and describe the proposed RMSEσ2 in detail. In Section 3, we first validate the proposed method using synthetic white and 1/f noise, and then study the effect of outliers and data loss, as well as of frequency, amplitude, noise power and randomness, on RMSEσ2. In Section 4, we apply the proposed method to real-world datasets derived from different physiological and pathological conditions. Finally, we summarize and discuss our findings in Section 5.
2. Methods

In this section, we first briefly review the sample entropy (SampEn), MSE and MSEσ2 algorithms, and then introduce the proposed refined MSEσ2 (RMSEσ2).

2.1. Sample entropy

Let $\{x_i\}_{i=1}^{N}$ represent a time series of length $N$. Its sample entropy can be calculated as follows [10].

(1) Construct template vectors of dimension $m$:

$$\mathbf{x}_i^m = \{x_i, x_{i+1}, \ldots, x_{i+m-1}\}, \quad 1 \le i \le N-m. \tag{1}$$

(2) Calculate the distance between two template vectors using the infinity norm:

$$d_{ij}^m = \|\mathbf{x}_i^m - \mathbf{x}_j^m\|_\infty, \quad 1 \le i, j \le N-m. \tag{2}$$

(3) We call $(\mathbf{x}_i^m, \mathbf{x}_j^m)$, with $i \ne j$, an $m$-dimensional matched vector pair if $d_{ij}^m$ is less than or equal to a predefined tolerance $r$. Let $n^m$ represent the total number of $m$-dimensional matched vector pairs.

(4) Repeat steps (1) to (3) with $m$ replaced by $m+1$ to obtain $n^{m+1}$, the total number of $(m+1)$-dimensional matched vector pairs.

(5) The sample entropy is defined as the negative natural logarithm of the ratio of $n^{m+1}$ to $n^m$, that is,

$$\mathrm{SampEn}(x, m, r) = -\ln\frac{n^{m+1}}{n^m}. \tag{3}$$
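Steps (1) to (5) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `count_matches` and `sample_entropy` are hypothetical, the tolerance `r` is passed as an absolute value, self-matches are excluded by counting only pairs with i < j, and an undefined entropy is reported as `None`.

```python
import math

def count_matches(x, m, r, n_templates):
    """Count template pairs (i < j) of length m whose infinity-norm distance is <= r."""
    count = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if max(abs(x[i + k] - x[j + k]) for k in range(m)) <= r:
                count += 1
    return count

def sample_entropy(x, m=2, r=0.15):
    """SampEn(x, m, r) = -ln(n^{m+1} / n^m); returns None when undefined."""
    # N - m templates are used for both dimensions, as in Eq. (1).
    n_templates = len(x) - m
    if n_templates < 2:
        return None
    n_m = count_matches(x, m, r, n_templates)
    n_m1 = count_matches(x, m + 1, r, n_templates)
    if n_m == 0 or n_m1 == 0:
        return None  # logarithm undefined: no matched pairs
    return -math.log(n_m1 / n_m)
```

The double loop costs on the order of N² comparisons, so for series of 4 × 10^4 points a vectorized or compiled implementation would be preferable in practice.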
2.2. MSE

The MSE algorithm is performed through the following two procedures.

(1) Let $\{x_i\}_{i=1}^{N}$ represent a time series of length $N$. We construct consecutive coarse-grained time series by averaging a successively increasing number of data points in non-overlapping windows. For a given scale factor $\tau$, the elements of the coarse-grained time series are calculated as

$$y_j^{\tau} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le \frac{N}{\tau}. \tag{4}$$

The length of each coarse-grained time series equals the length of the original time series divided by the scale factor $\tau$. For scale 1, the coarse-grained time series $\{y^1\}$ is the original time series. Fig. 1 illustrates the coarse-graining procedure for scales 2 and 3.

(2) Calculate the sample entropy of each coarse-grained time series and plot it as a function of the scale factor $\tau$.
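The mean-based coarse-graining of Eq. (4) can be illustrated with a few lines of Python. The helper name `coarse_grain_mean` is hypothetical, and the sketch assumes that an incomplete trailing window is simply discarded.

```python
def coarse_grain_mean(x, tau):
    """Average x over consecutive non-overlapping windows of length tau (Eq. 4)."""
    n_windows = len(x) // tau  # incomplete trailing window is dropped (assumption)
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_windows)]

# Example: a 7-point series at scales 2 and 3.
series = [1, 2, 3, 4, 5, 6, 7]
scale2 = coarse_grain_mean(series, 2)  # windows (1,2), (3,4), (5,6)
scale3 = coarse_grain_mean(series, 3)  # windows (1,2,3), (4,5,6)
```

An MSE curve would then be obtained by computing the sample entropy of each coarse-grained series for τ = 1, 2, and so on.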
Fig. 1. Coarse-graining of the original time series for scales 2 and 3, modified from Ref. [13]. For scale 1, the coarse-grained time series is the original time series.
According to the MSE curves, we can compare the relative complexity of time series based on the following two guidelines: (1) if the MSE curve shows a monotonically decreasing trend, the series contains information only at the smallest scale; (2) if one MSE curve is higher than another for the majority of scales, the former is considered more complex than the latter.

2.3. MSEσ2

(1) For a given time series $\{x_i\}_{i=1}^{N}$, we divide the original time series into non-overlapping segments of length $\tau$. The elements of the coarse-grained time series are calculated as

$$y_j^{\tau} = \frac{1}{\tau-1}\sum_{i=(j-1)\tau+1}^{j\tau} (x_i - \bar{x})^2, \quad 1 \le j \le \frac{N}{\tau}, \tag{5}$$

where $\bar{x}$ represents the mean of the values in the $j$-th segment. Because of the factor $1/(\tau-1)$, Eq. (5) is defined only for $\tau \ge 2$.

(2) Calculate the sample entropy of each coarse-grained time series $\{y_j^{\tau}\}$ and plot it as a function of the scale factor $\tau$.
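As a small illustration of the variance-based coarse-graining in Section 2.3, the sketch below computes the sample variance of each non-overlapping segment. The name `coarse_grain_variance` is hypothetical, and dropping an incomplete trailing segment is an assumption of this sketch.

```python
def coarse_grain_variance(x, tau):
    """Sample variance of consecutive non-overlapping segments of length tau (Eq. 5)."""
    if tau < 2:
        raise ValueError("tau must be >= 2: the 1/(tau - 1) factor is undefined for tau = 1")
    series = []
    for j in range(len(x) // tau):  # incomplete trailing segment is dropped (assumption)
        segment = x[j * tau:(j + 1) * tau]
        mean = sum(segment) / tau
        series.append(sum((v - mean) ** 2 for v in segment) / (tau - 1))
    return series
```

For instance, the series 1, 3, 2, 2, 0, 4 at scale 2 splits into the segments (1, 3), (2, 2) and (0, 4), whose sample variances are 2, 0 and 8.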
2.4. RMSEσ2

The proposed RMSEσ2 incorporates the following procedures.

(1) As shown in Fig. 2, we construct $\tau$ coarse-grained time series for each scale factor $\tau$. For a given scale factor $\tau$, the $k$-th coarse-grained time series $\mathbf{y}_k^{\tau} = \{y_{k,1}^{\tau}, y_{k,2}^{\tau}, \ldots, y_{k,p}^{\tau}\}$ is defined as

$$y_{k,j}^{\tau} = \frac{1}{\tau-1}\sum_{i=(j-1)\tau+k}^{j\tau+k-1} (x_i - \bar{x})^2, \quad 1 \le j \le \frac{N}{\tau}, \; 1 \le k \le \tau, \tag{6}$$

where $\bar{x}$ represents the mean of the segment $x_{(j-1)\tau+k}, \ldots, x_{j\tau+k-1}$.

(2) At a scale factor $\tau$, the numbers of matched vector pairs, $n_{k,\tau}^{m+1}$ and $n_{k,\tau}^{m}$, are calculated for all $\tau$ coarse-grained series.

(3) Let $\bar{n}_{\tau}^{m+1}$ and $\bar{n}_{\tau}^{m}$ represent the means of $n_{k,\tau}^{m+1}$ and $n_{k,\tau}^{m}$ over $1 \le k \le \tau$, respectively. The RMSEσ2 value at a scale factor $\tau$ is defined as the negative natural logarithm of the ratio of $\bar{n}_{\tau}^{m+1}$ to $\bar{n}_{\tau}^{m}$, that is,

$$\mathrm{RMSE}_{\sigma^2}(x, \tau, m, r) = -\ln\frac{\bar{n}_{\tau}^{m+1}}{\bar{n}_{\tau}^{m}}. \tag{7}$$

At a scale factor $\tau$, MSEσ2 computes the sample entropy using only the first variance coarse-grained time series, whereas RMSEσ2 uses the matched-pair counts of all $\tau$ coarse-grained series. For example, when $\tau = 3$, MSEσ2 uses only the coarse-grained series $y_1^{(3)}$, but RMSEσ2 uses all of the coarse-grained series $y_1^{(3)}$, $y_2^{(3)}$ and $y_3^{(3)}$. The advantage is that this reduces the probability of yielding undefined entropy and increases the accuracy of entropy estimation. In the RMSEσ2 method, we set the parameters m = 2 and r = 15% of the standard deviation of the first variance coarse-grained time series, following the idea of Ref. [29].

Fig. 2. The coarse-graining procedure of the RMSEσ2 method for scales 2 and 3, modified from Ref. [22].

Fig. 3. The RMSEσ2 analysis of 100 simulated white and 1/f noise time series; the length of each original time series is 4 × 10^4. Parameters are m = 2, r = 15% of the first coarse-grained time series’ standard deviation. Symbols represent mean values of entropy over the simulations and bars represent the corresponding standard deviation.

In the application section of this study, we compare the proposed RMSEσ2 with several existing methods, namely MSE, MSEσ2, MSEσ and MFEσ. The last two methods are described in detail in Ref. [30]. MSE based on standard deviation (MSEσ) is similar to MSE but uses the standard deviation to coarse-grain the time series; MFE based on standard deviation (MFEσ) is another complexity metric that calculates the fuzzy entropy of each standard deviation coarse-grained time series [30]. When calculating sample entropy, we set m = 2 and r = 15% of the original time series’ standard deviation for the MSE, MSEσ and MFEσ methods, and m = 2, r = 0.5% of the original series’ standard deviation for MSEσ2, following Costa’s study [29].
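The RMSEσ2 procedure of Section 2.4 can be sketched in Python as below. This is a hedged illustration, not the authors' code: the function names are hypothetical, the tolerance `r` is passed as an absolute value (the paper derives it from the first variance coarse-grained series), only complete segments are used for each offset k, and `None` signals an undefined entropy.

```python
import math

def variance_coarse_grain(x, tau, k):
    """k-th (1-based) variance coarse-grained series at scale tau, tau >= 2."""
    offset = k - 1
    n_windows = (len(x) - offset) // tau  # complete segments only (assumption)
    series = []
    for j in range(n_windows):
        segment = x[offset + j * tau: offset + (j + 1) * tau]
        mean = sum(segment) / tau
        series.append(sum((v - mean) ** 2 for v in segment) / (tau - 1))
    return series

def match_counts(y, m, r):
    """Return (n^m, n^{m+1}): matched template pairs with infinity-norm distance <= r."""
    n_templates = len(y) - m
    n_m = n_m1 = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if max(abs(y[i + d] - y[j + d]) for d in range(m)) <= r:
                n_m += 1
                # An (m+1)-match is an m-match whose next points also agree.
                if abs(y[i + m] - y[j + m]) <= r:
                    n_m1 += 1
    return n_m, n_m1

def rmse_sigma2(x, tau, m, r):
    """-ln of the ratio of mean (m+1)-matches to mean m-matches over all tau offsets."""
    total_m = total_m1 = 0
    for k in range(1, tau + 1):
        n_m, n_m1 = match_counts(variance_coarse_grain(x, tau, k), m, r)
        total_m += n_m
        total_m1 += n_m1
    if total_m == 0 or total_m1 == 0:
        return None
    # Both means share the divisor tau, so the ratio of sums equals the ratio of means.
    return -math.log(total_m1 / total_m)
```

Because the match counts are pooled over all τ offsets before the logarithm is taken, the entropy is undefined only when none of the τ coarse-grained series produces a match, which is the intuition behind the reduced occurrence of undefined values.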
3. Simulation

In this section, synthetic white noise and 1/f noise are used to evaluate the effectiveness and statistical reliability of RMSEσ2. Subsequently, we study the effect of outliers and data loss on RMSEσ2 results. Finally, we evaluate the effect of several straightforward concepts in signal processing, including frequency, amplitude, noise power and randomness, on RMSEσ2 results.

To evaluate the effectiveness of RMSEσ2, we first test it on 100 independent white and 1/f noise signals with 4 × 10^4 data points. Results are presented in Fig. 3. It can be observed that a) the sample entropy of white noise decreases monotonically with increasing scale factor when τ > 3, while the sample entropy of 1/f noise remains almost constant across all scales; and b) the sample entropy of 1/f noise is higher than that of white noise at all scales. Both observations indicate that 1/f noise has complex structure across multiple scales, whereas white noise does not. These results are consistent with previous research [31, 32].

Furthermore, to investigate whether RMSEσ2 increases the accuracy of entropy estimation and reduces the probability of inducing undefined entropy, we applied RMSEσ2 and MSEσ2 to simulated 1/f noise and white noise with 2 × 10^3 data points. Here, we set the parameters m = 2 and r = 15% of the first variance coarse-grained time series’ standard deviation. There were 100 independent noise samples used in each simulation; the corresponding results are illustrated in Fig. 4(a) and (b), respectively. For 1/f noise, the RMSEσ2 method provides a more accurate and reliable entropy estimation than MSEσ2. For white noise, the mean entropy values obtained by the two methods are nearly equal, but the SD of the former is far smaller than that of the latter.

Fig. 4. The RMSEσ2 and MSEσ2 analysis of 100 simulated (a) 1/f noise and (b) white noise time series with 2 × 10^3 data points. Parameters are m = 2, r = 15% of the first variance coarse-grained time series’ standard deviation. Symbols represent mean values of entropy for the 100 time series and bars represent the standard deviation.

3.1. Effect of outliers

Data obtained in many practical applications usually include a number of outliers. For example, cardiac beats not originating in the sinus node may be treated as outliers; they have no physiologic meaning but may dramatically affect the calculation of entropy. It is therefore necessary to investigate the effect of outliers on the proposed method. A simple way of generating surrogate series with outliers is as follows. For a given percentage of outliers p, we randomly select the positions where outliers will be added. At each selected position k, the value is replaced with the outlier $\tilde{x}_k = x_k + \varepsilon_k$, where $x_k$ represents the raw value at the selected position and $\varepsilon_k$ is set to n times the standard deviation of the original series. In this work, we use n = 1, 2, 3, 4, 5, respectively.
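The surrogate construction just described might be sketched as follows. The function name `add_outliers` and its `seed` argument are hypothetical; the sketch also assumes that p is given as a fraction, that ε_k is always positive (the text does not state a sign), and that the population standard deviation is used.

```python
import random
import statistics

def add_outliers(x, p, n, seed=None):
    """Return a copy of x with a fraction p of points shifted by n standard deviations."""
    rng = random.Random(seed)
    sd = statistics.pstdev(x)          # population SD of the original series (assumption)
    n_outliers = round(p * len(x))
    positions = rng.sample(range(len(x)), n_outliers)  # distinct random positions
    y = list(x)
    for k in positions:
        y[k] = x[k] + n * sd           # x~_k = x_k + eps_k with eps_k = n * SD
    return y, sorted(positions)
```

Returning the selected positions alongside the surrogate makes it easy to check afterwards which points were modified.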
The experimental results are presented in Fig. 5. We can see from Fig. 5(a) that outliers can greatly reduce the entropy of 1/f noise time series, especially when the amplitude of the outliers is large. In contrast, the effect of outliers on white noise is relatively small and in some cases negligible. This is because the presence of outliers increases the standard deviation of the time series and hence the tolerance r, which relaxes the matching condition for two template vectors. As 1/f noise is a correlated signal, its variance is relatively small, but the presence of outliers greatly increases that variance, so that more template vectors are judged similar and the entropy value becomes smaller. White noise, however, is essentially a completely random signal, and outliers may not affect it unless they are extremely large. Therefore, when applying this method to real data with relatively large outliers, the outliers should be removed first.

Fig. 5. The effect of outliers on the results of RMSEσ2 analysis of 100 simulated (a) 1/f noise and (b) white noise time series with 4 × 10^4 data points. Parameters are m = 2, r = 15% of the first variance coarse-grained time series’ standard deviation. Symbols represent mean values of entropy for the 100 time series and bars represent the standard deviation.
3.2. Effect of data loss

In real physiological signals, data can be missing or unavailable to a very large extent and, once recorded in the past, often cannot be generated again. Data loss in physiological signals can be caused by failure of the data collection equipment, as well as by the removal of artifacts or noise-contaminated data segments. It is therefore necessary to determine the effect of data loss on RMSEσ2 results. To this end, we study this effect using synthetic white noise and 1/f noise.

Fig. 6. The effect of data loss on the results of RMSEσ2 analysis of 100 simulated (a) 1/f noise and (b) white noise time series with 4 × 10^4 data points, for data loss percentages of 0%, 10%, 30%, 50%, 70% and 90%. Parameters are m = 2, r = 15% of the first variance coarse-grained time series’ standard deviation. Symbols represent mean values of entropy for the 100 time series and bars represent the standard deviation.

Synthetic series with a predefined percentage p of data loss are generated with a segmentation approach [33], which produces surrogate signals by randomly removing data segments from the original series and stitching together the remaining parts. In detail, for a time series $\{x_i\}_{i=1}^{N}$, we first construct a random binary sequence $\{g_i\}_{i=1}^{N}$ in which the number of entries with $g_i = 0$ equals the total number of missing data points, $pN$. The positions $i$ where $g_i = 0$ correspond to the data points of $x_i$ that are removed, while the positions where $g_i = 1$ correspond to the data points that are preserved.
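A simplified version of this masking step can be written as below. The name `remove_data` and the `seed` argument are hypothetical, and the sketch places the zeros of the binary sequence uniformly at random, so that removed "segments" are simply runs of consecutive zeros; the segment-length statistics of the approach in Ref. [33] are not reproduced here.

```python
import random

def remove_data(x, p, seed=None):
    """Drop a fraction p of the points of x and stitch the remaining points together."""
    rng = random.Random(seed)
    n_missing = round(p * len(x))
    g = [1] * len(x)                   # binary sequence: 1 = preserved, 0 = removed
    for i in rng.sample(range(len(x)), n_missing):
        g[i] = 0
    kept = [value for value, keep in zip(x, g) if keep]
    return kept, g
```

The surviving points keep their original order, which matches the description of stitching the remaining parts together.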
Fig. 6 shows that the effect of data loss differs markedly between white noise and 1/f noise. For 1/f noise, data loss has a strong effect on the results when p > 50%, whereas for white noise, data loss has almost no effect for any p. This is because 1/f noise is a correlated signal with complex structure across multiple temporal scales, so data loss destroys its internal correlations, whereas white noise is an uncorrelated signal, so data loss only reduces the length of the series and has no essential effect on the results. This analysis suggests that RMSEσ2 results obtained with data loss p ≥ 50% should be interpreted with caution.
3.3. Effect of frequency, amplitude, noise power and randomness

We also use several synthetic signals described in Ref. [30] to study the performance of RMSEσ2. All signals have a sampling frequency of 1000 Hz and a duration of 100 s, and therefore contain 10^5 data points.

Fig. 7. The RMSEσ2 analysis of several synthetic signals. (a) A constant-amplitude chirp signal and (b) a chirp signal whose amplitude is modulated by a sinusoid, with (c) and (d) the corresponding RMSEσ2 results (temporal window versus scale factor). (e) A signal affected by noise and (f) a signal generated by a MIX process in which a stochastic signal progressively turns into a periodic one, with (g) and (h) the corresponding RMSEσ2 results. All signals have a sampling frequency of 1000 Hz and a duration of 100 s. Parameters are m = 2, r = 15% of the first variance coarse-grained time series’ standard deviation.

First, we use two kinds of synthetic signals to clarify how RMSEσ2 changes when the amplitude and frequency change. The first signal is a constant-amplitude chirp signal whose frequency is swept logarithmically from 0.1 Hz to 30 Hz in 100 s (see Fig. 7(a)). The second signal is a chirp signal whose amplitude is modulated by a pure sinusoid (see Fig. 7(b)). To test whether RMSEσ2 is sensitive to frequency or amplitude, we apply RMSEσ2 to each of the two signals using a moving window of 4 × 10^4 points with an overlap of 3.9 × 10^4 points. The corresponding results are presented in Fig. 7(c) and 7(d), respectively. In Fig. 7(c), we observe that the sample entropy is low when the window covers the beginning of the signal, where the frequency is low, and increases as the window moves toward the high-frequency part. Thus the sample entropy increases with frequency. A similar observation can be made in Fig. 7(d), with one difference: the pattern of Fig. 7(c) recurs roughly every 20 windows (separated by the red dotted lines in Fig. 7(d)), which may be because the signal is modulated by a sinusoid.

Second, to inspect how the proposed method changes with the level of noise affecting the signal, we used an amplitude-modulated quasi-periodic signal with additive white Gaussian noise (WGN) of varying power. The signal was generated as an amplitude-modulated sum of two sine waves with frequencies of 0.5 Hz and 1 Hz (see Fig. 7(e)). The first 20 s contain no noise. After that, WGN was added to the signal, with the noise power increasing every 10 s. Fig. 7(g) shows the corresponding RMSEσ2 results. In the first few windows, the signal contains little noise, so the sample entropy is small and almost unchanged. As the noise power increases, the signal becomes more and more similar to white noise and the RMSEσ2 curves show a monotonically decreasing trend, which can also be observed in Fig. 4(b).

Finally, to observe how the proposed method changes when a stochastic sequence progressively turns into a periodic deterministic signal, we generated a MIX process defined as

$$\mathrm{MIX} = (1 - z)x + zy, \tag{8}$$

where $z$ denotes a random variable that equals 1 with probability $p$ and 0 with probability $1 - p$, $x$ is a periodic time series given by $x_k = \sqrt{2}\sin(2\pi k/12)$, and $y$ is a uniformly distributed variable on $[-\sqrt{3}, \sqrt{3}]$. The synthetic time series was based on a MIX process whose parameter p varied linearly between 0.01 and 0.99 (see Fig. 7(f)); the RMSEσ2 results are illustrated in Fig. 7(h). As the window moves, the stochastic series progressively turns into a periodic deterministic series and the sample entropy gradually decreases. Based on the above analysis, RMSEσ2 results should be interpreted carefully when the method is applied to similar signals.
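To make the MIX process of Eq. (8) concrete, the following sketch generates such a series in Python. The function name `mix_series`, its arguments and the use of a linear ramp from `p_start` to `p_end` are assumptions made for illustration; in particular, the direction of the ramp (here from mostly stochastic to mostly periodic) is inferred from the description rather than stated explicitly.

```python
import math
import random

def mix_series(n_points, p_start=0.99, p_end=0.01, seed=None):
    """MIX = (1 - z) * x + z * y with p changing linearly along the series."""
    rng = random.Random(seed)
    bound = math.sqrt(3)
    series = []
    for k in range(n_points):
        # Linear interpolation of the mixing probability p (assumed ramp).
        p = p_start + (p_end - p_start) * k / max(n_points - 1, 1)
        z = 1 if rng.random() < p else 0
        x_k = math.sqrt(2) * math.sin(2 * math.pi * k / 12)  # periodic component
        y_k = rng.uniform(-bound, bound)                      # uniform noise component
        series.append((1 - z) * x_k + z * y_k)
    return series
```

With p close to 1 the output is dominated by the uniform noise y, and with p close to 0 it follows the period-12 sinusoid x, so sliding a window along the series moves from stochastic to periodic behaviour.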
4. Application

In this section, we first apply RMSEσ2 to cardiac interbeat (RR) interval time series from healthy young and elderly subjects, patients with congestive heart failure and patients with atrial fibrillation. We then compare the proposed method with several existing methods.

4.1. Datasets

The data for the healthy group include 26 young subjects (13 men and 13 women, aged (mean ± SD) 35 ± 7.4 years, range 20−50 years) and 46 elderly subjects (22 men and 24 women, aged 65 ± 4.0 years, range 58−76 years). ECG recordings were sampled at 128 Hz. The data for the congestive heart failure (CHF) group include 43 subjects (28 men and 15 women, aged 55 ± 11.3 years, range 22−78 years); 14 recordings were sampled at 250 Hz and 29 recordings were sampled at 128 Hz. The data for the atrial fibrillation (AF) group include 40 subjects, with ECG recordings sampled at 250 Hz. All subjects in the healthy, CHF and AF groups have recordings of nearly 24 hours. To obtain a reasonable sample entropy when m = 2, Costa et al. [34] suggested that the time series be longer than 750 data points. Thus, the length of the data used in our experiments is 4 × 10^4 and the maximum scale factor is 40, so that the coarse-grained series at the largest scale still contains 4 × 10^4 / 40 = 1000 points. All data used in this work are available at http://physionet.org and described in Ref. [35].
Before employing RMSEσ2 analysis, we first filtered all signals to exclude artifacts, premature ventricular complexes and missed beat detections, following the instructions in Ref. [29]. In brief, we exclude the central point of a moving window of length L if it lies outside the range $(1 - a)\bar{x}$ to $(1 + a)\bar{x}$, where $\bar{x}$ represents the mean of the data points within that moving window excluding the central point, and a is a positive number ≤ 1. Here, we used L = 41 and a = 0.2.

4.2. Results
Fig. 8 shows the results of applying RMSEσ2 to RR time series from four groups: healthy young subjects, healthy elderly subjects, patients with congestive heart failure and patients with atrial fibrillation. It is observed that (1) at almost all scales, the curves of healthy subjects lie on top, suggesting that the complexity of RR series under healthy states is higher than under pathological states; (2) compared with the healthy young, the complexity of the healthy elderly is lower, indicating that complexity decreases with aging; and (3) AF is a severe heart disease in which the heartbeat interval time series is very irregular and resembles white noise, so the curve of patients with atrial fibrillation shows an analogous trend, decreasing monotonically as the scale factor increases.
Fig. 8. The results of applying the RMSEσ2 method to real-world RR time series obtained under different conditions (healthy young, healthy elderly, congestive heart failure and atrial fibrillation). Parameters are m = 2, r = 15% of the first variance coarse-grained time series’ standard deviation.

We also applied the original MSE [12] and MSEσ2 [29] to the same data; the results are shown in Fig. 9. We observe from Fig. 9(a) that the MSE method can be used to distinguish different RR time series, but has some shortcomings. First, the entropy values for atrial fibrillation are higher than those for healthy elderly subjects at half of the scales; second, there is no significant difference between patients with congestive heart failure and healthy elderly subjects. In contrast, our method not only distinguishes the different types of RR series under physiological and pathological conditions, but also significantly separates the RR time series of healthy elderly subjects from those of patients with congestive heart failure. The MSEσ2 [29] method exhibits different results, especially for the sample entropy curve of AF shown in Fig. 9(b). Although the sample entropy values of AF subjects decrease monotonically as the scale factor grows, they remain far above those of healthy subjects across all scale factors (from 1 to 40 in this study). The length of RR recordings obtained in practice is usually limited, which restricts the range of scale factors that can be analyzed. It is thus difficult to observe the complexity of AF falling below that of healthy subjects at sufficiently large scales.

Fig. 9. The results of applying (a) MSE [12] and (b) MSEσ2 [29] to real-world RR time series obtained under different conditions (healthy young, healthy elderly, congestive heart failure and atrial fibrillation). Parameters are m = 2, r = 15% of the first variance coarse-grained time series’ standard deviation in MSEσ2 and r = 15% of the standard deviation of the original time series in MSE.
We next compared the proposed method with MSE and MFE based on devi340
ation, denoted as (MSEσ ) and (MFEσ ) [30]. The results is illustrated in Fig. 10. We can find that the difference between the complexity curves for elderly healthy subjects and patients with congestive heart failure is negligible. Additionally, AF subjects have higher sample entropies than that of healthy groups across
[Figure: two panels, (a) MSEσ and (b) MFEσ, each plotting sample entropy against scale factor (5 to 40) for the Healthy Young, Healthy Elderly, CHF and AF groups.]
Fig. 10. The results of applying (a) MSEσ and (b) MFEσ [30], both based on standard deviation, to real-world RR time series obtained under different physiological and pathological conditions (healthy, congestive heart failure and atrial fibrillation), with parameters m = 2 and r = 15% of the standard deviation of the original time series.
almost all scales, which would indicate that AF subjects are more complex and contradicts the intuitive notion of complexity. Based on the above analysis, the proposed RMSEσ2 may be a good indicator for distinguishing healthy from pathological states, although this still needs to be validated on a large number of cases.
5. Conclusion
In this study, we propose the RMSEσ2 method, a complexity measure that uses the second moment (variance) to coarse-grain the original time series and considers all coarse-grained time series. We set the parameter r to 15% of the standard deviation of the first variance coarse-grained time series. In the simulation section, we first used synthetic white noise and 1/f noise to validate the effectiveness of the proposed method. The results show that the proposed method can measure the complexity of white and 1/f noise. We then discussed the effect of outliers and data loss on the method. The results suggest that (1) very large outliers may have some influence on correlated signals but almost no effect on uncorrelated signals, and (2) data loss has little essential effect on our method unless the percentage of data loss exceeds 50%. Finally, we discussed the effect of several signal properties, such as frequency, amplitude and noise power, on the proposed method.
Furthermore, we applied the method to cardiac interbeat interval time series of healthy young and elderly subjects, patients with congestive heart failure and patients with atrial fibrillation. The results show that the underlying dynamics of physiological systems in healthy states are more complex. We also found that, among healthy subjects, the young exhibit higher complexity than the elderly, which indicates that complexity decreases with aging. According to Bashan et al. [36], the human organism is an integrated network in which complex physiological systems act as nodes that interact continuously, so that the failure of one system can trigger the breakdown of the entire network. A healthy network has strong correlations between its various systems, but these correlations may weaken with increasing age, which may be the main reason for the loss of complexity with aging. Lastly, we compared the proposed method with several existing methods, which revealed the following advantages. On the one hand, it provides a more accurate estimate of entropy and is especially suitable for short time series. On the other hand, it provides higher separability than the other existing methods, which may facilitate clinical diagnosis.
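The RMSEσ2 procedure summarized at the start of this section can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that are not spelled out in this section: that "considers all coarse-grained time series" means pooling the template-match counts over every starting offset at a given scale, in the manner of refined composite multiscale entropy [24]; that the "first variance coarse-grained time series" is the scale-2 series with zero offset; and that scales start at 2 because the variance of a single point is zero. The function names (`variance_coarse_grain`, `count_matches`, `rmse_sigma2`) are hypothetical.

```python
import numpy as np


def variance_coarse_grain(x, scale, offset=0):
    """Variance of consecutive non-overlapping windows of length `scale`,
    starting at index `offset`."""
    x = np.asarray(x, dtype=float)
    n_windows = (len(x) - offset) // scale
    windows = x[offset:offset + n_windows * scale].reshape(n_windows, scale)
    return windows.var(axis=1)  # population variance (ddof=0)


def count_matches(y, m, r):
    """Count template pairs (i < j) of length m and of length m + 1 whose
    Chebyshev distance is at most r."""
    n_templates = len(y) - m
    if n_templates < 2:
        return 0, 0
    # Each row is a template of length m + 1; its first m entries form
    # the corresponding template of length m.
    emb = np.array([y[i:i + m + 1] for i in range(n_templates)])
    count_m = 0
    count_m1 = 0
    for i in range(n_templates - 1):
        diff = np.abs(emb[i + 1:] - emb[i])
        count_m += int(np.sum(diff[:, :m].max(axis=1) <= r))
        count_m1 += int(np.sum(diff.max(axis=1) <= r))
    return count_m, count_m1


def rmse_sigma2(x, max_scale, m=2, r_fraction=0.15):
    """Sketch of RMSEsigma2: returns {scale: entropy} for scales 2..max_scale."""
    x = np.asarray(x, dtype=float)
    # Assumption: r is fixed once, from the scale-2, zero-offset series.
    r = r_fraction * variance_coarse_grain(x, 2).std()
    entropies = {}
    for scale in range(2, max_scale + 1):
        total_m = 0
        total_m1 = 0
        # Assumption: pool the match counts over all `scale` offsets.
        for offset in range(scale):
            y = variance_coarse_grain(x, scale, offset)
            c_m, c_m1 = count_matches(y, m, r)
            total_m += c_m
            total_m1 += c_m1
        if total_m > 0 and total_m1 > 0:
            entropies[scale] = -np.log(total_m1 / total_m)
        else:
            entropies[scale] = float("nan")  # entropy undefined at this scale
    return entropies
```

Calling `rmse_sigma2(signal, max_scale=40)` on an RR series would give one entropy value per scale. Because the counts are pooled before the logarithm is taken, an undefined value arises only when no matches are found in any of the offset series, which is one plausible reading of why the refined method suits short recordings.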
Higher moments beyond the variance, such as skewness and kurtosis, may carry important information about the underlying dynamics of physiological systems and could further help us understand their underlying mechanisms. It is worth noting, however, that these statistics require an adequate amount of data to estimate, which places new demands on the data sets used. In future work, we may explore the use of such higher-moment information.
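As a purely illustrative sketch of what coarse-graining with a higher moment might look like, the following Python function replaces each non-overlapping window by its standardized central moment (order 3 for skewness, order 4 for kurtosis). This is an assumption about one possible extension, not a method proposed in this paper; the name `moment_coarse_grain` is hypothetical.

```python
import numpy as np


def moment_coarse_grain(x, scale, order):
    """Standardized central moment of the given order in each
    non-overlapping window of length `scale`
    (order=3: skewness, order=4: kurtosis)."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) // scale
    windows = x[:n_windows * scale].reshape(n_windows, scale)
    centered = windows - windows.mean(axis=1, keepdims=True)
    m2 = (centered ** 2).mean(axis=1)       # second central moment
    mk = (centered ** order).mean(axis=1)   # central moment of order `order`
    out = np.full(n_windows, np.nan)        # NaN where the window is constant
    ok = m2 > 0
    out[ok] = mk[ok] / m2[ok] ** (order / 2)
    return out
```

The data requirement mentioned above shows up directly here: a window of two points always has zero skewness, so small scale factors carry no information for this statistic, and short windows in general give noisy estimates.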
Acknowledgement
Financial support from the National Natural Science Foundation of China (61603029 and 61603028), the China Postdoctoral Science Foundation (2015M580040 and 043206005), and the Fundamental Research Funds for the Central Universities (K16JB00140) is gratefully acknowledged.
References
[1] L. A. Lipsitz, Physiological complexity, aging, and the path to frailty, Science’s SAGE KE 2004 (16) (2004) pe16.
[2] I. Rezek, S. J. Roberts, Stochastic complexity measures for physiological signal analysis, IEEE Transactions on Biomedical Engineering 45 (9) (1998) 1186–1191.
[3] A. Eke, P. Herman, L. Kocsis, L. Kozak, Fractal characterization of complexity in temporal physiological signals, Physiological measurement 23 (1) (2002) R1.
[4] N. Wessel, A. Schumann, A. Schirdewan, A. Voss, J. Kurths, Entropy measures in heart rate variability data, in: International Symposium on Medical Data Analysis, Springer, 2000, pp. 78–87.
[5] A. L. Goldberger, C.-K. Peng, L. A. Lipsitz, What is physiologic complexity and how does it change with aging and disease?, Neurobiology of aging 23 (1) (2002) 23–26.
[6] A. Porta, S. Guzzetti, N. Montano, R. Furlan, M. Pagani, A. Malliani, S. Cerutti, Entropy, entropy rate, and pattern classification as tools to typify complexity in short heart period variability series, IEEE Transactions on Biomedical Engineering 48 (11) (2001) 1282–1291.
[7] S. M. Pincus, Assessing serial irregularity and its implications for health, Annals of the New York Academy of Sciences 954 (1) (2001) 245–267.
[8] C. E. Shannon, The mathematical theory of communications, i and ii, Bell System Technical Journal 27.
[9] S. Pincus, Approximate entropy (apen) as a complexity measure, Chaos: An Interdisciplinary Journal of Nonlinear Science 5 (1) (1995) 110–117.
[10] J. S. Richman, J. R. Moorman, Physiological time-series analysis using approximate entropy and sample entropy, American Journal of Physiology-Heart and Circulatory Physiology 278 (6) (2000) H2039–H2049.
[11] P. Grassberger, Information and complexity measures in dynamical systems, in: Information dynamics, Springer, 1991, pp. 15–33.
[12] M. Costa, A. L. Goldberger, C.-K. Peng, Multiscale entropy analysis of complex physiologic time series, Physical review letters 89 (6) (2002) 068102.
[13] M. Costa, A. L. Goldberger, C.-K. Peng, Multiscale entropy analysis of biological signals, Physical review E 71 (2) (2005) 021906.
[14] A. Humeau, G. Mahé, F. Chapeau-Blondeau, D. Rousseau, P. Abraham, Multiscale analysis of microvascular blood flow: A multiscale entropy study of laser doppler flowmetry time series, IEEE transactions on biomedical engineering 58 (10) (2011) 2970–2973.
[15] A. Humeau-Heurtier, G. Mahe, S. Durand, P. Abraham, Multiscale entropy study of medical laser speckle contrast images, IEEE Transactions on Biomedical Engineering 60 (3) (2013) 872–879.
[16] Y. Ma, P.-H. Tseng, A. Ahn, M.-S. Wu, Y.-L. Ho, M.-F. Chen, C.-K. Peng, Cardiac autonomic alteration and metabolic syndrome: an ambulatory ecg-based study in a general population, Scientific Reports 7.
[17] Y. Ma, K. Zhou, J. Fan, S. Sun, Traditional chinese medicine: potential approaches from modern dynamical complexity theories, Frontiers of medicine 10 (1) (2016) 28–32.
[18] C.-M. Chou, Wavelet-based multi-scale entropy analysis of complex rainfall time series, Entropy 13 (1) (2011) 241–253.
[19] Y. Ruo-Yu, Z. Qing-Hua, Multi-scale entropy based traffic analysis and anomaly detection, in: Intelligent Systems Design and Applications, 2008. ISDA’08. Eighth International Conference on, Vol. 2, IEEE, 2008, pp. 151–157.
[20] J. Wang, P. Shang, J. Xia, W. Shi, Emd based refined composite multiscale entropy analysis of complex signals, Physica A: Statistical Mechanics and its Applications 421 (2015) 583–593.
[21] Z. Li, Y.-K. Zhang, Multi-scale entropy analysis of mississippi river flow, Stochastic Environmental Research and Risk Assessment 22 (4) (2008) 507–512.
[22] L. Guzman-Vargas, A. Ramírez-Rojas, F. Angulo-Brown, Multiscale entropy analysis of electroseismic time series, Natural Hazards and Earth System Sciences 8 (4) (2008) 855–860.
[23] S.-D. Wu, C.-W. Wu, S.-G. Lin, C.-C. Wang, K.-Y. Lee, Time series analysis using composite multiscale entropy, Entropy 15 (3) (2013) 1069–1084.
[24] S.-D. Wu, C.-W. Wu, S.-G. Lin, K.-Y. Lee, C.-K. Peng, Analysis of complex time series using refined composite multiscale entropy, Physics Letters A 378 (20) (2014) 1369–1374.
[25] S.-D. Wu, C.-W. Wu, K.-Y. Lee, S.-G. Lin, Modified multiscale entropy for short-term time series analysis, Physica A: Statistical Mechanics and its Applications 392 (23) (2013) 5865–5873.
[26] M. U. Ahmed, D. P. Mandic, Multivariate multiscale entropy: A tool for complexity analysis of multichannel data, Physical Review E 84 (6) (2011) 061918.
[27] M. U. Ahmed, D. P. Mandic, Multivariate multiscale entropy analysis, IEEE Signal Processing Letters 19 (2) (2012) 91–94.
[28] M. U. Ahmed, T. Chanwimalueang, S. Thayyil, D. P. Mandic, A multivariate multiscale fuzzy entropy algorithm with application to uterine emg complexity analysis, Entropy 19 (1) (2016) 2.
[29] M. D. Costa, A. L. Goldberger, Generalized multiscale entropy analysis: application to quantifying the complex volatility of human heartbeat time series, Entropy 17 (3) (2015) 1197–1203.
[30] H. Azami, A. Fernández, J. Escudero, Refined multiscale fuzzy entropy based on standard deviation for biomedical signal analysis, arXiv preprint arXiv:1602.02847.
[31] Y.-C. Zhang, Complexity and 1/f noise. a phase space approach, Journal de Physique I 1 (7) (1991) 971–977.
[32] H. C. Fogedby, On the phase space approach to complexity, Journal of statistical physics 69 (1) (1992) 411–425.
[33] Q. D. Ma, R. P. Bartsch, P. Bernaola-Galván, M. Yoneyama, P. C. Ivanov, Effect of extreme data loss on long-range correlated and anticorrelated signals quantified by detrended fluctuation analysis, Physical Review E 81 (3) (2010) 031101.
[34] M. Costa, C.-K. Peng, A. L. Goldberger, J. M. Hausdorff, Multiscale entropy analysis of human gait dynamics, Physica A: Statistical Mechanics and its applications 330 (1) (2003) 53–60.
[35] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, H. E. Stanley, Physiobank, physiotoolkit, and physionet, Circulation 101 (23) (2000) e215–e220.
[36] A. Bashan, R. P. Bartsch, J. W. Kantelhardt, S. Havlin, P. C. Ivanov, Network physiology reveals relations between network topology and physiological function, Nature communications 3 (2012) 702.