Medical Engineering & Physics 34 (2012) 1503–1509
Contents lists available at SciVerse ScienceDirect
Medical Engineering & Physics journal homepage: www.elsevier.com/locate/medengphy
A new approach to detect congestive heart failure using sequential spectrum of electrocardiogram signals Chandrakar Kamath ∗ Electronics and Communication Department, Manipal Institute of Technology, Manipal 576104, India
a r t i c l e
i n f o
Article history: Received 8 December 2011 Received in revised form 26 February 2012 Accepted 1 March 2012 Keywords: Approximate entropy Binary occupancy Congestive heart failure ECG classification Sequential spectrum Symbolic dynamics
a b s t r a c t The aim of this study is to evaluate the discriminative power of sequential spectrum analysis of the short-term electrocardiogram (ECG) time series in separating normal and congestive heart failure (CHF) subjects. The raw ECG time series is transformed into a series of discretized binary symbols and the distribution of mono-sequences (i.e., tuples containing only one type of symbol ‘0’ or ‘1’) is computed. The relative distribution of mono-sequences containing only one type of symbol constitutes binary occupancy for that symbol in the sequential spectrum. The quantified approximate entropies of the binary occupancies in the sequential spectra are found to have potential in discriminating normal and CHF subjects and thus can significantly add to the prognostic value of traditional cardiac analysis. The statistical analyses and the receiver operating characteristic curve (ROC) analysis confirm the robustness of this new approach, which exhibits an average accuracy, average sensitivity, average positive predictivity, and average specificity, all 100.0%. © 2012 IPEM. Published by Elsevier Ltd. All rights reserved.
1. Introduction Despite numerous recent advances in the field of medicine, congestive heart failure (CHF) has been difficult to manage with in clinical practice and mortality rate has remained high [1]. As a consequence the development of new methods and measures of mortality risk in CHF, including sudden cardiac death, is still a major challenge. Besides this, there is a need to reach remote and under served communities with life saving healthcare. A reliable automated classification system combined with high-speed communication can resolve this issue. This work is an attempt to develop such an automated system to discriminate between normal and congestive heart failure subjects. Physiological data more often show complex structures, which cannot be quantified or interpreted using linear methods. The classical nonlinear methods suffer from the disadvantage of dimensionality. Further, there are not enough samples in the time series to arrive at a reasonable estimate of the nonlinear measures. From this point of view it is advisable to resort to methods, which can quantify system dynamics even for short time series, like the symbolic dynamics. The prime advantages of symbolic dynamics are the following: if the fluctuations of the two data series are governed by different dynamics then the evolution of the symbolic sequences is not related. The resulting symbolic sequences histograms give a reconstruction of their respective histories and provide a visual
∗ Tel.: +91 0820 4294139. E-mail address:
[email protected]
representation of the dynamic patterns. In addition, they may be used as a basis to build statistics to compare the regions that show different dynamical properties and indicate which patterns are predominant. Thus methods of symbolic dynamics are useful approaches for classifying the underlying dynamics of a time series. Parameters of time domain and frequency domain often leave these dynamics out of consideration. Besides computational efficiency, symbolic methods are also robust when noise is present. However, it is always advisable to filter any noise in the data series prior to the application of the proposed symbolic method. The process of symbolization can be used to represent any possible variation over time, depending on the number of symbols and the sequence lengths used. This is a very powerful property because it does not make any assumptions about the nature of the signals/patterns (e.g., it works equally well for both linear and nonlinear phenomena). Symbolic time series analysis has found application for the past few decades in the field of complexity analysis, including astrophysics, geomagnetism, geophysics, classical mechanics, chemistry, medicine and biology, mechanical systems, fluid flow, plasma physics, robotics, communication, and linguistics [2]. To be specific, in medicine, various implementations of symbolic sequences have been used to characterize encephalography (EEG) signals to understand the interaction between brain structures during seizures [3]. Under mechanical systems, symbolic methods were applied to combustion data from internal combustion engines to study the onset of combustion instabilities [4] and in multiphase flow data-symbolization were found to be useful in characterizing and monitoring fluidized-bed measurement signals [5]. Symbolic dynamics, as an approach to investigate complex
1350-4533/$ – see front matter © 2012 IPEM. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.medengphy.2012.03.001
1504
C. Kamath / Medical Engineering & Physics 34 (2012) 1503–1509
systems, has found profound use in the analysis of heart rate variability signals [6–10]. However, there is hardly any literature where symbolic dynamics is applied for analysis of ECG signals. There are many ways symbolic dynamics can be used for analysis of time series and all of them require coding, i.e., converting the time series into symbolic series. The differences in symbolic methods are usually in their coding procedure or used complexity indices. In this contribution we employ sequential spectrum [11] as the symbolic method and approximate entropy [12–14] as a measure of complexity of the sequential spectrum to classify ECG signals obtained from standard Holter recordings from PhysioNet database [15] into normal and CHF subjects. The rationale behind the application of sequential spectrum and approximate entropy measures is that both are suitable for short-term segments of the ECG signal. Receiver operating characteristic (ROC) plots were used to evaluate the ability of these complexity measures to discriminate normal from CHF subjects. Both the approximate entropy measures (ApEn0 : approximate entropy of the binary occupancies of symbol ‘0’ and ApEn1 : approximate entropy of the binary occupancies of symbol ‘1’) yielded excellent results with an average accuracy, average sensitivity, average positive predictivity, and average specificity, all, 100.0%. The remainder of this paper is organized as follows: Section 2.1 explains the data used and the basic data pre-processing. Section 2.2 introduces the symbolic dynamics transformations and concepts of mono-sequences, binary occupancy and sequence spectrum of ECG signals. Approximate entropy measure is discussed in Section 2.2.3. In the next two sections approach to statistical analysis is presented. Experimental results are discussed in Section 3. Finally a conclusion part is presented.
2. Methods and materials 2.1. ECG records All the ECG records used are from the benchmark PhysioNet databases [15]. The work involves 18 ECG records from normal sinus rhythm (Fantasia database-fanatasia and MIT-BIH normal sinus rhythm (NSR) database-nsrdb) and ECG records of 15 subjects with severe CHF (NYHA class 3–4) from BIDMC congestive heart failure (CHF) database-chfdb. The Fantasia database includes 20 young (21–34 years) and 20 elderly (68–85 years) healthy subjects. Both young and elderly groups include equal number of men and women. The NSR database includes 5 men, aged 26–45 years, and 13 women, aged 20–50 years. The CHF database includes 11 men, aged 22–71 years, and 4 women, aged 54–63 years. For sake of comparison and validation, two groups Group-I and GroupII are constituted as explained below. A normal sinus rhythm database is formed with two sub-groups, with 10 records each: (1) Normal-1, with 8 records from Fantasia (f1o01–f1o02; f1o07; f1o09–f1o10; f2o04; f2o06; f2o08) and 2 records from NSR (16265 and 16273) databases; (2) Normal-2, with 8 records from Fantasia (f1o03–f1o06; f1o08; f2o01–f2o03) and 2 records from NSR (16420 and 16483) databases. Likewise, a CHF database is formed with two sub-groups, each with 7 ECG records: (1) CHF-1 (chf01–chf07) and (2) CHF-2 (chf08–chf10, chf12–chf15). Group-I includes Normal1 and CHF-1 sub-groups, while Group-II includes Normal-2 and CHF-2 sub-groups. Since factors like age differences and differing male-to-female ratios between compared groups will have an impact on the results, when statistical analyses are carried out, the above grouping is done to match gender and age between Group-I and Group-II. From each record the modified limb lead II is only considered for analysis. The resolution is 200 samples per mV. The sampling frequency of normal sinus rhythm signal from Fantasia is 250 Hz, while that of NSR is 128 Hz and that of CHF signal is 250 Hz.
Since the sampling frequency does influence upon the calculated indices it is necessary to have the same sampling frequency for all the records. For this reason ECG signals from NSR database are first re-sampled at 250 Hz. Then each record is divided into segments of equal time duration (14 s), with 3500 samples/segment in both normal sinus rhythm and CHF database. The reason for choosing 3500 samples/segment is explained in Section 3. A total of 6912 segments from normal sinus rhythm and CHF database are analyzed. All the records are normalized before analysis. Also the signals from Fantasia database are comparatively noisy and hence we use an 8-point moving average filter out this high-frequency noise. 2.2. Symbolic dynamics and sequential spectrum 2.2.1. Static transformation and dynamic transformation Symbolic dynamics, as an approach to investigate complex systems, facilitates the analysis of dynamic aspects of the signal of interest. The concept of symbolic dynamics is based on a coarse-graining of the dynamics [3]. That is the range of original observations or the range of some transform of the original observations such as the first difference between the consecutive values, is partitioned into a finite number of regions and each region is associated with a specific symbolic value so that each observation or the difference between successive values is uniquely mapped to a particular symbol depending on the region into which it falls. The former mapping is called static transformation and the latter dynamic transformation. Thus the original observations are transformed into a series of same length but the elements are only a few different symbols (letters from the same alphabet), the transformation being termed symbolization. A general rule of thumb is that the partitions must be such that the individual occurrence of each symbol is equiprobable with all other symbols or the measurement range covered by each region is equal. This is done to bring out ready differences between random and nonrandom symbol sequences. The transformations into symbols have to be chosen context dependent. For this reason, we use complexity measures on the basis of such context-dependent transformations, which have a close connection to physiological phenomena and are relatively easy to interpret. This way the study of dynamics simplifies to the description of symbol sequences. Some detailed information is lost in the process but the coarse and robust properties of the dynamic behavior are preserved and can be analyzed [3]. 2.2.2. Mono-sequences, binary occupancy and sequential spectrum of ECG signals In this study, we use dynamic transformation approach for the symbolic dynamics [4]. Such a differenced symbolization scheme is relatively insensitive to extreme noise spikes in the data. In this approach arithmetic differences between adjacent data points of the ECG signal define the symbolic values. We symbolize the positive difference as a ‘1’ and the negative difference as a ‘0’ as shown in the equation below.
Si =
1
if xi − xi−1 ≥ 0
0
if xi − xi−1 < 0
(1)
After symbolization the next step is to partition the symbol sequence into windows of width W symbols each. It is sliding window technique and if the shift is smaller than W, then the consecutive windows overlap. The next step is the construction of temporal patterns in each window, by the identification of short ordered sequences of only one type of symbol (0 or 1), termed tuples or mono-sequences, from the symbol series by gathering groups of symbols in the temporal order. The mono-sequence Nx0 (or Nx1), is a homogeneous sub-sequence containing N (N = 1,2,. . .,W), only one type of symbol S = 0 (or 1). The lengths of the mono-sequences
C. Kamath / Medical Engineering & Physics 34 (2012) 1503–1509
correspond to their respective frequencies. Next the tuples [NxS], i.e., mono-sequences of length N are counted in the jth window. Thus we evaluate the cardinality, Lj [NxS], for all possible values of N, and arrive at the distribution of cardinalities in the jth window. From the cardinalities we compute the respective binary occupancies, Oj [NxS], for the mono-sequences [NxS] in the window j as given by Eq. (2) below. Oj [NxS] = Lj [NxS]
N W
(N = 1, 2, . . . , W )
(2)
Each plot of binary occupancies, Oj [NxS] with (S = 0 or S = 1), vs the length of the mono-sequences, N, constitutes sequentialspectrum for the window j. Thus, the binary occupancies characterize the distribution of monotonic intervals of length N, in the analyzed time series: decreasing intervals for S = 0 and increasing intervals for S = 1. Though this distribution is referred to as spectrum, this is neither a transformation to frequency domain nor the inverse transform does exist. The shape, distribution and width of the sequential spectrum contain information about the analyzed time series.
2.2.3. Approximate entropy (ApEn) Approximate entropy (ApEn) is a measure of irregularity in the data without any a priori knowledge of the system generating them [12]. ApEn is scale invariant and model independent, evaluates both dominant and subordinant patterns in the data, and discriminates series for which clear feature recognition is difficult. It is immune to low level noise and robust to meaningful information with a reasonable number of data points. Large values of ApEn indicate more complexity or irregularity in the data and vice versa. In the following a short description of the formal implementation of the ApEn is given, for further details see [13]. Given a time series with M data points, x(1),x(2),. . .,x(M). To compute ApEn m-dimensional vector sequences, pm (i), are constructed from data series like [pm (1),pm (2),. . .,pm (M − m + 1)], where 1 ≤ i ≤ M − m + 1. If the distance between two vectors pm (i) and pm (j) is defined as |pm (i) − pm (j)|, then Ci m (d) = [Number of vectors such that |pm (i) − pm (j)| < d]/(M − m + 1), where m specifies the pattern length and d defines the criterion of similarity. Cim (d) is considered as the mean of the fraction of patterns of length m that resemble the pattern of the same length that begins at index i. ApEn is computed using Eq. (3) below. Let ˚m (d) =
ln(C m (d)) i i
and ˚m+1 (d)
=
(M − m + 1) i
i
(M − m)
for 1 ≤ i ≤ M − m
2.2.4. Approximate entropy of binary occupancies and t-tests We define the approximate entropies, ApEn0 and ApEn1 , of the binary occupancies, O[Nx0] and O[Nx1], respectively, in the sequential spectra. ApEn0 and ApEn1 are computed for each of the sequential-spectrum for the window or ECG segment in the normal and CHF subjects. In this work we use approximate entropies, ApEn0 and ApEn1 , and show that each one is capable of distinguishing between normal and CHF subjects. To assess the use of these parameters individual and pair-wise significance tests (Student’s t-tests) are performed. To compare the regularity of the fluctuation functions between the normal and CHF subjects we also compute the mean and standard deviation of the difference between the corresponding approximate entropies, ApEn0 and ApEn1 , of the two sub-groups. Parameters are regarded as statistically significant if p < 0.05. 2.2.5. Receiver operating characteristic (ROC) analysis and C-statistic As mentioned above, individual and pair-wise significance tests (Student’s t-tests) are used to evaluate the statistical differences between the approximate entropy values, ApEn0 and ApEn1 , of the binary occupancies, O[Nx0] and O[Nx1], respectively, for normal and CHF subjects. If significant differences between normal and CHF subjects are found, then the ability of the nonlinear analysis method to discriminate normal from CHF subjects is evaluated using receiver operating characteristic (ROC) plots in terms of AUC (C-statistic). ROC curves are obtained by plotting sensitivity values (which represent the proportion of the patients with diagnosis of CHF who test positive) along the y-axis against the corresponding (1-specificity) values (which represent the proportion of the correctly identified normal subjects) for all the available cutoff points along the x-axis. Accuracy is a related parameter that quantifies the total number of subjects (both normal and CHF) precisely classified. The area under ROC curve (AUC), also called C-statistic, measures this discrimination, that is, the ability of the test to correctly classify those with and without the disease and is regarded as an index of diagnostic accuracy. The optimum threshold is the cut-off point in which the highest accuracy (minimal false negative and false positive results) is obtained. This can be determined from the ROC curve as the closest value to the left top point (corresponding to 100% sensitivity and 100% specificity). A C-statistic value of 0.5 indicates that the test results are better than those obtained by chance, where as a value of 1.0 indicates a perfectly sensitive and specific test. 3. Results and discussion
for 1 ≤ i ≤ M − m + 1
ln(C m+1 (d)
1505
(3)
then ApEn(m, d, M) = ˚m (d) − ˚m+1 (d) for m ≥ 1 where, ApEn(0, d, M) = −˚1 (d) This is the key to a measure of irregularity: small values of ApEn indicate regularity and large values imply substantial fluctuations or irregularity in the time series. As suggested by Pincus [13,14], in this work ApEn is estimated by using widely established parameter values of m = 1 or m = 2, and d = 10–25% of the standard deviation of the original data sequence. Normalizing d in this manner gives ApEn a translation and scale invariance, in that it remains unchanged under uniform process magnification, reduction, or constant shift to higher or lower values [13,14]. It is also found that time series of length M ≥ 60 data points will lead to statistically reliable and reproducible results.
To test for statistical significance of sequential spectrum approach, first we analyze the ECG data from normal and CHF subjects of Group-I and show that approximate entropy of the binary occupancy, ApEn0 or ApEn1 , alone is sufficient to distinguish between normal and CHF subjects. Next, we validate our approach conducting another case study on normal and CHF subjects from Group-II. Both ApEn0 and ApEn1 are analyzed from segments of 3500 samples and averaged to obtain the mean value over the entire recording period for each subject. The ECG records used are pre-processed and segmented as mentioned in Section 2.1. A total of 128 segments per NSR subject and 256 segments per CHF subject are analyzed. Dynamic transformation as given in Eq. (1) is first applied on each segment to arrive at a symbol string with a range of two possible symbols {0,1} (binary symbolization) and binary occupancies, Oj [NxS] with (S = 0 or S = 1) are computed. This is repeated for all the segments (all values of j) of the subject and a plot of average binary occupancies vs the length of the mono-sequences is made. Empirically it is found that for a reliable sequence spectrum, that is, for all the binary
1506
C. Kamath / Medical Engineering & Physics 34 (2012) 1503–1509
a
0.16
Normal CHF
0.14
Binary occupancy0
0.25
Binary occupancy0
a
3000 3500 4000
0.2 0.15 0.1
0.12 0.1 0.08 0.06 0.04
0.05
0.02
0 0
5
10
15
20
25
0 0
30
Length of Mono-sequence
60
80
0.16
b
3000 3500 4000
100
Normal CHF
0.14
Binary occupancy1
0.25
Binary occupancy0
40
Length of Mono-sequence
b
0.2 0.15 0.1
0.12 0.1 0.08 0.06 0.04
0.05 0 0
20
0.02
5
10
15
20
25
0 0
30
Length of Mono-sequence Fig. 1. Effect of length of time series on the binary occupancy O[Nx0] of sequence spectra of (a) a normal subject (f1o03) and (b) a CHF subject (chf07), for three different lengths W = 3000, 3500 and 4000.
occupancies that exist in the time series (considering both normal and CHF) to be captured in the sequence spectrum the length of the raw ECG time series must be at least 2000 samples (at a sampling frequency of 250 Hz). Each binary occupancy amounts to the relative contribution of the mono-sequence of a specific length into the analyzed symbol series. As a consequence it is found that the overall shape of the sequence spectrum remains almost the same for time series with length greater than the minimum length. This is demonstrated in Fig. 1(a) for a normal subject (f1o03) and Fig. 1(b) for a CHF subject (chf07) for time series of three different lengths with W = 3000, 3500 and 4000. It is found that the overall shape of the sequence spectrum for binary occupancy-0 remains nearly the same irrespective of the length of the time series. This statement is also true for binary occupancy-1. In this work we have chosen W = 3500 samples. Fig. 2(a) shows a comparison of averaged sequential spectra of the binary occupancies, O[Nx0] for normal and CHF subjects from Group-I and Fig. 2(b) shows a comparison of sequence spectra of the binary occupancies, O[Nx1] for normal and CHF subjects from Group-I. It is found from Fig. 2 that both the sequential spectra corresponding to normal subjects show a longer width compared to CHF subjects. All binary occupancies over the range n = 4–15 contribute significantly in the CHF cases than those for normal. However, all other binary occupancies contribute significantly in the normal cases than those for CHF. In general, if the data series has a length W, then the resulting binary occupancy in the sequence spectrum will also have the same length W. However, the sequence spectrum will decay to zero after a finite length, Lm , which is much smaller than W (i.e., Lm W). This fact can be readily observed from Fig. 2(a) and (b). In our case, for example, with W = 3500, Lm < 100. Hence it is sufficient to consider the first L = 100 points of the binary
20
40
60
80
100
Length of Mono-sequence Fig. 2. Comparison of sequential spectra of normal (f1o03) subject and CHF (chf07) subject from Group-I: (a) for binary occupancies O[Nx0] and (b) for binary occupancies O[Nx1].
occupancy in the sequence spectrum for the computation of ApEn. This readily fulfills the requirement of length of the series for computation of ApEn, mentioned above in Section 2.2.3. To arrive at near-optimal values of m and d for the computation of ApEn, we propose a procedure, a little different from that in [13]. Instead of using Student’s t-test to arrive at near-optimal values of m and d, we employ ROC AUC as explained here. We estimate ApEn0 and ApEn1 for the normal and CHF subjects (Group-I), for m = 1 and m = 2, and several values of d = 0.05, 0.1, 0.15 and 0.2 times the standard deviation (SD) of the original data sequence. Each of the estimated series, ApEn0 and ApEn1 , for the normal and CHF subjects is used in ROC analysis and AUC (or C-statistic) is computed. In Table 1, we show the AUC values estimated for different values of m and d. It is found that irrespective of m and d the AUC is the same, being equal to 1. This implies that one can choose any combination of m and d for best results and we choose m = 1 and d = 0.1 times the SD of the data. These optimal values of m and d are used for further
Table 1 Descriptive results of ROC analysis for discriminating normal and CHF subjects in Group-I for different values of m and d. SD is standard deviation of the time series. m
d
ApEn0 AUC (C-statistic)
ApEn1 AUC (C-statistic)
1 1 1 1 2 2 2 2
0.05 SD 0.10 SD 0.15 SD 0.20 SD 0.05 SD 0.10 SD 0.15 SD 0.20 SD
1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
C. Kamath / Medical Engineering & Physics 34 (2012) 1503–1509
1507
Table 2 Descriptive results of sequence spectrum analysis for Group-I. All values are expressed as mean ± SD (median) [95% CI] (non-paired Student’s t-test; p < 0.0001). Subject
ApEn0 (1, 0.1 SD)
ApEn1 (1, 0.1 SD)
Normal CHF
0.1495 ± 0.02948 (0.1663) [0.1070 0.1794] 0.02363 ± 0.005239 (0.02478) [0.0154 0.0331]
0.2123 ± 0.04097 (0.2341) [0.1528 0.2579] 0.02907 ± 0.008925 (0.03062) [0.0194 0.0396]
a
compared to those of the normal subjects. Of course, experimental studies are necessary to confirm the mechanisms behind the increased regularity of binary occupancies of CHF subjects. We also perform the Student’s t-test for paired data. The results are tabulated in Table 3. Parameters are regarded as statistically significant if p < 0.05. The values of test statistic and p-value reveal that both the approximate entropies ApEn0 and ApEn1 are statistically significant. Next, the ability of the ApEn0 and ApEn1 to discriminate normal from CHF subjects (Group-I) is evaluated using receiver operating characteristic (ROC) plots, which are shown in Fig. 4(a). For the case of ApEn0 (shown by dash-dot line), it is found that the AUC is 1.0000 with sensitivity = 100.0%, specificity = 100.0%, positive predictivity = 100.0%, and accuracy = 100.0%. For the case of ApEn1 (shown by solid line) also, the AUC is found to be 1.0000 with sensitivity = 100.0%, specificity = 100.0%, positive predictivity = 100.0%, and accuracy = 100.0%. These results substantiate our finding that either ApEn0 or ApEn1 alone is sufficient to separate normal from CHF subjects.
a
0.18
1
0.8
Sensitivity
analysis. The corresponding ApEn is usually expressed as ApEn(1, 0.1 SD). The distribution of approximate entropies ApEn0 and ApEn1 for the normal and CHF subjects (Group-I) are shown using Boxwhiskers plots in Fig. 3(a) and (b), respectively. In Fig. 3(a) for ApEn0 , the boxes (inter-quartile range) as well as whiskers of normal and CHF subjects are distinctly non-overlapping. In Fig. 3(b) for ApEn1 , the same is true. These plots show that either ApEn0 or ApEn1 can be used to distinguish between normal and CHF subjects. The results of statistical analysis of non-paired Student’s t-test for normal and CHF subjects of Group-I are depicted in Table 2. All values are expressed as mean ± standard deviation (median) [95% confidence interval]. For normal subjects, we find the following approximate entropies (mean ± S.D.): ApEn0 = 0.1495 ± 0.02948 and ApEn1 = 0.2123 ± 0.04097. For CHF subjects we find the following approximate entropies (mean ± S.D.): ApEn0 = 0.02363 ± 0.005239 and ApEn1 = 0.02907 ± 0.008925, both different from those of normal. These distributions show that either ApEn0 or ApEn1 alone is adequate to distinguish between normal and CHF subjects. It can be observed that both the normal and CHF subjects show a larger value of ApEn1 as compared to that of ApEn0 . This implies a reduced regularity in binary occupancies O[Nx1] compared to O[Nx0]. It is also found that ApEn0 and ApEn1 for CHF subjects are always smaller than the corresponding values of the normal subjects, the former being significantly smaller. This implies an increased regularity of binary occupancies O[Nx0] and O[Nx1] in the CHF subjects
0.6
0.4
Subject ApEn0
0.16 0.14
0.2
Subject ApEn0 Subject ApEn1
0.12 0.1
0
0.08
0.2
0.4
0.6
0.8
1
1-Specificity
0.06 0.04
b
0.02 Normal-1
CHF-1
0.25 0.2 0.15
1
0.8
Sensitivity
b
Subject ApEn1
0
0.6
0.4
0.2
Subject ApEn0 Subject ApEn1
0.1 0
0.05
0
0.2
0.4
0.6
0.8
1
1-Specificity Normal-1
CHF-1
Fig. 3. The distribution of approximate entropies (a) ApEn0 and (b) ApEn1 using Box-whiskers plots (with outliers) for normal and CHF subjects from Group-I.
Fig. 4. (a) ROC curve for ApEn0 (dash-dot line) and ApEn1 (solid line) for Group-I. (b) ROC curve for ApEn0 (dash-dot line) and ApEn1 (solid line) for Group-II. The diagonal line (dotted line) from 0,0 to 1,1 represents ROC curve that cannot discriminate between normal and CHF subjects.
1508
C. Kamath / Medical Engineering & Physics 34 (2012) 1503–1509
Table 3 p-Values and tstat (test statistic) values of paired t-test for ApEn0 and ApEn1 of healthy and CHF subjects from Group-I. Parameter
ApEn0 (1, 0.1 SD)-CHF
ApEn0 (1, 0.1 SD)-normal ApEn1 (1, 0.1 SD)-normal
p = 2.7156 × 10−06 ; tstat = 7.5385
ApEn1 (1, 0.1 SD)-CHF p = 1.9258 × 10−06 ; tstat = 7.7680
Table 4 Descriptive results of sequence spectrum analysis for Group-II. All values are expressed as mean ± SD (median) [95% CI] (non-paired Student’s t-test; p < 0.0001). Subject
ApEn0 (1, 0.1 SD)
ApEn1 (1, 0.1 SD)
Normal CHF
0.1320 ± 0.02691 (0.1279) [0.0946 0.1844] 0.02363 ± 0.00935 (0.02407 [0.0129 0.0332]
0.1612 ± 0.04083 (0.1506) [0.1058 0.2465] 0.0396 ± 0.01323 (0.03847) [0.0215 0.0585]
Table 5 p-Values and tstat (test statistic) values of paired t-test for ApEn0 and ApEn1 of healthy and CHF subjects from Group-II. Parameter
ApEn0 (1, 0.1 SD)-CHF −05
p = 8.4720 × 10
ApEn0 (1, 0.1 SD)-normal ApEn1 (1, 0.1 SD)-normal
; tstat = 5.6106 p = 0.0011; tstat = 4.1597
Finally, we validate our approach conducting another case study on normal and CHF subjects from Group-II. The results of statistical analysis of non-paired Student’s t-test for normal and CHF sub-groups of Group-II are depicted in Table 4. All values are expressed as mean ± standard deviation (median) [95% confidence interval]. For normal sub-group, we find the following approximate entropies (mean ± S.D.): ApEn0 = 0.1320 ± 0.02691 and ApEn1 = 0.1612 ± 0.04083. For CHF sub-group we find the following approximate entropies (mean ± S.D.): ApEn0 = 0.02363 ± 0.00935 and ApEn1 = 0.0396 ± 0.01323, both different from normal. These
a
0.25
Subject ApEn0
0.2
Normal CHF
0.15 0.1 0.05 0 0
5
10
15
Subject number
b
0.35
Subject ApEn1
0.3
Normal CHF
0.25
distributions show that either ApEn0 or ApEn1 can be used to distinguish between normal and CHF subjects. We also perform the Student’s t-test for paired data. The results are tabulated in Table 5. Parameters are regarded as statistically significant if p < 0.05. The values of test statistic and p-value reveal that both the approximate entropies ApEn0 and ApEn1 are statistically significant. Next, the ability of the ApEn0 and ApEn1 to discriminate normal from CHF sub-groups (Group-II) is evaluated using receiver operating characteristic (ROC) plots, which are shown in Fig. 4(b). For the case of ApEn0 (shown by dash-dot line), it is found that the AUC is 1.0000 with sensitivity = 100.0%, specificity = 100.0%, positive predictivity = 100.0%, and accuracy = 100.0%. For the case of ApEn1 (shown by solid line) also, AUC is found to be 1.0000 with sensitivity = 100.0%, specificity = 100.0%, positive predictivity = 100.0%, and accuracy = 100.0%. These results again confirm our finding that either ApEn0 or ApEn1 alone is sufficient to distinguish between normal and CHF subjects. The efficacy of the proposed approach can be visualized by plotting a scattergram, with subjects along x-axis and the value of ApEn0 or ApEn1 along y-axis for (say, 30 subjects from Group-I and II) 15 normal (shown by ‘o’) and 15 CHF subjects (shown by ‘+’) as in Fig. 5. The solid horizontal line marks the threshold (mean of minimum ApEn for normal subjects and maximum of ApEn for CHF subjects on the respective plot). It is interesting to see the separation of ApEn0 or ApEn1 values between normal and CHF subjects. This means ApEn0 or ApEn1 exhibits variability large enough to separate normal from CHF. These plots substantiate our finding that either ApEn0 or ApEn1 alone is sufficient to distinguish between normal and CHF sub-groups and thus can significantly add to the prognostic value of traditional cardiac analysis.
4. Conclusion
0.2 0.15 0.1 0.05 0 0
ApEn1 (1, 0.1 SD)-CHF
5
10
15
Subject number Fig. 5. Scattergram of (a) subject ApEn0 and (b) subject ApEn1 , for 15 normal and 15 CHF subjects from Group-I and -II.
We apply sequential spectrum analysis to the first difference of nonstationary raw ECG time series from normal and CHF subjects. We show that this approach can identify the monotonicity in the ECG signals and tell us how long these periods are, when the signal is increasing or decreasing. The quantified approximate entropies of the binary occupancies of the sequential spectrum are found to have potential in discriminating normal and CHF subjects and thus can significantly add to the prognostic value of traditional cardiac analysis. These approximate entropies can easily be analyzed from ambulatory ECG recordings without time consuming preprocessing and hence, may have practical implications for risk stratification. In real time applications the value for specificity is more
C. Kamath / Medical Engineering & Physics 34 (2012) 1503–1509
important than the value for sensitivity and with this approach average specificity is of the same order as sensitivity (100%). The usual nonlinear methods applied to time series (other than symbolic dynamics) usually demand long-term series, at times of 24 h length. Acquiring 24 h record just for screening purpose is not amenable. Although the ECG data we use contains both 30 min and 20 h duration records, our method uses short-term segments, of the order of 14 s duration. Hence the method is suitable for screening large populations in a short time. Authors’ contributions The author, who is also the corresponding author, is the sole contributor to this work. Conflict of interest statement The author declares that they have no competing interests. References [1] Sandercock GRH, Brodie DA. The role of heart rate variability in prognosis for different modes of death in chronic heart failure. Pacing Clin Electrophysiol 2006;79(8):892–904. [2] Daw CS, Finney CEA, Tracy ER. A review of symbolic analysis of experimental data. Rev Sci Instrum 2003;74(2):915–30.
1509
[3] Xu JH, Liu ZR, Liu R. The measures of sequence complexity for EEG studies. Chaos 1994;4(11):2111–9. [4] Daw CS. Observing and modeling nonlinear dynamics in an internal combustion engine. Phys Rev Lett 1998;57(3):2811–9. [5] Finney CEA, Nguyen K, Daw CS, Halow JS. Symbol-sequence statistics for monitoring fluidization. In: Proceedings of the ASME heat transfer division. 1998. p. 405–11. [6] Kurths J, Voss A, Saparin P, Witt A, Kleiner HJ, Wessel N. Quantitative analysis of heart rate variability. Chaos 1995;5:88–94. [7] Porta A, D’Addio G, Pinna GD, Maestri R, Gnecchi-Ruscone T, Furlan R, et al. Symbolic analysis of 24 h Holter heart period variability series: comparison between normal and heart failure patients. Comput Cardiol 2005;32:575–8. [8] Tobaldini E, Porta A, Wei S-G, Zhang Z-H, Francis J, Casali KR, et al. Symbolic analysis detects alterations of cardiac autonomic modulation in congestive heart failure rats. Auton Neurosci 2009;150(1–2):21–6. [9] Wessel N, Schwarz U, Saparin PI, Kurths J. Symbolic dynamics for medical data analysis. In: Attractors, signals, and synergetics. 2002. p. 45–61. [10] Voss A, Kurths J, Kleiner HJ, Witt A, Wessel N, Saparin P, et al. The application of methods of nonlinear dynamics for the improved and predictive recognition of patients threatened by sudden cardiac death. Cardiovasc Res 1996;31:419–33. [11] Stepien RA. New method for analysis of nonstationary signals. Nonlinear Biomed Phys 2011;5(3):1–8. [12] Cysarz D, Bettermann H, Van Leeuwen P. Entropies of short binary sequences in heart period dynamics. Am J Physiol Heart Circ Physiol 2000;278:H2163–72. [13] Hornero R, Abásolo D, Jimeno N, Sánchez C, Jesús Poza J, Aboy M. Variability, regularity, and complexity of time series generated by schizophrenic patients and control subjects. IEEE Trans Biomed Eng 2006;53(2):210–8. [14] Pincus SM. Assessing serial irregularity and its implications for health. Ann N Y Acad Sci 2001;954:245–67. [15] Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet. Components of a new research resource for complex physiologic signals. Circulation 2000;101:e215–20 [Available at http://circ.ahajournals.org/cgi/content/full/101/23/e215].