Effect of decimation on the classification rate of non-linear analysis methods applied to uterine EMG signals

Effect of decimation on the classification rate of non-linear analysis methods applied to uterine EMG signals

Disponible en ligne sur www.sciencedirect.com IRBM 34 (2013) 326–329 Research in Imaging and Health Technologies Effect of decimation on the classi...

442KB Sizes 3 Downloads 59 Views

Disponible en ligne sur

www.sciencedirect.com IRBM 34 (2013) 326–329

Research in Imaging and Health Technologies

Effect of decimation on the classification rate of non-linear analysis methods applied to uterine EMG signals A. Diab a,b,∗ , M. Hassan c , B. Karlsson b , C. Marque a a

CNRS UMR 7338 Biomécanique et Bio-ingénierie, Université de technologie de Compiègne, 60200 Compiègne, France b School of Science and Engineering, Reykjavik University, Reykjavik, 101 Iceland c Laboratoire traitement du signal et de l’image, Inserm, Université de Rennes-1, 35042 Rennes, France Received 7 May 2013; received in revised form 24 July 2013; accepted 26 July 2013 Available online 21 September 2013

Abstract Recently, much attention has been paid to the use of non-linear analysis techniques for the characterization of biological signals. Several measures have been proposed to detect non-linear characteristics in time series. The effect of sampling frequency on the performance of these non-linear methods has rarely been evaluated. In this paper, we present a preliminary study of this effect for four methods that are widely used in non-linearity detection: Time reversibility, Sample Entropy, Lyapunov Exponents and Delay Vector Variance. These methods have been applied to real uterine EMG signals with the aim of distinguishing between pregnancy and labor contractions. The signals were used to classify contractions before and after decimating them by a factor 10. The results show that decimation improves the performance for sample entropy. It reduces it considerably for Delay Vector Variance and only slightly for Time reversibility and Lyapunov exponents. Time reversibility still gives the highest classification rate. The methods were much less computationally expensive after down sampling. © 2013 Elsevier Masson SAS. All rights reserved.

1. Introduction Non-linear time series analysis gives information about nonlinear features of biological signals, caused by the underlying non-linear physiological mechanisms related to most biological systems. The uterus is a very poorly understood organ. It is deceptively simple in structure but its functioning is quite complex as it moves from pregnancy towards labor [1]. This behavior, as observed by its electrical activity (electrohysterogram EHG = uterine electromyogram), indicates that there are numerous interconnected control systems involved (electric, hormonal, mechanical). When working together, they give rise to the non-linear character observed in the EHG [2]. There is a growing literature reporting non-linear biosignal analysis such as EEG [3], ECG [4], EMG [5] and EHG [6]. Several applications of non-linear analysis methods have been done on uterine EMG signals. We can cite here the comparison between Approximate Entropy, Correntropy and Time reversibility [5], the use of

Sample Entropy [7] and the use of Detrended Fluctuation Analysis [4], as well as sensitivity and robustness analysis of four non-linear methods [2]. In most of these studies, the authors have reported some practical disadvantages of the methods, mainly an impractically large calculation time for real time or immediate analysis. The study of the effect of decimation (or down sampling) on the performances of these methods, which is the main objective of this paper, is rare in the literature. In fact we have only found one example of this systematic analysis for Sample entropy [7]. Four methods: Time reversibility [8], Sample Entropy [9], Lyapunov Exponents [10] and Delay Vector Variance [11] were used in this work. We tested the sensitivity of these methods to decimation on real EHG signals for the differentiation between pregnancy and labor contractions. This study aims to compare the method’s performances with and without down sampling. 2. Material and methods 2.1. Data



Corresponding author. CNRS UMR 7338 Biomécanique et Bio-ingénierie, Université de technologie de Compiègne, 60200 Compiègne, France. E-mail address: [email protected] (A. Diab). 1959-0318/$ – see front matter © 2013 Elsevier Masson SAS. All rights reserved. http://dx.doi.org/10.1016/j.irbm.2013.07.010

The methods used here are “monovariate” in that we used only one channel (bipolar vertical7: Vb7) from the 4*4 recording

A. Diab et al. / IRBM 34 (2013) 326–329

matrix located on the women’s abdomen. This channel is located on the median vertical axis of the uterus [12]. The signal was recorded on women in France and in Iceland. In Iceland, we recorded signals on 22 women at the Landspitali University hospital, following a protocol approved by the relevant ethical committee (VSN02-0006-V2). In France, we recorded signals on 27 women at the Center of Amiens for Obstetrics and Gynaecology, following a protocol approved by the local ethical committee (ID-RCB 2011-A00500-41). The sampling frequency of EHG signals was 200 Hz. They were segmented manually to extract segments containing contraction bursts. After segmentation we got 115 labor and 174 pregnancy EHG bursts that we used for the analysis. 2.2. Methods 2.2.1. Time reversibility A time series is supposed to be reversible only if its probabilistic properties are invariant with respect to time reversal. Time irreversibility (TR) can be taken as a strong signature of nonlinearity [13]. In this paper we used the simplest way, described as follows to compute time reversibility for a signal:    N 1 (Sn − Sn−τ )3 Tr (τ) = N −τ n=τ+1

where N is the signal length and τ is the time delay. 2.2.2. Sample Entropy Sample Entropy (SampEn) is the negative natural logarithm of the conditional probability that a dataset of length N, having repeated itself for m samples within a tolerance r, will also repeat itself for m + 1 samples. Thus, a lower value of SampEn indicates more regularity in the time series. We used the way described in [7] to compute SampEn: for a time series of N points, x1 ,x2 ,. . .,xN we define subsequences, also called template vectors, of length m, given by: y(m) = (xi ,xi+1 ,. . .,xi+m–1 ) where i = 1,2,. . .,N-m + 1. Then the following quantity is defined: Bim (r) as (N-m–1)−1 times the number of vectors Xjm within r of Xim , where j ranges from 1 to N-m, and j = / i to exclude self-matches, and then define: Bm (r) =

N−m 1  m Bi (r) N −m i=1

−1 times the number Similarly, we define Am i (r) as (N-m–1) m+1 m+1 within r of Xi , where j ranges from 1 to of vectors Xj N-m, where j = / i, and set

Am (r) =

N−m 1  m Ai (r) N −m i=1

The parameter SampEn(m,r) is then defined as   limN→∞ − ln Am (r) /Bm (r) , which can be estimated by the statistic:   SampEn (m, r, N) = − ln Am (r) /Bm (r)

327

N is the length of the time series, m is the length of sequences to be compared, and r is the tolerance for accepting matches. 2.2.3. Lyapunov exponents Lyapunov exponent (LE) is a quantitative indicator of system dynamics, which characterizes the average convergence or divergence rate between adjacent tracks in the phase space. We used the way described in [14] to compute LE:  

1 λ = lim lim log Δyt / Δy0 t → Δ →0 t y0 ∞ where Δy0 and Δyt represent the Euclidean distance between two states of the system, respectively to an arbitrary time t0 and a later time t. 2.2.4. Delay Vector Variance The Delay Vector Variance (DVV) method is used for detecting the presence of determinism and non-linearity in a time series and is based upon the examination of local predictability of a signal. We use the measure of unpredictability σ ∗2 described in [15]: A time series can be represented conveniently in the phase space by using time delay embedding. When time delay is embedded into a time series, it can be represented by a set of delay vectors (DVs) of a given dimension. If m is the dimension of the delay vectors then it can be expressed as X(k) = [x(k-mτ ). . .X(kτ) ], where τ is the time lag. Now for every DV X(k), there is a corresponding target, namely the next sample xk . A set βk (m,d) is generated by grouping those DVs that are within a certain Euclidean distance (d) to DV X(k). This Euclidean distance will be varied in a manner standardized with respect to the distribution of pair-wise distances between DVs. Now for a given embedding dimension m, a measure of unpredictability σ ∗2 (target variance) is computed over all sets of βk . The mean μd and the standard deviation σ d are computed over all pair-wise Euclidean distances between DVs given by x (i) − x (j) (i = / j). The sets βk (m,d) are generated such that βk = {x (i) \ x (k) − x (j) ≤ d} i.e., sets which consist of all DVs that lie closer to X (k) than a certain distance d, taken from the interval [μd − nd ∗ σd ; μd + nd ∗ σd ] where nd is a parameter controlling the span over which to perform DVV analysis. For every set βk (m,d) we compute the variance of corresponding targets σk2 (m, d). The average over the N sets βk (m,d) is divided by the variance of the time series signal σk2 , σk gives the inverse 2 measure of predictability, namely target variance σ ∗ . 2 (1/N) N 2 k=1 σk σ∗ = 2 σx 3. Results Our signals were recorded with a sampling frequency of 200 Hz. The useful content of EHG is known to range between 0.1 and 3 Hz [16]. A high sampling frequency made the calculations more intensive, but on the other hand, a low sampling

328

A. Diab et al. / IRBM 34 (2013) 326–329

Fig. 1. ROC curves for the labor/pregnancy classification for all methods without (a) and with down sampling (b).

frequency may negatively affect the results of the methods. To do this, we applied the non-linear methods on the same signals with the original sampling frequency (200 Hz), and after decimation (20 Hz). We chose a factor 10 for the decimation based on the signal bandwidth mentioned above. A 20 Hz sampling frequency is sufficient to respect the Shannon theory while keeping a correct time resolution. We investigated the effect of down sampling on the performances of the non-linear parameters for pregnancy/labor classification. Our objective is to see if down sampling has a significant influence on the results, and, in this case, if it increases or decreases the classification performance. If the influence is small, we will gain in computational time without affecting the classification rate. To evaluate the performance of the proposed parameters for pregnancy/labor classification, we used classical Receiver Operating Characteristic (ROC) curves. A ROC curve is a graphical tool that permits the evaluation of a binary, i.e. two-class classifier. A ROC curve is the curve corresponding to TPR (True Positive Rate or sensitivity) vs. FPR (False Positive Rate or 1-Specificity) obtained for different classification parameter thresholds. ROC curves are classically compared by means of the Area Under the Curve (AUC), which is estimated by the trapezoidal integration method. We plot the ROC curves for the four methods by using the original sampling rate (200 Hz) in Fig. 1(a), and the decimated Table 1 Comparison of ROC curve Area Under the Curve (AUC) for labor/pregnancy classification. Classification parameter

TR DVV SampEn LE Computational time

ROC curve AUC Fs = 200 Hz

Fs = 20 Hz

0.842 0.615 0.478 0.758 8.27 h

0.809 0.541 0.672 0.731 18 min

signals (20 Hz) in Fig. 1(b). The AUC for the ROC curves is presented in Table 1. It is clear from Fig. 1 that TR and LE are not strongly affected by down sampling, as their AUC in the two cases are similar (Table 1). DVV is more affected than TR and LE. Its already low discrimination power becomes even worse. The performance of SampEn increases drastically after down sampling as its AUC goes from 0.478 to 0.672 (Table 1). The computation time for all methods decreases from 8.27 hours to 18 minutes after down sampling (Table 1).

4. Discussions – Conclusions This paper presents the results of a preliminary study of the effect of down sampling on four non-linear methods (TR, SampEn, DVV, LE) applied to uterine EMG signals classification. Indeed down sampling can clearly reduce the computational costs. In this paper, we tested if down sampling induced an effect on the classification rate of the methods by decimating our EHG signals from 200 Hz to 20 Hz. Methods were compared by using ROC curves and their associated AUC. The main findings concerning the effect of down sampling are: (i) the performance of LE and TR, which were evidenced to be the most powerful methods for labor/pregnancy classification in our previous study [1], are slightly reduced. AUC of TR decreases from 0.842 to 0.809. But this may be an acceptable trade off with the computation time saved; (ii) the performances of the DVV method are more affected than TR and LE; (iii) the performances of SampEn is, as expected, highly influenced by down sampling due to SampEn’s dependence on signal point numbers and sampling, as indicated in its name. They are remarkably increased by down sampling. But SampEn AUC stays lower than TR, TR remaining the best method for pregnancy/labor classification. Down sampling by a factor of 10 decreases the computational time by a factor of 27.5. This makes the clinical application of these methods

A. Diab et al. / IRBM 34 (2013) 326–329

much more realistic and will help us to attempt to use these methods for the prediction of normal and preterm labor. We conclude that down sampling does not cause a significant decrease to the classification rate of the studied methods. The classification results remain similar despite the fairly drastic down sampling that we performed in this limited preliminary study. The increase in the classification rate of SampEn confirms that this method is very sensitive to the sampling frequency of the signal. Thus, to optimize the performances of this method, the sampling frequency needs to be carefully chosen. Future work will tackle the question of the optimal sampling frequency for all these methods in general and for SampEn in particular. We will, for example, apply the methods on signal decimated by several factors ranging from 20 to 5 in order to choose the optimal sampling frequency that gives the highest AUC. After the validation of the classification rate of our parameters by using ROC curve, we will attempt to use the most efficient parameter as input to a more advanced and sophisticated classifier such as Support Vector Machine (SVM), for the detection of preterm labor. Acknowledgement French Ministry of Research, French Ministry of Foreign Affair and the EraSysBio+ program. References [1] Hassan M, Terrien J, Karlsson B, Marque C. Application of wavelet coherence to the detection of uterine electrical activity synchronization in labor. IRBM 2009;31(3):182–7. [2] Diab A, Hassan M, Marque C, Karlsson B. Quantitative performance analysis of four methods of evaluating signal nonlinearity: Application to uterine EMG signals. Conf Proc IEEE Eng Med Biol Soc 2012:1045–8.

329

[3] Takahashi T, Raymond YC, Mizuno T, Kikuchi M, Murata T, Takahashi K, et al. Antipsychotics reverse abnormal EEG complexity in drug-naive schizophrenia: a multiscale entropy analysis. Neuroimage 2010;51(1): 173–82. [4] Shiogai Y, Stefanovska A, McClintock PVE. Nonlinear dynamics of cardiovascular ageing. Phys Rep 2010;488(2–3):51–110. [5] Xie H-B, Zheng YP, Guo JY, Chen X. Cross-fuzzy entropy: a new method to test pattern synchrony of bivariate time series. Inf Sci 2010;180(9):1715–24. [6] Hassan M, Terrien J, Karlsson B, Marque C. Comparison between approximate entropy, correntropy and time reversibility: application to uterine electromyogram signals. Med Eng Phys 2011;33(8):980–6. [7] Vrhovec J. Evaluating the progress of the labour with sample entropy calculated from the uterine EMG activity. Elektrotehniski Vestnik Electrotech Rev 2009;76(4):165–70. [8] Diks C, van Houwelingen JC, Takens F, DeGoede J. Reversibility as a criterion for discriminating time series. Phys Lett A 1995;201(2–3):221–8. [9] Richman JS, Moorman JR. Physiological time-series analysis using approximate entropy and sample entropy. Am J Physiol Heart Circ Physiol 2000;278(6):H2039–49. [10] Wolf A, Jack B, Swift, Harry L, Swinney, Vastano JA. Determining Lyapunov exponents from a time series. Phys D 1985;16(3):285–317. [11] Gautama T, Mandic DP, Van Hulle MM. The delay vector variance method for detecting determinism and nonlinearity in time series. Phys D 2004;190(3–4):167–76. [12] Karlsson B, Terrien J, Guómundsson V, Steingrímsdóttir T, Marque C. Abdominal EHG on a 4 by 4 grid: mapping and presenting the propagation of uterine contractions. Springer: In: 11th Mediterranean Conference on Medical and Biomedical Engineering and Computing; 2007. [13] Schreiber T, Schmitz A. Surrogate time series. Phys D 2000;142(3–4): 346–82. [14] Fele-Zorz G, Kavsek G, Novak-Antolic Z, Jager F. A comparison of various linear and non-linear signal processing techniques to separate uterine EMG records of term and pre-term delivery groups. Med Biol Eng Comput 2008;46(9):911–22. [15] Kuntamalla RLS. The effect of aging on nonlinearity and stochastic nature of heart rate variability signal computed using Delay Vector Variance Method. Int J Comput Appl 2011;14(5):40–4. [16] Devedeux D, Marque C, Mansour S, Germain G, Duchene J. Uterine electromyography: a critical review. Am J Obstet Gynecol 1993;169(6): 1636–53.