Low-complexity lossless multichannel ECG compression based on selective linear prediction


Biomedical Signal Processing and Control 57 (2020) 101705


Dominik Rzepka ∗, e-Health Data Science Unit, Comarch S.A., ul. Życzkowskiego 27, 31-864 Kraków, Poland


Article history: Received 27 May 2019 Received in revised form 22 August 2019 Accepted 5 October 2019

Abstract: In this paper we present a low-complexity method for multichannel lossless compression, dedicated to portable ECG acquisition systems. Each channel is selectively decorrelated using two linear predictors, separately for the heart beats and for the background signal. Next, cross-channel correlations are used to determine pairs of strongly dependent channels, which allows further decorrelation of the signals using cross-channel predictors. Prediction is therefore obtained at a small computational cost, using a simple QRS detector and three short linear predictors per channel, updated periodically using the fast Levinson-Durbin recursion. Finally, the prediction error and the estimate of its variance are used in entropy encoding with the highly effective method of asymmetric numeral systems. The average compression ratio depends on the similarity between channels and varies from 2.92 (MIT-BIH ADB, low similarity) to 3.43 (INCART, high similarity). © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Remote monitoring of the heart condition is becoming increasingly popular, since heart diseases are presently one of the major causes of death [1]. The main diagnostic signal used for heart diagnosis is the electrocardiogram (ECG), recorded using electrodes placed on the patient's body and converted into a sequence of discrete-time samples by a portable device. In some systems the automated diagnosis using the ECG is performed directly in the portable device carried by the patient (for example [2,3]). Such an approach, however, limits the complexity of the applicable signal processing methods, due to the power consumption constraints of battery-powered portable devices, while the need for computational resources keeps rising with the desired quality of service.

The main advantage of remote monitoring of the heart condition is that it allows patients to lead their normal life without keeping them in the hospital. As a consequence, the quality of the ECG signal collected from patients during their daily routine is sometimes very poor, primarily due to the electrical activity of muscles and electrode displacement. Providing high reliability of the monitoring system is therefore a strong motivation for introducing redundancy and processing multiple records of ECG simultaneously. Since multichannel processing requires high computational resources, it may be better to send the ECG signal over a wireless link to the central processing system and perform the analysis there, if the energy requirements of the data transmission are smaller than the cost of the advanced processing in the portable device. The effectiveness of the transmission can be increased if the volume of the data is decreased using an energy-efficient compression method [4,5].

Lossy compression methods significantly outperform lossless ones in terms of the compression ratio (CR), at the cost of introduced distortions [6]. Unfortunately, lossy methods often cannot be used in ECG processing, because of the strict processing distortion constraints for ECG acquisition devices [4]. The problem is therefore limited to finding a method yielding a high CR given the limited computational resources. The latter requirement severely limits the applicability of advanced algorithms, based for example on the adaptive wavelet transform [7,8], neural networks [9], the Burrows-Wheeler transformation [10] or clustering [11]. A solution could be low-complexity compression methods, based on delta encoding [12], linear predictive coding (LPC) [13-15] or adaptive linear prediction [4], but they do not take advantage of the redundancy between channels. This possibility is exploited in [15], where the signal from one channel is subtracted from the signal in another channel. However, this simple approach may be ineffective for some ECG records because of differences in amplitude, beat shape and polarity.

In this paper we propose multichannel signal decorrelation performed with linear cross-channel prediction. We use one channel to predict the values of another one, and in order to optimize the choice of channel pairs and to maximize the number of channels benefiting from prediction, we develop a method of arranging pairs of mutually dependent channels into a tree. Furthermore, to improve the linear prediction in the individual channels, we introduce selective linear prediction, which uses different LPC coefficients for heart beats and for the background noise in the ECG signal. Beat locations are detected approximately using a simple threshold-based algorithm. Finally, the error variance is estimated and the signal is subject to entropy coding using the asymmetric numeral systems method [16]. The signal is divided into packets, and the estimation of the signal properties is performed for each packet independently. The resulting CR depends mainly on the similarities between channels, the length of the packets, the noise level and the representativeness of the signal fragment used for the predictor's training. For 120 s-long packets, the mean CR varies from 2.92 (MIT-BIH ADB; weak dependence between channels) to 3.46 (INCART; strong dependence).

The paper is organized as follows: Section 2 describes the characteristics of the ECG signal which can be exploited for designing effective compression. Next, the principles of lossless compression are introduced. Section 3 is focused on linear prediction and its extension into selective linear prediction using a simple beat detector. Section 4 presents the cross-channel linear prediction and the algorithm for finding the pairs of channels used in the prediction. Section 5 covers the entropy coding and the choice of the probability mass function reflecting the statistics of the signal. Section 6 summarizes the arrangement of the processing blocks and the order of their operation in the estimation and the prediction stages, respectively. Section 7 discusses the results and includes a comparison with other methods. Section 8 summarizes the paper.

∗ Corresponding author. E-mail address: [email protected] https://doi.org/10.1016/j.bspc.2019.101705 1746-8094/© 2019 Elsevier Ltd. All rights reserved.

2. Preliminaries

2.1. Lossless compression

The main objective of lossless compression is to represent data using a smaller number of bits than the original by avoiding statistical redundancy, while keeping the ability to recover the input data exactly. For a portable device, low computational cost and implementation simplicity are more important than the optimality of the compression ratio. Furthermore, the data must be transmitted without excessive delay, since the device is used for online monitoring of a patient's condition. Typically, the removal of statistical redundancy can be divided into two main stages (Fig. 1):

1 Decorrelation: processing the signal in order to reduce its entropy, which is possible if the signal samples are redundant. In such a case the value of a sample can be approximately predicted using the values of the other available signal samples, to the degree to which the samples are statistically dependent. The parameters of the predictor reflect the samples' dependence and should be obtained from a statistical analysis of the input. The predicted signal is subtracted from the original, and it is expected that the residual error has lower entropy than the original signal. The prediction can rely on the previous samples of the decorrelated signal (in-channel decorrelation) or on the samples from another signal (cross-channel decorrelation).
2 Entropy coding: expressing the samples using symbols with variable bit width, which is on average smaller than the bit width of the input signal. Optimal encoding would require knowledge of the residual error distribution and a method of assigning symbols of bit width equal to the amount of information carried by each sample of the residual error.

Fig. 1. General scheme of lossless signal encoding.

The estimation of the aforementioned signal properties (sample interdependence, residual error distribution) can be performed in the following ways (Table 1):

• Fixed predictor - the characteristics of the signal are assumed a priori and are not evaluated during encoding, but rather hard-coded into the structure of the codec (a popular example is delta encoding [12]). Due to the lack of flexibility, the performance with nonstationary signals is suboptimal.
• Adaptive prediction - the characteristics of the signal are estimated for every sample from the properties of the preceding samples. This approach suits nonstationary signals particularly well. An additional advantage is that the coefficients of the predictor do not have to be sent along with the compressed samples, since they can be calculated from the recovered signal in the decoder [4,14,17].
• Training sequence prediction - the signal is divided into blocks and the characteristics of the signal are analyzed using a training sequence preceding every block. This method is applicable to signals whose characteristics do not change substantially over the length of the block. The presence of the training sequence introduces a necessary processing delay between encoding and decoding of the samples. The prediction coefficients must be added to the encoded signal only for the first block, since for the others they can be calculated in the decoder from the recovered signal of the former block (see Section 6 for details).
• Block prediction - the signal is divided into blocks and the prediction is performed on each block using coefficients obtained from a prior block-wise statistical analysis. As a result, the prediction coefficients are optimal on average, but may be suboptimal locally. The coefficients must be sent along with each block of the encoded data to allow decoding.

2.2. Signal characteristics

In order to choose efficient methods of redundancy reduction, the characteristics of the signals should be taken into consideration. The ECG signal can be modeled as a sum of four components,

xecg(k) = xbw(k) + xbg(k) + xqrs(k) + xemg(k).   (1)

• The baseline wander xbw(k) is a slowly varying component (usually below 0.5 Hz), corresponding primarily to respiration and to electrode impedance changes due to perspiration [18]. Because of its distinctive frequency characteristics it can be easily and accurately predicted.
• The background signal xbg(k) is electrical noise and, in some cases, also a special type of heart activity, such as the F-wave in atrial flutter [19]. The signal xbg(k) is assumed to have a much smaller amplitude than xqrs(k). Its frequency characteristics depend on the specific patient and recording device, and the signal can be decorrelated using linear prediction after proper adaptation of its coefficients.


Table 1
Different modes of prediction.

Prediction mode    | Statistical dependency analysis         | Adequate signals         | Processing delay            | Coefficients overhead
Fixed              | At the design stage                     | Stationary               | No                          | No
Adaptive           | At every sample                         | Strongly nonstationary   | No                          | No
Training sequence  | Training sequence before encoded block  | Weakly nonstationary     | Length of training sequence | Only 1st block
Block              | Entire encoded block                    | Moderately nonstationary | Length of encoded block     | Every block

• The heartbeat component xqrs(k) = Σ_i φ_i(t − t_i) consists of the PQRST complexes φ_i(t), separated by the intervals T_i = t_{i+1} − t_i, T_i ∈ 0.6 ÷ 1 s (in a healthy patient's ECG). The beats usually have a similar shape, φ_i(t) ≈ φ_{i−1}(t), and appear at similar distances, T_i ≈ T_{i−1}. However, despite the similarity of consecutive intervals T_i, the signal can be considered only quasi-periodic, since the rhythm tends to exhibit both systematic variations (accelerations and decelerations) and anomalies (premature beats, pauses, chaotic rhythm during atrial fibrillation). This makes it harder to apply beat cancellation by subtraction of a learned beat template [13], since the beat positions would have to be determined with high precision. But if xqrs(k) is separated from the other components, then prediction based on its spectrum can also be applied.
• The electromyographic noise xemg(k) comes from muscle activity and usually appears in the form of bursts. Its spectrum depends on the electrode arrangement, but most of the power is concentrated in the range 0 ÷ 500 Hz [20]. The unpredictability of the burst positions and its nonstationary characteristics make it the hardest component to compress.

The baseline wander xbw(k) and the background noise xbg(k) can be considered weakly nonstationary, while the heartbeat component xqrs(k) is quasi-periodic and the electromyographic noise xemg(k) is strongly nonstationary but appears sporadically. This distinguishes the ECG signal from audio signals, which exhibit clearly nonstationary characteristics. As a result, the most appealing approaches are training sequence prediction and block prediction. The processing delay of block prediction makes it less attractive in remote health monitoring, which leaves training sequence prediction as the preferred method.

2.3. Data characteristics

The ECG signal is usually recorded in M = 1 ÷ 12 parallel channels x(k) = [x1(k), ..., xM(k)], using a fixed-point representation with Nb = 8 ÷ 16 bits and a sampling rate in the range fs = 125 ÷ 1000 Hz. Inspection of the popular ECG databases indicates that the most common setup is 12-bit precision and a sampling rate fs ≈ 250 Hz [21]. Two databases (summarized in Table 2) were used in the experiments and for the performance evaluation. The choice of the INCART database (St.-Petersburg Institute of Cardiological Technics 12-lead arrhythmia database [21]) was motivated by its large number of channels, allowing exploitation of the channel interdependency. The MIT-BIH arrhythmia database [22] was used for a comparison with the state-of-the-art compression methods in Section 7. INCART and MIT-BIH ADB records stored in the PhysioNet database are by default normalized by dividing the integer ADC values by the gain value, so in order to operate on integers again, the signals were denormalized.

2.4. Overview of the proposed method

The proposed method of compression is based on the processing chain shown in Fig. 2. It consists of three reversible prediction stages, followed by entropy coding:

1 In-channel predictive coding: (a) first-order prediction (delta encoding), aimed at the baseline wander signal xbw(k) (Section 3.1); (b) selective linear predictive coding (SLP, Section 3.2), performing prediction with different characteristics for the approximately separated background noise and heartbeat component. The method of rough separation is based on a simple, approximate detector of QRS complexes (Section 3.3).
2 Cross-channel linear predictive coding (XC-LP), exploiting similarities between channels (Section 4.1). All but one channel have a reference channel, chosen based on the channel cross-correlation (Section 4.2).
3 Entropy coding using the asymmetric numeral systems method, with the assumption that the values vm(k) follow Student's t-distribution (Section 5).

3. In-channel decorrelation

3.1. Preventing overflow using fixed prediction with delta encoding

The presence of the baseline wander xbw(k), which is independent of the other components and has fixed low-pass characteristics, allows using the most straightforward predictor of the next sample, that is

x̂m(k) = xm(k − 1).   (2)

The prediction error in this case is obtained by differencing,

y*m(k) = xm(k) − x̂m(k),   (3)

and such a prediction scheme is known as delta encoding [23]. For most signals, the output of a delta encoder is on average smaller than its input, but in the event of an abrupt baseline change in the ECG, (3) produces a high-amplitude spike. Since the computations are performed using fixed-point variables, such a worst-case amplitude increase must be handled to avoid overflow and data loss. Fortunately, such situations are rare in ECG signals and can be considered outliers, removed from the standard dataflow and treated as an additional encoder output. The high-amplitude samples in y*m(k) are therefore detected and replaced by zeros,



ym(k) = { y*m(k)  if |y*m(k)| < Ymax,
        { 0       if |y*m(k)| ≥ Ymax.   (4)

The outlier values y*m(k) and their occurrence instants k are gathered in the sets y*m and k*m,

[y*m, k*m] = {y*m(k), k : |y*m(k)| ≥ Ymax}.   (5)
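As a concrete illustration, the delta stage of Eqs. (2)-(5) can be sketched in a few lines. This is a simplified sketch, not the paper's implementation; the function names are hypothetical, and Y_MAX = 127 follows the 8-bit residual example given in Section 3.1.

```python
# Sketch of the delta encoder with overflow protection (Eqs. (2)-(5)).
# Y_MAX = 127 assumes an 8-bit residual representation, as in Section 3.1.

Y_MAX = 127

def delta_encode(x):
    """Return in-range residuals plus an outlier side-channel [y*, k*]."""
    y, outliers = [], []
    prev = 0                         # assume x(k) = 0 for k < 1
    for k, sample in enumerate(x):
        e = sample - prev            # y*(k) = x(k) - x(k-1), Eq. (3)
        prev = sample
        if abs(e) < Y_MAX:
            y.append(e)              # Eq. (4), in-range case
        else:
            y.append(0)              # Eq. (4), outlier replaced by zero
            outliers.append((k, e))  # Eq. (5), sent as extra encoder output
    return y, outliers

def delta_decode(y, outliers):
    """Invert the encoder exactly: restore outliers, then integrate."""
    y = list(y)
    for k, e in outliers:
        y[k] = e
    x, prev = [], 0
    for e in y:
        prev += e
        x.append(prev)
    return x
```

The round trip is exact for integer inputs, which is the lossless property the section relies on.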


Fig. 2. Diagram of the processing chain for a single channel. The values denoted in blue are estimated during the training stage and provided as parameters.

Table 2
Parameters of test databases.

Database      | INCART     | MIT-BIH ADB (mitdb)
Channels      | 12         | 2
Sampling rate | 257 Hz     | 360 Hz
Precision     | 16 bits    | 11 bits
Record length | 30 minutes | 30 minutes
Records       | 75         | 48

The value of Ymax should be set according to the dynamic range of the number representation used for the signal ym(k). For a typical 12-bit representation of xm(k) it may be reasonable to use 8 bits for ym(k), for which Ymax = 127. The effectiveness of a prediction can be expressed using the average ratio of the standard deviations of the input and output signals, which for brevity we denote as the power ratio (PR),

PR(y/x) = (1/M) Σ_{m=1}^{M} σ(ym(k)) / σ(xm(k)),   (6)

where σ(xm(k)) is the population standard deviation of the signal xm(k). For the INCART database the power ratio assumes the value

PR(y/x) = 16% ± 0.09%,   (7)

which indicates that the average power of the output of the delta encoder is 16% of the input power.

Fig. 3. a) Linear prediction encoder, b) linear prediction decoder.

3.2. Linear prediction

The characteristics of the other components of the ECG signal (xqrs(k), xemg(k), xbg(k)) cannot be accurately predicted a priori, and a statistical analysis must be performed on the given signal. Assuming the model of a wide-sense stationary stochastic process, the dependence between samples manifests itself in the non-flat characteristics of the signal spectrum. Therefore the decorrelation can be obtained by determining the signal spectrum and applying the filter Am(z) = 1 + am(1)z^{−1} + ... + am(Llp)z^{−Llp} (where Llp is the order of the filter) with the reciprocal frequency characteristics,

zm(k) = Σ_{ℓ=0}^{Llp} am(ℓ) ym(k − ℓ).   (8)

Alternatively, decorrelation can be viewed as subtraction of the linear prediction,

zm(k) = ym(k) − Σ_{ℓ=1}^{Llp} (−am(ℓ)) ym(k − ℓ)   (9)
      = ym(k) − ŷm(k),   (10)

where the prediction ŷm(k) is obtained using the FIR filter Ām(z) = −Σ_{ℓ=1}^{Llp} am(ℓ) z^{−ℓ} (Fig. 3a). The decorrelation process can be reversed using the IIR filter structure (Fig. 3b). The encoding-decoding process is lossless, which can be observed by analyzing the processing of the initial signal samples. For k = 1 we have zm(1) = ym(1), since the output ŷm(k) of the filter Ām is 0, as we assume ym(k) = 0 for k < 1. Therefore for k = 1 the input of the filter Ām in the decoder is the same as in the encoder, so their outputs ŷm(k) for k = 2 are also the same. Consequently, the decoder is able to compute ym(k) = zm(k) + ŷm(k) also for all k > 1, as long as all values of ym(k), ŷm(k) and zm(k) are represented exactly (the same refers to the delta encoder presented in the previous section). This is possible for integer values of ym(k) and integer coefficients am = [am(1), am(2), ..., am(Llp)]. Using integers for am, however, severely reduces the precision of modeling the reciprocal frequency characteristics. To keep an integer ŷm(k) with real-valued am, the output of the filter Ām is rounded to the nearest integer both in the encoder and the decoder (denoted as ⌊·⌉), resulting in an integer zm(k) and the desired lossless processing. Because of the rounding operation, such a scheme is in fact no longer strictly linear, which shows that the predictor can in principle be an arbitrary causal nonlinear transformation [24]. This observation will be used for further improvements in Section 3.3.

The calculation of the coefficients of the filter Am(z) based on the spectrum of the signal ym(k) can be accomplished efficiently by solving the filtering equation

Ym am = −ym,   (11)

Fig. 4. Average power ratio PR(z/y) for different lengths of a) the predictor Llp (with Klp = 4), b) the training sequence Klp (with Llp = 4).

where

       | ym(1)    0        0       ···  0       |
       | ym(2)    ym(1)    0       ···  0       |
       | ⋮        ym(2)    ⋱            ⋮       |
  Ym = | ym(Klp)  ⋮        ⋱            ym(1)   |   (12)
       | 0        ym(Klp)  ⋱            ym(2)   |
       | ⋮        ⋱        ⋱            ⋮       |
       | 0        0        ···  0       ym(Klp) |

is a (Klp + Llp) × Llp dimensional convolution matrix and the right-hand side (Klp + Llp) × 1 dimensional vector is given by



ym = [ym(2), ym(3), ..., ym(Klp), 0, ..., 0]^T.

The constants Klp and Llp are, respectively, the length of the linear predictor training sequence and the length of the predictor am. Solving (11) using the least-squares approach,

Y^T_m Ym am = −Y^T_m ym,   (13)

Fig. 5. Detection of beats and selective linear prediction.

leads to the Yule-Walker equations

Rm,m am = −rm,m,   (14)

with the autocorrelation matrix

         | rm,m(0)        rm,m(1)   ···      rm,m(Llp − 1) |
  Rm,m = | rm,m(1)        rm,m(0)   ⋱        ⋮             |   (15)
         | ⋮              ⋱         ⋱        rm,m(1)       |
         | rm,m(Llp − 1)  ···       rm,m(1)  rm,m(0)       |

and the autocorrelation vector

rm,m = [rm,m(1), rm,m(2), ..., rm,m(Llp)]^T.   (16)

The autocorrelation function of a signal ym(k) is obtained by setting n = m in the general cross-correlation formula

rm,n(ℓ) = Σ_{k=0}^{K−ℓ−1} ym(k) yn(k + ℓ),   (17)

and it is calculated on a signal chunk of length K = Klp. System (14) can be solved efficiently, with complexity O(Llp²), using the Levinson-Durbin method, owing to its Toeplitz structure [25].

An important factor determining predictor performance is the length of the predictor, Llp. Since the xqrs(k) component of an ECG signal is not strictly periodic, using a linear predictor of length corresponding to the approximate R-R interval is not going to be effective. A reasonable option is a short predictor, which is confirmed by the results shown in Fig. 4a (INCART database records), indicating that increasing the length above Llp = 4 yields only small improvements in performance. The second parameter of the processing is the length of the training sequence, Klp. The results (Fig. 4b) for values Klp ∈ 2 ÷ 10 s show a slow decrease of the power ratio with increasing training sequence length. The value Klp = 4 results in the most substantial drop of the power ratio, and it was used in the further experiments.
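The estimation and coding loop of this section can be sketched end to end: the autocorrelation of Eq. (17), the Levinson-Durbin recursion solving the Yule-Walker system (14), and the rounding-based lossless encode/decode loop of Fig. 3. This is an illustrative sketch under the paper's assumptions; the function names are our own, and no fixed-point arithmetic is modeled.

```python
# Sketch of Section 3.2: train a short predictor on a chunk, then run the
# lossless prediction loop. Rounding keeps all coded values integer, so the
# decoder reproduces the encoder's predictions exactly.

def autocorr(y, max_lag):
    """r(l) = sum_k y(k) y(k+l), Eq. (17) with n = m and K = len(y)."""
    K = len(y)
    return [sum(y[k] * y[k + l] for k in range(K - l)) for l in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz system R a = -r for a(1..order) in O(order^2)."""
    a, err = [], r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = -acc / err                      # reflection coefficient
        a = [a[j] + k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= (1.0 - k * k)
    return a

def lp_predict(a, hist):
    """yhat(k) = round(-sum_l a(l) y(k-l)); hist[-1] is y(k-1)."""
    acc = -sum(c * hist[-l] for l, c in enumerate(a, start=1) if l <= len(hist))
    return round(acc)                       # integer output keeps the loop lossless

def lp_encode(a, y):
    z, hist = [], []
    for s in y:
        z.append(s - lp_predict(a, hist))   # z(k) = y(k) - yhat(k), Eq. (10)
        hist.append(s)
    return z

def lp_decode(a, z):
    y, hist = [], []
    for res in z:
        s = res + lp_predict(a, hist)       # decoder repeats the same prediction
        y.append(s)
        hist.append(s)
    return y
```

Because the encoder and decoder round the same real-valued prediction from the same history, the round trip is exact regardless of the coefficient values.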

3.3. Selective linear prediction

Assuming that the baseline wander signal xbw(k) is successfully decorrelated into white noise by the delta encoder, the linear prediction must handle the decorrelation of the signal mixture xqrs(k) + xemg(k) + xbg(k). The characteristics of the heartbeat component clearly differ from the characteristics of the other components (see [6] for a frequency analysis), and it would be profitable to extract xqrs(k) in order to perform its prediction separately. The precise extraction of beats requires detection of their positions and complex processing [19], resulting in relatively high computational requirements, unfeasible for a portable device. Instead, we propose a simple method for the approximate separation of beats from the background signal, giving a reasonable increase in performance without excessive computational cost.

The presented approach to beat detection is causal, that is, based on processing of the previous samples to decide whether the currently processed sample contains a beat or not. This design was chosen to avoid including the locations of beats in the compressed data as additional overhead, because a causal detector is able to operate not only in the encoder but also in the decoder, on the already decoded samples of ym(k). Therefore any decision made by the detector in the encoder occurs also in the decoder (similarly to the process of filtering the signal with the filter Ām, described in Section 3.2), and there is no risk of introducing loss in the decoding process. The block diagram of this processing method is shown in Fig. 5.

The criterion for distinguishing the beat signal from the background relies on comparing the local signal magnitude to the magnitude of the most recent extremum in the last Knorm samples preceding the currently processed sample. In the absence of strong distortions and noise, such an extremum usually corresponds to the R-peak of a heartbeat. If the ratio of the magnitudes exceeds a predefined threshold, the signal is assumed to be a part containing a heartbeat. The magnitude of the recent R-peak is calculated by averaging the rectified input signal,

y^avg_m(k) = Σ_{ℓ=1}^{Lavg} havg(ℓ) |ym(k − ℓ)|,   (18)
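The causal detector just described (and formalized in Eqs. (19)-(22) below) can be sketched as follows. The parameter defaults, the cold-start behaviour and the brute-force window maximum are illustrative choices of this sketch, not taken from the paper.

```python
# Minimal causal sketch of the beat detector: rectified moving average over
# the L_avg preceding samples (Eq. (18)), normalization by the maximum of
# that average over the last K_norm samples, threshold theta, and a
# switch-off delay of L_d samples (Eqs. (19) and (22)).

def beat_indicator(y, L_avg=3, K_norm=400, theta=0.5, L_d=1):
    avg = []
    for k in range(len(y)):
        window = [abs(y[k - l]) for l in range(1, L_avg + 1) if k - l >= 0]
        avg.append(sum(window) / L_avg)        # y_avg(k), Eq. (18)
    s, off_count = [], L_d + 1                 # detector starts switched off
    for k in range(len(y)):
        recent_max = max(avg[max(0, k - K_norm):k + 1])
        above = recent_max > 0 and avg[k] / recent_max - theta > 0  # Eq. (19)
        off_count = 0 if above else off_count + 1
        s.append(1 if off_count <= L_d else 0)                      # Eq. (22)
    return s
```

With a dominant R-peak inside the normalization window, the indicator switches on around the peak and stays on for L_d further samples; a production implementation would track the window maximum incrementally instead of rescanning it.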


Fig. 7. Average power ratio PR(z/y) for different values of the threshold θ and the switch-off delay Ld (Lqrs = 4, Lbg = 4, Lavg = 3).

Fig. 6. a) Rectification and averaging of the signal from the delta encoder; b) normalization, comparison with the threshold θ and the resulting switching signal s(k); c) estimates of the background and heartbeat components.

where havg(ℓ) = 1/Lavg for ℓ ∈ 1, ..., Lavg. The indication of beat presence, sm(k), is calculated by comparing the threshold θ to the normalized signal y^avg_m(k). Normalization is performed using the value of the local maximum from the last Knorm samples,

sm(k) = 1_{Ld}( y^avg_m(k) / max_{ℓ ∈ (k−Knorm, k)} y^avg_m(ℓ) − θ ).   (19)

The signals

ŷ^bg_m(k) = ym(k)(1 − sm(k)),   (20)

ŷ^qrs_m(k) = ym(k) sm(k)   (21)

can be considered approximations of the separated background-noise and heartbeat components of the signal ym(k). To allow fast detection of a beat, the length Lavg of the moving averager should be kept low. Experiments show that the value Lavg = 3 gives a reasonable filtering effect and responds sufficiently fast at the beginning of a beat. The length of the normalization window Knorm should be set so that it includes at least one beat preceding the current sample. Since the heart rate may drop to about 40 bpm, the recommended value is Knorm ≥ 1.5 s. The comparison function 1_{Ld}(·) is based on the Heaviside step function, but also includes a delayed switch-off after Ld samples, to cover the final part of the beat, whose amplitude is usually smaller than the amplitude of the R-peak:

1_{Ld}(u(k)) = { 1  if u(ℓ) > 0 for any ℓ ∈ (k − Ld, k),
               { 0  otherwise.   (22)

An illustrative example of the signals ym(k), y^avg_m(k) and sm(k) is presented in Fig. 6. Finally, depending on the value of sm(k), the decorrelation is performed using one of the predictors, A^bg_m(z) or A^qrs_m(z):

zm(k) = { a^qrs_m(k) ∗ ym(k)  if sm(k) = 1,
        { a^bg_m(k) ∗ ym(k)   if sm(k) = 0.   (23)

The key parameters of the selective linear prediction are the threshold θ and the length of the switch-off delay Ld. The results obtained for different settings (Fig. 7) show that the lowest PR is obtained for a low threshold θ and an almost zero delay Ld. This indicates that the characteristics of the strongest peak in the beat (the R-wave) are different enough from the rest, and it is profitable to predict it separately from the background signal and the P, Q, S, T waves.

4. Cross-channel decorrelation

4.1. Cross-channel linear prediction

Each of the ECG channels is a representation of the same electrical processes occurring in the respective parts of the heart, but recorded from a different location and corrupted with different noise. Since the useful signal usually dominates over the noise, a substantial correlation between channels can be expected. This allows for an approximate linear prediction of the signal zm(k) using the signal zn(k) from another channel and the prediction coefficients bm = [bm(0), bm(1), ..., bm(Lch − 1)],

ẑm(k) = Σ_{ℓ=0}^{Lch−1} bm(ℓ) zn(k − ℓ).   (24)

Then the prediction error is given by

vm(k) = zm(k) − ⌊ẑm(k)⌉,   (25)

and the prediction setup is presented in Fig. 8a, along with the decoder in Fig. 8b.

Fig. 8. Cross-channel prediction using filter Bm(z); a) encoding (decorrelation), b) decoding.

Similarly to the case of the in-channel linear prediction filters, the coefficients of Bm(z) can be designed using the filtering equation

Zn bm = zm,   (26)

where Zn is the (Kch + Lch) × Lch dimensional convolution matrix for the signal zn(k) (analogous to (12)), and the right-hand side (Kch + Lch) × 1 dimensional vector is given by

zm = [zm(1), zm(2), ..., zm(Kch), 0, ..., 0]^T.   (27)
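The cross-channel stage of Eqs. (24)-(25) and its inverse can be sketched as below. Helper names are illustrative; as in the in-channel case, subtracting and re-adding the same rounded prediction makes the loop lossless for integer inputs.

```python
# Sketch of cross-channel decorrelation: channel m is predicted from its
# reference channel n with coefficients b; the rounded prediction is
# subtracted in the encoder (Eq. (25)) and re-added in the decoder.

def xc_predict(b, z_n, k):
    """zhat_m(k) = sum_l b(l) z_n(k-l), Eq. (24), with z_n(j) = 0 for j < 0."""
    return sum(coef * z_n[k - l] for l, coef in enumerate(b) if k - l >= 0)

def xc_encode(b, z_m, z_n):
    # v_m(k) = z_m(k) - round(zhat_m(k)), Eq. (25)
    return [z_m[k] - round(xc_predict(b, z_n, k)) for k in range(len(z_m))]

def xc_decode(b, v_m, z_n):
    # the decoder already knows z_n, so the reconstruction is exact
    return [v_m[k] + round(xc_predict(b, z_n, k)) for k in range(len(v_m))]
```

Note that decoding channel m requires its reference channel n to be decoded first, which is why the assignment of references must be loop-free (Section 4.2).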


The least-squares approach

Z^T_n Zn bm = Z^T_n zm   (28)

gives

Rn,n bm = rn,m,   (29)

with the zn(k) autocorrelation matrix Rn,n. In contrast to the Yule-Walker equations (14), the vector on the right-hand side of (29) is the cross-correlation (17) between the signals zm(k) and zn(k) from channels n and m,

rn,m = [rn,m(0), rn,m(1), ..., rn,m(Lch − 1)]^T,   (30)

calculated on the signal chunk of length K = Kch . Fig. 9. Average power ratio PR

4.2. Cross-channel dependency

(31)

The cross-channel prediction starts from the channels m : N(m) = n0 , then continues to the channels m : N (N (m)) = n0 and so on. The optimal assignment n = N opt (m) should minimize the prediction energy N opt (m) = argmin

M 

ˆ N(m):M →M ∅ m=1

vm|N(m) (k) 2 , ˆ

(32)

where vm|n (k) is the prediction error of the channel m decorrelated using the channel n. Unfortunately, determining N opt (m) by calculation of vm|n (k) for all combinations m ∈ M, n ∈ M\ {m} would require obtaining of M(M − 1) sets of coefficients bm . This can be computationally too demanding to perform on a portable device, and a simplified method is needed. We propose determining suboptimal assignment N sub (m) based on verifying the dependence between channels using the crosschannel correlation and voting procedure for determining the base channel n0 . The channel correlation matrix Rxc with elements xc Rn,m =



rn,m (0) rn,n (0)



,

(33)

rm,m (0)

is calculated using (17) on the chunk of signals z1 (k), ..., zM (k) of length K = Kxc .Then for each channel m we find his vote - the chan/ m with the highest correlation, nel N max (m) = N max (m) :=

argmax nˆ ∈ (1,...,M)\{m}

xc Rn,m . ˆ

(34)

The channel with the highest number of votes is chosen as the base reference,

n0 := argmax_{n̂ ∈ (1,...,M)} Σ_{m=1}^{M} [n̂ = N^max(m)],   (35)

where [ω] = 1 if ω is true and [ω] = 0 otherwise.

Fig. 9. a) Impact of the predictor Bm(z) length and the choice of assignment function; b) impact of the threshold and switch-off delay Ld in the selective in-channel prediction on the cross-channel prediction.

From (24) it is clear that decoding z_m(k) from v_m(k) requires the knowledge of the signal z_n(k), the coefficients b_m and the assignment of the source channel for the prediction, denoted as n = N(m). The assignment function can therefore be defined as N(m): M → M ∪ {∅}, where M = (1, ..., M) is the set of channel indices. In such a case at least one channel (indexed with n0) must be sent without cross-channel decorrelation, to make decoding of the other channels possible. For this channel we have N(n0) = ∅ and b_{n0} = [0, 0, ..., 0]. The next constraint is the absence of reference loops: if the channel n1 is used for prediction of the channel n2 (N(n1) = n2), then n2 cannot be used for prediction of n1 (N(n2) ≠ n1). This also applies to non-direct prediction between channels n1 and n2, for example N(N(n1)) = n2. In general, the assignment function N(m) must satisfy N ∘ N ∘ ··· ∘ N(m) ≠ m.

Next, the assignment function N^sub(m) is found using the following procedure:

1. Two sets of channels, connected C and unconnected U, are defined and initialized as follows: C := {n0}, U := (1, ..., M)\{n0}.
2. All elements m ∈ U for which N^max(m) ∈ C are moved from U to C. For such elements N^sub(m) := N^max(m). If U = ∅, the procedure is finished (all elements are connected). If U ≠ ∅ and no elements were transferred from U to C, then step 3 is performed; otherwise step 2 is repeated.
3. The remaining elements m ∈ U for which N^max(m) ∉ C refer to each other, that is, their most correlated channels do not belong to C. Among such channels, the one with the highest correlation to a channel from C is found,





(N^conn, m^conn) := argmax_{n̂ ∈ C, m̂ ∈ U} R^xc_{n̂,m̂}.   (36)

The channel m^conn is moved from U to C, and N^sub(m^conn) := N^conn. Then step 2 is performed.

4.3. Effectiveness

The effectiveness of the cross-channel prediction depends on several factors. The capabilities of the predictors Bm(z) are constrained by their length Lch, while their adequacy increases with the training time Kch. But the results presented in Fig. 9a indicate that the choice of the predictor assignment function N(m) is even more important. The assignment N^sub(m) was compared with the fixed assignment function



N^fix(m) = { 1, for m = 1;  m − 1, for m > 1 },   (37)

which does not require performing the procedure described in Section 4.2. The assignment N^sub(m) visibly outperforms N^fix(m), which justifies the computational effort required for its calculation. What is more, a short training sequence Kxc = 1 s is sufficient to obtain a satisfactory result. The results in Fig. 9a were obtained for non-selective prediction (Section 3.2), that is, for entirely linear processing preceding the cross-channel prediction stage. Selective prediction introduces nonlinearity, since one part of the signal is predicted using A^bg_m(z) and the other part using A^qrs_m(z). Meanwhile, the predictor Bm(z) is trained using the entire training signal, which is suboptimal if the signal was transformed nonlinearly beforehand. This could be solved by using a common detection of beats for all channels and separate cross-channel predictors B^bg_m(z) and B^qrs_m(z), but this would require additional processing and result in an increase of computational requirements.

Fig. 10. Processing order with two modes of initialization: a) asymmetric, b) symmetric.

Fig. 11. Signal flow (estimation-only blocks in gray).

With limited resources this suboptimality can be accepted, but the threshold (which impacts the level of nonlinearity in selective linear prediction) should be set to the value which minimizes the product of the average power ratios PR(y/z) and PR(z/v), or the final compression ratio. The impact of the threshold and Ld on PR(z/v) is presented in Fig. 9b. As expected, the best PR(y/z) is obtained for a high threshold value, which results in using A^qrs_m(z) only for the short periods when the signal exceeds the threshold.

5. Entropy coding

Entropy coding is based on the principle of assigning short codes to the more probable symbols and longer codes to the less probable ones. The measure of lossless compression quality is the compression ratio,

CR = [input bitstream length] / [output bitstream length] = (M · Ktot · Nb) / ( Σ_{m=1}^{M} Σ_{k=1}^{Ktot} H_{v_m(k)} ),

where Ktot is the number of samples in the record, Nb is the precision of the number representation (bits) and H_{v_m(k)} is the size of the code for v_m(k) (bits). For the probability mass function (PMF) pV(v) of the symbol v, the optimal size H^opt_v of the corresponding code is

H^opt_v = log2( 1 / pV(v) ) bits.   (38)

The probability of symbol occurrence pV(v) is therefore another quantity which needs to be either assumed, estimated, or determined exactly for the data block prior to entropy coding. For the reasons presented in Section 2.1, the method of estimation using training sequences was chosen. In such a case a simple histogram estimate cannot be used, since the absence of some values in the training sequence does not guarantee their absence in the latter part of the packet. The alternative is to employ density estimation methods to find an approximate probability mass function, but such an option is computationally demanding. The more practical, often used approach is to assume a fixed probability distribution and to estimate its variance σ² from the data (the mean μ is zero due to the use of delta encoding). An important factor is the choice of a distribution matching the statistics of the data. The distribution is sometimes also determined by the choice of the encoding method, since some methods are well suited only to a specific type of distribution (such as the Golomb and Rice coding schemes, which assume the geometric distribution [26]). In the present work, however, we decided to use the recently introduced fast encoding method of Asymmetric Numeral Systems [16], which does not constrain the choice of distribution. Several distributions were tested and the best performance was achieved by discretizing the t-Student distribution with α = 3 degrees of freedom. The probability density function (PDF) of this distribution is given by

f_V(v) = 2 / ( π σ (1 + v²/σ²)² ),   (39)

and the cumulative distribution function is

F_V(v) = 1/2 + (1/π) ( σv / (v² + σ²) + arctan(v/σ) ).   (40)
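Before entropy coding, the continuous distribution has to be discretized into integer frequency tables. A small sketch of one way to do this (the function name, the scaling constant B and the symbol range are illustrative assumptions, not the paper's exact procedure): it tabulates the CDF (40) over integer symbols, scales to B, and forces nonzero frequencies that sum exactly to B, as table-based entropy coders require.

```python
import math

def t3_tables(sigma, vmax, B=1 << 12):
    """Integer PMF/CDF tables for symbols v in [-vmax, vmax], built from
    the t-distribution with 3 degrees of freedom (Eqs. (39)-(40)) and
    scaled so the frequencies sum exactly to B, with no zero entries."""
    def F(v):  # CDF of Eq. (40)
        return 0.5 + (sigma * v / (v * v + sigma * sigma)
                      + math.atan(v / sigma)) / math.pi
    # mass of each integer symbol: P(v - 1/2 < V <= v + 1/2)
    pmf = [F(v + 0.5) - F(v - 0.5) for v in range(-vmax, vmax + 1)]
    total = sum(pmf)
    freq = [max(1, round(B * p / total)) for p in pmf]  # no zero frequencies
    freq[vmax] += B - sum(freq)   # absorb rounding drift at v = 0 (the mode)
    cdf = [0]
    for f in freq:
        cdf.append(cdf[-1] + f)   # cumulative starts, the role of F^Z_V
    return freq, cdf
```

Absorbing the rounding drift at the mode keeps the correction negligible relative to the largest frequency.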

The results obtained for the ideal code sizes (38) are presented in Tab. 3, along with the outcome of the ANS compression. The ANS method introduces a very small overhead, which insignificantly reduces the compression ratio (Tab. 3). On the other hand, ANS outperforms other accurate entropy coding schemes, such as arithmetic coding or range coding, as it requires


Table 3
Compression ratio for different probability distributions, calculated theoretically and obtained with the ANS entropy coder.

Probability distribution     Compression ratio (CR)
Geometric (ideal)            3.22
Laplace (ideal)              3.32
t-Student α = 4 (ideal)      3.41
t-Student α = 3 (ideal)      3.46
t-Student α = 3 (ANS)        3.43
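The "ideal" entries in Tab. 3 correspond to code sizes taken directly from (38), averaged over the residual samples. A toy illustration of this calculation (the residual values, the scale and Nb are made-up illustrative numbers, and the discrete Laplace PMF merely stands in for whichever distribution is assumed):

```python
import math

def ideal_cr(residuals, pmf, Nb):
    """Ideal CR: raw bits divided by the sum of the optimal
    code sizes -log2(p(v)) of Eq. (38)."""
    bits = sum(-math.log2(pmf(v)) for v in residuals)
    return len(residuals) * Nb / bits

def discrete_laplace(b):
    """Two-sided geometric ('discrete Laplace') PMF with scale b:
    p(v) = (1 - q) / (1 + q) * q**|v|, q = exp(-1/b); sums to 1 over Z."""
    q = math.exp(-1.0 / b)
    return lambda v: (1 - q) / (1 + q) * q ** abs(v)

pmf = discrete_laplace(5.0)
print(ideal_cr([0, 0, 0, 0, 0, 3, -3, 7], pmf, Nb=11))
```

The better the assumed PMF matches the residual statistics, the shorter the ideal codes and the higher the resulting CR, which is why the choice of distribution in Tab. 3 matters.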

Fig. 13. Compression ratio vs. packet length [s].

Fig. 12. Training order.

only one multiplication per encoded symbol. In the range variant of ANS (rANS), encoding relies on the iterations

r_k = c_{k−1} mod p^Z_V(v_k),   (41)

c_k = B · (c_{k−1} − r_k) / p^Z_V(v_k) + F^Z_V(v_k) + r_k,   (42)

where c_0 = 1, starting from k = 1 (for clarity we use the notation v_k = v_m(k), omitting the channel index m). The functions p^Z_V(v_k) and F^Z_V(v_k) are integer-valued counterparts of the PMF and CDF, obtained by scaling and (usually) rounding:

p^Z_V(v_k) = ⌊B · p_V(v_k)⌉,   (43)

F^Z_V(v_k) = ⌊B · F_V(v_k)⌉.   (44)

Decoding is performed starting from k = Kpacket, with descending k:

r_k = c_k mod B,   (45)

v_k := argmin_{v ∈ Z} { v : F^Z_V(v + 1) > r_k },   (46)

c_{k−1} = p^Z_V(v_k) · (c_k − r_k) / B − F^Z_V(v_k) + r_k.   (47)
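A minimal, unoptimized sketch of the recursions (41)-(47), using Python's arbitrary-precision integers so that the renormalization mentioned below can be ignored. Here `freq` and `cdf` play the roles of p^Z_V and F^Z_V, and the symbols are assumed remapped to the indices 0, ..., S−1 (the names are illustrative):

```python
def rans_encode(symbols, freq, cdf, B):
    """Fold a packet into one integer via Eqs. (41)-(42), with c_0 = 1."""
    c = 1
    for v in symbols:
        r = c % freq[v]                                 # Eq. (41)
        c = B * ((c - r) // freq[v]) + cdf[v] + r       # Eq. (42)
    return c

def rans_decode(c, n, freq, cdf, B):
    """Recover the n symbols in reverse order via Eqs. (45)-(47)."""
    out = []
    for _ in range(n):
        r = c % B                                       # Eq. (45)
        # Eq. (46): the symbol whose interval [cdf[v], cdf[v]+freq[v]) holds r
        v = max(s for s in range(len(freq)) if cdf[s] <= r)
        c = freq[v] * ((c - r) // B) + r - cdf[v]       # Eq. (47)
        out.append(v)
    return out[::-1]

# round trip with a toy table (frequencies must sum to B)
B = 16
freq = [5, 8, 3]
cdf = [0, 5, 13]
msg = [0, 1, 1, 2, 0, 1]
assert rans_decode(rans_encode(msg, freq, cdf, B), len(msg), freq, cdf, B) == msg
```

As in (45)-(47), decoding reproduces the symbols in reverse: the encoder processes the packet forward and the decoder unwinds it from C = c_{Kpacket}. A real implementation would replace the linear symbol search in (46) with a table lookup.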

If the function p^Z_V(v_k) can be expressed in terms of powers of 2, then the multiplications can be replaced with binary shifts. The signal v_k = v(k) from a given packet is encoded into the number C = c_{Kpacket}. Since c_{k+1} ≥ c_k, to avoid overflow of the variable c_k the encoded representation can be stored in multiple integer variables using a renormalization procedure [16].

6. Signal flow

The two main operations performed by the codec are signal estimation (predictor training) and compression/decompression, where the estimated parameters are used for prediction. The estimation consists of four stages (Fig. 12):

1. Linear predictive coder training: estimation of R_{m,m} (m ∈ M) and calculation of the predictor coefficients A^qrs = [a^qrs_1, ..., a^qrs_M] and A^bg = [a^bg_1, ..., a^bg_M].
2. Cross-channel dependency: estimation of the cross-channel correlation R^xc and the dependency tree N^sub(m).
3. Cross-channel predictor training: estimation of R_{n,n} and r_{n,m} (where m ∈ M and n = N^sub(m)) and calculation of the predictor coefficients B = [b_1, ..., b_M].
4. Estimation of the prediction error variance σ² = [σ²_1, ..., σ²_M].

Fig. 14. Average theoretical compression ratio for different values of the threshold and the switch-off delay Ld (Lqrs = 4, Lbg = 4, Lavg = 3); SLP (left), SLP+XC-LP (right). Note that the color scales for the images are different.

The estimation is performed periodically to avoid a mismatch between the characteristics of the signal during the training sequence and the later, encoded part. As a result the signal is divided into packets of length Kpacket, which should be adjusted to the variability of the signal. The impact of Kpacket on the compression ratio is presented in Fig. 13. Interestingly, the prediction parameters have to be sent only along with the first packet, since the parameters of the later packets can be estimated on the decoder's side using the already decoded signal (asymmetric initialization, Fig. 10a). In such a case, the initial part of the first packet is used as the training sequence. Alternatively, the initial part of the signal can be sent without encoding to the decoder's side and used on both sides as the training sequence (Fig. 10b). The asymmetric method, however, seems to be more efficient, since the data size of the estimates A^qrs, A^bg, B, N(m), σ² for reasonable values of the parameters Llp, Lch and M will be considerably smaller than the size of the training sequence Ktrain = Klp + Kxc + Kch + Kvar. The blocks used in the signal flow in the training phase are shown in Fig. 11 in gray. After the training stage the parameters A^qrs, A^bg, B, N(m) and σ² are used by the compression blocks (in white). The result of a packet compression is a datastream c = [C_1, ..., C_M] and the set of outliers Y*, K = [x*_1, k_1, ..., x*_M, k_M]. The decoder inverts the operation using [c, Y*, K] as an input, together with the prediction parameters A^qrs, A^bg, B, N(m), σ².

7. Performance evaluation

The choice of the predictors' parameters influences not only the average input/output power ratio, but also the distribution of the predictors' output values.
As a result, the set of parameters obtained to maximize the total power attenuation (presented in Sections 3 and 4) can be used as a starting point for optimizing the compression ratio, since the minimization of the variance of the predictors' output is not equivalent to the minimization of its entropy. The results of the final optimization of the selective prediction parameters are presented in Fig. 14 and the final parameters are listed in Tab. 4. The compression ratio of a given record depends substantially on the dependence between channels, which in turn is influenced

Table 6
Average compression ratios for different predictor setups and signal parameters.

Table 4
Compressor parameters.

In-channel prediction      Lqrs = 4, Lbg = 4, Lavg = 3, Ld = 0, threshold = 0.1, Knorm = 2 s, Klp = 4 s
Cross-channel prediction   Lch = 6, Kxc = 1 s, Kch = 4 s
Entropy coding             Kvar = 2 s, PDF: t-Student α = 3 (39)
Packetization              Kpacket = 120 s

Fig. 15. a) Illustrative example of the compression ratio for the packets in record INCARTDB/I59 (the reference channel n0 marked with black circles); b) signal from channel 1.

by the noise level. This can be observed in Fig. 15, which depicts the changes of the CR factor for each packet. The compression ratio is high for the clean part of the signal, but the performance drops when noise appears. The strong influence of the cross-channel prediction is supported by the small drop of the CR factor for the main reference channel n0 (marked with circles), which is processed using only selective linear prediction. In the case of noisy signals, the cross-channel prediction may be a source of adverse effects and worsen the compression ratio instead of improving it.

The averaged results of compression are presented in Tab. 6. To investigate the impact of each individual technique included in the described algorithm on the final result, the compression tests were performed with linear prediction (LP) or selective linear prediction (SLP), as well as with/without cross-channel linear prediction (XC-LP).

Database                      In-channel        With XC-LP
                              LP      SLP       LP      SLP
MIT-BIH ADB, 11 bits, M = 2   2.75    2.90      2.80    2.92
INCART, 16 bits, M = 12       2.92    3.00      3.33    3.43

The results strongly depend on the characteristics of the signals in the database. The higher CR in the INCART database for pure in-channel prediction (LP/SLP without XC-LP) stems from the higher (and apparently excessive) number of bits per sample than in the MIT-BIH arrhythmia database (16 bits compared to 11 bits). The positive impact of cross-channel prediction is evident in the INCART database because of the higher number of channels (12 compared to 2), which facilitates finding highly similar channels. In the case of the MIT-BIH arrhythmia database there are only two channels available and (what is worse from the perspective of signal compression) the positions of the electrodes in this database are usually chosen in a way that allows one channel to show the beats which are poorly visible in the other [27]. Such an arrangement makes the shapes of the beats in the two channels radically different, which substantially reduces the effectiveness of linear cross-channel prediction.

To relate the performance of the presented method to known approaches, the CR factors obtained by our method are compared with several other algorithms in Tab. 5. The MIT-BIH Arrhythmia Database is the set of signals used most often as a benchmark of compression ratio, but unfortunately such a comparison may not be fully reliable. The reason is that the records in this database are sampled with 11-bit precision, but saved in a 12-bit format. This introduces ambiguity into the value of the uncompressed file size, and as a result also into the CR factor, which is sometimes defined as the ratio of the uncompressed and compressed file sizes. Assuming 12 bits gives a higher CR, but it does not correspond to the actual precision and does not reflect the quality of the compression method. Unfortunately, this distinction is sometimes omitted in the literature. Also, as mentioned before, the MIT-BIH Arrhythmia Database is not representative for benchmarking methods taking advantage of multiple channels, so it would be more fitting to consider the presented result as a measure of the performance of the selective linear prediction method and the estimation based on training sequences, rather than of the effectiveness of XC-LP.

8. Conclusions

In this paper we introduced a new compression scheme based on selective linear prediction, cross-channel prediction and Asymmetric Numeral Systems entropy coding, suited for low-power processing of multichannel ECG signals. The simple method for determining the cross-channel dependency, together with short linear predictors, achieves performance approaching that of much more computationally demanding nonlinear methods when multichannel ECG is available.
In the case of a single channel, or weakly dependent multiple channels, the compression ratio still outperforms schemes based on adaptive linear prediction. The reason is the use of two separate short predictors for the beats and the background signal, and also the calculation of the predictors' coefficients using a part of the signal as a training sequence, instead of their continuous adaptation. Similarly to adaptive prediction, the method of selective prediction introduced in the present study uses different prediction characteristics for different signal parts. However, in the adaptive system the adaptation speed must be fitted to the speed of the variation in the local signal characteristics, which makes it susceptible to local disturbances. By contrast, in the selective approach the predictor is trained on a large number of beat fragments and background signal, which makes it more robust to local disturbances due to averaging. This method seems to be better suited to the quasi-stationary nature of ECG signals. The resulting compression method maintains low computational complexity as well as high compression efficiency and can be easily implemented in a low-power portable ECG acquisition device.

Table 5
Comparison of performance with state-of-the-art compression methods on the MIT-BIH Arrhythmia Database.

Reference             Prediction                 Entropy coding     CR
Chen et al. [23]      Delta encoding             Huffman coding     1.92
Deepu & Lian [4]      Adaptive LP                Coding-packaging   2.31
Chen & Wang [14]      Adaptive LP                Huffman coding     2.43
Duda et al. [7]       Wavelet transform          MS-VLI             2.61
Kannan & Eswaran [9]  Neural network             Huffman coding     3.09
Arnavut [10]          LP                         Burrows-Wheeler    4.3
This work             Selective + multichan. LP  ANS                2.92

References

[1] G.A. Roth, C. Johnson, A. Abajobir, F. Abd-Allah, S.F. Abera, G. Abyu, M. Ahmed, B. Aksut, T. Alam, K. Alam, et al., Global, regional, and national burden of cardiovascular diseases for 10 causes, 1990 to 2015, Journal of the American College of Cardiology (2017) 23715.
[2] T. Jeon, B. Kim, M. Jeon, B.-G. Lee, Implementation of a portable device for real-time ECG signal analysis, Biomedical Engineering Online 13 (1) (2014) 160.
[3] H. Azucena, E. Ríos, R. Peña, J. Díaz, Design and implementation of a simple portable biomedical electronic device to diagnose cardiac arrhythmias, Sensing and Bio-Sensing Research 4 (2015) 1–10.
[4] C.J. Deepu, Y. Lian, A joint QRS detection and data compression scheme for wearable sensors, IEEE Transactions on Biomedical Engineering 62 (1) (2015) 165–175.
[5] M. Elgendi, A. Mohamed, R. Ward, Improving remote health monitoring: A low-complexity ECG compression approach, Diagnostics 8 (1) (2018) 10.
[6] M. Elgendi, A. Mohamed, R. Ward, Efficient ECG compression and QRS detection for e-health applications, Scientific Reports 7 (1) (2017) 459.
[7] K. Duda, P. Turcza, T.P. Zielinski, Lossless ECG compression with lifting wavelet transform, in: Instrumentation and Measurement Technology Conference, 2001. IMTC 2001. Proceedings of the 18th IEEE, vol. 1, 2001, pp. 640–644.
[8] S.-G. Miaou, S.-N.
Chao, Wavelet-based lossy-to-lossless ECG compression in a unified vector quantization framework, IEEE Transactions on Biomedical Engineering 52 (3) (2005) 539–543.
[9] R. Kannan, C. Eswaran, Lossless compression schemes for ECG signals using neural network predictors, EURASIP Journal on Applied Signal Processing 2007 (1) (2007) 102.
[10] Z. Arnavut, ECG signal compression based on Burrows-Wheeler transformation and inversion ranks of linear prediction, IEEE Transactions on Biomedical Engineering 54 (3) (2007) 410–418.
[11] Q. Zhou, Study on ECG data lossless compression algorithm based on k-means cluster, in: Future Computer and Communication, 2009. FCC'09. International Conference on, IEEE, 2009, pp. 91–93.


[12] E. Chua, W.-C. Fang, et al., Mixed bio-signal lossless data compressor for portable brain-heart monitoring systems, IEEE Transactions on Consumer Electronics 57 (1) (2011) 267–273.
[13] A. Koski, Lossless ECG encoding, Computer Methods and Programs in Biomedicine 52 (1) (1997) 23–33.
[14] S.-L. Chen, J.-G. Wang, VLSI implementation of low-power cost-efficient lossless ECG encoder design for wireless healthcare monitoring application, Electronics Letters 49 (2) (2013) 91–93.
[15] D.-S. Kim, J.-S. Kwon, A lossless multichannel bio-signal compression based on low-complexity joint coding scheme for portable medical devices, Sensors 14 (9) (2014) 17516–17529.
[16] J. Duda, K. Tahboub, N.J. Gadgil, E.J. Delp, The use of asymmetric numeral systems as an accurate replacement for Huffman coding, in: Picture Coding Symposium (PCS), 2015, IEEE, 2015, pp. 65–69.
[17] R. Yu, C.C. Ko, Lossless compression of digital audio using cascaded RLS-LMS prediction, IEEE Transactions on Speech and Audio Processing 11 (6) (2003) 532–537.
[18] L. Sörnmo, Time-varying digital filtering of ECG baseline wander, Medical and Biological Engineering and Computing 31 (5) (1993) 503.
[19] R. Alcaraz, J.J. Rieta, Adaptive singular value cancelation of ventricular activity in single-lead atrial fibrillation electrocardiograms, Physiological Measurement 29 (12) (2008) 1351.
[20] A. Van Boxtel, A. Boelhouwer, A. Bos, Optimal EMG signal bandwidth and interelectrode distance for the recording of acoustic, electrocutaneous, and photic blink reflexes, Psychophysiology 35 (6) (1998) 690–697.
[21] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals, Circulation 101 (23) (2000) e215–e220 (June 13).
[22] G.B. Moody, R.G.
Mark, The impact of the MIT-BIH Arrhythmia Database, IEEE Engineering in Medicine and Biology Magazine 20 (3) (2001) 45–50.
[23] S.-L. Chen, H.-Y. Lee, C.-A. Chen, H.-Y. Huang, C.-H. Luo, Wireless body sensor network with adaptive low-power design for biometrics and healthcare applications, IEEE Systems Journal 3 (4) (2009) 398–409.
[24] M. Hans, R.W. Schafer, Lossless compression of digital audio, IEEE Signal Processing Magazine 18 (4) (2001) 21–32.
[25] P. Vaidyanathan, The Theory of Linear Prediction, ser. Synthesis Lectures on Engineering Series, Morgan & Claypool, 2008. [Online]. Available: https://books.google.pl/books?id=ob511Mwc20EC.
[26] A. Kiely, Selecting the Golomb parameter in Rice coding, IPN Progress Report 42 (2004) 159.
[27] "Frequently asked questions about PhysioNet: What do the signal names MLII, V2,.. mean?" https://www.physionet.org/faq.shtml#signal-names.