Combination of EEG and ECG for improved automatic neonatal seizure detection


Clinical Neurophysiology 118 (2007) 1348–1359 www.elsevier.com/locate/clinph

Barry R. Greene a,*, Geraldine B. Boylan b, Richard B. Reilly a,c, Philip de Chazal a, Sean Connolly d

a School of Electrical, Electronic & Mechanical Engineering, University College Dublin, Ireland
b Department of Paediatrics and Child Health, University College Cork, Ireland
c Cognitive Neurophysiology Laboratory, St. Vincent’s Hospital, Fairview, Dublin, Ireland
d Department of Clinical Neurophysiology, St. Vincent’s University Hospital, Dublin, Ireland

Accepted 7 February 2007
Available online 29 March 2007

Abstract

Objective: Neonatal seizures are the most common central nervous system disorder in newborn infants. A system that could automatically detect the presence of seizures in neonates would be a significant advance facilitating timely medical intervention.

Methods: A novel method is proposed for the robust detection of neonatal seizures through the combination of simultaneously-recorded electroencephalogram (EEG) and electrocardiogram (ECG). A patient-specific and a patient-independent system are considered, employing statistical classifier models.

Results: Results for the signals combined are compared to results for each signal individually. For the patient-specific system, 617 of 633 (97.52%) expert-labelled seizures were correctly detected with a false detection rate of 13.18%. For the patient-independent system, 516 of 633 (81.44%) expert-labelled seizures were correctly detected with a false detection rate of 28.57%.

Conclusions: A novel algorithm for neonatal seizure detection is proposed. The combination of an ECG-based classifier system with a novel multi-channel EEG-based classifier system has led to improved seizure detection performance. The algorithm was evaluated using a large data-set containing ECG and multi-channel EEG of realistic duration and quality.

Significance: Analysis of simultaneously-recorded EEG and ECG represents a new approach in seizure detection research, and the detection performance of the proposed system is a significant improvement on previously reported results for automated neonatal seizure detection.

© 2007 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

Keywords: Neonatal seizure detection; EEG; ECG; EKG

* Corresponding author. Tel.: +353 21 490 3793. E-mail address: [email protected] (B.R. Greene).

1388-2457/$32.00 © 2007 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.clinph.2007.02.015

B.R. Greene et al. / Clinical Neurophysiology 118 (2007) 1348–1359

1. Introduction

Seizures in the neonate require immediate medical attention and represent a distinctive sign of central nervous system dysfunction. There is increasing evidence that neonatal seizures have an adverse effect on neurodevelopmental outcome, and predispose to cognitive, behavioural, or epileptic complications in later life (Levene, 2002). Neonatal seizures occur in 6% of low birth-weight infants (Volpe, 2001) and in approximately 2% of all newborns admitted to a neonatal ICU (Scher et al., 1993a). Seizures in this age-group are often subtle, difficult to diagnose and may be clinically silent, particularly after antiepileptic drug treatment, making diagnosis by clinical observation alone very unreliable (Boylan et al., 2002). Electroencephalography (EEG) is the most reliable method available to detect the majority of neonatal seizures, but its interpretation requires special expertise that is not readily available in most neonatal intensive care units, least of all on a 24-h basis. A system that could automatically detect the presence of seizures in newborn babies would be a significant advance, facilitating timely medical intervention.

A number of studies have reported neonatal seizure detection methods based on the EEG (Liu et al., 1992; Gotman et al., 1997a; Celka and Colditz, 2002; Altenburg et al., 2003). Faul et al. (2005) provided a review and experimental comparison of three of the most commonly cited neonatal seizure detection algorithms. None performed well enough to be deemed suitable for use in the neonatal intensive care unit (ICU). Karayiannis et al. (2001) reported a video-based method for distinguishing myoclonic from focal clonic seizures and differentiating these types of seizures from normal infant behaviours. However, this approach does not provide a complete solution to the problem, as many neonatal seizures are not accompanied by this spectrum of body movements.

The importance of autonomic changes may be underestimated in neonatal seizure detection research. Neonatal seizures are often associated with changes in heart and respiration rate (Greene et al., 2006b). Significant changes in heart rate may alert the clinician to the possibility of seizures and instigate further investigation with EEG. These findings led to the development of a neonatal seizure detection system based exclusively on the electrocardiogram (ECG) (Greene et al., 2006a). The aim of this study was to attempt to improve the neonatal seizure detection rate by combining simultaneously-acquired ECG and EEG data. To the best of our knowledge this is the first method to combine the ECG with the EEG for seizure detection.

2. Data-set

A data-set of 12 records from 10 term neonates, containing 633 labelled seizure events with a mean seizure duration of 4.60 min, was recorded and analysed. The records had a mean duration of 12.84 h. Each record contained 7–12 channels of EEG and one channel of simultaneously-acquired ECG. Eleven records, sampled at 256 Hz, were made in the neonatal intensive care units of the Unified Maternity Hospitals in Cork, Ireland, using the Viasys NicOne video-EEG system. The remaining recording, sampled at 200 Hz, was recorded at King's College Hospital, London, on a Telefactor Beehive video-EEG system. A total of 154.1 h of EEG and ECG were analysed. The data-set used in this research is a resource of continuously-recorded digital video-EEG data and other physiological parameters in newborns with seizures in the first 3 days from birth. All newborns were full term (GA: 40–42 weeks) and had hypoxic ischaemic encephalopathy (HIE). All the data for each recording were included in the analysis regardless of record length or quality. Electrographic seizures were identified and annotated by an expert in neonatal EEG (GBB). Fig. 1 shows

Fig. 1. Example of a multi-channel electrographic seizure. Seizure onset and duration are marked.

Table 1
Data characteristics for each record: number of seizure events, duration of recording, mean seizure duration

Record   No. of seizure events   Record duration (h)   Mean seizure duration (min)
1        90                      10.01                 2.77
2        22                      10.42                 7.33
3        21                      24.53                 5.41
4        60                      14.25                 1.56
5        35                      14.40                 10.02
6        29                      10.01                 2.15
7        155                     24.04                 5.28
8        56                      13.17                 1.99
9        60                      5.20                  1.05
10       41                      5.69                  1.16
11       50                      17.33                 4.88
12       14                      5.05                  11.64
         Total 633               Total 154.10          Mean 4.60

an example of an electrographic seizure from record 12, detected by both the patient-specific and patient-independent systems. Annotations give information on the time of onset and the duration of each electrographic seizure. Table 1 details the number of seizure events per record, the duration of each record and the mean seizure duration for each record. As the ECG and EEG signals were recorded simultaneously, these annotations can be related directly in time to the ECG signal.

The data-set contained a wide variety of seizure durations and seizure types. While the mean seizure duration across the data-set was 4.60 min, the mean seizure duration for each record ranged from 1.05 min to 11.64 min. The data-set contained ‘Electrographic-only’ seizures as well as ‘Electroclinical’ seizures. Four records (2, 3, 10 and 12) contained only ‘Electrographic-only’ seizures. Two records (9 and 11) contained only ‘Electroclinical’ seizures. The remaining recordings contained both ‘Electrographic-only’ and ‘Electroclinical’ seizures. Furthermore, the data-set contained focal, multi-focal and generalized seizures.

3. Method

The combination of EEG and ECG for neonatal seizure detection was considered in the context of both patient-specific and patient-independent seizure detection classifiers. While the ideal scenario for this application is a patient-independent system capable of identifying all seizures from any patient with a zero false detection rate, a patient-specific system might also represent an advance in neonatal ICU monitoring. The algorithms considered in this study are epoch-based, so each seizure event was rounded to the nearest epoch length when mapping time annotations to epochs. An epoch containing ≥50% electrographic seizure activity was labelled as a seizure epoch.
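The epoch-labelling rule can be illustrated with a minimal sketch. The function name `label_epochs`, the `(onset, duration)` annotation format and the assumption that annotated seizures do not overlap one another are illustrative choices, not details given in the paper.

```python
def label_epochs(n_epochs, epoch_len_s, seizures, min_fraction=0.5):
    """Label each epoch 1 (seizure) or 0 (non-seizure).

    seizures: list of (onset_s, duration_s) annotations, assumed here
    not to overlap one another (hypothetical input format).
    """
    labels = []
    for k in range(n_epochs):
        start = k * epoch_len_s
        end = start + epoch_len_s
        covered = 0.0
        for onset, duration in seizures:
            # Length of the intersection between the epoch and the seizure.
            overlap = min(end, onset + duration) - max(start, onset)
            if overlap > 0:
                covered += overlap
        # Seizure epoch if at least half of it is annotated seizure activity.
        labels.append(1 if covered / epoch_len_s >= min_fraction else 0)
    return labels


# A seizure lasting from 30 s to 120 s, viewed in four 60-s epochs.
print(label_epochs(4, 60.0, [(30.0, 90.0)]))  # prints [1, 1, 0, 0]
```

The same function applies to the 60-s ECG epochs and to the shorter EEG epochs by changing `epoch_len_s`.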

3.1. ECG

The algorithm reported in this paper utilises the same ECG features described previously, based on the R–R intervals for 60-s epochs of ECG (Greene et al., 2006a).

3.1.1. ECG pre-processing

All ECG signals were filtered with a 20th order FIR band-pass filter (corner frequencies 8 and 18 Hz) to remove baseline wander, power-line noise and out-of-band noise. Before filtering, the mean of the ECG was removed from the signal.

3.1.2. R–R interval calculation

The R–R interval is defined as the time in seconds between adjacent R-wave maximum (QRS) points. Robust detection of the QRS point was achieved using a QRS detection algorithm as described by Benitez et al. (2001). The Hilbert transform of the first derivative of the signal was used to emphasize the R peaks. A moving window peak search was carried out with an adaptive threshold. As neonatal ECG often manifests an elevated P-wave, a step-back search was performed to isolate the P peak, ensuring robust detection of the R-wave maximum. Correction for missing and extra QRS points was implemented as described by de Chazal et al. (2003).

3.1.3. ECG feature extraction

The six ECG feature types considered in this study were calculated on a 60-s (15,360 samples for a record sampled at 256 Hz, 12,000 samples for 200 Hz) non-overlapping epoch basis. Features are based on the R–R intervals associated with each 60-s epoch. The features used in this study were:

• Mean R–R interval ($\mu_{RR}$)
• Std. Dev. of R–R intervals ($\sigma_{RR}$)
• Mean R–R interval spectral entropy ($RR_H$)
• Mean change in the R–R interval ($\Delta_{RR}$)
• R–R interval coefficient of variation ($\delta_{RR}$)
• R–R interval power spectral density ($RR_{PSD}$)
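A minimal sketch of how the time-domain features in the list, and the R–R interval PSD feature that this section goes on to describe, might be computed from a list of R-peak times is given here. The function names, the population form of the standard deviation, and the reading of "mean change" as the mean absolute successive difference are assumptions; the paper does not give these details. A direct DFT is used so the example has no dependencies.

```python
import cmath
import math


def rr_intervals(r_peak_times_s):
    """R-R intervals in seconds from successive R-peak times."""
    return [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def time_domain_features(rr):
    """Mean, standard deviation, coefficient of variation and mean change."""
    mean_rr = sum(rr) / len(rr)
    # Population standard deviation; the normalisation is an assumption.
    std_rr = math.sqrt(sum((v - mean_rr) ** 2 for v in rr) / len(rr))
    # Assumption: "mean change" is the mean absolute successive difference.
    diffs = [abs(b - a) for a, b in zip(rr, rr[1:])]
    return {
        "mean_rr": mean_rr,
        "std_rr": std_rr,
        "cv_rr": std_rr / mean_rr,
        "mean_change_rr": sum(diffs) / len(diffs),
    }


def rr_psd_features(rr, n_fft=256, bins_per_average=4, n_keep=32):
    """Averaged periodogram of the zero-mean R-R sequence (interval basis)."""
    mean_rr = sum(rr) / len(rr)
    x = [v - mean_rr for v in rr][:n_fft]  # zero-mean, truncated to n_fft
    periodogram = []
    for k in range(n_fft):
        # DFT of the sequence zero-padded to n_fft points; padded zeros
        # contribute nothing, so only the real samples are summed.
        coeff = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(len(x)))
        periodogram.append(abs(coeff) ** 2)  # X[k] times its conjugate
    averaged = [sum(periodogram[i:i + bins_per_average]) / bins_per_average
                for i in range(0, n_fft, bins_per_average)]
    return averaged[:n_keep]  # first 32 of the 64 averaged points
```

With these defaults the PSD contributes 32 values per epoch, which matches the count stated in the text; the relative features would additionally need the values from neighbouring epochs.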

The mean R–R interval, R–R interval standard deviation, R–R interval coefficient of variation, mean change in adjacent R–R intervals per epoch ($\Delta_{RR}$) and R–R interval spectral entropy each contributed one feature to the ECG feature vector. Relative features for $\mu_{RR}$, $\sigma_{RR}$, $\Delta_{RR}$ and $\delta_{RR}$ were obtained by subtracting the mean of the feature for the four preceding epochs as well as the mean for the four subsequent epochs (called $\mu'_{RR}$, $\sigma'_{RR}$, $\Delta'_{RR}$ and $\delta'_{RR}$ here) and each contributed one feature. The $RR_{PSD}$ features were calculated on an interval basis (Teich et al., 2000). The mean of the R–R intervals for each epoch was subtracted to yield a zero-mean sequence. The sequence was then zero-padded to length 256 and the fast Fourier transform (FFT) taken. The resulting sequence was multiplied by its complex conjugate to yield a periodogram estimate of the R–R interval power spectral density. A 64-point periodogram was obtained by averaging the values in four adjacent frequency bins. Only the first 33 of these constituted a valid PSD, with each of the first 32 of these points taken as a feature in its own right for each 60-s epoch. In total each epoch produced 41 features for each ECG feature vector.

3.1.4. ECG artefact detection

There are a variety of artefacts that may be found in an ECG signal. In this paper we have attempted to reject two kinds of artefact from subsequent analysis, namely ‘movement’ artefacts, which are large signal spikes caused by movement of the electrodes, and ‘zero-signal’ artefacts, caused by the amplifier being ‘powered-off’ in the course of a recording. A new signal, Qecg, was constructed with the same sample rate as the ECG and provided a binary flag for the presence or absence of artefact for each sample of the ECG signal. To identify the artefact sections of the ECG, a zero-mean ECG signal was first calculated by subtracting the mean of the ECG from each sample, and this signal was then processed as follows:

• The standard deviation of the absolute value of the signal was calculated, and any signal samples greater than six times this standard deviation were flagged as ‘movement’ artefact; Qecg was assigned the value 1 at these samples.
• Any 10-sample epoch whose mean was 100 times smaller than the 5% trimmed mean of the signal was flagged as ‘zero-signal’ artefact, and Qecg was assigned the value 1 for each of its samples.
• Unflagged samples of Qecg were assigned the value 0.

This artefact measure was then associated with each 60-s epoch, with the mean value for each epoch assigned as the Qecg value for that epoch. Fig. 2 shows the operation of the ECG artefact detector on a section of the ECG recording for Record 1.

Fig. 2. Example of the operation of the ECG artefact detector on a ‘movement’ artefact for record 1. (Plot: amplitude (µV) against time (s), 354–366 s.)
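The ECG artefact rules of Section 3.1.4 can be sketched as follows. Several details are assumptions rather than statements from the paper: the comparisons are made on the absolute value of the zero-mean signal, the 10-sample epochs are taken as non-overlapping blocks, and the 5% trimmed mean discards 2.5% of the values at each end. All function names are hypothetical.

```python
import math


def trimmed_mean(values, percent=5.0):
    """Mean after discarding percent/2 of the values at each end (assumed convention)."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 200.0)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def ecg_artefact_flags(ecg, block=10, spike_factor=6.0, flat_factor=100.0):
    """Per-sample artefact flag (1 = artefact, 0 = clean)."""
    mean = sum(ecg) / len(ecg)
    magnitude = [abs(v - mean) for v in ecg]  # |zero-mean ECG|
    mag_mean = sum(magnitude) / len(magnitude)
    mag_std = math.sqrt(sum((m - mag_mean) ** 2 for m in magnitude) / len(magnitude))
    # 'Movement' artefact: samples larger than six standard deviations.
    flags = [1 if m > spike_factor * mag_std else 0 for m in magnitude]
    # 'Zero-signal' artefact: 10-sample blocks with a mean 100 times
    # smaller than the trimmed mean of the signal.
    flat_limit = trimmed_mean(magnitude) / flat_factor
    for start in range(0, len(magnitude) - block + 1, block):
        if sum(magnitude[start:start + block]) / block < flat_limit:
            flags[start:start + block] = [1] * block
    return flags


def epoch_quality(flags, samples_per_epoch):
    """Qecg per epoch: the mean of the per-sample flags in that epoch."""
    return [sum(flags[s:s + samples_per_epoch]) / samples_per_epoch
            for s in range(0, len(flags) - samples_per_epoch + 1, samples_per_epoch)]
```

For a 256 Hz record, `epoch_quality(flags, 15360)` would give one quality value per 60-s epoch, i.e. the fraction of flagged samples in that epoch.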

3.2. EEG

3.2.1. EEG pre-processing

The EEG for each channel was low-pass filtered using a type II Chebyshev IIR filter with a corner frequency of 34 Hz to remove power-line noise along with out-of-band noise.

3.2.2. EEG feature extraction

A set of EEG features and a novel multi-channel EEG classifier architecture (as shown in Fig. 3) were used. In order to determine the optimum method for combining information across EEG channels, the authors carried out a separate study comparing score-level or ‘late integration’ of EEG channels (i.e. each channel processed independently and the scores combined) with feature-level or ‘early integration’ of multi-channel neonatal EEG (as discussed in this paper). We found that early integration provided greatly superior results to late integration for this application. Results from this study have recently been published (Greene et al., 2006c).

The fundamental difference between our use of multiple EEG channels and previous methods is that our method exploits the statistical inter-relationships between EEG channels. Processing EEG channels independently assumes an equal weighting for each EEG channel in the decision function and is less equipped to handle redundant features from ‘not-involved’ EEG channels. This is a weak assumption when one considers neonatal seizure EEG, which is often multi-focal and migrates across EEG channels. Feature vectors containing m features from n channels were concatenated into a large feature vector which was then fed to a pattern classifier.

Six features were extracted from each 2048-sample non-overlapping EEG epoch for each channel. The six features calculated per EEG epoch were:

• Dominant spectral peak (F)
• Power ratio (P)
• Bandwidth of dominant spectral peak (BW)
• Nonlinear energy (N)
• Spectral entropy (H)
• Line length (L)
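Two of the listed features, and the sort-and-concatenate step that the text describes immediately after this list, can be sketched as below. The definitions used (line length as the summed absolute sample-to-sample difference, nonlinear energy as the mean Teager operator x[n]^2 - x[n-1]x[n+1]) are the commonly used forms and are an assumption here, since the paper cites them without formulas. The function names are hypothetical and the four spectral features are omitted for brevity.

```python
def line_length(x):
    """Sum of absolute differences between successive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def mean_nonlinear_energy(x):
    """Mean of x[n]^2 - x[n-1]*x[n+1] over the interior samples."""
    terms = [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return sum(terms) / len(terms)


def sorted_super_vector(channel_epochs, feature_fns):
    """Group features by type, sort each group across channels, concatenate.

    channel_epochs: one list of samples per EEG channel for the same epoch.
    feature_fns: functions mapping one channel's samples to a single number.
    """
    vector = []
    for fn in feature_fns:
        # Sorting within a feature type discards which channel a value
        # came from, so only its rank among the channels is retained.
        vector.extend(sorted(fn(channel) for channel in channel_epochs))
    return vector


quiet = [0, 0, 0, 0, 0]
mild = [0, 1, 0, 1, 0]
active = [0, 2, 0, 2, 0]
features = [line_length, mean_nonlinear_energy]
# The result does not depend on the order in which channels are supplied.
print(sorted_super_vector([active, quiet, mild], features)
      == sorted_super_vector([quiet, mild, active], features))  # prints True
```

Because the vector is identical under any permutation of the channels, a classifier trained on it cannot learn to expect seizure activity at a particular electrode location.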

The features for each channel were sorted according to feature type and then each group of features sorted into numerical order. All the grouped-sorted features were then concatenated into a ‘super’ feature vector. The sorting function removes information about the spatial location of the seizure from the training set, preventing the classifier from expecting seizure activity in a particular channel. The sorting function behaves as a numerical feature selector for the patient-independent classifier, using the numerical differences between feature values of channels ‘involved’ in a seizure and channels ‘not-involved’. Features of ‘involved’ and ‘not-involved’ channels will be placed at opposing ends of the sorted, concatenated feature vector. A classifier can then learn to distinguish the features of ‘involved’ channels from those of ‘not-involved’ channels using their rank in the sorted concatenated feature vector.

Fig. 3. EEG classifier configuration. (Block diagram: feature extraction for each of channels 1 to n, feature sort, classifier and decision, with output Ps.)

The dominant spectral peak, power ratio and bandwidth features employed are those reported by Gotman et al. (1997b). The frequency spectrum was calculated for each epoch using the FFT. The dominant frequency was defined to be the frequency in the spectrum with the largest average power in its bandwidth. The bandwidth of the dominant spectral peak was defined as the width in Hertz (Hz) between the two half-power points of the dominant spectral peak. The power ratio was defined as the ratio of the power in the dominant spectral peak to the power at the same frequency in the ‘background’ EEG, where the ‘background’ EEG is the average of the three epochs 60 s behind the current epoch (Gotman et al., 1997b). Recent evidence suggests that seizure activity represents a reduction in the complexity of the signal (Celka and Colditz, 2002). Spectral entropy can be interpreted as a measure of signal complexity and so represents a potential feature for seizure detection. D’Alessandro et al. (2003) employed the mean nonlinear energy of an EEG epoch in predicting epileptic seizure in adults. Esteller et al. (2001) proposed line length, an approximate measure of the fractal properties of the signal, as a potential feature for epileptic seizure onset detection.

3.2.3. EEG artefact rejection

The stability of an EEG epoch has been used as an EEG signal quality measure Qeeg (Gotman et al., 1997b). The larger this value is relative to unity, the more likely the epoch is to contain artefact. The mean across the n EEG channels was taken as the EEG signal quality measure, Qeeg, for each epoch, which can be written as:

$$Q_{eeg} = \frac{1}{n} \sum_{i=1}^{n} Q_{eeg_i}$$

3.3. Classifier model

Classification based on a linear discriminant (LD) classifier model was employed for all signal modes and configurations. Linear discriminant classifier models utilise class conditional mean vectors and a common covariance matrix. They provide optimal performance when features within a class have a normal distribution and the same variance across classes. The class conditional mean vectors and a common covariance matrix were estimated separately from the training data for the patient-specific and patient-independent classifiers. Weighting of the class conditional mean vectors and common covariance matrix by the duration of each record was implemented for the patient-independent classifier (Greene et al., 2006a). This ensures that records of differing lengths contribute equally to the training of a patient-independent classifier.

3.4. Combining the ECG and EEG information

Two schemes for combining the information determined from the EEG and ECG signals were considered: the early integration (EI) scheme and the late integration (LI) scheme. They are discussed separately below.

3.4.1. Early integration of EEG and ECG features

The EI configuration, hereafter referred to as the EI fusion configuration, involves concatenating the EEG and ECG feature vectors into a single feature vector and feeding this ‘super’ feature vector to a pattern classifier. Fig. 4 gives a graphical description of the EI configuration. If the signal quality measures Qecg and Qeeg for an epoch were both over an empirically-derived threshold, that epoch was considered to contain artefact and was excluded from analysis for both signals.

3.4.2. Late integration of EEG and ECG classifications

The late integration (LI) configuration, hereafter referred to as the LI fusion configuration, employs separate classifiers for each signal to determine a probability of seizure for each signal mode. These two probabilities are then combined to provide an overall probability of seizure. Fig. 5 gives a graphical description of the LI fusion configuration. Combining the EEG and ECG signal modes using the classifiers’ confidence scores, as is performed in this configuration, allows each signal to be weighted for improved classification performance. Static and dynamic weighting of the two signals was investigated. An expression for the overall probability of seizure is given in Eq. (1). The scalars $\alpha_s$ and $\beta_s$ are the static weights for the ECG and EEG signals, respectively. For the static weighting case, $\alpha_s = 1 - \beta_s$. If $\alpha_s$ is varied over the range 0–1, the optimum static weights for each signal can be determined for both the patient-specific classifier and the patient-independent classifier. Fig. 6 shows the mean classification accuracy for the patient-specific classifier for the combined classifiers as the EEG static weight, $\alpha_s$, is varied from 0 to 1 in increments of 0.1.

$$P_{sz} = \alpha_s P_{ecg} + \beta_s P_{eeg} \qquad (1)$$
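The weighted combination of Eq. (1), the static weight search in increments of 0.1, and the quality-adjusted weights that the text goes on to define in Eqs. (2) and (3) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the symbols follow Eq. (1) as written (alpha weights the ECG probability), accuracy is the selection criterion, and a probability equal to the threshold is counted as seizure.

```python
def fused_probability(p_ecg, p_eeg, alpha, beta):
    """Eq. (1): weighted sum of the ECG and EEG seizure probabilities."""
    return alpha * p_ecg + beta * p_eeg


def dynamic_weight(static_weight, quality, max_quality):
    """Eqs. (2)-(3): static weight minus the normalised quality measure."""
    return static_weight - quality / max_quality


def sweep_static_weight(p_ecg, p_eeg, labels, threshold=0.5, steps=10):
    """Try alpha = 0, 0.1, ..., 1 with beta = 1 - alpha; keep the most accurate."""
    best_alpha, best_accuracy = None, -1.0
    for step in range(steps + 1):
        alpha = step / steps
        beta = 1.0 - alpha
        preds = [1 if fused_probability(a, b, alpha, beta) >= threshold else 0
                 for a, b in zip(p_ecg, p_eeg)]
        accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if accuracy > best_accuracy:  # ties keep the smaller alpha
            best_alpha, best_accuracy = alpha, accuracy
    return best_alpha, best_accuracy


# Toy per-epoch probabilities from two hypothetical classifiers.
p_ecg = [0.9, 0.2, 0.6, 0.4]
p_eeg = [0.4, 0.6, 0.7, 0.1]
labels = [1, 0, 1, 0]
print(sweep_static_weight(p_ecg, p_eeg, labels))
```

In the dynamic case, `dynamic_weight` would be evaluated per epoch for each signal and its outputs passed to `fused_probability` in place of the static weights.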

Dynamic weighting takes account of a measure of quality in each signal (shown as ‘Q’ in Fig. 5). If an epoch is determined to contain artefact in either mode, the corresponding weight for that mode is reduced appropriately, causing the system to favour the decision for the other signal. The dynamic weight for each signal is calculated by subtracting the quality measure, scaled by dividing by the maximum value of that quality measure, from the static weight for each signal. (It should be noted that a real-time system would require an empirically determined value in place of the maximum value of the quality measure used here. Such a value should be chosen to be the largest value that may reasonably occur for this parameter and should not significantly differ from the Qmax value used here.) Expressions for the dynamic weights for both signals are given in Eqs. (2) and (3).

$$\alpha_d = \alpha_s - \frac{Q_{ecg}}{\max\{Q_{ecg}\}} \qquad (2)$$

$$\beta_d = \beta_s - \frac{Q_{eeg}}{\max\{Q_{eeg}\}} \qquad (3)$$

Consequently, the probability of seizure Psz with dynamic weighting can be determined from Eq. (1) using $\alpha_d$ and $\beta_d$ as the mode weights in place of $\alpha_s$ and $\beta_s$. An epoch is labelled as seizure if Psz is over a given decision threshold. A decision threshold equal to 0.5 is used in this study.

Fig. 4. Early integration (EI) neonatal seizure detection configuration.

Fig. 5. Late integration (LI) neonatal seizure detection configuration.

3.4.3. Interpolation of mode frame rates

The ECG was considered in non-overlapping epochs of 60 s (15,360 samples at 256 Hz). However, the EEG was considered in 2048-sample non-overlapping epochs. In order to facilitate direct comparison and fusion of the two signals, the ECG frame rate must be matched to the EEG frame rate by means of interpolation. The EEG frame rate is a multiple of the ECG frame rate. The interpolation factor is the integer closest to this multiple; the frames are then shifted to ensure that the EEG and ECG windows remain synchronized. In the EI configuration this interpolation was performed at the feature level. A ‘super’ feature vector with a frame rate matching that of the EEG was passed to the classifier. In the LI configuration, this interpolation was performed at the score level. The output probability from the ECG classifier was interpolated after sub-dividing the output for each 60 s ECG epoch into eight 8-s epochs (for the 256 Hz case) to match the frame rate from the EEG classifier, and the two were combined, as discussed in Section 3.4.2, for an overall probability of seizure.

Fig. 6. Patient-specific classification accuracy for LI EEG fusion classifier as the EEG static weight is varied in the range 0–1 (ECG weight equals 1-EEG weight). (Plot ‘Fusion weights vs mean Fusion Accuracy’: accuracy (%) against EEG fusion weight, with curves for mean fusion, mean EEG and mean ECG accuracy.)

3.5. Classifier performance estimation

Each classifier configuration was considered as both a patient-specific and a patient-independent classifier. The performance of each patient-specific classifier was estimated using m-fold cross-validation on each record. This involves randomly splitting each record into m sections or ‘folds’: m − 1 of these folds are then used to train the classifier and the remaining fold is then used to test the performance of the classifier. By shuffling the data, repeating this procedure q times and averaging the resulting accuracies for the training and test sets, an unbiased, low-variance estimate of the classifier performance can be obtained. In this study 10 folds and 10 shuffles of the data were used. The patient-specific performance measures were then taken as the average of each measure across records.

The performance of the generalized or patient-independent classifier was estimated using cross-validation across all records. This involved training the classifier model on (z − 1) of the z records, using the zth record to test the classifier performance, and then rotating through the z possible combinations of training and test sets. The mean of the results for all iterations is taken as the patient-independent performance estimate. This test provides a measure of the classifiers’ ability to generalize from the training set and classify ‘unseen’ records.
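The leave-one-record-out procedure used for the patient-independent estimate can be sketched as below. The classifier here is a deliberately simple stand-in (a threshold midway between the class means of a single feature), not the linear discriminant model of Section 3.3, and every name in the block is hypothetical.

```python
def train_midpoint_classifier(records):
    """Stand-in model: threshold halfway between the two class means."""
    pos = [x for feats, labels in records for x, y in zip(feats, labels) if y == 1]
    neg = [x for feats, labels in records for x, y in zip(feats, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(threshold, record):
    """Percentage of epochs in one record classified correctly."""
    feats, labels = record
    hits = sum((x >= threshold) == (y == 1) for x, y in zip(feats, labels))
    return 100.0 * hits / len(labels)


def leave_one_record_out(records, train_fn, score_fn):
    """Train on z - 1 records, test on the held-out one, rotate through all z."""
    scores = []
    for held_out, test_record in enumerate(records):
        training = records[:held_out] + records[held_out + 1:]
        model = train_fn(training)
        scores.append(score_fn(model, test_record))
    return sum(scores) / len(scores), scores


# Three toy records, each a (feature values, epoch labels) pair.
records = [
    ([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]),
    ([0.0, 0.3, 0.7, 1.0], [0, 0, 1, 1]),
    ([0.2, 0.6, 0.9, 0.4], [0, 0, 1, 1]),
]
mean_accuracy, per_record = leave_one_record_out(
    records, train_midpoint_classifier, accuracy)
print(per_record)  # prints [100.0, 100.0, 50.0]
```

The third record is classified poorly because it never contributes to its own training set, which is exactly the generalisation effect this test is meant to expose.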

3.6. Classifier performance measures

The classifier architectures considered in this study were epoch-based. As a result, classification accuracy is an epoch-based measure. In contrast, the percentage of seizures detected by the system is an event-based measure. There is much inconsistency in the literature regarding the format of reported results. For this reason, we have presented our results in both epoch format and event format.

The classification accuracy is defined as the percentage of epochs correctly classified by the system. The sensitivity is defined as the percentage of seizure epochs (as labelled by an expert in neonatal EEG) correctly identified as seizure epochs by the system. The specificity is defined as the percentage of labelled non-seizure epochs correctly classified as non-seizure by the system. The false detection rate (FDR) is defined as the percentage of non-seizure epochs incorrectly identified as seizure epochs and is equivalent to 100 − specificity (%).

Caution must be exercised in reporting false detection results. Many algorithms report these in terms of ‘clusters’ of false detections per hour (Gotman et al., 1997b). Although this can be a useful measure of clinical utility of the algorithm, it is not always an accurate assessment of algorithm performance, as it is possible for an entire hour of false detections to be taken as a single false detection for that hour. False detections less than 30 s apart are grouped as a single false detection. The mean false detection per hour (FD/h) is included here for completeness.

The seizure sensitivity or good detection rate (GDR) is defined as the percentage of electrographic seizure events, as labelled by an expert in neonatal EEG (G.B.B.), correctly identified by the system. If a seizure was detected any time between the start and end of a labelled seizure, this was considered a ‘good detection’.
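The epoch-based and event-based measures defined above can be sketched as follows. The function names and input formats are hypothetical, and the way the 30-s grouping gap is measured (from the end of one false-positive epoch to the start of the next) is an assumption, since the text does not specify it.

```python
def epoch_measures(labels, preds):
    """Accuracy, sensitivity, specificity and FDR, all in percent."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    specificity = 100.0 * tn / (tn + fp)
    return {
        "accuracy": 100.0 * (tp + tn) / len(labels),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": specificity,
        "fdr": 100.0 - specificity,
    }


def good_detection_rate(events, preds, epoch_len_s):
    """Percentage of (onset_s, duration_s) events overlapped by a detection."""
    detected = 0
    for onset, duration in events:
        for k, p in enumerate(preds):
            start, end = k * epoch_len_s, (k + 1) * epoch_len_s
            if p == 1 and start < onset + duration and end > onset:
                detected += 1
                break
    return 100.0 * detected / len(events)


def false_detections_per_hour(labels, preds, epoch_len_s, merge_gap_s=30.0):
    """Count false-positive clusters; gaps under merge_gap_s are merged."""
    count, last_end = 0, None
    for k, (y, p) in enumerate(zip(labels, preds)):
        if y == 0 and p == 1:
            start = k * epoch_len_s
            if last_end is None or start - last_end >= merge_gap_s:
                count += 1
            last_end = start + epoch_len_s
    return count / (len(labels) * epoch_len_s / 3600.0)
```

Keeping the two families of measures in separate functions mirrors the distinction the text draws between epoch format and event format.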
A receiver operating characteristic (ROC) curve is a graphical representation of class sensitivity against specificity as a threshold parameter is varied. The area under the ROC curve (calculated using trapezoidal numerical integration) is an effective way of comparing the performance of different features or classifiers and is equivalent to the Mann–Whitney version of the Wilcoxon rank-sum statistic (Zweig and Campbell, 1993). A random discrimination will give an area of 0.5 under the curve while perfect discrimination between classes will give unity area under the ROC curve. 4. Results The presented system is an epoch-based system so it makes intuitive sense to quantify its performance in terms of epoch-based measures such as accuracy, sensitivity and specificity as detailed above. From a clinical viewpoint, the most important measure of the clinical utility of a seizure detection system is the percentage of seizure events correctly detected by the system (GDR) along with the number of false detections. For this reason we have given

1355

our results in terms of both epoch measures and event-based measures. The results are divided into two sections: the patient-specific classifier results appear in Section 4.1 and the patient-independent classifier results appear in Section 4.2. Within each section, results are presented for each signal individually as well as for the combination of the EEG and ECG signals. It should be noted that although artefact rejection was employed in both fusion configurations, each of the 633 seizure events in the dataset was included in our analysis.

4.1. Patient-specific results

The mean patient-specific ECG GDR was found to be 99.36% with an FDR of 29.80%. On an epoch basis the ECG classifier had a mean classification accuracy of 69.09% with associated sensitivity and specificity of 60.06% and 69.48%, respectively. The patient-specific EEG classifier had a GDR of 93.64% and an FDR of 11.47%. The EEG classification accuracy was 84.55% with sensitivity 71.02% and specificity 88.53%. The EI fusion patient-specific classifier had a mean GDR of 95.82% with an FDR of 11.23%. The EI fusion classification accuracy was 86.32% with sensitivity of 76.37% and specificity of 88.77%.

When static weighting of modes was employed, the LI fusion patient-specific classifier had a GDR of 95.18% and an FDR of 10.77% when α_s = 0.7 and β_s = 0.3. The accuracy was 85.99%, sensitivity 73.69% and specificity 89.23%. Fig. 6 shows the classification accuracy as α_s is varied from 0 to 1. With dynamic weighting of modes the LI patient-specific GDR was 97.52% with an FDR of 13.18%. The mean classification accuracy was 84.66% with sensitivity 74.08% and specificity 86.82%. Table 2 gives a breakdown of the patient-specific results for all modes and configurations. The results given for the LI fusion configuration are those for dynamic weighting of modes, as this method gave superior performance to static weighting.

Table 2
Patient-specific results

            GDR (%)   FD/h   FDR (%)   Accuracy (%)   Sensitivity (%)   Specificity (%)
ECG         99.19     0.68   30.79     68.98          59.69             69.21
EEG         93.64     5.52   11.47     84.55          71.02             88.53
Fusion EI   95.82     5.63   11.23     86.32          76.37             88.77
Fusion LI   97.52     3.96   13.18     84.66          74.08             86.82

Results for LI fusion with dynamic weighting for each signal.

4.2. Patient-independent results

The mean patient-independent ECG GDR was 63.54% with an FDR of 38.00%. The mean classification accuracy
was 63.54% with associated sensitivity and specificity of 69.63% and 61.63%, respectively. The mean patient-independent EEG GDR was 80.41% with an FDR of 26.05%. The mean classification accuracy was 72.45% with associated sensitivity and specificity of 68.18% and 73.95%, respectively. The EI fusion patient-independent GDR was 81.44% with an FDR of 28.57%. The mean classification accuracy was 71.51% with sensitivity of 71.73% and specificity of 71.43%. With static weighting of modes, the LI fusion patient-independent classifier had a GDR of 80.41% and an FDR of 27.87% when the mode weights were α_s = 0.7 and β_s = 0.3. With dynamic weighting the GDR for this classifier was 81.27% with an FDR of 33.05%. Table 3 outlines the patient-independent results. The results given for the LI fusion configuration are those for dynamic weighting of modes as this method gave superior performance.

These results were confirmed by ROC analysis. Fig. 7 shows the ROC curves for the ECG, EEG and LI Fusion classifiers. The ECG ROC area was 0.68 while the EEG ROC area was 0.76. The LI fusion ROC area was 0.76 while the EI fusion ROC area was 0.77 (Fig. 7).

Fig. 8 gives two examples of a ‘good detection’ of a seizure event, showing the system output in terms of the probability of seizure, P_sz. Fig. 8b also shows an example of a false detection. In both cases an epoch was classified as seizure if P_sz was greater than or equal to the decision threshold.

Table 3
Patient-independent results

            GDR (%)   FD/h   FDR (%)   Accuracy (%)   Sensitivity (%)   Specificity (%)
ECG         82.33     1.71   37.78     63.97          69.51             62.22
EEG         80.41     3.42   26.05     72.45          68.18             73.95
Fusion EI   81.44     3.15   28.57     71.51          71.73             71.43
Fusion LI   81.27     3.05   33.05     68.89          74.39             66.95

Results for the LI fusion classifier employed dynamic weighting of each signal.

Fig. 8. (a) An example of a good detection for the patient-independent classifier, for a seizure in record 5. The top panel shows an EEG channel with seizure onset marked with a black arrow. Bottom panel shows the probability of seizure generated by the system for that seizure. (b) An example of a good detection and false detection for the patient-independent classifier, for a seizure in record 6. The top panel shows the EEG seizure event denoted by the dashed black line in the bottom panel. [Figure not reproduced. Top panels: EEG amplitude (uV) against time (s). Bottom panels, titled "Patient Independent Classifier: Output Probability": probability of seizure against epoch number, with the expert label and the decision threshold (0.5) marked.]
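The epoch-based figures in Tables 2 and 3 can be related to one another with a short sketch. The Python below assumes that static LI fusion forms a weighted sum of the per-epoch EEG and ECG seizure probabilities, with the weight of 0.7 applied to the EEG output and 0.3 to the ECG output, and that FDR is the percentage of non-seizure epochs classified as seizure (which is consistent with FDR and specificity summing to 100% in both tables). Neither assumption is stated in this section, and the function names and sample probabilities are hypothetical rather than taken from the study.

```python
from typing import Dict, List, Sequence


def fuse_late(p_eeg: Sequence[float], p_ecg: Sequence[float],
              alpha: float = 0.7, beta: float = 0.3) -> List[float]:
    """Weighted sum of per-epoch probabilities (assumed form of static LI fusion)."""
    if len(p_eeg) != len(p_ecg):
        raise ValueError("EEG and ECG outputs must cover the same epochs")
    return [alpha * a + beta * b for a, b in zip(p_eeg, p_ecg)]


def classify(p_sz: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """An epoch is labelled seizure if its probability is >= the threshold."""
    return [p >= threshold for p in p_sz]


def _pct(numerator: int, denominator: int) -> float:
    """Percentage, or NaN when the class is absent from the labels."""
    return 100.0 * numerator / denominator if denominator else float("nan")


def epoch_metrics(predicted: Sequence[bool], labels: Sequence[bool]) -> Dict[str, float]:
    """Epoch-based sensitivity, specificity, accuracy and FDR, all in percent."""
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    tn = sum(1 for p, l in zip(predicted, labels) if not p and not l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    return {
        "sensitivity": _pct(tp, tp + fn),
        "specificity": _pct(tn, tn + fp),
        "accuracy": _pct(tp + tn, tp + fn + tn + fp),
        "fdr": _pct(fp, fp + tn),  # assumed definition: 100 - specificity
    }


# Hypothetical classifier outputs for four epochs (not data from the study).
p_eeg = [0.9, 0.2, 0.6, 0.1]
p_ecg = [0.8, 0.4, 0.1, 0.3]
labels = [True, False, True, False]

predicted = classify(fuse_late(p_eeg, p_ecg))
metrics = epoch_metrics(predicted, labels)
```

GDR and FD/h are event-based measures and are not covered by this sketch, since they depend on how consecutive seizure epochs are grouped into events.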

5. Discussion

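Fig. 7. Patient-independent LI Fusion ROC curves with ECG and EEG ROC curves. [Figure not reproduced. Plot titled "Patient Independent ROC curve": sensitivity (%) against specificity (%) for the ECG, EEG, Fusion EI and Fusion LI classifiers.]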
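The ROC areas quoted for the classifiers in Fig. 7 summarise performance over all decision thresholds. As a minimal sketch of how such an area can be computed, the Python below sweeps the threshold over a set of epoch probabilities and integrates sensitivity against (1 − specificity) with the trapezoidal rule, which gives the same area as the sensitivity-versus-specificity axes used in Fig. 7. The function names and toy data are illustrative assumptions, and this is not necessarily the procedure the authors used.

```python
from typing import List, Sequence, Tuple


def roc_points(p_sz: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, sensitivity) pairs for every distinct threshold."""
    n_pos = sum(1 for l in labels if l)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both seizure and non-seizure epochs are required")
    points = [(0.0, 0.0)]  # threshold above every score: nothing detected
    for t in sorted(set(p_sz), reverse=True):
        tp = sum(1 for p, l in zip(p_sz, labels) if p >= t and l)
        fp = sum(1 for p, l in zip(p_sz, labels) if p >= t and not l)
        points.append((fp / n_neg, tp / n_pos))
    return points  # the lowest threshold yields the final point (1.0, 1.0)


def roc_area(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal integration of sensitivity over false positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: one non-seizure epoch scores higher than one seizure epoch.
scores = [0.9, 0.6, 0.4, 0.2]
truth = [True, False, True, False]
area = roc_area(roc_points(scores, truth))
```

An area of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of seizure and non-seizure epochs.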

An approach is proposed for combining simultaneously-recorded ECG and EEG signals for more accurate and robust detection of neonatal seizures. Recent research by the authors has suggested that the ECG is suitable in its own right for use in seizure detection


algorithms, because changes in R–R interval timing, complexity and variability appear to be associated with neonatal seizures (Greene et al., 2006b).

There are a number of existing methods for EEG-based neonatal seizure detection. Many are either based on a single channel of EEG (Gotman et al., 1997b; Celka and Colditz, 2002; Hassanpour et al., 2004) or use empirically-based decision thresholds (Altenburg et al., 2003; Liu et al., 1992), as opposed to a classifier model trained on real multi-channel EEG. The novel EEG-based classifier architecture reported here exploits the statistical inter-relationships and synchronously recorded nature of the EEG by processing all available EEG channels, while employing a statistical classifier model. Manifestations of seizure were observed simultaneously in the EEG and ECG signals. The combination of the two signals supplies the neonatal seizure detection system with a broader seizure-specific information base, offering potentially superior seizure detection performance.

The ECG and multi-channel EEG data used to evaluate the algorithms developed in this study are of the same duration and quality as those found in the neonatal ICU, and so can be said to faithfully reflect the performance of the algorithms under real-world conditions. Many previous studies have selected small numbers of seizure and non-seizure EEG epochs instead of long-duration EEG recordings to evaluate their algorithms. Scher and co-workers have reported that the neonatal sleep cycle is approximately 1 h in duration (Scher et al., 1993b). Results for algorithms trained and validated on small non-continuous tracts of EEG shorter than 1 h will not reflect the performance of such algorithms on the unique stage-specific characteristics of the neonatal EEG sleep cycle. A similar observation can be made of algorithms that are defined and validated using a single channel of EEG.
Performance figures given for such algorithms do not reflect their performance on real multi-channel EEG data. A robust system must be able to cope with all EEG records regardless of record quality and duration.

For their patient-independent neonatal seizure detection methods, Gotman et al. (1997b) reported a GDR of 71% with 1.7 false detections per hour, and Liu et al. (1992) reported a GDR of 84% with a false detection rate of 1.7%. Celka and Colditz reported a GDR of 93% with an FDR of 4% for their patient-specific neonatal seizure detection method (Celka and Colditz, 2002). An independent evaluation of these three methods, performed on the same data-set as is used here, found that the results reported in the source papers overestimated the performance of these algorithms and that none were suitable for use in a clinical environment (Faul et al., 2005). Our patient-independent results for EEG alone and for LI and EI combined ECG and EEG are an improvement on those reported by the evaluation of Faul et al. The results for the Gotman method were validated in a subsequent paper using a separate data-set containing


281 h of EEG data from 54 patients in three centres (Gotman et al., 1997a). The mean seizure detection rate for this set was 69% with a mean of 2.3 false detections per hour. The size of this data-set must lend credence to these results. Our patient-independent results for EEG and ECG combined were an improvement on those reported by Gotman, and were achieved using a methodology designed to ensure robust, reproducible results. The paper by Celka and Colditz used a data-set of 4 neonates and does not detail the number of seizures or the duration of the recordings used (Celka and Colditz, 2002). Furthermore, the results are based on a single channel of EEG. The data-set of Liu et al. used 58 30-s seizure epochs, selected for ‘prototypicality’; this may have had a biasing effect on their results, as noted by Gotman et al.

It has been noted that different classifier models offer potentially complementary information about the patterns to be classified, which could be harnessed to improve the performance of the selected classifier (Kittler et al., 1998). As a result it has been found that combining classifiers from different modes with generalized knowledge of the patterns to be classified generally yields improved, more robust, classification performance. Our results confirm this finding. The combined EEG and ECG classifiers out-performed both the ECG and EEG classifiers’ individual performances. While the GDR or FDR for an individual signal may have been comparable to that for the early integration (EI) or late integration (LI) fusion classifiers, when taken together, results for fusion were always superior to those for each signal individually. As a result, the combination of EEG and ECG has led to a more robust system for neonatal seizure detection than a system based exclusively on the EEG.

Two methods for combining ECG and EEG were considered in this study. The EI fusion configuration was generally found to give better performance than the LI fusion configuration.
The one exception to this trend was the patient-specific case, where the LI fusion classifier gave a higher GDR than the EI classifier. An LI configuration would possess a distinct advantage over an EI configuration in a real-world patient-monitoring scenario. The use of dynamic weighting allows the system to deal with the presence of artefact, electrode drop-off or interference. Such a configuration could also take into account local variations in feature characteristics, allowing weighting of each signal in the decision function.

Patient-specific neonatal seizure detection may have utility in the modern neonatal ICU. When an electroencephalographer is alerted to the presence of electrographic seizure, relevant sections and channels of the preceding EEG could then be labelled as seizure. These annotated seizures could then be used to adaptively train a base patient-independent classifier towards the individual patient’s electroclinical seizure characteristics. A patient-specific system as discussed in this paper, while an improvement on current systems, falls short of the ideal neonatal seizure detection system. This is due to the fact that it would require


significant manual intervention from an electroencephalographer, as outlined above, to ensure robust operation. However, such a system may prove to be more clinically useful than a patient-independent system, which we have found to have potentially higher false detection rates.

Monitors such as the cerebral function monitor (CFM) are often used in the neonatal intensive care unit (NICU) despite the fact that they were originally designed for adult intensive care use. They are currently used in both term and preterm neonates for seizure detection, prognosis, and to assess the severity of encephalopathy in trials of therapeutic hypothermia. The CFM produces a one-channel amplitude-integrated EEG signal. Despite attempts to develop more sophisticated cerebral function monitors such as the compressed spectral array (CSA) system, it is the CFM which remains dominant in the NICU today. There have been criticisms of the CFM because of its limitation to a single EEG channel (plus a simultaneous ‘artefact detection’ channel) and the lack of detailed information compared with the conventional multi-channel EEG, especially when used for detection of neonatal seizure discharges (Eaton et al., 1994; Klebermass et al., 2001; Toet et al., 2002; Rennie et al., 2004). In the study by Rennie et al., up to 50% of seizures were missed, particularly those that were of short duration, focal or of low amplitude. Therefore, while these devices are in use in the NICU, they have serious limitations, and the need for automated seizure detection from multi-channel EEG is even greater.

The signal framework introduced here raises the possibility of multi-channel, multi-signal intelligent neonatal monitoring by taking account of, and combining, all available physiological parameters for monitoring the state and wellbeing of newborns in the ICU. This framework could be further extended to all clinical patient-monitoring situations.
It should be noted that the clinical utility of our patient-independent system is limited by the relatively high false detection rates reported for the patient-independent classifier in this paper. In future research we hope to reduce false detection rates for both patient-specific and patient-independent classifiers through the use of more advanced artefact detection and rejection algorithms. A more sophisticated normalisation scheme may lead to improved patient-independent performance. Further increases in system performance might be achieved by taking account of other recorded physiological signals such as the electrooculogram and cerebral blood flow velocity. However, in order for these signals to be included in multi-signal neonatal seizure detection systems, the spatial and temporal relation of these signals to the electrographic seizure must first be quantified. The inclusion of non-signal information, such as gestational age, weight, maternal history, etc., into an automatic neonatal monitoring system has the potential to greatly improve the performance of neonatal seizure detection systems due to the highly variable age-dependent characteristics of the neonatal period.
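The discussion above notes that dynamic weighting lets an LI configuration cope with artefact, electrode drop-off or interference. The weighting rule itself is not described in this section, so the Python below is only one possible sketch: it assumes a hypothetical per-epoch quality score in [0, 1] for each signal (0 when the signal is unusable) and normalises the two scores into mode weights. The quality scores, function names and fallback behaviour are all assumptions made for illustration, not the authors' method.

```python
from typing import Optional, Tuple


def dynamic_weights(q_eeg: float, q_ecg: float) -> Optional[Tuple[float, float]]:
    """Turn per-epoch quality scores into weights that sum to 1.

    Returns None when both signals are unusable, so the caller can
    withhold a decision for that epoch instead of guessing.
    """
    for q in (q_eeg, q_ecg):
        if not 0.0 <= q <= 1.0:
            raise ValueError("quality scores must lie in [0, 1]")
    total = q_eeg + q_ecg
    if total == 0.0:
        return None
    return q_eeg / total, q_ecg / total


def fuse_dynamic(p_eeg: float, p_ecg: float,
                 q_eeg: float, q_ecg: float) -> Optional[float]:
    """Weighted sum of the two seizure probabilities for a single epoch."""
    weights = dynamic_weights(q_eeg, q_ecg)
    if weights is None:
        return None
    w_eeg, w_ecg = weights
    return w_eeg * p_eeg + w_ecg * p_ecg


# ECG lost for this epoch (e.g. electrode drop-off): rely on the EEG alone.
eeg_only = fuse_dynamic(p_eeg=0.8, p_ecg=0.2, q_eeg=1.0, q_ecg=0.0)

# Both signals clean: the two modes contribute equally in this sketch.
balanced = fuse_dynamic(p_eeg=0.8, p_ecg=0.2, q_eeg=1.0, q_ecg=1.0)
```

Under this scheme a fully corrupted signal is ignored for that epoch rather than being allowed to pull the fused probability across the decision threshold.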

6. Conclusion

We describe a novel algorithm for neonatal seizure detection. Combination of an ECG-based classifier system with a novel multi-channel EEG-based classifier system led to improved seizure detection performance. The algorithm was evaluated using a large data-set containing ECG and multi-channel EEG of realistic duration and quality. Future work is needed to develop improvements in these algorithms and to explore the possible added diagnostic value of other combinations of physiological data in the automatic identification of seizures in this age-group.

Acknowledgements

This project was funded by an Irish Higher Education Authority grant (HEA – 9300) and an interdisciplinary grant from the Health Research Board of Ireland. The authors would like to acknowledge the helpful technical assistance of Dr. Edmund Lalor and Mr. Brian O’Mullane. We also acknowledge the help and support of the nursing and medical staff of the Unified Maternity Services, Cork, and the parents and families of the babies involved in this study.

References

Altenburg J, Vermeulen RJ, Strijers RLM, Fetter WPF, Stam CJ. Seizure detection in the neonatal EEG with synchronization likelihood. Clin Neurophysiol 2003;114:50–5.
Benitez D, Gaydecki PA, Zaidi A, Fitzpatrick AP. The use of the Hilbert transform in ECG signal analysis. Comput Biol Med 2001;31:399–406.
Boylan GB, Rennie JM, Pressler RM, Wilson G, Morton M, Binnie CD. Phenobarbitone, neonatal seizures, and video-EEG. Arch Dis Child Fetal Neonatal Ed 2002;86:165–70.
Celka P, Colditz P. A computer-aided detection of EEG seizures in infants: a singular-spectrum approach and performance comparison. IEEE Trans Biomed Eng 2002;49:455–62.
D’Alessandro M, Esteller R, Vachtsevanos G, Hinson A, Echauz J, Litt B. Epileptic seizure prediction using hybrid feature selection over multiple intracranial EEG electrode contacts: a report of four patients. IEEE Trans Biomed Eng 2003;50:603–15.
de Chazal P, Heneghan C, Sheridan E, Reilly R, Nolan P, O’Malley M. Automated processing of the single-lead electrocardiogram for the detection of obstructive sleep apnoea. IEEE Trans Biomed Eng 2003;50:686–96.
Eaton DM, Toet M, Livingston J, Smith I, Levene M. Evaluation of the Cerebro Trac 2500 for monitoring of cerebral function in the neonatal intensive care. Neuropediatrics 1994;25:122–8.
Esteller R, Echauz J, Tcheng T, Litt B, Pless B. Line length: an efficient feature for seizure onset detection. In: Proceedings of the 23rd annual international conference of the IEEE engineering in medicine and biology society, vol. 2; 2001. p. 1707–10.
Faul S, Boylan G, Connolly S, Marnane L, Lightbody G. An evaluation of automated neonatal seizure detection methods. Clin Neurophysiol 2005;116:1533–41.
Gotman J, Flanagan D, Rosenblatt B, Bye A, Mizrahi EM. Evaluation of an automatic seizure detection method for the newborn EEG. Electroencephalography Clin Neurophysiol 1997a;103:363–9.
Gotman J, Flanagan D, Zhang J, Rosenblatt B, Bye A, Mizrahi EM. Automatic seizure detection in newborns: methods and initial evaluation. Electroencephalography Clin Neurophysiol 1997b;103:356–62.
Greene BR, de Chazal P, Boylan GB, Reilly RB, Connolly S. Electrocardiogram based neonatal seizure detection. IEEE Trans Biomed Eng, TBME-00350-2005.R2, in press.
Greene BR, de Chazal P, Boylan GB, Reilly RB, O’Brien C, Connolly S. Heart and respiration rate changes in the neonate during electroencephalographic seizure. Med Biol Eng Comput 2006b;44:27–34.
Greene BR, Reilly RB, Boylan G, de Chazal P, Connolly S. Multichannel EEG based neonatal seizure detection. In: 28th international conferences of the IEEE-EMBS conference; 2006c.
Hassanpour H, Mesbah M, Boashash B. Time-frequency feature extraction of newborn EEG seizure using SVD-based techniques. EURASIP J Appl Signal Process 2004;16:2544–54.
Karayiannis NB, Srinivasan S, Bhattacharya R, Wise MS, Frost Jr JD, Mizrahi EM. Extraction of motion strength and motor activity signals from video recordings of neonatal seizures. IEEE Trans Med Imag 2001;20:965–80.
Kittler J, Hatef M, Duin RPW, Matas J. On combining classifiers. IEEE Trans Patt Anal Mach Intell 1998;20:226–39.
Klebermass K, Kuhle S, Kohlhauser-Vollmuth C, Pollak A, Weninger M. Evaluation of the cerebral function monitor as a tool for neurophysiological surveillance in neonatal intensive care patients. Childs Nerv Syst 2001;17:544–50.
Levene M. The clinical conundrum of neonatal seizures. Arch Dis Child 2002;86:75–7.
Liu A, Hahn JS, Heldt GP, Coen RW. Detection of neonatal seizures through computerized EEG analysis. Electroencephalography Clin Neurophysiol 1992;82:30–7.
Rennie JM, Chorley G, Boylan GB, Pressler R, Nguyen Y, Hooper R. Non-expert use of the cerebral function monitor for neonatal seizure detection. Arch Dis Child Fetal Neonatal Ed 2004;89:F37–40.
Scher MS, Aso K, Beggarly ME, Hamid MY, Steppe DA, Painter MJ. Electrographic seizures in preterm and full-term neonates: clinical correlates, associated brain lesions, and risk for neurologic sequelae. Pediatrics 1993a;91:128–34.
Scher MS, Hamid MY, Steppe DA, Beggarly ME, Painter MJ. Ictal and interictal electrographic seizure durations in preterm and term neonates. Epilepsia 1993b;34:284–8.
Teich MC, Lowen SB, Jost BM, Vibe-Rheymer K, Heneghan C. In: Akay M, editor. Nonlinear biomedical signal processing, vol. II. Piscataway (NJ): IEEE Press; 2000.
Toet MC, van der Meij W, de Vries LS, Uiterwaal CSPM, van Huffelen KC. Comparison between simultaneously recorded amplitude integrated electroencephalogram (cerebral function monitor) and standard electroencephalogram in neonates. Pediatrics 2002;109:772–9.
Volpe JJ. Neurology of the newborn. Philadelphia (PA): Saunders; 2001.
Zweig M, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561–77.