EEG-based seizure detection in patients with intellectual disability: Which EEG and clinical factors are important?



Biomedical Signal Processing and Control 49 (2019) 404–418

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control journal homepage: www.elsevier.com/locate/bspc

Lei Wang a,∗, Xi Long a,b, Ronald M. Aarts a,b, Johannes P. van Dijk c,d, Johan B.A.M. Arends a,c

a The Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands
b Philips Research, HTC, 5656 AE Eindhoven, The Netherlands
c The Department of Clinical Physics, Epilepsy Center Kempenhaeghe, The Netherlands
d University of Ulm, Germany

Article info

Article history: Received 14 July 2018; received in revised form 6 November 2018; accepted 6 December 2018

Keywords: EEG; Seizure detection; Intellectual disability; Imbalanced data; Post-processing; Multilevel analysis; LDA; SVM

Abstract

Epilepsy is a common secondary disability in people with an intellectual disability (ID), affecting 22% of the ID population compared with 1% of the general population. Surprisingly, EEG-based automated seizure detection in the ID population has not yet been sufficiently studied. The reasons are twofold. Firstly, long-term EEG recordings are few due to behavioral problems. Secondly, the annotation of EEG recordings has proved difficult due to the complex EEG signal abnormalities caused by brain development disorders. As a result, the performance of automated seizure detection for people with ID is largely unknown. In this work, we performed automated seizure detection on a retrospective dataset containing 615 h of ambulatory scalp EEG from 29 participants with ID, including 91 seizures. To design a generic seizure detector for people with ID, we had to deal with three major problems: highly imbalanced data, a heterogeneous dataset and difficult annotation. (1) For the imbalanced data, we used proper performance criteria (e.g., the precision and recall curve) and employed a post-processing step (i.e., patient-specific detection thresholds). (2) For the heterogeneous dataset, we employed multi-domain EEG features that showed better discriminative power in our dataset, compared linear and nonlinear classifiers (LDA vs. SVM with a Gaussian kernel), and validated them using leave-one-out cross-validation (LOOCV). (3) For the difficult annotation, a stepwise EEG annotation procedure was used to improve annotation accuracy, given the presence of numerous seizure imitators and the unclear contrast between ictal and interictal EEG activity. Results showed that LDA outperformed SVM by a clear margin in sensitivity, achieving overall sensitivities of 63.1–81.3%, a median FD/h of 1.0 and a median latency of 11.5 s.
Finally, we conclude that the EEG signals of the ID population form a heterogeneous entity with respect to several important factors: EEG discharge patterns, EEG backgrounds and EEG seizure visibility. The performance of seizure detection varies significantly with these factors. The results presented here can serve as prior knowledge for designing a generic seizure detector for ID patients and for non-convulsive seizure states (NCSS). © 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Abbreviations: AL, seizure alarm length; AUCPR, area under the P–R curve; DT, detection threshold; EMG seizure, discharge with EMG activity; FDs, false detections; FDt/h, time of FD per hour of recording; ID, intellectual disability; LOOCV, leave-one-out cross validation; LDA, linear discriminant analysis; NCSS, non-convulsive seizure states; PPV, positive predictive value; P–R, precision and recall; PS, prediction score; RUSBoost, random undersampling AdaBoosting; RF, random forests; RBF, radial basis function; SP, fast spike seizures; SPWA, spike-wave seizures; SVM, support vector machines; WA, wave seizures.

∗ Corresponding author.

https://doi.org/10.1016/j.bspc.2018.12.003
1746-8094/© 2018 Elsevier Ltd. All rights reserved.

Abnormal brain development often results in an intellectual disability (ID), which is characterized by an abnormally low intelligence quotient (IQ). Epilepsy is a common secondary disability in people with ID [1]. It often begins in childhood and affects approximately 22% of people with ID, compared with 1% of the general population [2–4]. The seizures of people with ID are often severe, frequent, and intractable to antiepileptic drugs [5]. Despite the strong association between ID and epilepsy [6], literature on the specific EEG characteristics of people with ID and epilepsy is scarce [7,8], and no systematic reports on automated seizure detection in this population are available. An important reason may be that long-term video/EEG recordings of ID patients are rarely performed due to behavioral problems. Current research on seizure detection in patients with ID is limited, and ranges from non-EEG detection of major convulsive seizures [9–11] to EEG-based detection of minor seizures [12,13]. The EEG signals of ID patients are difficult to interpret because of abnormalities such as different ictal discharge types, background abnormalities, and highly variable proportions and forms of interictal events. Annotations are also unreliable because of the poor contrast between abnormal and normal EEG activities. Therefore, automated analysis of the EEG signals of ID patients is very challenging. Furthermore, non-convulsive seizure states (NCSS) occur frequently in ID patients and are underdiagnosed [14,15]. NCSS is a heterogeneous entity [16], and its diagnosis cannot rely exclusively on EEG changes [17]. To the best of the authors' knowledge, no automated methods for a reliable diagnosis of NCSS have been published to date. In our previous studies of the ID population, we evaluated state-of-the-art seizure detection methods on the basis of epileptic epochs [13] and proposed new EEG features [20] to improve the detection performance. In this more clinically oriented work, we try to answer the question: which factors (clinical and EEG-derived) are important for seizure detection in the ID population? In contrast to our previous study, we evaluate seizure detection in an event-based rather than an epoch-based manner, and we assess the detection performance for each individual patient. The seizure detection task in this work is limited to the detection of EEG seizures. To avoid ambiguity, the terminology used in this study is specified in Table 1; revised EEG-related terminology can be found elsewhere [21]. This study also serves as preparation for the automated detection of NCSS in the ID population. So far, the performance of automated seizure detection for ID patients is largely unknown.
This work aimed (1) to construct an annotated EEG dataset of epileptic patients with ID and (2) to develop an EEG-based seizure detector and evaluate its detection performance systematically. The EEG-based seizure detector can help neurologists locate seizures in long-term EEG recordings for diagnosis, and can contribute to a real-time monitoring system using ambulatory equipment. The major contributions of this work are as follows.
• We constructed the first long-term EEG dataset of ID patients with hierarchical annotation (in .XML files), including both EEG and non-EEG information.
• We evaluated detection on real-life (i.e., highly imbalanced) data using proper performance criteria, and we showed the relationship between epoch detection performance (i.e., classification performance) and event detection performance.
• We optimized seizure detection on the imbalanced data by employing a post-processing step (i.e., patient-specific detection thresholds), and we also evaluated the performance using predefined detection thresholds (DTs) (e.g., DT = 0.5).
• We employed multi-domain EEG features that showed better discriminative power in our dataset [13], and compared linear and nonlinear classifiers (LDA vs. SVM) on this heterogeneous dataset using LOOCV.
• Important EEG and non-EEG factors were identified using a multilevel analysis, which evaluates the mixed effects of hierarchical factors.
This paper is organized into three major sections. Firstly, 'Materials and methods' motivates the study design. It also introduces the EEG and patient information, the seizure detection method, the performance criteria and the statistical analysis methods. Secondly, 'Results' describes the obtained EEG dataset and patient demographics. We report the detection performance in two ways: using predefined DTs and using patient-specific DTs. Important factors are identified, and the relationships amongst them are further reviewed. Finally, we discuss the effects of the heterogeneous data on the classifiers. We compare our results with other studies on long-term EEG and interpret the clinical relevance of the findings (i.e., important EEG factors). A possible application to the detection of NCSS is also proposed.

2. Materials and methods

2.1. Study design

This is a non-randomized retrospective observational clinical trial. The aim of this study is to evaluate the EEG and clinical factors that potentially affect the detection performance. These factors are not normally distributed among patients [9]. A limited patient sample could therefore miss some rare factors, while a larger sample makes annotation more time-consuming. Therefore, we did not perform a random patient selection from the ID population. Based on the seizure detection performance in our pilot study [22], we needed at least eight patients for each EEG discharge pattern (α = 0.05, β = 0.2).

2.2. Patient selection

We selected participants with ID who showed at least one EEG seizure in continuous 24-hour ambulatory EEG recordings of good signal quality. Clinical seizures without EEG change were excluded. Their timing relative to the EEG could not be established, because no synchronous video was available in this field study (at the patients' homes). We also selected a number of patients who showed EEG seizures contaminated by EMG activity (discharge with EMG activity). This seizure type accounts for around 95% of clinical seizures in the ID population [9]. Note that such controlled patient selection may introduce a selection bias and make the average detection performance less representative of the whole ID population. However, it makes it possible to reveal the detection performance for each seizure pattern. The study was approved by Kempenhaeghe's ethical review board.
2.3. EEG data

Patients with ID tend to suffer more severe behavioral problems in a hospital environment, which makes it difficult to record long-term video/EEG data. Therefore, all ambulatory EEG data of the ID patients were recorded at home without video. Clinical information about the seizures was obtained from the diaries provided by caregivers, historical data and the final EEG reports. Continuous scalp EEG signals (sampling rate of 100 Hz) were acquired using 24 Ag/AgCl electrodes (channels) placed according to the 10–20 positioning system. They were measured with TMS (Twente Medical Systems) EEG recording equipment and reported with the EEG acquisition system BrainRT.

2.4. Annotations

A stepwise EEG annotation procedure was preferred over a simple one-step approach with an inter-observer agreement test. In the first step, the EEG seizures were described by EEG technicians when preparing the EEG report. In the second step, all EEGs were annotated more accurately by a clinical neurophysiologist specialized in epilepsy. These annotations formed the basis of the final selection of EEG data. They included the onset and offset of EEG seizures, the type of onset/offset (clear/blurry) and the types of ictal EEG discharge patterns. The four EEG discharge patterns defined in this study (fast spike, spike-wave, slow wave and EMG) may occur in sequences or in all possible combinations (polyspike complexes) during an EEG seizure. Combined patterns occurring at the same time were classified as mixed patterns. For indistinct seizures (see Fig. 1), we used the context EEG activities to determine the onset and offset. In the third step, two authors independently scored doubtful EEG epochs a second time. If there was no agreement with the first scoring, the EEG epochs were excluded from further analysis. The excluded epochs amounted to less than 5% of all seizure EEG epochs, which in turn account for only 0.14% of the whole EEG recording. The exclusion therefore has little effect on the reported performance.

Table 1 Terminology used in this study.

Terminology | Definition in this study
(Clinical) seizure | A transient occurrence of signs and/or symptoms due to abnormal excessive or synchronous neuronal activity in the brain. In clinical practice, a clinical seizure is often diagnosed based on the clinical signs supported by the EEG manifestations [18].
EEG seizure | EEG discharge that is associated with a clinical seizure. Note that clinical seizures may not show clear EEG change because two-thirds of the cortex is enfolded in sulci, and dipole discharges in sulci do not always project to scalp EEG electrodes [19].
(EEG) seizure onset | The beginning of an EEG seizure, i.e., the time at which the EEG signals begin to show any change related to a clinical seizure. Note that the EEG onset of a seizure can be earlier or later (by seconds to minutes) than the clinical seizure onset.
(EEG) discharge pattern | Epileptiform discharges with certain EEG morphologies. Typical EEG discharge patterns include (fast) spikes, spike-wave complexes and rhythmic (slow) delta/theta waves. These were termed 'EEG seizure patterns' in our previous study [13]. These EEG discharge patterns are associated with specific clinical seizure types.
Discharge with EMG activity | EEG seizures accompanied by electromyography (EMG) activity. In the ID population, they often occur in motor seizures such as tonic and myoclonic seizures.
(EEG) seizure visibility | Whether the boundaries (onset and offset) of an EEG seizure are clear or blurry on visual inspection. EEG seizures with clear (or blurry) boundaries are termed (in)distinct seizures. Indistinct seizures can be located using the clinical information and longer-context EEG activity.

Fig. 1. A comparison between a distinct EEG seizure (subj #26) and an indistinct one (subj #3), both with a fast spike discharge pattern. Red lines show the onset of an EEG seizure and blue dashed lines show the offset. For easy visualization, the EEG signals are plotted using a common average montage.

We also recorded the patients' wake and sleep status, which could potentially cause differences in the EEG signals [23]. Automated sleep/wake classification methods are lacking for this ID population. The wake and sleep status was therefore estimated from the diaries and from the sleep differentiation classified by the EEG technicians. Note that one patient could have more than one awake period during sleep. The boundary between awake and sleep states could be off by several minutes. Therefore, the EEG segments around the boundaries (<1%) were excluded from the statistical analysis.

Table 2 Patient demographics including EEG and clinical factors.

Factors | Median | Range
Age (yrs) | 28 | 12–51
Seizure number | 2 | 1–13
Average seizure duration (s) | 19 | 5.5–159.5
EEG recording length (h) | 22.5 | 17.4–25.8

Factors | Counts
Gender | 17 males and 12 females
ID level* | 3 light, 11 moderate, 15 severe
Clinical seizures** | 21 tc, 16 ton, 9 myoc, 11 unclassified
Seizure patterns*** | SP = 12, SPWA = 6, WA = 9, EMG = 9, Mix = 6
EEG background (a) | Normal (n = 14), 1 (n = 8), 2 (n = 5), 3 (n = 2)
Interictal spike levels (b) | 1 (n = 6), 2 (n = 4), 3 (n = 10), 4 (n = 9)
Occipital EEG types (c) | Normal (n = 9), 1 (n = 6), 2 (n = 10), 3 (n = 4)
Seizure visibility (d) | Yes (n = 13), no (n = 16)
Sleep types (e) | Normal (n = 4), 1 (n = 13), 2 (n = 6), 3 (n = 6)

* IQ levels are 3 (severe, IQ < 30), 2 (moderate, 30–50), and 1 (light, 50–70).
** Number of subjects who have specific clinical seizure events: tc, tonic–clonic; ton, tonic; myoc, myoclonic; others, unclassified types [18].
*** The discharge patterns fast spike, spike-wave, wave, discharge with EMG activity and mixed patterns are abbreviated as SP, SPWA, WA, EMG and Mix, respectively.
(a) EEG background types are normal (i.e., symmetric), 1 (regional abnormality), 2 (hemispheric abnormality), and 3 (bilateral abnormality).
(b) Interictal spike levels are normal (none), 1 (<1%), 2 (1–10%), 3 (10–50%) and 4 (>50%).
(c) Occipital types are normal (8–10 Hz), 1 (6–8 Hz), 2 (4–6 Hz) and 3 (2–4 Hz).
(d) Whether the boundaries of the EEG seizures are clear (yes/no) [19].
(e) Sleep types are 1 (no phasic events such as K-complexes and sigma spindles, normal sleep stages), 2 (only NREM/REM), and 3 (unclassified).

2.5. EEG factors and clinical factors

Clinicians rely heavily on EEG discharge patterns to identify and localize clinical seizures [24], and the performance of automated seizure detection varies with the EEG discharge pattern [13,25]. We used the following typical EEG discharge patterns [26,27], which are associated with the clinical seizure types defined by the International League Against Epilepsy (ILAE) [18]:
• (fast) spikes,
• spike-wave complexes (or sharp/slow waves),
• rhythmic (slow) delta/theta waves.
Fast spikes are often present during tonic seizures. Spike-wave patterns occur during myoclonic seizures or at the end of tonic–clonic seizures. Rhythmic slow delta/theta waves may be present during focal seizures. These three typical EEG discharge patterns are also generally described as polymorphic seizure patterns [18]. We defined an additional discharge pattern, discharge with EMG activity, which occurs in most motor seizures, including tonic, tonic–clonic and myoclonic seizures. A previous study [9] in the ID population suggests that 95% of clinical seizures are motor seizures, often accompanied by EMG activity. Therefore, it is not appropriate to simply exclude these seizures as EMG artifacts [28]. In addition, we considered the influence of EEG seizure visibility, defined as whether the boundaries of an EEG seizure are clear or blurry during visual annotation [19,29]. The EEG background (continuously ongoing activity [21]) of ID patients is often abnormal due to pathological conditions of the brain [30], and an abnormal EEG background is associated with intractability. Clinical factors including the patient's age [31], IQ level and sleep differentiation [23] were also taken into account. See Table 2 for the detailed EEG and non-EEG factors.

2.6. Detection methods

Our seizure segment detector identifies segments of EEG discharges (ictal and interictal) that may be accompanied by an EEG seizure. The detector is composed of four major units: EEG preprocessing, feature extraction and normalization, classification, and post-processing. An illustration of the detector is shown in Fig. 2. For a more detailed description of the EEG features specialized for this ID population, we refer to our preliminary work [13]. The final determination of an EEG seizure depends on the definition of a seizure event, which, however, differs across studies [32].

2.6.1. EEG preprocessing

In this work, we set a preprocessing rule that keeps not only EEG signals but also a certain amount of artifacts (e.g., EMG) for EEG seizure detection. A unipolar montage is used to avoid changing the synchrony among EEG channels [33]. The three electrodes above the eyes (Fp1, Fpz, and Fp2) are excluded because their signals are contaminated by repeated eye blink/movement artifacts. Firstly, the signal of each EEG channel is filtered using a 10th-order Butterworth bandpass filter with lower and upper cutoff frequencies of 0.5 Hz and 45 Hz, respectively. Secondly, channel selection uses a threshold rule to keep only channels whose EEG has good signal quality. In each non-overlapping 2-s sliding window, a channel is kept for further analysis only if the amplitude range ra of its EEG epoch lies within [10, 200] µV. Here ra = (max(x) − min(x))/2, where x is the amplitude sequence of the EEG segment. The lower bound (10 µV) rejects artifacts caused by loose electrode–skin connection or sweating. The upper bound (200 µV) rejects excessive artifacts caused by movements, the electrocardiogram (ECG), and excessive EMG activity.

2.6.2. Feature extraction and normalization

Our previous study [13] proposed a large multi-domain feature set (including traditional and newly proposed features) and reported significantly improved classification performance compared with a conventional feature set.
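The channel-selection rule of Section 2.6.1 can be illustrated with a minimal sketch. It assumes that the 2-s epochs have already been band-pass filtered (0.5–45 Hz) and are held as plain Python lists in µV, keyed by channel name. The dictionary layout and the function names are hypothetical, not the authors' implementation. The bounds are treated as inclusive, following the bracket notation [10, 200] µV.

```python
from typing import Dict, List

FS_HZ = 100                        # sampling rate reported in Section 2.3
WINDOW_S = 2                       # non-overlapping 2-s windows
LOW_UV, HIGH_UV = 10.0, 200.0      # accepted amplitude-range bounds in microvolts
EXCLUDED = {"Fp1", "Fpz", "Fp2"}   # frontopolar electrodes dropped for eye artifacts


def split_windows(x: List[float], fs: int = FS_HZ, win_s: int = WINDOW_S) -> List[List[float]]:
    """Cut one channel into non-overlapping windows; a trailing partial window is dropped."""
    n = fs * win_s
    return [x[i:i + n] for i in range(0, len(x) - n + 1, n)]


def amplitude_range(x: List[float]) -> float:
    """Half peak-to-peak amplitude: ra = (max(x) - min(x)) / 2."""
    return (max(x) - min(x)) / 2.0


def select_channels(epoch: Dict[str, List[float]],
                    low: float = LOW_UV, high: float = HIGH_UV) -> List[str]:
    """Keep channels (other than Fp1/Fpz/Fp2) whose ra lies within [low, high] uV."""
    return [name for name, x in epoch.items()
            if name not in EXCLUDED and low <= amplitude_range(x) <= high]


if __name__ == "__main__":
    epoch = {
        "Cz": [-20.0, 15.0, 30.0],   # ra = 25 uV  -> kept
        "O1": [0.0, 2.0, 5.0],       # ra = 2.5 uV -> rejected (e.g. loose contact)
        "T3": [-400.0, 350.0],       # ra = 375 uV -> rejected (e.g. movement, EMG)
        "Fp1": [-20.0, 30.0],        # excluded regardless of amplitude
    }
    print(select_channels(epoch))    # ['Cz']
```

In a full pipeline, each channel would be cut with `split_windows` and `select_channels` applied window by window. The set of retained channels can therefore change from one 2-s window to the next.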
We therefore used the feature set proposed in our previous study here. It includes 47 features in the time, frequency, time-frequency, and spatiotemporal domains, as well as synchronization-based features. To speed up the training of the classifiers (e.g., SVM), we applied standardized z-score normalization, z = (x − μ)/σ. This linearly maps each feature onto a common scale with a mean of zero and a standard deviation of one. Note that, for each feature, the mean μ and standard deviation σ are estimated on the entire feature space (i.e., the pooled data of all patients) rather than on each subject separately. Because the same linear transformation is applied to all subjects, it does not affect the separability of the seizure and non-seizure classes in cross-subject validation. In contrast, computing the z-score on each subject's data separately could change the distribution of each feature in the feature space.

2.6.3. Classification and post-processing

Several classifiers of different complexity were used to validate the classification performance: linear discriminant analysis (LDA), support vector machines (SVM) with a Gaussian kernel, random forests (RF) and random undersampling adaptive boosting (RUSBoost). Further descriptions of these classifiers can be found in [13]. In addition to the EEG features and classifiers, post-processing of the classifier output also plays a role [34,35]. For example, a Kalman filter [36] and the firing power [37] have been used to reduce the noise in SVM outputs. A linear mapping of the classifier output yields a sequential prediction score (PS) (or seizure probability [35]) between 0 and 1. This score is then compared with a decision threshold (DT) to make a decision [38]. On imbalanced data, classifiers with default thresholding (i.e., DT = 0.5) are not optimal. However, classifiers with a proper threshold can outperform other methods specialized for imbalanced classification, such as resampling and/or instance weighting [39]. Therefore, instead of using a default DT, we use thresholding (i.e., varying the DT between 0 and 1) to find optimal patient-specific DTs. In addition, we evaluated the detection performance using predefined DTs with different values.

Fig. 2. An illustration of the EEG-based seizure segment detector. A 2-s segment of multi-channel raw EEG signals is fed into the detector. After EEG preprocessing and feature extraction, we obtain a feature vector of 47 features and normalize it. To further optimize the classifier output, post-processing is applied. The final output is either 0 (non-seizure segment) or 1 (candidate seizure segment).
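The pooled z-score normalization described in Section 2.6.2 can be sketched for a single feature as follows. The per-subject dictionary layout and the toy values are hypothetical. The per-subject variant is included only to illustrate why the text avoids it: pooled normalization is one increasing linear map shared by all subjects, so their relative positions are preserved. Per-subject normalization can make different subjects' values indistinguishable.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List

Features = Dict[str, List[float]]  # hypothetical layout: subject id -> values of ONE feature


def pooled_zscore(features: Features) -> Features:
    """z = (x - mu) / sigma, with mu and sigma estimated on all subjects' values pooled."""
    pooled = [v for values in features.values() for v in values]
    mu, sigma = mean(pooled), pstdev(pooled)
    return {s: [(v - mu) / sigma for v in values] for s, values in features.items()}


def per_subject_zscore(features: Features) -> Features:
    """Contrast case (not the paper's choice): mu and sigma estimated per subject."""
    return {s: [(v - mean(values)) / pstdev(values) for v in values]
            for s, values in features.items()}


if __name__ == "__main__":
    feats = {"A": [1.0, 2.0, 3.0], "B": [11.0, 12.0, 13.0]}  # toy values; B on a higher scale
    pooled = pooled_zscore(feats)
    per_subj = per_subject_zscore(feats)
    # Pooled: one shared increasing linear map, so every B value stays above every A value.
    print(max(pooled["A"]) < min(pooled["B"]))  # True
    # Per subject: both subjects collapse onto the same standardized values.
    print(all(math.isclose(a, b) for a, b in zip(per_subj["A"], per_subj["B"])))  # True
```

With a real feature matrix, the same pooled statistics would be computed column by column, once per feature.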

2.6.4. Definition of the detection of a seizure event

The reported performance depends to some extent on how a seizure detection is defined. Counting 'any overlap' as a detection tends to yield over-optimistic performance [40]. Evaluating sensitivity and FDs separately with different rules is also unrealistic in clinical practice. Therefore, we did not apply a rule such as counting all FDs that occur within a time span (30 s) as a single FD [41]. The event-based seizure detection in this work (see Fig. 3) is performed in three steps. First, the post-processing maps the output of a classifier into the range between 0 and 1 (normalized PS). A DT then converts the normalized PS into a binary sequence of 0 (non-seizure) and 1 (candidate seizure). Second, to filter out isolated seizure-like events in the interictal EEG, we define a minimum alarm length (AL): only runs of consecutive candidate seizure epochs that last longer than AL can trigger a seizure alarm. Finally, we compare the raised seizure alarms with the experts' annotation. A seizure alarm overlapping the experts' annotation counts as a detected seizure; otherwise, it is an FD. Choosing a longer AL can reduce FDs, but at the cost of a longer latency and missed detections of short seizures. Previous studies have used three 2-s epochs [25], the past 25 s of epochs as a baseline [42] or a 20-min sliding window [43] to make a seizure decision. In our dataset, the majority of seizures were short. We therefore report the detection performance with AL of 2 s, 6 s, and 10 s (i.e., the length of a sliding window), respectively.
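The three-step event definition above can be sketched as follows. Several details are assumptions not fixed by the text:
• A 2-s epoch is a candidate when PS ≥ DT.
• A run of candidate epochs raises an alarm when its duration is at least AL, so that AL = 2 s corresponds to a single epoch. The text says "longer than AL".
• The alarm time used for latency is the run start plus AL.
• An alarm counts as a detection when its interval overlaps an annotated seizure.
The function names and the toy scores are hypothetical.

```python
from typing import List, Optional, Tuple

EPOCH_S = 2.0                       # each prediction score (PS) covers one 2-s epoch
Alarm = Tuple[float, float, float]  # (run_start_s, run_end_s, alarm_time_s)


def alarms_from_scores(ps: List[float], dt: float, al_s: float,
                       epoch_s: float = EPOCH_S) -> List[Alarm]:
    """Binarize PS with DT, then keep runs of candidate epochs lasting at least AL."""
    alarms: List[Alarm] = []
    run_start: Optional[int] = None
    for i, score in enumerate(ps + [float("-inf")]):  # sentinel closes a trailing run
        if score >= dt:                    # candidate seizure epoch ("S")
            if run_start is None:
                run_start = i
        elif run_start is not None:        # a run of "S" just ended at epoch i
            if (i - run_start) * epoch_s >= al_s:
                start = run_start * epoch_s
                # assumed alarm time: the moment the run has lasted AL seconds
                alarms.append((start, i * epoch_s, start + al_s))
            run_start = None
    return alarms


def match_events(alarms: List[Alarm], seizures: List[Tuple[float, float]]):
    """Return (n_detected, n_false_detections, latencies) for (onset, offset) annotations."""
    def overlaps(a: Alarm, onset: float, offset: float) -> bool:
        return a[0] < offset and a[1] > onset

    latencies = []
    for onset, offset in seizures:
        hits = [a for a in alarms if overlaps(a, onset, offset)]
        if hits:  # detected; latency measured from the earliest overlapping alarm
            latencies.append(min(a[2] for a in hits) - onset)
    n_fd = sum(1 for a in alarms
               if not any(overlaps(a, on, off) for on, off in seizures))
    return len(latencies), n_fd, latencies


if __name__ == "__main__":
    ps = [0.1, 0.9, 0.95, 0.92, 0.2, 0.8, 0.1]  # hypothetical normalized PS per 2-s epoch
    seizures = [(3.0, 9.0)]                      # hypothetical annotation in seconds
    for al in (2.0, 6.0):
        print(al, match_events(alarms_from_scores(ps, dt=0.5, al_s=al), seizures))
    # 2.0 (1, 1, [1.0])
    # 6.0 (1, 0, [5.0])
```

The toy run reproduces the trade-off described above. Raising AL from 2 s to 6 s removes the isolated one-epoch false detection but delays the alarm from 1 s to 5 s after onset.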

2.7. Evaluation of detection performance

2.7.1. Leave-one-out cross validation (LOOCV)

In the LOOCV scheme, a classifier was trained on the pooled feature set from the EEG recordings of all but one patient. It was then used to classify the seizure and non-seizure epochs of the excluded patient. This was repeated until each patient had been excluded and tested once. The detection performance is reported for each patient.

2.7.2. Criteria for classification performance

The precision and recall (P–R) curve, i.e., a plot of sensitivity vs. positive predictive value (PPV), is known to be a more suitable performance metric for imbalanced datasets than the receiver operating characteristic (ROC) curve (i.e., a plot of sensitivity vs. 1 − specificity) [44]. Our previous study demonstrated that the area under the P–R curve (AUCPR) is a more discriminative indicator of classification performance in a skewed dataset (seizure epochs account for only 0.14% of the entire EEG recordings in our dataset) [13]. A larger AUCPR corresponds to better epoch-based detection performance. We further show the association between AUCPR and the event-based performance.

2.7.3. Determining a patient-specific detection threshold (DT)

Besides summarizing the overall classification performance through AUCPR, the P–R curve can also be used to determine a patient-specific DT. For each patient, we determine an optimal DT by finding the threshold (between 0 and 1) that maximizes the F1 score (or F-measure) [39] on the P–R curve (see Fig. 4). When the DT increases, the PPV increases (FDs drop), but at the cost of decreasing sensitivity. Note that a patient-specific DT is determined a posteriori (for offline analysis), since it requires all of the patient's EEG data. How to adaptively choose an optimal DT in real-time detection is still an open issue under development.

2.7.4. Criteria for event-based detection performance

The event-based sensitivity is defined as the ratio of the number of detected seizure events to the number of all annotated seizure events. The latency is the time lag between the annotated onset and the location of the seizure alarm.
To make the performance comparable among datasets of different sizes (especially in the recording length of interictal EEG), the FD rate is defined as the average number of FDs per hour (FD/h = number of FDs / total recording hours). In addition, to report the agreement between a classifier's prediction and the experts' annotations more accurately [40], we used the accumulated time (s) of FDs per hour (FDt/h) proposed in our previous study [13].
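A sketch of the per-recording rates defined in 2.7.4 follows. It assumes per-epoch binary labels (1 = predicted or annotated seizure). It reads FDt/h as the total duration of predicted-seizure epochs falling outside annotated seizures, divided by the recording length in hours. That reading of the definition is an assumption. The FD count itself is taken as an input, assumed to come from an event-matching step such as alarm-to-annotation overlap. The toy case illustrates why the two rates can diverge.

```python
from typing import List, Tuple


def sensitivity(n_detected: int, n_annotated: int) -> float:
    """Event-based sensitivity: detected seizure events / annotated seizure events."""
    return n_detected / n_annotated


def fd_rates(n_fd: int, pred: List[int], truth: List[int],
             epoch_s: float = 2.0) -> Tuple[float, float]:
    """Return (FD/h, FDt/h) for one recording given per-epoch 0/1 labels.

    FD/h  = number of false detections / recording hours.
    FDt/h = seconds of predicted-seizure epochs outside annotated seizures / hours
            (an assumed reading of the definition in the text).
    """
    hours = len(pred) * epoch_s / 3600.0
    fp_epochs = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    return n_fd / hours, fp_epochs * epoch_s / hours


if __name__ == "__main__":
    # One hour (1800 two-second epochs) without seizures, flagged entirely as seizure:
    # a single hour-long false detection.
    pred, truth = [1] * 1800, [0] * 1800
    print(fd_rates(1, pred, truth))  # (1.0, 3600.0)
    print(sensitivity(3, 4))         # 0.75
```

A single false detection lasting the whole hour counts as only 1 FD/h, whereas FDt/h reports the full 3600 s. This is the situation the Fig. 6 caption refers to for thresholds close to 0.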


Fig. 3. The definition of event-based seizure detection from the prediction score (PS) of a classifier. 'S' and 'NS' denote seizure and non-seizure, respectively. The alarm length (AL) is a predefined minimum time to trigger a seizure alarm: if a continuous run of 'S' in the binary score is longer than AL, a seizure alarm is triggered. Comparison with the experts' annotation then yields either an FD or a detected event with a certain latency.

Fig. 4. An example of determining a patient-specific DT based on the P–R curve (subj #16, AUCPR = 0.44). The circle marks the point with the maximum F1 score, which corresponds to a DT of 0.9819 on the normalized PS (range 0–1).
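The threshold search illustrated in Fig. 4 can be sketched as follows. The sketch sweeps candidate DTs over the observed prediction scores, computes precision and recall at each DT, and keeps the DT with the highest F1 score. AUCPR is approximated here by the step-wise (average-precision) sum over recall increments. The interpolation used in the paper is not stated, so values may differ slightly. The scores and labels in the example are made up, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (threshold, precision, recall)


def pr_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Precision/recall at every distinct score used as a DT (score >= DT -> seizure).

    Assumes at least one positive (seizure) label.
    """
    n_pos = sum(labels)
    points = []
    for dt in sorted(set(scores), reverse=True):  # from strictest to most lenient DT
        tp = sum(1 for s, y in zip(scores, labels) if s >= dt and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= dt and y == 0)
        points.append((dt, tp / (tp + fp), tp / n_pos))
    return points


def best_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (DT, F1) at the point of the P-R curve with maximal F1."""
    best = (0.0, -1.0)
    for dt, p, r in pr_points(scores, labels):
        f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
        if f1 > best[1]:  # ties keep the stricter (higher) DT
            best = (dt, f1)
    return best


def aucpr(scores: List[float], labels: List[int]) -> float:
    """Step-wise area under the P-R curve (average precision)."""
    area, prev_recall = 0.0, 0.0
    for _, p, r in pr_points(scores, labels):
        area += (r - prev_recall) * p
        prev_recall = r
    return area


if __name__ == "__main__":
    scores = [0.95, 0.9, 0.8, 0.4, 0.3, 0.2]  # hypothetical normalized PS per epoch
    labels = [1, 1, 0, 1, 0, 0]               # 1 = annotated seizure epoch
    dt, f1 = best_threshold(scores, labels)
    print(dt, round(f1, 3))                   # 0.4 0.857
    print(round(aucpr(scores, labels), 3))    # 0.917
```

Because the optimal DT is chosen on the same patient's labels, this search is inherently a posteriori, as noted in 2.7.3.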

2.8. Statistical analysis

In our dataset, each patient has score variables (continuous or categorical, e.g., seizure duration and IQ level) and grouping variables (e.g., seizure patterns). Together they form a hierarchical data structure in which the variables (EEG/clinical factors) are nested. To determine the important factors that affect seizure detection, we performed a multilevel analysis using a generalized linear mixed-effects (GLME) model [45]. This model takes into account structural variables with fixed and random effects measured at multiple hierarchical levels [46]. Fisher's exact test was used to test for non-random associations between two categorical factors (at the 5% significance level). The Mann–Whitney test was used to compare detection performance between two patient groups. A two-sided Kruskal–Wallis test (α = 0.05) was used to compare detection performance among multiple groups. The chi-square test was used to test the homogeneity of two subgroups of patients with respect to multiple EEG/clinical factors, e.g., discharge patterns and IQ levels.
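Fisher's exact test, used above for associations between two binary factors, can be sketched with the Python standard library alone. This sketch follows the common two-sided convention of summing the probabilities of all tables with the same margins that are no more probable than the observed one. Statistical packages may differ in how they handle ties, so this is illustrative and not a replacement for the software used in the study. The function name is hypothetical.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric distribution;
    the p-value sums the probabilities of every table no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # hypergeometric probability that the top-left cell equals x
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # a small relative tolerance keeps tables tied with the observed one despite rounding
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # "Lady tasting tea" table: the exact two-sided p-value is 34/70
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

`math.comb` requires Python 3.8 or later. Exact integer binomials keep the probabilities accurate even for tables with hundreds of events.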

Fig. 5. Flowchart of the patient selection procedure.

3. Results

3.1. Patients and demography

We finally included 29 epileptic patients (12 females, age 29 ± 13 y) with an intellectual disability (3 light, 11 moderate, 15 severe; IQ ranges: severe [0–30], moderate [30–50] and light [50–70]) in our EEG analysis. The selection procedure is shown in Fig. 5. The EEG dataset of the 29 selected patients has a total recording time of 615 h and contains 91 seizures (89 generalized, 2 focal). The accumulated duration of EEG seizures across the 29 patients is 3034 s. We used a categorical score or grouping to record each patient's EEG/clinical factors, as shown in Table 2. Patients' usual clinical seizure types, as defined by the ILAE, were obtained from their medical history: 21 patients had tonic–clonic seizures, 16 tonic seizures, 9 myoclonic seizures and 11 unclassifiable seizure types. Note that not all historical seizure types are present in this dataset; therefore, the historical seizure types were not used for further analysis.

3.2. Event-based detection performance

To understand the influence of the decision threshold (DT), we first report the event-based performance of seizure detection using predefined DTs. We then report the performance using patient-specific DTs (i.e., post-processing), which give an optimized trade-off between sensitivity and FD rate in each patient.

Fig. 6. Detection performance at different threshold values (AL = 6 s). The number of detected seizures equals N × sensitivity, where N = 91 is the total number of seizure events in the dataset. FDt/h is the accumulated time (s) of FD epochs per hour (mean and standard deviation over the 29 patients). To report a meaningful result, a threshold of 0.0001 was used instead of 0. Note that FDt/h is a more accurate measure than the commonly used FD/h (which counts only the number of FDs): when the threshold is close to 0, FD/h would record only one (or very few) FD lasting hours, which is meaningless.

3.2.1. Performance of predefined decision thresholds (DTs)

We tested predefined DTs ranging from 0 to 1 in steps of 0.1 on both classifiers, LDA and SVM. A single fixed DT is used for all patients to perform the seizure detection. Fig. 6
shows that, as the threshold increases, the FDt/h of both classifiers decreases, but so does the sensitivity. LDA appears less sensitive to the threshold than SVM, producing more stable performance over a wide range of DTs (between 0.1 and 0.9), although it also shows a longer FDt/h, which may be undesirable in clinical practice. Two examples (see Figs. 15 and 16 in Appendix A) show the detailed outputs of LDA and SVM on the same patients.

3.2.2. Performance of patient-specific decision thresholds (DTs)

We use LDA to perform the seizure detection and report the event-based performance using patient-specific DTs. LDA also achieved better performance than the other classifiers in classifying non-seizure and seizure EEG epochs in most patients (see Fig. 13 in Appendix A). The AL should be shorter than the seizure duration for the seizure to trigger a correct alarm. We report the detection performance of both LDA (Table 3) and SVM with an RBF kernel (Table 4) using ALs of 2 s, 6 s and 10 s. A longer AL led to fewer FDs, but also to a reduced sensitivity and a longer latency. The large number of FDs might be caused by the numerous seizure imitators in the interictal EEG of this ID population. LDA achieved better performance than SVM, with a clear margin in sensitivity: a sensitivity of 75.0% with a median FD/h of 1.0 and a median latency of 11.5 s when the AL is 6 s. The detailed detection performance of LDA for each patient is shown in Fig. 14 of Appendix A.

The accumulated number of detected seizures with respect to seizure duration is shown in Fig. 7: seizures of longer duration are easier to detect than shorter ones, and all seizures longer than 60 s were detected. In total, 23 out of 43 seizures during wakefulness and 27 out of 48 seizures during sleep were detected. No significant association was found between detection and wake/sleep status (p = 0.6837, Fisher's exact test). However, there were significantly fewer FDs during sleep than during wakefulness (Fig. 8, paired t-test, p = 0.0162). Most undetected seizures are short; the possible factors are further addressed in the statistical results.
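The alarm rule behind Tables 3 and 4 is described only in words, so the following is a minimal sketch under stated assumptions, not the study's implementation: one classifier output per 1-s epoch, an alarm whenever the output stays above the DT for AL consecutive epochs, latency measured from seizure onset to the end of the alarm epoch, and every alarm outside an annotated seizure counted as one FD. The names `raise_alarms`, `evaluate` and `EventResult` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EventResult:
    detected: int          # eligible seizures with at least one alarm inside
    eligible: int          # seizures lasting at least AL epochs
    false_alarms: int      # alarms outside every annotated seizure
    fd_per_hour: float     # false alarms per recording hour
    latencies: list = field(default_factory=list)  # seconds, per detection

    @property
    def sensitivity(self):
        return self.detected / self.eligible if self.eligible else float("nan")


def raise_alarms(scores, dt, al):
    """Return the epoch indices at which an alarm fires.

    Assumes one score per 1-s epoch. An alarm fires when the score has
    exceeded `dt` for `al` consecutive epochs; the run must be broken
    (score <= dt) before another alarm can fire.
    """
    alarms, run = [], 0
    for t, score in enumerate(scores):
        run = run + 1 if score > dt else 0
        if run == al:
            alarms.append(t)
    return alarms


def evaluate(scores, seizures, dt, al):
    """Event-based evaluation against annotated seizures.

    seizures: list of (onset, offset) epoch indices, offset exclusive.
    Seizures shorter than `al` epochs cannot trigger an alarm and are
    excluded from the sensitivity.
    """
    alarms = raise_alarms(scores, dt, al)

    def inside_any(a):
        return any(on <= a < off for on, off in seizures)

    detected, latencies = 0, []
    eligible = [(on, off) for on, off in seizures if off - on >= al]
    for on, off in eligible:
        hits = [a for a in alarms if on <= a < off]
        if hits:
            detected += 1
            latencies.append(hits[0] + 1 - on)  # onset to end of alarm epoch
    false_alarms = sum(1 for a in alarms if not inside_any(a))
    hours = len(scores) / 3600
    return EventResult(detected, len(eligible), false_alarms,
                       false_alarms / hours, latencies)


if __name__ == "__main__":
    # Hypothetical 1-s scores: a seizure at epochs 10-17 and a
    # seizure-like artifact at epochs 28-34.
    scores = [0.1] * 10 + [0.9] * 8 + [0.1] * 10 + [0.9] * 7 + [0.1] * 5
    for dt in (0.0001, 0.5, 0.95):
        r = evaluate(scores, [(10, 18)], dt=dt, al=6)
        print(f"DT={dt}: sens={r.sensitivity}, FDs={r.false_alarms}, "
              f"latency={r.latencies}")
```

With the near-zero threshold, the whole recording becomes one long run that fires a single alarm, so the seizure is missed and only one FD is counted; this is the same effect that motivates reporting FDt/h alongside FD/h.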

Fig. 7. Number of detected seizures with respect to seizure duration (AL = 6 s).
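Fisher's exact test on the wake/sleep detection counts reported above can be sketched in a few lines. The sketch builds the 2×2 table of detected and missed seizures (23 of 43 during wakefulness, 27 of 48 during sleep) and sums hypergeometric probabilities. The two-sided convention used here (summing all tables with the same margins that are no more likely than the observed one) is an assumption, so the resulting p-value need not match the reported value exactly. `fisher_exact_2x2` is a hypothetical helper, not the study's code.

```python
import math


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the hypergeometric
    probabilities of every table with the same margins that is no more
    likely than the observed one. The odds ratio is reported as infinity
    when b * c == 0 (a simplification).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # P(top-left cell = x) given fixed row and column totals.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, min(p, 1.0)


if __name__ == "__main__":
    # Rows: wake, sleep; columns: detected, missed (counts from the text).
    odds, p = fisher_exact_2x2(23, 43 - 23, 27, 48 - 27)
    print(f"odds ratio = {odds:.3f}, two-sided p = {p:.4f}")
```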

3.3. Statistical results of detection performance

We first demonstrate the association between AUCPR, an indicator of classification performance (non-seizure vs. seizure EEG epochs), and realistic, event-based detection performance (e.g., sensitivity and FD/h). We then statistically identify the EEG and clinical factors that affect detection performance.
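Because AUCPR drives the statistical analysis, a minimal sketch of how it can be computed from per-epoch scores may help. It builds the precision–recall curve and sums precision over recall increments (the step-wise "average precision" estimator; this is an assumption, since the estimator used in the study is not stated). It also picks the threshold with the highest F1 as one possible "optimal trade-off between sensitivity and PPV" for a patient-specific DT; the F1 criterion is likewise our assumption. For an uninformative classifier the expected AUCPR equals the fraction of seizure epochs, about 0.0014 for 3034 s of seizures in 615 h, which is why an AUCPR above 0.1 already indicates a useful detector on these data. All function names are hypothetical.

```python
def pr_curve(scores, labels):
    """Precision and recall at each distinct score threshold, high to low.

    labels: 1 = seizure epoch, 0 = non-seizure epoch. An epoch is called a
    seizure when its score is >= the threshold. Returns a list of
    (threshold, precision, recall) tuples.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one seizure epoch")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    curve = []
    for i, (score, label) in enumerate(pairs):
        tp += label
        fp += 1 - label
        # Emit a point only after the last epoch sharing this score.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            curve.append((score, tp / (tp + fp), tp / total_pos))
    return curve


def auc_pr(scores, labels):
    """Average precision: precision weighted by each increase in recall."""
    prev_recall, ap = 0.0, 0.0
    for _, precision, recall in pr_curve(scores, labels):
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


def best_f1_threshold(scores, labels):
    """Threshold maximizing F1 (harmonic mean of PPV and sensitivity)."""
    def f1(point):
        _, prec, rec = point
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return max(pr_curve(scores, labels), key=f1)[0]


if __name__ == "__main__":
    import random
    random.seed(0)
    # Hypothetical epochs: about 0.14% seizure, as in the dataset described here.
    labels = [1 if random.random() < 0.0014 else 0 for _ in range(200_000)]
    scores = [random.random() for _ in labels]  # uninformative classifier
    print(f"AUCPR of random scores: {auc_pr(scores, labels):.4f}")
```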

Fig. 8. Average FD/h during wakefulness and sleep across all 29 patients (AL = 6 s). The FDs are counted on the recognized wake/sleep EEG segments. There are significantly fewer FDs during sleep than during wakefulness (paired t-test, p = 0.0162).


Table 3 Detection performance of LDA using different alarm lengths (AL).
AL (s) | Detected seizures / seizures longer than AL | Sens.(a) (%) | FD/h, median [range] | FDt/h(b), median [range] | Latency (s), median [range]
2 | 74/91 | 81.3 | 2.87 [0.08, 131.0] | 12.3 [0.1, 1455] | 7.5 [4.0, 60.5]
6 | 57/76 | 75.0 | 1.00 [0.04, 75.2] | 10.3 [0.2, 1447] | 11.5 [8.0, 64.5]
10 | 41/65 | 63.1 | 0.17 [0, 42.4] | 3.2 [0, 1426] | 16.0 [12.0, 68.5]
(a) Seizures shorter than AL cannot trigger a seizure alarm and are therefore excluded when computing sensitivity.
(b) Accumulated time (s) of all FD epochs per hour.
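The difference between FD/h and FDt/h (footnote b) can be made concrete with a small sketch. Assuming one false-detection flag per epoch (the 1-s epoch length is an assumption), FD/h counts runs of consecutive flagged epochs, while FDt/h sums their duration in seconds. A single false detection lasting a whole hour therefore gives an FD/h of only 1 but an FDt/h of 3600, which is the situation described for thresholds close to 0. `fd_rates` is a hypothetical helper.

```python
def fd_rates(fd_flags, epoch_s=1.0):
    """Return (FD/h, FDt/h) from per-epoch false-detection flags.

    fd_flags: sequence of 0/1 values, one per epoch, where 1 marks an epoch
    detected as seizure outside any annotated seizure. FD/h counts runs of
    consecutive flagged epochs; FDt/h sums their duration in seconds.
    Both are normalized by the recording length in hours.
    """
    hours = len(fd_flags) * epoch_s / 3600
    runs = sum(1 for i, f in enumerate(fd_flags)
               if f and (i == 0 or not fd_flags[i - 1]))
    seconds = sum(fd_flags) * epoch_s
    return runs / hours, seconds / hours


if __name__ == "__main__":
    # One hypothetical hour of 1-s epochs with two short false detections.
    flags = [0] * 3600
    flags[100:110] = [1] * 10
    flags[2000:2005] = [1] * 5
    print(fd_rates(flags))         # (2.0, 15.0)
    # One false detection covering the whole hour.
    print(fd_rates([1] * 3600))    # (1.0, 3600.0)
```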

Table 4 Detection performance of SVM using different alarm lengths (AL).
AL (s) | Detected seizures / seizures longer than AL | Sens.* (%) | FD/h, median [range] | FDt/h**, median [range] | Latency (s), median [range]
2 | 63/91 | 69.2 | 3.50 [0.12, 97.0] | 12.5 [0.2, 1819] | 6.7 [4.0, 62.0]
6 | 41/76 | 54.0 | 0.58 [0.04, 52.8] | 5.6 [0.2, 1818] | 10.0 [8.0, 66.0]
10 | 25/65 | 38.5 | 0.12 [0, 38.8] | 1.5 [0, 1817] | 14.0 [12.0, 38.0]
* Seizures shorter than AL cannot trigger a seizure alarm and are therefore excluded when computing sensitivity.
** Accumulated time (s) of all FD epochs per hour.

Table 5 p values of GLME model variables.
Variable(a) | p value | Significance(c)
Visibility | 2.9904e−08 | ***
Pattern: WA + EMG | 0.0008 | ***
Background | 0.0017 | **
Pattern: SPWA | 0.0229 | *
Pattern: SP + SPWA | 0.0292 | *
Pattern: SP + WA | 0.0717 | ns
Interictal | 0.0997 | ns
Pattern: EMG | 0.1908 | ns
Pattern: SP | 0.2333 | ns
Duration(b) | 0.2891 | ns
Occipital type | 0.4936 | ns
IQ | 0.5577 | ns
Age | 0.9731 | ns

Notes to Table 5: (a) Variables are ranked according to p values. (b) Duration is the accumulated time (s) of all EEG seizures in each patient. (c) *** p < 0.001; ** 0.001 < p < 0.01; * 0.01 < p < 0.05; ns: p > 0.05.

Fig. 9. Detection performance of each patient (each point) using the patient-specific DTs when AL = 6 s. The color represents the value of AUCPR. The capital letters represent a patient's major discharge pattern: A (SP), B (SPWA), C (WA) and D (EMG). The dashed line is a visual guide: 13 out of the 14 patients on the left show AUCPR > 0.1, while the 15 patients on the right show AUCPR < 0.1. Note that the six patients who show two major patterns are denoted by two capital letters, e.g., 'AB'.

3.3.1. AUCPR: overall classification performance for imbalanced data

We used the patient-specific DT, i.e., an optimal trade-off between sensitivity and PPV (1 − PPV = FD rate), with an AL of 6 s to perform the event-based seizure detection in each patient. Although AUCPR reflects the overall classification performance, it is sensitive to the class imbalance ratio: the more imbalanced the data, the lower the AUCPR [39]. To allow comparison across studies with different levels of imbalance, we also computed the sensitivity and FD/h (Fig. 9), which shows the detection performance using the patient-specific DTs. A larger AUCPR represents better detection performance, with a higher sensitivity, a lower FD/h, or both. Given the highly imbalanced ratio of our EEG dataset (seizure epochs account for only 0.14% of all EEG recordings), we found that an AUCPR larger than 0.1 corresponded to relatively good detection performance, i.e., sensitivity > 0.5 and FD/h < 1. An AUCPR lower than 0.1 corresponds to either a low sensitivity or a high FD/h.

3.3.2. Multilevel modeling

The GLME model describes the relationship between the detection performance (AUCPR) and the independent variables, i.e., the EEG/clinical factors. Table 5 shows the important variables (with non-zero coefficients) and the p values of their coefficients. Among all the EEG/clinical factors, three (the visibility of EEG seizure boundaries, the EEG discharge pattern and the EEG background) were found to be significant at the 95% confidence level. The four discharge patterns, fast spike, spike-wave, wave and discharge with EMG activity, are abbreviated as SP, SPWA, WA and EMG, respectively. Mixed patterns are denoted by their component patterns (e.g., WA + EMG).

3.3.3. Important EEG factors

Seizure visibility: Seizure visibility was found to be the most important factor in the GLME model. Fig. 10 shows the detection


Table 6 Across-patient seizure detection on long-term EEG data.
Studies | Performance | EEG dataset | Methods
Saab et al. [42] | Sensi. 76.0%, FD/h of 0.34, median DD of 10 s | 28 patients, 652 h of EEG, 126 seizures | Bayesian method, wavelet transform
Furbass et al. [41] | Sensi. 81%, FD/h of 0.29 on 2-center data; sensi. of 67%, FD/h of 0.32 on CHB-MIT | 3-center data: 205 patients, 310 retrospective patients, and CHB-MIT(a) | A computational method named EpiScan
Direito et al. [50] | Sensi. 38.47% and FD/h of 0.2 | European epilepsy database, 216 patients, 16,729 h | Multiclass SVM (linear kernel), spectral analysis
Mathieson et al. [38] | Sensi. of 52.6–75.0%, 0.04–0.36 FD/h | 70 babies from 2 centers | SVM, features in the time-frequency domain
This work | LDA: sensi. of 63.1–81.3%, median FD/h of 1.0, latency of 11.5 s; SVM: sensi. of 38.5–69.2%, median FD/h of 0.58, latency of 10.0 s | 29 ID patients, 615 h EEG with 91 seizures | LDA and SVM (RBF kernel), multi-domain features
Our previous study(b) | Sensi. of 68%, PPV of 81%, FDt/h of 0.76 s | 29 ID patients, 615 h EEG with 91 seizures | SVM (RBF kernel)
(a) CHB-MIT: the Children's Hospital of Boston–Massachusetts Institute of Technology dataset [51], the largest freely available dataset to date.
(b) This study performed seizure pattern-specific detection on the same dataset and found that SVM achieved the best performance among several classifiers.

Fig. 10. Boxplot of AUCPR of patient groups with distinct (n = 13) and indistinct (n = 16) seizure boundaries (Mann–Whitney test, p = 0.00001).

performance of patients (n = 13) with distinct seizures is significantly different from that of patients (n = 16) with indistinct seizures. However, a chi-square test shows that the patients in the two groups are not homogeneous with respect to the factors EEG discharge pattern, IQ level and EEG background (p = 0.0121). We therefore performed Fisher's exact test between visibility and each of the other factors. The EEG background showed a non-random association with visibility (p = 0.0073), with an odds ratio of 10. This means that patients with an abnormal background have about 10 times greater odds of showing indistinct EEG seizures than patients with a normal background. Interestingly, eight out of nine patients with discharges with EMG activity show distinct EEG seizures. Visibility of EEG seizures may therefore be another reason why discharges with EMG activity show better detection performance.

Discharge patterns: We compared the detection performance (AUCPR) of four patient groups defined by discharge pattern (Fig. 11). The Kruskal–Wallis test shows that the distribution of AUCPR differs across the four groups (p = 0.02). Patients with the EMG discharge pattern show the highest AUCPR. Patients with the SP and WA discharge patterns show very low median values because detection failed in many of these patients (AUCPR < 0.1). On reviewing the EEG signals of those patients, we found common EEG characteristics: (1) indistinct seizure boundaries; (2) extremely short seizures (i.e., T < 6 s); (3) focal or regional EEG changes during seizures (often associated with wave seizures); and (4) poor contrast between seizure and seizure-like interictal EEG within and across patients.
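The four-group comparison above uses the Kruskal–Wallis test. The sketch below is a plain-Python illustration, assuming mid-ranks with the standard tie correction and the large-sample chi-square approximation with (number of groups − 1) degrees of freedom; the chi-square tail uses the closed-form series for integer degrees of freedom. Function names and the AUCPR values in the example are hypothetical. Note that in the study six patients were counted in two groups, so the groups are not strictly independent, which this textbook test assumes.

```python
import math


def rank_with_ties(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^(-x/2) sum_r x^(r-1/2) / (2r-1)!!
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with tie correction; returns (H, p-value)."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = rank_with_ties(values)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Divide by the tie correction 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)


if __name__ == "__main__":
    # Hypothetical per-patient AUCPR values for four discharge-pattern groups.
    sp = [0.02, 0.05, 0.01, 0.12, 0.03]
    spwa = [0.08, 0.20, 0.04]
    wa = [0.01, 0.03, 0.06, 0.02]
    emg = [0.35, 0.41, 0.18, 0.52]
    h, p = kruskal_wallis(sp, spwa, wa, emg)
    print(f"H = {h:.2f}, p = {p:.4f}")
```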

Fig. 11. Boxplot of AUCPR of patient groups with the EEG discharge patterns: SP (n = 12), SPWA (n = 5), WA (n = 9) and EMG (n = 9). Note that six patients show two major discharge patterns and were counted twice.

Fig. 12. Average AUCPR of patient groups with EEG background types: normal (symmetric, n = 14); 1 (regional abnormality, n = 8); 2 (hemispheric abnormality, n = 5); 3 (bilateral abnormality, n = 2).

EEG background: The seizure detection performance declines as the abnormality level of the EEG background increases (see Fig. 12). In addition, we compared the AUCPR of two groups: patients with a normal EEG background and patients with an abnormal EEG background (scores 1, 2 and 3). The detection performance in patients with a normal background is significantly better than in those with an abnormal one (Mann–Whitney test, p = 0.001). The chi-square test shows that the patients in the two groups (normal and abnormal background) are homogeneous (significance level of 0.05) with respect to the factors EEG discharge pattern, IQ level and seizure visibility. Typical examples of normal and abnormal EEG backgrounds are shown in Figs. 17 and 18 in Appendix A.

Fig. 13. Comparison of classification performance for each patient in LOOCV using linear discriminant analysis (LDA), random forests (RF), AdaBoost with random undersampling (AdaBoost) and SVM with Gaussian kernels. The numbers of patients with AUCPR larger than 0.1 are 13 (LDA), 9 (RF), 9 (AdaBoost) and 8 (SVM).

Fig. 14. Event-based detection performance with AL = 6 s. The sensitivity is the proportion of detected seizures among all seizures in each patient. FD/h is shown on a log scale. The latency is averaged over all detected seizures in a patient. All latencies are larger than 6 s. A missing latency value indicates that no seizure was detected in that patient.

Fig. 15. Normalized output (PS) of the LDA and SVM classifiers on the same patient (Subj#8, around 24 h of EEG). Ann is the experts' annotation showing the locations of 12 seizures (0: non-seizure, 1: seizure). LDA (AUCPR = 0.40) achieves better classification performance than SVM (AUCPR = 0.24) on this patient. The optimal DTs are DT = 1.2867e−04 (LDA) and DT = 0.5285 (SVM).

4. Discussion

Results show that detection performance varies substantially across individuals. This variance is mainly caused by differences in EEG factors across patients, namely the four discharge patterns, the EEG background and seizure visibility. No clinical factor was associated with the detection performance of EEG seizures. These findings are important for future applications such as the detection of NCSS [15].

4.1. LDA versus SVM

A more complex classifier (SVM with a Gaussian kernel) did not perform better than a linear classifier (LDA) in across-patient seizure detection. We attribute this mainly to the heterogeneity of the EEG data in the ID population. The performance of a classifier on a new patient mainly depends on how far that patient deviates, in the feature space, from the distribution learned on the training dataset. LDA tends to perform better than SVM when this deviation is large, whereas SVM tends to perform better when it is small. Note that the deviation is small, or almost zero, in patient-specific (within-patient) detection and in seizure pattern-specific detection when the EEG data are randomly divided into training and testing sets [13]. Indeed, a linear classifier has been found to be often more robust than a nonlinear classifier on large-scale heterogeneous data [47]. Classifier ensembles may be another solution to the heterogeneous data problem [48]. For our dataset, a performance 'averaged' between LDA and SVM may be more desirable, because it may achieve generally higher classification performance on more patients.
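The contrast drawn above between across-patient evaluation and randomly divided training and testing sets comes down to how epochs are assigned to folds. The sketch below, with hypothetical function names and toy patient IDs, contrasts a leave-one-patient-out split (our reading of the LOOCV used in this study) with a pooled, epoch-level random split in which the same patient can appear in both the training and the test set.

```python
import random


def leave_one_patient_out(patient_ids):
    """Yield (held_out, train_idx, test_idx), one fold per patient.

    patient_ids[i] is the patient that epoch i belongs to. All epochs of the
    held-out patient form the test set, so the classifier never sees that
    patient during training (across-patient evaluation).
    """
    for held_out in sorted(set(patient_ids)):
        test = [i for i, pid in enumerate(patient_ids) if pid == held_out]
        train = [i for i, pid in enumerate(patient_ids) if pid != held_out]
        yield held_out, train, test


def random_epoch_split(n_epochs, test_fraction=0.2, seed=0):
    """Pooled random split: one patient's epochs can land in both sets."""
    idx = list(range(n_epochs))
    random.Random(seed).shuffle(idx)
    cut = int(n_epochs * test_fraction)
    return sorted(idx[cut:]), sorted(idx[:cut])


if __name__ == "__main__":
    # Hypothetical mapping: three patients with ten epochs each.
    pids = ["P1"] * 10 + ["P2"] * 10 + ["P3"] * 10
    for held_out, train, test in leave_one_patient_out(pids):
        seen = {pids[i] for i in train} & {pids[i] for i in test}
        print(held_out, "patients in both sets:", sorted(seen))
    train, test = random_epoch_split(len(pids))
    seen = {pids[i] for i in train} & {pids[i] for i in test}
    print("random split, patients in both sets:", sorted(seen))
```

In the leave-one-patient-out folds the overlap is always empty, whereas the random split typically places every patient in both sets, which is what lets a classifier partly "see" the test subject.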
The patient-specific DT can be viewed as a post-processing step of classification that aims to minimize the negative effects of the imbalanced and heterogeneous data. In practice, if longer EEG recordings (several days) of a specific patient are available, one day's data can be used to obtain a patient-specific DT. In this case, the patient-specific DT can help avoid a major failure (e.g., too many FDs) in an extremely 'abnormal' patient. In addition, adjusting the AL may further improve performance, especially for online seizure detection. The choice of an appropriate AL should depend on prior information about the individual subject. In another ongoing project involving our group, a more extensive data collection (hundreds of patients) is being performed to obtain such prior knowledge (e.g., the seizure duration may be associated with a seizure pattern).

4.2. Comparison with other long-term EEG studies

The state-of-the-art performance of seizure detection can be evaluated within or between persons: patient-specific (within-patient) seizure detection, i.e., testing on the same persons whose EEG segments are available for training or optimizing the detectors, and across-patient seizure detection, i.e., testing on unseen persons. The latter can be done using LOOCV or cross-validation on multi-center datasets [41]. Across-patient seizure detection has proven to be much more challenging than patient-specific detection, because the manifestations of both seizure EEG and background EEG can vary dramatically across individuals. The variability of EEG in ID people may be even larger than in non-ID people due to uneven brain development disorders. These factors complicate the design of a generic seizure detector

for the ID people. To compare with seizure detection performance in non-ID people, we reviewed only studies that performed across-patient seizure detection on long-term EEG recordings, as shown in Table 6. First, Table 6 shows that, in general, the state-of-the-art performance of across-patient seizure detection is much worse than that of within-patient seizure detection studies, where sensitivities near 100% were often reported (see a review in [49]). Second, a large variance of performance can be seen across these studies, suggesting that performance is mainly determined by the characteristics of the particular EEG dataset, i.e., the subject population. In addition, the low performance (a sensitivity of 38%) on the European epilepsy database (216 patients, multi-center data) indicates that a generic seizure detector is still missing. Finally, we also list a previous study using the same EEG dataset for comparison. However, our previous study did not perform across-patient seizure detection. Instead, it performed seizure pattern-specific detection and found that SVM achieved better performance than LDA. This is because all subjects' data in our previous study were pooled and randomized for cross-validation testing, which minimized the heterogeneity of the EEG dataset by allowing the classifier to 'see' part of the EEG data of each test subject.

4.3. Important factors and clinical relevance

Seizure visibility: Distinct seizures show significantly better detection performance than indistinct seizures. The occurrence of indistinct seizures in this population may result from a combination of the EEG discharge pattern and the EEG background. Such combinations have been found in spike-wave seizures [52] and non-convulsive seizure status [53]. Furthermore, indistinct seizures may also result from slow transitions between normal and epileptic activity [54].
In addition, the significant association between seizure visibility and background indicates that the two factors may have a common origin. However, not all ID patients show abnormal backgrounds and indistinct seizures. Around half of the patients in this dataset show normal background activity, and they tend to show more discharges with EMG activity, which are relatively easy to detect.

Discharge patterns: The quantitative analysis of surface EMG during epileptic seizures has received surprisingly little attention [28]. It seems to be common practice simply to exclude all EMG activity to reduce FDs [42]. However, in this study, seizure detection shows desirable performance in patients with the discharge-with-EMG-activity seizure pattern. This finding agrees with our previous study, in which EMG seizure epochs showed the best classification performance. EEG signals contaminated by EMG (muscle activity) still contain information that discriminates between epileptic and non-epileptic EEG discharge patterns (such as EMG artifacts caused by chewing and other voluntary muscle contractions) [55]. Given that 95% of the clinical seizures in this ID population are motor seizures that show the EMG seizure pattern [9], we can expect desirable performance of this seizure detection method in a larger population. In contrast, seizure detection performance is low in patients with the fast-spike seizure pattern, which agrees with the finding of our previous study that fast-spike seizure epochs show the worst classification performance. The fast spike often occurs at seizure onset as a low-amplitude signal, referred to as fast intracerebral EEG activity [56] or the electrodecremental event [57]. The signals are significantly decorrelated during fast-spike seizures [57] and often have frequency components similar to those of fast interictal EEG activity or EMG artifacts with low amplitude.
Therefore, they are difficult to distinguish from interictal EEG. The low performance of the spike-wave pattern may result from the high level of interictal epileptiform


Fig. 16. Normalized output (PS) of classifier LDA and SVM on the same patient (Subj#16, around 24 h EEG). Ann is experts’ annotation showing the locations of 4 seizures (0: non-seizure, 1: seizure). The SVM (AUCPR = 0.55) achieves a better classification performance than LDA (AUCPR = 0.44) on this patient. The optimal DTs are: DT = 0.98 (LDA), and DT = 0.34 (SVM).

Fig. 17. Typical normal background of subject #16 with alpha activity on EEG channels P3-Pz. Note the eye-blink artifacts on EEG channels Fp1-Fp2.

discharges (IEDs) with a spike-wave pattern. Such spike-wave IEDs can occur during more than 50% of the entire EEG recording in some individuals, compared with less than 1% in epileptic patients with normal intelligence.

EEG background: The definition of background abnormalities varies between studies, e.g., hemispheric asymmetry, a lack of faster activity [29], or slow EEG in patients with ID. The definition of background abnormality is also related to age and circadian rhythm. For example, slow activity in the delta and theta bands is normal in children but considered abnormal in adults [58]. In this study, we specifically define background abnormalities as abnormal (i.e., mostly slow and irregular) interictal EEG activity, classified according to its spatial distribution: focal (regional or one hemisphere) or bilateral generalized [59]. Results show that detection performance tends to decline with increasing background abnormality. Interestingly, ID patients often show EEG characteristics similar to those of neonatal EEG, and a strong association has been noted between intractability and abnormal EEG background in childhood epilepsy [30]. ID patients have many seizure-like

artifacts and artifact-like seizures, which are also often found in neonatal EEG [60].

Wake/sleep status: Apart from the EEG factors, seizure detection performance differs between wakefulness and sleep, i.e., patients tend to show more FDs during wakefulness than during sleep. Better detection performance may be expected with sleep- or wake-specific classifiers. However, the actual application will depend on reliable automated wake/sleep recognition methods, which remain a challenge because specific EEG markers of sleep, such as spindles or K-complexes, are lacking in this population [23].

4.4. Possible applications for detection of NCSS

In clinical practice, long-term monitoring of EEG activity is often essential to detect series of seizures or NCSS [16]. Suitable ultra-long-term (subcutaneous) EEG recording systems, such as the Reveal®, are still lacking but under development [61]. Automated real-time detection of seizures will become an essential element of


Fig. 18. Typical abnormal background of subject #1 with prominent slow activity in the central brain region.

these long-term EEG recording systems. In the clinical practice of ID patients, NCSS often lasts long (several minutes up to hours), with indistinct seizure boundaries due to the coexistence of the seizure and background states. The seizure detection method proposed in this study could be used for NCSS detection. First, based on a well-annotated EEG dataset of NCSS, specific EEG features characterizing typical NCSS EEG, e.g., series of spike-waves [19], should be developed and added. Second, since NCSS is often longer and shows a coexistence of EEG discharge patterns and background patterns, the firing power approach with a longer sliding window (i.e., raising a seizure alarm when more than half of the epochs in a sliding window are seizure candidates) will be more appropriate. Finally, given that the EEG factors affect seizure detection, context information, such as context-based rules and other non-EEG physiological signals (e.g., heart rate/ECG), could be used to improve detection performance.
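The firing power rule mentioned above can be sketched as a sliding-window vote over per-epoch seizure candidates. This is a minimal illustration under assumptions not fixed by the text: the window length is a free parameter, epochs before the recording start count as non-candidates, "more than half" is taken as a strict comparison with 0.5, and an alarm re-arms only after the firing power has dropped back to 0.5 or below. `firing_power_alarms` is a hypothetical name.

```python
from collections import deque


def firing_power_alarms(candidates, window, fraction=0.5):
    """Return epoch indices where the firing power rises above `fraction`.

    candidates: sequence of 0/1 flags (1 = epoch is a seizure candidate).
    The firing power at epoch t is the share of candidates among the last
    `window` epochs (epochs before the recording start count as 0). An alarm
    is raised when the power first exceeds `fraction`; it re-arms only after
    the power has dropped back to or below `fraction`.
    """
    recent, total = deque(), 0
    alarms, above = [], False
    for t, flag in enumerate(candidates):
        recent.append(flag)
        total += flag
        if len(recent) > window:
            total -= recent.popleft()   # drop the epoch leaving the window
        power = total / window
        if power > fraction and not above:
            alarms.append(t)
        above = power > fraction
    return alarms


if __name__ == "__main__":
    # Hypothetical candidate flags: two sustained runs and one brief burst.
    flags = [0] * 5 + [1] * 8 + [0] * 20 + [1] * 8 + [0] * 5 + [1] * 4 + [0] * 10
    print(firing_power_alarms(flags, window=10))
```

Compared with requiring an unbroken run of AL epochs, the vote tolerates short gaps inside a long discharge while still suppressing brief seizure-like bursts, which is the property the text argues suits long NCSS episodes.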

5. Conclusions

The EEG signals of the ID population form a heterogeneous entity with respect to important factors: EEG discharge patterns, EEG backgrounds and EEG seizure visibility. The performance of seizure detection varies significantly with these factors. Abnormal backgrounds and indistinct EEG seizures tend to occur together and may have a common origin. The seizure detector showed desirable performance on a dominant seizure pattern, discharge with EMG activity, which is promising for generic detection in a larger population. A sensitivity of 100% was achieved for longer EEG seizures (> 60 s), suggesting that future automated detection of NCSS (which lasts longer) is feasible.

Acknowledgment

We thank Prof. S. Van Huffel from KU Leuven for her insightful comments on this manuscript. We thank the anonymous reviewers for their constructive reviews. We also thank our colleagues at Epilepsy Center Kempenhaeghe for their help in collecting and annotating the EEG dataset.

Appendix A. More detailed results

We compared several classifiers and chose the one with the best overall classification performance to perform seizure detection and statistical analysis. The LOOCV results show a large variance across individuals (see Fig. 13). LDA achieved better performance on more patients (AUCPR > 0.1, n = 13). For more details about these classifiers and their optimal parameters, see our previous study [13]. Further detailed results are shown in Figs. 14–18.

References

[1] I.L. Rubin, J. Merrick, D.E. Greydanus, D.R. Patel, Health Care for People with Intellectual and Developmental Disabilities Across the Lifespan, Springer, 2016.
[2] K.J. Goulden, S. Shinnar, H. Koller, M. Katz, S.A. Richardson, Epilepsy in children with mental retardation: a cohort study, Epilepsia 32 (5) (1991) 690–697, http://dx.doi.org/10.1111/j.1528-1157.1991.tb04711.x.
[3] S. McDermott, R. Moran, T. Platt, H. Wood, T. Isaac, S. Dasari, Prevalence of epilepsy in adults with mental retardation and related disabilities in primary care, Am. J. Ment. Retardation 110 (1) (2005) 48–56, PMID: 15568966.
[4] J. Robertson, C. Hatton, E. Emerson, S. Baines, Prevalence of epilepsy among people with intellectual disabilities: a systematic review, Seizure 29 (2015) 46–62.
[5] T. Matthews, N. Weston, H. Baxter, D. Felce, M. Kerr, A general practice-based prevalence study of epilepsy among adults with intellectual disabilities and of its association with psychiatric disorder, behaviour disturbance and carer stress, J. Intell. Disability Res. 52 (2) (2008) 163–173.
[6] J.S. van Ool, F.M. Snoeijen-Schouwenaars, H.J. Schelhaas, I.Y. Tan, A.P. Aldenkamp, J.G.M. Hendriksen, A systematic review of neuropsychiatric comorbidities in patients with both epilepsy and intellectual disability, Epilepsy Behav. 60 (2016) 130–137, Available: http://www.sciencedirect.com/science/article/pii/S1525505016300324.
[7] U. Steffenburg, A. Hedström, A. Lindroth, L. Wiklund, G. Hagberg, M. Kyllerman, Intractable epilepsy in a population-based series of mentally retarded children, Epilepsia 39 (7) (1998) 767–775.
[8] R. Guerrini, P. Bonanni, A. Patrignani, P. Brown, L. Parmeggiani, P. Grosse, P. Brovedani, F. Moro, P. Aridon, R. Carrozzo, G. Casari, Autosomal dominant cortical myoclonus and epilepsy (ADCME) with complex partial and generalized seizures, Brain 124 (12) (2001) 2459–2475.
[9] T.M.E. Nijsen, J.B.A.M. Arends, P.A.M. Griep, P.J.M. Cluitmans, The potential value of three-dimensional accelerometry for detection of motor seizures in severe epilepsy, Epilepsy Behav. 7 (1) (2005) 74–84.
[10] S. Ramgopal, S. Thome-Souza, M. Jackson, N.E. Kadish, I.S. Fernández, J. Klehm, W. Bosl, C. Reinsberger, S. Schachter, T. Loddenkemper, Seizure detection, seizure prediction, and closed-loop warning systems in epilepsy, Epilepsy Behav. 37 (2014) 291–307, Available: http://www.sciencedirect.com/science/article/pii/S1525505014002297.
[11] J.B. Arends, J. van Dorp, D. van Hoek, N. Kramer, P. van Mierlo, D. van der Vorst, F.I.Y. Tan, Diagnostic accuracy of audio-based seizure detection in patients with severe epilepsy and an intellectual disability, Epilepsy Behav. 62 (2016) 180–185.
[12] L. Wang, J.B. Arends, X. Long, Y. Wu, P.J. Cluitmans, Seizure detection using dynamic warping for patients with intellectual disability, 2016 IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC), IEEE, Conference Proceedings (2016) 1010–1013.
[13] L. Wang, J.B.A.M. Arends, X. Long, P.J.M. Cluitmans, J.P. van Dijk, Seizure pattern-specific epileptic epoch detection in patients with intellectual disability, Biomed. Signal Process. Control 35 (2017) 38–49, Available: http://www.sciencedirect.com/science/article/pii/S1746809417300344.
[14] P.W. Kaplan, Assessing the outcomes in patients with nonconvulsive status epilepticus: nonconvulsive status epilepticus is underdiagnosed, potentially overtreated, and confounded by comorbidity, J. Clin. Neurophysiol. 16 (4) (1999) 341–352.
[15] S. Zehtabchi, S.G.A. Baki, S. Malhotra, A.C. Grant, Nonconvulsive seizures in patients presenting with altered mental status: an evidence-based review, Epilepsy Behav. 22 (2) (2011) 139–143.
[16] E. Trinka, H. Cock, D. Hesdorffer, A.O. Rossetti, I.E. Scheffer, S. Shinnar, S. Shorvon, D.H. Lowenstein, A definition and classification of status epilepticus: report of the ILAE Task Force on Classification of Status Epilepticus, Epilepsia 56 (10) (2015) 1515–1523, Available: http://onlinelibrary.wiley.com/store/10.1111/epi.13121/asset/epi13121.pdf?v=1&t=jeiazjrc&s=8df424bbbb23370e872df01fc8642671458db427.
[17] M. Holtkamp, H. Meierkord, Nonconvulsive status epilepticus: a diagnostic and therapeutic challenge in the intensive care setting, Ther. Adv. Neurol. Disord. 4 (3) (2011) 169–181, Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3105634/pdf/10.1177 1756285611403826.pdf.
[18] R.S. Fisher, J.H. Cross, C. D'Souza, J.A. French, S.R. Haut, N. Higurashi, E. Hirsch, F.E. Jansen, L. Lagae, S.L. Moshé, Instruction manual for the ILAE 2017 operational classification of seizure types, Epilepsia 58 (4) (2017) 531–542, http://dx.doi.org/10.1111/epi.13671.
[19] R.S. Fisher, H.E. Scharfman, et al., How Can We Identify Ictal and Interictal Abnormal Activity?, Springer, 2014, pp. 3–23.
[20] L. Wang, X. Long, J.B.A.M. Arends, R.M. Aarts, EEG analysis of seizure patterns using visibility graphs for detection of generalized seizures, J. Neurosci. Methods 290 (2017) 85–94, Available: http://www.sciencedirect.com/science/article/pii/S0165027017302510.
[21] N. Kane, J. Acharya, S. Benickzy, L. Caboclo, S. Finnigan, P.W. Kaplan, H. Shibasaki, R. Pressler, M.J. van Putten, A revised glossary of terms most commonly used by clinical electroencephalographers and updated proposal for the report format of the EEG findings. Revision 2017, Clin. Neurophysiol. Pract. 2 (2017) 170–185.
[22] L. Wang, P.J.M. Cluitmans, J.B.A.M. Arends, Y. Wu, A.V. Sazonov, Epileptic seizure detection on patients with mental retardation based on EEG features: a pilot study, 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Conference Proceedings (2015) 578–581.
[23] R. Degen, H.-E. Degen, The diagnostic value of the sleep EEG with and without sleep deprivation in patients with atypical absences, Epilepsia 24 (5) (1983) 557–566.
[24] W.T. Blume, G.B. Young, J.F. Lemieux, EEG morphology of partial epileptic seizures, Electroencephalogr. Clin. Neurophysiol. 57 (4) (1984) 295–302.
[25] R. Meier, H. Dittrich, A. Schulze-Bonhage, A. Aertsen, Detecting epileptic seizures in long-term human EEG: a new approach to automatic online and real-time detection and classification of polymorphic seizure patterns, J. Clin. Neurophysiol. 25 (3) (2008) 119–131.
[26] D.Y. Ko, Epileptiform Discharges, August 2016, Available: http://emedicine.medscape.com/article/1138880-overview.
[27] M. De Lucia, J. Fritschy, P. Dayan, D.S. Holder, A novel method for automated classification of epileptiform activity in the human electroencephalogram-based on independent component analysis, Med. Biol. Eng. Comput. 46 (3) (2008) 263–272.


[28] I. Conradsen, P. Wolf, T. Sams, H.B.D. Sorensen, S. Beniczky, Patterns of muscle activation during generalized tonic and tonic–clonic epileptic seizures, Epilepsia 52 (11) (2011) 2125–2132.
[29] G.L. Krauss, The Johns Hopkins Atlas of Digital EEG: An Interactive Training Guide, Johns Hopkins University Press, 2011.
[30] J.S. Ebersole, T.A. Pedley, Current practice of clinical electroencephalography, Eur. J. Neurol. 10 (5) (2003) 604–605, http://dx.doi.org/10.1046/j.1468-1331.2003.00643.x.
[31] M. Saadati, S. Faghihzadeh, S.H. Fesharaki, M. Gharakhani, Using Poisson marginal models for investigating the effect of factors on interictal epileptiform discharge in patients with epilepsy, J. Res. Med. Sci. 17 (9) (2012) 819–823.
[32] R. D'Ambrosio, J.W. Miller, What is an epileptic seizure? Unifying definitions in clinical practice and animal research to develop novel treatments, Epilepsy Curr. 10 (3) (2010) 61–66.
[33] S. Schiff, Dangerous phase, Neuroinformatics 3 (4) (2005) 315–317.
[34] B.R. Greene, W.P. Marnane, G. Lightbody, R.B. Reilly, G.B. Boylan, Classifier models and architectures for EEG-based neonatal seizure detection, Physiol. Meas. 29 (10) (2008) 1157.
[35] A. Temko, E. Thomas, W. Marnane, G. Lightbody, G. Boylan, EEG-based neonatal seizure detection with support vector machines, Clin. Neurophysiol. 122 (3) (2011) 464–473.
[36] L. Chisci, A. Mavino, G. Perferi, M. Sciandrone, C. Anile, G. Colicchio, F. Fuggetta, Real-time epileptic seizure prediction using AR models and support vector machines, IEEE Trans. Biomed. Eng. 57 (5) (2010) 1124–1132.
[37] C. Teixeira, B. Direito, M. Bandarabadi, A. Dourado, Output regularization of SVM seizure predictors: Kalman filter versus the firing power method, 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Conference Proceedings (2012) 6530–6533.
[38] S.R. Mathieson, N.J. Stevenson, E. Low, W.P. Marnane, J.M. Rennie, A. Temko, G. Lightbody, G.B. Boylan, Validation of an automated seizure detection algorithm for term neonates, Clin. Neurophysiol. 127 (1) (2016) 156–168, Available: http://www.sciencedirect.com/science/article/pii/S1388245715003168.
[39] A. Sun, E.-P. Lim, Y. Liu, On strategies for imbalanced text classification using SVM: a comparative study, Decis. Support Syst. 48 (1) (2009) 191–201, Information Product Markets. Available: http://www.sciencedirect.com/science/article/pii/S0167923609001754.
[40] S.B. Wilson, M.L. Scheuer, C. Plummer, B. Young, S. Pacia, Seizure detection: correlation of human experts, Clin. Neurophysiol. 114 (11) (2003) 2156–2164.
[41] F. Fürbass, P. Ossenblok, M. Hartmann, H. Perko, A.M. Skupch, G. Lindinger, L. Elezi, E. Pataraia, A.J. Colon, C. Baumgartner, et al., Prospective multi-center study of an automatic online seizure detection system for epilepsy monitoring units, Clin. Neurophysiol. 126 (6) (2015) 1124–1131.
[42] M.E. Saab, J. Gotman, A System to Detect the Onset of Epileptic Seizures in Scalp EEG, 2005.
[43] A. Schad, K. Schindler, B. Schelter, T. Maiwald, A. Brandt, J. Timmer, A. Schulze-Bonhage, Application of a multivariate seizure detection and prediction method to non-invasive and intracranial long-term EEG recordings, Clin. Neurophysiol. 119 (1) (2008) 197–211.
[44] H. He, E.A. Garcia, Learning from imbalanced data, IEEE Trans. Knowl. Data Eng. 21 (9) (2009) 1263–1284.
[45] J.J. Hox, M. Moerbeek, R. van de Schoot, Multilevel Analysis: Techniques and Applications, Routledge, 2017.
[46] X. Long, R. Haakma, T.R.M. Leufkens, P. Fonseca, R.M. Aarts, Effects of between- and within-subject variability on autonomic cardiorespiratory activity during sleep and their limitations on sleep staging: a multilevel analysis, Comput. Intell. Neurosci. 2015 (2015) 78.
[47] A.D. Ker, Towards Robust Steganalysis: Binary Classifiers and Large, Heterogeneous Data (Thesis), 2013.
[48] S. Gu, Y. Jin, Heterogeneous classifier ensembles for EEG-based motor imaginary detection, 2012 12th UK Workshop on Computational Intelligence (UKCI), IEEE, Conference Proceedings (2012) 1–8.
[49] U.R. Acharya, S.V. Sree, G. Swapna, R.J. Martis, J.S. Suri, Automated EEG analysis of epilepsy: a review, Knowl. Based Syst. 45 (2013) 147–165.
[50] B. Direito, C. Teixeira, F. Sales, M. Castelo-Branco, A. Dourado, A realistic seizure prediction study based on multiclass SVM, Int. J. Neural Syst. (2016) 1750006.
[51] A.H. Shoeb, Application of Machine Learning to Epileptic Seizure Onset Detection and Treatment (Thesis), 2009.
[52] P.N. Taylor, Y. Wang, M. Goodfellow, J. Dauwels, F. Moeller, U. Stephani, G. Baier, A computational study of stimulus driven epileptic seizure abatement, PLOS ONE 9 (12) (2014) e114316.
[53] P. Suffczynski, S. Kalitzin, F.H.L. Da Silva, Dynamics of non-convulsive epileptic phenomena modeled by a bistable neuronal network, Neuroscience 126 (2) (2004) 467–484.
[54] F. Lopes da Silva, W. Blanes, S.N. Kalitzin, J. Parra, P. Suffczynski, D.N. Velis, Epilepsies as dynamical diseases of brain systems: basic models of the transition between normal and epileptic activity, Epilepsia 44 (Suppl 12) (2003) 72–83.
[55] P. Jiruska, M. De Curtis, J.G.R. Jefferys, C.A. Schevon, S.J. Schiff, K. Schindler, Synchronization and desynchronization in epilepsy: controversies and hypotheses, J. Physiol. 591 (4) (2013) 787–797.
[56] F. Wendling, F. Bartolomei, J.-J. Bellanger, J. Bourien, P. Chauvel, Epileptic fast intracerebral EEG activity: evidence for spatial decorrelation at seizure onset, Brain 126 (6) (2003) 1449–1459.


[57] K.K. Jerger, T.I. Netoff, J.T. Francis, T. Sauer, L. Pecora, S.L. Weinstein, S.J. Schiff, Early seizure detection, J. Clin. Neurophysiol. 18 (3) (2001) 259–268.
[58] J.G. Bogaarts, Quantitative EEG and Machine Learning Methods for the Detection of Epileptic Seizures and Cerebral Asymmetry (Thesis), 2017.
[59] S. Noachtar, C. Binnie, J. Ebersole, F. Mauguiere, A. Sakamoto, B. Westmoreland, A glossary of terms most commonly used by clinical electroencephalographers and proposal for the report form for the EEG findings, Klin. Neurophysiol. 35 (1) (2004) 5–21.

[60] D.A. Shewmon, What is a neonatal seizure? Problems in definition and quantification for investigative and clinical purposes, J. Clin. Neurophysiol. 7 (3) (1990) 315–368.
[61] J. Duun-Henriksen, T.W. Kjaer, D. Looney, M.D. Atkins, J.A. Sørensen, M. Rose, D.P. Mandic, R.E. Madsen, C.B. Juhl, EEG signal quality of a subcutaneous recording system compared to standard surface electrodes, J. Sens. 2015 (2015).