Classification of electroencephalographic seizure recordings into ictal and interictal files using correlation sum

Classification of electroencephalographic seizure recordings into ictal and interictal files using correlation sum

Computers in Biology and Medicine 39 (2009) 604 -- 614 Contents lists available at ScienceDirect Computers in Biology and Medicine journal homepage:...

930KB Sizes 0 Downloads 17 Views

Computers in Biology and Medicine 39 (2009) 604 -- 614

Contents lists available at ScienceDirect

Computers in Biology and Medicine journal homepage: w w w . e l s e v i e r . c o m / l o c a t e / c b m

Classification of electroencephalographic seizure recordings into ictal and interictal files using correlation sum Maria Tito a , Mercedes Cabrerizo a,∗ , Melvin Ayala a , Armando Barreto a , Ian Miller b , Prasanna Jayakar b , Malek Adjouadi a a

Center for Advanced Technology and Education, Department of Electrical and Computer Engineering, Florida International University, 10555 W. Flagler Street, Miami, FL 33174, USA Department of Neurology, Miami Children's Hospital, 3100 South West 62nd Ave. Miami, FL 33155, USA

b

A R T I C L E

I N F O

Article history: Received 30 May 2008 Accepted 14 April 2009 Keywords: iEEG Epilepsy Seizure detection Correlation sum Nonlinear decision functions

A B S T R A C T

This study provides a performance evaluation of the correlation sum in terms of accuracy, sensitivity, and specificity in its ability to classify seizure files from non-seizure files. The main thrust of the study is whether computable properties (“metrics”) of EEG tracings over time allow a seizure to be detected. This study evaluates raw intracranial EEG (iEEG) recordings with the intent to detect a seizure and classify different EEG epoch files. One hundred twenty-six iEEG files from eleven sequential patients are processed and the correlation sum is extracted from non-overlapping scrolling windows of 1-s duration. The novelty of this research is in defining a generalized nonlinear approach to classify EEG seizure segments by introducing nonlinear decision functions with the flexibility in choosing any degree of complexity and with any number of dimensions, lending resiliency to data overlap and opportunity for multidimensional data analysis. A singular contribution of this work is in determining a 2-D decision plane, in this case, where duration is one dimension and window-based minima of the correlation sum is the second dimension. Also, experimental observations clearly indicate that a significant drop in the magnitude of the correlation sum signal actually coincides with the clinical seizure onset more so than the electrographic seizure onset as provided by the medical experts. The method with k-fold cross validation performed with an accuracy of 91.84%, sensitivity of 92.31%, and specificity of 91.67%, which makes this classification method most suitable for offline seizure detection applications. © 2009 Elsevier Ltd. All rights reserved.

1. Introduction Epilepsy is considered as one of the most common neurological disorder affecting 3 million people in the United States alone. According to Centers for Disease Control and Prevention, one out of 100 adults has active epilepsy, with approximately 20% of them who continue to experience seizures despite sufficient medical treatment. According to the World Health Organization, it is estimated that 50 million people worldwide have epilepsy, especially children and adolescents. Treatment options in intractable situations are limited with some resorting to focal resections of abnormal brain tissue when the epileptogenic region can be accurately defined; a critical task that may require intracranial EEG recordings of seizures to define their

∗ Corresponding author. Tel.: +1 305 348 7054; fax: +1 305 348 3707. E-mail addresses: [email protected] (M. Tito), [email protected] (M. Cabrerizo), [email protected] (M. Ayala), [email protected] (A. Barreto), [email protected] (I. Miller), [email protected] (P. Jayakar), [email protected] (M. Adjouadi). 0010-4825/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiomed.2009.04.005

onset and region of involvement. More recently, an alternative treatment option has evolved where chronic intracranial implants apply electrical stimuli directly to the brain surface with the ultimate aim of preventing or aborting seizures. The implants continuously record the EEG activity and apply the stimuli when seizures are detected or are presumed to be imminent. With these more recent developments in mind, the intent of this research endeavor is initially focused on developing an automated algorithm for the detection of seizures, based on iEEG data that would yield as the two primary aims: (1) a high sensitivity (i.e., minimum number of false negatives) and (2) a high specificity (i.e., minimum number of false positives). A stopping condition for delineating seizure from non-seizure files assumes the highest accuracy attainable to ensure that that both primary aims are concurrently met. Seizures occur intermittently and unpredictably. Hence, whether the planned treatment option is focal brain resection or chronic stimulation implants, massive amounts of EEG data needs to be analyzed preferably in real time to detect seizures; a challenge that can only be met through reliable and computationally efficient seizure detection paradigms. Over the past two decades, several automated

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614

seizure detection paradigms focusing mainly on EEG recorded from the scalp surface have been reported with different degrees of success and inherent challenges [1–7]. These studies include the use of multi-channel trends, application of neural networks, use of orthogonal transforms, genetic programming, and all dwell in either time or frequency domains. The scalp EEG, however, is known to have a much lower signal to noise ratio compared to EEG recorded directly from the cortical surface of the brain. In the area of epilepsy, where one of the most important goals is to detect and ultimately predict seizures, the use of many measures has been practiced by various research groups for many years with varying degrees of success. In the context of this study, many of the methods currently available in the specialized literature have been tested yielding different results. The experience gained through the different implementations reveal that the issue of contention is not in the implementation of such measures, but in determining which ones are more suitable. In this initial assessment, it was determined that the correlation sum is the temporal measure that performed best, while the Gamma power is found to be the most revealing frequency range for seizure detection purposes. This study focuses on the correlation sum, leaving the findings on Gamma power to be reported in another publication. EEG and iEEG signals have played an important role in the modeling of the brain's cortical dynamics and have been analyzed over the past two decades with much effort towards a better understanding of the functional characteristics of the brain, including the complex and yet to be resolved problem of seizure prediction [8–12]. Researchers have thus considered different approaches using a diversity of linear and nonlinear parameters in order to automate processes of seizure detection. In an effort to conduct a detailed investigation on iEEG data towards seizure detection, this study implements the correlation sum and analyzes its behavior in time, looking for patterns that would reveal that a seizure is developing. Correlation sum is currently the most common basis on which the claim of chaotic dynamics has been made in biological systems [13–15]. Correlation dimension oriented analysis applied to raw EEG and some variations including autocorrelation and entropy [16] are being directed with encouraging results, especially in seizure detection, where promising results have been substantiated [17–21]. Consequently, accurate detection of seizures with the minimal amount of false negatives remains the primary goal, and the development of an effective algorithm that would work across patients should serve as an important platform across all other standard features that have been proven useful in the different studies that have been reported in the past. This study presents a unique method for seizure detection based on aggregating the correlation sum of all electrodes into one single measure, and then projecting this measure into a two-dimensional plane where nonlinear decision functions classify seizure files (files that contain seizures with their onset identified) from non-seizure files. Such an outcome is ideal in minimizing the time for offline EEG analysis. The performance of the algorithm was evaluated making use of the receiver operating characteristics (ROC) terminology [22].

605

Table 1 Patients' information: demographics and availability of ictal and interictal files. Subject #

Age (years)

Gender

Interictal files

Ictal files

1 2 3 4 5 6 7 8 9 10 11

10 16 3 14 9 11 14 17 13 8 10

Male Male Female Male Male Female Male Male Male Female Male

0 0 0 0 15 15 36 12 9 4 1

5 2 3 2 3 3 6 2 5 2 1

recording electrodes and thus a different number of iEEG files that needed to be analyzed. Furthermore, the iEEG had different recording time lengths and different seizure outcomes. Up to 88 subdural electrodes were implanted on the surface of the cerebral cortex of each patient to record seizure activity. Intracranial recordings of the eleven patients were performed by using subdural strips or grid electrodes. In some cases, four contact depth electrodes were implanted. Recordings were performed during pre-surgical monitoring at the Miami's Children Hospital (MCH) using XLTEK Neuroworks Ver.3.0.5, equipment manufactured by Excel Tech Ltd. Ontario, Canada. This data was collected at 500 Hz sampling frequency and filtered to remove DC and high frequency components using a 0.1–70 Hz bandpass filter. The referential montage was used for the analysis of the files and further calculations. Two different methods to validate our results were implemented in this study. The first method which is a single training/testing set included a total of 35 (21 interictal and 14 ictal) iEEG data files collected randomly in a training phase to ascertain the reliability of the correlation sum in the seizure classification process. These data files (portions of the EEG) were selected randomly using a random generator process. The remaining 91 (71 interictal and 20 ictal) iEEG data files were then used in the testing phase. It is emphasized that once the decision function is established using the training set, it remains unchanged across all subsequent testing data that is received in a chronological manner as patients' data becomes available. The second method implemented is a k-fold cross validation which included three folds (33% each) from 60% of the data. Every time training was performed, a different single fold was used for cross validation and the remaining two folds were used for weight update, whereas the remaining 40% of the original data was used for pure testing. The overall performance was based on the average of the three different decision functions obtained. 2.2. Data analysis

2.1. Data collection

2.2.1. Data preprocessing The relevant information for the patients considered in this study is provided in Table 1. For all patients, the length of the files was approximately from 10 to 20 min. The EEG segments were classified into ictal or interictal by an experienced clinical neurophysiologist. It was statistically more appropriate in selecting randomly these files in order to evaluate the merit of the seizure classification algorithm, since the input to the algorithm was not the raw iEEG data itself but the transformed raw iEEG into correlation sum signals.

Clinical experiments involved 11 sequential patients taken from an epilepsy center with intractable seizures that were admitted for Stage 1 surgical interventions. Their age range varied from 3 to 17 years. The number of subdural strips and/or grid electrodes that were implanted varied for each patient depending on the nature of the diagnosis. Consequently, each patient had a different number of

2.2.2. Extracting the correlation sum Due to the high volume of information contained in the prefiltered iEEG data files, two key preprocessing steps were performed for the following reasons: (1) reduce the data to be analyzed and thus reduce the computational time as well and (2) seek a transformation of the raw iEEG data in order to enhance the accuracy

2. Methods

606

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614

Fig. 1. An illustrative example of correlation sum for all electrodes versus time (shown in terms of samples) for three seizures from patient 1 (left) and patient 3 (right). The vertical green line is the seizure onset as identified by medical experts. Note that the scales are different in the different subjects, thus making absolute thresholds impossible to use.

and sensitivity of the seizure classification algorithm. In step (2) the transformation chosen is that of correlation sum after a thorough evaluation of several other standard parameters such as mobility, complexity, activity [23], spectral powers and others. The iEEG data files were further analyzed with one second (1-s) timed windows in order to account for small EEG changes, and the correlation sum was extracted for all these 1-s windows and for each electrode. The correlation sum, which is mostly used to detect randomness in data, is

Cm,r = Cr =

2 N(N − 1)

N  k=1

⎡ ⎣1 N

N  i=1



(r − xi − xk )⎦

(1)

where N is the total number of samples in the window. The symbol (.) defines the step function (yielding +1 for a positive argument and zero otherwise), and xi − xk  is the Euclidean distance between vectors xi and xk . Vector xi as used in the correlation sum is a point in the embedded phase constructed from the input iEEG signal as a single time series in accordance with the following equation: − → x i = (xi , xi + , xi + 2, . . . , xi + (m − 1))

(2)

where m is the embedding dimension and  is a delay. For this study, m is set with a value of 4 and  is set with a value of 3. The correlation sum can thus be interpreted as the number of point pairs inside a hyper-ball of radius r. The radius that was selected was a

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614 Table 2 Patients' information: training and testing files used. Patient #

1 2 3 4 5 6 7 8 9 10 11

Total

No. of training files

No. of testing files

With seizure

With seizure

Without seizure

Without seizure

3 1 2 1 2 2 3 0 0 0 0

0 0 0 0 5 10 6 0 0 0 0

2 1 1 1 1 1 3 2 9 4 1

0 0 0 0 10 5 30 12 5 2 1

14

21

20

71

607

numerous signals, we end up with only one representative signal referred to as the inter-electrode mean. Our use of the inter-electrode mean is a result of the experimental studies [8,24–28] that reveal that the correlation sum signals tend to interlock in behavior at the onset moment of a seizure for all electrodes. With this fact, it is emphasized that the concept of averaging for a representative signal does not sidetrack from the main intent of detecting a seizure with the highest sensitivity and specificity possible. At the same time such a step minimizes to a great extent the computational burden [24,29] that would have been required in dealing with the raw iEEG data, and simplifies greatly the seizure detection process. Fig. 2 provides an illustrative example of the behavior of correlation sum for every single electrode used for a particular patient, and the results obtained after averaging yielding the so-called inter-electrode mean signal S . 2.3. Seizure detection rules

very small number based on the mean and standard deviation of the original iEEG data. These parameters were tested with different values in order to obtain the optimum numbers that can be used for the purpose of this study. Fig. 1 shows the behavior of the generated correlation sums over time for three different seizures of two different patients, which exemplify the fact that the dynamics of the correlation sum vary within a same patient and more so across patients, which only makes the problem more challenging. A singular characteristic of the correlation sum, however, is in its magnitude which abruptly changes at the time of the seizure for all electrodes with an evident decrease of the standard deviation. The vertical green line represents the electrographic seizure onset previously labeled at the observation room by the EEG technicians and doctors. It should be noted that the onset as marked by the EEG experts and doctors and the results of the synchronized drop in magnitude of the correlation sum do not coincide in time, which only heightens the need for an automated seizure detection process. 2.2.3. Aggregating EEG features In this study, the analysis was performed on an inter-patient basis, and therefore, some patient files were used as training files and the remaining files were kept for testing. As earlier mentioned, a total of 35 files some containing ictal activity and other containing interictal activity were used for training, and the remaining 91 files also with both ictal and interictal activity were used for testing. All of these files were selected randomly. The distribution of training and testing files are as displayed in Table 2. The classification algorithm was trained using 35 data files out of the 126 files, or 28%. Both seizure and non-seizure files were thus used for training. It is extremely important to emphasize that from a methodological standpoint the “ictal” files contain some interictal data as well. In contrast to this first scenario, interictal files do not contain any ictal activity at all. Such facts add to the randomness of the data and to the complexity of the problem. As can be inferred from Table 2, the data from subjects 8 to 11 was not included in the training set, as these last patients were obtained much later in the study which included initially only the seven patients available at that time. This is for the better, since it will validate the classifier's ability to perform well on an interpatient level and as these patients' data is acquired with time. As mentioned earlier, the number of electrodes used varied from one patient to another as a function of the medical condition; therefore, in order to have a general and more simplified algorithm, this procedure is solely based on an aggregated descriptor. The average of the correlation sum across all electrodes for each time window was thus computed for each EEG data file. Therefore, instead of having

2.3.1. General approach The inter-electrode mean for the correlation sum is used to detect a seizure by first establishing a unique statistical threshold that could work across patients. The selection criterion used is based on a rule that states that if an inter-electrode mean never crosses this computed statistical threshold, then the entire file is considered a nonseizure file and is removed from further analysis. Consequently, all other files whose inter-electrode means cross the statistical threshold even if it is for a single occurrence in time were considered for further inspection. In other words, the inter-electrode mean signal serves as the first requirement towards determining whether a file contains a seizure or not. It is clear that this statistical threshold is established with great caution as it bears significant consequences with it, and is considered in this study as a first critical step for the entire study. 2.3.2. Dilemma of the threshold The seizure classification method is based on a rule involving crossing a threshold using the inter-electrode mean signal S whose magnitude varies from patient to patient. This is essentially a dilemma that is faced due to unreliable and changing thresholds and varying standard deviations that can be experienced even within a single patient. With these facts, the problem becomes difficult to contain not only in terms of these noted variations, but also in ascertaining in a meaningful way the performance evaluation of the classifier. An illustrative example of the variation of the interelectrode mean signal between different patients is given in Fig. 3. Note that the magnitude of the correlation sum varies from patients 1 to 2. Nevertheless, a zoom-in, as shown in Fig. 3(c), reveals a similar behavior as that shown in Fig. 3(a), with a significant drop in magnitude of the correlation sum at the time of the seizure. Another example showing different plots of the inter-electrode mean signal from the same patient is given in Fig. 4. Note that even within the same patient, big variations in magnitude can be observed from seizure to seizure. Therefore, a general statistical threshold was established for the correlation sum that will work across all patients independently of its magnitude. The variant threshold was evaluated to be one standard deviation  of the inter-electrode mean signal S . A seizure is said to never occur if the signal is always above this threshold. 2.3.3. Nonlinear decision functions At this stage of the investigation, two measurements were taken into consideration. The first measurement is the duration in time in which S was consistently below the threshold and the second measurement is determined as the minimum value of S .

608

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614

Fig. 2. Characteristics of the inter-electrode mean: (a) original EEG signal and (b) top figure shows the behavior of the correlation sum for each of the 48 electrodes used for subject 1, seizure 5, and the bottom figure displays its respective inter-electrode mean signal.

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614

609

Fig. 4. Illustration of the variation of the inter-electrode mean signal within the same patient: (a) seizure 1 of patient 4 and (b) seizure 2 of patient 4.

Fig. 3. Inter-electrode mean signals for: (a) seizure 2 of patient 2; (b) seizure 1 of patient 1; and (c) zoomed-in view of (b).

With these two measurements in place, a table was constructed to train the seizure detector. The table contains as many records as data files were used in the training, whereas each record contains three values: the two aforementioned measurements and also a target (+1 for seizure file and −1 for non-seizure file). Recall that non-seizure

data files that did not meet the first requirement were not used in this table since they are already identified as true negatives, and were removed earlier. Nonlinear decision functions were created based on the training data. The classification results are displayed in 2-D space, which helps in visualizing the geometrical placement of the pattern classes with respect to the resulting decision function. All points on the decision function curve itself are considered of an undetermined class, since the decision function is zero for those points. In the result plots, positive and negative signs denote seizure and non-seizure files, respectively. The x-axis represents the duration, while the y-axis represents the minimum of S . If the 2-D space is chosen appropriately, which is the most challenging part of the problem, patterns of the same class will tend to cluster together and the nonlinear decision functions will yield optimal results. To address this problem and begin its implementation steps, each nonlinear decision function is obtained from the training data using the gradient descent method. The gradient descent method is known as an optimization algorithm that seeks a local minimum for a given function. The problem of classification is therefore one  = 0, which defines the that is reduced to identify the boundary D(X) decision function itself, in order to partition the pattern space into  is the vector containing the n dimensions. In the different classes. X

610

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614 Table 3 Results using the K-folds method during training.

case treated here, the classification rules are as follows:  > 0, If D(X)  < 0, If D(X)  = 0, If D(X)

− → X ∈ class of seizure files. − → X ∈class of non-seizure files. indeterminate condition.

Nonlinear decision functions are well established in the literature [30], but it is their implementation that is regrettably lacking in the resolution of real-world problems such as the one addressed in this study. The general recursive formula as expressed in (3) allows for deriving any nonlinear decision function with any number of dimension (n) and with any degree of complexity (r), and is structured as follows:  = Dr (X)

n n  

···

P1 =1 P2 =P1

n 

WP1 P2 ...Pr XP1 XP2 . . . XPr + Dr−1 = 0

(3)

 = W11 X 2 + W12 X1 X2 + W22 X 2 + W1 X1 + W2 X2 + W3 = 0 D(X) 1 2

(4)

where W11 , W12 , W22 , W1 , W2 , and W3 are the weights; and X1 and X2 are representatives of the first and second dimension, respectively. To obtain the optimal weights of this function, a cost function C was established as the sum of the square errors between the decision function output and the targets as follows: N 

 − Tp )2 (Dp (X)

Accuracy (%)

Sensitivity (%)

Specificity (%)

K1 K2 K3 Average

92.11 90.79 95.95 92.95

91.3 100 86.36 92.55

92.45 86.79 100 93.08

Table 4 Single training/testing of the nonlinear decision functions vs. K-folds results. Method

Accuracy (%)

Sensitivity (%)

Specificity (%)

Single training/Testing (28%/72%) r = 2 K-Fold average results from Table 3

92.31 91.84

95.00 92.31

91.55 91.67

Pr =Pr−1

 = This equation which is recursive in r is defined such that D0 (X) Wn+1 , which is the last element of the augmented weight vector. The way the proposed classification process was set up resulted in decision functions that are 2-D (duration and minimum) with a second order in complexity, resulting in second-order polynomials with two variables (X1 and X2 ). Thus, for the problem at hand, (3) with n = 2 and r = 2, reduces to

C=

K-Folds

(5)

p=1

where p serves as the index for the data points, where a data point is represented by its dimensions (X1 and X2 ) in reference to (4). N is the amount of data points (in the files that met the first requirement for  seizure detection); Tp is the target for point p (−1 or +1) and Dp (X) is the resulting value of the decision function evaluated at point p. The optimal weights, which define the ultimate nonlinear decision function, are those that yield a minimal value for the cost function in (5). Nonlinear decision functions are superior to linear decision functions and guarantee better separability in overlapping data sets [22], which is the case in most real-world data sets. Other good candidates to deal with high data overlap are support vector machines (SVMs) which have known significant success in data classification [31–34]. SVMs can best be described as generalized classifiers which map input vectors to a higher dimensional space (feature space) where a maximal separating hyperplane is sought to separate two different classes. The idea is thus to minimize the classification risk by maximizing the inter-class margins or distances. The mapping of the data into the so-called feature space is accomplished by means of kernel functions and the decision function is expressed in terms of the so-called support vectors. The other approach that was considered involves k-fold cross validation. Using a k-fold cross validation (instead of performing only a single training and testing step) made the estimates of sensitivity, accuracy and specificity more accurate. The results of such an approach are also quite effective if the entire data is available at once, unlike in this study were the patients data is acquired sequentially. Furthermore, as can be observed in Tables 3 and 4 the results using k-folds were very good as well, and are even better in terms of sensitivity. This last measure is the most important since it takes into consideration the false negatives (FN).

The nonlinear decision approach as proposed is comparable to the polynomial kernel of the SVM in its functional ability, but the proposed method which was fully implemented by this research group allowed us to investigate thoroughly the consequences in changing the complexity order (r) of the data set, with the flexibility for establishing other unique dimensions that can be envisioned in the future. It is therefore this flexibility of the nonlinear decision functions and their ability to handle complex overlapping data that led to the proposed nonlinear approach. The minimization of the cost function for all data points is achieved by setting the partial derivatives of this cost function with respect to all the weights to zero. This generates a system of equations that can be solved iteratively following the gradient descent method approach, in which the weight increments are dictated by a learning rate that is chosen to be small enough to simultaneously seek high accuracy and fast convergence rates. Thus, for a given point p, the notation of (5) simplifies to the following equation: C = (D − T)2

(6)

to which the chain rule is applied in the following way:

*C *C *D = *W *D *W

(7)

where

*C = 2(D − T) *D

(8)

By subsequently applying this procedure with respect to all the weights, the following set of equations can be obtained:

W11 = 2(T − D)X12 ⎤ ⎢ W12 = 2(T − D)X1 X2 ⎥ ⎥ ⎢ ⎢ W22 = 2(T − D)X 2 ⎥ 2 ⎥ ⎢ ⎥ ⎢ W = 2(T − D)X 1 1 ⎥ ⎢ ⎦ ⎣ W = 2(T − D)X 2 2 W3 = 2(T − D) ⎡

(9)

where  represents the learning rate assumed in the training phase of the algorithm. A set of weights was evaluated and recalculated throughout the optimization. With all the weights determined, the decision functions expressed through (4) can now be established. Since the training files constitute a relative small percentage of the total amount of data files, training was set to stop when classification accuracy of all patterns no longer improved with subsequent iterations. Purposely, no cross-validation was performed. Accuracy was defined to be the “proportion of the number of correct detections to the total number of detections”.

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614

611

Table 5 Learning rate used and optimum weights obtained before and after rescaling.

Read EEG data

Extract correlation sum from all electrodes

 W11 W22 W12 W1 W2 W3

Compute the average signal Sμ across all electrodes.

Before

After

1.00E−15 +3.97E−7 0.00 1.80E−13 −5.89E−5 +1.00 +1.50E−13

0.01 −133.88 +0.54 −5.14 +25.62 −212.22 −1.27E−2

No

Does S μ cross the statistical threshold σ ? Yes Count the number of consecutive seconds in the part of the signal that crossed the statistical threshold and find the minimum of Sμ in the interval where the condition above is met and then use both as inputs of the decision function D ( X )

D( X ) > 0

No

Yes Seizure detected

Seizure not detected

Fig. 5. Flowchart of the procedural steps followed for each file.

The different steps of the procedure that were followed each time a specific file is processed are displayed in Fig. 5. Note that there are two conditions that need to be satisfied for a file to be declared as containing a seizure: (a) crossing the set statistical threshold, this is the main condition that is evaluated first, and then (b) yielding positive value for the decision function. 2.4. Testing results In the single training/testing set method, the testing phase of the detector was conducted on the remaining 91 data files that were not used during training. An interesting development led to a new finding in terms of the scales of the dimensions used with the nonlinear decision function, namely duration. Experimental results reveal that the time for the nonlinear decision to be generated was extremely long (measured in hours) only because the two dimensions used had completely different scales with minimum measured in 1/1000 units while duration was in 100s of seconds). It was determined that when we rescale the duration by dividing that scale by 1000, not only did the results improve but the convergence rates of the nonlinear decision function was reduced dramatically to minutes. It is emphasized that the placements of the data does not change with rescaling. What follows are the result obtained before and after rescaling. The learning rate, as well as the calculated optimum weights before and after rescaling is shown in Table 5. For visual appreciation, Fig. 6 shows plots of the decision function and the training and testing points used to create and test the detectors, respectively. In these figures, each point represents a file that passes the first requirement for containing a seizure. The x-axis represents the

Fig. 6. Positive (+) and negative (−) points used for training and testing and plot of the decision functions obtained before and after re-scaling. The x-axis represents the duration, given in seconds or seconds∗1000, whereas the y-axis represents the minimum value of the inter-electrode correlation sum, divided by its average. Table 6 Results obtained before and after testing the detectors on the 91 testing files across all patients. Category

Before rescaling

After rescaling

TN FP FN TP Accuracy (%) Sensitivity (%) Specificity (%)

66 5 2 18 92.31 90.00 92.96

65 6 1 19 92.31 95.00 91.55

first dimension (duration); and the y-axis represents the second dimension (minimum of the correlation sum). The plots show a remarkable separation of seizure and non-seizure data. The confusion matrices that were obtained before and after rescaling are displayed in Table 6 with their corresponding accuracy and sensitivity values. With a reassessment of these results one can observe that sensitivity has improved with rescaling. As it can be observed from Table 6, the total amount of false negatives (FN) is somehow small and the TP value is high. Also sensitivity is 95% after rescaling. Based on the inter-patient nature of this procedure, the results can be considered as highly significant. It should be noted at this time that although a linear decision function appears to be sufficient in delineating seizure from nonseizure data in the training set considered, empirical results reveal

612

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614 Table 7 Single training/testing results contrasting linear vs. nonlinear decision functions. Method

Accuracy (%)

Sensitivity (%)

Specificity (%)

Single training/Testing (28%/72%) r = 1 Single training/Testing (28%/72%) r = 2

84.62 92.31

95.00 95.00

81.69 91.55

Table 8 Running time to generate the decision functions. Running time (h:min) Before rescaling After rescaling

72:25 00:32

obtained contrasting the performance of linear decision functions (r = 1) as given in Fig. 7 vs. the results of its nonlinear counterpart (r = 2) given earlier in Fig. 6. Table 7 summarizes the results in terms of accuracy, specificity and sensitivity contrasting linear and nonlinear decision functions as obtained from the training set and as applied to all the remaining testing files. 2.5. Computational requirements With the rescaling of duration, it is interesting to note the significance of rescaling with respect to the computational requirements on determining the nonlinear decision functions. Table 8 displays the running time of the training procedure before and after rescaling. The comparative time requirements given in Table 8 show that with rescaling, nonlinear decision functions on the training sets were generated 135 times faster than without rescaling, and the results did improve in terms of yielding a higher sensitivity. The reason why the decision functions are generated much faster is because both dimensions have now more similar spans with rescaling, and therefore it took less time for the algorithm to converge as a much smaller number of iterations is now required with rescaling. It is emphasized that the relative placement of the data does not change with rescaling as can be seen from Fig. 6. 3. Conclusions

Fig. 7. Contrasting results using linear decision functions with r = 1: (a) training plot using 28% of the total files available but drawn purposely from only seven patients as in Table 3 and (b) testing data from all patients from the remaining 72% of the files.

that nonlinear decision function are more resilient as larger testing sets are considered. Moreover, since the stopping condition is set to be for maximum accuracy, we have to allow for the algorithm to converge where it may as long as the stopping condition is met. Recall that the highest accuracy would mean the highest sensitivity, highest specificity and highest precision concurrently. Thus, as more data is progressively collected, there may be a need for increasing randomly the training dataset in order to enhance the prospect for an optimal decision function, and where a linear decision function may not be as evident. Furthermore, we should also anticipate the use of this generalized algorithm for different time domain features such as mobility or complexity. To provide experimental credence to this argument, consider the initial training set, and note as illustrative examples the results

This study evaluated the suitability of the correlation sum on subdural EEG signals for the classification of seizure files. The classification process was developed in a 2-D plane where nonlinear decision functions are optimized. To that end, an inter-patient analysis was conducted by applying the correlation sum detector to all patients, whereas a seizure was found depending on two measurements: (1) the time where the inter-electrode mean S is below the standard deviation  and (2) when the minimum value of S in that specific interval is obtained. The method with k-fold cross validation performed with an accuracy of 91.84%, sensitivity of 92.31%, and specificity of 91.67%, which makes this classification method most suitable for offline seizure detection applications. In retrospect, it is worth mentioning that although only 28% of the files for the single training/testing set method were used for creating the detector, high sensitivity and accuracy measures were still achieved on the larger testing data set. It is also interesting to note that in this method no cross-validation was performed, which only highlights the merits of the proposed classification method. The nonlinear decision functions were derived using the gradient decent method, which proved effective with a nonlinearity of the second order (r = 2) and only two dimensions (n = 2) were used as we sought an approach that is easier to implement, successful in its classification results, and yet computationally efficient. It is

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614

important to note at this juncture, that the Principal Component Analysis (PCA) which is not included in this study could potentially be used in the future to establish other discriminating dimensions [35–37] in order increase the domain of analysis from 2- to 3-D or higher dimensions, as more data is obtained and as the inherent overlap of real-world data is more apparent. The computational requirements for creating the nonlinear decision functions using the single training/testing set method during the training phase and the ensuing results during the testing phase reveal additional findings that are quite interesting: (1) the data clusters of seizure-files seem more spread out than those data clusters of files not leading to seizures, which clearly prove that seizures which are atypical events obviously vary greatly among subjects, and that such nonlinear decision functions are most capable of delineating such wide-ranging behaviors and (2) with the parallel drawn to eigenvectors, it is interesting to note that when the duration dimension is rescaled, the overall positioning of the data clusters of the non-seizure files relative to the seizure files did not change but that two major gains were noted: (1) the first gain, which is viewed as major, is the computational requirement which reduced by 135 fold in generating the nonlinear decision functions using the single training/testing set method and (2) the second gain is that the correlation sum results actually improved in terms of sensitivity, reducing further the number of false negatives. Conflict of interest statement None declared. Acknowledgments The authors appreciate the support provided by the National Science Foundation under grants CNS-0426125, HRD-0833093, CNS0520811, CNS-0540592 and IIS-0308155. The authors are also thankful for the clinical support provided through the Ware Foundation and the joint Neuro-Engineering Program with Miami Children's Hospital. References [1] O. Smart, H. Firpi, G. Vachtsevanos, Genetic programming of conventional features to detect seizure precursors, Engineering Applications of Artificial Intelligence 20 (8) (2007) 1070–1085. [2] A. Bragin, C.L. Wilson, T. Fields, I. Fried, J. Engel Jr., Analysis of seizure onset on the basis of wideband EEG recordings, Epilepsia 46 (2005) 59–63. [3] J. Gotman, Automatic detection of seizures and spikes, Journal of Clinical Neurophysiology 16 (1999) 130–140. [4] M. Adjouadi, M. Cabrerizo, M. Ayala, D. Sanchez, P. Jayakar, I. Yaylali, A. Barreto, Interictal spike detection using the Walsh transform, IEEE Transactions on Biomedical Engineering 51 (5) (2004) 868–873. [5] G. Calvagno, M. Ermani, R. Rinaldo, F. Sartoretto, Multiresolution approach to spike detection in EEG, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing 6 (2006) 3582–3585. [6] R. Chander, E. Urrestarrazu, J. Gotman, Automatic detection of high frequency oscillations in human intracerebral EEGs, Epilepsia 47 (4) (2006) 37. [7] J. Gotman, Automatic recognition of epileptic seizures in the EEG, Electroencephalography and Clinical Neurophysiology 54 (1982) 530–540. [8] J.C. Sackellares, L.D. Iasemidis, Seizure warning and prediction, Patent number US 6304775 B1, 2001. [9] Y.C. Lai, M.A. Harrison, M.G. Frei, I. Osorio, Inability of Lyapunov exponents to predict epileptic seizures, Physical Review Letters 8 (2003) 91–96. [10] L.D. Iasemidis, J.C. Sackellares, R.L. Gilmore, S.N. Roper, Automated seizure prediction paradigm, Epilepsia 39 (1998) 56. [11] L.B. Good, S. Sabesan, S.T. Marsh, K. Tsakalis, L.D. Iasemidis, D.M. Treiman, Automated seizure prediction and deep brain stimulation control in epileptic rats, Epilepsia 48 (6) (2007) 278. [12] M. D'Alessandro, R. Esteller, G. Vachtsevanos, A. Hinson, J. Echauz, B. Litt, Epileptic seizure prediction using hybrid feature selection over multiple intracranial EEG electrode contacts: a report of four patients, IEEE Transactions on Biomedical Engineering 50 (5) (2003) 603–615. [13] G.W. Frank, T. Lookman, M.A.H. Nerenberg, C. Essex, J. Lemieux, W. Blume, Chaotic time series analyses of epileptic seizures, Physica D 46 (1990) 427–438.

613

[14] M.R. Guevara, Chaos in electrophysiology, concepts and techniques in bioelectric measurements: is the medium carrying the message?, in: A.-R. LeBlanc, J. Billette (Eds.), Editions de l'Ecole Polytechnique de Montreal, 1997, pp. 67–87. [15] L.D. Iasemidis, A. Barreto, R.L. Gilmore, B.M. Uthman, S.A. Roper, J.C. Sackellares, Spatiotemporal evolution of dynamical measures precedes onset of mesial temporal lobe seizures, Epilepsia 35 (1994) 133–134. [16] A. Bezerianos, S. Tong, N. Thakor, Time-dependent entropy estimation of EEG rhythm changes following brain ischemia, Annals of Biomedical Engineering 31 (2003) 221–232. [17] N.S. Abend, D. Dlugos, S. Herman, Neonatal seizure detection using multichannel display of envelope trend, Epilepsia 49 (2) (2008) 349–352. [18] A. Shoeb, H. Edwards, J. Connolly, B. Bourgeois, T. Treves, J. Guttag, Patientspecific seizure onset detection, in: Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA, USA, September 1–5, 2004. [19] A.J. Gabor, Seizure detection using a self-organizing neural network: validation and comparison with other detection strategies, Electroencephalography and Clinical Neurophysiology 107 (1) (1998) 27–32 (6). [20] J. Martinerie, C. Adam, M. Le Van Ouyen, Epileptic seizures can be anticipated by non-linear analysis, Nature Medicine 4 (1998) 1173–1176. [21] M. Adjouadi, M. Cabrerizo, M. Ayala, D. Sanchez, P. Jayakar, I. Yaylali, A. Barreto, Detection of interictal spikes and artifactual data through orthogonal transformations, Journal of Clinical Neurophysiology 22 (1) (2005) 53–64. [22] R. Kohavi, F. Provost, Glossary of terms, Machine Learning 30 (1998) 271–274. [23] B. Hjorth, The physical significance of time domain descriptors in EEG analysis, Electroencephalograms and Clinical Neurophysiology 34 (1970) 321–325. [24] M. Cabrerizo, M. Adjouadi, M. Ayala, M. Tito, Pattern extraction in interictal EEG recordings towards detection of electrodes leading to seizures, Biomedical Sciences Instrumentation 42 (ISA vol. 464) (2006) 243–248. [25] J. Arnhold, K. Lehnertz, P. Grassberger, C.E. Elger, A robust method for detecting interdependences: application to intracranially recorded EEG, Physica D 134 (1999) 419. [26] A.M. Albano, C.J. Cellucci, R.N. Harner, P.E. Rapp, Optimization of embedding parameters for prediction of seizure onset with mutual information, in: A.I. MEES (Ed.), Nonlinear Dynamics and Statistics: Proceedings of an Issac Newton Institute, Birkhauser, Boston, 2000, pp. 435–451. [27] R. Larter, B. Speelman, A coupled ordinary differential equation lattice model for the simulation of epileptic seizures, Chaos 9 (1999) 795–804. [28] M. Cabrerizo, Subdural EEG analysis for extracting discriminating measures in epileptogenic data, Ph.D. Dissertation, Department of Electrical and Computer Engineering, Florida International University, Summer 2006. [29] M. Cabrerizo, M. Tito, M. Ayala, M. Adjouadi, A. Barreto, P. Jayakar, An analysis of subdural EEG parameters for epileptic seizure evaluation, in: CD Proceedings of the 16th International Conference on Computing CIC-2007, Mexico City, 2007. [30] J.T. Tou, R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1974. [31] V.N. Vapnik, A.J. Chervonenkis, Theory of Pattern Recognition, Nauka, Moscow, 1974. [32] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995. [33] C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (2) (1998) 121–167. [34] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, University of Cambridge Press, Cambridge, 2000. [35] P. Levan, L. Tyvaert, J. Gotman, Model-free detection of bold changes related to epileptic discharges using independent component analysis, Epilepsia 48 (6) (2007) 397–398. [36] S. Ghosh-Dastidar, H. Adeli, N. Dadmehr, Principal component analysisenhanced cosine radial basis function neural network for robust epilepsy and seizure detection, IEEE Transactions on Biomedical Engineering 55 (2 Part 1) (2008) 512–518. [37] K. Kobayashi, I. Merlet, J. Gotman, Separation of spikes from background by independent component analysis with dipole modeling and comparison to intracranial recording, Clinical Neurophysiology 112 (3) (2001) 405–413.

Maria Tito is a new Ph.D. graduate (Spring 2008) from the Department of Electrical and Computer Engineering at Florida International University. Her research work has focused on the detection of seizures in subdural EEG recordings using nonlinear methods for data clustering and decision making. Her Ph.D. Dissertation was entitled: “Seizure Detection in Subdural Recordings Using Nonlinear Decision Functions”. Her early research work towards her M.S. degree which she obtained in the spring of 2004 involved a parametric analysis to make key observations in the elusive problem of seizure prediction. Mercedes Cabrerizo received her B.S. (Magna Cum Laude) and M.S. degrees in Computer Engineering, and her Ph.D. degree in Electrical Engineering all from Florida International University. In 2002, Mercedes obtained the prestigious National Science Foundation Graduate Fellowship in pursuit of her Ph.D. degree. Her Ph.D. Dissertation was entitled “Subdural Electroencephalogram Analysis for Extracting Discriminating Measures in Epileptogenic Data”. Mercedes is now a Research Scientist with the Brain Institute at Miami Children's Hospital. She is a member of Eta Kappa Nu

614

M. Tito et al. / Computers in Biology and Medicine 39 (2009) 604 – 614

Electrical Engineering Honor Society and the Tau Beta Pi Association, the Society of Hispanic Professional Engineers (SHPE), and the Society of Women Engineers. ¨ Melvin Ayala has a Bachelor's degree in Economic Engineering (Dipl-Ing-Ok) from the University of Applied Sciences (Germany) in 1984 and a Ph.D. (Dr. oec) from the same institution in 1987. He has been a Visiting Professor and Researcher at the State University of Sao Paulo, Brazil and has done consulting for Eurolatina Energia S.A. in the energy sector in Venezuela in the past. He has authored several publications related to software engineering, artificial intelligence and biomedical signal processing. He has been the Lab Manager of the Center for Advanced Technology and Education at Florida International University from 2002 to present. Armando Barreto is an Associate Professor in both the Departments of Electrical & Computer Engineering and Biomedical Engineering at Florida International University (FIU), in Miami, Florida. He is the Director of the Digital Signal Processing (DSP) Laboratory at FIU. Armando's academic background is in Electrical Engineering: B.S. degree from UNAM, Mexico (1987), M.S. degree from FIU (1989), and Ph.D. degree from UF, Gainesville (1993). As a faculty member Armando has sought the application of DSP techniques for the development of devices and systems that may facilitate access of individuals with disabilities to computers. Examples of these endeavors are the design and implementation of “hands-off” cursor control systems. Armando is a member of the Institute of Electrical and Electronics Engineers (IEEE), and the Association for Computing Machinery (ACM). Ian Miller is a Staff Neurologist with the Department of Neurology with the Brain Institute at Miami Children's Hospital (MCH). He received his M.D. degree form the University of Iowa, Iowa City, IA. Prior to joining MCH, he served as a Resident in Pediatrics at the University of Utah, Salt Lake City, UT and as a Resident in Neurology and Pediatric Neurology with the University of Washington, Seattle, WA. He is board certified with American Board of Psychiatry and Neurology with Licensure

in Florida, Washington, and Utah as well as a DEA license. He has received the National EpiFellows Foundation Scientific Forum Award, the J. Kiffin Penry Residents Program in Epilepsy Award, the Child Neurology Society Annual Meeting Resident Scholarship, and the American Academy of Neurology Annual Meeting Resident Scholarship. Prasanna Jayakar is the Director of the Neuroscience Center, and Chairman of the Brain Institute, at Miami Children's Hospital. He received his M.D. degree from the University of Bombay, India in Pediatrics, and a Ph.D. degree from the University of Manitoba, Canada in Neurophysiology/Bio-Engineering. He has held numerous positions including residency with the University of Bombay in Medicine and Pediatrics, Fellow in Pediatric Neurosciences with the University of Manitoba, and Fellow in Clinical Neurophysiology with the Harvard Medical School. He has also received the F.W. Horner Award for Clinical Research with the University of Manitoba, Department of Pediatrics; and the President's Prize in Child Neurology with the Canadian Association of Child Neurology. Malek Adjouadi is a Professor with the Department of Electrical and Computer Engineering with a joint faculty appointment with Biomedical Engineering at Florida International University. He is the founding director of the Center for Advanced Technology and Education funded by the National Science Foundation since 1993. He received his B.S. degree from Oklahoma State University and his M.S. and Ph.D. degrees all in EE from the University of Florida. Malek's earlier work on Computer Vision to help blind persons led to his testimony to the US Senate on the Committee of Veterans Affairs on the subject of technology to help persons with disability. Malek is also Co-leading the Joint Neuro-engineering Program between FIU and Miami Children's Hospital in researching methods for understanding key brain dysfunctions such as epilepsy. His research interests are in image and signal processing with applications to assistive technology research and neuroscience.