Electroencephalography and clinical Neurophysiology, 90 (1994) 438-443
© 1994 Elsevier Science Ireland Ltd. 0013-4694/94/$07.00
EEG 93517
Evaluation of a computerized system for recognition of epileptic activity during long-term EEG recording

T. Pietilä, S. Vapaakoski d, U. Nousiainen c, A. Värri b, H. Frey a, V. Häkkinen e and Y. Neuvo b

a University of Tampere, Department of Neurology, Box 607, SF-33101 Tampere (Finland), b Tampere University of Technology, c Vaajasalo Epilepsy Hospital, d Technical Research Centre of Finland, and e Tampere University Hospital, Department of Clinical Neurophysiology, Tampere (Finland)

(Accepted for publication: 25 January 1994)
Summary

A new method for recognizing epileptic activity in the EEG during long-term intensive monitoring, based on adaptive segmentation, was developed in Tampere. The performance of the system was validated and compared with the commercially available discharge recognition system of Gotman. Twelve EEG segments of approximately 30 min each, recorded during intensive monitoring from 6 patients, were analysed. Two EEG specialists independently marked the occurrence of epileptic activity on these segments and later re-evaluated any differences in their scoring. The resulting consensus file was used as the reference in validating the performance of the two computer programs. We found that the program developed in Tampere detected discharge activity more often than the Gotman system. Both systems performed poorly in spike recognition. The specificity of the recognized segments was better with the Gotman system.
Key words: EEG; Long-term recordings; Automatic analysis
With the advent of long-term EEG recording, the amount of raw EEG data to be analysed has become a problem. For a single patient, one prolonged record (intensive monitoring or ambulatory EEG recording) takes at least 1 day, usually several days. In this kind of clinical setting epileptic seizures may go unnoticed because 24 h surveillance by EEG technologists is not always possible. Push-button arrangements, in which the patient himself signals a seizure, are not always reliable, as the patient may not notice the seizure at all, or may be incapacitated by it. Moreover, quite often an electrographic epileptic burst has no clinical manifestations. Therefore, the only way to be certain that no epileptic activity is missed has been to go through the whole recording manually. Such analysis is usually very time consuming, and the need for automatic analysis becomes apparent. Automatic analysis of the EEG is itself a demanding process, as even the normal EEG takes various forms in the time domain. It usually contains muscular and other artifacts as well as physiological variants which sometimes resemble epileptogenic transients. An experienced EEG analyser passes these artifacts quickly, but they may often lead to misinterpretations in automatic systems.

The EEG computer analysis system most commonly used was introduced by Gotman in 1982, and is commercially available. However, it cannot detect every epoch of epileptogenic activity and also has a high proportion of false detections, up to 97.5% in corticography (Gotman 1985). It is also possible that some epileptic bursts will not be found, as missed epileptic phenomena have been insufficiently validated in the older analysis systems. The aim of the study was to validate the performance of the system for recognition of epileptic transients developed in Tampere and to compare it with that of Gotman.

* Corresponding author. Fax: 358-31-247 4351.
SSDI 0013-4694(94)00028-J
The Tampere system

To overcome the problems mentioned, a new signal analysis system has been developed for epilepsy analysis (Värri et al. 1988). It consists of a preprocessor and an analysis processor and it operates in real time, in parallel with the video recording. An IBM-compatible personal computer is used as the analysis processor. Its main function is to monitor the results of the preprocessor and transfer the results and the raw EEG samples to the hard disk. The raw data are stored on the disk only when the analysis processor has concluded that the signal contains interesting information. The calculation-intensive signal processing routines are performed on an add-on board connected to an expansion slot in the computer (Fig. 1). The add-on board contains the TMS32020 signal processor, RAM memory, I/O ports to the host processor and a 16-bit A/D converter which has been expanded to input 8 analog channels. The system is able to analyse 6 channels simultaneously (Värri et al. 1988).

Fig. 1. The preprocessor block diagram used in the Tampere system for recognition of epileptic activity during long-term EEG recording. The EEG signal is preprocessed using the steps illustrated in the diagram before the more detailed analysis. [Diagram labels: EEG signal, EMG, samples at Fs = 50 Hz, to analysis processor.]

First, the signal was divided into variable-length segments (Fig. 2). This process is commonly referred to as adaptive segmentation. Simultaneously with the segmentation, a set of features is extracted from the segment whose end is being sought. When the end of the segment has been found, the segment is classified into one of the elementary signal classes on the basis of the features extracted.

Fig. 2. Example of the results of segmentation and classification. Different numbers correspond to different classes of activity. The total number of classes is 370.

Our aim in selecting the segmentation algorithm was to find a reliable, robust algorithm fast enough to be used in multi-channel segmentation in real time. The algorithm we used has two equally long (0.8 sec each) consecutive windows which slide along the signal samples as they become available. A function F(x_i, x_{i+1}, ..., x_{i+N}) of the samples inside each window is calculated, and the absolute difference between the two results, G = |F_1 - F_2|, is monitored (Fig. 3). When the difference reaches a local maximum, it is likely that the signals inside the windows differ most from each other, and a segment boundary is drawn. The function F combines the average amplitude and the average difference of two adjacent samples. By setting appropriate weights for both properties, a suitable function is obtained which is sensitive to both amplitude and frequency. In a sequence of signal samples, the average difference of consecutive samples correlates directly with the mean frequency of the signal. The lower limit of the segment length was set to 0.48 sec and the maximum length was limited to 2.16 sec.

Formula recovered from the figure labels: F = \sum_i |x_i| + 7.0 \sum_i |x_{i+1} - x_i|

Fig. 3. The adaptive segmentation method used in the Tampere system. The black dots represent the samples taken from the digitized EEG signal. The absolute difference is calculated between the two functions. When the difference G reaches its local maximum, it is likely that the signals inside the two windows differ from each other most. [Diagram labels: two consecutive windows over samples X_1 ... X_n and X_{n+1} ... X_{2n}, Fs = 50 Hz; F_1 = F(X_1, X_2, ..., X_n); F_2 = F(X_{n+1}, X_{n+2}, ..., X_{2n}); G = |F_1 - F_2|; horizontal axis: time.]
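The two-window segmentation described above can be sketched in a few lines of Python. This is a minimal offline illustration, not the real-time TMS32020 implementation: the function names are hypothetical, the weight 7.0 is read from the garbled figure labels, the use of means rather than sums in F is an assumption, and the `min_peak` threshold does not appear in the paper. The text does not say what happens when no local maximum is found, so forcing a boundary at the maximum segment length is also an assumption.

```python
import numpy as np

FS = 50                      # sampling rate in Hz (from the figure labels)
WINDOW = round(0.8 * FS)     # 40 samples in each of the two windows
MIN_LEN = round(0.48 * FS)   # 24 samples, shortest allowed segment
MAX_LEN = round(2.16 * FS)   # 108 samples, longest allowed segment


def window_measure(x, weight=7.0):
    """F: mean absolute amplitude plus weighted mean absolute sample difference.

    The weight 7.0 is taken from garbled figure text and is an assumption.
    """
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x)) + weight * np.mean(np.abs(np.diff(x)))


def difference_curve(signal, window=WINDOW):
    """G[t] = |F1 - F2| for the windows ending and starting at sample t."""
    signal = np.asarray(signal, dtype=float)
    g = np.zeros(len(signal))
    for t in range(window, len(signal) - window + 1):
        f1 = window_measure(signal[t - window:t])
        f2 = window_measure(signal[t:t + window])
        g[t] = abs(f1 - f2)
    return g


def segment_bounds(signal, min_peak=0.0):
    """Return segment boundaries as sample indices, including 0 and len(signal).

    min_peak is a hypothetical threshold (not in the paper) that lets the
    caller ignore very small local maxima of G.
    """
    g = difference_curve(signal)
    n = len(signal)
    bounds = [0]
    while n - bounds[-1] > MAX_LEN:
        start = bounds[-1]
        cut = start + MAX_LEN          # forced boundary if no maximum is found
        for t in range(start + MIN_LEN, start + MAX_LEN):
            if g[t] > min_peak and g[t] > g[t - 1] and g[t] >= g[t + 1]:
                cut = t                # first local maximum of G in range
                break
        bounds.append(cut)
    bounds.append(n)                   # the tail is kept as one final segment
    return bounds
```

For a test signal that is flat for 150 samples and then alternates between +1 and -1, the sketch places a boundary exactly at the change point (sample 150), with forced boundaries where the maximum segment length is reached. The final segment may be shorter than the minimum length, which a real implementation would have to handle.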
The selection of good features is an important element in designing pattern recognition systems. If the choice of features is not successful, then the different objects that should be separated with the classifier do not form clear clusters or groups of clusters, or the clusters which are formed do not differentiate the objects in the way the designer intended. The selection of the final features is often a time-consuming process done manually by examining the results obtained with each feature set. In the Tampere system, a set of features is used that could describe the time-domain properties of the signal better than earlier systems using autoregressive model coefficients or related parameters. The set bears some resemblance to that introduced by Bankman and Gath (1987) and comprises the following:

- average amplitude of the segment
- variability of the amplitude
- greatest positive value in the segment
- greatest negative value in the segment
- average difference between two consecutive samples
- average value of the second derivative
- variability of the second derivative
- mean amplitude of the delta band (0.5-3.5 Hz)
- mean amplitude of the theta band (3.5-8.5 Hz)
- mean amplitude of the alpha band (8.5-12.5 Hz)
- mean amplitude of the sigma band (12.5-17.0 Hz)
- mean amplitude of the beta band (17.0-25.0 Hz)

In classification of the EEG signal, the feature vector of each new segment is compared to the feature vectors of the elementary classes in the library. These model vectors have been obtained in the training phase of the system. The Euclidean distance measure is used to determine which of the signal models is closest to the newly generated feature vector. The segment is assigned to the class that is closest to it. To ensure the most accurate classification, the number of classes has to be set quite high. The number of elementary classes in the present library is 370.
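The twelve-element feature set listed above might be computed for one segment roughly as follows. This is a sketch under stated assumptions: the paper does not say how "variability" was measured (standard deviation is used here), whether absolute values were taken for the averages, or how the band amplitudes were obtained (an FFT of the segment is used here, which may differ from the original preprocessor). The name `segment_features` is hypothetical.

```python
import numpy as np

FS = 50  # sampling rate in Hz
# Band edges in Hz: delta, theta, alpha, sigma, beta (from the feature list)
BANDS = [(0.5, 3.5), (3.5, 8.5), (8.5, 12.5), (12.5, 17.0), (17.0, 25.0)]


def segment_features(segment, fs=FS):
    """Return a 12-element feature vector for one EEG segment."""
    x = np.asarray(segment, dtype=float)
    d1 = np.diff(x)          # first difference (consecutive samples)
    d2 = np.diff(x, n=2)     # second difference (discrete second derivative)
    time_features = [
        np.mean(np.abs(x)),   # average amplitude
        np.std(np.abs(x)),    # variability of the amplitude (assumed: std)
        np.max(x),            # greatest positive value
        np.min(x),            # greatest negative value
        np.mean(np.abs(d1)),  # average difference between consecutive samples
        np.mean(np.abs(d2)),  # average value of the second derivative
        np.std(d2),           # variability of the second derivative
    ]
    # Band amplitudes from the magnitude spectrum (assumed method)
    spectrum = np.abs(np.fft.rfft(x)) * 2.0 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band_features = []
    for low, high in BANDS:
        in_band = (freqs >= low) & (freqs < high)
        band_features.append(spectrum[in_band].mean() if in_band.any() else 0.0)
    return np.array(time_features + band_features)
```

With segments as short as 0.48 sec (24 samples at 50 Hz), the FFT has a resolution of only about 2 Hz, so each band is represented by very few frequency bins; this is one reason a filter-based estimate might be preferred in practice.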
The large number of classes is partly due to the fact that, in the training phase of the program, different types of phenomena were sometimes partly mixed in the same class, so several classes had to be split to enable more reliable classification. There are still a few classes where a sharp transient may appear in the middle of normal-looking activity. This is one possible source of error in the classification, but its effect seems to be minor.
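The nearest-class rule described above is simple to illustrate. The sketch below assumes the class library is stored as a matrix with one model vector per row; the function name and the tiny example library are hypothetical, and the paper does not say whether features were scaled before distances were computed.

```python
import numpy as np


def classify_segment(features, class_library):
    """Return (class index, distance) of the closest model vector.

    class_library has shape (n_classes, n_features); in the paper the
    library holds 370 elementary classes obtained during training.
    """
    features = np.asarray(features, dtype=float)
    library = np.asarray(class_library, dtype=float)
    distances = np.linalg.norm(library - features, axis=1)  # Euclidean
    best = int(np.argmin(distances))
    return best, float(distances[best])


# Hypothetical 3-class library with 2 features per class
library = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
index, distance = classify_segment([0.9, 1.2], library)
```

Because Euclidean distance weights every feature equally, features with large numeric ranges would dominate unless they are normalized; whether the Tampere system did this is not stated.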
Methods
To evaluate the reliability of different epilepsy analysis methods, 12 EEG samples lasting 30 min each from 6 epilepsy patients in Vaajasalo Epilepsy Hospital were digitized. Samples were taken during routine intensive monitoring, two samples per patient, so that one corresponds to the waking state and the other to sleep. The EEG samples were then analysed by both analysis systems using the default settings.

Two experienced EEG specialists scored the samples blindly by means of a computer program developed for the purpose. The program displayed the files in 7 sec segments. Using a mouse, it was possible to mark in a file every spike considered epileptic and also the start and end points of epileptic bursts. From these markings the program composed a file which contained the starting and ending point of each epileptic segment. While scoring, the EEG specialists had no information about the results of the computer analysis or any clinical data concerning the patients. First both scorers analysed the files separately, and marked epileptic bursts and spikes on them. Then they went through the files together and re-evaluated those parts in which their markings differed, using another computer program developed for that purpose. The result was a consensus file, which contained the starting and ending time of each epileptic spike and burst. This file was used as the reference in evaluating the performance of the epilepsy analysis systems of both Gotman and Tampere.

An epileptic phenomenon was considered to be detected by the computer system if the detection by the computer and that of the human scorer lay within a 2 sec window, so each event was compared on an individual basis. From the comparison the following ratios were calculated. The first was sensitivity, calculated by dividing the number of computer-recognized epileptic phenomena by the number of epileptic phenomena in the consensus file. If every single epileptic spike and burst had been recognized, that ratio would have been 100%. The second ratio was specificity, calculated by dividing the number of "real" (correct positive) epileptic phenomena by the number of all computer recognitions. If all phenomena found by the program in question had been truly epileptic, the ratio would have been 100%. Both ratios were calculated not only for the computer programs but also between the two human scorers.
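The event-by-event comparison can be sketched as below. The matching rule is an assumption: the text only says that the computer detection and the scorer's marking must lie within a 2 sec window, which is interpreted here as a time gap of at most 2 sec between the two intervals. The function names and example events are hypothetical. Note that the ratio the paper calls specificity (correct detections divided by all detections) is what is usually called positive predictive value.

```python
def gap(a, b):
    """Time gap in seconds between intervals a and b (0.0 if they overlap)."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))


def evaluate(reference, detections, tolerance=2.0):
    """Return (sensitivity, specificity) as defined in the Methods.

    reference  -- (start, end) times of events in the consensus file
    detections -- (start, end) times of events reported by a program
    """
    hit_refs = sum(
        any(gap(r, d) <= tolerance for d in detections) for r in reference
    )
    true_dets = sum(
        any(gap(r, d) <= tolerance for r in reference) for d in detections
    )
    sensitivity = hit_refs / len(reference) if reference else float("nan")
    specificity = true_dets / len(detections) if detections else float("nan")
    return sensitivity, specificity


# Hypothetical example: 3 consensus events, 4 computer detections (seconds)
consensus = [(10.0, 12.0), (30.0, 30.1), (50.0, 55.0)]
computer = [(11.0, 11.5), (33.0, 33.2), (70.0, 71.0), (54.0, 56.0)]
sens, spec = evaluate(consensus, computer)
```

In this example two of the three consensus events are found and two of the four detections are correct. The same function could be applied to one human scorer against the other to obtain the inter-scorer ratios.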
Nature of the samples

Two of the files contained well-defined 3/sec spike and slow wave bursts. Five of the files contained mainly fairly low voltage spikes and sharp waves. One file contained no epileptic activity according to the consensus score. The rest of the files contained mainly large amplitude sharp waves, sharp wave bursts or almost continuous complex epileptic activity. The detailed contents of the files are presented in Table I.
TABLE I
Nature of the samples used in the study. Percentage of spikes means the percentage of single spikes among all epileptic phenomena.

File  Main type of EEG                                                 Percentage of spikes
1a    Spike-slow wave bursts                                           28
1b    Single spike-slow waves, short bursts                            72
2a    Spikes, a lot of muscular artefacts                              100
2b    Focal spikes, fairly large amplitude                             100
3a    Small amplitude sharp waves                                      56
3b    Small amplitude spikes                                           87
4a    A lot of spikes and spike bursts                                 10
4b    Very slow activity, fairly large amplitude spikes                79
5a    A lot of small amplitude spikes                                  86
5b    Small amplitude spikes, a lot of "spiky" artefacts, technical?   86
6a    A lot of artefacts, no clear epileptic phenomena                 -
6b    Very small amplitude focal spikes                                93

TABLE III
[Bar chart: sensitivity (%) of the Tampere and Gotman systems for files 1a-6b. The per-file values could not be recovered from the extracted text.]
Results

The results are presented in Tables II and III. One of the analysed files contained no epileptic activity according to the consensus analysis and thus was not included in the final results. The Tampere system was very reliable in those files which contained clear epileptic spike-slow-wave bursts, the sensitivity being up to 97%, and specificity was also high, 78%. In the case of the ill-defined separate sharp waves and spikes, the performance was not as good, as the Tampere system failed to find many of them at all, and the Gotman system was able to find only a few. In files which included larger amplitude spikes or spike bursts, the performance of both analysis systems was better. On average the sensitivity of the Tampere system was 31% and of the Gotman system 17%. Specificity for the Tampere system was on average 33% and for the Gotman system 49%. The greatest difference in specificity between the systems was in the files containing small amplitude sharp waves, for example file 3a, as the Tampere system performance was poorer in those. In the other files the performances were about the same.

The agreement rate of the scorers when compared to the consensus results was 84% for one scorer and 91% for the other. Expert opinion differed most in the case of files containing spike bursts and spikes. This was partly caused by considerable variation in the beginning and ending points of the bursts according to the scorer; in most cases this was because one scorer marked an epileptic burst with a period of lower voltage as a single segment and the other as two different segments. When the scores were compared with each other a considerable inter-observer disagreement was seen, as indeed in other studies. This is because marking the exact start and end points of every epileptic segment and finding every occurrence of spikes in the EEG is not something included in the normal practice of a clinical EEG interpreter.

TABLE II
[Bar chart: specificity (%) of the Tampere and Gotman systems for files 1a-6b. The per-file values could not be recovered from the extracted text.]

Discussion
The automatic detection of epileptic activity liberates the physician from evaluating large amounts of data in the intensive monitoring of epilepsy. The development of new, more powerful computers has made it possible to use complex mathematical on-line analysis methods with EEG recording. However, no automatic method developed so far has been able to match or exceed the performance of the human analyser. The main reason for this is that the analysis systems used cannot usually make decisions in a larger context, for example the record as a whole. The long-term aim in developing new, better EEG classification systems should be a "holistic" approach; in other words, the system should mimic the human analyser more closely than the systems developed so far.

The results obtained in this study were acquired using both computer systems with their default settings. Altering those settings would probably change the sensitivity and specificity results. However, by using the default settings it was possible to get an overview of the relative performance of the two systems.

The validation of an automatic analysis method is a difficult methodological problem and there are no generally accepted principles available. A method was chosen in which analysis of the record was made by two experienced EEG specialists who later made a consensus scoring. This method was considered better than that used by Gotman et al. (1978), which had only two independent scorers without any consensus. That study found extremely good agreement between the computer and the human scorers. However, the samples used were very short and were selected to be artefact-free. Several other attempts have been made to validate automatic epilepsy analysis methods, but they also used segments only 1 or 2 min in length (Birkemeier et al. 1978; Fischer et al. 1980; Guedes de Oliveira et al. 1983; Jayakar et al. 1989) or segments edited from a comparatively artefact-free period of light sleep (Hoestetler et al. 1992). The samples used in the present study were taken from everyday clinical EEGs with all their physiological and technical artefacts. For that reason, the calculated agreement percentages are not comparable to those in earlier studies.
A later study by Gotman (1990) defined missed detections as those in which the patient pushed the alarm button, but the analysis program did not recognize an epileptic segment. In that study there was no allowance for possible epileptic events which went unnoticed by both the computer and the patient. In view of the present study, there may be occurrences of epileptic activity which are lost in this way.

Low-voltage sharp waves were difficult for both systems. By definition, the duration of these waves was longer than the duration of a spike, and the spike recognition algorithm used spike length according to the definition. It is arguable whether these sharp waves should be included in the analysis, but they were classified as epileptic by both scorers.

When compared to the Gotman analysis system, the system developed in Tampere was found to be more sensitive to epileptic events, especially in the case of epileptic bursts. However, the Tampere system had more false positive detections, mainly due to its so far poorly working spike detection. In clinical practice, where as many epileptic phenomena as possible have to be found, it is important that no or very few epileptic bursts go undetected. Even though the amount of data is larger due to wrong detections, it is still reduced compared to the original amount, making analysis easier.

Apart from its inability to recognize spikes well, the most difficult problem in the Tampere system, as in others developed so far, seems to be susceptibility to artefacts due to muscle activity or technical difficulties. These artefacts may resemble epileptic wave forms to a great extent and may in some cases deceive even the human observer for a short time. To eliminate such false positive findings from automatic analysis systems, the computer program should be able to "see" in a larger context and to assimilate information contained in several channels. Recently Gotman introduced a new, more advanced version of his spike detector (Gotman and Wang 1991). In it the spike detection is sensitive to the state of the EEG, and is able to identify some artefacts typical of each state (eye blinks in wakefulness and so on). More recently they showed the false detections to be significantly reduced, and even the true detections were increased, as it was possible to lower the detection threshold in some EEG states (Gotman and Wang 1992). However, as it was not possible to measure the absolute sensitivity of the method, it is not possible to say whether the sensitivity of this system has improved. In the near future a practical development goal for the Tampere system is also a more advanced, context-sensitive version of the burst and spike recognition system, with special emphasis on reliable spike recognition.

This study was supported by the Academy of Finland.
References

Bankman, I. and Gath, I. Feature extraction and clustering of EEG during anaesthesia. Med. Biol. Eng. Comput., 1987, July: 474-477.
Birkemeier, W.P., Fontaine, A.B., Celesia, G.G. and Ma, K.M. Pattern recognition techniques for the detection of epileptic transients in EEG. IEEE Trans. Biomed. Eng., 1978, 25: 213-217.
Fischer, G., Mars, N.J.I. and Lopes da Silva, F.H. Pattern recognition of epileptiform transients in the electroencephalogram. Progr. Rep. Inst. Med. Phys., 1980, 7: 22-31.
Gotman, J. Automatic detection of epileptic seizures in the EEG. Electroenceph. clin. Neurophysiol., 1982, 54: 530-540.
Gotman, J. Seizure recognition and analysis. In: J. Gotman, J.R. Ives and P. Gloor (Eds.), Long-Term Monitoring in Epilepsy. Electroenceph. clin. Neurophysiol., Suppl. 37. Elsevier, Amsterdam, 1985: 133-145.
Gotman, J. Automatic seizure detection: improvements and evaluation. Electroenceph. clin. Neurophysiol., 1990, 76: 317-324.
Gotman, J. and Wang, L.Y. State-dependent spike detection: concepts and preliminary results. Electroenceph. clin. Neurophysiol., 1991, 79: 11-19.
Gotman, J. and Wang, L.Y. State-dependent spike detection: validation. Electroenceph. clin. Neurophysiol., 1992, 83: 12-18.
Gotman, J., Gloor, P. and Schaul, N. Comparison of traditional reading of the EEG and automatic recognition of interictal epileptic activity. Electroenceph. clin. Neurophysiol., 1978, 44: 48-60.
Guedes de Oliveira, P.H.H., Queiroz, C. and Lopes da Silva, F.H. Spike detection based on a pattern recognition approach using a microcomputer. Electroenceph. clin. Neurophysiol., 1983, 56: 97-103.
Hoestetler, W., Doller, H. and Homan, R. Assessment of a computer program to detect epileptiform spikes. Electroenceph. clin. Neurophysiol., 1992, 83: 1-11.
Jayakar, P., Patrick, J.P., Shwedyk, E. and Seshia, S.S. Automated rule based graded analysis of ambulatory cassette EEGs. Electroenceph. clin. Neurophysiol., 1989, 72: 165-175.
Värri, A., Neuvo, Y., Loula, P. and Heikkilä, H. Computerized classification of long-term recordings of epileptic EEG. In: M. Mäkelä, S. Linnainmaa and E. Ukonen (Eds.), Proceedings of STEP. Finnish Artificial Intelligence Symposium, Helsinki, 15-18 August, 1988: 53-62.