Electroencephalography and Clinical Neurophysiology, 1977, 42: 267--274
© Elsevier/North-Holland Scientific Publishers Ltd
Technical contribution
ON-LINE COMPUTER REJECTION OF EEG ARTIFACT *
A.S. GEVINS, C.L. YEAGER, G.M. ZEITLIN, S. ANCOLI and M.F. DEDON
EEG Systems Group, Langley Porter Neuropsychiatric Institute, University of California, San Francisco, Calif. 94143 (U.S.A.) (Accepted for publication: June 7, 1976)
* Supported in part by U.S. Public Health Service Grant NS 10471.

Automated detection and elimination of the wide variety of EEG artifact is essential to the development of practical EEG analysis systems which can be used in clinical and experimental laboratories. The purpose of this report is to describe simple frequency domain procedures which detect the occurrence of gross artifact caused by head and body movements, and muscle (EMG) or eye-movement (EOG) potentials. An objective evaluation of the performance of these procedures is also presented.

Although various techniques for detecting and/or eliminating the influence of artifact-contaminated EEG have been reported, none has proven satisfactory for routine use in the clinical or experimental laboratory. Reported techniques have shown inadequate detection performance, and have either required individual manual adjustment or have been based on inflexible decision criteria. Additionally, systematic evaluation of detection performance has not been undertaken.

In the simplest of these techniques, a computer program or special-purpose device is used to determine whether the incoming signal exceeds a fixed voltage. With individual threshold adjustment, this procedure adequately detects large EOG and movement artifacts in normal waking EEGs, but is less effective when used on abnormal or sleep EEGs. Girton and Kamiya (1973) electronically subtracted EOG artifact from the EEG using a simple circuit. However, extra electrodes and individual adjustment were required. In automated EEG classification paradigms based on orthonormal transforms followed by linear discriminant analysis, Bishop and Wilson (1972) reported that artifact-contaminated EEG could perhaps be distinguished from normal and paroxysmal EEGs. While this off-line methodology would likely increase the difficulty of the classification problem, it merits further investigation as a means of supplementing on-line artifact detection.

Gotman et al. (1973, 1975) attempted to eliminate the confounding effects of eye-movement artifact in patients with supratentorial brain lesions by assigning lower weights to features computed from low-frequency spectral coefficients of the frontal channels. An attempt was also made to compensate for muscle potential artifact by subtracting a function of the spectral intensities in the 30--50 c/sec range from spectral intensities in the beta band. It was concluded that these techniques were not adequate to prevent distortion of spectral features characterizing the degree of EEG abnormality.

Matoušek and Petersén (1973) formed an index of possible artifact contamination to qualify the computer's decision of the degree of abnormality of EEGs recorded from patients with supratentorial brain lesions. Separate age-dependent quotients were formed for each patient, one with, and one excluding, spectral intensities in the 1.5--3.5 c/sec and 17.5--25 c/sec ranges. Disagreement between the two sets of quotients as to amount of abnormality was associated with artifact contamination. This method has the advantage of not requiring individual calibration, but is limited to applications in which the EEG abnormality displays a distinct spectral signature.

Viglione (1975) individually set fixed thresholds of the spectral intensity of a narrow band centered at 0.5 c/sec and of a wide band extending from 35 to 50 c/sec, resulting in the elimination of most of the interictal movement and muscle potential artifact from long-duration recordings telemetered from patients with uncontrolled grand mal seizures. In the context of an intensive study of the nonparoxysmal background EEG of a relatively small number of subjects, use of manual threshold setting procedures was not an undue hardship.

During an earlier phase of systems development, we used manual procedures to reject artifact during real-time spectral analysis (Gevins and Yeager 1972, 1975).
Artifact not caught by the attending EEG technician was eliminated with a data-editing program prior to multivariate analysis. Although this was a functional interim procedure, the excessive amount of attention required motivated the development of automated artifact rejection algorithms. These algorithms, described below, function in an on-line, real-time, spectral and transient analysis and display system, ADIEEG, described elsewhere (Gevins et al. 1975). While we feel the procedures described below are an improvement on previous methodologies, we do not wish to imply that, in their current form, they are adequate to entirely replace human discrimination of artifact in the wide variety of normal and abnormal EEGs.
Methods
Spectral analysis
Detailed descriptions of our spectral analysis implementation have been previously reported (Gevins et al. 1975), and will not be repeated here. Briefly, following lowpass filtering, removal of non-zero mean, and application of a cosine bell data window, two channels of EEG data at a time are Fast Fourier transformed. Typically, 0.5--2 sec of data, sampled at 64, 128, or 256 samples/sec, are transformed at a time. Non-overlapping periodograms are computed and ensemble-averaged to form estimators of auto- and cross-spectral intensity and pairwise coherence, which are then graphed and stored on digital magnetic tape. The various analysis tasks are interleaved by a multi-tasking real-time executive, allowing spectral and transient analysis and display to proceed in real time for 16 EEG channels with usable bandwidth from 1 to 25 c/sec. For the results reported below, 8 channels of EEG were lowpass-filtered at 50 c/sec (40 dB/octave rolloff), sampled 128 times/sec with 11-bit accuracy, and Fourier transformed once/sec.
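The auto-spectral part of the steps summarized above (mean removal, cosine bell window, Fourier transform, ensemble averaging of non-overlapping periodograms) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the ADIEEG implementation: the function names, the use of a full-length Hann taper as the cosine bell, and the power normalization are all assumptions, and lowpass filtering is taken to have been done before sampling.

```python
import numpy as np

FS = 128          # samples/sec, the rate used for the reported results
WINDOW_SEC = 1.0  # one Fourier transform per second of data


def periodogram(segment, fs=FS):
    """Periodogram of one data window (hypothetical helper)."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()                      # removal of non-zero mean
    w = np.hanning(x.size)                # cosine bell; full Hann taper is an assumption
    spectrum = np.fft.rfft(x * w)
    # Normalization by window energy is one common convention, assumed here.
    power = np.abs(spectrum) ** 2 / (fs * np.sum(w ** 2))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, power


def averaged_spectrum(signal, fs=FS, window_sec=WINDOW_SEC):
    """Ensemble average of non-overlapping periodograms."""
    n = int(round(fs * window_sec))
    n_windows = len(signal) // n
    if n_windows == 0:
        raise ValueError("signal is shorter than one analysis window")
    powers = []
    for k in range(n_windows):
        freqs, p = periodogram(signal[k * n:(k + 1) * n], fs)
        powers.append(p)
    return freqs, np.mean(powers, axis=0)
```

With 1 sec windows at 128 samples/sec the frequency resolution is 1 c/sec, so band intensities such as "under 1 c/sec" or "34--44 c/sec" would be obtained by summing the corresponding bins of each periodogram.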
Spectral artifact detection algorithms
From an initial, artifact-free segment of EEG (typically ten 1 sec data windows), estimates of the mean and standard deviation of spectral intensity are computed for each channel in the frequency bands associated with three major types of artifact: (1) head and body movements, perspiration and low-frequency instrumental artifact (under 1 c/sec); (2) high-frequency artifact including gross EMG (34--44 c/sec); and (3) EOG (under 4 c/sec, frontal channels only). If artifact occurs during the initial calibration, the EEG technologist can restart the calibration epoch by pressing a button on the remote analysis control box.

For each channel and artifact type, an artifact threshold is set at N standard deviations above the baseline mean, N being a parameter which may be altered if necessary via the interactive graphics terminal (Gevins and Yeager 1975). Detection performance is, in general, apparently not a sensitive function of N over ranges as large as 2:1. An N of 25 was found to be suitable in informal pilot studies; this value was used for all cases in the current study. Since artifact-free and artifact-contaminated EEG signals comprise quite different populations, it is not surprising that the threshold is so many standard deviations above the mean.

Following the initial calibration epoch, spectral intensities for the various channels and artifact types are compared with their respective thresholds after each Fourier transform. If any threshold is exceeded, data from all channels are discarded until the appropriate value falls below its threshold. The computer's digital interface activates both the polygraph marker channel (Fig. 1) and a light on the remote analysis control box to indicate detection of artifact. At any point during the analysis, the EEG technologist may override the system's decision by pressing an "artifact-on" or "artifact-off" pushbutton on the remote analysis control box, each override raising or lowering the threshold by 10%. For the results reported below, such supervision was not used. Several cases with poor detection performance were re-run with supervision and are separately described in the Discussion.

Fig. 1. Polygraph tracings with system's detections of artifact indicated on marker channel at bottom of each excerpt. Note delay between artifact onset and system's detections resulting from time needed to Fourier transform 1 sec of data. Upper: Normal EEG with eye-movement artifact in frontal channels. Middle: Normal EEG contaminated by muscle potentials. Bottom: Diffusely slow, abnormal EEG contaminated by motion and muscle potential artifact.

Case selection and scoring
Thirty-five 3 min, 12-channel, artifact-contaminated recordings were randomly selected from the EEG Systems Group's "Clinical EEG Data Base", comprising 15 normal and 20 abnormal EEGs of various types (space-occupying lesions, seizure disorders, etc.). Of the 12 channels available, the following 8 were used in this study: F7--F3, F7--T3, T3--T5, P3--O1, F8--F4, F8--T4, T4--T6 and P4--O2. Polar-frontal channels (Fp1 and Fp2) were not selected because detection of the often continuous eye-movement potentials at the polar-frontal locations would, in many cases, have resulted in the deletion of the entire recording. In order to find four scorers with high agreement, 10 of the 35 recordings were individually scored for isolated or combined low-frequency, muscle potential and eye-movement artifact by two electroencephalographers and four EEG technologists. Page numbers written on continuous, 1 in. wide paper strips were aligned with corresponding numbers on the polygraph records. An onset and offset mark was then placed on the paper strip each time an artifact was identified. Isolated events shorter than approximately 200 msec, such as muscle spike potentials, were not scored since their discrimination from sharp transients of cortical
TABLE I
Evaluation of automatic artifact detection.

                                                                                            No. of events   % of total
A. System vs. consensus of scorers (N = 35)
   Events found by consensus                                                                     229           100
   Events found by system (hits)                                                                 149            65
   Events missed by system (misses)                                                               80            35
   Total events found by system                                                                  266           100
   Total events found by system not found by consensus of scorers (total false positives)        117            44
   Events found by system not found by any scorer (complete false positives)                      71            27
   Events found by system and found by one scorer (incomplete false positives)                    46            17

B. System vs. consensus of scorers on normal records only (N = 11)
   Events found by consensus                                                                      74           100
   Events found by system (hits)                                                                  45            61
   Events missed by system (misses)                                                               29            39
   Total events found by system                                                                   78           100
   Total events found by system not found by consensus of scorers (total false positives)         33            42
   Events found by system not found by any scorer (complete false positives)                      28            36
   Events found by system and found by one scorer (incomplete false positives)                     5             6

C. Average of individual scorers vs. consensus (N = 35)
   Total events                                                                                  229           100
   Total events found (hits)                                                                     196            86
   Total events missed (misses)                                                                   33            14
   False positives                                                                                76            28
origin is performed by the parallel sharp transient analysis logic (Gevins et al. in press). In order to form a consensus definition of artifact events, the remaining 25 records were then scored by three of the four scorers (from amongst the original six) with the best agreement, as determined by informal comparison of their paper strips.

Results
The total of 35 cases were then analyzed by the ADIEEG system, which activated the polygraph's marker channel every time artifact was detected (Fig. 1). The system's detections were then compared to the consensus-defined artifact events. Since uncertainty in the time delay of the system's response to artifact (delay = uncertainty of time of occurrence during the 1 sec window used for Fourier transform + (amount of time required to transform one channel × number of the channel on which the event occurred)) caused disagreement at the edges of artifact events, system performance was evaluated on an event-by-event, rather than second-by-second, basis (Table IA). The system found 149 events (65% of the total number of events found by the consensus), missed 80 events (35% of the total events found by the consensus), and found 117 events not found by the consensus (44% of the total number of events found by the system). A further analysis of these false positives indicated that 71 events had not been marked as artifact by any of the scorers (complete false positives), while 46 events had been marked by one scorer (incomplete false positives). Thus 27% of the events detected by the system were complete false positives.

Detection performance was not significantly different (0.01 level) between normal and abnormal EEGs (Table IB). Overall system performance did not differ significantly (0.05 level) from that of an "average scorer" (each scorer was rated against the consensus, and the average was taken) (Table IC). The "average scorer" found 196 events (86% of the total number of events found by the consensus), missed 33 events (14% of the total) and made 76 false detections (28% of the total number of events found by the "average scorer").

The data were re-scored eliminating the 6 cases with the largest number of false positive detections (Table II and Discussion). These cases all had high-amplitude intermittent activity of cortical origin which did not occur during the calibration period. The percentage decrease in complete false detections was more than twice as great as the percentage changes in hits and misses. In order to determine whether performance varied as a function of event length, a further
TABLE II
Cases with largest number of false detections (N = 6)

Case  Type                                                                                    Hits  Misses  False positives  Reason
3     Abnormal (diffuse, intermittent slow wave bursts)                                         9      1           9         Calibration on normal background
6     Abnormal (diffuse, intermittent slow wave activity)                                      11      2           8         Calibration on normal background
13    Abnormal (diffuse, intermittent slow wave activity)                                       8      0           9         Calibration on normal background
17    Normal (sleep)                                                                            1      0          13         Calibration on waking
18    Normal, psychotropic medication (diffuse, intermittent, high-amplitude beta spindles)    8      8           8         Calibration on beta-free segment
32    Normal (sleep)                                                                            2      0          11         Calibration on light sleep; scored on deep sleep

System performance eliminating these cases (N = 29)
   Total events found by consensus                                                           179    100%
   Total events found by system (hits)                                                       110     61%
   Total events missed by system (misses)                                                     69     39%
   Total events found by system not found by consensus (total false positives)                59     35%
TABLE III
Performance as a function of artifact event length (6 cases with largest number of false positives not included) (N = 29).

                         200 msec < X < 3 sec        3 < X < 10 sec             X > 10 sec
                         No. events  Percentage      No. events  Percentage     No. events  Percentage
Hits                         95          57               9          32              5          71
Misses                       72          43              19          68              2          29
Total false positives        57          34               3          11              2          29

Percentages for total false positives are of total events found by system.
analysis was performed on the remaining cases by classifying the events into periods of 200 msec to 3 sec, 3 sec to 10 sec, and greater than 10 sec (Table III). The system performed best on those events greater than 10 sec in duration, and poorest on events between 3 and 10 sec in duration.
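The event-by-event comparison behind Tables I to III can be illustrated with a short Python sketch. The matching rule is an assumption: an event counts as found if a system detection overlaps a consensus event within a tolerance that absorbs the detection delay, since the report does not state the exact rule. The function names, the tolerance value and the handling of the class boundaries are likewise hypothetical.

```python
def overlaps(a, b, tolerance=0.0):
    """True if intervals a and b (onset, offset in sec) overlap within tolerance."""
    return a[0] <= b[1] + tolerance and b[0] <= a[1] + tolerance


def score_events(system, consensus, tolerance=1.0):
    """Count hits, misses and false positives on an event-by-event basis.

    tolerance (sec) is an assumed allowance for the system's detection delay.
    """
    hits = sum(
        any(overlaps(c, s, tolerance) for s in system) for c in consensus
    )
    false_positives = sum(
        not any(overlaps(s, c, tolerance) for c in consensus) for s in system
    )
    return {
        "hits": hits,
        "misses": len(consensus) - hits,
        "false_positives": false_positives,
    }


def duration_class(event):
    """Assign an event to one of the three length classes of Table III."""
    duration = event[1] - event[0]
    if duration < 0.2:
        return None                      # events under about 200 msec were not scored
    if duration < 3.0:
        return "200 msec to 3 sec"
    if duration <= 10.0:                 # boundary handling is an assumption
        return "3 sec to 10 sec"
    return "greater than 10 sec"
```

For example, with consensus events at 1.0--2.5, 10--25 and 40--45 sec and system detections at 2--3, 12--26 and 60--61 sec, this rule gives 2 hits, 1 miss and 1 false positive.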
Discussion
The overall poorer performance of the system as compared to that of the average of the individual scorers (65% vs. 86% hits, 44% vs. 28% total false positives) is in part accounted for by the restricted definition of artifact events used in the evaluation (the decisions of two of the original six scorers were not used at all because of relatively high disagreement with the consensus of the other four scorers). The cases with the greatest number of false detections (Table II) all had high-amplitude intermittent activity of cortical origin which was not included in the calibration period. The 2 sleep cases should logically not have been included in this study of artifact detection in waking EEGs.

Since the 35 randomly selected cases did not happen to include paroxysmal sharp transient activity (spikes and sharp waves), 4 additional cases with these patterns were later chosen and separately analyzed. When sharp transients were included in the initial calibration epoch, the system was able to detect artifact episodes in the context of almost continuous high-amplitude polyspikes with only a small number of false detections. In a case with infrequent, low-amplitude sharp transients, the system was able to distinguish muscle potentials from isolated sharp transients, but not from long trains of them. The use of a single, artifact-free calibration period to set detection thresholds is therefore the aspect of this methodology responsible for most of the false detections of intermittent and paroxysmal activity of cortical origin. Integration of the decisions of the parallel detection and analysis subsystems for artifact, sharp transients (Gevins et al. 1975, 1976), bursts (under development), and drowsiness (Gevins et al. submitted for publication) will hopefully alleviate this weakness.

The cases with the greatest number of misses had low-amplitude, very low-frequency eye-movement artifact. It was found possible to improve performance to some extent in these cases by lowering threshold levels with the on-line digital interface pushbuttons. For example, in a normal EEG with this type of artifact, the number of events found by the system rose from 20 to 31 with an accompanying reduction in the number of misses from 12 to 1. The number of incomplete false detections (found by one scorer) rose from ~ to 7.

The time delay inherent in the use of non-overlapping orthonormal transforms was also responsible for many misses. Short, moderate-amplitude artifact events which bridged adjacent 1 sec analysis windows were distributed across the two windows, often causing the corresponding spectral intensities to fall below the detection threshold. Use of overlapping transforms would eliminate this problem, but would increase computational costs. Use of a forward and backward refractory period of from 0.5 to 1.0 sec would reduce, but not eliminate, this problem, and is otherwise necessary to remove "edge-effect" contamination due to artifact having a relatively gradual onset and termination.

The artifact-detection algorithms presented here are thus adequate for first-pass elimination of the most common artifacts encountered in research employing subjects with normal EEGs (Fig. 2).
[Fig. 2 panels. Upper panel title: "EEG Systems Group, without artifact reject, sqrt intensity spectrum, I = 1.3 µV/sqrt Hz, 10 FFT averaging". Lower panel title: "EEG Systems Group, auto artifact reject, sqrt intensity spectrum, I = 1.3 µV/sqrt Hz, 10 FFT averaging". Channels: F8--F4, F7--F3, F4--C4, F3--C3, C4--P4, C3--P3, P4--O2, P3--O1, F8--T4, F7--T3, T4--T6, T3--T5.]

When used with a modicum of human supervision, their performance is acceptable for most research studies employing subjects with abnormal EEGs, or for routine clinical recordings. A reduction in the number of misses and false detections can be expected when the outputs of detection subsystems for artifact, sharp transients, bursts and drowsiness are integrated by an "analysis executive" (Gevins and Yeager 1975) which will form final decisions concerning the presence or absence of these phenomena using higher-order features formed by combining the decisions of the separate parallel detectors.
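The thresholding rule described in Methods, together with the pushbutton override and the forward and backward refractory period discussed above, can be sketched in Python with NumPy. This is an illustrative sketch rather than the ADIEEG code: the function names are hypothetical, the input is assumed to be one band intensity per 1 sec window for a single channel and artifact type, and the refractory period is approximated as a whole number of windows.

```python
import numpy as np


def calibrate_threshold(baseline, n_sd=25.0):
    """Threshold = baseline mean + N standard deviations (N = 25 in this study)."""
    baseline = np.asarray(baseline, dtype=float)
    # Sample standard deviation (ddof=1) is an assumption.
    return baseline.mean() + n_sd * baseline.std(ddof=1)


def detect_artifact(intensity, threshold):
    """Flag every window whose band intensity exceeds the threshold."""
    return np.asarray(intensity, dtype=float) > threshold


def adjust_threshold(threshold, lower=False):
    """One pushbutton override: raise or lower the threshold by 10%."""
    return threshold * (0.9 if lower else 1.1)


def apply_refractory(flags, n_windows=1):
    """Extend each detection forward and backward by n_windows windows."""
    flags = np.asarray(flags, dtype=bool)
    out = flags.copy()
    for shift in range(1, n_windows + 1):
        out[shift:] |= flags[:-shift]    # forward: windows after a detection
        out[:-shift] |= flags[shift:]    # backward: windows before a detection
    return out
```

In the system described here, data from all channels are discarded whenever any channel and artifact type exceeds its threshold, so in practice the per-channel flags would be combined with a logical OR before the refractory extension is applied.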
Summary
Simple, on-line, frequency domain procedures to detect non-continuous artifact in the waking EEG are presented. Individual algorithms detect head and body movements, large muscle potentials and eye-movement potentials. These algorithms are implemented as program modules in an interactive, real-time, spectral and transient analysis and display system, ADIEEG, described elsewhere. The system's performance in detecting artifact-contaminated EEG in 35 normal and abnormal, 3 min, 8-channel recordings was compared with that of the consensus of three expert scorers. The system correctly detected 65% of the artifact events identified by the consensus of expert scorers. Twenty-seven percent of the detections made by the system were of events that had not been marked by any of the three scorers. This performance was not statistically different from the average of the individual scorers vs. the consensus. The largest number of false detections were of intermittent, high-amplitude events of cortical origin which did not occur during the supervised calibration period.
Résumé
Real-time computer elimination of EEG artifacts
The authors present a simple, real-time, frequency domain procedure for detecting discontinuous artifacts in the waking EEG. Individual algorithms detect head and body movements, large muscle potentials and eye-movement potentials. These algorithms are implemented as modules in an interactive, real-time spectral and transient analysis system, ADIEEG, described elsewhere. The performance of this system in detecting artifacts was tested on 35 normal and abnormal 3 min, 8-channel recordings, and compared with that of the consensus of three experts. The system correctly detects 65% of the artifact events identified by the consensus of experts. Twenty-seven percent of the detections made by the system concern events that were not marked by any of the three experts. This performance does not differ statistically from the average of the experts taken individually relative to the consensus. The largest number of false detections involved intermittent, high-amplitude events of cortical origin that had not occurred during the calibration period.

We are most grateful for the assistance of J.P. Spire, L. Hicks, G. Hammond, B. Kahey and M. Mantle, of the Moffit Hospital EEG Laboratory, University of California, San Francisco, in recording clinical EEGs and in evaluating artifact in the recordings used for this study.
References
Bishop, A.O. and Wilson, W.P. Computer analysis of clinical EEG. Electroenceph. clin. Neurophysiol., 1972, 31: 117--129.
Gevins, A.S. and Yeager, C.L. EEG spectral analysis in real time. DECUS Proc., Maynard, Mass., 1972, Sp. 71--80.
Gevins, A.S. and Yeager, C.L. An interactive developmental approach to real-time EEG analysis. In N. Burch and H.I. Altshuler (Eds.), Behavior and brain electrical activity. Plenum Press, New York, 1975: 221--263.
Fig. 2. Twelve-channel "compressed spectral array" display of the square root spectral intensity of a subject with a normal EEG. The height of any of the computer-generated characters is the equivalent of 1.3 µV/sqrt c/sec on the spectral displays. Ten 1 sec periodograms have been ensemble-averaged to produce each spectral line. In each box the frequency scale extends from 1 to 20 c/sec. Upper: EEG contaminated by movement and muscle potential artifact. Note large peaks between 1 and 4 c/sec and intermittent, "jagged" peaks in the beta range. Without considerable experience, it is not possible to determine from the spectral displays alone that the low- and high-frequency peaks are due to artifact and not brain pathology. Lower: Same data after automatic artifact rejection. Most of the low- and high-frequency spectral peaks due to artifact have been eliminated. A 500 msec to 1 sec forward and backward refractory period surrounding detection of artifact has not been used, accounting for the residual low- and high-frequency spectral peaks.
Gevins, A.S., Yeager, C.L., Diamond, S.L., Spire, J.P., Zeitlin, G.M. and Gevins, A.H. Automated analysis of the electrical activity of the human brain (EEG): A progress report. IEEE Proc., 1975, 63: 1382--1399.
Gevins, A.S., Yeager, C.L., Diamond, S.L., Zeitlin, G.M., Spire, J.P. and Gevins, A.H. Sharp transient analysis and threshold linear coherence spectra of paroxysmal EEGs. In I. Petersén and P. Kellaway (Eds.), Quantitative analytical studies in epilepsy. Raven Press, New York, 1976: 463--482.
Gevins, A.S., Zeitlin, G.M., Ancoli, S. and Yeager, C.L. Computer rejection of EEG artifact, Part 2: contamination by drowsiness. Electroenceph. clin. Neurophysiol., 1977, in press.
Girton, D.G. and Kamiya, J. A simple on-line technique for removing eye-movement artifacts from the EEG. Electroenceph. clin. Neurophysiol., 1973, 34: 212--216.
Gotman, J., Gloor, P. and Ray, W.F. A quantitative comparison of traditional reading of the EEG and interpretation of computer-extracted features in patients with supratentorial brain lesions. Electroenceph. clin. Neurophysiol., 1975, 38: 623--639.
Gotman, J., Skuce, D.R., Thompson, J., Gloor, P., Ives, J.R. and Ray, W.F. Clinical applications of spectral analysis and extraction of features from EEGs with slow waves in adult patients. Electroenceph. clin. Neurophysiol., 1973, 35: 225--235.
Matoušek, M. and Petersén, I. Automatic evaluation of EEG background activity by means of age-dependent EEG quotients. Electroenceph. clin. Neurophysiol., 1973, 35: 603--612.
Viglione, S. Final report: Validation of the epileptic seizure warning system. McDonnel Douglas Astronautics Company, Huntington Beach, Calif., 1975.