Journal of Neuroscience Methods 268 (2016) 31–42
Contents lists available at ScienceDirect
Journal of Neuroscience Methods journal homepage: www.elsevier.com/locate/jneumeth
An improved artifacts removal method for high dimensional EEG Jidong Hou ∗ , Kyle Morgan, Don M. Tucker, Amy Konyn, Catherine Poulsen, Yasuhiro Tanaka, Erik W. Anderson, Phan Luu Electrical Geodesics Inc., Eugene, OR 97401, USA
h i g h l i g h t s • • • •
An automated EEG artifact removal method is developed. A novel independent component analysis strategy is proposed and applied. Dimension reduction is achieved with permutation and resampling. Improved eye blinks removal performance is demonstrated.
a r t i c l e
i n f o
Article history: Received 18 January 2016 Received in revised form 7 April 2016 Accepted 4 May 2016 Available online 5 May 2016 Keywords: EEG ICA Artifact removal Template matching Permutation Resampling
a b s t r a c t Background: Multiple noncephalic electrical sources superpose with brain signals in the recorded EEG. Blind source separation (BSS) methods such as independent component analysis (ICA) have been shown to separate noncephalic artifacts as unique components. However, robust and objective identification of artifact components remains a challenge in practice. In addition, with high dimensional data, ICA requires a large number of observations for stable solutions. Moreover, using signals from long recordings to provide the large observation set might violate the stationarity assumption of ICA due to signal changes over time. New method: Instead of decomposing all channels simultaneously, subsets of channels are randomly selected and decomposed with ICA. With reduced dimensionality of the subsets, much less amount of data is required to derive stable components. To characterize each independent component, an artifact relevance index (ARI) is calculated by template matching each component with a model of the artifact. Automatic artifact identification is then implemented based on the statistical distribution of ARI of the numerous components generated. Results: The proposed permutation resampling for identification matching (PRIM) method effectively removed eye blink artifacts from both simulated and real EEG. Comparison with existing method: The average topomap correlation coefficient between the cleaned EEG and the ground truth is 0.89 ± 0.01 for PRIM, compared with 0.64 ± 0.05 for conventional ICA based method. The average relative root-mean-square error is 0.40 ± 0.01 for PRIM, compared with 0.66 ± 0.10 for conventional method. Conclusions: The proposed method overcame limitations of conventional ICA based method and succeeded in removing eye blink artifacts automatically. © 2016 Elsevier B.V. All rights reserved.
1. Introduction Some of the most important advances in neuroscience research and clinical diagnosis have been made with electroencephalography (EEG). However, the utility of EEG is limited due to artifacts
∗ Corresponding author at: Electrical Geodesics Inc., 500 East 4th Ave., Suite 200, Eugene, OR 97401, USA. Fax: +1 541 687 7963. E-mail address:
[email protected] (J. Hou). http://dx.doi.org/10.1016/j.jneumeth.2016.05.003 0165-0270/© 2016 Elsevier B.V. All rights reserved.
caused by eye movements, eye blinks, muscle activity, heart signals, line noise, or noise from joint recordings conducted with other imaging modalities (e.g., MRI). These artifacts superpose with the EEG data, impacting statistical and physical analysis of the brain’s contribution to the EEG signal. Numerous strategies can be used to remove artifacts from the EEG. A straightforward approach is template subtraction. Based on the assumption of that the occurrence of the artifact is identical in each instance and the EEG signal is random relative to the artifact, a pure artifact template can be created through averaging
32
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
multiple occurrences of the artifact. The template is then used to subtract each instance of the artifact from the raw signal without removing the signal of interest. However, the obvious limitation is that if the artifact is highly variable, the template does not match each instance, resulting in imperfectly cleaned data, even with several modifications and improvements to this template subtraction approach (Allen et al., 1998; Sijbers et al., 1999; Niazy et al., 2005). Regression-based approaches have been applied to correct ocular artifacts (Gratton et al., 1983; Wallstrom et al., 2004) using the electrooculogram (EOG) recorded adjacent to the eyes as the reference signal. The problem is that the EOG is not free of EEG signals, such as from the frontal lobes. Similarly, Kalman filters and multichannel recursive least squares (M-RLS) algorithm were developed to remove motion and ballistocardiogram (BCG) artifacts, using accelerometers, wire loops or electrodes on a conducting layer to generate movement-tracking reference signals (Bonmassar et al., 2002; Masterton et al., 2007; Chowdhury et al., 2014). Although these techniques have proved useful, the relationship between the reference signal and the induced artifact may not always be linear, causing difficulty for a linear filtering approach. In addition, it is difficult to acquire a reference signal for some artifacts, such as electromyographic (EMG) activity that occurs at the scalp. Blind source separation (BSS) is an approach that doesn’t have the limitations noted for the above methods. The basic idea of BSS is to decompose the raw data into components that represent cortical activity and artifacts. Once the artifact components are identified, the brain signal can be reconstructed by excluding the artifact components. BSS can be carried out in many ways, e.g., canonical correlation analysis (CCA), principle component analysis (PCA) and independent component analysis (ICA). ICA has proven effective in separating unique components, particularly with powerful rotation methods such as Infomax (Bell and Sejnowski, 1995). Each independent component consists of a waveform that describes a source activity plus a topography vector that describes how the waveform contributes to the recorded signal. The advantage of ICA over PCA is that it doesn’t require the source topography to be orthogonal, which is not a reasonable assumption of the brain’s physiological activity in general. Instead of minimizing covariance among sources, ICA aims to maximize their independence. With the assumption that artifacts are statistically independent from the ongoing EEG, spatial filters derived by the ICA algorithm have been used to remove a wide variety of EEG artifacts, such as eye blinks, eye movements and electrode artifacts (Jung et al., 2000a,b; Ille et al., 2002; Flexer et al., 2005; Iriarte et al., 2003). One practical challenge of the ICA approach to artifact removal is that manual interaction is usually required to identify artifact components following decomposition, based on either spatial topographies or temporal characteristics or both. Different components are generated for each data set, making manual selection cumbersome in addition to being a subjective process. Recently, Bartels et al. (2010) applied a support vector machine (SVM) algorithm to classify EOG and EMG components with supervised training. In addition, Mantini et al. (2007) proposed automatic identification of artifact components based on the distribution of correlation coefficients of all independent components with the reference signal that represents the artifact of interest. A more serious limitation of the ICA approach is that it requires a large number of observations (data points) to generate stable independent components. The number of data points needed is typically kC2 (Nolan et al., 2010) where k is a multiplier generally set to 25, as recommended in (Onton and Makeig, 2006), and C is the number of components. Given a 256-channel recording, for example, the amount of data required to find 256 independent components would be 25 × 2562 = 1,638,400 time points. Even at 1000 samples/s sampling rate, this would require almost 30 min of data, during which there may be significant variations in both the artifact and
the brain signal. These changes imply that the spatial stationarity assumption of ICA (Jung et al., 2000c) may be violated. The irony is that ICA then only works on the inadequate measurement set of low channel count EEG. To address this high dimensional observation hunger (data requirement demand) as well as minimize the risk of violating the stationarity assumption associated with long recordings, PCA is often performed as a data reduction step, truncating the dimensionality of the data prior to ICA (Nolan et al., 2010; Kiviniemi et al., 2003; Haufe et al., 2014). As a rule of thumb, the minimum sample size of PCA is five samples per variable (five time points per channel in the case of EEG) (Gorsuch, 1983), which is much less than the kC2 samples required for ICA. This is because PCA can be derived linearly by singular value decomposition of the correlation matrix, while ICA is derived with higher-order statistics that require more samples to achieve stable solutions. Although small data sets can be processed by using a PCA + ICA approach, truncation of dimensions risks losing important information in the data, if the information is not well captured by the orthogonal basis set defined by PCA. In the present research, we aimed to improve and optimize the performance of ICA-based artifact removal approaches for high dimensional EEG data. We employed the “divide and conquer” strategy from computer science to address the high dimensionality challenge. Specifically, the divide and conquer approach that we propose employs a permutation resampling for identification matching (PRIM) strategy that can be applied to ICA (or other BSS methods) for EEG artifact removal. Instead of decomposing all channels simultaneously, or relying on PCA for data reduction, we divide the measurement channels into subspaces. Subsets of channels are randomly selected (with replacement) and ICA is conducted on each subset. The result is stable decomposition with smaller measurement sets, each of which is a random subset of the head surface topography captured by channel sub-sampling. The assumption of the method is that the true component structure, of both brain sources and artifacts, is captured for each subset, such that the artifact components can then be subtracted accurately from each subset. To facilitate ease-of-use and remove subjective judgments associated with manual identification of artifact components, we also developed an automated method to clearly separate artifact and non-artifact ICA components. For each of the numerous components generated by PRIM, we calculate an artifact relevance index (ARI); this metric reflects a given component’s similarity with the artifact template. The resulting ARI histogram highlights the difference between artifact and non-artifact components in the form of two peaks separated by a low valley. By fitting the ARI distribution with a cubic polynomial, the lowest point of the fitted curve is used to determine automatically the threshold that separates artifact from non-artifact components. We evaluate the performance of the proposed approach with eye blink removal, using both simulated EEG as well as real EEG signals and compare the results with the PCA + ICA method.
2. Materials and methods 2.1. The PRIM workflow EEG data is acquired for a duration that can span minutes to days, depending on the nature of the study or clinical exam. PRIM operates on long data files using relatively small sequential segments, thus minimizing the risk that the stationarity assumption of ICA is violated. The workflow of the PRIM approach is shown in Fig. 1. Each segment is processed through the entire workflow independent of the other segments. Once all segments are processed, the complete data file can be reconstructed.
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
33
Fig. 1. The PRIM workflow for EEG artifact removal. (a) An EGI’s dense array sensor net is used in practice. The high dimensional EEG signal recorded is X ∈ RNe×Nt , where Ne is the number of electrodes and Nt the number of samples. (b) Permutation and resampling of EEG channels. The red spots represent subsets of channels selected randomly in different runs. Note that some channels are shared among different permutations. Correspondingly, X1 , X2 ,· · ·, Xm ∈ Rp×Nt are m subsets of the original signal X generated −1 and the source activation matrix Sm are by randomly selecting p channels out of Ne channels (p Ne). (c) Signal decomposition by ICA, where both the mixing matrix Wm calculated for each of the m subsets. The total number of independent components generated is equal to p × m. (d) Component characterization. For each of the m subsets, each p of the independent components (ICm ∈ Rp×Nt ) is characterized by an artifact relevance index (ARI), which is calculated by template-matching the component with a subset of the eye-blink template, Tm ∈ Rp×L with a fixed length of L, which is extracted with the same sub-channels used for the resampling. (e) Automatic component identification is implemented by thresholding the ARI histogram of all ICs. (f) Signal reconstruction. For each of the m subsets, cleaned EEG signals are reconstructed with non-artifact components, and artifacts to be removed are reconstructed with artifact components. The full channel signals are reconstructed by summation of all reconstructed subset channels and then dividing each channel by its corresponding iteration number. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
2.1.1. Permutation and resampling of EEG channels For a given segment of EEG data, to generate a subset of p channels from a recording of Ne channels, a random integer between 1 and Ne are drawn p times without repetition to select each of the p sub-channels. This step is repeated numerous times so that the full scalp is covered and sampled uniformly, Fig. 1b. The selection of the number of sub-channels p and iterations m will be discussed later. 2.1.2. Signal decomposition (independent component analysis) For each subset of channels extracted by the permutation and resampling step, the EEG is decomposed into components that model either cortex activity or artifact sources. Although any BSS method can be used for this step, in this study infomax ICA (Bell and Sejnowski, 1995) as implemented in MNE-Python (Gramfort et al., 2014) was employed. Specifically, with ICA, an un-mixing matrix, W, is computed to generate temporally independent components, S = WX,
(1)
where X is the EEG data matrix and S is a matrix with rows of source activity waveforms, Fig. 1c. Depending on parameters used in the permutation and subset resampling step, signal decomposition will produce hundreds to thousands of independent components for a given data set (or data segment). 2.1.3. Artifact relevance index To characterize each independent component for identification, we calculate a metric that represents the similarity of each component to an artifact template; see Fig. 1d. We refer to this as ARI of the component. In this work the eye-blink template was
Fig. 2. Butterfly view of eye blink template. Subsets of the template with randomly selected channels are used to compute the ARI that characterizes corresponding independent components generated with the same channels. Different colors represent different EEG channels (256-channels). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
generated by averaging 307 eye-blinks from three participants, using a window of 2-s (1 s before the peak of eye-blink and 1 s after). With the resulting eye-blink template (Fig. 2), each component is calculated by template-matching the component with a sub-set of the eye-blink template that is extracted with the same sub-channels used for the resampling. The ARI value is taken as the maximum correlation coefficient returned by template matching. 2.1.4. Automatic component identification Due to the bootstrapping property of PRIM, hundreds to thousands of independent components will be generated. As a result, statistical information from the distribution of ARI can be exploited to automatically identify and separate components. An example of the ARI distribution is shown in Figs. 1 e and 3 . It is clear that the non-artifact components with lower ARI are aggregated around
34
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42 Table 1 Number of epochs used for baseline spectra calculation in eyes closed and eyes open conditions. Participant
Eyes closed
Eyes open
#1 #2 #3 #4 #5
197 139 117 162 106
285 208 98 217 171
Total
721
979
2.3. Evaluation data
Fig. 3. ARI histogram of 5000 independent components was generated by PRIM for eye blink removal. In each of 500 runs, 10 channels were randomly selected for ICA. Two local peaks characterize two modes. The red triangle indicates the threshold automatically calculated to separate the two modes. The mode on the right with higher ARI value represents the distribution of eye blink components. The mode on the left with lower ARI value represents the distribution of non-artifact components. (For interpretation of the references to color in this figure legend and in the text, the reader is referred to the web version of this article.)
the left peak (*), and the artifact components with higher ARI are aggregated around the right peak (**). Based on this bimodal ARI distribution, automatic component identification is implemented by thresholding the ARI histogram at a location (indicated by the red triangle) within the valley between the two peaks (see Fig. 3). Specifically, the ARI histogram is first fitted with a cubic polynomial. Then, the lowest point of the fitted curve, representing the critical point separating artifact components from non-artifact components, is used to determine the threshold. 2.1.5. Signal reconstruction Once artifact components are identified, the corresponding activation waveforms are set to zero. For each subset of channels, corrected EEG signals can be computed as, X = W−1 S
(2)
where W−1 is the mixing matrix, and S is the matrix of source activation waveforms. The full channel EEG recording is reconstructed by summation of all reconstructed subset channels and then dividing each channel by its corresponding iteration number. Similarly, by setting activation waveforms to zero for non-artifact components, artifacts to be removed can be reconstructed, Fig. 1f. 2.2. Computation hardware/software Data processing and performance evaluation in this study were carried out with the following hardware/software configurations: -
2.66 GHz Quad-Core Intel Xeon, 7 GB 1066 MHz DDR3 ECC Mac Pro. OS X 10.10.5 Python 2.7.10 |Anaconda 2.3.0 (x86 64) MNE-Python 0.9.0 MATLAB Version: 8.0.0.783 (R2012b) EEGLAB v10.2.2.4b
2.3.1. Simulated EEG To assure that accurate reconstruction of the brain signals is possible, we conducted a simulation study in which blink artifacts were added to blink-free EEG signals to test if PRIM could reconstruct the known blink-free EEG without distortion. Resting EEG was acquired with both eyes open (including blinks) and eyes closed (without blinks) on one participant using a 256channel HydroCel Geodesic Sensor Net and Net Amps 400 Amplifier (Electrical Geodesics, Inc., Eugene, OR). All electrode impedances were kept below 70 k. Recordings were referenced to Cz. The EEGs were low-pass filtered (100 Hz) prior to being sampled at 250 samples/s with a 24-bit analog-to-digital converter. For EEG in eyes open condition, after filtering with a 1-Hz high-pass filter, bad channels were manually identified and interpolated. We identified blinks and created typical blink artifact waveforms. First, we performed automated eye blink detection on the eyes open resting EEG using morphology criteria followed by manual review to reject false detections. Eighty-two eye blinks were extracted with 5-s windows (centered around the peak of the blink) and classified into several (here we used three) groups by k-means clustering, which classifies blinks according to their spatial similarity. The averaged blink waveform of each group was calculated for all channels, preserving the spatial distribution of the eye blink across the entire sensor array for each blink group. The next step defined a 2-s segment around the peak of the eye blink to be used as the eye-blink artifact template for each group. A 10-s segment (2500 sample points) of resting EEG from the eyes closed condition was selected as the blink-free EEG. The three eye blink templates were randomly shuffled within the 10-s period to generate simulated waveforms consisting of six eye blinks. The simulated eye blink waveforms were superimposed on (added to) the blink-free EEG. A total of 10 of these simulation datasets were generated. 2.3.2. Resting EEG Using real EEG datasets naturally contaminated by eye blinks, we evaluated PRIM in terms of the similarity of the “cleaned” EEG power spectra to the average, artifact-free EEG power spectra. In order to generate the baseline spectra of resting EEG, we collected resting EEG data from 5 normal participants (4 male, 1 female, age 21–44) with both eyes open and eyes closed conditions. For each participant, several sessions were collected at 1000 samples/s on separate days, resulting in a total of 14 datasets. Each dataset consists of four 5-min recordings, two eyes open and two eyes closed. A 0.1-Hz high-pass filter was applied to remove slow activity and drift, and bad channels were replaced by interpolation. We first considered the eyes closed condition. Multiple 8-s epochs of EEG (as summarized in Table 1) were selected from the two five-minute eyes closed recordings. Each of the 8-s epochs was Fourier transformed using Welch’s method, which splits the data into eight sections of equal length (here 2 s), each with 50%
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
35
Fig. 4. Diagram of events within a single trial.
overlap and multiplied by a Hamming window of the same length. Any remaining data that cannot be included in the eight segments of equal length are discarded. The power spectral density is calculated in units of power per Hz. With 2-s sections, the power spectra had 0.5 Hz frequency resolutions. The power spectra for all 8-s epochs were averaged across participants to provide a baseline spectrum of resting EEG in eyes-closed condition. We then considered the eyes-open (blink-free) condition. Multiple 4-s epochs free of eye blinks were selected (as summarized in Table 1), and power spectra were generated with the same setup used for eyes-closed condition. The averaged power spectrum across participants provides a baseline spectrum of resting EEG in eyes-open (blink-free) condition. Finally, we considered the eyes-open (with blink) condition. We calculated the spectra of resting EEG before and after removal of eye blinks. To generate the spectra before and after eye blink removal, five 4-s eyes blink epochs were selected from each of the 14 resting EEG datasets (total 70 epochs). Power spectra were calculated using the same setup as the eyes-open (blink free) condition. 2.3.3. Event-related potential (ERP) To evaluate the usefulness of the proposed method for ERP research, we evaluated PRIM in terms of its impact on amplitude and latency of the P300 component in a Go/NoGo study with eye blink removal (PRIM) and without eye blink removal (traditional approach—segments contaminated by blinks are discarded). The task used in our ERP study was modeled closely of the traditional
Go/NoGo paradigm, where a participant must learn to associate an array of visual stimuli with specific motor responses, or refrain from making a response altogether (Newman et al., 1990). Twenty participants (10 female, 10 male, age 18–41 and nonoverlapping with participants in the resting state EEG study) were presented with 1 of 8 football defensive formations centrally on a 43 cm (diagonal) computer monitor for 1500 ms. Half of the formations were randomly selected as “go” stimuli, and the other half were designated as “no-go” stimuli. The formations were presented at random, with the restriction that a formation could not be presented twice in a row. Participants had to either press (Go), or refrain from pressing (NoGo), a key on a keypad when a formation was presented. For the go stimuli, participants had to learn to respond with the appropriate digit on the correct hand for each stimulus. The participants were given four digits to respond with (digits I and II of both hands), and each of the four go stimuli were mapped onto a specific digit. Each formation was presented on the screen for 1500 ms or until a key-press occurred. Immediately after each trial, specific feedback about performance on that trial was provided. The feedback given to the participant was designed to provide them all of the information needed to learn the response mapping. Feedback was presented for 10 s, or until the participant terminated the feedback with a button press. Upon termination of the feedback, the next trial began between 1500 and 2500 ms later (see Fig. 4). The study was segmented into 8 blocks of 100 stimuli, resulting in 800 total trials.
36
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
The EEG was recorded during the task using a 256-channel HydroCel Geodesic Sensor Net and the data were amplified using a Net Amps 400 Amplifier. Recordings were referenced to Cz and impedances were maintained below 50 k. The EEG was sampled at 250 samples/s. After recording, signals were filtered between 0.1–30 Hz bandpass and segmented into 1200 ms long segments time-locked to the onset of each stimulus (segments extended 200 ms before and 1000 ms after the stimulus onset). At this point, our data were run through two analysis paths: traditional and PRIM. For the traditional processing path, correct Go segments containing eye-blinks or 10 or more channels exceeding an absolute voltage threshold of 140 V were excluded from ERP derivation. After signal averaging, the data were re-referenced to the average reference for analysis. For the second path, correct Go segments for each participant were processed through PRIM, then averaged together at the individual level before being averaged across all participants. The resulting average was re-referenced to the average reference for analysis. 2.4. Performance comparison To compare the performance of PRIM against other BSS methods, we chose the PCA + ICA approach. This allows us to perform comparable evaluations of two methods that are meant to deal with the high dimensionality problem associated with the conventional ICA approach. The PCA + ICA approach has been implemented by Nolan et al. (2010) to remove eye blinks, and we use it on the simulated 10 datasets generated for evaluation (see below). First, to follow the kC2 rule, dimension reduction of the original 257-channel data set was accomplished by retaining the 10 most dominant PCA components (which explains more than 97% of the total variance). Then ICA was used to decompose the data set into 10 independent components. For each component, the maximum absolute correlation coefficient between its time course and the six eye channels (4 vertical EOG and 2 horizontal EOG) was calculated as the metric for Z-score analysis. Artifact components were identified by a Z-score of ±3. By assigning corresponding time courses to zero, EEG signal was reconstructed with Eq. (2). 2.4.1. Performance metric To evaluate the performance of each approach we calculated the relative root-mean-squared error (RRMSE), which is defined as: RRMSE =
RMS (rEEG − EEG) RMS (EEG)
(3)
where EEG and rEEG is the blink-free EEG (ground truth) and the recovered EEG, respectively. RMS is the root mean squared value of a signal s ∈ Rm×n :
m n 1 RMS (s) = s2 (m, n) m×n
Fig. 5. Eye blink removal with PRIM. Segments of 4 s EEG from four EOG channels before (red) and after (blue) eye blink removal. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
An example of eye blink removal with PRIM is shown in Figs. 5 and 6. In Fig. 5, 4-s segments of EEG from four EOG channels before and after eye blink removal were shown. The results show that the cleaned EEG signal is free of eye blinks. As illustrated in Fig. 6, PCA + ICA cleaned EEG resulted in distortions of the topomap, compared to the ground truth; the result was spatially smooth (likely due to the PCA step), which was avoided in the PRIM approach. For these ten simulation datasets, the average topomap correlation coefficient between the cleaned EEG and the ground truth is 0.89 ± 0.01 for PRIM, compared with 0.64 ± 0.05 for PCA + ICA, see Fig. 7a. Examination of the RRMSE shows that PRIM outperformed PCA + ICA in terms of lower RRMSE in all 10 simulations as well as the stability of the RRMSE measure across different simulations (Fig. 7b). The average RRMSE is 0.40 ± 0.01 for PRIM, compared with 0.66 ± 0.10 for PCA + ICA To evaluate the effect of ARI thresholding on PRIM performance, we shifted the automatically determined threshold by −0.1 and +0.1, respectively, and re-calculated the corresponding RRMSE for the 10 simulated datasets. The resulting RRMSE changed to 0.41 ± 0.01 using the shifted thresholds (both in the −0.1 and the +0.1 cases), which amounts to only 2.5% difference from the 0.40 ± 0.01 RRMSE obtained by the automatically determined thresholds. 3.2. Resting EEG
(4)
i=1 j=1
3. Results 3.1. Simulated EEG Each of the simulation datasets was processed by PRIM with 500 iterations. In each of the 500 iterations, a subset of 10 channels was randomly selected for ICA analysis according to the kC2 rule, which resulted in total 5000 independent components. Using the eye blink template shown in Fig. 2, the ARI histogram was calculated and is shown in Fig. 3. In each of the 10 simulated datasets, clear separation between cortical and artifact components was found. Automatically calculated thresholds were applied in each case to identify artifact and non-artifact components.
Each of the 70 eye blink epochs was processed with PRIM using 500 iterations and 10 sub-channels for each iteration. The ARI distribution of the 5000 independent components was calculated as described before. The threshold was automatically calculated to separate eye blink components from EEG components. The averaged spectra before and after eye blink removal from all 70 epochs are shown in Fig. 8. In the left column are averaged spectra of all channels; in the right column are the averaged spectra of eye channels only. For the eyes closed condition, a strong peak around 10 Hz can be seen in Fig. 8a. For the eyes open condition, the characteristic 10 Hz peak is weaker, but is still clearly visible in Fig. 8b. Before eye blink removal, it is clear that many channels have significant lower frequency (<8 Hz) components (see Fig. 8c). As shown in Fig. 8d, after eye blink removal with PRIM, the spectra of the restored EEG are much more similar to the baseline spectra. On the other hand, the removed eye blinks primarily consist of low
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
37
Fig. 6. Comparison of eye blinks removal. (a) Topographic maps of the blink-free EEG and eye blinks at the time of the blink peak. View is top looking down with nose at top. (b) Topographic map of EEG + eye blink. (c) Topographic maps after eye blink removal with PRIM. (d) Topographic maps after eye blink removal with PCA + ICA.
Fig. 7. Comparison of the eye blink removal performance for ten, blink-artifact cleaned data sets. (a) Average topomap correlation coefficient between the cleaned EEG and the ground truth. (b) Relative root-mean-squared error (RRMSE) between the cleaned EEG and the ground truth.
frequency components as shown in Fig. 8e. In conclusion, PRIM successfully restored the spectra of eye-blink contaminated resting EEG. 3.3. Event-related potential results Each of the 20 datasets was processed with PRIM using 100 iterations and 24 sub-channels for each iteration. The ARI distribution of the independent components was calculated as described above. The threshold was automatically calculated to separate eye blink components from EEG components. Three sets of channels corresponding to laterality (left, middle, right) were selected in order to evaluate differences in P300 topography (see Fig. 9). To obtain a P300 component, correct go trials that occurred after each participant fulfilled a learning criterion were
averaged together. From this average waveform, an adaptive mean corresponding to 22 ms before and after the maximum peak amplitude in a window extending from approximately 450–950 ms after stimulus onset was computed for each separate channel grouping (see Fig. 10, yellow boxes). This method was applied to data for each individual participant and processing path (traditional and PRIM). A repeated-measures ANOVA was performed, with laterality (left, middle, and right) and processing path (traditional and PRIM) as within-subjects factors, to test if mean P300 amplitude differ between data derived from the two different processing paths. A significant main effect of laterality was found (F(2, 38) = 4.59, p = 0.02, Greenhouse-Geisser corrected), whereby the P300 was strongest in the right hemisphere. No other main effects or interactions were found, suggesting PRIM did not affect the amplitude of the P300 or P300 topography (Fig. 10).
38
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
Fig. 8. Averaged spectra of resting EEG. Different colors represent different EEG channels. Left column: all channels. Right column: eye channels. (a) Eyes closed. (b) Eyes open (blink free). (c) Before eye blink removal. (d) After eye blink removal. (e) Removed eye blinks. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
A second repeated-measures ANOVA was performed to evaluate if significant differences exist in P300 component peak latency as a function of processing path. Laterality (left, middle, and right) and processing path (traditional and PRIM) served as within-subjects factors. No significant main effects or interactions were found
3.4. Computational performance For the simulated EEG dataset (10 s with 2500 time points), each iteration took less than 1 s for PRIM to run ICA on a subset of 10 channels in Python. The total time to process a full dataset of 256 channels with 500 iterations is approximately 330 s. On the other
hand, it took about 70 s for the PCA + ICA method to process each full dataset in MATLAB. For the resting EEG dataset (4 s with 4000 time points), each iteration took about 1 s for PRIM to run ICA on a subset of 10 channels. In total, it took about 480 s to process each full dataset with 500 iterations. For the ERP dataset (about 1.2 h with 1,100,115 time points), each iteration took about 5 min for PRIM to run ICA on a subset of 24 channels. In total, it took about 8 h to process each full dataset with 100 iterations. As will be discussed later, the processing time with PRIM could be significantly reduced with parallel processing.
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
Fig. 9. Channel montages used to quantify the P300 ERP component.
4. Discussion 4.1. Dimension reduction and data integrity The data demand of ICA is an example of the well-known variables/observations issue in high dimensional multivariate analysis. According to Harrell (2015) when a model with many free parameters is estimated without enough data, the worth of the model will be exaggerated (i.e., over fit for the sample), and the model will not be reliable in the prediction of future observed values.
39
Onton and Makeig (2006) pointed out that the number of time points of n-channel data used in ICA must be sufficient to learn the n2 weights in the n-by-n ICA un-mixing matrix. For high dimensional data sets, this often requires a large number of observations (time samples for EEG data) that may not be practical. With limited data in practice, dimension reduction is often carried out by PCA before ICA analysis (Kiviniemi et al., 2003; Haufe et al., 2014). PCA decomposes the data into orthogonal components that maximally explain the variance in the data. However, the variance in the data is generally a mixture of signal of interest and unwanted noise. As a result, dimension reduction on the basis of explained variance might lead to the loss of subtle but important information in the original data. Based on a divide and conquer strategy, PRIM avoids the need for data truncation. Instead, subsets of data were randomly selected with replacement for ICA analysis. With enough permutations, the full data set can be resampled uniformly and generate orders more components. While these numerous components may not be truly independent to each other, the redundant information they provide can be exploited for statistical analysis to guide component identification and separation. Instead of truncating the original data to meet the ICA requirement, PRIM maintains data integrity by processing lower-dimension data segments extracted from the original data. High dimensional data are routinely collected in neuroscience and neurological studies. For statistical, numerical and computational reasons, dimension reduction is generally performed before further analyses. However, any signal parts being removed are
Fig. 10. Top row: Voltage maps of the P300 derived from the two processing paths. View is top looking down with nose at top. No visual difference in voltage distribution around the location of the P300 can be seen between the two data sets. Green circles represent the location of the representative channels shown in the bottom of the figure. Clear differences are seen in the frontal regions, with PRIM-cleaned data showing substantial reduction of the positivity around the eyes. Bottom row: ERP waveform from three representative channels (location shown as green circles in voltage maps). Yellow boxes represent the P300 time window used for analysis. (For interpretation of the references to color in this figure legend and in the text, the reader is referred to the web version of this article.)
40
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
Fig. 11. Histograms of ARI generated by PRIM using different sizes of sub-sampling (SS) for 1000 iterations. (a) SS = 5; (b) SS = 10; (c) SS = 50; (d) SS = 100.
likely to have a negative effect on subsequent analyses, as do noise parts being retained (Haufe et al., 2014). By avoiding data reduction at the pre-processing stage, the utility of ICA is enhanced for high dimensional data by implementing it successively on many subsets of the measured variables. Moreover, it is worthwhile to point out that as a general data analysis technique, PRIM can be used with any BSS methods for high dimensional data analysis.
4.2. Fractions of the measurement space The success of the PRIM algorithm seems to be due to the effectiveness of ICA in separating the artifact component even when dealing with a subset of the measurement space (i.e., with a few EEG channels). Even though the artifact component of the subset is indeed a subset of the artifact, its successful removal (and then collective reconstruction of the artifact-free EEG) suggests that ICA for the subset was robust and accurate. Most likely, this is because there were sufficient observations to give power to ICA for the subset, even though there was substantially fewer observations than required for conventional ICA on the whole channel set. To further explore the stability of component separation with small numbers of channels in each subset and many permutations of subsampling, we performed PRIM on one simulated 10-s eye blink dataset (number of observations = 2500 time points at 250 samples/s), maintaining the number of subsampling permutations at 1000, while changing the size of subset to be 5, 10, 50 or 100 channels. The number of independent component generated at each
permutation and resampling step is 5, 10, 50 and 100, respectively. The resulting ARI histograms are shown in Fig. 11. It can be seen that the separation of eye blink components, those with distinctly high ARI values with respect to the eye blink template, is strong for 5 and 10 channel subsets, but weak or nonexistent for the larger subsets. This is likely due to lack of observations (only 2500 time points) to find the larger numbers (e.g., 50 and 100) of stable independent components, such that component separation was ineffective, and distinct blink components were not achieved. On the other hand, the small subsets, including the one of 5 and one of 10 channels, were able to be decomposed by ICA into apparently meaningful components, as suggested by the cluster of components at the right of the distributions with strong ARI values with respect to the eye blink template.
4.3. Permutation and resampling strategy The good performance of PRIM with as few as 5 or 10 channels may be unique to blink artifacts, which tend to be well characterized by the channels around the eyes. With more complex or more distributed artifacts, such as EMG, it may be necessary to increase the number of channels in each subset, leading to the need for more time samples and thus less temporal precision (and stationarity). Thus, the stability of infomax ICA solutions seems to require a low dimensional sample, which may not be appropriate to all signal decomposition applications. We note that there are algorithms that can be used to estimate the source components with
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
underdetermined ICA models (Lewicki and Sejnowski, 2000; Bofill and Zibulevsky, 2001). Further research is needed to explore these relaxed ICA models to see if they can provide stable solutions with less data. In practice, the ARI distribution can be used as feedback to evaluate the selection of the number of channels in each subset. As illustrated in Fig. 11, when there is clear separation between the resulting artifact components and cortical components, it indicates a reasonable choice of the size of the subset. As a bootstrapping method (Efron, 1979), the number of iterations should be large enough to achieve uniform sampling of scalp channels. Too few permutations could result in underrepresentation of some channels. Too many permutations will likely waste computation resources (even though accurate performance will not increase or decrease). In practice, we found that generally a ratio of about 10 between the number of resulted independent components and the number of full channel can be used to determine the minimum number of iterations. 4.4. ARI calculations In earlier studies, correlation coefficients with reference signals, such as the ECG and EOG recordings, have been used as the similarity metric for a given component’s relation to an artifact of interest (Mantini et al., 2007). Since the EOG is not independent of EEG signals, we decided to use an averaged spatial-temporal eye blink template as the reference. In particular, the ARI is calculated with template matching, a powerful digital image processing technique, to achieve robust performance. We choose eye blinks as the example for demonstration of the PRIM method because blink contamination creates pervasive problems in EEG research. There are other artifacts with measurable reference signals, such as ECG, head movement and BCG. We are currently evaluating the removal of these other artifacts with the PRIM workflow. On the other hand, artifacts like EMG lack measurable reference signals, which require special consideration to calculate the ARI. This is an important topic for future studies. Another improvement to consider is to use multiple ARI metrics. The potential benefit to use multiple features simultaneously is better classification boundaries between cerebral and non-cerebral components. Some popular classification methods with multivariate metrics include logistic regression and support vector machine (SVM) (Lehmann et al., 2007). 4.5. Automatic thresholding The threshold of an ARI histogram (Fig. 3) is automatically determined by a simple curve fitting method. This is based on the observation that artifact and non-artifact components are generally aggregated as two peaks separated by a low valley in the ARI histogram. The robustness of the automatically selected threshold is demonstrated by the results of the simulated EEG datasets, where a shift of −0.1 and +0.1 of the threshold only resulted in 2.5% difference from the RRMSE obtained by the automatically determined thresholds. On the other hand, although we employed automatically determined thresholds in this work, when the component distribution is not fitted with a cubic polynomial, careful review of the ARI histogram is needed to establish a threshold. 4.6. Computation time As a powerful and reliable data analysis tool, ICA has been widely used in EEG research. However, ICA is computationally very costly (Raimondo et al., 2012). It generally takes hours of computing to process a typical 1-h EEG recording of a single participant on a desktop computer.
41
In this study, we noticed that PRIM carries out ICA decomposition in less than 1 s for a 10-s sub-sampled simulation dataset, compared with about 70 s needed for traditional full-channel ICA. Additional speed-up is possible when we consider that the proposed PRIM workflow is inherently supported by parallel processing and computationally scalable. Multiple iterations of permutation and resampling can be run simultaneously on multiple computational resources. So far PRIM is implemented with serial computation codes that did not take advantage of its inherent parallel processing characteristic. We expect that the processing time would be greatly reduced by applying inexpensive highperformance computing technique and platforms, such as graphic processing units (GPUs) and cloud computing. 5. Conclusions The present study provides preliminary evidence that the PRIM algorithm represents a successful divide and conquer strategy in which the utility of a blind source separation method like ICA is enhanced for high dimensional data through implementing it successively on many subsets of the measured variables. We show that datasets with hundreds of dimensions (or higher) can be decomposed by ICA using the PRIM approach, and PRIM can be performed on large data sets with inexpensive computing resources. Another key feature of PRIM that was demonstrated is that statistical information can be gathered to interpret ICA results and objectively guide component classification to separate artifacts from EEG. Finally, PRIM outperformed the PCA + ICA dimensionality reduction strategy for eye blink removal. It would be informative and useful to also compare the performance of PRIM to other methods of artifact removal in future studies. Acknowledgments The authors would like to thank Jenn Lewis, Hannah White and Amanda Gunn for their assistance in the preparation of the data. References Allen, P.J., Polizzi, G., Krakow, K., Fish, D.R., Lemieux, L., 1998. Identification of EEG events in the MR scanner: the problem of pulse artifact and a method for its subtraction. NeuroImage 8 (3), 229–239. Bartels, G., Shi, L.-C., Lu, B.-L., 2010. Automatic artifact removal from EEG—a mixed approach based on double blind source separation and support vector machine. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference, 2010, 5383–5386. Bell, A.J., Sejnowski, T.J., 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7 (6), 1129–1159. Bofill, P., Zibulevsky, M., 2001. Underdetermined blind source separation using sparse representations. Signal Process. 81 (11), 2353–2362. Bonmassar, G., Purdon, P.L., Jääskeläinen, I.P., Chiappa, K., Solo, V., Brown, E.N., Belliveau, J.W., 2002. Motion and ballistocardiogram artifact removal for interleaved recording of EEG and EPs during MRI. NeuroImage 16 (4), 1127–1141. Chowdhury, M.E.H., Mullinger, K.J., Glover, P., Bowtell, R., 2014. Reference layer artefact subtraction (RLAS): a novel method of minimizing EEG artefacts during simultaneous fMRI. NeuroImage 84, 307–319. Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Stat. 7 (1), 1–26. Flexer, A., Bauer, H., Pripfl, J., Dorffner, G., 2005. Using ICA for removal of ocular artifacts in EEG recorded from blind subjects. Neural Netw. 18 (7), 998–1005. Gorsuch, R., 1983. Factor Analysis, 2nd ed. LEA, Hillsdale, NJ. Gramfort, A., Luessi, M., Larson, E., Engemann, D.A., Strohmeier, D., Brodbeck, C., Parkkonen, L., Hämäläinen, M.S., 2014. MNE software for processing MEG and EEG data. NeuroImage 86, 446–460. Gratton, G., Coles, M.G., Donchin, E., 1983. A new method for off-line removal of ocular artifact. Electroencephalogr. Clin. Neurophysiol. 55 (4), 468–484. Haufe, S., Dähne, S., Nikulin, V.V., 2014. Dimensionality reduction for the analysis of brain oscillations. NeuroImage 101, 583–597. Harrell, F., 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer.
42
J. Hou et al. / Journal of Neuroscience Methods 268 (2016) 31–42
Ille, N., Berg, P., Scherg, M., 2002. Artifact correction of the ongoing EEG using spatial filters based on artifact and brain signal topographies. J. Clin. Neurophysiol. 19 (2), 113–124. Iriarte, J., Urrestarazu, E., Valencia, M., Alegre, M., Malanda, A., Viteri, C., Artieda, J., 2003. Independent component analysis as a tool to eliminate artifacts in EEG: a quantitative study. J. Clin. Neurophysiol. 20 (4), 249–257. Jung, T.P., Makeig, S., Humphries, C., Lee, T.W., McKeown, M.J., Iragui, V., Sejnowski, T.J., 2000a. Removing electroencephalographic artifacts by blind source separation. Psychophysiology 37 (2), 163–178. Jung, T.-P., Makeig, S., Westerfield, M., Townsend, J., Courchesne, E., Sejnowski, T.J., 2000b. Removal of eye activity artifacts from visual event-related potentials in normal and clinical subjects. Clin. Neurophysiol. 111 (10), 1745–1758. Jung, T.P., Makeig, S., Lee, T.-W., McKeown, M.J., Brown, G., Bell, A.J., Sejnowski, T.J., 2000c. Independent component analysis of biomedical signals. 2nd International Workshop on Independent Component Analysis and Blind Signal Separation 0, 633–644. Kiviniemi, V., Kantola, J.-H., Jauhiainen, J., Hyvärinen, A., Tervonen, O., 2003. Independent component analysis of nondeterministic fMRI signal sources. NeuroImage 19 (2), 253–260. Lehmann, C., Koenig, T., Jelic, V., Prichep, L., John, R.E., Wahlund, L.-O., Dodge, Y., Dierks, T., 2007. Application and comparison of classification algorithms for recognition of Alzheimer’s disease in electrical brain activity (EEG). J. Neurosci. Methods 161 (2), 342–350. Lewicki, M.S., Sejnowski, T.J., 2000. Learning overcomplete representations. Neural Comput. 12 (2), 337–365.
Mantini, D., Perrucci, M.G., Cugini, S., Ferretti, A., Romani, G.L., Del Gratta, C., 2007. Complete artifact removal for EEG recorded during continuous fMRI using independent component analysis. NeuroImage 34 (2), 598–607. Masterton, R.A.J., Abbott, D.F., Fleming, S.W., Jackson, G.D., 2007. Measurement and reduction of motion and ballistocardiogram artefacts from simultaneous EEG and fMRI recordings. NeuroImage 37 (1), 202–211. Newman, J.P., Patterson, C.M., Howland, E.W., Nichols, S.L., 1990. Passive avoidance in psychopaths: the effects of reward. Personal. Individ. Differ. 11 (11), 1101–1114. Niazy, R.K., Beckmann, C.F., Iannetti, G.D., Brady, J.M., Smith, S.M., 2005. Removal of FMRI environment artifacts from EEG data using optimal basis sets. NeuroImage 28 (3), 720–737. Nolan, H., Whelan, R., Reilly, R.B., 2010. FASTER: fully automated statistical thresholding for EEG artifact rejection. J. Neurosci. Methods 192 (1), 152–162. Onton, J., Makeig, S., 2006. Information-based modeling of event-related brain dynamics. Prog. Brain Res. 159, 99–120. Raimondo, F., Kamienkowski, J.E., Sigman, M., Fernandez Slezak, D., 2012. CUDAICA: GPU optimization of infomax-ICA EEG analysis. Comput. Intell. Neurosci. 2012, 206972. Sijbers, J., Michiels, I., Verhoye, M., Van Audekerke, J., Van der Linden, A., Van Dyck, D., 1999. Restoration of MR-induced artifacts in simultaneously recorded MR/EEG data. Magn. Reson. Imaging 17 (9), 1383–1391. Wallstrom, G.L., Kass, R.E., Miller, A., Cohn, J.F., Fox, N.A., 2004. Automatic correction of ocular artifacts in the EEG: a comparison of regression-based and component-based methods. Int. J. Psychophysiol. 53 (2), 105–119.