Clinical Neurophysiology 118 (2007) 197–208 www.elsevier.com/locate/clinph
Response-time corrected averaging of event-related potentials
Henning Gibbons *, Jutta Stahl
Georg-Elias-Müller Institute for Psychology, University of Göttingen, Gosslerstr. 14, D-37073 Göttingen, Germany
Accepted 17 September 2006 Available online 27 October 2006
Abstract
Objective: The present study presents a novel approach to averaging of event-related potentials (ERPs). Acknowledging latency variability of late ERP components as related to performance fluctuations across trials should improve the assessment of late portions of the ERP.
Methods: Prior to the averaging procedure, stimulus-to-response epochs in the electroencephalogram (EEG) were expanded/compressed in time to match mean RT in a certain condition and participant. By means of several mathematical functions, RT variability was differentially distributed over late vs. early portions of the ERP. Data from 20 participants from two conditions of an identity-based priming task were analyzed using traditional stimulus- and response-locked averaging, as well as four different RT-corrected averaging procedures.
Results: Area under the curve as an index of precision of LPC assessment was reliably enhanced for certain RT-corrected procedures relative to traditional ERP averaging. Moreover, a priming effect on amplitude of a distinct LPC subcomponent which could not be confirmed with traditional stimulus-locked averaging was reliably borne out using a cubic RT-correction procedure.
Conclusions: RT-corrected ERP averaging can outperform traditional ERP averaging in the assessment of late portions of the ERP, and of experimental effects on them.
Significance: Cognitive ERP researchers may take advantage of the improved capability of RT-corrected averaging to establish experimental effects on amplitudes in the late ERP range.
© 2006 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Keywords: EEG; ERP; Averaging; Response-time correction; Late positive complex
1. Introduction
Averaging represents an extremely important procedure in the use of event-related potentials (ERPs). When similar auditory or visual stimuli are presented repeatedly, averaging across presentations strongly reduces the portion of activity in the electroencephalogram (EEG) unrelated to stimulus processing (i.e., background noise). By contrast, ERP reflections of mental processes triggered by the stimulus presentations will be relatively enhanced (cf., Picton et al., 1995). However, there is also a disadvantage to the averaging procedure that becomes most relevant in situations where
* Corresponding author. Tel.: +49 551 393623; fax: +49 551 393662. E-mail address: [email protected] (H. Gibbons).
stimulus processing does not merely involve simple, low-level mental operations. With cognitive tasks such as semantic categorization, mental rotation, or mental arithmetic one must assume significant variation in the timing of late ERP components related to performance fluctuations across trials. For example, in speeded stimulus categorization the participant may be required to correctly indicate whether a presented word is the name of an animal or a plant. On particularly fast trials, response time (RT) for a correct response may approach 500 ms, whereas RT will probably exceed 1000 ms on several slow trials. Consequently, a hypothetical ERP component indicating the completion of semantic stimulus categorization will occur in single-trial ERPs at quite different points in time, and appear rather "smeared" in traditional stimulus-locked ERP averages. In fact, the frequent finding of a broad
1388-2457/$32.00 © 2006 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.clinph.2006.09.011
H. Gibbons, J. Stahl / Clinical Neurophysiology 118 (2007) 197–208
late positive complex (LPC) in the ERP accompanying many higher cognitive tasks (e.g., Falkenstein et al., 1994; Rösler and Heil, 1991; Rugg and Doyle, 1994) may often result from overlap of several temporally distinct positive deflections which, however, markedly vary in latency across trials and participants (Johnson, 1986). It is obvious that establishing differences between two experimental conditions regarding a specific late ERP component becomes more difficult as this component is subject to both smearing due to intra- and interindividual latency variability, and overlap from other late ERP components.

The present article therefore introduces a novel approach to ERP averaging that takes into account intra- and interindividual RT variability to achieve a more distinct ERP signature of late brain processes during cognitive tasks. For this purpose, single-trial ERP segments are adjusted for RT before averaging is carried out. More specifically, ERPs for trials where RT was shorter than mean RT are expanded in time to match mean RT for a given participant and experimental condition, whereas for trials where RT was longer than mean RT, ERPs are compressed in time (see Fig. 1). A crucial aspect, however, concerns the
way in which single-trial ERP segments are expanded/compressed in time, that is, the mathematical function according to which amplitude values in the original ERP segments are assigned new latencies to obtain RT-corrected ERP segments. Intuitively, RT correction should affect early portions of the original ERP segments less strongly than late portions. This is because with higher cognitive tasks latency variability of early sensory processes will usually not contribute as strongly to RT variability as latency variability of late cognitive processes. Thus, in order to capture late ERP deflections more precisely than with traditional averaging one must take into account that latency variability of mental events is distributed over the entire stimulus-RT interval in an essentially nonlinear fashion.

The aim of the present study was to employ different (linear and nonlinear) functions for RT-adjustment of ERP segments prior to the averaging procedure, and to determine whether the novel procedure is superior to traditional stimulus-locked ERP averaging regarding (a) the assessment of the LPC in a priming task based on stimulus identification (Gibbons, 2006; Gibbons et al., 2006) and (b) the ability to establish a difference in amplitude of a certain LPC subcomponent between two experimental priming conditions.

A cautionary note concerns the fact that an improved assessment of late ERP components might also be achieved by computing response-locked ERP waveforms in addition to their stimulus-locked counterparts. Late ERP components usually occur in closer temporal proximity to the overt response than to the preceding stimulus presentation, and experimental effects involving these components probably would be seen most clearly in response-locked ERPs.
Therefore, to demonstrate the usefulness of the novel approach of RT-corrected stimulus-locked ERP averaging, it must also perform better than traditional response-locked averaging regarding the assessment of experimental effects upon late ERP components.

2. Method

2.1. Participants
The participants were 5 male and 15 female right-handed undergraduate students ranging in age from 19 to 28 yr (M = 22.7 yr, SD = 2.5 yr). Their participation served as partial fulfillment of a course requirement. All participants had normal or corrected-to-normal vision and were naïve regarding the purpose of the experiment.

2.2. Apparatus and stimuli
Fig. 1. The principle behind RT-corrected averaging of ERP waveforms. In the left panel, traditional averaging of waveforms from two trials with different RTs results in smearing of the LPC because single-trial peaks occur at different points in time. In the right panel, waveforms are expanded/compressed onto a common length of (RT1 + RT2)/2 before averaging is performed. Peaks of the LPC will then occur at similar points in (relative) time, and averaging does not result in smearing of the LPC.
An IBM-compatible computer equipped with a 17-inch SVGA monitor served for stimulus presentation. Stimuli were the digits 1, 2, 3, and 4, presented at two out of four screen locations (see Fig. 2). Each presentation consisted of one red and one blue stimulus (target and distractor, respectively). The display subtended a horizontal visual angle of 2.4° and a vertical visual angle of 1.6°. Responses were recorded with an accuracy of ±1 ms using a 4-button response board with buttons that were labeled 1–4.

Fig. 2. Examples of probe displays in the switch and control conditions for a given prime display. White digits appeared in red (target) and black digits appeared in blue (distractor). The rightmost panel presents the arrangement of response buttons. Buttons marked in gray indicate the correct response.

2.3. Procedure
The task comprised 384 trials divided into two equally-sized blocks. First, 16 practice trials were administered. Between blocks, there was a break of 30 s. There were two subsequent displays on each trial, the prime and the probe, each containing two of the four digits (see Fig. 2, left panel). The 384 trials represented all possibilities to repeat (or not repeat) the two prime stimuli in the probe display, with respect to their role as target and distractor. In this fully balanced design, there were seven different prime–probe transfer conditions (cf., Christie and Klein, 2001). Only the following two conditions are relevant here. In the 96 trials of the control condition, target and distractor stimuli of the probe display were the two digits that had not appeared in the prime display. On another 48 trials, prime target and distractor digits appeared color-reversed on the probe. These trials are referred to as the switch condition. The seven transfer conditions were presented in random order.

Participants were instructed to respond as quickly and correctly as possible by pressing the response button labeled with the digit that was presented in red, irrespective of its location. A trial began with the presentation of the prime which was terminated when a response was made. If the prime response was correct, after 400 ms the probe display was presented, and remained on the screen until a response was made. In case of wrong responses to prime or probe, error feedback (50-ms tone; 1100 Hz) was given. No response within 1000 ms after prime/probe onset was prompted with a 150-ms tone (1100 Hz). False/missing responses to the probe were recorded and classified accordingly. After a 400-ms inter-trial interval, the next trial started. Thus, from the participant’s perspective, the experiment consisted of 768 consecutive presentations which all had to be responded to. This ensured that no distinction between prime and probe displays could be made.

Participants sat in a sound-attenuated, dimly lit room 60 cm in front of a computer monitor. The four response buttons were operated by index and middle fingers of both hands. As dependent behavioral variables, mean RT and error percent were determined for the two relevant priming conditions (switch and control). For RT, only trials were considered that were correctly responded to and had passed the EEG artifact rejection (see below). Analyses of variance with repeated measures on Priming Condition were computed for RT and error percent.

2.4. Electrophysiological recordings
Electrical brain activity was recorded from 27 scalp locations of the 10–20 system (Jasper, 1958), using an electrode cap (Electrocap Inc.) with sintered Ag/AgCl electrodes; electrode sites were FP1, FP2, F7, F3, Fz, F4, F8, FT7, FC3, FCz, FC4, FT8, T3, C3, Cz, C4, T4, CP3, CPz, CP4, T5, P3, Pz, P4, T6, O1, and O2. EEG was referenced against the right mastoid, and an active left-mastoid reference electrode was employed. Vertical and horizontal EOG were monitored from electrodes positioned below and above the right eye, and from the outer left and right canthi, respectively. The EEG was recorded continuously using a 32-channel digital Synamps amplifier and Acquire software (NeuroScan Inc.). Sampling rate was 500 Hz; bandpass was set to 0.1–70 Hz. An on-line notch filter was employed to suppress the 50-Hz band. The EEG was re-referenced against algebraically linked mastoids and epoched offline.

2.5. Data processing with traditional averaging
Epochs ranged from 100 ms before until 1000 ms after probe display onset and were baseline corrected with respect to the 100-ms interval preceding probe display onset. All data were screened for artifacts, and contaminated trials were rejected (less than 10% for each participant). EOG correction was performed according to Gratton et al. (1983). Only epochs accompanying correct responses faster than 1000 ms were considered. In a final step, separate averages were computed for switch and control conditions, employing a traditional stimulus-locked ERP averaging procedure which, for each condition, determined mean ERP amplitude across epochs at each point in time at each of the 27 electrode sites. Average waveforms covered (−100 ms, 1000 ms) intervals with respect to probe display onset. Traditional response-locked averages were then computed for intervals ranging from −400 ms until 100 ms relative to the response. Note that also with response-locked averaging, baseline correction involved the (−100 ms, 0 ms) interval with respect to probe display onset.
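The traditional stimulus-locked procedure for one electrode and condition can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name is hypothetical, and each epoch is assumed to be a row of samples at 500 Hz whose first 50 samples cover the 100-ms pre-stimulus baseline.

```python
import numpy as np

FS_HZ = 500    # sampling rate reported in the text
PRE_MS = 100   # pre-stimulus baseline length reported in the text


def stimulus_locked_average(epochs):
    """Baseline-correct each epoch, then average across trials.

    epochs: array of shape (n_trials, n_samples) for one electrode and
    condition, with the first sample at -100 ms relative to probe onset
    (an assumed layout).
    """
    epochs = np.asarray(epochs, dtype=float)
    n_base = PRE_MS * FS_HZ // 1000                    # 50 baseline samples
    baseline = epochs[:, :n_base].mean(axis=1, keepdims=True)
    return (epochs - baseline).mean(axis=0)            # mean at each time point
```

The same per-trial baseline would be subtracted before response-locked segments are cut out, since the text states that baseline correction always refers to the interval before probe display onset.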
2.6. Data processing with RT-corrected averaging
All steps prior to the final averaging procedure were identical to data processing with the traditional averaging procedure, which ensured that any performance differences between averaging procedures were not merely due to different data entered. RT-correction of single-trial epochs from one participant and condition involved three steps: (1) cutting to size the original (−100 ms, 1000 ms) epochs, such that (0 ms, RT) epochs were obtained for each single trial, (2) computation of new latencies for all data points, and (3) subsequent linear interpolation of amplitudes, in order to arrive at a sampling rate of 500 Hz also with the RT-corrected epochs. In the original waveforms, one data point can be described by two values, amplitude and corresponding latency. While for each data point m of the cut-to-size segments amplitude remained the same, a new latency was computed with accuracy at the fourth decimal place according to the following formula,

$$\mathrm{NewLat}(m) = m + \frac{m^{k}}{\mathrm{trialRT}^{k}}\,(\mathrm{meanRT} - \mathrm{trialRT}),$$

where meanRT is the mean RT for one condition and participant, trialRT is the RT of the to-be-corrected trial, and k (1, 2, 3, 4) determines whether RT-correction is performed according to a linear, quadratic, cubic, or to-the-power-of-four function, respectively (see below).

Several aspects of this formula have to be highlighted here. First, for epochs shorter than mean epoch length (i.e., in case of trialRT < meanRT), a new latency resulted for each data point that was larger than its original latency since the rightmost term in the equation (meanRT − trialRT) was positive. Hence, the original epoch was expanded in time. Second, for epochs longer than mean epoch length (i.e., in case of trialRT > meanRT), the new latency of each data point was smaller than its original latency since (meanRT − trialRT) was negative. Hence, the original epoch was compressed in time. Furthermore, as k increases, early portions of the ERP tend to be left unchanged, whereas late portions of the ERP tend to be changed quite substantially. For example, assume 400 ms and 600 ms as RT of the to-be-corrected trial and mean RT for the specific condition and participant, respectively. With a linear function (k = 1), the new latency of data point m = 100 ms computes as 100 + (100/400) * (600 − 400) = 150 ms, and the new latency for data point m = 350 ms computes as 350 + (350/400) * (600 − 400) = 525 ms. On the other hand, with a cubic function (k = 3), data point m = 100 ms gets assigned a new latency of 103.1 ms = 100 + (100/400)^3 * (600 − 400), whereas data point m = 350 ms gets assigned a new latency of 484.0 ms = 350 + (350/400)^3 * (600 − 400). Obviously, with the cubic function early data points (e.g., 100 ms) remain almost unchanged (103.1 ms) whereas late data points (e.g., 350 ms) are subject to substantial change in latency (484 ms). In other words, compared to the linear function
the cubic function better satisfies the intuitive requirement that RT differences should be accompanied by latency differences between late rather than early ERP components (see Section 1).

After new latencies for one epoch had been computed for a given k (1, 2, 3, 4), the epoch was resampled at 500 Hz to allow for averaging of the RT-corrected epochs which all had equal length (meanRT), but differed with respect to the exact latencies of data points. For this purpose, new amplitudes were linearly interpolated from any two neighboring data points that had a multiple of 2 ms in between their latencies. This resulted in a number of meanRT/2 data points for each epoch in one condition and participant. Each of these data points was described by a latency value that was a multiple of 2 ms and an amplitude value that resulted from the interpolation procedure.

Resampling was followed by a traditional stimulus-locked averaging procedure that involved all RT-corrected epochs in one condition and participant, and was performed separately for the four different correction functions (k = 1, 2, 3, 4). Doing so yielded four RT-corrected ERP averages for each participant and condition. Since mean RT for a given condition differed between participants, at this point grand averages could not yet be obtained. Thus, computation of new latencies was also necessary for data points of each participant’s RT-corrected average waveforms in the two conditions. The same procedure as for the single-trial RT-correction applied to across-participants RT correction, except that in the formula trialRT was replaced by mean RT of the participant and condition, and meanRT was replaced by grand-mean RT computed across all 20 participants in the respective Priming Condition (control: 616 ms; switch: 632 ms). Note that always the same k was used for across-participants RT-correction as had been used for the preceding single-trial RT-correction.
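The single-trial correction, resampling, and averaging steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each epoch is assumed to hold samples at 0, 2, 4, ... ms up to the trial's RT, and `np.interp` stands in for the linear interpolation step. One case is handled by assumption only: for large k and trials much slower than the mean, the warped latencies are no longer strictly increasing, which the text does not address, so the sketch rejects such trials rather than guess.

```python
import numpy as np

STEP_MS = 2.0  # 500 Hz sampling, as reported in the text


def new_latencies(lat_ms, trial_rt, mean_rt, k):
    """NewLat(m) = m + (m**k / trialRT**k) * (meanRT - trialRT)."""
    lat_ms = np.asarray(lat_ms, dtype=float)
    return lat_ms + (lat_ms / trial_rt) ** k * (mean_rt - trial_rt)


def rt_correct_epoch(amplitudes, trial_rt, mean_rt, k):
    """Warp one (0 ms, RT) epoch onto (0 ms, meanRT) and resample at 500 Hz."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    lat = np.arange(amplitudes.size) * STEP_MS        # assumed original latencies
    warped = new_latencies(lat, trial_rt, mean_rt, k)
    # np.interp needs increasing sample positions; rejecting other cases is
    # an assumption of this sketch, not a rule taken from the paper.
    if np.any(np.diff(warped) <= 0):
        raise ValueError("warped latencies are not strictly increasing")
    grid = np.arange(int(round(mean_rt / STEP_MS))) * STEP_MS  # meanRT/2 points
    return np.interp(grid, warped, amplitudes)


def rt_corrected_average(epochs, rts, k):
    """Average all RT-corrected epochs of one participant and condition."""
    mean_rt = float(np.mean(rts))
    corrected = [rt_correct_epoch(e, rt, mean_rt, k) for e, rt in zip(epochs, rts)]
    return np.mean(corrected, axis=0)


# Worked example from the text: trialRT = 400 ms, meanRT = 600 ms.
print(new_latencies([100, 350], 400, 600, k=1))  # 150 and 525 ms
print(new_latencies([100, 350], 400, 600, k=3))  # about 103.1 and 484.0 ms
```

The across-participants step would reuse `new_latencies` with the participant's mean RT in place of `trial_rt` and the grand-mean RT in place of `mean_rt`, as the text describes.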
This resulted, for each k, in 40 double-RT-corrected averaged ERP waveforms, 20 waveforms in the control condition with an equal length of 308 data points (616 ms) and 20 waveforms in the switch condition with an equal length of 316 data points (632 ms).¹

2.7. Data analysis
The first question to answer was whether, depending on k, RT correction performed better than the traditional average with respect to the assessment of the LPC. If indeed RT-correction can lead to better synchronization of equivalent ERP components in the single trials than was the case with uncorrected single-trial waveforms, subsequent averaging should enhance the LPC amplitude
¹ Since the time scale is distorted with nonlinear RT-correction of single-trial segments, we prefer to describe RT-corrected waveforms in terms of data points instead of ms. Note that interval length in the control and switch conditions is 308 and 316 data points which corresponds, with a sampling rate of 500 Hz, to the mean RTs of 616 and 632 ms, respectively.
relative to traditional averaging (see Fig. 1; cf. Ruchkin, 1988). Area under the curve (AUC) was used as a measure of LPC magnitude that should be more reliable than, for example, maximum LPC amplitude. To attest specificity of the potential advantage of RT-corrected averaging for late portions of the ERP, AUC was determined separately for the first and second halves of RT-corrected ERP averages for each participant (0 to RT/4 data points, and RT/4 to RT/2 data points, respectively), priming condition, and k, as well as for the first and second halves of the traditional ERP averages of each participant and condition. Rectified amplitudes were used to ensure that ERP negativity was not subtracted from ERP positivity but rather made its own additive contribution to overall AUC in the respective half of an epoch. AUC values were subjected to an ANOVA with repeated measures on factors Priming Condition (control, switch), Averaging Condition (traditional stimulus-locked average; and k = 1, 2, 3, 4),² Half (first and second), and Electrode (n = 27, see above). Greenhouse-Geisser corrected p values are reported when appropriate. Significant interactions were explored using Scheffé’s test.

The second question concerned the possibility that RT-corrected averaging might reveal a difference between priming conditions (control, switch) regarding a specific LPC component that went undetected with traditional averaging. Visual inspection of control/switch overlays for the different values of k (1, 2, 3, 4) showed that in the switch condition there was a positive component at around data point 250 at medial central-to-posterior sites, particularly CPz and Pz, that appeared much more pronounced with k = 2, 3, and 4 than with traditional stimulus-locked averaging (see Figs. 6 and 8).
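The rectified AUC measure for the two halves of a waveform could be computed along the following lines. The text does not state how the area was integrated, so this sketch assumes a simple rectangle rule (sum of absolute amplitudes times the 2-ms sampling interval); the function name is hypothetical.

```python
import numpy as np


def rectified_auc_halves(waveform, step_ms=2.0):
    """Return (first-half AUC, second-half AUC) of the rectified waveform.

    Rectification (absolute value) lets negative deflections add to the
    area instead of cancelling positive ones. The rectangle-rule
    integration is an assumption of this sketch.
    """
    rectified = np.abs(np.asarray(waveform, dtype=float))
    half = rectified.size // 2
    first = rectified[:half].sum() * step_ms
    second = rectified[half:].sum() * step_ms
    return first, second
```

For a 316-point switch-condition average, the first half would cover data points 0 to 157 and the second half data points 158 to 315.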
The task was to determine whether the priming effect on amplitude of this P500 component³ was significantly larger for a certain k (1, 2, 3, 4) than for traditional stimulus- and response-locked averaging. However, problems arise with the definition of a time window covering P500 whose mean ERP amplitude could be subjected to statistical analysis. This is because an assumed superiority of RT-corrected averaging could be tied to the time window chosen which simply could be more appropriate to assess P500 in RT-corrected averages than in traditional stimulus- and response-locked averages. To circumvent this problem, time windows for the assessment of P500 were optimized for each averaging procedure, by centering a 25-data-points (i.e., 50-ms) window
² AUC was not analyzed for the response-locked averages. Response-locked EEG segments could be obtained from the original stimulus-locked (−100 ms, 1000 ms) segments only as (−400 ms, 100 ms) segments. These segments were too small to monitor the entire averaged stimulus–response interval of around 600 ms duration, which prevented an analysis of AUC for response-locked averages as a function of ERP half.
³ For ease of description, we refer to this component as ‘P500’ although the time scale is distorted with nonlinear RT correction. Results from traditional stimulus-locked averaging suggested the presence of an equivalent albeit less pronounced positive component at around 500 ms in real time in the switch condition (see Fig. 8).
around the peak latency of P500 at electrode Pz in the switch condition, separately for each averaging procedure. Then, an ANOVA was computed for mean amplitude in these 50-ms windows, involving factors Averaging Condition (six levels: traditional stimulus-locked, traditional response-locked, and RT-corrected with k = 1, 2, 3, and 4), Priming Condition (switch, control), and Electrode (CPz, Pz). If indeed certain RT-corrected procedures perform better than traditional procedures in establishing a priming effect on P500 amplitudes, this should become evident in a significant interaction Averaging Condition by Priming Condition, which would then be further explored using Scheffé’s test.

3. Results

3.1. Behavioral data
The ANOVA for RT yielded a significant effect of Priming Condition (switch, control), F(1, 19) = 5.5, p < .05. Mean RT was larger in the switch condition than in the control condition (632 and 616 ms, respectively). The effect of Priming Condition on error percent was not significant, F(1, 19) = 0.3, p = .86. There were 6.1% and 6.2% errors in the switch and control conditions, respectively.

3.2. Electrophysiological data
Figs. 3 and 4 present the grand-mean ERP waveforms for the switch condition, as a function of k (1, 2, 3, and 4). For better visibility, ERPs for k = 1, 2 and k = 3, 4 are superimposed on the traditional stimulus-locked average in separate figures. All waveforms have an identical length of 316 data points which corresponds to grand-mean RT in the switch condition (632 ms). It can be seen that linear RT-correction (k = 1) does not perform well, since the ERP is flattened compared to the remaining values of k. Particularly, the early N1–P2 complex appears strongly reduced in amplitude (see Fig. 3, dotted line).
On the other hand, for k = 2–4, the early parts of the ERP closely match the traditional stimulus-locked ERP, whereas late portions of the ERP, as expected, with k = 2–4 even reach somewhat larger amplitudes relative to traditional stimulus-locked averaging. A similar pattern was found with the control condition (not shown).

To further investigate the effects of averaging procedures on averaged ERPs, an ANOVA was carried out with repeated measures on Electrode (27), Priming Condition (switch and control), Half (first and second), and Averaging Condition (five levels: traditional stimulus-locked, and k = 1, 2, 3, and 4), with rectified AUC as the dependent measure (see Section 2). In this analysis, there was a significant main effect of Averaging Condition, F(4, 76) = 5.7, p < .05, ε = 0.28, indicating larger mean AUC for k = 3 compared to both traditional stimulus-locked averaging (p < .05) and RT-corrected averaging with k = 1 (p = .01). Moreover, AUC was larger for
Fig. 3. Grand-average ERP waveforms in the switch condition, shown for the interval between probe display onset and grand-mean RT (632 ms = 316 data points), as a function of Averaging Condition. Bold line = traditional stimulus-locked averaging. Dotted line = RT-corrected averaging with k = 1. Thin solid line = RT-corrected averaging with k = 2. Negativity is plotted upward.
k = 4 compared to both traditional stimulus-locked averaging (p = .01) and RT-corrected averaging with k = 1 (p < .01). The remaining post-hoc comparisons between averaging conditions were not significant. A highly significant main effect of Half, F(1, 19) = 47.1, p < .001, was due to larger mean AUC in the second relative to the first half of the ERPs. The significant main effect of Electrode, F(26, 494) = 21.0, p < .001, ε = 0.15, was not further explored. The interaction between Averaging Condition and Half just failed to reach conventional significance, F(4, 76) = 3.1, p = .07, ε = 0.38.

Since an a-priori assumption had been made regarding the differential efficiency of RT-corrected averaging for early vs. late portions of the ERP, the effect of Averaging Condition was investigated separately for the first and second halves of the waveforms. In the ANOVA for the first ERP half, involving factors Priming Condition (switch and control), Averaging Condition (traditional stimulus-locked, and k = 1, 2, 3, and 4), and Electrode (27), the effect of Averaging Condition was not significant, F(4, 76) = 1.7, p = .20, ε = 0.28. By contrast, in the analysis of the second ERP half there was a significant main effect of Averaging Condition, F(4, 76) = 7.0, p < .01, ε = 0.28. Post-hoc Scheffé’s test revealed significantly enhanced mean second-half AUC for k = 3 relative to the traditional average (p < .001) and k = 1 (p < .01).
Similarly, with k = 4 mean second-half AUC was significantly larger compared to the traditional average (p < .001) and k = 1 (p < .01; see Fig. 5). The remaining contrasts were not significant. No effect involving Priming Condition was significant in the ANOVAs, indicating similar efficiency of RT-corrected averaging for the switch and control conditions.

Fig. 6 presents comparisons of RT-corrected stimulus-locked ERP waveforms for conditions switch and control, for k = 3 as an example. Fig. 7 displays response-locked ERPs for the two priming conditions. The most prominent priming effect, reduced ERP amplitude at around 300 ms at fronto-central electrode sites for switch relative to control, has been discussed in detail in Gibbons (2006). This result is not of interest here because it involves early portions of the ERP and can be found also in traditional stimulus-locked averages. However, Figs. 6 and 8 also show an enhancement of a P500 component at central-to-parietal midline electrodes for switch relative to control that seems to be most pronounced with RT-corrected averaging with k = 3 and k = 4. Larger ERP amplitude for switch than control is also found in the traditional response-locked averages, at around 160 ms prior to the response (see Fig. 7). Although it is unclear whether this effect corresponds to the P500 effect in stimulus-locked ERPs
Fig. 4. Grand-average ERP waveforms in the switch condition for the interval between probe display onset and grand-mean RT (632 ms = 316 data points), as a function of Averaging Condition. Bold line = traditional stimulus-locked averaging. Dotted line = RT-corrected averaging with k = 3. Thin solid line = RT-corrected averaging with k = 4.
(subtracting 160 ms from the mean of the two grand-mean RTs, 624 ms, results in a stimulus-locked latency of 464 ms rather than 500 ms), response-locked ERPs are also tested for priming effects on ERP amplitude in the following.
Fig. 5. Mean area under the curve (AUC) as a function of averaging condition and first vs. second half of the ERP waveforms. Values of k ranging from 1 to 4 involve single-trial RT-correction prior to the averaging procedure, with RT variability being progressively distributed over late portions of the stimulus–response interval. 1) Significantly different from traditional stimulus-locked averaging (p < .001) and RT-corrected averaging with k = 1 (p < .01) according to Scheffé’s test.
To investigate the P500 component in more detail, first P500 peak latency in the switch condition was determined from electrode Pz in the grand-average waveforms, separately for averaging conditions. P500 peak latency was observed at data points 248, 240, 248, 250, and 252, for traditional stimulus-locked, and RT-corrected averaging procedures with k = 1, 2, 3, and 4, respectively. With traditional response-locked averaging, a peak that seemed to correspond to stimulus-locked P500 was observed at data point 82 relative to the response. For each averaging procedure, an optimized time window to assess P500 was determined by centering a 25-data-points window around P500 peak latency. Then, an ANOVA was carried out for mean ERP amplitude in these optimized time windows, involving factors Averaging Condition (six levels: traditional stimulus-locked, traditional response-locked, RT-corrected with k = 1, 2, 3, and 4), Priming Condition (switch, control) and Electrode (CPz, Pz). In this analysis, there was a significant main effect of Averaging Condition, F(5, 95) = 3.1, p < .05, ε = 0.52. Post-hoc Scheffé’s tests, however, did not reveal significant differences in P500 amplitude between any two averaging procedures (all p values > .10). A significant main effect of Electrode, F(1, 19) = 19.8, p < .001, was due to larger P500 amplitude at Pz than CPz. Priming
Fig. 6. Grand-average RT-corrected ERP waveforms as a function of Priming Condition (switch, control) for k = 3. Three hundred and eight data points are displayed which corresponds to mean RT in the faster priming condition (control, 616 ms). Two major priming effects distinguish the switch condition from the control condition, reduction of fronto-central positivity, and a distinct additional positive peak at around data point 250 (i.e., 500 ms) at central-to-parietal midline electrodes (P500).
Condition also had a significant effect on P500 amplitude, F(1, 19) = 10.7, p < .01, indicating larger P500 for switch than control. This main effect was further qualified by a significant interaction between Averaging Condition and Priming Condition, F(5, 95) = 3.4, p < .05, ε = 0.55. Post-hoc Scheffé’s test revealed larger P500 for switch than control for the RT-corrected averaging procedures with k = 3 (6.9 and 5.5 µV, p < .001) and k = 4 (6.8 and 5.7 µV, p < .05; see Fig. 9). For k = 2, the difference approached significance (6.4 and 5.3 µV, p = .08). By contrast, with traditional stimulus- and response-locked averaging and with linear RT-corrected averaging (k = 1) there were no significant switch–control differences in P500 amplitude (all p values > .50).

4. Discussion
The present study investigated a novel approach to ERP averaging that takes into account RT variability across trials and participants before averaging is performed. For this purpose, ERP segments were adjusted for RT such that the stimulus–response interval was projected onto a common length, corresponding to mean RT in the relevant experimental condition. Because it was not entirely clear to what extent RT variability stemmed from early sensory or late cognitive stages of processing, RT-correction was performed according to several mathematical functions that differed with respect to how RT variability was distributed over the stimulus–response interval. While with one correction function (k = 1) latency variability was equally distributed over the stimulus–response interval, with other functions (k = 2, 3, 4) latency variability was progressively distributed over late portions of the ERP. Furthermore, traditional response-locked ERP averaging was considered a potential tool to assess amplitudes of late ERP components more precisely than traditional stimulus-locked averaging.

Two experimental conditions from a priming task based on stimulus identification (Gibbons, 2006; Gibbons et al., 2006) were employed to investigate the performance of RT-corrected ERP averaging. In the control condition two stimuli served as target and distractor that had not appeared in the preceding prime display. By contrast, in the switch condition designed to produce RT cost, target and distractor stimulus were the preceding distractor and target stimuli, respectively. Consistent with the literature
Fig. 7. Grand-average response-locked ERP waveforms as a function of Priming Condition. Baseline correction was performed with reference to the interval (−100 ms, 0 ms) relative to probe display presentation (not shown).
(e.g., Stadler and Hogan, 1996), a negative-priming effect was found in terms of a moderate RT increase of 16 ms in the switch condition. Two aspects of the performance of the novel ERP averaging procedure were of particular interest within the present study. First, if indeed late cognitive processes and their
ERP reflections (which, with the present task, largely involve the LPC) had occurred at different points in time in fast vs. slow trials/participants, RT-correction prior to the averaging procedure should enhance amplitude of the LPC. Second, if there was a true difference between two experimental conditions (switch and control in the present case) regarding a circumscribed LPC subcomponent, with the RT-corrected averaging procedure it should be easier to statistically underpin this difference than with traditional stimulus- and response-locked averaging procedures. The present findings indicate that two RT-corrected averaging procedures passed both these tests with success. With both the cubic and the to-the-power-of-four functions, area under the curve (AUC) was significantly enhanced compared to both traditional stimulus-locked averaging and linear RT-corrected averaging. Moreover, this effect was restricted to the second half of the ERP waveforms, where it could be expected for correction functions that assign most of the latency variability to late portions of the ERP. Of course, the superiority of k = 3 and k = 4 regarding the assessment of the LPC as measured by AUC may critically depend on the present stimulus identification task. With different tasks, one must expect different patterns of performance for k = 1–4, and still other functions not tested here may prove to be most successful.

4.1. String length as an alternative measure of the precision of LPC assessment?

One may argue that measures other than AUC may be better suited to determine whether a certain averaging procedure “assesses the LPC more precisely” than others. In particular, one may speak of precise assessment of the late ERP range only if distinct subcomponents can be seen in the average, i.e., distinct short-duration peaks and troughs, rather than a single broad positive wave. Such greater complexity of the waveform should increase the so-called “string length” (Hendrickson and Hendrickson, 1980). With this measure, amplitude differences between neighboring data points, given in absolute values, are added up within a certain time window. Clearly, string length becomes greater as the ERP in the interval of interest becomes more complex in the above sense. Thus, one would predict the largest string length for the averaging procedure that most effectively distributes RT variability over late portions of the ERP, thereby optimally synchronizing ERP reflections of equivalent brain processes. Unfortunately, string length does not seem appropriate in the present case since, unlike with the traditional stimulus-locked average, interpolation of amplitudes for the RT-corrected ERP segments is necessary (see Section 2). Interpolation, however, works as a filter that reduces amplitude differences between neighboring data points and thereby string length. Thus, even if a certain RT-corrected averaging procedure produces (visually) more complex ERP averages, this need not become evident in a larger string measure. AUC, which is largely unaffected by interpolation, was therefore preferred as a measure of the precision of LPC assessment.

Fig. 8. Grand-average RT-corrected ERP waveforms at electrode CPz, as a function of Priming Condition and Averaging Condition. Note that the P500 component in the switch condition appears most pronounced with k = 3 and k = 4.

Fig. 9. Mean P500 amplitude as a function of Priming Condition and Averaging Condition. Data have been collapsed across electrodes CPz and Pz. ***Significant priming effect for k = 3 according to Scheffé's test at p = .001. *Significant priming effect for k = 4 according to Scheffé's test at p = .05.

4.2. RT-corrected averaging and the assessment of experimental effects on late ERP components
Regarding the P500 priming effect, it was of interest whether there was a certain k ≥ 1 for which an amplitude difference between switch and control conditions could be more convincingly demonstrated than with the traditional stimulus- and response-locked averages. For each averaging condition, optimized time windows to assess the ERP priming effect were determined by centering a 25-data-point (50-ms) window around P500 peak latency as observed at electrode Pz in the grand-averaged ERP waveforms for the switch condition. Using mean ERP amplitude in the optimized time windows as the dependent measure, it was shown that significance of the P500 priming effect depended on the averaging procedure employed. While the P500 priming effect was significant with RT-corrected averaging with k = 3 (p < .001) and k = 4 (p < .05), it only approached significance with RT-corrected averaging with k = 2 (p = .08) and was absent with the remaining averaging procedures, particularly traditional stimulus- and response-locked averaging (p values > .50). It is interesting that k = 4 performed worse than k = 3, because Fig. 8 gives the impression of similar P500 priming effects for k = 4 and k = 3. However, the standard deviation of the individual priming effects across the 20 participants was larger for k = 4 than for k = 3, resulting in a larger p value despite the similar mean switch–control amplitude difference. From this observation one can conclude that, at least for the present data, the to-the-power-of-four function distributes latency variability over late portions of the ERP in a less optimal way than the cubic function (k = 3), regarding the synchronization of equivalent late brain processes.
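The way a power function with exponent k shifts latency variability toward late portions of the stimulus–response interval can be illustrated with a minimal sketch. The warping rule used below, t_new = t + (mean RT − RT) · (t/RT)^k, is an assumption made for illustration; it is not quoted from the formula in Section 2, although it agrees with the latency-change ratio discussed in Section 4.3. The function names, the use of linear interpolation, and the monotonicity check are likewise hypothetical choices. The 2-ms sampling interval follows from the 25-data-point (50-ms) window mentioned above.

```python
import numpy as np


def rt_correct_epoch(epoch, rt_ms, mean_rt_ms, k, dt_ms=2.0):
    """Warp one stimulus-to-response epoch onto the mean-RT time base.

    ASSUMED warping rule (illustrative, not taken from the article):
        t_new = t + (mean_rt - rt) * (t / rt) ** k
    so the response sample (t = rt) lands at mean_rt, and early samples
    move less the larger k is.
    """
    epoch = np.asarray(epoch, dtype=float)
    n_in = int(round(rt_ms / dt_ms)) + 1      # samples from stimulus (t = 0) to response
    if epoch.size < n_in:
        raise ValueError("epoch is shorter than the stimulus-to-response interval")
    t = np.arange(n_in) * dt_ms
    t_new = t + (mean_rt_ms - rt_ms) * (t / rt_ms) ** k
    # Strong compression with large k can fold the time axis; refuse such trials.
    if np.any(np.diff(t_new) <= 0):
        raise ValueError("warp is not monotonic for this RT and k")
    n_out = int(round(mean_rt_ms / dt_ms)) + 1
    t_out = np.arange(n_out) * dt_ms
    # Amplitudes at the regular output grid are obtained by linear interpolation.
    return np.interp(t_out, t_new, epoch[:n_in])


def latency_shift(t_ms, rt_ms, mean_rt_ms, k):
    """Latency change applied to time point t under the assumed rule."""
    return (mean_rt_ms - rt_ms) * (t_ms / rt_ms) ** k
```

Under this assumed rule, the ratio `latency_shift(300, rt, mean_rt, 3) / latency_shift(500, rt, mean_rt, 3)` equals (300/500)^3 for any RT, that is, about 22%. RT-corrected averaging would then amount to averaging the outputs of `rt_correct_epoch` across trials of one condition.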
Furthermore, it is worth noting that a stimulus-locked, RT-corrected averaging procedure (k = 3) performed reliably better in assessing the P500 priming effect than the response-locked averaging procedure, despite the fact that the time window of interest (data point 250 ± 25 data points) is much closer to the overt response (which occurs at around data point 310)
than to stimulus presentation (data point 0). Thus, there must be substantial contributions of motor processing to RT variability which are independent of those cognitive sources of RT variability that presumably underlie the latency variability of P500.

Considerations regarding the functional significance of the P500 priming effect are only of minor interest within the present article. It appears that for the switch condition, at least on some trials, an additional late cognitive process is required that could perhaps be understood as a second P300 indicating stimulus “rechecking” (Johnson, 1986). Recall that the same two digits appeared in the probe as in the prime display, but color-reversed. Thus, the P500 priming effect may suggest that, at an initial stage of analysis, the probe display is erroneously identified as identical to the preceding prime display (Gibbons, 2006). Arriving at the correct probe response may then require a second stimulus analysis, which is reflected in a second P300 component.

4.3. Comparison of RT-corrected averaging and other alternative ERP averaging techniques

Alternatives to the traditional stimulus-locked ERP average are not entirely new in the literature. For example, with the Woody average (Woody, 1967) trial-by-trial latency differences are estimated based on cross-correlations between single-trial waveforms and a template averaged waveform. The single-trial waveforms are then shifted by exactly the time lag at which the respective cross-correlation coefficients were largest. Averaging the shifted waveforms results in a new template, and the procedure is repeated. Eventually, the template represents the Woody average, which compensates for between-trials latency jitter. Note, however, that with this procedure no attempt is made to relate latency variability of single-trial ERP components to RT variability, which would seem desirable within the field of mental chronometry. Moreover, as Picton et al.
(1995) emphasize, the Woody average is “most useful when the ERP is characterized by one major wave and smaller waves that either vary in the same way as the larger wave or are not relevant to the analysis” (p. 17). The major wave that dominates the shifting of individual trials is, of course, most often the P300 component. For the present data, it is clear that a P300-dominated Woody average would not perform as effectively in establishing the P500 priming effect as the RT-corrected procedure with k = 3. This is because with the latter procedure, latency change of data points in the P300 range was only (300/500)^3 ≈ 22% of the latency change of data points in the P500 range (see the formula in Section 2 above). Nevertheless, the RT-corrected procedure with k = 3 performed outstandingly well in establishing a P500 priming effect. The Warp average (Picton et al., 1988) represents another alternative to traditional stimulus-locked averaging. For ease of description, assume only two single-trial EEG waveforms to be averaged, A and B. For each data point
of A, with Warp averaging that data point of B is determined that most closely matches the respective data point of A, with respect to the parameters amplitude and slope. Data points of both waveforms are then individually shifted in time (yet preserving the order of data points) to achieve a pairing of data points of A and B for which an index of dissimilarity of the waveforms becomes minimal. Thereby, it is ensured that equivalent portions of waveforms A and B are located at similar points in time before traditional averaging is being performed. However, the Warp average has its problems whenever the waveforms to be averaged contain high-frequency noise. In these cases, two data points can be determined as highly similar based on parameters introduced by noise, rather than by true ERPs, and waveforms are shifted accordingly. Picton et al. (1995) therefore recommend Warp averaging at the level of grand averages only, where smooth waveforms averaged across trials within one participant are entered, and not for averaging of noisy single-trial segments. The present study, however, aimed at an improvement of single-trial ERP averaging, for which Warp averaging may not be appropriate. Finally, Woldorff’s (1993) ADJAR technique (for adjacent responses) shall be mentioned. ADJAR assumes fixed ERP components associated with distinct events separated by variable delays. Although originally developed to separate overlapping components evoked by consecutive stimuli, the ADJAR filter could also be used to separate fixed sets of stimulus-following components and response-preceding components that overlap at variable delays on different trials. This would decontaminate the response-locked average from stimulus-locked components that are smeared through the RT distribution, and vice-versa. 
The disadvantage of this approach is that it does not model the processes by which RTs come to vary, as if those processes did not contribute directly to the ERPs, except in the delay between triggering the stimulus-locked and the response-locked components. In particular, the present approach differs from ADJAR in that it does not assume ERP components to be time-locked to either stimulus presentation or the response. A component that is not time-locked to the stimulus (i.e., occurs later with slow than with fast responses) need not precede the response at a fixed interval. This notion receives support from the present P500 findings: neither the traditional stimulus-locked nor the traditional response-locked average could establish a P500 priming effect.

4.4. How can the optimal RT-correction function be determined for a given data set?

So far, the present work has shown that RT-correction of ERP segments prior to the averaging procedure, both on the level of single trials and individual averages, can be superior to traditional stimulus-locked and response-locked ERP averages regarding the assessment of the LPC and of experimental effects on it. One may even predict
that the rationale of RT-correction can be applied with still more success to experimental tasks that require higher cognitive operations than the present one. For example, mean RT in mental-rotation or mental-arithmetic tasks usually exceeds the present range of about 600 ms. At the same time, RT variability (and latency variability of late cognitive processes) will markedly increase, making an RT-correction approach to ERP averaging still more desirable.

Two cautionary notes have to be made. First, with other tasks, RT-correction functions other than those used in the present study will most likely be most appropriate to assess late portions of the ERP and experimental effects on them. Therefore, interested researchers should routinely use more than one correction function; it is recommended to try at least values of k ranging from 2 to 4. The most promising approach, however, could be to first compute separate traditional ERP averages for quartiles of fastest to slowest responses/participants. From these averages one might get an idea of how the latency of a certain ERP component is related to RT variability. For example, with a given task one may find that N1 and P2 peak latencies do not differ between RT quartiles, whereas N2 peak latency occurs on average 10 ms later in the slowest compared to the fastest quartile, and P3 peak latency occurs on average 30 ms later, etc. This kind of information could then be used to select a priori a correction function that sufficiently fits the empirical data regarding the distribution of latency variability over the stimulus–response interval.

The second cautionary note relates to an implicit assumption of the present approach, namely that different RTs within a given experimental condition are primarily accompanied by different latencies of a certain invariable set of ERP components.
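The quartile-based inspection suggested under the first cautionary note can be sketched as follows. This is a minimal illustration, not a procedure taken from the article: the function names, the array layout (trials by samples, stimulus-locked, sample 0 at stimulus onset), the search window, and the choice of the maximum as peak measure are all assumptions, and the 2-ms sampling interval is the one implied by the data points reported here.

```python
import numpy as np


def quartile_averages(epochs, rts):
    """Average stimulus-locked epochs separately for four RT quartiles.

    epochs: array of shape (n_trials, n_samples); rts: one RT per trial.
    Returns an array of shape (4, n_samples), fastest quartile first.
    """
    epochs = np.asarray(epochs, dtype=float)
    rts = np.asarray(rts, dtype=float)
    order = np.argsort(rts)                   # trial indices, fastest to slowest
    groups = np.array_split(order, 4)         # four (nearly) equal-sized groups
    return np.stack([epochs[idx].mean(axis=0) for idx in groups])


def peak_latency_ms(waveform, window_ms, dt_ms=2.0):
    """Latency of the largest value inside a search window (hypothetical helper)."""
    lo, hi = (int(round(w / dt_ms)) for w in window_ms)
    return (lo + int(np.argmax(waveform[lo:hi + 1]))) * dt_ms
```

Comparing `peak_latency_ms` across the four quartile averages for early and late search windows gives a rough picture of how strongly each component's latency follows RT, which is the kind of information that could guide the choice of k.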
This, however, need not always be the case: slow responses within one and the same experimental condition may differ from fast responses in that additional ERP components appear, whereas other components do not differ, in particular with respect to their latency. In such a scenario, one would expect superior performance of traditional stimulus-locked ERP averaging all along the time scale, because RT-corrected averaging would introduce smearing into the averaged waveforms rather than help to avoid it. Note, however, that a pattern of additional ERP components with slow trials may be uncovered with the above-mentioned approach of ERP comparisons between RT quartiles. Thus, this latter approach may not only serve to determine an optimal RT-correction function but also indicate whether RT-corrected averaging should be performed at all. Finally, RT-corrected averaging may be further improved when only a certain part of the stimulus–response interval is expanded/compressed in time. For instance, one would perhaps agree that within the first 100 ms following stimulus presentation variability in the timing of information processing is very small and, moreover, unrelated to RT variability. Similarly, once in a trial
the motor program of the correct response has been started, most contributions to RT variability are already made. Thus, within the last 50 ms preceding the overt response rather few sources of RT variability might be assumed. However, with the present RT-correction procedures, for all values of k greater than 1, this last 50-ms interval gets assigned more RT variability than earlier 50-ms intervals. We therefore plan to reanalyze the data employed here along with other ERP data, using a similar approach where, however, only (100 ms; RT − 50 ms) ERP segments are fed into the RT-correction algorithm.

Acknowledgements

The authors thank two anonymous reviewers for their helpful comments on an earlier draft of this article.

References

Christie J, Klein RM. Negative priming for spatial location? Can J Exp Psychol 2001;55:24–38.
Hendrickson DE, Hendrickson AE. The biological basis of individual differences in intelligence. Pers Indiv Differ 1980;1:3–33.
Falkenstein M, Hohnsbein J, Hoormann J. Effects of choice complexity on different subcomponents of the late positive complex of the event-related potential. Electroen Clin Neuro: Evoked Potentials 1994;92:148–60.
Gibbons H. An event-related potential investigation of varieties of negative priming. J Psychophysiol 2006;20:170–85.
Gibbons H, Rammsayer TH, Stahl J. Multiple sources of positive and negative priming effects: an evoked-potential study. Mem Cogn 2006;34:172–86.
Gratton G, Coles MGH, Donchin E. A new method for off-line removal of ocular artifact. Electroen Clin Neuro 1983;55:468–84.
Jasper HH. The ten–twenty electrode system of the International Federation. Electroen Clin Neuro 1958;20:371–5.
Johnson R. A triarchic model of P300 amplitude. Psychophysiology 1986;23:367–84.
Picton TW, Lins OG, Scherg M. The recording and analysis of event-related potentials. In: Boller F, Grafman J, editors. Handbook of neuropsychology, vol. 10. New York: Elsevier; 1995. p. 3–73.
Picton TW, Hunt M, Mowrey R, Rodriguez R, Maru JT. Evaluation of brainstem auditory evoked potentials using dynamic time warping. Electroen Clin Neuro 1988;71:212–25.
Rösler F, Heil M. Toward a functional categorization of slow waves: taking into account past and future events. Psychophysiology 1991;28:344–58.
Ruchkin DS. Measurement of event-related potentials: signal extraction. In: Picton TW, editor. Handbook of electroencephalography and clinical neurophysiology, Rev. series, vol. 3. Human event-related potentials. Amsterdam: Elsevier; 1988. p. 7–43.
Rugg MD, Doyle MC. Event related potentials and stimulus repetition in direct and indirect tests of memory. In: Heinze HJ, Münte T, Mangun GR, editors. Cognitive electrophysiology. Boston: Birkhäuser; 1994. p. 124–48.
Stadler MA, Hogan ME. Varieties of positive and negative priming. Psychon B Rev 1996;3:87–90.
Woldorff MG. Distortion of ERP averages due to overlap from temporally adjacent ERPs: analysis and correction. Psychophysiology 1993;30:98–119.
Woody CD. Characterization of an adaptive filter for the analysis of variable latency neuroelectric signals. Med Biol Eng 1967;5:539–53.