Auditory feedback in error-based learning of motor regularity


brain research 1606 (2015) 54–67

Available online at www.sciencedirect.com

www.elsevier.com/locate/brainres

Research Report

Floris T. van Vugt a,b,⁎, Barbara Tillmann a

a Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team CNRS-UMR 5292, INSERM U1028, University Lyon-1, Lyon, France
b Institute of Music Physiology and Musicians' Medicine, University of Music, Drama, and Media, Hanover, Germany

Article info

Article history:
Accepted 9 February 2015
Available online 23 February 2015

Keywords:
Motor learning
Auditory feedback
Timing
Movement variability
Action–perception coupling
Music
Feedback error-based learning

Abstract

Music and speech are skills that require high temporal precision of motor output. A key question is how humans achieve this timing precision given the poor temporal resolution of somatosensory feedback, which is classically considered to drive motor learning. We hypothesise that auditory feedback critically contributes to learning timing and that, similarly to visuo-spatial learning models, learning proceeds by correcting a proportion of perceived timing errors. Thirty-six participants learned to tap a sequence regularly in time. For participants in the synchronous-sound group, a tone was presented simultaneously with every keystroke. For the jittered-sound group, the tone was presented after a random delay of 10–190 ms following the keystroke, thus degrading the temporal information that the sound provided about the movement. For the mute group, no keystroke-triggered sound was presented. In line with the model predictions, participants in the synchronous-sound group were able to improve tapping regularity, whereas the jittered-sound and mute groups were not. The improved tapping regularity of the synchronous-sound group also transferred to a novel sequence and was maintained when sound was subsequently removed. The present findings provide evidence that humans engage in auditory feedback error-based learning to improve movement quality (here, reduced variability in sequence tapping). We thus elucidate the mechanism by which high temporal precision of movement can be achieved through sound in a way that may not be possible with less temporally precise somatosensory modalities. Furthermore, the finding that sound-supported learning generalises to novel sequences suggests potential rehabilitation applications.

© 2015 Elsevier B.V. All rights reserved.

☆ This research was supported by the EBRAMUS, European Brain and Music grant (ITN MC FP7, GA 238157). The team "Auditory cognition and psychoacoustics" is part of the LabEx CeLyA ("Centre Lyonnais d'Acoustique", ANR-10-LABX-60). We are indebted to Alexia Ferréol for invaluable assistance in running a subset of participants.
⁎ Correspondence to: Lyon Neuroscience Research Center, Team CAP CNRS-UMR 5292 INSERM U1028, 50 Avenue Tony Garnier, 69007 Lyon, France. E-mail address: [email protected] (F.T. van Vugt).

http://dx.doi.org/10.1016/j.brainres.2015.02.026
0006-8993/© 2015 Elsevier B.V. All rights reserved.


1. Introduction

Music and speech are skills that require high temporal precision of motor output (Merchant et al., 2013). For example, in speech, precise timing of articulatory output is critical for being understood (Dromey et al., 1995; Keller, 1990). Similarly, in music, expert pianists have been shown to achieve stunning timing accuracy, such as trial-to-trial timing consistency in the order of milliseconds when playing musical scales (van Vugt et al., 2012; Wagner, 1971). Such small deviations were reproduced reliably, even more than a year later (van Vugt et al., 2013a).

A key question is how humans can achieve such timing precision in spite of noisy sensory input. Proprioception and tactile perception, which are classically thought to drive motor learning, have low temporal resolution. For example, the temporal resolution of tactile input is over 50 ms even for passive stimulation (Tinazzi et al., 1999). Similarly, the timing resolution of visual perception of movements is in the order of tens of milliseconds at least (Carlini and French, 2014). How can humans achieve millisecond-resolution motor timing with such unreliable sensory information?

We hypothesised that motor timing may be learnt through auditory feedback. Auditory temporal resolution is known to be high, for example allowing discrimination of sounds separated by as little as several milliseconds (Exner, 1875). Sensitivity to timing is typically found to be greater in audition than in other modalities (Glenberg et al., 1989; Kanai et al., 2011; Karabanov et al., 2009). We furthermore suggest that the mechanism by which this learning proceeds is error correction. A large body of evidence suggests that human visuomotor learning proceeds by correction of spatial error (Cheng and Sabes, 2006; Ghahramani et al., 1997; Kawato et al., 1987; Shadmehr et al., 2010; Thoroughman and Shadmehr, 2000).
According to this theory, motor programs are updated on a trial-by-trial basis by a proportion of the visual movement error. Similarly, in speech production, adaptation to experimentally induced formant shifts may proceed by correcting for a proportion of the produced formant frequency error, as supported by empirical studies in humans (Houde and Jordan, 1998) and songbirds (Sober and Brainard, 2009, 2012). However, to date no evidence exists that this process can occur in the temporal domain. Correction of temporal deviations is thought to be a critical process in sensorimotor synchronisation (Repp and Su, 2013). This suggests that the brain can perform the necessary computations for temporal error correction. Note, however, that such temporal error correction serves to maintain periodic movement; it does not necessarily yield improvement of motor performance over time (as in motor learning, which we are concerned with here).

Previous work highlighted the potential for auditory feedback in motor learning of timing. For instance, Ronsse et al. (2011) had participants learn to perform a 90° out-of-phase bimanual cyclic wrist movement regularly in time. One group received visual feedback in the form of a two-dimensional figure that showed their movement on-line. A second group received auditory feedback in the form of sounds that marked the timing of particular points in their trajectories. Both groups improved equally in their performance over time. The auditory


group performed identically on a retention test without auditory feedback, but the visual group was severely impaired when visual feedback was removed. This suggested that the visual group had become dependent on the feedback whereas the auditory group had not. This finding was furthermore supported by neuroimaging observations of remnants of visual activity even in the no-feedback retention test. Though an important contribution to the understanding of auditory feedback in motor learning, the study by Ronsse and colleagues does not clarify the mechanism by which learning proceeded. It remains unclear exactly how participants used auditory feedback, and in particular, it remains to be shown that participants can engage in error-based learning in such an experimental setting. Also, because both groups showed similar improvements in task performance, it might be argued that motor learning was actually driven by somatosensory feedback. In sum, to date, it remains unclear whether the brain can use auditory feedback for motor learning of timing, and if so, by which kind of mechanism.

We here propose an adaptation of models for visuomotor adaptation learning, extrapolated to the temporal domain. According to these models (Cheng and Sabes, 2006; Ghahramani et al., 1997; Thoroughman and Shadmehr, 2000), learning schematically proceeds as follows. Performance on a trial t is represented by a vector mt (which could be, for instance, the movement end-point or, in our case, a vector of timings of keystrokes). The performance is perceived as a sensory feedback vector ft that is compared to a target vector x, such that the perceived error for that trial is x − ft. In feedback-based learning, performance is updated by mt+1 = mt + a(x − ft), where a is a scalar that determines the learning rate (Ghahramani et al., 1997). In this way, over time, performance will converge onto the target and thus motor learning occurs.
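This update rule can be illustrated with a minimal simulation of a single inter-tap interval. This is a sketch, not the authors' model code: the learning rate, motor noise, and starting interval are illustrative assumptions; only the 10–190 ms jitter range and the 300 ms target are taken from the experimental design.

```python
import random

def simulate_tapping(jitter_low=0.0, jitter_high=0.0, trials=300,
                     lr=0.3, motor_sd=10.0, target=300.0, seed=1):
    """Sketch of feedback error-based learning of one inter-tap interval.

    The produced interval is perceived via an auditory tone (feedback f_t);
    the motor command m is updated by a proportion `lr` of the perceived
    error (x - f_t).  Jittering the tone corrupts f_t, so the learner
    corrects for errors that were never made."""
    rng = random.Random(seed)
    m = 350.0                        # start slightly slow, as participants did
    produced = []
    for _ in range(trials):
        interval = m + rng.gauss(0.0, motor_sd)                 # motor noise
        tone = interval + rng.uniform(jitter_low, jitter_high)  # auditory feedback
        m += lr * (target - tone)                               # error-based update
        produced.append(interval)
    late = produced[-100:]           # intervals after learning has settled
    mean = sum(late) / len(late)
    sd = (sum((x - mean) ** 2 for x in late) / len(late)) ** 0.5
    return 100.0 * sd / mean         # ITI variability as % of the mean ITI

cv_sync = simulate_tapping()                                  # synchronous tones
cv_jittered = simulate_tapping(jitter_low=10.0, jitter_high=190.0)
# The jittered learner retains a markedly higher residual ITI variability,
# mirroring the predicted group difference.
```

With synchronous feedback the command settles near the target and residual variability reflects only motor noise; with jittered feedback the update injects the jitter back into the motor command, inflating interval variability.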
A critical prediction of feedback error-based learning models is that if the feedback is distorted, i.e. if ft is noisy, learning should be impaired, because the system is correcting for errors that were never made in the first place, and the learning rate should be affected (Burge et al., 2008; Wei and Körding, 2010). In other words, in experimental conditions in which feedback is distorted, participants should show impaired learning. In our case, because we investigate learning in the temporal domain, our distortion is implemented as a jittered delay.

Our participants were monitored while they explicitly learned to tap a sequence of keystrokes as regularly as possible. Movement variability has been suggested to be a measure of motor performance as important as speed (Reis et al., 2009; Shmuelof et al., 2012) and might even be a more reliable metric of real-life motor skill than speed (Gobel et al., 2011; Haith and Krakauer, 2013). We quantified the regularity of tapping as the variability of the inter-tap intervals (ITIs). The crucial manipulation in our experiment was that participants received auditory feedback (tones) either simultaneously with the keystrokes (synchronous-sound group) or jittered in time (jittered-sound group). An additional group of participants did not receive sounds linked to the keystrokes (mute group). To ensure that participants tapped at the same speed, but to avoid confounds arising from an auditory pacing signal, we used a visual count-in cue prior to each trial (Hove and Keller, 2010; Hove et al., 2010). This visual count-in cue indicated speed and regularity in the form of a yellow bar presented above the numbers indicating the to-be-tapped


Fig. 1 – (A) Experimental procedure on a single trial. A visual count-in cue (in yellow) first indicated the items of the to-be-produced sequence one by one at the target speed (300 ms ITI). The cue then disappeared and the participant tapped the sequence on the four response buttons. After each keystroke, the item in the sequence was greyed out on the screen. (B) Schematic illustration of how the four fingers were placed and the approximate spatial layout of the four keys. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

sequence items prior to each trial (Fig. 1). We used the same tempo for each trial of the experimental blocks. The experiment spanned two days. On the first day, participants were trained in tapping a seven-digit sequence (primary sequence). On the second day, the participants continued with the primary sequence and were subsequently introduced to a novel sequence (secondary sequence). In our design, all response keys were mapped to the same sound. In this way, we eliminated the potential confound that participants may use sound type for response selection (Hoffmann et al., 2001). We furthermore investigated whether improvements due to auditory feedback could be maintained in the absence of feedback once learning had occurred. This is important for clinical applications. For example, if auditory feedback were used during motor stroke rehabilitation, it is important that what is learned during therapy generalises to other activities that are not necessarily accompanied by sounds, such as patients' activities of daily living. Because auditory feedback of movement is often absent in daily life, it is important that patients do not become reliant on it. The idea that participants might become reliant on the feedback presented during learning is referred to as the guidance hypothesis (Salmoni et al., 1984).

The guidance hypothesis predicts that performance would return to baseline once the feedback is removed. In our experiment, we thus addressed the following questions. First, can auditory feedback be used to improve tapping regularity (temporal evenness) in a motor sequence production task? We hypothesised that the group who received sounds synchronously with the keystrokes would improve in tapping regularity, in particular in comparison to the mute group receiving no keystroke-triggered sounds. Second, we predicted that a group receiving temporally jittered sounds would show impaired learning because, if they use feedback error-based learning, they will be correcting for motor errors that were not committed. Third, we tested whether the groups that received sounds (simultaneous or jittered) maintained any motor improvements when subsequently deprived of auditory feedback. Fourth, we aimed to find evidence that improvements due to auditory feedback carry over to different sequences of movements (that is, we expected to see an improvement in tapping regularity in an untrained motor sequence). In addition to our motor learning task, we included a number of auxiliary tests to assess participants' basic auditory and sensorimotor capacities, aiming to understand individual


Fig. 2 – Regularity of the tapping timing of the sequences trained during the two days. The vertical axis denotes the variability of the within-sequence inter-keystroke-intervals, expressed as a percentage of the average duration and presented on a logarithmic scale. The x-axis indicates the blocks, arranged chronologically. Here p1 to p10 indicate the 10 training blocks of the primary sequence and s1 to s3 indicate the training blocks of the secondary sequence.

differences in learning. Participants' basic auditory temporal precision was tested with an anisochrony detection task (Ehrlé and Samson, 2005; Hyde and Peretz, 2004) that measured thresholds for detecting a slightly delayed note in an otherwise isochronous five-tone sequence. To measure participants' temporal precision for the prediction of an auditory effect of their keystrokes, we used a delay detection threshold task that measured participants' sensitivity to delays between motor (keystroke) and auditory (tone) events (van Vugt and Tillmann, 2014). That is, we established from which delay onwards participants noticed that the tone came after the keystroke instead of immediately. This delay-detection task was also included to assess whether the temporal jitter that we included in the motor learning phase may have gone unnoticed by the participants. Finally, participants' auditory-motor synchronisation capabilities were measured with a synchronisation–continuation tapping paradigm (Tillmann et al., 2011; Wing and Kristofferson, 1973). These tests helped to describe our groups better (thus eliminating potential confounds or group differences), and confirmed that the jitter went unnoticed. However, as these results were not critical to the group differences observed for movement regularity learning that are the main topic of our paper, these tests and their results are described in the Supplementary materials only.
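Detection thresholds of this kind are commonly estimated with adaptive staircase procedures. As a rough illustration only, here is a generic 2-down/1-up staircase; the function name, the `respond` callback, and all parameter values are hypothetical and not the authors' implementation (see van Vugt and Tillmann, 2014, for the actual procedure):

```python
def staircase_threshold(respond, start=120.0, step=20.0, min_step=5.0,
                        reversals_needed=8):
    """Generic 2-down/1-up adaptive staircase for a temporal detection
    threshold (e.g. the delay, in ms, at which a tone is noticed to lag
    the keystroke).  `respond(level)` returns True if the deviation is
    detected at that level.  Converges on ~70.7% correct; returns the
    mean of the last six reversal levels."""
    level, correct_streak, reversals, going_down = start, 0, [], None
    while len(reversals) < reversals_needed:
        if respond(level):
            correct_streak += 1
            if correct_streak == 2:          # two detections: make it harder
                correct_streak = 0
                if going_down is False:      # direction change: a reversal
                    reversals.append(level)
                    step = max(min_step, step / 2)
                going_down = True
                level = max(0.0, level - step)
        else:                                # one miss: make it easier
            correct_streak = 0
            if going_down is True:           # direction change: a reversal
                reversals.append(level)
                step = max(min_step, step / 2)
            going_down = False
            level += step
    return sum(reversals[-6:]) / 6

# A deterministic simulated observer with a true threshold of 60 ms:
est = staircase_threshold(lambda lvl: lvl >= 60.0)
```

For the deterministic observer above, the estimate oscillates around the true 60 ms threshold once the step size has shrunk.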

2. Results

2.1. Sequence learning—Primary sequence

Day 1. To compare baseline tapping performance, we investigated tapping performance in the last warmup block, during which participants did not yet receive keystroke-triggered sounds but the visual count-in cue prescribed the 300 ms IOI tapping speed as in the experimental blocks. We calculated an ANOVA with between-participants factor group (synchronous-sound, jittered-sound and mute) and dependent variable tapping regularity during the last warmup block. The three groups did not differ in tapping regularity [F(2, 33) = .43, p = .66]. When we

re-ran the same ANOVA for the first main block, we again found no difference between groups [F(2, 33) = 2.01, p = .15]. Critically, this indicated that auditory feedback had no effect on initial motor performance.

In order to assess motor learning during the first day, we computed a 3 × 6 ANOVA with group (synchronous-sound, jittered-sound and mute) as between-participants factor and block (six levels corresponding to the six blocks) as within-participants factor and tapping regularity as dependent variable. There was a statistical trend for a main effect of group [F(2, 33) = 2.86, p = .07] as well as statistical trends (after Greenhouse–Geisser correction) for the effect of block [F(7, 231) = 2.28, p = .03, pGG = .076, η²G = .01] and the interaction between group and block [F(14, 231) = 2.00, p = .02, pGG = .06, η²G = .02] (Fig. 2). Analysing the groups separately, we found that the synchronous-sound group showed a main effect of block [F(7, 77) = 3.05, pGG = .047], indicating that the tapping of the sequence became more regular for this group over the course of the first day. By contrast, the jittered-sound group [F(7, 77) = 1.42, pGG = .25] and the mute group [F(7, 77) = .94, pGG = .42] showed no changes in tapping irregularity between the blocks.

In order to rule out differences between groups in ordinal accuracy (number of times the sequence was tapped correctly) or speed, we calculated the same 6 × 3 ANOVA but with ordinal accuracy or speed as dependent variable. In terms of ordinal accuracy, participants gradually tapped more correct sequences in each block, from 20.5 (SD 3.7) correct sequences in the first block (out of 25) to 22.7 (SD 2.9) in the last block of the first day [F(6, 198) = 5.15, pGG = .0002, η²G = .03]. There were no differences between the groups [F(2, 33) = .28, p = .76] and no group-block interaction [F(12, 198) = 1.25, p = .25].

As for speed, participants were initially slightly slower than the tempo indicated by the visual count-in cue but then sped up over the course of the first day of training [F(7, 231) = 5.12, pGG = .002, η²G = .04]. They tapped at 350.1 (SD 48.6) ms average interval duration in the first block. Gradually, participants sped up to 326.5 (SD 54.1) ms average interval duration in the last block of day 1. There was no


main effect of group [F(2, 33) = .65, p = .52] and no group-block interaction [F(14, 231) = 1.05, pGG = .40] (see Supplementary Fig. S2).

Day 2. For the second day, we performed a 3 × 3 ANOVA with between-participants factor group (synchronous-sound, jittered-sound and mute) and within-participants factor block (three levels corresponding to the three blocks) and tapping regularity as dependent variable. The main effect of block was not significant [F(2, 66) = 2.22, pGG = .12], nor was the interaction between group and block [F(4, 66) = .72, pGG = .56]. Crucially, the main effect of group was significant [F(2, 33) = 3.19, p = .05, η²G = .15]. Post-hoc Tukey HSD testing revealed that this effect was due to the synchronous-sound group tending to tap more regularly than the jittered-sound group [p = .065], but not more regularly than the mute group [p = .13]; the jittered-sound and mute groups were not different [p = .94] (Fig. 2).

As additional analysis, we performed the same 3 × 3 ANOVA as above with ordinal accuracy and speed as dependent variables. The number of correct sequences did not differ between blocks [F(2, 66) = 2.17, pGG = .13] or between the groups [F(2, 33) = 1.39, p = .26] and there was no interaction between block and group [F(4, 66) = 1.69, pGG = .17]. Furthermore, the tapping speed was now at 324.1 (SD 45.6) ms and did not differ between the groups [F(2, 33) = 1.30, p = .28], remained constant between the blocks [F(2, 66) = 1.27, p = .29] and there was no interaction [F(4, 66) = .49, p = .74] (Supplementary Fig. S2).

2.2. Sequence learning—Secondary sequence

We performed a 3 × 3 ANOVA with group (synchronous-sound, jittered-sound and mute) as between-participants factor and block as within-participants factor and tapping regularity as dependent variable. There was a main effect of group [F(2, 33) = 7.10, p = .003, η²G = .27]. Tukey HSD tests indicated that the synchronous-sound group tapped more regularly than the jittered-sound [p = .008] and the mute [p = .003] groups, but the mute and jittered-sound groups did not differ in tapping regularity [p = .92] (Fig. 2). We found no effect of block [F(2, 66) = .02, p = .98] and no interaction between group and block [F(4, 66) = .68, p = .60].

As for the primary sequence, we performed additional ANOVAs with the same design as above but with ordinal accuracy or speed as dependent variables. Participants tapped more correctly during the third block (M = 22.4, SD = 2.2 correct sequences) than the first block (M = 20.6, SD = 3.8 correct sequences) [F(2, 66) = 8.84, pGG = .0007, η²G = .05]. There was a statistical trend for a main effect of group [F(2, 33) = 2.73, p = .08] (indicating that the synchronous-sound group produced slightly more correct sequences than the other two groups) but no interaction between group and block [F(4, 66) = 1.58, p = .19]. As for speed, participants achieved greater speed in the later blocks [F(2, 66) = 9.63, pGG = .0004, η²G = .04], but there was no main effect of group [F(2, 33) = .75, p = .48] and no interaction [F(4, 66) = .85, p = .50].

2.3. Transfer to an untrained sequence

To investigate whether participants’ sequence learning for the primary sequence transferred to the secondary (novel) sequence, we compared the last block of the primary sequence and the first block of the secondary sequence. To also compare

Fig. 3 – The effect of sequence switching. The figure indicates that the synchronous-sound group improves in tapping regularity (reduced variability) whereas the mute and jittered-sound group’s performance remains the same.

the starting blocks of the two sequences, we furthermore included the first block of the primary sequence. That is, we performed a 3 × 3 ANOVA with group (synchronous-sound, jittered-sound and mute) as between-participants factor and block as within-participants factor (blocks 1 and 10 of the primary sequence and block 1 of the secondary sequence) and tapping regularity as dependent variable. The main effects of group [F(2, 33) = 4.80, p = .01, η²G = .19] and block [F(2, 66) = 6.05, p = .004, η²G = .03] were significant, as was the interaction between the two [F(4, 66) = 3.33, p = .02, η²G = .04]. We therefore compared the three blocks for each group individually. The synchronous-sound group became more regular between the first and last block of the primary sequence [p = .01] and between the first blocks of the two sequences [p = .0002], but the last block of the primary sequence was not different from the first block of the secondary sequence [p = .77]. The jittered-sound group showed no significant differences between any block pairs [all p = 1.0]. The mute group showed only an improvement between the first and last primary block [p = .02] but not between the other block pairs [p = .98 and p = .26] (Fig. 3).

In order to investigate ordinal accuracy or speed differences, we performed the same ANOVA as above with ordinal accuracy and speed as dependent variables. For ordinal accuracy, the main effect of group was not significant [F(2, 33) = .87, p = .42]. The main effect of block was significant [F(2, 66) = 12.36, p < .001, η²G = .10] and the interaction between group and block just fell short of significance [F(4, 66) = 2.45, p = .053, η²G = .04]. We proceeded to analyse the three groups separately.

The synchronous-sound group improved in accuracy between the first and last block of the primary sequence [p = .001], and performed better in the first block of the secondary sequence than in the first block of the primary sequence [p = .04], while the last block of the primary and first block of the secondary sequence did not differ [p = .87]. The mute group, however, showed only a statistical trend for an improvement between the first and last block of the primary sequence [p = .06] and then a sharp decline


in accuracy with the secondary sequence [p = .0006]. There was no difference between the first blocks of the two sequences [p = .49]. The jittered-sound group showed a decline in accuracy between the last block of the primary sequence and the secondary sequence [p = .045] but no differences between the other pairs of blocks [p = 1.00 and p = .24]. For tapping speed, there was a main effect of block [F(2, 66) = 11.87, p < .001, η²G = .11]. Tukey contrasts revealed that this signalled significant differences between the first and last primary sequence blocks [p < .0001] and between the first block of the secondary sequence and the first [p = .04] and last [p = .04] blocks of the primary sequence. However, there was no main effect of group [F(2, 33) = 1.60, p = .22] and no interaction between group and block [F(4, 66) = .60, p = .73] (Supplementary Fig. S3).

2.4. Testing for reliance on auditory feedback

To investigate the effect of eliminating the keystroke-triggered sound ("mute" blocks) for the synchronous-sound and jittered-sound groups, we performed a 2 × 2 × 2 ANOVA with group (synchronous-sound vs. jittered-sound) as between-participants factor and sequence (primary, secondary) and block (mute, sound) as within-participants factors. We did not include the mute group in this analysis because they did not receive keystroke-triggered sounds at all during the experiment. We found a main effect of group [F(1, 22) = 5.95, p = .02, η²G = .18], which indicated that the synchronous-sound group tapped more regularly than the jittered-sound group in all four blocks under consideration here. There were no other significant effects [all F < 1.42, p > .25] (Fig. 4).

In order to check for differences in ordinal accuracy or speed, we performed the same ANOVAs as above for those dependent variables. For ordinal accuracy, we found a main effect of condition (mute vs. sound) indicating that both groups tended to produce fewer correct sequences in the mute block [F(1, 22) = 9.18, p = .006, η²G = .05]. There were no other significant effects [all F < 1.45, p > .32]. For sequence production speed, we found an interaction effect of group (synchronous-sound, jittered-sound) and condition (sound vs. mute) [F(1, 22) = 4.91, p = .037, η²G = .02]. Therefore, we continued to analyse the two groups separately. For the synchronous-sound group, there were no significant effects [both F < 2.4, p > .15]. For the jittered-sound group, only the main effect of condition was significant [F(1, 11) = 20.85, p = .0008, η²G = .12]: the jittered-sound group tapped faster during the mute block than during the sound block (Supplementary Fig. S4).

2.5. Scale sequence tapping

Participants were tested tapping a scale sequence (1234321) either at a speed indicated by the visual count-in cue (preset-speed) or as fast as they could (maximal-speed). For the preset-speed and maximal-speed scale blocks we performed a 2 × 3 ANOVA with group (synchronous-sound, jittered-sound and mute) as between-participants factor, day (day 1, before the learning part; day 2, after the learning part) as within-participants factor and with regularity as dependent variable.

2.5.1. Preset-speed scale blocks

The main effect of group was significant [F(2, 33) = 4.20, p = .02, η²G = .14], revealing that on the second day the synchronous-sound group tapped more regularly than the other two groups [F(2, 33) = 4.08, p = .03, η²G = .20], but not on the first day [F(2, 33) = 1.77, p = .19] (Fig. 5).

Fig. 4 – Tapping regularity between the muted block (labelled mute) and the preceding non-muted (labelled sound) block for the primary (left) and secondary (right) sequence. The x-axis indicates the blocks in chronological order (omitting several intermittent blocks between the primary and secondary sequence for clarity of presentation). The figure shows that the synchronous- and jittered-sound groups' performance is not affected by absence of sound.

Fig. 5 – Performance on the scale (1234321) tapping blocks with preset speed (300 ms ITI) or maximal speed. The figure indicates that the preset-speed performance is essentially unaltered between days, whereas an increase in variability is observed in the maximal-speed scale tapping. No group differences were found.

We performed the same ANOVA as above for ordinal accuracy and for speed. For ordinal accuracy, there was only a tendency for an interaction between group and day [F(2, 33) = 3.19, p = .054, η²G = .06], revealing that the jittered-sound group produced fewer correct scales on the second day than on the first day [F(1, 11) = 8.56, p = .01, η²G = .16] whereas there was no difference for the synchronous-sound [F(1, 11) = .20, p = .67] and mute [F(1, 11) = .03, p = .86] groups (Fig. 5). No other effects were significant [F < 1.86, p > .17]. There were no speed differences between the groups [F(2, 33) = .97, p = .39] and no interaction between day and group [F(2, 33) = 1.66, p = .20]. However, there was a main effect of day [F(1, 33) = 43.62, p < .0001, η²G = .29], indicating that participants


approached the target ITI (300 ms) more closely on the second day than on the first day (Supplementary Fig. S5).

2.5.2. Maximal-speed scale blocks

Participants showed a marked decrease in regularity [F(1, 33) = 23.68, p < .0001, η²G = .14] on the second day relative to the first day. There were no differences between the groups [F(2, 33) = .28, p = .76] and no interaction [F(2, 33) = .81, p = .45] (Fig. 5). Participants produced more correct sequences on the second day [F(1, 33) = 9.44, p = .004, η²G = .05], but there was no effect of group [F(2, 33) = .39, p = .68] and no interaction between group and day [F(2, 33) = 2.02, p = .15]. On the second day, participants tapped faster [F(1, 33) = 76.87, p < .0001, η²G = .14], but again no differences between the groups [F(2, 33) = .26, p = .78] or interaction [F(2, 33) = 1.97, p = .15] appeared (Supplementary Fig. S5). This suggests that participants favoured speed over accuracy on the second day, but in a way that did not differ between the groups.

3. Discussion

The present study tested the hypothesis that motor learning of timing can occur through error-based learning with auditory feedback. We postulated that learning of timing may proceed in a fashion similar to feedback error-based learning previously observed in the spatial domain (Cheng and Sabes, 2006; Ghahramani et al., 1997), but here applied to the temporal domain. A critical prediction of such error-based learning models is that distorted feedback (here: temporally jittered) should impair learning, because the motor system would be correcting for errors that were not there. Furthermore, we studied transfer of motor learning between sequences, by training participants on one (primary) sequence and then testing them on another (secondary) sequence. The first block of the second day furthermore served as a retention measure that assessed task performance when the short-term effects of feedback have worn off (Schmidt and Lee, 1988).

Our participants were instructed to improve their movement regularity whilst holding tapping speed constant. We take movement regularity as a proxy for motor skill, in line with recent suggestions in motor learning research (Haith and Krakauer, 2013).

3.1. Auditory feedback enables learning sequence tapping regularity

Regularity of the tapping was defined as the standard deviation of the intervals between subsequent keystrokes (expressed as a percentage of the average inter-tap-interval). Participants in the synchronous-sound group (who received a woodblock tone simultaneously with each keystroke) tapped increasingly regularly in the course of the first day, and this improvement continued on the second day. The mute and jittered-sound groups (who received no keystroke-triggered feedback or jittered delayed feedback, respectively) showed no such improvement on the first day, and only the mute group revealed a very mild improvement by the end of the second day (Fig. 2). By contrast, we observed no differences between the three groups in terms of ordinal accuracy or speed.
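This regularity measure is the coefficient of variation of the inter-tap intervals and can be computed directly from keystroke timestamps. A minimal sketch (function name and example timestamps are illustrative):

```python
def tapping_regularity(keystroke_times_ms):
    """Tapping regularity: standard deviation of the inter-tap intervals
    (ITIs), expressed as a percentage of the mean ITI (the coefficient
    of variation).  Lower values mean more regular tapping."""
    itis = [b - a for a, b in zip(keystroke_times_ms, keystroke_times_ms[1:])]
    mean = sum(itis) / len(itis)
    sd = (sum((x - mean) ** 2 for x in itis) / len(itis)) ** 0.5
    return 100.0 * sd / mean

print(tapping_regularity([0, 300, 600, 900, 1200]))   # perfectly regular -> 0.0
print(tapping_regularity([0, 280, 600, 890, 1200]))   # uneven intervals -> > 0
```

Expressing the standard deviation as a percentage of the mean ITI makes the measure comparable across small differences in tapping speed, which matters because participants drifted from the 300 ms target tempo.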

These observations provide evidence that participants engaged in feedback error-based learning: participants in the synchronous-sound group were able to improve their tapping regularity by gradually adjusting the production of intervals that were too long or too short, as suggested by feedback learning models (Cheng and Sabes, 2006; Ghahramani et al., 1997). That is, our data can be accounted for by postulating that participants used the sound timing as a proxy for the keystroke timing, so that intervals that sounded too long or too short were adjusted accordingly, leading to more regular tapping performance. Participants in the mute group had no keystroke-triggered sound feedback at their disposal and were not able to improve tapping regularity. Participants in the jittered-sound group, however, had sound information that was set up to be misleading. We expected the temporal jitter to cause participants in this group to correct for errors that they heard but that were never present in their tapping performance, thus precluding tapping improvements. This is exactly what we observed. Our finding that auditory feedback can be used for learning timing regularity of movement squares with previous results suggesting that the brain may use tone feedback for learning the timing of movement coordination (Ronsse et al., 2011). However, as Ronsse and colleagues observed no difference between an auditory-feedback group and a visual-feedback group, it remained possible that learning was based on somatosensory feedback, which was present in both groups. Our study resolved this issue by showing that a mute group did not improve in tapping regularity. Also, the present study is the first to show this learning in a retention task (the second day in our experiment), which is a critical test for motor learning (Schmidt and Lee, 1988).
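This account can be illustrated with a toy simulation of proportional error correction. The sketch below is our own illustration, not the authors' model or analysis code: a simulated tapper has a slowly wandering internal-timekeeper bias plus motor noise, and corrects a fixed proportion (`gain`) of each perceived interval error; all parameter values are invented for illustration.

```python
import random
import statistics

def simulate_cv(gain, jitter_ms, n_taps=4000, target=300.0,
                motor_sd=20.0, drift_sd=10.0, seed=7):
    """Toy error-correcting tapper (illustration only, not the authors' model).

    Each inter-tap interval = target + timekeeper drift + motor noise.
    The tapper corrects a proportion `gain` of the perceived deviation
    of each interval from the target. Returns the coefficient of
    variation (SD as % of the mean) of the second half of the intervals.
    """
    rng = random.Random(seed)
    drift = 0.0        # slowly wandering internal-timekeeper bias
    prev_delay = 0.0   # feedback-tone delay of the previous tap
    produced = []
    for _ in range(n_taps):
        drift = 0.98 * drift + rng.gauss(0.0, drift_sd)  # mildly mean-reverting walk
        interval = target + drift + rng.gauss(0.0, motor_sd)
        produced.append(interval)
        # The heard interval is distorted by the difference between
        # successive random tone delays (zero when feedback is synchronous).
        delay = rng.uniform(0.0, jitter_ms) if jitter_ms else 0.0
        heard = interval + delay - prev_delay
        prev_delay = delay
        drift -= gain * (heard - target)   # proportional error correction
    tail = produced[n_taps // 2:]          # discard the initial transient
    return 100.0 * statistics.pstdev(tail) / statistics.mean(tail)

cv_sync = simulate_cv(gain=0.3, jitter_ms=0.0)    # veridical feedback
cv_jit  = simulate_cv(gain=0.3, jitter_ms=180.0)  # jittered feedback
cv_mute = simulate_cv(gain=0.0, jitter_ms=0.0)    # no correction applied
```

With these assumed parameters, veridical feedback keeps the timekeeper bias in check, whereas jittered feedback makes the corrections track noise rather than true errors, and applying no correction lets the bias wander. In this toy version jittered feedback actually inflates variability; in the experiment, participants presumably down-weighted the unreliable feedback instead, which would leave their performance at the mute group's level.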
Furthermore, the present study elucidates the mechanism by which learning of temporal regularity of sequence tapping (in our experiment) or bimanual coordination (in the study of Ronsse and colleagues) may occur, namely error-based learning. The unique feature of our study is that it yields direct evidence that error-based learning in the temporal domain may underlie timing improvement, by showing that the jittered-sound group does not improve tapping regularity whereas the synchronous-sound group does. Based on previous findings that constantly-delayed auditory feedback impairs rhythmic tapping performance (e.g., Pfordresher and Dalla Bella, 2011) and disrupts the fluency of speech (Yates, 1963), it might be argued that the tapping performance of the jittered-sound group was simply disrupted by delayed auditory feedback. However, this is not a viable explanation of our learning findings, for the following reasons. First, performance did not differ between the groups at the beginning of the experiment. If the jittered-sound group was disrupted by delayed auditory feedback, they would have had to perform worse than the synchronous-sound group from the moment sounds were introduced; we found no such performance difference. Second, if the jittered-sound group's tapping was disrupted by the feedback, one would expect their performance to improve when this feedback was removed, i.e., in a mute block. However, we found no improvement in regularity for the jittered-sound group in these mute blocks for either the primary or secondary sequence. Third, if the jittered-sound group's performance was disrupted, they should perform worse than the mute group. Contrary to this, we found that the performance of the jittered-sound and mute groups did not differ during the experiment. These reasons rule out a disruption explanation for the lack of motor learning observed in the jittered-sound group.

3.2. Generalisation: Transfer to an untrained sequence

In our present study, we further investigated how much of the observed motor learning transferred from the sequence that was practised (primary sequence) to a different sequence (secondary sequence) which had the same formal properties. If all learning were sequence-specific, performance values should return to the baseline level of the first block with the primary sequence. Only the synchronous-sound group showed an improvement in tapping regularity between the first block of the primary sequence and the first block of the secondary sequence, providing evidence for sequence-unspecific learning. This finding is important for the potential of auditory feedback-based learning interventions, because in rehabilitation settings the aim is to achieve maximal transfer from the training performance to activities of daily living (which typically are not accompanied by auditory feedback) (Krakauer, 2006). Furthermore, the motor learning found in the synchronous-sound group is not just procedural or use-dependent learning, because if this were the case, all groups should show the same effect, as they made the same number of movements in the course of the experiment.

3.3. Participants do not become reliant on auditory feedback

We have shown that participants engage in motor learning based on auditory feedback, but to what extent is their performance affected when this feedback is subsequently removed? For the synchronous- and jittered-sound groups, we investigated the effect of auditory feedback deprivation by adding a block in which the groups no longer received keystroke-triggered sound, after the learning of the primary sequence and of the secondary sequence. The synchronous-sound group tapped more regularly overall than the jittered-sound group (an effect of the prior learning), but neither group was affected by auditory feedback deprivation (Fig. 4). This suggests, first, that the jittered-sound group was not hampered in their performance by the occurrence of delayed tones (as argued above). Second, the synchronous-sound group had not become reliant on the auditory feedback that it used for the learning of regularity. If they had, their performance would have deteriorated when the sounds were removed. Instead, we found that the synchronous-sound group maintained their improved level of performance in the absence of auditory feedback, in line with previous findings in non-musicians (Heitger et al., 2012; Ronsse et al., 2011) and in long-term expert pianists (Repp, 1999; van Vugt et al., 2013b). Together with the finding that only the synchronous-sound group improved movement regularity in the course of the experiment, this suggests that the auditory feedback error-based learning is sufficiently stable that it no longer requires the feedback during the second day. This finding is not in agreement with the guidance hypothesis, which states that after participants have learned a task with feedback, their performance will degrade when this feedback is absent (Salmoni et al., 1984). Like the present study, other studies have also found evidence against auditory feedback guidance (Ronsse et al., 2011). We speculate that these findings may be reconciled with the guidance hypothesis if one supposes that guidance only occurs in the early stages of learning. In the later stages of learning (such as on the second day in the present study), sound is no longer required, as the task is crystallised into feed-forward motor programs that can be executed without relying on feedback (Lashley, 1951). Whether or not sensory feedback remains necessary could also depend on the sensory modality, as suggested previously (Ronsse et al., 2011).

3.4. Auditory feedback was not used for action selection

The improvement in tapping regularity of the synchronous-sound group indicates that auditory feedback benefits the learning of timing regularity in motor sequence production. We interpret this as evidence for error-based motor learning with auditory feedback. It might be argued that simply the presence of sound improved performance. Previous accounts have argued that sensory signals (e.g., tones) function as an aid to represent and select particular movements (Hommel et al., 2001; Shin et al., 2010). This reasoning was then used to explain the benefit of sound on sequence tapping speed (for example, see Hoffmann et al., 2001). However, the motor gains observed in the present study cannot be explained in this way, for two reasons. First, in our study, all four keystrokes mapped to the same sound. Therefore, the sounds could not be used to select individuated finger movements in the sense proposed by Hoffmann et al. (2001), because all movements would then share the same code, which could not select anything. Second, the proposed account cannot explain why the jittered-sound group did not improve in tapping regularity: under the action-selection account, a jittered delay in the sound would be unlikely to affect the sound's capacity to represent keystrokes during action selection. Alternatively, previous accounts of sequence learning have suggested that as humans learn movement sequences, these sequences are initially divided into small chunks. As learning progresses, these chunks become larger and larger (Hikosaka et al., 2002; Sakai et al., 2004), so that the sequence is performed in gradually larger parts, yielding temporally smoother performance. Therefore, it might be argued that these chunking accounts of motor learning could also explain why tapping regularity increased in our experiment.
However, one would expect that such chunking occurs equally well in the absence of auditory feedback, such as in our mute group, because the sequence was explicitly presented throughout the experiment. Consequently, the chunking account fails to explain the differences in motor learning between our experimental groups (synchronous-sound, jittered-sound and mute), such as the lack of regularity improvement in the jitteredsound group. Therefore, our findings suggest that auditory error-based learning can occur in addition to chunking learning.

3.5. Auditory feedback may be necessary for learning temporally precise timing

A surprising finding of the present study was the lack of improvement in tapping regularity in the mute group. Even though this group received no keystroke-triggered auditory feedback, we expected that participants would nevertheless be able to improve tapping regularity using somatosensory feedback. Our observation that they did not improve therefore suggests that auditory feedback may be necessary, not just sufficient, to learn highly precise motor timing. Previous findings indicate that the temporal resolution of somatosensory feedback is poor, ranging from tens to hundreds of milliseconds, and therefore probably not reliable enough to enable learning of high temporal precision. The claim that auditory feedback may be necessary for learning timing is also supported by findings in expert pianists, which reveal that even in expert musicians, systematic timing deviations persist in piano scale playing (Jabusch et al., 2009; van Vugt et al., 2013a, 2014). These timing deviations (in the order of milliseconds) indicate that expert musicians do not on average play perfectly regularly, but exhibit systematic small temporal deviations just below the auditory perceptual threshold (van Vugt et al., 2013b). Based on our results, the occurrence of these systematic deviations can be explained as the residues of timing errors that, once they fell below the auditory threshold, could no longer be corrected. Our study is the first to show that auditory feedback may be necessary for motor learning in music and speech when temporal resolution demands are high.

Table 1 – Listing of the blocks in the motor learning experiment. The sequence was either a scale (1234321), the primary assigned sequence or the secondary assigned sequence. Target speed indicates whether the trials had a preset tapping speed (established by a visual count-in cue indicating the numbers one by one prior to the trial); "+" indicates that this tempo was then enforced by repeating trials that were more than 10% slower or faster than the target tempo. When auditory feedback is listed as "mute", all participants heard only continuous white noise (instead of keystroke-triggered sounds), whereas "by group" indicates that the feedback condition of the assigned group was used, that is, synchronous-sound, jittered-sound or mute.

| Day | Block              | Sequence  | Trials                | Target speed        | Auditory feedback |
|-----|--------------------|-----------|-----------------------|---------------------|-------------------|
| 1   | Scale              | Scale     | 5                     | Unrestricted        | Mute (all groups) |
| 1   | Scale-300          | Scale     | 25                    | 300 ms              | Mute (all groups) |
| 1   | Scale-max          | Scale     | 25                    | As fast as possible | Mute (all groups) |
| 1   | Warmup-free        | Primary   | 5                     | Unrestricted        | Mute (all groups) |
| 1   | Warmup-500/400/300 | Primary   | 5/5/5                 | 500/400/300 ms      | Mute (all groups) |
| 1   | Warmup-300-fixed   | Primary   | 5 correct and in time | 300 ms+             | By group          |
| 1   | Main block 1-7     | Primary   | 25 each               | 300 ms              | By group          |
| 2   | Warmup-300         | Primary   | 5                     | 300 ms              | By group          |
| 2   | Warmup-300-fixed   | Primary   | 5 correct and in time | 300 ms+             | By group          |
| 2   | Main block 8-10    | Primary   | 25 each               | 300 ms              | By group          |
| 2   | Mute block         | Primary   | 25                    | 300 ms              | Mute (all groups) |
| 2   | Warmup-free        | Secondary | 5                     | Unrestricted        | Mute (all groups) |
| 2   | Warmup-500/400/300 | Secondary | 5/5/5                 | 500/400/300 ms      | Mute (all groups) |
| 2   | Warmup-300-fixed   | Secondary | 5 correct and in time | 300 ms+             | By group          |
| 2   | Main block 1-3     | Secondary | 25 each               | 300 ms              | By group          |
| 2   | Mute block         | Secondary | 25                    | 300 ms              | Mute (all groups) |
| 2   | Scale-300          | Scale     | 25                    | 300 ms              | Mute (all groups) |
| 2   | Scale-max          | Scale     | 25                    | As fast as possible | Mute (all groups) |

3.6. Anisochrony, delay detection and synchronisation–continuation tapping tasks

Data from the additional timing tasks showed that after learning, the three groups did not differ in auditory temporal prediction precision (as measured by the anisochrony task) or keystroke-sound asynchrony detection (see Supplementary materials). Furthermore, the three groups were matched in terms of (minimal) musical training (see Table 2). However, results of the synchronisation–continuation tapping tasks showed that after learning, the synchronous-sound group was able to tap closer to the beat than the other groups (see Supplementary materials, Fig. S1). Our interpretation of this effect is that the synchronous-sound group might have learned the temporal association between their keystroke and the sound and was therefore able to make the keystroke occur closer in time to the metronome click than the other groups did. (Note that we cannot exclude that the groups might have differed before the experiment, although this is less likely in light of the observation that the groups did not differ in their previous musical background or other potentially relevant motor experience we tested for in the questionnaire.) Furthermore, the delay detection task showed that the amount of jitter employed in this study remained below participants' perceptual threshold. This effectively rules out the alternative explanation that the jittered-sound group participants noticed the jitter and therefore adjusted their performance voluntarily.

3.7. Outlook

Our data provide evidence that auditory feedback can and does play a critical role in motor sequence learning, in particular in the learning of timing regularity of the movements. In our study, we found that synchronous, time-locked auditory feedback improves motor learning relative to no auditory feedback and relative to temporally jittered feedback. This finding extends findings of motor learning based on feedback in the visual or proprioceptive modalities to the auditory modality, and supports the idea that the brain is a flexible task machine (Reich et al., 2012) that can use sensory feedback from various modalities as long as the feedback contains relevant information about the movement. Further data supporting the flexible task machine hypothesis come from studies showing that one sensory modality (e.g., auditory) can be substituted by another (e.g., visual) (Bach-y-Rita et al., 1969; Striem-Amit et al., 2012) and from previous observations that auditory feedback can be used in similar ways as visual feedback during motor learning (Oscari, Secoli et al., 2012; for a review, see Sigrist et al., 2013). Our data expand this view by showing that auditory feedback may be required to learn timing at a precision level that exceeds the inherent temporal resolution of the other senses. Thus, although feedback from various sensory modalities might be processed by the same mechanism (feedback error-based learning), the temporal resolution of these modalities may be the bottleneck for learning highly precise timing, enabling improvements when audition is used but not proprioception or tactile information. Furthermore, we show that learning may proceed similarly not only across sensory modalities (as previously proposed by the flexible task machine hypothesis), but also across space and time (namely, through error correction).

A limitation of our study is that we did not test a group in which the sound was delayed by a constant amount. Comparing the performance of such a group with the groups that we tested would allow us to distinguish two further hypotheses: first, that the sound needs to temporally coincide with the movement (in which case the constant-delay group would perform less well than the synchronous-sound group), or second, that the sound needs to be in a constant temporal relation (time-locked) with, even if not simultaneous to, the movement (in which case the constant-delay group would perform as well as our synchronous-sound group).

Our findings also open the road to the use of auditory feedback in clinical applications. As our participants used auditory feedback in essentially the same way as visual or proprioceptive feedback is used, sound may be employed in the rehabilitation of patients in whom vision or proprioception is impaired, as suggested previously (Altenmüller et al., 2009; Rosati et al., 2012). Furthermore, our findings that learning transferred to novel sequences and that performance was unaffected when feedback was removed promote auditory feedback as a promising candidate for clinical interventions. That is, auditory-feedback-based interventions might allow patients to be independent of sound for their activities of daily living. From the viewpoint of clinical rehabilitation, future studies will need to expand the present findings by investigating the long-term effects of the auditory feedback (e.g., using retention tasks on a third day or after a week).

Table 2 – Participant characteristics. Values are reported as mean (SD) unless otherwise specified. We report statistical comparisons between the three groups using Kruskal–Wallis tests, except for blind typing capacity, where we used a regular χ² test (because in that case no ordering existed between the levels).

| Variable | Synchronous | Mute | Jittered | Statistical test |
|----------|-------------|------|----------|------------------|
| N | 12 | 12 | 12 | N/A |
| Age (years) | 25.7 (5.61) | 23.6 (1.78) | 25.1 (4.54) | χ²(2) = 2.9, p = .23 |
| Gender (female/male) | 6/6 | 9/3 | 5/7 | χ²(2) = .47, p = .79 |
| Handedness (Edinburgh Laterality Quotient %) | 84.3 (15.02) | 82.3 (25.30) | 78.9 (22.91) | χ²(2) = 2.08, p = .35 |
| Text messaging capacity (self-rating) | 7.3 (1.78) | 7.5 (1.38) | 6.6 (2.07) | χ²(2) = .37, p = .83 |
| Keyboard typing capacity (self-rating) | 7.1 (1.56) | 7.5 (0.67) | 6.7 (1.30) | χ²(2) = 2.52, p = .28 |
| Blind typing capacity (10 fingers / <10 fingers / none) | 5/3/4 | 3/4/5 | 4/1/7 | χ²(4) = 3.13, p = .54 |
| Use of video games per week (0 h / <1 h / 1–7 h / >7 h) | 7/1/4/0 | 6/3/2/1 | 3/4/2/3 | χ²(2) = 2.72, p = .26 |
| Use of computer keyboard per week (<1 h / 1–14 h / >14 h) | 0/7/5 | 0/6/6 | 0/5/7 | χ²(2) = .65, p = .72 |
| Use of cell phone text messages per week (<1 h / 1–7 h / >7 h) | 3/7/2 | 3/8/1 | 5/3/4 | χ²(2) = .07, p = .96 |
| Years of musical instruction (other than obligatory courses at school) | 0.9 (1.28) | 0.1 (0.29) | 0.7 (1.23) | χ²(2) = 3.96, p = .14 |

4. Experimental procedures

4.1. Participants

Thirty-seven right-handed non-musicians (20 female) participated in this study. One participant was not able to press the individual keys independently and was therefore discarded. All subsequent analysis relates to the 36 remaining participants. Participants were randomly assigned to one of the three experimental groups (Table 2).

4.2. Materials

4.2.1. Main motor learning task

4.2.1.1. Sequences. Each participant was trained on two sequences of 7 digits (ranging from 1 to 4), called the primary and secondary sequence. Participants tapped the numbers 1 to 4 with the index finger (1) to the little finger (4). Twelve different sequences with the same formal properties were generated. Sequences were designed to be of similar difficulty and to give rise to major chunking, as described in detail in the Supplementary materials. The set of sequences was then divided into six pairs. For each group (synchronous-sound, jittered-sound and mute) and each sequence pair (a, b), one participant had a as primary sequence and b as secondary sequence, and another participant had b as primary sequence and a as secondary sequence. In this way, the participants in each of the three groups, taken together, practiced the same sequences, and each sequence was equally often presented as primary and as secondary sequence.

4.2.1.2. Sounds. The woodblock sound presented at each keystroke (for the synchronous- and jittered-sound groups) was 63 ms in duration and was chosen for its sharp onset (maximising temporal localisation) while nevertheless being aesthetically pleasing. It was saved as a wave file and played using the experimental software (see below). The auditory white-noise stimulus (used in the mute group and in the mute blocks; see below) was generated using the program Audacity and saved as a wave file.

4.2.1.3. Apparatus. The button box was custom made with four Cherry MX MX1A-11NN key switches and four key caps originating from the function key row of a Diatec Filco keyboard. To determine the optimal spatial arrangement for the buttons, we asked 6 scientific collaborators from our lab (3 male, 3 female) to put their fingers on a flat surface in the most comfortable and natural way. We marked the positions of the four fingers, averaged the distance matrices, and transformed this average distance matrix back into two-dimensional coordinates using multidimensional scaling (MDS). We connected the MX keys through small electric circuits to a commercially available button box (ioLab Systems, Inc.) that registered keystrokes and communicated these, timestamped, through a serial USB interface to a Python script that used pygame for visual presentation. The computer was a Dell Precision M6500 laptop running Windows 7, interfacing with the box through the HID protocol in the Python script. The sounds (woodblock or white noise) were presented through Sennheiser HD250 linear II headphones.

4.3. Procedure

Participants came into the lab on two adjacent days. At the beginning of the session on the first day, participants completed a brief questionnaire that measured their handedness (according to the 10-item short form of the Edinburgh Handedness Inventory), inquired how much time they spent speed typing or phone messaging (as these activities might influence task performance), and verified that they had little or no musical experience (less than 2 years of musical instruction, and not currently practising music). They then completed the sequence learning blocks for that day (more details below). The session on the first day lasted about an hour. On the second day, participants first continued the sequence training blocks (details below). Then, participants completed the auditory and motor tests (anisochrony, delay detection and synchronisation–continuation tapping; see Supplementary materials). Finally, participants filled out the debriefing part of the questionnaire. The session on the second day lasted about 75 min. Participants received a nominal payment for their participation.

4.3.1. Sequence tapping

The participants were seated and rested their arm comfortably on a table, with the right-hand index to little fingers placed on four keys of the custom button box. The participants' fingers rested on the keys with a little cushion of soft cloth in between. A removable shield was suspended above the button box so that the participants could not see their fingers or the keys. Participants wore headphones connected to the computer sound card, which presented the auditory stimuli and reduced sounds from the environment.

Participants were asked to tap a sequence of adjacent button positions (1234321), referred to hereafter as the "scale sequence", prior to the sequence learning on the first day and after sequence learning on the second day. In the first block of 25 repetitions, they were asked to tap the scale at a speed of 300 ms between keystrokes. A visual count-in cue indicated the target speed prior to each trial, but was not present during the actual tapping. Participants were instructed to "tap at the speed and regularity indicated by the visual count-in cue, that is, to tap as regularly as possible". In a second scale block, participants were instructed to tap the scale as fast as they could. In all scale blocks, all three groups received white noise over their headphones and no keystroke-triggered sounds, so as to ensure that conditions were strictly comparable between the groups.

After the scale sequence on the first day, participants engaged in the sequence learning part of the experiment. The primary sequence was trained during the first and second day, and the secondary sequence was trained only on the second day. Our aim was to have participants tap the sequence at roughly the same speed throughout the experiment and to measure their improvement in regularity. We designed our task to be demanding by requiring participants to produce the sequence at a relatively high speed: a 300-ms inter-tap-interval (ITI). In order to make this feasible and not discourage our participants, we designed several training blocks (see Table 1 for full details) during which, initially, the participants could try tapping the sequence at their own speed.
In the next block, a target tempo was introduced by the visual count-in cue, initially at a slow speed (500 ms), then an intermediate speed (400 ms) and finally the actual tempo used during the remainder of the experiment (300 ms). After each trial, the computer calculated participants' tapping rate from the recorded keystrokes, and participants received a notice if they were more than 10% slower or faster than indicated by the visual cue. During the final warm-up block, participants needed to tap the sequence 5 times correctly and within the aforementioned speed window before the experiment continued. Importantly, in the warm-up blocks all three groups received white noise over their headphones and no keystroke-triggered sounds. This was done to ensure that no auditory-motor learning could happen during the warm-up phase. Only in the last warm-up block did participants in the synchronous- and jittered-sound groups receive keystroke-triggered sounds (either synchronously or after a random delay, respectively). As before, participants were instructed to "tap at the speed and regularity indicated by the visual count-in cue, that is, to tap as regularly as possible". On the second day, after having completed the main training blocks for the primary sequence (after block 10) and for the secondary sequence (after block 3), participants were asked to complete one more block, this time without receiving any keystroke-triggered sound, but instead hearing white noise at a comfortable loudness level. That is, for the mute group the situation did not change (we kept these additional blocks for this group as well, in order to keep the amount of task performance the same across the groups).
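The tempo-window criterion described above can be sketched as follows (a minimal illustration; the function name and input format are ours, not taken from the experimental software):

```python
def within_tempo_window(tap_times_ms, target_iti_ms=300.0, tolerance=0.10):
    """Check whether a trial's mean inter-tap interval falls within
    +/-10% of the target tempo (illustrative sketch of the criterion
    described in the text)."""
    # Inter-tap intervals from consecutive keystroke timestamps (ms)
    itis = [b - a for a, b in zip(tap_times_ms, tap_times_ms[1:])]
    mean_iti = sum(itis) / len(itis)
    return abs(mean_iti - target_iti_ms) <= tolerance * target_iti_ms

# Hypothetical keystroke timestamps for a 7-tap trial:
ok_trial = within_tempo_window([0, 300, 610, 900, 1210, 1500, 1800])
slow_trial = within_tempo_window([0, 350, 700, 1050, 1400, 1750, 2100])
```

A trial failing this check would be repeated, as described for the "300 ms+" blocks in Table 1.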


4.3.2. Procedure for one trial

The seven digits of the sequence (i.e., of the primary, secondary or scale sequence) were presented all at once on the computer screen, coded as numbers 1 (index finger) to 4 (little finger) (Povel and Collard, 1982), and remained visible throughout the block. On each trial, participants first passively watched a yellow bar indicating the items in the sequence one by one at the desired speed (300 ms ITI, except during warm-up blocks) (Fig. 1). Once the yellow bar had indicated the last item of the sequence, the participants commenced tapping the sequence from the beginning. At the first keystroke, the yellow bar disappeared and the first item in the sequence was slightly greyed out to indicate that it had been tapped. Subsequent keystrokes (whether correct or not) caused the subsequent items in the sequence to be greyed out. The greying out was chosen to be subtle, so that it made clear that the key was struck but provided as few timing cues as possible. In this way, participants could not use the timing of the greying out as a timing cue in their motor learning. Note that the greying out was constant across the three conditions. Furthermore, we used a single feedback tone (a woodblock sound) for all keys.

4.4. Data analysis

4.4.1. General analysis methods

The sequence tapping data were analysed as follows. We discarded sequences that contained errors (different or additional keystrokes) or omissions and then performed the following two-step outlier rejection procedure. First, we pooled the data from all participants and discarded inter-keystroke intervals that deviated more than 5 SD from the overall mean. This outlier rule was deliberately chosen conservatively, to discard only intervals that corresponded to sporadic keystrokes that failed to register and were attempted again by the participant after a long pause; this was confirmed by visual inspection, and the rule discarded .5% of all keystrokes. In a second step, we separated the intervals by participant and block and eliminated intervals that were further than 3 SD from the mean for that participant's block. This resulted in an additional 1.7% of all intervals being discarded. The reason for using this two-step procedure is that discarding outliers based on parametric statistics (mean and SD) is known to be highly susceptible to the outliers themselves, as they strongly bias the initial estimates of the mean and SD. Using the procedure above, we were able to first eliminate outliers due to unregistered keystrokes, and then trim the distributions for each participant and block.

The main dependent variable of interest was tapping regularity. For each participant and each block, we calculated the standard deviation of the inter-keystroke interval durations and expressed it as a percentage of the mean interval duration, so as to compensate for possible speed differences. The larger this quantity, the more irregular the tapping; the smaller, the closer the tapping is to isochrony.
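The two-step rejection and the regularity measure can be sketched as follows (an illustrative reimplementation with invented example data, not the authors' analysis code; variable and function names are ours):

```python
import statistics

def two_step_clean(intervals_by_block, pooled_k=5.0, block_k=3.0):
    """Two-step outlier rejection (illustrative sketch).

    Step 1: pool all inter-keystroke intervals and drop those more than
    `pooled_k` SD from the pooled mean (catches unregistered keystrokes
    followed by a retry after a long pause).
    Step 2: within each participant/block, drop intervals more than
    `block_k` SD from that block's mean.
    """
    pooled = [iv for block in intervals_by_block.values() for iv in block]
    m, sd = statistics.mean(pooled), statistics.pstdev(pooled)
    cleaned = {}
    for key, block in intervals_by_block.items():
        kept = [iv for iv in block if abs(iv - m) <= pooled_k * sd]
        bm, bsd = statistics.mean(kept), statistics.pstdev(kept)
        cleaned[key] = [iv for iv in kept if abs(iv - bm) <= block_k * bsd]
    return cleaned

def regularity_cv(intervals):
    """Tapping regularity: SD of intervals as a % of the mean interval."""
    return 100.0 * statistics.pstdev(intervals) / statistics.mean(intervals)

# Hypothetical data (ms): one 2-second pause from an unregistered keystroke
data = {"p1_block1": [298 + (i % 5) for i in range(40)] + [2000],
        "p2_block1": [301 + (i % 5) for i in range(40)]}
clean = two_step_clean(data)
```

Note that the pooled first pass only works because it is computed over many intervals; with a handful of intervals, a single extreme value would inflate the pooled SD enough to hide itself.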
Although our focus is on the main dependent variable of tapping regularity, we also report ordinal accuracy (the number of trials in which participants produced the correct sequence of keystrokes) and speed (the mean inter-keystroke interval) for comparison with previous studies in the Supplementary materials. We performed mixed-design ANOVAs unless otherwise indicated. Mauchly's test for sphericity was always performed, and whenever it was significant we applied the Greenhouse–Geisser correction and, for the sake of brevity, report only the corrected p-value, marked as pGG. We report generalised effect sizes (η²G) (Bakeman, 2005). Group comparisons were calculated using Tukey's HSD procedure. Our significance criterion was p<.05 or, where sphericity correction was applied, pGG<.05 (GG: Greenhouse–Geisser correction). We made all data collected in the context of this study available for download at: http://dx.doi.org/10.6084/m9.figshare.1281118.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.brainres.2015.02.026.

References

Altenmüller, E., Marco-Pallares, J., Münte, T.F., Schneider, S., 2009. Neural reorganization underlies improvement in stroke-induced motor dysfunction by music-supported therapy. Ann. N.Y. Acad. Sci. 1169, 395–405, http://dx.doi.org/10.1111/j.1749-6632.2009.04580.x.
Bach-y-Rita, P., Collins, C.C., Saunders, F.A., White, B., Scadden, L., 1969. Vision substitution by tactile image projection. Nature 221 (5184), 963–964, http://dx.doi.org/10.1038/221963a0.
Bakeman, R., 2005. Recommended effect size statistics for repeated measures designs. Behav. Res. Methods 37 (3), 379–384.
Burge, J., Ernst, M.O., Banks, M.S., 2008. The statistical determinants of adaptation rate in human reaching. J. Vis. 8 (4), 20.1–19, http://dx.doi.org/10.1167/8.4.20.
Carlini, A., French, R., 2014. Visual tracking combined with hand-tracking improves time perception of moving stimuli. Sci. Rep. 4, 5363, http://dx.doi.org/10.1038/srep05363.
Cheng, S., Sabes, P.N., 2006. Modeling sensorimotor learning with linear dynamical systems. Neural Comput. 18 (4), 760–793, http://dx.doi.org/10.1162/089976606775774651.
Dromey, C., Ramig, L.O., Johnson, A.B., 1995. Phonatory and articulatory changes associated with increased vocal intensity in Parkinson disease: a case study. J. Speech Hear. Res. 38 (4), 751–764.
Ehrlé, N., Samson, S., 2005. Auditory discrimination of anisochrony: influence of the tempo and musical backgrounds of listeners. Brain Cogn. 58 (1), 133–147, http://dx.doi.org/10.1016/j.bandc.2004.09.014.
Exner, S., 1875. Experimentelle Untersuchung der einfachsten psychischen Prozesse. III. Pflügers Archiv Für Die Gesammte Physiologie Des Menschen Und Thiere 11, 402–412.
Ghahramani, Z., Wolpert, D.M., Jordan, M.I., 1997. Computational models of sensorimotor integration. Adv. Psychol. 119, 117–147.
Glenberg, A.M., Mann, S., Altman, L., Forman, T., Procise, S., 1989. Modality effects in the coding and reproduction of rhythms. Mem. Cognit. 17 (4), 373–383.
Gobel, E.W., Parrish, T.B., Reber, P.J., 2011. Neural correlates of skill acquisition: decreased cortical activity during a serial interception sequence learning task. NeuroImage 58 (4), 1150–1157, http://dx.doi.org/10.1016/j.neuroimage.2011.06.090.

Haith, A.M., Krakauer, J.W., 2013. Model-based and model-free mechanisms of human motor learning. Adv. Exp. Med. Biol. 782, 1–21, http://dx.doi.org/10.1007/978-1-4614-5465-6_1.
Heitger, M.H., Ronsse, R., Dhollander, T., Dupont, P., Caeyenberghs, K., Swinnen, S.P., 2012. Motor learning-induced changes in functional brain connectivity as revealed by means of graph-theoretical network analysis. NeuroImage 61 (3), 633–650, http://dx.doi.org/10.1016/j.neuroimage.2012.03.067.
Hikosaka, O., Nakamura, K., Sakai, K., Nakahara, H., 2002. Central mechanisms of motor skill learning. Curr. Opin. Neurobiol. 12 (2), 217–222.
Hoffmann, J., Sebald, A., Stöcker, C., 2001. Irrelevant response effects improve serial learning in serial reaction time tasks. J. Exp. Psychol. Learn. Mem. Cogn. 27 (2), 470–482, http://dx.doi.org/10.1037//0278-7393.27.2.470.
Hommel, B., Müsseler, J., Aschersleben, G., Prinz, W., 2001. The theory of event coding (TEC): a framework for perception and action planning. Behav. Brain Sci. 24 (5), 849–878.
Houde, J.F., Jordan, M.I., 1998. Sensorimotor adaptation in speech production. Science 279 (5354), 1213–1216, http://dx.doi.org/10.1126/science.279.5354.1213.
Hove, M.J., Keller, P.E., 2010. Spatiotemporal relations and movement trajectories in visuomotor synchronization. Music Percept. 28 (1), 15–26.
Hove, M.J., Spivey, M.J., Krumhansl, C.L., 2010. Compatibility of motion facilitates visuomotor synchronization. J. Exp. Psychol. Learn. Mem. Cogn. 36 (6), 1525–1534, http://dx.doi.org/10.1037/a0019059.
Hyde, K.L., Peretz, I., 2004. Brains that are out of tune but in time. Psychol. Sci. 15 (5), 356–360, http://dx.doi.org/10.1111/j.0956-7976.2004.00683.x.
Jabusch, H.-C., Alpers, H., Kopiez, R., Vauth, H., Altenmüller, E., 2009. The influence of practice on the development of motor skills in pianists: a longitudinal study in a selected motor task. Hum. Mov. Sci. 28 (1), 74–84, http://dx.doi.org/10.1016/j.humov.2008.08.001.
Kanai, R., Lloyd, H., Bueti, D., Walsh, V., 2011. Modality-independent role of the primary auditory cortex in time estimation. Exp. Brain Res. 209 (3), 465–471, http://dx.doi.org/10.1007/s00221-011-2577-3.
Karabanov, A., Blom, O., Forsman, L., Ullén, F., 2009. The dorsal auditory pathway is involved in performance of both visual and auditory rhythms. NeuroImage 44 (2), 480–488, http://dx.doi.org/10.1016/j.neuroimage.2008.08.047.
Kawato, M., Furukawa, K., Suzuki, R., 1987. A hierarchical neural-network model for control and learning of voluntary movement. Biol. Cybern. 57 (3), 169–185, http://dx.doi.org/10.1007/BF00364149.
Keller, E., 1990. Speech motor timing. In: Hardcastle, W.J., Marchal, A. (Eds.), Speech Production and Speech Modelling. Springer, Netherlands, pp. 343–364.
Krakauer, J.W., 2006. Motor learning: its relevance to stroke recovery and neurorehabilitation. Curr. Opin. Neurol. 19 (1), 84–90.
Lashley, K., 1951. The problem of serial order in behavior. In: Jeffress, L.A. (Ed.), Cerebral Mechanisms in Behavior: The Hixon Symposium. Wiley, New York, pp. 112–146.
Merchant, H., Harrington, D.L., Meck, W.H., 2013. Neural basis of the perception and estimation of time. Annu. Rev. Neurosci. 36 (1), 313–336, http://dx.doi.org/10.1146/annurev-neuro-062012-170349.
Oscari, F., Secoli, R., Avanzini, F., Rosati, G., Reinkensmeyer, D.J., 2012. Substituting auditory for visual feedback to adapt to altered dynamic and kinematic environments during reaching. Exp. Brain Res. 221 (1), 33–41, http://dx.doi.org/10.1007/s00221-012-3144-2.

Pfordresher, P., Dalla Bella, S., 2011. Delayed auditory feedback and movement. J. Exp. Psychol. Learn. Mem. Cogn. 37 (2), 566–579, http://dx.doi.org/10.1037/a0021487.
Povel, D.-J., Collard, R., 1982. Structural factors in patterned finger tapping. Acta Psychol. 52 (1–2), 107–123, http://dx.doi.org/10.1016/0001-6918(82)90029-4.
Reich, L., Maidenbaum, S., Amedi, A., 2012. The brain as a flexible task machine: implications for visual rehabilitation using noninvasive vs. invasive approaches. Curr. Opin. Neurol. 25 (1), 86–95, http://dx.doi.org/10.1097/WCO.0b013e32834ed723.
Reis, J., Schambra, H.M., Cohen, L.G., Buch, E.R., Fritsch, B., Zarahn, E., Krakauer, J.W., 2009. Noninvasive cortical stimulation enhances motor skill acquisition over multiple days through an effect on consolidation. Proc. Natl. Acad. Sci. 106 (5), 1590–1595, http://dx.doi.org/10.1073/pnas.0805413106.
Repp, B.H., 1999. Effects of auditory feedback deprivation on expressive piano performance. Music Percept. 16 (4), 409–438.
Repp, B.H., Su, Y.-H., 2013. Sensorimotor synchronization: a review of recent research (2006–2012). Psychon. Bull. Rev., 1–50, http://dx.doi.org/10.3758/s13423-012-0371-2.
Ronsse, R., Puttemans, V., Coxon, J.P., Goble, D.J., Wagemans, J., Wenderoth, N., Swinnen, S.P., 2011. Motor learning with augmented feedback: modality-dependent behavioral and neural consequences. Cereb. Cortex 21 (6), 1283–1294, http://dx.doi.org/10.1093/cercor/bhq209.
Rosati, G., Oscari, F., Spagnol, S., Avanzini, F., Masiero, S., 2012. Effect of task-related continuous auditory feedback during learning of tracking motion exercises. J. Neuroeng. Rehabil. 9 (1), 79, http://dx.doi.org/10.1186/1743-0003-9-79.
Sakai, K., Hikosaka, O., Nakamura, K., 2004. Emergence of rhythm during motor learning. Trends Cogn. Sci. 8 (12), 547–553, http://dx.doi.org/10.1016/j.tics.2004.10.005.
Salmoni, A.W., Schmidt, R.A., Walter, C.B., 1984. Knowledge of results and motor learning: a review and critical reappraisal. Psychol. Bull. 95 (3), 355–386.
Schmidt, R.A., Lee, T., 1988. Motor Control and Learning, 5E. Human Kinetics.
Shadmehr, R., Smith, M.A., Krakauer, J.W., 2010. Error correction, sensory prediction, and adaptation in motor control. Annu. Rev. Neurosci. 33 (1), 89–108, http://dx.doi.org/10.1146/annurev-neuro-060909-153135.
Shin, Y.K., Proctor, R.W., Capaldi, E.J., 2010. A review of contemporary ideomotor theory. Psychol. Bull. 136 (6), 943–974, http://dx.doi.org/10.1037/a0020541.
Shmuelof, L., Krakauer, J.W., Mazzoni, P., 2012. How is a motor skill learned? Change and invariance at the levels of task success and trajectory control. J. Neurophysiol. 108 (2), 578–594, http://dx.doi.org/10.1152/jn.00856.2011.
Sigrist, R., Rauter, G., Riener, R., Wolf, P., 2013. Augmented visual, auditory, haptic, and multimodal feedback in motor learning: a review. Psychon. Bull. Rev. 20 (1), 21–53, http://dx.doi.org/10.3758/s13423-012-0333-8.
Sober, S.J., Brainard, M.S., 2009. Adult birdsong is actively maintained by error correction. Nat. Neurosci. 12 (7), 927–931, http://dx.doi.org/10.1038/nn.2336.
Sober, S.J., Brainard, M.S., 2012. Vocal learning is constrained by the statistics of sensorimotor experience. Proc. Natl. Acad. Sci. 109 (51), 21099–21103, http://dx.doi.org/10.1073/pnas.1213622109.
Striem-Amit, E., Guendelman, M., Amedi, A., 2012. "Visual" acuity of the congenitally blind using visual-to-auditory sensory substitution. PLoS One 7 (3), e33136, http://dx.doi.org/10.1371/journal.pone.0033136.
Thoroughman, K.A., Shadmehr, R., 2000. Learning of action through adaptive combination of motor primitives. Nature 407 (6805), 742–747, http://dx.doi.org/10.1038/35037588.
Tillmann, B., Stevens, C., Keller, P.E., 2011. Learning of timing patterns and the development of temporal expectations. Psychol. Res. 75 (3), 243–258, http://dx.doi.org/10.1007/s00426-010-0302-7.
Tinazzi, M., Frasson, E., Bertolasi, L., Fiaschi, A., Aglioti, S., 1999. Temporal discrimination of somesthetic stimuli is impaired in dystonic patients. NeuroReport 10 (7), 1547–1550.
van Vugt, F.T., Altenmüller, E., Jabusch, H.-C., 2013a. The influence of chronotype on making music: circadian fluctuations in pianists' fine motor skills. Front. Hum. Neurosci. 7, 347, http://dx.doi.org/10.3389/fnhum.2013.00347.
van Vugt, F.T., Furuya, S., Vauth, H., Jabusch, H.-C., Altenmüller, E., 2014. Playing beautifully when you have to be fast: spatial and temporal symmetries of movement patterns in skilled piano performance at different tempi. Exp. Brain Res. 232 (11), 3555–3567, http://dx.doi.org/10.1007/s00221-014-4036-4.
van Vugt, F.T., Jabusch, H.-C., Altenmüller, E., 2012. Fingers phrase music differently: trial-to-trial variability in piano scale playing and auditory perception reveal motor chunking. Front. Auditory Cognit. Neurosci. 3, 495, http://dx.doi.org/10.3389/fpsyg.2012.00495.
van Vugt, F.T., Jabusch, H.-C., Altenmüller, E., 2013b. Individuality that is unheard of: systematic temporal deviations in scale playing leave an inaudible pianistic fingerprint. Front. Cogn. Sci. 4, 134, http://dx.doi.org/10.3389/fpsyg.2013.00134.
van Vugt, F.T., Tillmann, B., 2014. Thresholds of auditory-motor coupling measured with a simple task in musicians and non-musicians: was the sound simultaneous to the key press?. PLoS One 9 (2), e87176, http://dx.doi.org/10.1371/journal.pone.0087176.
Wagner, C., 1971. The influence of the tempo of playing on the rhythmic structure studied at pianist's playing scales. In: Vredenbregt, J., Wartenweiler, J. (Eds.), Medicine and Sport, vol. 6. Karger, Basel, pp. 129–132.
Wei, K., Körding, K., 2010. Uncertainty of feedback and state estimation determines the speed of motor adaptation. Front. Comput. Neurosci. 4, 11, http://dx.doi.org/10.3389/fncom.2010.00011.
Wing, A.M., Kristofferson, A.B., 1973. Response delays and the timing of discrete motor responses. Percept. Psychophys. 14 (1), 5–12, http://dx.doi.org/10.3758/BF03198607.
Yates, A.J., 1963. Delayed auditory feedback. Psychol. Bull. 60 (3), 213–232, http://dx.doi.org/10.1037/h0044155.