The auditory scene: An fMRI study on melody and accompaniment in professional pianists

NeuroImage 102 (2014) 764–775

Contents lists available at ScienceDirect

NeuroImage journal homepage: www.elsevier.com/locate/ynimg

Full-Length Article

Danilo Spada a, Laura Verga b, Antonella Iadanza c,d, Marco Tettamanti d,e,f, Daniela Perani a,d,e,f,⁎

a Faculty of Psychology, Vita-Salute San Raffaele University, Milano, Italy
b Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
c Division of Neuroradiology, San Raffaele Scientific Institute, Milano, Italy
d C.E.R.M.A.C. (Centro Eccellenza Risonanza Magnetica Alto Campo), San Raffaele Scientific Institute, Vita-Salute San Raffaele University, Milano, Italy
e Department of Nuclear Medicine, San Raffaele Scientific Institute, Milano, Italy
f Division of Neuroscience, San Raffaele Scientific Institute, Milano, Italy

Article info

Article history: Accepted 20 August 2014. Available online 28 August 2014.

Keywords: Music; Harmony; Melody; Accompaniment; Salience; Auditory stream segregation

Abstract

The auditory scene is a mental representation of individual sounds extracted from the summed sound waveform reaching the ears of the listeners. Musical contexts represent particularly complex cases of auditory scenes. In such a scenario, melody may be seen as the main object moving on a background represented by the accompaniment. Both melody and accompaniment vary in time according to harmonic rules, forming a typical texture with melody in the most prominent, salient voice. In the present sparse acquisition functional magnetic resonance imaging study, we investigated the interplay between melody and accompaniment in trained pianists, by observing the activation responses elicited by processing: (1) melody placed in the upper and lower texture voices, leading to, respectively, a higher and lower auditory salience; (2) harmonic violations occurring in either the melody, the accompaniment, or both. The results indicated that the neural activation elicited by the processing of polyphonic compositions in expert musicians depends upon the upper versus lower position of the melodic line in the texture, and showed an overall greater activation for the harmonic processing of melody over accompaniment. Both of these predominant effects were characterized by the involvement of the posterior cingulate cortex and precuneus, among other associative brain regions. We discuss the prominent role of the posterior medial cortex in the processing of melodic and harmonic information in the auditory stream, and propose to frame this processing in relation to the cognitive construction of complex multimodal sensory imagery scenes. © 2014 Elsevier Inc. All rights reserved.

⁎ Corresponding author at: Vita-Salute San Raffaele University and Scientific Institute San Raffaele, Via Olgettina 60, I-20132 Milano, Italy. Fax: +39 02 26432717.

http://dx.doi.org/10.1016/j.neuroimage.2014.08.036 1053-8119/© 2014 Elsevier Inc. All rights reserved.

Introduction

In the past two decades, there has been a growing interest in music in cognitive neuroscience research, presumably because in musical perception as well as in musical production performed by either naive subjects or professional musicians, both general purpose and highly specialized cognitive functions are at play. In particular, polyphonic music listening, which is common in everyday exposure to Western tonal music of various styles and genres, involves processing both sounds and structural aspects of the acoustic input. The former are referred to with the term texture, and involve the distinction between a prominent, salient stream, called melody, and subordinate streams, collectively called accompaniment. The latter, in Western polyphonic tonal music, are described through the theory of harmony, which deals with the simultaneous combination of notes in chords and chord progressions. Harmony can be viewed as a description of the regularities and rules of the musical tonal system, which people internalize from the first days of life through mere exposure and on the basis of neurobiological constraints. Cognitive neuroscience has elucidated the neural correlates of the violation of such rules both in naive subjects and musicians, but the interplay between melodic salience and harmonic structure has been hitherto largely overlooked.

From the point of view of general purpose cognitive processes, auditory scene analysis (Bregman, 1990, 2007, 2008) mechanisms allow the auditory flow to be segregated into streams with different degrees of salience through different kinds of acoustic cues, such as timbre, frequency proximity, spatial distribution, and onset time (Halpern et al., 1996; Hébert et al., 1995; Macherey and Delpierre, 2013; Palmer and Holleran, 1994; Maeder et al., 2001; Walker et al., 2011; Uhlig et al., 2013; Zatorre et al., 1999). With respect to auditory texture, the most salient stream is usually placed in the upper voice and is referred to, in musical terms, as “melody”, whereas the remaining voices in lower streams are collectively defined as “accompaniment”. The neural mechanisms underlying polyphonic texture perception have been investigated particularly in auditory perceptual memory and auditory attention studies. Electrophysiological measurements often focused on the study of mismatch negativities (MMN). The MMN is an auditory event-related potential component originating in the auditory cortex around 100–200 ms post stimulus onset and reflecting the processing of infrequent or deviant stimuli (Näätänen et al., 2004, 2007). By investigating the MMN in musicians and in untrained controls, Fujioka et al. (2005) demonstrated that both an upper and a lower melodic voice are represented in the auditory perceptual memory; more importantly, the upper voice appears to be relatively more salient, as reflected by larger MMN amplitudes for changes in the upper than in the lower voice in both population groups. Similarly, a study by Lee et al. (2009) reported higher auditory brainstem responses for upper versus lower tones, with trained musicians as experimental subjects, and a recent functional magnetic resonance imaging (fMRI) study by Uhlig et al. (2013) showed the greater salience of melody over accompaniment when listening to complex multi-part musical stimuli. These authors found an increase in the activation of the fronto-parietal attention network when participants were instructed to pay attention to the melody stream as compared to the accompaniment. Finally, a Positron Emission Tomography (PET) study (Satoh et al., 2003) focusing on the contrast between listening to Soprano melodies as opposed to listening to harmony found activations notably in the precuneus (Soprano melodies > harmony) and in the anterior cingulate cortex (harmony > Soprano melodies).
From the complementary point of view of specialized cognitive functions, studies focusing on musical structure processing established that humans, on the basis of neurophysiological (Koelsch and Jentschke, 2010; Tramo et al., 2001) and of cognitive constraints present from birth (Perani et al., 2010; Trehub, 2001), are able to internalize through mere exposure the harmonic rules of the musical system of their culture. This observation may explain the capacity displayed by naive subjects and – to an even higher degree – by professional musicians to generate precise harmonic expectations during music perception (Carrión and Bly, 2008; Koelsch et al., 2000; Pearce et al., 2010; Tillmann and Bigand, 2001; Tillmann, 2005; Tillmann et al., 2006). The investigation of the neural underpinnings of this process has been extensively carried out with diverse neuroimaging techniques, most often using stimuli characterized by harmonic violations, usually realized through mistuned chords or harmonically incongruent cadences. The former are obtained by shifting the pitch of one or more notes of a chord, which nevertheless maintains its identity and harmonic role in the cadence. The latter are the result of the substitution of one or more notes of the chord totally modifying their harmonic role in the cadence. Such studies consistently reported the involvement of a network of cortical areas encompassing the bilateral superior temporal gyrus and the right inferior frontal cortex (Brattico et al., 2006; Garza Villarreal et al., 2011; Koelsch, 2005, 2011a, 2011b; Koelsch and Siebel, 2005; Maess et al., 2001; Miranda and Ullman, 2007; Ruiz et al., 2009). While both melodic salience and harmonic structure have been thoroughly explored in musical research, the interplay between these two points of view has in our opinion been largely overlooked until the present. 
This gap in the literature becomes particularly problematic when considering the increased tendency in more recent studies to use ecological (polyphonic) musical stimuli. In such polyphonic compositions, it should be considered that melody carries in itself harmonic information, in a manner that, at least from the theoretical point of view, is largely independent from the harmonic information characterizing accompaniment. Some experimental hints in favor of this theoretical view are indeed available. Behavioral studies in children have evidenced that the capacity to process melodies develops earlier than the capacity to process accompaniment chords (Trainor and Trehub, 1994; Trehub et al., 1984). Electrophysiological data indicate that melody carries in itself harmonic information (Miranda and Ullman, 2007; Pearce et al., 2010), even pre-attentively (Brattico et al., 2006). One study by Koelsch and Jentschke (2010) directly compared harmonic violations


occurring, respectively, in single melodies and harmonized melodies, and showed that melodic information – present both in melodies and in the top voice of chords as well – is processed earlier (N125, 100– 150 ms) and more frontally than harmonic information (N180, 160– 210 ms) which is processed more broadly over the scalp. In our experiment, we investigated the neural correlates of the interplay between the processes responsible for melodic salience attribution and the processes involved in harmonic structure analysis. We tackled two crucial experimental questions. First, we investigated the neural correlates of melodic salience, contrasting activations in response to two different polyphonic textures, one with the melody as the uppermost of four voices (Soprano), and the other with the same melody displaced as the lowest voice (Bass). Second, we evaluated how the brain reacts when dealing with unrelated, unexpected musical events occurring in the harmonic context, i.e. harmonic violations, but clearly distinguishing between violations that occur in the melody from those occurring in the accompaniment. To this aim, we focused on a population of trained musicians, as a previous study investigating harmonic expectations during melodic perception in adults emphasized that the degree of musical training influences the sensitivity to unexpected chords accompanying the harmonically correct melody (Loui and Wessel, 2007). We used sparse acquisition fMRI (Belin et al., 1999; Hall et al., 1999) to present trained musicians with excerpts of original melodies under optimal listening conditions. In order to resolve our first experimental question on the neural correlates of melodic salience, one set of experimental stimuli, all devoid of harmonic violations, was divided according to differing melodic salience (Soprano versus Bass voices). 
To resolve our second experimental question on the dependence of structural analysis on the harmonic context, we orthogonally compared, in a 2 by 2 factorial design, stimuli (all with Soprano voice) with harmonic violations occurring in either melody, accompaniment, or both melody and accompaniment. Based on the relevant literature reviewed above, for our effect of melodic salience we expected to find activations correlating with auditory perceptual memory, and possibly also involving the fronto-parietal auditory selective attention network, including the precuneus. With respect to the 2 by 2 factorial design, for the main effect of altered versus correct melody (and possibly also for the main effect of altered versus correct accompaniment), we expected to find activations in the harmonic violation network, including the bilateral superior temporal gyrus and the right inferior frontal cortex. With respect to the crucial interaction between violations occurring in melody and violations occurring in accompaniment, revealing the interplay between the focus on harmonic context and the processing of harmonic violations, we envisaged a possible key role of a subset of brain regions in the perceptual salience and in the harmonic violation networks; more specifically, we expected to observe areas critically involved in the processing of unexpected events (i.e., bilateral superior temporal gyrus, and right inferior frontal gyrus) together with areas deemed important for the processing of salience (i.e., in particular the precuneus).

Materials and methods

Subjects

Twenty trained musicians (6 men, 14 women) participated in the fMRI study (mean age 30.29 years, SD 7.02, range 22–44 years). They were right-handed according to the Edinburgh Handedness Inventory (Oldfield, 1971) and had normal hearing, and none reported a history of neurological or psychiatric disorders. All subjects were soloist pianists with a classical curriculum studiorum, recruited from Conservatories. We avoided professional accompanying pianists, who may process a musical excerpt in a bass-salient manner. All had been playing daily for at least 12 years (mean 21.00 years of practice, SD 6.86, range 12–33 years of practice). Subjects with absolute pitch were excluded. The experiment was approved by the Ethics Committee of San Raffaele Hospital in Milan, Italy. Every subject gave written consent to participate after receiving an explanation of the procedure, according to the current revision of the Declaration of Helsinki.

Stimuli

The entire pool of stimuli consisted of 40 base form original melodies with accompaniment, composed ad hoc according to both polyphonic rules and Western tonal harmony (Fig. 1). All stimuli were composed by one of the authors (D.S.). Examples of the stimuli for all experimental conditions are available as Supplementary Material. The complete set of stimuli can be obtained by sending requests to danilo.spada@harmonio.org. Each short composition was written in four parts (voices: Soprano, Alto, Tenor, Bass), all homo-rhythmic except the leading voice, which could contain some additional notes in order to facilitate its salience: this was the only ploy we used to distinguish melody from the other three voices. The excerpts lasted for 8.00 to 8.20 s, and were executed through Musical Instrument Digital Interface (MIDI) digitized piano sounds. To avoid spatial selective attention (Saupe et al., 2010), the excerpts were presented with equal balancing in both channels. To avoid any interpretation of dynamics, all notes had homogeneous attacks and decays, and followed a digitally stable tempo with note values of quarter notes (semiminima) or eighth notes (croma). Stimuli were written as harmonically self-contained musical phrases, playing on the harmonic sequence “I° → IV° → V° → I°” (or corresponding substitutions), in different keys (C, D, Bb, Eb, G, A, E, F, Ab, B), meters (4/4, 3/4, 2/4, 6/8) and tempi (from 74 to 104 beats per minute). In order to avoid habituation, each stimulus was developed with distinct contours both for melodies and accompaniments. Sequences were created with a MIDI sequencer (Rosegarden4) playing piano samples (Yamaha Disklavier Pro), and then digitally recorded through a low-latency audio-server (Jack Audio Connection Kit) and a multichannel digital recorder (Ardour Digital Audio Workstation) on a Linux platform (Debian GNU/Linux) with a real-time kernel module.

To evaluate the effects of melodic salience, two conditions were presented: correct melodies as Soprano, i.e. above the accompaniment (cMcA_S = correct Melody and correct Accompaniment with Melody as Soprano, Fig. 1A1) and correct melodies as Bass, i.e. below the accompaniment (cMcA_B = correct Melody and correct Accompaniment with Melody as Bass, Fig. 1A2). Both conditions included 40 adapted stimuli from the set of 40 base form stimuli. To evaluate the effects of the harmonically based relations between melody and accompaniment, the 40 base form stimuli were presented in 4 different modalities, all with Soprano melodies: both melody and accompaniment correct (cMcA; note that this is in all respects the very same condition entering the melodic salience contrast, i.e. cMcA_S), correct melody and altered accompaniment (cMaA), altered melody and correct accompaniment (aMcA), and both melody and accompaniment altered (aMaA). Alterations consisted in shifting notes coming after the beginning of the mixed cadence (Fig. 1A), between the sub-dominant and the dominant, a semitone upward or downward. Melody alterations resulted from shifts of the melodic line notes, all either upward or downward (Fig. 1B1). Accompaniment alterations resulted from shifts of all the three notes composing each chord, all either upward or downward (Fig. 1B2). A shift of all three chord notes was preferred to the alteration of one single note in each chord, in order to preserve the harmonic integrity of the accompaniment. In the condition aMaA, melody and accompaniment were shifted to the opposite semitones (either melody upward and accompaniment downward, or vice versa, Fig. 1B3), in order to avoid consonances between the voices. Alterations occurred (depending on the tempo of each stimulus) at 4100 ms, 4400 ms, 4700 ms, or 5200 ms from stimulus onset. 
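The alteration scheme described above can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the authors' stimulus-generation procedure: each chord is assumed to be stored as a tuple of MIDI note numbers with the melody in the first position (Soprano), the function names are hypothetical, and the example pitches are made up. In practice the function would be applied only to the chords that follow the beginning of the cadence.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one chord = (soprano, alto, tenor, bass) as MIDI note numbers.
Chord = Tuple[int, ...]


def shift_voices(chords: List[Chord], melody_shift: int, accompaniment_shift: int) -> List[Chord]:
    """Shift the melody (voice 0) and the three accompaniment voices by the given semitones."""
    return [
        (chord[0] + melody_shift,) + tuple(pitch + accompaniment_shift for pitch in chord[1:])
        for chord in chords
    ]


def make_conditions(chords: List[Chord], direction: int = 1) -> Dict[str, List[Chord]]:
    """Build the four Soprano-melody conditions from one correct chord sequence.

    direction is +1 (upward) or -1 (downward) semitone shift.
    In aMaA the accompaniment moves in the opposite direction to the melody.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return {
        "cMcA": shift_voices(chords, 0, 0),
        "aMcA": shift_voices(chords, direction, 0),
        "cMaA": shift_voices(chords, 0, direction),
        "aMaA": shift_voices(chords, direction, -direction),
    }


# Made-up example: a IV chord and a V chord in C major (C4 = 60).
cadence = [(72, 69, 65, 53), (74, 71, 67, 55)]
conditions = make_conditions(cadence, direction=1)
```

Shifting all three accompaniment voices together mirrors the choice, stated in the text, to preserve the internal structure of each accompaniment chord.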
The timing of the alterations was chosen to comply with the requirements of optimal signal detection in the sparse temporal sampling technique (Belin et al., 1999; Hall et al., 1999). In order both to balance the overall quantity of correct versus altered stimuli (40 × 2 correct stimuli (cMcA_S, cMcA_B) / 40 × 3 altered stimuli (cMaA, aMcA, aMaA)) and to introduce an attentive task, we also included two catch trial conditions constituted by filler stimuli: 20 filler_S (with Soprano melody) stimuli and 20 filler_B (with Bass melody) stimuli, which were equivalent to the stimuli of the cMcA_S and cMcA_B conditions, except for the presence of a deviant timbre (trombone, General MIDI #58) in the last chord, for the last note of either melody or accompaniment (Fig. 1C). All possible combinations of melody position and deviant timbre position were presented during the experiment (i.e.: melody up and deviant timbre up, melody up and deviant timbre down, melody down and deviant timbre up, melody down and deviant timbre down). Further, deviant timbres could occur in each of the four voices composing the excerpts (Soprano, Alto, Tenor, or Bass voice). We chose timbre deviance as an effective cue for auditory stream segregation with a low difficulty level (Krumhansl, 1989; McAdams et al., 1995). The entire set of 240 stimuli was subdivided into 8 blocks, each containing 30 musical excerpts. Both the blocks and the order of stimuli within each block were pseudo-randomized and counterbalanced across subjects.

Fig. 1. Example scores of stimuli for each experimental condition. (A1) A sample of an original stimulus (experimental condition cMcA_S), showing the harmonic development which starts from the sub-dominant domain (IV°), goes through the dominant one (V°), and reaches the tonic (I°). (A2) Same stimulus, but with melody as the lower voice (experimental condition cMcA_B). The stimuli in this experimental condition were employed to evaluate the salience of melody. (B) Examples of harmonic alterations, all with melody as the upper voice; (B1) altered melody with correct accompaniment (aMcA): example of upward semitone shifts; (B2) correct melody with altered accompaniment (cMaA): example of downward semitone shifts; (B3) altered melody with altered accompaniment (aMaA): example of upward semitone shifts for melody and downward semitone shifts for accompaniment. (C) A filler stimulus; the circled note of the last chord represents the sound with different timbre (trombone instead of piano).

Task

Participants were asked to listen to the music excerpts and to detect the occurrence of the deviant instrument. They had to signal this event by pressing a response-box button. This task has been used in previous studies to maintain the subjects' attention throughout the entire experiment duration (e.g., Janata et al., 2002; Koelsch et al., 2002; Leino et al., 2007). The time of maximal BOLD signal amplitude elicited by the motor response was planned to occur after the functional sparse temporal acquisition window, since the deviant timbre was inserted at the end of the excerpt. Subjects were explicitly told that the deviant instrument could occur in the last note of one of the four voices.
The timbre task allowed us not only to test general attention during the experiment, but also to have a measure of the relative salience of the different voices. Based on the literature, we hypothesized that the deviant timbre would be detected more quickly if it appeared in the more salient voice, since we expected the participants' attention to be already directed to the most salient stream. If instead the deviant timbre occurred in a less salient voice, its detection should be delayed. Specifically, 1) when both melody and deviant timbre appeared as the Soprano voice, we expected the detection of the deviant timbre to be maximally fast; 2) when melody was the Bass voice and the deviant timbre occurred in the Soprano voice, we expected participants to be maximally slow, since melody (which is usually the upper, most salient voice) is in a less salient position and the deviant timbre is in yet another position, leading to strong auditory incongruence; 3) in the intermediate conditions (melody up–deviant timbre down; melody down–deviant timbre down) we expected intermediate reaction times, due to relatively reduced incongruence. More generally, we expected a gradient in the participants' reaction times according to the position in which the deviant timbre occurred, with progressively slower responses for decreasingly salient voices (i.e., reaction times for Soprano < Alto < Tenor < Bass).

Before fMRI scanning, each subject underwent a rapid training outside the scanner room with stimuli other than those of the experimental set, in order to become familiar with the task and to make sure that the participants were not distracted by harmonic violations but focused on timbre deviations as requested. An experimenter monitored the participants' responses and verbally corrected any mistakes, discouraging the participants from responding to stimuli containing harmonic violations but no timbre deviations. The training session ended when the participant proved that they had fully understood the task instructions and provided at least 10 consecutive correct responses. Inside the scanner, subjects were asked to relax, keep their head and eyes still, and refrain as much as possible from swallowing, coughing, or making other movements. Prior to the beginning of the task, a brief trial test (the same as in the training) was presented to ensure that the participants were refraining from excessive head movements during task performance.

Data acquisition

Images were acquired on a 3 T Philips Achieva whole body scanner (Philips Medical System, Best, NL) at San Raffaele Scientific Institute. Functional whole-brain images were collected using a T2*-weighted gradient echo, echo-planar pulse sequence, using Blood Oxygenation Level Dependent (BOLD) contrast. Sparse Temporal Sampling was used (Belin et al., 1999; Hall et al., 1999), in order to avoid confounding activations induced by the MR scanner noise (Gaab et al., 2007a, 2007b; Moelker and Pattynama, 2003). Stimuli were presented in a silent window 9 s long, allowing for optimal listening conditions, followed by the acquisition of a functional image in 3 s (acquisition time), resulting in a repetition time (TR) of 12 s. The TR duration and stimulus onsets were defined based on the expected typical course of the hemodynamic response (Hall et al., 1999). Each functional volume consisted of 38 continuous axial slices with a thickness of 4 mm (TE 30 ms, flip angle 85°, field of view 240 × 240 mm, matrix size 128 × 128). Each run began with two dummy scans that were subsequently discarded from the analysis. For each subject, we presented 8 blocks of stimuli in randomized order (7 min each). The presentation of the musical stimuli and the collection of the behavioral responses were controlled by Presentation 12.2 (Neurobehavioral Systems, Albany, CA) software. Stimuli were fed as optical inputs into an audio-control unit (DAP Center Mark II, MR Confon) connected to MRI-compatible headphones (MR Confon). Anatomical images were acquired for localization and visualization of cerebral activations at the end of functional acquisitions with a high resolution T1-weighted scan (TR 7.3 ms, TE 3.5 ms, 200 slices with 1 mm resolution). The anatomical scan had a duration of 6.22 min.
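The timing of the sparse sampling scheme described above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the constant and function names are hypothetical, it assumes that each stimulus and each dummy scan occupies exactly one TR, and, for the delay calculation, that each excerpt starts at the beginning of the silent window (the text does not state the exact stimulus onset within the window).

```python
SILENT_WINDOW_S = 9.0          # silent listening window (from the text)
ACQUISITION_TIME_S = 3.0       # duration of one volume acquisition (from the text)
N_STIMULI = 240                # total number of stimuli (from the text)
N_BLOCKS = 8                   # number of blocks (from the text)
N_DUMMY_SCANS = 2              # dummy scans at the start of each run (from the text)
ALTERATION_ONSETS_MS = (4100, 4400, 4700, 5200)  # alteration onsets (from the text)


def repetition_time_s() -> float:
    """TR = silent window + acquisition time."""
    return SILENT_WINDOW_S + ACQUISITION_TIME_S


def block_duration_min() -> float:
    """Block duration, assuming one TR per stimulus and per dummy scan."""
    stimuli_per_block = N_STIMULI // N_BLOCKS
    volumes = stimuli_per_block + N_DUMMY_SCANS
    return volumes * repetition_time_s() / 60.0


def alteration_to_acquisition_delays_s() -> list:
    """Delay from each alteration onset to the start of acquisition.

    Assumption: the excerpt starts at the beginning of the silent window.
    """
    return [(SILENT_WINDOW_S * 1000 - onset) / 1000 for onset in ALTERATION_ONSETS_MS]


if __name__ == "__main__":
    print("TR (s):", repetition_time_s())
    print("Block duration (min):", block_duration_min())
    print("Delays (s):", alteration_to_acquisition_delays_s())
```

Under these assumptions a block lasts 6.4 min, which is of the same order as the 7 min per block reported in the text, and the alterations precede the start of image acquisition by roughly 4 to 5 s.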
fMRI data analysis

Data processing and statistical analysis of fMRI data were conducted using Statistical Parametric Mapping (SPM8, Wellcome Department of Imaging Neuroscience, London, UK). For each subject, images were realigned to the first image of the first session, normalized to the Montreal Neurological Institute (MNI) standard space, and smoothed with an 8 mm FWHM Gaussian isotropic kernel prior to general linear model (GLM) statistical analysis. The data of one participant were discarded because of excessive movement inside the scanner. Data were then analyzed in two stages with a random-effects procedure. In the first level GLM analysis, hemodynamic evoked responses for all experimental conditions were modeled as finite-impulse response functions with a window length equal to the acquisition time (3 s) and 1 time bin. Time series for each subject were filtered with a high-pass filter cutoff of 128 s. First-level Student's t-tests for each experimental condition were calculated. At the second level, we tested our hypotheses by computing a flexible factorial design, with an Experimental Conditions factor (7 levels: cMcA_S, cMcA_B, aMcA, cMaA, aMaA, filler_S, filler_B) modeled assuming no independence between levels and equal variance. In order to investigate the effects of violations occurring in musical excerpts, a series of contrasts was then specified:

i) t-test evaluating the main effect of melodic salience, i.e. correct stimuli with Soprano melody (cMcA_S, filler_S) > Bass melody (cMcA_B, filler_B), and opposite contrast; the filler conditions were included in the contrast to augment statistical sensitivity, based on the rationale that the additional components of detecting the deviant timbre for deviant filler stimuli should be eliminated by contrast subtraction.
ii) t-test evaluating the main effect of harmonic violations occurring within melody, i.e. “correct melody” (cMcA and cMaA) > “altered melody” (aMcA and aMaA), and opposite contrast.
iii) t-test evaluating the main effect of harmonic violations occurring within accompaniment, i.e. “correct accompaniment” (cMcA and aMcA) > “altered accompaniment” (cMaA and aMaA).
iv) F-test evaluating the 2 by 2 factorial interaction between harmonic context and harmonic violations; this interaction was defined as a standard, two-tailed omnibus F-test in SPM, with contrast weights: cMcA = +1, aMcA = −1, cMaA = −1, aMaA = +1; the 2 by 2 factorial interaction tested for the specific neural activation effects of introducing harmonic violations in one harmonic context (melody or accompaniment, respectively) as being dependent or independent of the presence of concomitant harmonic violations in the other harmonic context (accompaniment or melody, respectively).

To help in accounting for the interaction effects, we also specified the following post-hoc one-tailed t-contrasts:

v) t-test evaluating the simple main effect of harmonic violations, i.e. “correct melody with correct accompaniment” (cMcA) > “altered melody with altered accompaniment” (aMaA) and opposite contrast.
vi) t-test evaluating the simple main effect of harmonic violations within either melody or accompaniment, i.e. “correct melody with altered accompaniment” (cMaA) > “altered melody with correct accompaniment” (aMcA) and opposite contrast.

All tests were assumed to be significant with a P < 0.05, cluster level, FWE corrected.

Independent Component Analysis (ICA) was performed using the Group independent components analysis of fMRI data Toolbox (GIFT, Calhoun et al., 2001), implemented in SPM5 (Statistical Parametric Mapping, Wellcome Department of Imaging Neuroscience, London, UK). Data from each session of each subject, previously motion corrected, normalized to the MNI standard space, and spatially smoothed, were used. Images were reduced via Principal Component Analysis (PCA) in two steps: data were reduced to 30 principal components in the first step, and to 15 principal components in the second step. ICA was then conducted with the Infomax ICA algorithm (Bell and Sejnowski, 1995). Group analysis was performed using the ICASSO toolbox, which is used in GIFT to determine the reliability of the ICA algorithm: the ICA algorithm is run several times to determine the algorithmic reliability or stability, thus allowing for a better estimation of each component. Individual subjects' data were then back-reconstructed with the GICA3 method (for details, see Erhardt et al., 2011), in order to obtain the most accurate spatial maps and time courses, and scaled to Z-scores. One-sample t-test statistics were performed to evaluate the significance of independent component maps. Results were accepted as significant with a P < 0.05, voxel level, FWE corrected. Through temporal sorting, the ICA time courses for each component, experimental condition, and subject were linearly regressed with the design matrix stimulus onset parameters for each experimental condition. The resulting beta estimates of this first level multivariate regression were entered in a group-level random effects one-sample t-test analysis to estimate the size of the positive or negative correlation between the stimulation for each experimental condition and the activation time course of a particular independent component. In addition, the same contrasts between the experimental conditions as in the GLM analysis performed in SPM were also computed with the ICA beta estimates.
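The second-level contrasts described in the fMRI data analysis can be written as weight vectors over the seven levels of the Experimental Conditions factor. The sketch below is a plain-Python illustration, not SPM code: only the interaction weights (cMcA = +1, aMcA = −1, cMaA = −1, aMaA = +1) are given in the text, whereas the ±1 weights for the main effects, the column order, and the helper names are assumptions made here for illustration. Recall that cMcA in the factorial design is the same condition as cMcA_S.

```python
# Column order of the 7-level Experimental Conditions factor (order is an assumption).
CONDITIONS = ["cMcA_S", "cMcA_B", "aMcA", "cMaA", "aMaA", "filler_S", "filler_B"]


def contrast(**weights: int) -> list:
    """Return a weight vector over CONDITIONS; unspecified conditions get weight 0."""
    unknown = set(weights) - set(CONDITIONS)
    if unknown:
        raise KeyError(f"unknown conditions: {sorted(unknown)}")
    return [weights.get(name, 0) for name in CONDITIONS]


def dot(a: list, b: list) -> int:
    """Inner product of two weight vectors."""
    return sum(x * y for x, y in zip(a, b))


# i) melodic salience: Soprano melody > Bass melody (fillers included, as in the text)
SALIENCE = contrast(cMcA_S=1, filler_S=1, cMcA_B=-1, filler_B=-1)
# ii) correct melody > altered melody (assumed +/-1 weights)
MELODY = contrast(cMcA_S=1, cMaA=1, aMcA=-1, aMaA=-1)
# iii) correct accompaniment > altered accompaniment (assumed +/-1 weights)
ACCOMPANIMENT = contrast(cMcA_S=1, aMcA=1, cMaA=-1, aMaA=-1)
# iv) 2 by 2 interaction, with the weights stated in the text
INTERACTION = contrast(cMcA_S=1, aMcA=-1, cMaA=-1, aMaA=1)

if __name__ == "__main__":
    for name, vec in [("salience", SALIENCE), ("melody", MELODY),
                      ("accompaniment", ACCOMPANIMENT), ("interaction", INTERACTION)]:
        print(f"{name:14s}", vec)
```

With these weights each contrast sums to zero, the two main effects and the interaction are mutually orthogonal, and the interaction vector equals the element-wise product of the two main-effect vectors, which is the usual structure of a balanced 2 by 2 design. The "opposite contrast" mentioned in the text corresponds to negating a vector.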

Results

Behavioral results

On average, subjects gave 98.6% (range: 87.5%–100%) correct responses. This result indicates that all subjects were able to easily perceive the different timbres within the musical excerpts and that they maintained a high level of attention to the stimuli throughout the duration of fMRI scanning. We also analyzed reaction time data (only for correct responses), as a measure of the relative salience of the different voices. We expected the deviant timbre to be detected more quickly if it occurred in the most salient voice, where the attention focus should already be directed. If the deviant timbre instead occurred in a less salient voice, its detection should be delayed.

A 2 × 2 within-subjects ANOVA with the factors melody (up vs down) and deviant timbre (up vs down) confirmed a main effect of melody position [F(1,19) = 26.731, P < .001, partial η² = .585]. When melody was the Soprano voice, reaction times in the timbre task were generally faster (mean = .662 ms, sem = .069) than reaction times for melody as the Bass voice (mean = .721 ms, sem = .067). We also observed a significant interaction between the position of melody and deviant timbre [F(1,19) = 26.935, P < .001, partial η² = .586]. Post-hoc tests revealed that when melody was the Soprano voice, a deviant timbre in the Soprano voice (mean = .622 ms, sem = .067) elicited significantly faster reaction times than a deviant timbre in the Bass voice (mean = .703 ms, sem = .074) (P = .010). Instead, when melody was presented as the Bass voice, the opposite pattern was observed: slower reaction times were elicited by deviant timbres occurring in the Soprano voice (accompaniment, mean = .759, sem = .065) as compared to the Bass voice (melody, mean = .683 ms, sem = .069) (P = .010). The congruent condition with both melody and timbre in the Soprano voice elicited the fastest reaction times, as compared to all the other possible combinations (all P < .048) (Fig. 2A).

An ANOVA with the deviant timbre position as within-subject factor (4 levels: Soprano, Alto, Tenor, Bass) revealed a main effect of position [F(3,57) = 11.267, P < .001, partial η² = .372]. Despite a clear trend for progressively slower reaction times (Soprano: mean = .638 ms, sem = .069; Alto: mean = .685 ms, sem = .068; Tenor: mean = .698 ms, sem = .074; Bass: mean = .745 ms, sem = .065), post-hoc tests (Bonferroni corrected for multiple comparisons) revealed no difference between deviant timbres in the Soprano versus the Alto voice (P = .192), nor between Soprano and Tenor (P = .150). However, the

Fig. 2. Reaction time effects of deviant timbre position in melody and accompaniment. (A) Interaction between the factors melody (Soprano (upper voice) vs Bass (lower voice)) and deviant timbre position (Soprano vs Bass). (B) Reaction times to deviant timbres grouped according to their position in the polyphonic excerpt. ** = P < .01; *** = P < .001.
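The partial eta squared values reported with the reaction-time ANOVAs can be checked directly from the F statistics and their degrees of freedom, using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the three F values given in the behavioral results; the function name is a label chosen here for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (label, F, df_effect, df_error) as reported in the behavioral results
reported_tests = [
    ("main effect of melody position", 26.731, 1, 19),
    ("melody x deviant timbre interaction", 26.935, 1, 19),
    ("deviant timbre position (4 levels)", 11.267, 3, 57),
]

for label, f_value, df_effect, df_error in reported_tests:
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 3))
```

Rounded to three decimals, the results agree with the effect sizes given in the text (.585, .586, and .372).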

D. Spada et al. / NeuroImage 102 (2014) 764–775


difference between deviant timbres in the Soprano and the Bass voice was significant (P < .001) (Fig. 2B). In summary, these results are in line with the consistent evidence in the literature strongly suggesting that the upper voice (usually melody) is more salient than accompaniment in polyphonic music (Fujioka et al., 2005; Lee et al., 2009; Uhlig et al., 2013), and confirm that our polyphonic stimuli adhered to the same principles.

fMRI results

Melodic salience

The contrast between correct stimuli with Soprano melody (cMcA_S, filler_S) minus Bass melody (cMcA_B, filler_B) elicited a significantly stronger hemodynamic response in the precuneus and in the middle frontal gyrus, bilaterally, and in the right superior frontal gyrus. Further significant activations were found in the cingulate cortex, with clusters in its anterior, middle, and posterior portions (Table 1A and Fig. 3A). The opposite contrast, i.e. correct stimuli with Bass melody minus Soprano melody, elicited a significant activation in the right superior temporal gyrus, in correspondence to the primary and secondary auditory cortices (Table 1B and Fig. 3B).

Melody versus accompaniment

We then evaluated the differences in activation for harmonic violations in melody versus accompaniment contexts. The results of the main effect of harmonic violations occurring within melody are summarized in Tables 2A, 2B and in Fig. 4A, B. The conditions with “correct melody” (cMcA and cMaA) minus the conditions with “altered melody” (aMcA and aMaA) activated, bilaterally, the precuneus and the retrosplenial cortex, the right cuneus, and the right lingual gyrus. The opposite comparison did not show any significant results at P < 0.05, FWE cluster level corrected. At P < 0.001 voxel level, uncorrected, we found activation of the right inferior frontal gyrus, pars triangularis (Brodmann area 45).
The main effect of harmonic violations occurring within accompaniment did not show any significant activations in either direction. The 2 by 2 factorial interaction of harmonic violations occurring within melody (respectively, within accompaniment) in the context of correct versus altered accompaniment (respectively, altered melody) elicited a significant activation at the junction between the posterior middle cingulate gyrus and the precuneus in the right hemisphere (Table 2C and Fig. 4C). The interaction was dominated by a selective activation increase for stimuli with both correct melody and correct accompaniment (cMcA), as opposed to the other three stimulus types (aMcA, cMaA, aMaA). The results of the simple main effects of harmonic violations are reported in Table 3A and Fig. 5A. The direct contrast “correct melody with correct accompaniment minus altered melody with altered accompaniment” activated the left premotor and motor cortex, and the

Table 1A Melodic salience. Specific activations for the contrast “melody as the upper voice minus melody as the lower voice” (P < 0.05, cluster level, FWE corrected). Activations are grouped according to activation clusters.

                                   Voxel level    MNI coordinates
Anatomical location                Z score        x      y      z
R precuneus                        4.89           14     −48    24
L precuneus                        3.89           −12    −54    28
R posterior cingulate cortex       3.85           10     −40    16
R middle cingulate cortex          3.84           6      −46    36
R superior frontal gyrus           4.32           24     50     8
R middle frontal gyrus             3.92           24     58     20
R anterior cingulate cortex        3.94           2      42     0
L middle frontal gyrus             4.14           −24    46     16

Cluster level (k, P value), in order of appearance: 282, 0.00015; 296, 0.00011; 103, 0.03640.

Fig. 3. Contrasts melody as the upper voice versus melody as the lower voice. Significant activations (P b 0.05, FWE corrected for multiple comparisons) are displayed on cortical renderings and on sagittal (x coordinate level in mm) and axial (z coordinate levels in mm) slices of the average anatomical image of all participants (warped to the MNI coordinate space). (A) Higher activations for upper minus lower voice. (B) Higher activations for lower minus upper voice.

supplementary motor area bilaterally. The opposite comparison (aMaA minus cMcA) did not show any significant activations. The results of the simple main effects of harmonic violations within either melody or accompaniment are reported in Table 3B and Fig. 5B. The direct contrast “correct melody with altered accompaniment minus altered melody with correct accompaniment” activated the right precuneus and retrosplenial cortex. The opposite comparison (aMcA minus cMaA) did not show any significant activations.
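The main effects, the 2 by 2 interaction, and the two direct contrasts described above can each be written as a weight vector over the four Soprano-melody conditions. The sketch below uses the conventional coding for a 2 × 2 factorial design; the condition order, the dictionary labels, and the toy response values are assumptions for illustration, and the weights actually entered in SPM are not given in the text.

```python
# Assumed condition order: c = correct, a = altered; M = melody, A = accompaniment
CONDITIONS = ("cMcA", "aMcA", "cMaA", "aMaA")

CONTRASTS = {
    "melody: correct minus altered": (1, -1, 1, -1),
    "accompaniment: correct minus altered": (1, 1, -1, -1),
    "melody x accompaniment interaction": (1, -1, -1, 1),
    "cMcA minus aMaA": (1, 0, 0, -1),
    "cMaA minus aMcA": (0, -1, 1, 0),
}

def contrast_value(weights, estimates):
    """Weighted sum of per-condition response estimates."""
    return sum(w * e for w, e in zip(weights, estimates))

# Hypothetical response pattern: elevated for cMcA only, as described for the interaction
selective_pattern = (1.0, 0.0, 0.0, 0.0)
# Hypothetical purely additive pattern: each violation lowers the response by one unit
additive_pattern = (2.0, 1.0, 1.0, 0.0)

print(contrast_value(CONTRASTS["melody x accompaniment interaction"], selective_pattern))
print(contrast_value(CONTRASTS["melody x accompaniment interaction"], additive_pattern))
```

With these weights the interaction term is positive for a response that is selectively elevated in cMcA and zero for a purely additive pattern, which is the sense in which the interaction is described as dominated by the cMcA condition.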

ICA analysis

In several of the experimental contrasts reported in the previous section (melodic salience, main effect of correct melody, 2 × 2 factorial interaction, and simple main effect of correct melody with altered accompaniment), we found a prominent involvement of cortico-medial brain regions, such as the precuneus, the retrosplenial, and the anterior cingulate cortex. These cortico-medial brain regions have been associated with the default mode network, a brain system that has recently also been implicated in music processing, particularly with respect to timbral features while passively listening to music (Alluri et al., 2012). As a consequence, we wished to assess whether these effects in cortico-medial brain regions displayed some of the relevant functional properties of the default mode network, in particular its tendency to undergo increased activation with a temporal course that is negatively correlated with that of sensory and cognitive tasks (i.e. sensory and cognitive tasks are known to deactivate the default mode network). To this purpose, we performed ICA of our fMRI data to specifically isolate

Table 1B Specific activations for the contrast “melody as the lower voice minus melody as the upper voice” (P < 0.05, cluster level, FWE corrected).

                                   Cluster level         Voxel level    MNI coordinates
Anatomical location                k       P value       Z score        x      y      z
Right superior temporal gyrus      165     0.00450       4.15           66     −16    4


Table 2A Effects of harmonic alterations in melody versus accompaniment. Specific activations for the contrast “correct melody minus altered melody” (P < 0.05, cluster level, FWE corrected).

                                   Voxel level    MNI coordinates
Anatomical location                Z score        x      y      z
R precuneus                        4.39           14     −42    8
R cuneus                           3.44           20     −64    20
R retrosplenial cortex             3.86           22     −52    8
R lingual gyrus                    3.4            14     −56    −4
L precuneus                        3.91           −20    −48    4
L retrosplenial cortex             3.79           −10    −54    8

Cluster level (k, P value), in order of appearance: 216, 0.00096; 117, 0.02200.

the default mode network component and test for its temporal correlation with the time course of our experimental conditions. In addition, ICA was also aimed at detecting relevant auditory and attentive functional networks, in order to demonstrate robust task-compliant activations in our fMRI data sample. Fifteen components were evaluated with ICA and subsequently underwent a one-sample t-test analysis to evaluate their correlation with each experimental condition. Some components were identified as either of no interest (component 9, posterior occipital cortex), or as contaminated by noise in the MRI signal (1, 2, 3, 4, 7, 8, 10, 12, 13, 14, and 15), particularly over or near the cavities filled with cerebrospinal fluid, in the ventricles and around the outer cortical margin, or over the anterior prefrontal regions most prone to movement and magnetic susceptibility artifacts. The remaining components (5, 6, and 11) and their relations with the experimental conditions are described below (Table 4). Independent component 5 (Fig. 6A) was identified as corresponding to the default mode network, encompassing the medial and lateral prefrontal cortices extending to the anterior cingulate, posterior cingulate and precuneus, and the bilateral temporo-parietal junctions. This component was negatively correlated with all the experimental conditions (all P < 10^−5). Independent component 6 (Fig. 6B) covered the bilateral temporal cortices and was positively correlated with all the experimental conditions (all P < 10^−5). Independent component 11 (Fig. 6C) included a fronto-parietal network and only showed significant correlations with the “filler conditions” (filler_S, P < .0005; filler_B, P < .005). Independent components 6 and 11 furthermore presented a significant “filler stimuli minus experimental stimuli” effect (independent component 6: P < 10^−6, T = 8.066; independent component 11: P = 0.0047, T = 2.835). No other effects were found to be significant.
Discussion

The main goal of our experiment was to investigate the neural correlates of the interplay between melody and accompaniment, considering them as basic components of the auditory scene in the musical context. We investigated this through the manipulation of harmony and salience, that is, respectively, a structural and high-level feature of music, and the result of auditory segregation processes that in the music domain lead to the individuation of differentiated auditory

Table 2B Specific activations for the contrast “altered melody minus correct melody” (P < 0.05, cluster level, FWE corrected).

                                              Cluster level         Voxel level    MNI coordinates
Anatomical location                           k       P value       Z score        x      y      z
R inferior frontal gyrus (pars triangularis)  9       0.00045       3.32           54     34
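The voxel-level values in Table 2B illustrate the standard relation between a Z score and a one-tailed uncorrected P value, namely the upper tail of the standard normal distribution. The sketch below assumes that this is how the tabulated voxel-level P value relates to the Z score; the function name is chosen here for illustration, and FWE-corrected values cannot be obtained this way.

```python
import math

def one_tailed_p(z_score):
    """Upper-tail probability of the standard normal distribution at z_score."""
    return 0.5 * math.erfc(z_score / math.sqrt(2.0))

# Z score reported for the right inferior frontal gyrus peak in Table 2B
print(one_tailed_p(3.32))
```

For Z = 3.32 the result is approximately 0.00045, in agreement with the P value listed in the table and with the uncorrected P < 0.001 threshold mentioned in the results.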

Fig. 4. Contrast correct melody versus altered melody. Graphical conventions are the same as in Fig. 3. (A) Higher activations for correct minus altered melody. (B) Higher activations for altered minus correct melody. (C) Specific activations for the interaction between harmonic violations in melody and harmonic violations in accompaniment.

streams inside the texture. A type of auditory segregation commonly occurring in music leads to the well-known distinction between melody (more salient stream) and accompaniment (less salient stream). Accompaniment physically realizes the sequence of chords whose structural description is provided by harmony, thus providing constraints to melody development (“melody as a consequence of chords”). Conversely, it is also possible to harmonize a given melody, by building up chords that follow each other in conformity to a pre-existing leading voice (“chords as a consequence of melody”). This intricate framework underlines the importance of harmony both at the melodic and at the accompaniment level, a duality that deserves more consideration in the neuroscientific literature. In the present fMRI study, through a selective manipulation of salience and of the harmonic components of both melody and accompaniment, we have attempted to provide more comprehensive evidence regarding the interplay between them at the

Table 2C Specific activations for the 2 by 2 factorial interaction between harmonic violations in melody and harmonic violations in accompaniment (P < 0.05, cluster level, FWE corrected).

                                   Cluster level         Voxel level    MNI coordinates
Anatomical location                k       P value       Z score        x      y      z
R middle cingulate gyrus           98      0.01600       5.64           12     −48    32

Table 3A Simple main effects of melody and accompaniment alterations. Specific activations for the contrast “correct melody with correct accompaniment minus altered melody with altered accompaniment” (P < 0.05, cluster level, FWE corrected).

                                              Cluster level         Voxel level    MNI coordinates
Anatomical location                           k       P value       Z score        x      y      z
L precentral gyrus (motor cortex)             143     0.00910       3.62           −36    −20    56
L precentral gyrus (premotor cortex)                                3.34           −26    −18    64
L postcentral gyrus (somatosensory cortex)                          3.80           −32    −22    52
R supplementary motor area                    92      0.05000       3.73           8      −2     52
L supplementary motor area                                          3.45           −10    −6     60

neural level, in order to deepen our understanding of the dynamics that characterize such a complex auditory scene. We start our Discussion with some considerations on the specificity of our experimental design and on the employed sparse acquisition technique. We then move to the discussion of the fMRI results related to our two main experimental objectives, i.e. melodic salience and context-dependent harmonic processing.

Experimental task and sparse fMRI acquisition

The independent component analysis demonstrated the efficacy of our experimental paradigm for activation signal detection, and the overall compliance of the experimental participants with the task requirements (as also shown by the almost perfect accuracy in the behavioral task performance analysis). Independent component 5 encompassed the medial and lateral prefrontal cortices, the posterior cingulate and precuneus, and the bilateral temporo-parietal junctions, and therefore closely corresponded to the default mode network (Raichle, 2010). In agreement with the typical pattern of deactivation of the default mode network induced by sensory-motor and


Table 3B Specific activations for the contrast “correct melody with altered accompaniment minus altered melody with correct accompaniment” (P < 0.05, cluster level, FWE corrected).

                                       Cluster level         Voxel level    MNI coordinates
Anatomical location                    k       P value       Z score        x      y      z
R precuneus/retrosplenial cortex       153     0.00660       3.82           16     −46    8
R retrosplenial cortex                                       4.56           22     −52    8

cognitive tasks, the timing of activation of this component was negatively correlated with the timing of presentation of stimuli of all the experimental conditions. This indicates that, overall, the experimental stimuli of all seven conditions (cMcA_S, cMcA_B, aMcA, cMaA, aMaA, filler_S, filler_B) actively engaged the sensory-motor and cognitive systems of the trained musicians participating in the study. This also indicates that, contrary to a previous report (Alluri et al., 2012), our data do not provide direct evidence of a functional involvement of the default mode network in musical processing, at least with respect to our experimental manipulations of melodic salience and harmonic violations in an active timbre detection task. In fact, a crucial distinction between our study and that by Alluri et al. (2012) with respect to the anticorrelated functional properties of the default mode network is that, whereas we employed an attentive music processing task, the latter study was based on a passive music listening condition. The fact that we were able to isolate the default mode network using the independent component analysis is therefore most likely due to the fact that this system became relatively more activated in the silent interstimulus interval between the music excerpts presented as stimuli.

Independent component 6 covered the bilateral temporal auditory cortices and its timing of activation was positively correlated with that of all seven experimental conditions, demonstrating the involvement of primary and secondary auditory processes, independently of stimulus type. Graded effects for the different stimulus types were also observed, however, with higher positive correlations between

Table 4 Significant results of the one-sample t-test performed to test for the correlation between each independent component and the experimental conditions (P < 0.05).

                         Independent component 5     Independent component 6     Independent component 11
Experimental condition   P value        T value      P value        T value      P value        T value
cMcA_S                   7.71 e−10      −6.55        3.42 e−37      16.85        n.s.           –
cMcA_B                   6.41 e−11      −7.01        1.45 e−31      14.75        n.s.           –
aMcA                     3.61 e−16      −9.10        2.09 e−31      14.69        n.s.           –
cMaA                     8.27 e−12      −7.38        9.75 e−33      15.18        n.s.           –
aMaA                     4.35 e−13      −7.90        7.25 e−33      15.23        n.s.           –
filler_S                 3.84 e−6       −4.79        9.14 e−48      21.02        4.01 e−4       3.62
filler_B                 5.48 e−12      −7.45        4.08 e−50      22.00        4.10 e−3       2.92

Fig. 5. Effects of harmonic violations in melody and accompaniment. Graphical conventions are the same as in Fig. 3. (A) Higher activations for “correct melody with correct accompaniment minus altered melody with altered accompaniment”. (B) Higher activations for “correct melody with altered accompaniment minus altered melody with correct accompaniment”.


Fig. 6. Independent components of interest. Independent components are displayed on axial (z coordinate levels in mm) slices of the average anatomical image of all participants (warped to the MNI coordinate space). (A) Independent component 5: default mode network. (B) Independent component 6: auditory system. (C) Independent component 11: dorsal frontoparietal attention network.

independent component 6 and filler stimuli compared to experimental stimuli. This finding, together with the significant correlations between independent component 11, most likely consisting of the dorsal frontoparietal attention network (Corbetta and Shulman, 2002), and the filler conditions only, demonstrates the effectiveness of top-down attentional selection of stimuli relevant for the experimental task (i.e. pressing a button when detecting a note played by a deviant musical instrument, which only occurred in filler stimuli). The independent component analysis thus showed robust and consistent activation signal detection in the key sensory-motor and associative cortices involved in our auditory task, in spite of the generally lower sensitivity of the employed sparse sampling fMRI technique, compared to continuous fMRI acquisition (Mueller et al., 2011). Nevertheless, as further discussed below, the lower sensitivity of the sparse sampling fMRI technique (which was employed in our study to instantiate the optimal listening conditions required by the task) may have more critically affected the discrimination capacity of some specific between-conditions contrasts, possibly only revealing the tip of the iceberg of the involved activations. In this sense, it is somewhat reassuring in the context of a functional mapping fMRI study that, due to the lower sensitivity of sparse sampling, the fMRI statistics are more susceptible to false negatives than to false positives. In other words, even if the reported activations may only represent a subset of broader functional networks involved in the task, it is reasonable to assume that they represent true effects (especially in consideration of the stringent FWE-type correction for multiple comparisons that we have used).

Melodic salience

Our first experimental question concerned the neural responses to a variation of melodic salience as a result of shifting melody in the auditory texture.
We administered two sets of stimuli, one with Soprano melody, i.e. above accompaniment (melody as the upper voice, more

salient) and one with Bass melody, i.e. below the accompaniment (melody as the lower voice, less salient). Our fMRI results, in line with previous studies (Fujioka et al., 2005; Lee et al., 2009; Satoh et al., 2001, 2003), confirmed that the neural processing of polyphonic music is affected by the position of the melodic line in the musical texture, and that both the auditory perceptual memory system and the auditory selective attention network may be crucially involved. With respect to the attention network, by contrasting musical stimuli with Soprano melody and stimuli with Bass melody, we observed activations bilaterally in the precuneus and middle frontal gyrus, in the anterior, middle, and posterior cingulate cortex, and in the right superior frontal gyrus. Activations in the precuneus and in the anterior prefrontal cortex were also found, with remarkable similarity to our network of activation, in two PET studies by Satoh et al. (2001, 2003), in which the authors investigated the neural correlates of auditory selective attention in musicians and non-musicians listening to polyphonic compositions and attending to a given part of the texture. A network similar to the one we have reported, encompassing the anterior medial prefrontal cortex, the precuneus and the posterior cingulate cortex, is the so-called “episodic memory recall network” (Cavanna and Trimble, 2006; Maguire, 2001; Sajonz et al., 2010; Svoboda et al., 2006; Wagner et al., 2005), which has been linked to familiarity (Hassabis et al., 2007). Although our stimuli were original compositions, a sense of familiarity may in principle have arisen relative to cadences in the population of professional musicians that we investigated. Indeed, all the excerpts were harmonically organized as a series of chords developing in the domains of Tonic (I°), sub-Dominant (IV), Dominant (V7), and Tonic again.
However, the fact that cadences were equally present in both kinds of stimuli (melody as the upper or lower voice) should be sufficient to rule out this explanation for the observed activations. Most likely, then, the sense of familiarity was rather linked to the kind of texture, i.e. the analysis of melodic pitch in stimuli with Soprano melody versus those with Bass melody, the former being more salient and familiar.


By contrast, the activation pattern for stimuli with Bass melody minus stimuli with Soprano melody involved the right superior temporal gyrus, in correspondence to the primary and secondary auditory cortices, consistent with our hypothesis of an involvement of the auditory perceptual memory system (Fujioka et al., 2005; Lee et al., 2009). These brain regions are also well known to participate in processing higher order harmonic information (Janata et al., 2002; Schmithorst, 2005; Zatorre, 2003), and seem to reflect increased dependence on fine acoustic analysis for less salient melody voices.

Violations in melody and accompaniment

Our second main experimental question was focused on the neural correlates of harmonic violations occurring in the melody and in the accompaniment. It is important to stress that for the contrasts addressing this research question, melodic salience was maintained constant in all the involved experimental conditions, namely keeping melody as Soprano. In the main effect of harmonic violations occurring within melody, we found that the contrast of correct minus altered melody activated a posterior occipito-parietal network including the precuneus, cuneus, lingual gyrus, and retrosplenial cortex. The involvement of these areas has been consistently reported in other studies investigating harmony processing. In an fMRI study on the processing of melody and harmony by trained musicians and by naive subjects (Schmithorst and Holland, 2003), the bilateral activation of the lingual gyrus was associated with harmonic processing in musicians but not in naive subjects, suggesting a change in the kind of visual representation of harmonies due to musical training. The activation of the lingual gyri and cuneus was also observed in a study focusing on harmonic processing and auditory imagery, where musician subjects were presented with visual notes and/or the corresponding sounds (Schürmann et al., 2002).
Another PET and electroencephalography study found recruitment of the posterior portion of the precuneus bilaterally during music listening, pointing to an interplay between music processing and visual imagery (Nakamura et al., 1999). In a further PET study (Platel et al., 1997), the left extrastriate cortex was associated with the processing of pitch (compared to rhythm and timbre), suggesting that visual imagery may be used to encode ‘high’ and ‘low’ pitch oscillations along the music score. The hypothesis of a mental music score linked to activity in the precuneus and occipital areas has been made in two other studies (Satoh et al., 2001, 2003) evaluating the relationship between a single voice and the whole harmonization in the context of selective attention. The opposite contrast of altered minus correct melody activated the right inferior frontal gyrus, pars triangularis, although at an uncorrected significance level. Nevertheless, this finding was at least in partial agreement with our experimental hypothesis, based on previous studies showing that the violation of musical expectations increases activation in the right inferior frontal gyrus and in the bilateral superior temporal gyrus (Koelsch et al., 2002; Koelsch, 2006; Maess et al., 2001; Tillmann et al., 2003, 2006). The relative weakness and spatial restriction of our effect may have been due either to the reduced sensitivity of the sparse sampling fMRI technique (Mueller et al., 2011), or to temporal constraints in the construction of our stimuli to comply with the requirements of sparse temporal sampling fMRI acquisition. Following such temporal constraints, all our stimuli began with the mixed cadence (harmonically IV → V → I) and harmonic violations occurred between the sub-dominant and dominant chords (IV → V), a transition with less harmonic tension than the one between dominant and tonic (V → I), which is the standard type of harmonic violation.
The main effect of harmonic violations occurring within accompaniment, in turn, did not produce any significant results, contrary to our hypothesis of similar findings as for the harmonic violations in melody. One possible explanation for this negative finding is that, in order to maintain the internal consonance of chords and instead turn the participants' focus on the harmonic contrast between accompaniment and melody, we decided to shift all three chord notes of the accompaniment


one semitone upward or downward, instead of shifting only one single note of the chord triad. However, this explanation appears relatively implausible, given that comparable activation patterns involving the inferior frontal cortex have been reported both for soft violations (Tillmann et al., 2006) and for stronger musical expectancy violations (Koelsch et al., 2002, 2005; Tillmann et al., 2003). Thus, the lack of significant results for the main effect of harmonic violations occurring within accompaniment may give further strength to our argument on the importance of the interplay between melodic salience and harmonic structure in music perception, and on its relevance when investigating the neural correlates of harmonic violations. Considering together the results of the main effect of harmonic violations within both melody and accompaniment, our results seem to indicate that only the structural integrity of the melody, not of the accompaniment, is fundamental for the recruitment of an occipito-posterior parietal network whose role in music perception could be that of allowing integrated multimodal representations. Most importantly, the 2 by 2 factorial interaction between violations occurring within melody and violations occurring in accompaniment activated the junction between the posterior middle cingulate gyrus and the precuneus in the right hemisphere. This result is consistent with our hypothesis of an overlap with the effect for perceptual salience in the precuneus, whereas, contrary to our hypothesis, we found no overlap with the harmonic violation network (one possibility for this negative finding being once again the reduced sensitivity of the sparse sampling fMRI acquisition). The precuneus–cingulate posterior medial region thus appears selectively tuned to the interplay between the focus on harmonic context and the processing of harmonic violations. 
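The difference between shifting all three chord notes by one semitone and shifting a single note of the triad, as discussed above for the accompaniment violations, can be made concrete by comparing the intervals inside the chord. The sketch below is only an illustration: the MIDI note numbers, the choice of an F major triad as a sub-dominant chord in C major, and the function names are assumptions, not details of the actual stimuli.

```python
def shift(chord, semitones):
    """Transpose every note of a chord by the same number of semitones."""
    return [note + semitones for note in chord]

def intervals(chord):
    """Semitone distances between adjacent chord notes."""
    return [upper - lower for lower, upper in zip(chord, chord[1:])]

# Hypothetical sub-dominant triad (F4, A4, C5) written as MIDI note numbers
subdominant = [65, 69, 72]

# Whole-chord alteration: all three notes move one semitone upward
whole_chord_up = shift(subdominant, 1)

# Single-note alteration: only the middle note moves one semitone upward
single_note_up = [subdominant[0], subdominant[1] + 1, subdominant[2]]

print(intervals(subdominant), intervals(whole_chord_up), intervals(single_note_up))
```

Shifting the whole chord leaves its internal intervals unchanged, so the chord stays consonant in itself and the violation arises only relative to the melody and the tonal context, whereas altering a single note changes the internal structure of the chord.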
The interaction was dominated by a selective activation increase for stimuli with both correct melody and correct accompaniment (cMcA), as opposed to stimuli carrying harmonic violations (aMcA, cMaA, aMaA). In descriptive terms, this activation pattern indicates that the presence of a harmonic violation in either of the two harmonic contexts (melody or accompaniment, respectively) determines an activation reduction in the posterior medial brain region, quite independently of the presence or absence of harmonic violations in the other harmonic context (accompaniment or melody, respectively). The interaction contrast provides evidence based on the most stringent statistical criteria for the aforementioned recruitment of associative areas in the processing of stimuli which sounded perceptually salient to our subjects, both in terms of texture (Soprano melody) and of harmonic structure (melody and accompaniment free of harmonic violations). Indeed, in the presence of such perceptual salience, other higher-level cognitive processes besides auditory ones may occur, which hinge in particular on the activity of the retrosplenial cortex and of the precuneus, and contribute to melodic processing, such as self-reflection (Cavanna and Trimble, 2006; Herwig et al., 2012; Sajonz et al., 2010), mind-wandering (Mason et al., 2007), construction of complex scenes (Hassabis et al., 2007), spatial navigation (Burgess et al., 2002; Hartley et al., 2003; Spiers and Maguire, 2006) or other spatial tasks (Kumaran and Maguire, 2005). It is not possible to tease apart these higher-order cognitive components based on our results, but we can envisage that several of them become recruited in professional pianists when listening to tonal harmonies and engaging in multisensory imagery, possibly referred to the self in the form of personal memories, and to a melodic flow moving in time in a harmonically structured acoustical environment. Finally, we analyzed two further effects.
The first one, i.e. the contrast of stimuli with “correct melody with correct accompaniment” minus stimuli with “altered melody with altered accompaniment”, showed activations in the sensory-motor system, lateralized to the left hemisphere, including motor, premotor, and supplementary motor areas. Activations in motor areas for auditory tasks have been previously found in expert musicians (Bangert et al., 2006; Haueisen and Knösche, 2001). In trained musicians, an auditory–motor coupling serves to integrate motor sequences required to play an instrument with auditory


sensory feedback in perfect timing with each other (Lahav et al., 2007; Zatorre et al., 2007). The opposite contrast of stimuli with “altered melody with altered accompaniment” minus stimuli with “correct melody with correct accompaniment” did not yield any significant activations. This may be due to the virtually complete disappearance of tonal context in fully altered stimuli, given in particular that in our stimuli melody and accompaniment were altered in diverging directions (one up and the other down) in order to avoid a consequent consonance. In turn, the second of these effects, i.e. the contrast of “correct melody with altered accompaniment” minus “altered melody with correct accompaniment”, activated the right precuneus and the retrosplenial cortex, whereas the opposite comparison did not show any significant activations. This result, closely resembling the one of the main effect of harmonic violations occurring within melody (see above), indicates that the processing of melody is probably less disturbed by violations of the accompaniment than is the processing of accompaniment by violations of the melody. In other words, melody appears as the most “sensitive” level in the musical texture. We hypothesize that melody is represented by musicians through multimodal sensory imagery processes, allowing them to assemble it as a definite auditory object during the unfolding of the auditory stream, even in the case of the disruption of the familiar harmonic background.

Conclusions

The main finding of the present study is the elucidation of the role of the posterior medial cortex, including the precuneus, the posterior cingulate, and the lingual gyri, in the processing of melodic information in polyphonic textures, i.e. in the processing of melody in relation to accompaniment, the former being more salient than the latter.
The retrosplenial cortex and precuneus seem to be particularly involved in the mental representation of melodic lines, provided that their own harmonic context is correct and that their place in the polyphonic texture maximizes their salience. We may speculate that a prominent role of the posterior medial cortex is the construction of complex multimodal sensory imagery scenes in which melody can be conceived as an object and accompaniment as the ground of each scene. The sense of familiarity with the auditory scene may be seen as a feature that enables trained musicians to deploy representational abilities with multimodal character. Further studies in naive subjects as opposed to trained musicians may help to clarify these aspects.

Acknowledgments

We thank Mari Tervaniemi, Daniele Schön, and Guido Andreolli for their valuable comments on our manuscript. This work was supported by the European Union Project BrainTuning FP6-2004 NEST-PATH-028570.

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.neuroimage.2014.08.036.

References

Alluri, V., Toiviainen, P., Jääskeläinen, I.P., Glerean, E., Sams, M., Brattico, E., 2012. Large-scale brain networks emerge from dynamic processing of musical timbre, key and rhythm. Neuroimage 59 (4), 3677–3689.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., Heinze, H.-J., Altenmüller, E., 2006. Shared networks for auditory and motor processing in professional pianists: evidence from fMRI conjunction. Neuroimage 30, 917–926.
Belin, P., Zatorre, R.J., Hoge, R., Evans, A.C., Pike, B., 1999. Event-related fMRI of the auditory cortex. Neuroimage 10, 417–429.
Bell, A.J., Sejnowski, T.J., 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159.
Brattico, E., Tervaniemi, M., Näätänen, R., Peretz, I., 2006. Musical scale properties are automatically processed in the human auditory cortex. Brain Res. 1117, 162–174.

Bregman, A.S., 1990. Auditory Scene Analysis: The Perceptual Organization of Sound. The MIT Press, Cambridge, MA.
Bregman, A.S., 2007. Auditory scene analysis. In: Basbaum, A.I., Kaneko, A., Shepherd, G.M., Westheimer, G. (Eds.), The Senses: A Comprehensive Reference. Academic Press, San Diego, pp. 861–870.
Bregman, A.S., 2008. Auditory scene analysis. In: Squire, L. (Ed.), New Encyclopedia of Neuroscience. Academic Press, Oxford, UK.
Burgess, N., Maguire, E.A., O'Keefe, J., 2002. The human hippocampus and spatial and episodic memory. Neuron 35, 625–641.
Calhoun, V.D., Adali, T., Pearlson, G.D., Pekar, J.J., 2001. A method for making group inferences from functional MRI data using independent component analysis. Hum. Brain Mapp. 14, 140–151.
Carrión, R.E., Bly, B.M., 2008. The effects of learning on event-related potential correlates of musical expectancy. Psychophysiology 45, 759–775.
Cavanna, A.E., Trimble, M.R., 2006. The precuneus: a review of its functional anatomy and behavioural correlates. Brain 129, 564–583.
Corbetta, M., Shulman, G.L., 2002. Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 3, 201–215.
Erhardt, E.B., Rachakonda, S., Bedrick, E.J., Allen, E.A., Adali, T., Calhoun, V.D., 2011. Comparison of multi-subject ICA methods for analysis of fMRI data. Hum. Brain Mapp. 32 (12), 2075–2095.
Fujioka, T., Trainor, L.J., Ross, B., Kakigi, R., Pantev, C., 2005. Automatic encoding of polyphonic melodies in musicians and nonmusicians. J. Cogn. Neurosci. 17, 1578–1592.
Gaab, N., Gabrieli, J.D.E., Glover, G.H., 2007a. Assessing the influence of scanner background noise on auditory processing. I. An fMRI study comparing three experimental designs with varying degrees of scanner noise. Hum. Brain Mapp. 28, 703–720.
Gaab, N., Gabrieli, J.D.E., Glover, G.H., 2007b. Assessing the influence of scanner background noise on auditory processing. II. An fMRI study comparing auditory processing in the absence and presence of recorded scanner noise using a sparse design. Hum. Brain Mapp. 28, 721–732.
Garza Villarreal, E.A., Brattico, E., Leino, S., Ostergaard, L., Vuust, P., 2011. Distinct neural responses to chord violations: a multiple source analysis study. Brain Res. 1389, 103–114.
Hall, D.A., Haggard, M.P., Akeroyd, M.A., Palmer, A.R., Summerfield, A.Q., Elliott, M.R., Gurney, E.M., Bowtell, R.W., 1999. “Sparse” temporal sampling in auditory fMRI. Hum. Brain Mapp. 7, 213–223.
Halpern, A.R., Kwak, S., Bartlett, J.C., Dowling, W.J., 1996. Effects of aging and musical experience on the representation of tonal hierarchies. Psychol. Aging 11, 235–246.
Hartley, T., Maguire, E.A., Spiers, H.J., Burgess, N., 2003. The well-worn route and the path less traveled: distinct neural bases of route following and wayfinding in humans. Neuron 37, 877–888.
Hassabis, D., Kumaran, D., Maguire, E.A., 2007. Using imagination to understand the neural basis of episodic memory. J. Neurosci. 27, 14365–14374.
Haueisen, J., Knösche, T.R., 2001. Involuntary motor activity in pianists evoked by music perception. J. Cogn. Neurosci. 13, 786–792.
Hébert, S., Peretz, I., Gagnon, L., 1995. Perceiving the tonal ending of tune excerpts: the roles of pre-existing representation and musical expertise. Can. J. Exp. Psychol. 49, 193–209.
Herwig, U., Kaffenberger, T., Schell, C., Jäncke, L., Bruehl, A.B., 2012. Neural activity associated with self-reflection. BMC Neurosci. 13, 52.
Janata, P., Birk, J.L., Van Horn, J.D., Leman, M., Tillmann, B., Bharucha, J.J., 2002. The cortical topography of tonal structures underlying Western music. Science 298, 2167–2170.
Koelsch, S., 2005. Neural substrates of processing syntax and semantics in music. Curr. Opin. Neurobiol. 15, 207–212.
Koelsch, S., 2006. Significance of Broca's area and ventral premotor cortex for music-syntactic processing. Cortex 42, 518–520.
Koelsch, S., 2011a. Toward a neural basis of music perception – a review and updated model. Front. Psychol. 2, 110.
Koelsch, S., 2011b. Towards a neural basis of processing musical semantics. Phys. Life Rev. 8 (2), 89–105.
Koelsch, S., Jentschke, S., 2010. Differences in electric brain responses to melodies and chords. J. Cogn. Neurosci. 22, 2251–2262.
Koelsch, S., Siebel, W.A., 2005. Towards a neural basis of music perception. Trends Cogn. Sci. 9, 578–584.
Koelsch, S., Gunter, T., Friederici, A.D., Schröger, E., 2000. Brain indices of music processing: “nonmusicians” are musical. J. Cogn. Neurosci. 12, 520–541.
Koelsch, S., Gunter, T.C., von Cramon, D.Y., Zysset, S., Lohmann, G., Friederici, A.D., 2002. Bach speaks: a cortical “language-network” serves the processing of music. Neuroimage 17, 956–966.
Koelsch, S., Fritz, T., Schulze, K., Alsop, D., Schlaug, G., 2005. Adults and children processing music: an fMRI study. Neuroimage 25, 1068–1076.
Krumhansl, C.L., 1989. Why is musical timbre so hard to understand? In: Nielzen, S., Olsson, O. (Eds.), Structure and Perception of Electro-Acoustic Sound and Music. Excerpta Medica, Amsterdam, The Netherlands, pp. 43–54.
Kumaran, D., Maguire, E.A., 2005. The human hippocampus: cognitive maps or relational memory? J. Neurosci. 25, 7254–7259.
Lahav, A., Saltzman, E., Schlaug, G., 2007. Action representation of sound: audiomotor recognition network while listening to newly acquired actions. J. Neurosci. 27, 308.
Lee, K.M., Skoe, E., Kraus, N., Ashley, R., 2009. Selective subcortical enhancement of musical intervals in musicians. J. Neurosci. 29, 5832–5840.
Leino, S., Brattico, E., Tervaniemi, M., Vuust, P., 2007. Representation of harmony rules in the human brain: further evidence from event-related potentials. Brain Res. 1142, 169–177.
Loui, P., Wessel, D., 2007. Harmonic expectation and affect in Western music: effects of attention and training. Percept. Psychophys. 69, 1084–1092.
Macherey, O., Delpierre, A., 2013. Perception of musical timbre by cochlear implant listeners: a multidimensional scaling study. Ear Hear. 34, 426–436.

Maeder, P.P., Meuli, R.A., Adriani, M., Bellmann, A., Fornari, E., Thiran, J.P., Pittet, A., Clarke, S., 2001. Distinct pathways involved in sound recognition and localization: a human fMRI study. Neuroimage 14, 802–816.
Maess, B., Koelsch, S., Gunter, T.C., Friederici, A.D., 2001. Musical syntax is processed in Broca's area: an MEG study. Nat. Neurosci. 4 (5), 540–545.
Maguire, E.A., 2001. Neuroimaging studies of autobiographical event memory. Philos. Trans. R. Soc. Lond. B Biol. Sci. 356, 1441–1451.
Mason, M.F., Norton, M.I., Van Horn, J.D., Wegner, D.M., Grafton, S.T., Macrae, C.N., 2007. Wandering minds: the default network and stimulus-independent thought. Science 315, 393–395.
McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., Krimphoff, J., 1995. Perceptual scaling of synthesized musical timbres: common dimensions, specificities, and latent subject classes. Psychol. Res. 58, 177–192.
Miranda, R.A., Ullman, M.T., 2007. Double dissociation between rules and memory in music: an event-related potential study. Neuroimage 38, 331–345.
Moelker, A., Pattynama, P.M.T., 2003. Acoustic noise concerns in functional magnetic resonance imaging. Hum. Brain Mapp. 20, 123–141.
Mueller, K., Mildner, T., Fritz, T., Lepsien, J., Schwarzbauer, C., Schroeter, M.L., Möller, H.E., 2011. Investigating brain response to music: a comparison of different fMRI acquisition schemes. Neuroimage 54, 337–343.
Näätänen, R., Pakarinen, S., Rinne, T., Takegata, R., 2004. The mismatch negativity (MMN): towards the optimal paradigm. Clin. Neurophysiol. 115 (1), 140–144.
Näätänen, R., Paavilainen, P., Rinne, T., Alho, K., 2007. The mismatch negativity (MMN) in basic research of central auditory processing: a review. Clin. Neurophysiol. 118 (12), 2544–2590.
Nakamura, S., Sadato, N., Oohashi, T., Nishina, E., Fuwamoto, Y., Yonekura, Y., 1999. Analysis of music–brain interaction with simultaneous measurement of regional cerebral blood flow and electroencephalogram beta rhythm in human subjects. Neurosci. Lett. 275, 222–226.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113.
Palmer, C., Holleran, S., 1994. Harmonic, melodic, and frequency height influences in the perception of multivoiced music. Percept. Psychophys. 56, 301–312.
Pearce, M.T., Ruiz, M.H., Kapasi, S., Wiggins, G.A., Bhattacharya, J., 2010. Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation. Neuroimage 50, 302–313.
Perani, D., Saccuman, M.C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., Baldoli, C., Koelsch, S., 2010. Functional specializations for music processing in the human newborn brain. Proc. Natl. Acad. Sci. U. S. A. 107, 4758–4763.
Platel, H., Price, C., Baron, J.-C., Wise, R., Lambert, J., Frackowiak, R.S., Lechevalier, B., Eustache, F., 1997. The structural components of music perception. A functional anatomical study. Brain 120, 229–243.
Raichle, M.E., 2010. Two views of brain function. Trends Cogn. Sci. 14, 180–190.
Ruiz, M.H., Koelsch, S., Bhattacharya, J., 2009. Decrease in early right alpha band phase synchronization and late gamma band oscillations in processing syntax in music. Hum. Brain Mapp. 30, 1207–1225.
Sajonz, B., Kahnt, T., Margulies, D.S., Park, S.Q., Wittmann, A., Stoy, M., Ströhle, A., Heinz, A., Northoff, G., Bermpohl, F., 2010. Delineating self-referential processing from episodic memory retrieval: common and dissociable networks. Neuroimage 50, 1606–1617.

Satoh, M., Takeda, K., Nagata, K., Hatazawa, J., Kuzuhara, S., 2001. Activated brain regions in musicians during an ensemble: a PET study. Brain Res. Cogn. Brain Res. 12, 101–108.
Satoh, M., Takeda, K., Nagata, K., Hatazawa, J., Kuzuhara, S., 2003. The anterior portion of the bilateral temporal lobes participates in music perception: a positron emission tomography study. Am. J. Neuroradiol. 24, 1843–1848.
Saupe, K., Koelsch, S., Rübsamen, R., 2010. Spatial selective attention in a complex auditory environment such as polyphonic music. J. Acoust. Soc. Am. 127, 472–480.
Schmithorst, V.J., 2005. Separate cortical networks involved in music perception: preliminary functional MRI evidence for modularity of music processing. Neuroimage 25, 444–451.
Schmithorst, V.J., Holland, S.K., 2003. The effect of musical training on music processing: a functional magnetic resonance imaging study in humans. Neurosci. Lett. 348, 65–68.
Schürmann, M., Raij, T., Fujiki, N., Hari, R., 2002. Mind's ear in a musician: where and when in the brain. Neuroimage 16, 434–440.
Spiers, H.J., Maguire, E.A., 2006. Thoughts, behaviour, and brain dynamics during navigation in the real world. Neuroimage 31, 1826–1840.
Svoboda, E., McKinnon, M.C., Levine, B., 2006. The functional neuroanatomy of autobiographical memory: a meta-analysis. Neuropsychologia 44, 2189–2208.
Tillmann, B., 2005. Implicit investigations of tonal knowledge in nonmusician listeners. Ann. N. Y. Acad. Sci. 1060, 100–110.
Tillmann, B., Bigand, E., 2001. Global context effect in normal and scrambled musical sequences. J. Exp. Psychol. Hum. Percept. Perform. 27, 1185–1196.
Tillmann, B., Janata, P., Bharucha, J.J., 2003. Activation of the inferior frontal cortex in musical priming. Ann. N. Y. Acad. Sci. 999, 209–211.
Tillmann, B., Koelsch, S., Escoffier, N., Bigand, E., Lalitte, P., Friederici, A.D., von Cramon, D.Y., 2006. Cognitive priming in sung and instrumental music: activation of inferior frontal cortex. Neuroimage 31, 1771–1782.
Trainor, L.J., Trehub, S.E., 1994. Key membership and implied harmony in Western tonal music: developmental perspectives. Percept. Psychophys. 56, 125–132.
Tramo, M.J., Cariani, P.A., Delgutte, B., Braida, L.D., 2001. Neurobiological foundations for the theory of harmony in western tonal music. Ann. N. Y. Acad. Sci. 930, 92–116.
Trehub, S.E., 2001. Musical predispositions in infancy. Ann. N. Y. Acad. Sci. 930, 1–16.
Trehub, S.E., Bull, D., Thorpe, L.A., 1984. Infants' perception of melodies: the role of melodic contour. Child Dev. 55, 821–830.
Uhlig, M., Fairhurst, M.T., Keller, P.E., 2013. The importance of integration and top-down salience when listening to complex multi-part musical stimuli. Neuroimage 77, 52–61.
Wagner, A.D., Shannon, B.J., Kahn, I., Buckner, R.L., 2005. Parietal lobe contributions to episodic memory retrieval. Trends Cogn. Sci. 9, 445–453.
Walker, K.M.M., Bizley, J.K., King, A.J., Schnupp, J.W.H., 2011. Cortical encoding of pitch: recent results and open questions. Hear. Res. 271, 74–87.
Zatorre, R.J., 2003. Music and the brain. Ann. N. Y. Acad. Sci. 999, 4–14.
Zatorre, R.J., Mondor, T.A., Evans, A.C., 1999. Auditory attention to space and frequency activates similar cerebral systems. Neuroimage 10, 544–554.
Zatorre, R.J., Chen, J.L., Penhune, V.B., 2007. When the brain plays music: auditory–motor interactions in music perception and production. Nat. Rev. Neurosci. 8, 547–558.