Hemispheric roles in the perception of speech prosody


www.elsevier.com/locate/ynimg NeuroImage 23 (2004) 344 – 357


Jackson Gandour a,*, Yunxia Tong a, Donald Wong b, Thomas Talavage c, Mario Dzemidzic d, Yisheng Xu a, Xiaojian Li e, and Mark Lowe f

a Department of Audiology and Speech Sciences, Purdue University, West Lafayette, IN 47907-2038, USA
b Department of Anatomy and Cell Biology, Indiana University School of Medicine, IN 46202-5120, USA
c School of Electrical and Computer Engineering, Purdue University, IN 47907-2035, USA
d MDZ Consulting Inc., Greenwood, IN 46143, USA
e South China Normal University, Guangzhou, PR China
f Cleveland Clinic Foundation, Cleveland, OH 44195, USA

Received 11 March 2004; revised 2 June 2004; accepted 2 June 2004

Speech prosody is processed in neither a single region nor a specific hemisphere. Instead, it engages multiple areas that form a large-scale, spatially distributed network in both hemispheres. It remains to be elucidated whether hemispheric lateralization is based on higher-level prosodic representations, on lower-level encoding of acoustic cues, or on both. A cross-language (Chinese; English) fMRI study was conducted to examine brain activity elicited by selective attention to Chinese intonation (I) and tone (T). These were presented in three-syllable (I3, T3) and one-syllable (I1, T1) utterance pairs in a speeded-response discrimination paradigm. The Chinese group exhibited greater activity than the English group in a left inferior parietal region across tasks (I1, I3, T1, T3). Only the Chinese group exhibited a leftward asymmetry in inferior parietal and posterior superior temporal (I1, I3, T1, T3), anterior temporal (I1, I3, T1, T3), and frontopolar (I1, I3) regions. Both language groups shared a rightward asymmetry in the mid portions of the superior temporal sulcus and middle frontal gyrus, irrespective of prosodic unit or temporal interval. Hemispheric laterality effects enable us to distinguish brain activity associated with higher-order prosodic representations in the Chinese group from activity associated with lower-level acoustic/auditory processes that are shared among listeners regardless of language experience. Lateralization is influenced by language experience, which shapes the internal prosodic representation of an external auditory signal. We propose that speech prosody perception is mediated primarily by the RH. It becomes left-lateralized to task-dependent regions when language processing is required beyond the auditory analysis of the complex sound.
© 2004 Elsevier Inc. All rights reserved.

Keywords: fMRI; Human auditory processing; Speech perception; Selective attention; Laterality; Language; Prosody; Intonation; Tone; Chinese

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.neuroimage.2004.06.004. * Corresponding author. Department of Audiology and Speech Sciences, Purdue University, 1353 Heavilon Hall, 500 Oval Drive, West Lafayette, IN 47907-2038. Fax: +1-765-494-0771. E-mail address: [email protected] (J. Gandour). Available online on ScienceDirect (www.sciencedirect.com).

1053-8119/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2004.06.004

Introduction
The differential roles of the left (LH) and right (RH) cerebral hemispheres in the processing of prosodic information have received considerable attention over the last several decades. Evidence supporting an RH role in the perception of prosodic units at the phrase and sentence levels has been wide-ranging. It includes dichotic listening (Blumstein and Cooper, 1974; Shipley-Brown et al., 1988), lesion deficit (Baum and Pell, 1999; Brådvik et al., 1991; Pell, 1998; Pell and Baum, 1997; Weintraub et al., 1981), and functional neuroimaging (Gandour et al., 2003; George et al., 1996; Meyer et al., 2003; Plante et al., 2002; Wildgruber et al., 2002) studies. Evidence for LH involvement in the perception of prosodic units at the syllable or word level has also been compelling. It comes from converging dichotic listening (Moen, 1993; Van Lancker and Fromkin, 1973; Wang et al., 2001), lesion deficit (Eng et al., 1996; Gandour and Dardarananda, 1983; Hughes et al., 1983; Yiu and Fok, 1995), and neuroimaging (Gandour et al., 2000, 2003; Hsieh et al., 2001; Klein et al., 2001) studies. The precise mechanisms underlying functional asymmetry for speech prosody remain a matter of debate. Task-dependent hypotheses focus on functional properties of the speech stimuli (e.g., tone vs. intonation) (Van Lancker, 1980). Cue-dependent hypotheses, by contrast, focus on particular physical properties of the acoustic signal (e.g., temporal vs. spectral) (Ivry and Robertson, 1998; Poeppel, 2003; Schwartz and Tallal, 1980; Zatorre and Belin, 2001). Cue-dependent hypotheses predict that speech prosody is right-lateralized. Hemispheric specialization, however, appears to be sensitive to language-specific factors irrespective of the neural mechanisms underlying lower-level auditory processing (Gandour et al., 2002).
The Chinese (Mandarin) language can be exploited to address questions of functional asymmetry in prosodic processing that involves primarily variations in pitch. Chinese has four lexical tones (e.g., ma [tone 1] “mother”, ma [tone 2] “hemp”, ma [tone 3] “horse”, ma [tone 4] “scold”). Tones 1–4 can be described phonetically as high level, high rising, falling rising, and high falling, respectively (Howie, 1976). Tones are manifested at the level of the syllable, the smallest structural unit for carrying prosodic features, on a time scale of 200–350 ms. Intonation, on the other hand, is manifested at the phrase or sentence level, typically on a time scale of seconds.

In Chinese, interrogative intonation exhibits a higher pitch contour than its declarative counterpart (Shen, 1990), as well as a wider pitch range for sentence-final tones (Yuan et al., 2002). In English, interrogative sentences do not have overall higher pitch contours than declarative sentences, nor do they show any effects of tone and intonation interaction in sentence-final position. Chinese interrogative intonation with a final rising tone has a rising end, as in English. Interrogative intonation with a final falling tone, however, often has a falling end (Yuan et al., 2002).

In a previous fMRI study of Chinese tone and intonation (Gandour et al., 2003), both tone and intonation were judged in sentences of fixed length (three words). In that study, lexical tone perception was left-lateralized in comparison to intonation. However, the prosodic unit to which listeners selectively attended and the temporal interval of attentional focus were coterminous. In judgments of lexical tone, attention was focused on the final word only, whereas judgments of intonation required attention to the entire sentence. It is therefore not yet established whether the principal driving force in hemispheric lateralization of speech prosody is the temporal interval of attentional focus or the hierarchical level of linguistic units. The aim of the present study is to determine whether the temporal interval in which prosodic units are presented influences the neural substrates used in prosodic processing. To this end, participants were asked to make perceptual judgments of tone and intonation in one-syllable and three-syllable Chinese utterances.
By comparing activation in homologous regions of both hemispheres, we can assess the extent to which hemispheric laterality for speech prosody is driven by the temporal interval, prosodic unit, or both. Only native Chinese speakers possess implicit knowledge that relates external auditory cues to internal representations of tone and intonation. By employing two language groups, one consisting of Chinese speakers, the other of English speakers, we are able to determine whether activation of particular brain areas is sensitive to language experience.


Materials and methods

Subjects
Ten native speakers of Mandarin (five male, five female) and ten native speakers of American English (five male, five female) were closely matched in age and years of education (Chinese: M = 29/19; English: M = 27/19). All subjects were strongly right-handed (Oldfield, 1971) and exhibited normal hearing sensitivity. All subjects gave informed consent in compliance with a protocol approved by the Institutional Review Board of Indiana University Purdue University Indianapolis and Clarian Health.

Stimuli
Stimuli consisted of 36 pairs of three-syllable Chinese utterances and 44 pairs of one-syllable Chinese utterances. Utterances were designed with two intonation patterns (declarative, interrogative) in combination with the four Chinese tones on the utterance-final syllable (Fig. 1). Focus was held constant on the utterance-final syllable. To minimize lexical-semantic processing, no adjacent syllables in the three-syllable utterances formed bisyllabic words. Tone and intonation each differed in 36% of the pairs for the one-syllable utterances and in 39% of the pairs for the three-syllable utterances. Pairs that were identical in both tone and intonation made up 28% and 22% of the one-syllable and three-syllable pairs, respectively.

Recording procedure
A 52-year-old male native speaker of Mandarin was instructed to read the one- and three-syllable utterances at a conversational speaking rate, in both declarative and interrogative sentence moods. A reading task was chosen so that normal speaking conditions could be simulated as closely as possible while the syntactic, prosodic, and segmental characteristics of the spoken sentences were controlled. To make the three-syllable utterances sound more natural, he was told to treat them as SVO (subject verb object) sentences with non-emphatic stress on the final syllable. All items in the list were typed in Chinese characters. A sufficient pause was provided between items so that the speaker maintained a uniform speaking rate. Controlling the pace of presentation maximized the likelihood of obtaining consistent, natural-sounding productions. To avoid list-reading effects, extra items were placed at the top and bottom of the list. Recordings were made in a double-walled soundproof booth using an AKG C410 headset microphone and a Sony TCD-D8 digital audio tape recorder. The speaker was seated and wore a custom-made headband that kept the microphone 12 cm from the lips.

Prescreening identification procedure
All one- and three-syllable utterances were presented individually in random order to five native speakers of Chinese who were naive to the purposes of the experiment. They were asked to report whether they heard a declarative or an interrogative intonation and to identify the tone on the final syllable. Only stimuli that achieved a perfect (100%) recognition score for both intonation and tone were retained for possible use in the training and experimental sessions.

Task procedure

The experimental paradigm consisted of four active tasks (Table 1) and a passive listening task. The active tasks required discrimination judgments of intonation (I) and tone (T) in paired three-syllable (I3, T3) and one-syllable (I1, T1) utterances. Subjects were instructed to focus their attention on either the utterance-level intonation or the lexical tone of the final syllable. They then made a discrimination judgment and responded by pressing a mouse button (left = same; right = different). The control task involved passive listening to the same utterances, either one-syllable (L1) or three-syllable (L3). Subjects responded by alternately pressing the left and right mouse buttons after each trial.

A scanning sequence consisted of two tasks presented in a blocked format, alternating with rest periods (Fig. 2). The one-syllable and three-syllable utterance blocks contained 11 and 9 trials, respectively. The order of scanning runs, and of trials within blocks, was randomized for each subject. Instructions were delivered to subjects in their native language via headphones during the rest period immediately preceding each task. The instruction was “listen” for passive listening to speech stimuli, “intonation” for same–different judgments on Chinese intonation, and “tone” for same–different judgments on Chinese tone. Average trial duration, including a 2-s response interval, was about 2.9 s for the one-syllable blocks and 3.5 s for the three-syllable blocks.

Fig. 1. Acoustic features of sample Chinese speech stimuli. Broad-band spectrograms (SPG: 0–8 kHz) and voice fundamental frequency contours (F0: 0–400 Hz) are displayed for four kinds of utterance pairs. These are same tone/different intonation in three-syllable utterances (top left), same tone/different intonation in one-syllable utterances (top right), different tone/same intonation in three-syllable utterances (bottom left), and different tone/same intonation in one-syllable utterances (bottom right).

All speech stimuli were digitally edited to have an equal maximum energy level in dB SPL. Auditory stimuli were presented binaurally using a computer playback system (E-Prime) and a pneumatic-based audio system (Avotec). The plastic sound conduction tubes were threaded through tightly occlusive foam eartips inside earmuffs. The earmuffs attenuated the average sound pressure level of the continuous scanner noise by approximately 30 dB. The average intensity of all experimental stimuli was 92 dB SPL, compared with 80 dB SPL for the scanner noise. Accuracy, reaction time, and subjective ratings of task difficulty were used to measure task performance. At the end of the scanning session, listeners rated the difficulty of each task on a 5-point scale (1 = easy, 3 = medium, 5 = hard). Before scanning, subjects were trained to a high level of accuracy using stimuli different from those presented during the scanning runs: I3 (Chinese, 93% correct; English, 88%); I1 (Chinese, 92%; English, 77%); T3 (Chinese, 99%; English, 82%); T1 (Chinese, 99%; English, 85%).

Table 1. Samples of Chinese tone and intonation stimuli for tasks involving one-syllable and three-syllable utterances
Note. I1 (T1) and I3 (T3) represent intonation (tone) tasks in one-syllable and three-syllable utterances, respectively.

Fig. 2. Sequence and timing of conditions in each of the four functional imaging runs. I3 and I1 stand for intonation in three-syllable and one-syllable Chinese utterances, respectively; T3 and T1 stand for tone in three-syllable and one-syllable Chinese utterances, respectively; R = rest interval; L3 and L1 stand for passive listening to three-syllable and one-syllable Chinese utterances, respectively.

Imaging protocol
Scanning was performed on a 1.5-T GE Signa LX Horizon scanner (Waukesha, WI) equipped with birdcage transmit–receive radiofrequency head coils. Each of four 200-volume echo-planar imaging (EPI) series began with a rest interval of 8 baseline volumes (16 s). This was followed by 184 volumes during which the two comparison tasks (32-s blocks) alternated with intervening 16-s rest intervals. Each series ended with a rest interval of 8 baseline volumes (16 s) (Fig. 2). Functional data were acquired using a gradient-echo EPI pulse sequence with the following parameters: repetition time (TR) 2 s; echo time (TE) 50 ms; matrix 64 × 64; flip angle (FA) 90°; field of view (FOV) 24 × 24 cm. Fifteen 7.5-mm-thick contiguous axial slices were used to image the entire cerebrum.
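The timing figures reported above can be cross-checked against each other with a little arithmetic. The Python sketch below rests on our own assumption about run layout, which the text does not spell out: each run is baseline rest, then task blocks separated by single rest periods with no other gaps, then baseline rest. The constant names are illustrative. Under that assumption, the sketch recovers the number of task blocks per run from the 184 middle volumes. It then checks that trials per block multiplied by average trial duration roughly fill a 32-s block.

```python
# Cross-check of block and run timing, using constants taken from the text.
# Assumption (ours): baseline rest | task, rest, task, ..., task | baseline rest,
# with every task block lasting 32 s and every intervening rest 16 s.

TR_S = 2.0             # repetition time (s)
BASELINE_VOLS = 8      # rest volumes at the start and at the end of each run
MIDDLE_VOLS = 184      # volumes between the two baseline rests
TASK_BLOCK_S = 32.0    # one task block (s)
REST_BLOCK_S = 16.0    # one intervening rest period (s)


def n_task_blocks(middle_vols: int) -> int:
    """Solve middle_vols = n * task_vols + (n - 1) * rest_vols for n."""
    task_vols = int(TASK_BLOCK_S / TR_S)   # 16 volumes
    rest_vols = int(REST_BLOCK_S / TR_S)   # 8 volumes
    n, remainder = divmod(middle_vols + rest_vols, task_vols + rest_vols)
    if remainder:
        raise ValueError("volume count does not fit the assumed block pattern")
    return n


blocks = n_task_blocks(MIDDLE_VOLS)              # task blocks per run
total_vols = 2 * BASELINE_VOLS + MIDDLE_VOLS     # volumes per run
run_s = total_vols * TR_S                        # run length (s)

# Trials per block x average trial duration should roughly fill one 32-s block.
one_syllable_block_s = 11 * 2.9
three_syllable_block_s = 9 * 3.5

# If the two alternating tasks get half of the blocks each (also our assumption),
# the active task in one run contains this many trials:
trials_one_syllable = (blocks // 2) * 11
trials_three_syllable = (blocks // 2) * 9

print(blocks, total_vols, run_s)                          # 8 200 400.0
print(round(one_syllable_block_s, 1), three_syllable_block_s)  # 31.9 31.5
print(trials_one_syllable, trials_three_syllable)         # 44 36
```

Under these assumptions, each run contains 8 task blocks (200 volumes, 400 s). Four blocks per task would then give 44 one-syllable and 36 three-syllable trials, matching the numbers of stimulus pairs listed under Stimuli. This is a consistency check, not a statement of the authors' actual run schedule.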
Before the functional imaging runs, high-resolution anatomic images were acquired in 124 contiguous axial slices. These were acquired with a 3D spoiled GRASS (3D SPGR) sequence (slice thickness 1.2–1.3 mm; TR 35 ms; TE 8 ms; 1 excitation; FA 30°; matrix 256 × 128; FOV 24 × 24 cm). They were used for anatomic localization and for coregistration to a standard stereotaxic system (Talairach and Tournoux, 1988). Subjects were scanned with eyes closed and room lights dimmed. Head motion was minimized with a head–neck pad and a dental bite bar.

Imaging analysis
Image analysis was conducted using the AFNI software package (Cox, 1996). All data for a given subject were motion-corrected to the fourth acquired volume of the first functional imaging run. The signal in each voxel was detrended across each functional scan to remove scanner signal drift. It was then normalized to its mean intensity to remove differences in global intensity between runs. For each voxel, the measured fMRI time series of each of the four functional runs was cross-correlated with each of three reference waveforms. The first reference waveform corresponded to one of the four active conditions (I1, I3, T1, T3) presented in a single run (Fig. 2). The second and third reference waveforms corresponded to the two control conditions, L1 and L3, respectively. These were presented during the two runs with the same temporal interval as the intonation and tone conditions (L1 for I1 and T1; L3 for I3 and T3). The resulting EPI volumes were transformed to 1-mm isotropic voxels in Talairach coordinate space (Talairach and Tournoux, 1988). The correlation coefficients were then converted to z scores for the analysis of multisubject fMRI data (Bosch, 2000). They were spatially smoothed with a 5.2-mm FWHM Gaussian filter to account for intersubject variation in brain anatomy and to enhance the signal-to-noise ratio.

Active conditions (I1, I3, T1, T3) were compared directly across runs by computing the average z score for each active condition and for its corresponding control condition. The averaged z scores for the control conditions were then subtracted from those for the corresponding intonation or tone conditions (e.g., ΔzI1 = zI1 − zL1, ΔzI3 = zI3 − zL3). Because each active condition is evaluated against a control with the same temporal interval, active conditions can also be compared across temporal intervals (e.g., ΔzI1 vs. ΔzI3). Within- and between-group random effects maps (I1 vs. L1, T1 vs. L1, I3 vs. L3, T3 vs. L3) were also generated for display purposes. They were obtained by applying voxel-wise ANOVAs to the z values for within-group maps (e.g., Chinese zI1 vs. Chinese zL1) and to the Δz values for between-group maps (e.g., Chinese ΔzI1 vs. English ΔzI1). The individual voxel threshold for between-group maps was set at P = 0.01. For within-group maps, significantly activated voxels (P < 0.001) located within a radius of 7.6 mm were grouped into clusters. The minimum cluster size threshold corresponded to four original-resolution voxels. According to a Monte Carlo simulation (AlphaSim), this clustering procedure yielded a false-positive alpha level of 0.04.

ROI analysis
Nine anatomically constrained spherical regions of interest (ROIs) with a 5-mm radius were examined along with other regions. We chose ROIs that have been implicated in previous studies of phonological processing (Burton, 2001; Hickok and Poeppel, 2000; Hickok et al., 2003), speech perception (Binder et al., 2000; Davis and Johnsrude, 2003; Giraud and Price, 2001; Scott, 2003; Scott and Johnsrude, 2003; Scott et al., 2000; Zatorre et al., 2002), attention (Corbetta, 1998; Corbetta and Shulman, 2002; Corbetta et al., 2000; Shaywitz et al., 2001; Shulman et al., 2002), and working memory (Braver and Bongiolatti, 2002; Chein et al., 2003; D’Esposito et al., 2000; Jonides et al., 1998; Newman et al., 2002; Paulesu et al., 1993; Smith and Jonides, 1999). ROIs were placed symmetrically in nonoverlapping frontal, temporal, and parietal regions of both hemispheres (see Table 2 and Fig. 3). All center coordinates were derived by averaging peak location coordinates reported in previous studies. They were then slightly adjusted so that ROIs would not overlap or cross major anatomical boundaries. Of these coordinates, 26 out of 27 (9 ROIs × 3 coordinates) fell within 1 SD of the mean published values, and 1 (x, mSTS) fell within 2 SD. Similar results were obtained with 7-mm-radius ROIs. We present only the 5-mm-radius results because the larger ROIs would have had to be shifted to avoid crossing anatomical boundaries. The mean Δz (I1, I3, T1, T3) was calculated for each ROI and every subject. These mean Δz values were analyzed using repeated measures mixed-model ANOVAs (SAS®) to compare activation between tasks (I1, T1, I3, T3), hemispheres (LH, RH), and groups (Chinese, English). Tasks and hemispheres were treated as fixed, within-subjects effects; groups were treated as a fixed, between-subjects effect.
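The two computations at the core of the ROI analysis are the voxel-wise control subtraction (Δz = z_active − z_control) and the mean over a 5-mm sphere on the 1-mm Talairach grid. The latter has a volume of about 4/3 · π · 5³ ≈ 524 mm³. A minimal Python sketch follows, assuming z maps stored as dictionaries keyed by integer (x, y, z) coordinates. That container, the function names, and the toy values are hypothetical, not AFNI's data structures or the authors' code. RH centers are obtained by mirroring x, as in the note to Table 2.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]      # (x, y, z) in mm on a 1-mm Talairach grid
ZMap = Dict[Coord, float]         # hypothetical container: z score per voxel


def sphere(center: Coord, radius_mm: float = 5.0) -> List[Coord]:
    """Grid points whose distance from center is at most radius_mm."""
    cx, cy, cz = center
    r = int(math.floor(radius_mm))
    return [
        (cx + dx, cy + dy, cz + dz)
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
        for dz in range(-r, r + 1)
        if dx * dx + dy * dy + dz * dz <= radius_mm ** 2
    ]


def mirror_to_rh(center: Coord) -> Coord:
    """Reflect a left-hemisphere center across the midline (x -> -x)."""
    x, y, z = center
    return (-x, y, z)


def delta_z(active: ZMap, control: ZMap) -> ZMap:
    """Voxel-wise control subtraction, e.g. dz_I1 = z_I1 - z_L1."""
    return {v: active[v] - control[v] for v in active.keys() & control.keys()}


def roi_mean(dz: ZMap, center: Coord, radius_mm: float = 5.0) -> float:
    """Mean delta-z over the voxels of a spherical ROI present in the map."""
    values = [dz[v] for v in sphere(center, radius_mm) if v in dz]
    if not values:
        raise ValueError("ROI does not overlap the map")
    return sum(values) / len(values)


# Toy usage: a hypothetical LH center and its RH mirror image.
lh = (-40, -20, 10)
rh = mirror_to_rh(lh)                                   # (40, -20, 10)
voxels = sphere(lh) + sphere(rh)
z_active = {v: (2.0 if v[0] < 0 else 1.0) for v in voxels}   # toy values
z_control = {v: 0.5 for v in voxels}
dz = delta_z(z_active, z_control)
print(roi_mean(dz, lh), roi_mean(dz, rh))               # 1.5 0.5
```

In the study, one such ROI mean per subject, task, and hemisphere feeds the mixed-model ANOVA. The sketch stops at producing those per-ROI means.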
Subjects were nested within groups as a random effect. It may seem reasonable to use stimulus length as a separate factor in the ANOVA, with one-syllable and three-syllable as its two levels. However, as pointed out in the Introduction, each stimulus contained three syllables in both the I3 and T3 tasks, yet T3 differed from I3 in its attentional demands. In T3, participants had to attend to the last syllable only, whereas in I3 they had to attend to all three syllables. Treating stimulus length as a separate factor would therefore have confounded length (1, 3) with prosodic unit (I, T).
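The cluster rule used for the within-group maps can be sketched briefly. Suprathreshold voxels within 7.6 mm of one another are grouped, and clusters smaller than four original-resolution voxels are discarded. The original voxel size is our own inference from the protocol (24-cm FOV over a 64 × 64 matrix, 7.5-mm slices), giving 3.75 × 3.75 × 7.5 mm. Four such voxels amount to 421.875 mm³, or about 422 voxels on the resampled 1-mm grid. The union-find, single-linkage grouping below is a generic stand-in and not AFNI's implementation. The function names and toy voxel sets are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Voxel = Tuple[float, float, float]   # voxel center in mm

# Original EPI voxel inferred from the protocol: 24-cm FOV over a 64 x 64
# matrix and 7.5-mm slices.
IN_PLANE_MM = 240.0 / 64                            # 3.75 mm
ORIG_VOXEL_MM3 = IN_PLANE_MM * IN_PLANE_MM * 7.5    # 105.46875 mm^3
MIN_CLUSTER_MM3 = 4 * ORIG_VOXEL_MM3                # 421.875 mm^3


def radius_clusters(voxels: List[Voxel], radius_mm: float) -> List[List[Voxel]]:
    """Single-linkage grouping with union-find: two voxels share a cluster if a
    chain of voxels links them with every step no longer than radius_mm."""
    parent = list(range(len(voxels)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(voxels)):
        for j in range(i + 1, len(voxels)):
            if math.dist(voxels[i], voxels[j]) <= radius_mm:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Voxel]] = {}
    for i, v in enumerate(voxels):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())


def surviving_clusters(voxels: List[Voxel], radius_mm: float = 7.6,
                       voxel_mm3: float = 1.0,
                       min_mm3: float = MIN_CLUSTER_MM3) -> List[List[Voxel]]:
    """Keep only clusters whose volume reaches the minimum cluster size."""
    return [c for c in radius_clusters(voxels, radius_mm)
            if len(c) * voxel_mm3 >= min_mm3]


# Toy example on a 1-mm grid: an 8 x 8 x 8 block (512 mm^3) survives,
# while a distant 3 x 3 x 3 block (27 mm^3) is discarded.
big = [(float(x), float(y), float(z))
       for x in range(8) for y in range(8) for z in range(8)]
small = [(100.0 + x, 100.0 + y, 100.0 + z)
         for x in range(3) for y in range(3) for z in range(3)]
kept = surviving_clusters(big + small)
print(len(kept), len(kept[0]))   # 1 512
```

The pairwise loop is quadratic in the number of suprathreshold voxels. This is fine for an illustration, but a real pipeline would use a spatial index or grid-neighbour search instead.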

Results

Behavioral performance
Behavioral measures of task performance by the Chinese and English groups are given in Table 3. A repeated measures ANOVA was conducted with Group as a between-subjects factor (Chinese, English) and Task as a within-subjects factor (I1, I3, T1, T3). Results revealed significant task × group interactions for self-ratings of task difficulty [F(3,54) = 3.14, P = 0.0325], accuracy [F(3,54) = 18.33, P < 0.0001], and reaction time (RT) [F(3,54) = 8.68, P < 0.0001]. Tests of simple main effects showed three between-group differences. First, the tone tasks were judged to be easier by Chinese than by English listeners (T1, P < 0.0001; T3, P = 0.0004). Second, Chinese listeners judged all tasks at a higher level of accuracy than English listeners (P < 0.01). Third, RTs were longer for English than for Chinese listeners when making tonal judgments (T1, P = 0.0281; T3, P = 0.007). Regardless of language background, intonation judgments took longer in the one-syllable (I1) than in the three-syllable (I3) utterances (Chinese, P = 0.0003; English, P = 0.0421). In the Chinese group, I1 was judged to be more difficult than T1 (P = 0.0003), and more errors were made in I1 than in T1 (P = 0.0001). RTs were also longer in I1 than in T1 (P < 0.0001), and in I3 than in T3 (P = 0.0339). In contrast, the English group achieved a higher level of accuracy in I3 than in any of the other three tasks (P < 0.01).

Between group comparisons
ROI-based ANOVAs revealed that the Chinese group exhibited significantly (P < 0.001) greater activity, as measured by Δz, in the left IPL than the English group, regardless of task (I1, I3, T1, T3) (Figs. 4f and 5; Table 4). No other ROI in either the LH or RH elicited significantly more activity in the Chinese group than in the English group. In contrast, the English group showed significantly greater bilateral or right-sided activity in frontal, parietal, and temporal ROIs relative to the Chinese group (Fig. 6; Table 4). In the frontal lobe, all four ROIs (Figs. 4a–d) were more active bilaterally for the tone tasks (T1, T3). In the parietal lobe, IPS (Fig. 4e) activity was greater in both the LH and RH for T1. In the temporal lobe, the pSTG (Fig. 4i) was more active bilaterally across tasks (I1, I3, T1, T3). Greater activity in the aSTG (Fig. 4g) was observed across tasks in the RH only.

Table 2
Center coordinates and extents of 5-mm spherical ROIs

Region     BA      x     y     z     Description
Frontal
  aMFG     10      ±32   +50   +4
  mMFG     46/9    ±45   +32   +22
  pMFG     9/6     ±44   +10   +33
  FO       45/13   ±37   +25   +14   Centered deep within the frontal operculum of the inferior frontal gyrus, extending dorsally to the lower bank of the inferior frontal sulcus and ventrally to the bordering edge of the anterior insula
Parietal
  IPS      40/7    ±32   −48   +43   Centered in and confined to the intraparietal sulcus
  IPL      40      ±50   −31   +28   Centered in anteroventral aspects of the supramarginal gyrus, extending ventrally into the bordering edge of the Sylvian fissure
Temporal
  aSTG     38      ±55   +9    −8    Centered in the temporal pole and wholly confined to the STG; posterior border (y = +5) was about 20 mm anterior to the medial end of the first transverse temporal sulcus (TTS)
  mSTS     22      ±49   −20   −3    Centered in the STS, encompassing both the upper and lower banks of the STS; anterior border (y = −16) was contiguous with the medial border of the TTS
  pSTG     22      ±56   −38   +12   Centered in the STG, extending ventrally into the STS; anterior border (y = −35) was about 20 mm posterior to the medial border of the TTS

Notes. Stereotaxic coordinates (mm) are derived from the human brain atlas of Talairach and Tournoux (1988). a, anterior; m, middle; p, posterior; FO, frontal operculum; MFG, middle frontal gyrus; IPS, intraparietal sulcus; IPL, inferior parietal lobule; STG, superior temporal gyrus; STS, superior temporal sulcus. Right hemisphere ROIs were generated by reflecting the left hemisphere location across the midline.

Fig. 3. Location of fixed spherical ROIs in frontal (open circle), parietal (checkered circle), and temporal (barred circle) regions. They are displayed in left sagittal sections (top and middle panels) and on the lateral surface of both hemispheres (bottom panels). LH = left hemisphere; RH = right hemisphere. Stereotactic x coordinates that appear in the top and middle panels are derived from the human brain atlas of Talairach and Tournoux (1988). See also Table 2.

Within group comparisons

Table 3
Behavioral performance and self-ratings of task difficulty

Language group   Task   Accuracy (%)   Reaction time (ms)   Difficulty (a)
Chinese          I1     91.2 (1.5)     682 (48)             3.3 (0.4)
                 T1     97.3 (0.9)     504 (41)             1.4 (0.16)
                 I3     93.9 (1.3)     559 (36)             2.7 (0.3)
                 T3     96.9 (1.2)     485 (26)             1.8 (0.25)
English          I1     76.8 (2.5)     668 (49)             4.0 (0.30)
                 T1     70.9 (2.9)     642 (53)             3.3 (0.37)
                 I3     85.3 (2.4)     565 (40)             3.1 (0.28)
                 T3     72.2 (3.0)     656 (47)             3.4 (0.27)

Note. Values are expressed as mean and standard error (in parentheses). See also note in Table 1.
(a) Self-ratings of task difficulty on a scale from 1 to 5 (1 = easy; 3 = medium; 5 = hard).

Hemisphere effects for the Chinese group revealed complementary leftward and rightward asymmetries, as measured by Δz, depending on ROI and task (Table 5). Laterality differences favored the LH in the frontal aMFG (Figs. 4a and 7, upper panel) for the intonation tasks only, irrespective of temporal interval (I1, I3). In the parietal lobe, significantly more activity was observed in the left IPL (Figs. 4f and 5) across tasks, and in the left IPS (Figs. 4e and 7, lower panel) for T3 (cf. Gandour et al., 2003). In the temporal lobe, activity was greater in the left pSTG (Figs. 4i and 8) and aSTG (Fig. 4g) across tasks, regardless of temporal interval. In contrast, laterality differences favored the RH in the frontal mMFG (Fig. 4b) and temporal mSTS (Figs. 4h and 8) across tasks.

Hemisphere effects for the English group were restricted to frontal and temporal ROIs in the RH (Table 5). Rightward asymmetries were observed in the frontal mMFG (Fig. 4b) and temporal mSTS (Fig. 4h) across tasks. These asymmetries favoring the RH were identical to those of the Chinese group. No significant leftward asymmetries were observed for any task in any ROI.

Task effects for the Chinese group revealed laterality differences, as measured by Δz, related to the prosodic unit. Intonation (I1, I3), when compared to tone (T1, T3), favored the LH in the aMFG (Figs. 4a and 7). In the pMFG (Fig. 4c), I3 was greater than T3 in the RH, and I1 was greater than T1 in both hemispheres. For both groups, a cluster analysis revealed significant (P < 0.001) activation in the supplementary motor area across tasks. The Chinese group showed predominantly right-sided activation in the lateral cerebellum across tasks. In the caudate and thalamus, the Chinese group showed increased activation for the intonation tasks only (I1, I3), whereas the English group showed it across tasks.

Fig. 4. Comparison of mean Δz scores between language groups (Chinese, English) per task (I1, T1, I3, T3) and hemisphere (LH, RH) within each ROI. Frontal lobe, a–d; parietal, e–f; temporal, g–i. I1 is measured by ΔzI1; T1 by ΔzT1; I3 by ΔzI3; T3 by ΔzT3. Error bars represent ±1 SE.

Fig. 5. A random effects fMRI activation map obtained by comparing discrimination judgments of intonation in one-syllable utterances (I1) with passive listening to the same stimuli (L1) between the two language groups (ΔzI1 Chinese vs. ΔzI1 English). Left/right sagittal sections through stereotaxic space are superimposed onto a representative brain anatomy. The Chinese group shows increased activation in the left IPL compared with the English group. The activation is centered in ventral aspects of the supramarginal gyrus and extends into the bordering edge of the Sylvian fissure. Similar activation foci in the IPL are also observed in the I3 vs. L3, T1 vs. L1, and T3 vs. L3 comparisons. See also Fig. 4.
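The within-group hemisphere effects described above compare LH and RH ROI means for the same ten subjects, and Table 5 reports them as F(1, 9) statistics. A simplified stand-in for one such contrast is a paired t test on per-subject LH − RH differences. For a two-level within-subject factor tested on its own, F(1, n − 1) equals the square of the paired t. This Python sketch works under that simplifying assumption only. The paper's tests came from mixed-model ANOVAs in SAS, whose error terms may differ. The function name and the toy numbers are hypothetical, not data from this study.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def hemisphere_effect(lh: Sequence[float], rh: Sequence[float]) -> Tuple[float, float]:
    """Paired LH-vs-RH comparison of per-subject ROI mean delta-z values.

    Returns (t, F): t is the paired t statistic with n - 1 df (positive when
    LH > RH), and F = t**2 is the equivalent F(1, n - 1) for a two-level
    within-subject factor.
    """
    if len(lh) != len(rh) or len(lh) < 2:
        raise ValueError("need paired values from at least two subjects")
    diffs = [a - b for a, b in zip(lh, rh)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, t * t


# Hypothetical values for three subjects (not data from this study).
t, f = hemisphere_effect([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(round(t, 4), round(f, 4))   # 3.4641 12.0
```

A positive t corresponds to a leftward asymmetry (LH > RH) and a negative t to a rightward one. The sign therefore maps onto the LH > RH and RH > LH rows of Table 5.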

Discussion

Hemispheric roles in speech prosody
The major findings of this study show that Chinese tone and intonation are best thought of as a mosaic of multiple local asymmetries. This view allows different regions to be differentially weighted in laterality depending on language-, modality-, and task-related features (Ide et al., 1999). Earlier hypotheses that focus on hemispheric function capture only part of the phenomenon. Not all aspects of speech prosody are lateralized to the RH. Cross-language differences in the laterality of particular brain regions depend on a listener’s implicit knowledge of how external stimulus features (acoustic/auditory) relate to internal conceptual representations (linguistic/prosodic). All frontal, temporal, and parietal regions that are lateralized to the LH, whether for all tasks or for a subset, are found in the Chinese group only (Fig. 9). Conversely, the two regions in the temporal and frontal lobes that are lateralized to the RH are found in both language groups. We infer that LH laterality reflects higher-order processing of internal representations of Chinese tone and intonation. RH laterality, by contrast, reflects lower-order processing of complex auditory stimuli.

Previous models of speech prosody processing in the brain have focused on either linguistics or acoustics as the driving force underlying hemispheric lateralization. In this study, tone and intonation are both lateralized to the LH in the Chinese group. Despite their functional differences from a linguistic perspective, they recruit shared neural mechanisms in frontal, temporal, and parietal regions of the LH. The finding that intonation is lateralized to the LH cannot be accounted for by a model claiming that “suprasegmental sentence level information of speech comprehension is subserved by the RH” (Friederici and Alter, 2004, p. 268). Nor can it be explained by a hypothesis based on the size of the temporal integration window (short → LH; long → RH) (Poeppel, 2003). Although both intonation and tone meet Poeppel's criteria for a long temporal integration window, they are lateralized to the LH rather than the RH. Hemispheric roles should not be viewed as derived from either acoustics or linguistics independently. We propose instead that linguistics, acoustics, and task demands (Plante et al., 2002) are all necessary ingredients of a neurobiological model of speech prosody. This model relies on dynamic interactions between the two hemispheres.

Table 4
Group effects per task and hemisphere from statistical analyses on mean Δz within each spherical ROI
[Table body not recoverable from the extracted text. Rows are group contrast (C > E, E > C) × hemisphere (LH, RH) × task (I1, T1, I3, T3). Columns are the frontal (aMFG, mMFG, pMFG, FO), parietal (IPS, IPL), and temporal (aSTG, mSTS, pSTG) ROIs.]
Note. C = Chinese group; E = English group; Hemi = hemisphere; LH = left hemisphere; RH = right hemisphere. *F(1, 18), P < 0.05; **F(1, 18), P < 0.01; ***F(1, 18), P < 0.001. See also notes to Tables 1 and 2.

Fig. 6. Random effects fMRI activation map obtained by comparing discrimination judgments of tone in one-syllable utterances (T1) with passive listening to the same stimuli (L1) between the two language groups (ΔzT1 English vs. ΔzT1 Chinese). An axial section reveals increased bilateral activation in frontal and parietal regions, as well as in the supplementary motor area, for the English group relative to the Chinese group. Similar activation foci are also observed in the T3 vs. L3 comparison. See also Fig. 4.
J. Gandour et al. / NeuroImage 23 (2004) 344–357

Whereas the RH is engaged in pitch processing of complex auditory signals, including speech, we speculate that the LH is recruited to process categorical information to support phonological processing, or even syntactic and semantic processing (cf. Friederici and Alter, 2004). With respect to task demands, I1 elicits greater activation than T1 in the left aMFG and bilaterally in the pMFG. These differences cannot be explained by “prosodic frame length” (Dogil et al., 2002), since tone and intonation are presented in an identical temporal context (one syllable). Nor can they be explained by a model that claims that segmental, lexical (i.e., tone), and syntactic information is processed in the LH, and suprasegmental sentence-level information (i.e., intonation) in the RH (Friederici and Alter, 2004). Rather, they most likely reflect task demands related to retrieval of internal representations associated with tone and intonation.

Table 5. Within-group hemisphere effects per task (I1, T1, I3, T3) from statistical analyses on mean Δz within each spherical ROI, for the Chinese (C) and English (E) groups, tested in both directions (LH > RH; RH > LH). ROIs: frontal (aMFG, mMFG, pMFG, FO), parietal (IPS, IPL), and temporal (aSTG, mSTS, pSTG).
Note. *F(1, 9), P < 0.05; **F(1, 9), P < 0.01; +Tukey-adjusted t(9), P < 0.05; ++Tukey-adjusted t(9), P < 0.01. See also notes to Tables 2 and 4.

Functional heterogeneity within a spatially distributed network

Frontal lobe

Activation in the frontopolar cortex (BA 10) was bilateral across all tasks for English listeners, but predominantly left-sided in the intonation tasks (I1, I3) for Chinese listeners (Table 5). The frontopolar region has extensive interconnections with auditory regions of the superior temporal gyrus (Petrides and Pandya, 1984). Bilateral activation of frontopolar cortex has also been reported in a verbal working memory paradigm with a competing articulatory suppression task (Gruber, 2001). Its functional role is inferred to be that of integrating working memory with the allocation of attentional resources (Koechlin et al., 1999), or of applying greater effort in memory retrieval (Buckner et al., 1996; Schacter et al., 1996). These cross-language differences in frontopolar activation are likely to result from the linguistic function of suprasegmental information in Chinese and English. As measured by RT and accuracy, Chinese listeners take longer and are less accurate in judging intonation than tone. The relatively greater difficulty of intonation judgments presumably reflects the fact that in Chinese every syllable obligatorily carries a tonal contour. Because of this syllable-by-syllable processing, tones are likely to be processed before intonation. By comparison, intonation contours play a minor role in signaling differences in sentence mood. The unmarked (i.e., lacking a sentence-final particle) yes–no interrogatives used in this study are known to carry a light functional load (Shen, 1990).
In the present study, subjects were required to keep the tone or intonation information of the first stimulus in a pair in working memory while concurrently identifying the tone or intonation of the second stimulus. Because tone and intonation differ in function for Chinese listeners, the intonation judgment of the second stimulus competes for more attentional resources and demands greater effort in retrieving the intonation of the first stimulus from memory. This process presumably elicits greater activity in the left frontopolar region for intonation tasks in Chinese listeners. English listeners, on the other hand, employ a different processing strategy regardless of linguistic function. Without prior knowledge of the Chinese language, retrieving auditory information from working memory and making discrimination judgments is presumed to be equally difficult for tone and intonation, resulting in bilateral activation of frontopolar cortex for all tasks.

Dorsolateral prefrontal cortex, including BA 46 and BA 9, is involved in controlling the attentional demands of tasks and maintaining information in working memory (Corbetta and Shulman, 2002; Knight et al., 1999; MacDonald et al., 2000; Mesulam, 1981). The rightward asymmetry in the mMFG (BA 46) observed in all tasks (I1, I3, T1, T3) in both language groups (Table 5) points to a stage of processing that involves auditory attention and working memory. Functional neuroimaging data reveal that auditory selective attention tasks elicit increased activity in right dorsolateral prefrontal cortex (Zatorre et al., 1999). In the music domain, perceptual analysis and short-term maintenance of the pitch information underlying melodies recruit neural systems within right prefrontal and temporal cortex (Zatorre et al., 1994). In this study, activation of the prefrontal mMFG and the temporal mSTS is similarly lateralized to the RH across tasks in both language groups. These data are consistent with the idea that the right dorsolateral prefrontal area (BA 46/9) plays a role in auditory attention that modulates pitch perception in sensory representations beyond the lateral belt of the auditory cortex, and actively retains pitch information in auditory working memory (cf. Plante et al., 2002).
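Within-group hemisphere effects such as the RH > LH asymmetry in the mMFG rest on comparing mean Δz in homologous left and right ROIs. The following is a simplified, hypothetical sketch and not the authors' pipeline. It assumes that Δz is the per-subject difference between task and passive-listening z scores within a spherical ROI. It uses made-up values for ten subjects, and it replaces the reported ANOVA and Tukey procedure with a paired t test on LH vs. RH Δz. With two within-subject levels, F(1, n − 1) equals t², so ten subjects per group would be consistent with the F(1, 9) degrees of freedom in the Table 5 note. The names `delta_z` and `hemisphere_effect` are invented for illustration.

```python
import math
from statistics import mean, stdev


def delta_z(z_task, z_listen):
    """Per-subject delta z: task z score minus passive-listening z score (assumed definition)."""
    if len(z_task) != len(z_listen):
        raise ValueError("task and listening arrays need one value per subject")
    return [t - l for t, l in zip(z_task, z_listen)]


def hemisphere_effect(dz_lh, dz_rh):
    """Paired t test of LH vs. RH delta z for one group, ROI, and task.

    Returns (t, F, df). A negative t means RH > LH. With two within-subject
    levels, the repeated-measures F(1, df) equals t squared.
    """
    if len(dz_lh) != len(dz_rh) or len(dz_lh) < 2:
        raise ValueError("need paired LH/RH values for at least two subjects")
    diffs = [l - r for l, r in zip(dz_lh, dz_rh)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("LH-RH differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, t * t, n - 1


# Hypothetical z scores for ten subjects in one ROI (e.g., mMFG) and one task.
z_listen_lh = [0.1] * 10
z_task_lh = [0.3, 0.4, 0.2, 0.3, 0.4, 0.2, 0.3, 0.4, 0.2, 0.3]
z_listen_rh = [0.1] * 10
z_task_rh = [0.6] * 10

t, f, df = hemisphere_effect(delta_z(z_task_lh, z_listen_lh),
                             delta_z(z_task_rh, z_listen_rh))
print(f"t({df}) = {t:.2f}, F(1, {df}) = {f:.2f}")  # negative t indicates RH > LH
```

In a full analysis this comparison would be repeated for each group, ROI, and task, with a correction for multiple comparisons. The paper reports Tukey-adjusted tests for its task-specific effects.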
In the speech domain, too, this frontotemporal network in the RH serves to maintain pitch information regardless of its linguistic relevance. A frontotemporal network for auditory short-term memory is further supported by findings in epileptic patients, who show significant deficits in retention of tonal information after unilateral excision of the right frontal or temporal regions (Zatorre and Samson, 1991). In nonhuman primates, a processing stream for sound-object identification has been proposed that projects anteriorly along the lateral temporal cortex (Rauschecker and Tian, 2000), leading to the lateral prefrontal cortex (Hackett et al., 1999; Romanski et al., 1999a,b). A similar anterior processing stream destined for the lateral prefrontal cortex in humans presumably underlies a frontotemporal network, at least in the RH, for low-level auditory processing of complex pitch information.

Fig. 7. Random effects fMRI activation maps obtained from comparison of discrimination judgments of intonation (I3; upper panel) and tone (T3; bottom panel) in three-syllable utterances relative to passive listening to the same stimuli (L3) for the Chinese group (zI3 vs. zL3; zT3 vs. zL3). In I3 vs. L3 and I1 vs. L1 (not shown), increased activity in frontopolar cortex (aMFG) shows a leftward asymmetry (upper panel; x = −35), whereas activation of the middle (mMFG) region of dorsolateral prefrontal cortex shows the opposite laterality effect (upper panel; x = +35, +40, +45). In T3 vs. L3, IPS activity is predominant in the LH (bottom panel; x = −35, −40, −45). In I3 (upper panel; x = +35, +40, +45) vs. T3 (lower panel; x = +35, +40, +45), activation of the right pMFG is greater in the I3 than in the T3 task. See also Fig. 4.

Intonation elicited greater activity than tone in the pMFG (BA 9), bilaterally in the one-syllable condition and in the RH only in the three-syllable condition (Fig. 4c). The finding that I3 elicited greater activity than T3 in the right pMFG replicates Gandour et al. (2003). One possible explanation focuses on the prosodic units themselves: tones are processed in the LH, intonation predominantly in the RH. However, this account is untenable because I1 elicits greater activation than T1 bilaterally. Moreover, intonation (I1, I3) and tone (T1, T3) tasks separately elicit no hemispheric laterality effects in the pMFG. Another possible explanation concerns the temporal interval. One might argue that the difference between I3 and T3 is due to the time interval of focused attention for the prosodic unit: I3, three syllables; T3, last syllable only. On this view, shorter prosodic frames are processed in the LH, longer frames in the RH. This alternative account of pMFG activity is also ruled out because I1 elicits hemispheric laterality effects similar to those of I3. Instead, differential pMFG activity in direct comparisons between intonation and tone is most likely related to task demands (cf. Plante et al., 2002). As measured by RT and self-ratings of task difficulty, intonation tasks are more difficult than tone tasks for Chinese listeners (Table 3). Equally significant is the fact that the English group shows greater activation for tonal processing (T1, T3) than the Chinese group in the pMFG bilaterally (Table 4). Together, these findings are consistent with the idea that the pMFG coordinates the attentional resources required by the task.
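The elimination argument for the pMFG can be restated as a small consistency check. This is a hypothetical sketch. The task labels, the attended spans (I3: three syllables; T3: last syllable only), and the outcome of no pMFG hemispheric laterality in any task come from the discussion of the Chinese group. The three functions are simplified encodings of the competing accounts, and activation magnitudes (for example, I1 > T1) are not modeled.

```python
# Observed pMFG outcome (Chinese group): no hemispheric laterality effect in any task.
OBSERVED_PMFG_LATERALITY = {"I1": "none", "T1": "none", "I3": "none", "T3": "none"}

# Attended span per task, in syllables: I3 spans three syllables; T3 targets the last one.
ATTENDED_SYLLABLES = {"I1": 1, "T1": 1, "I3": 3, "T3": 1}


def prosodic_unit_account(task: str) -> str:
    """Tone -> LH, intonation -> RH."""
    return "RH" if task.startswith("I") else "LH"


def temporal_interval_account(task: str) -> str:
    """Short prosodic frames -> LH, longer frames -> RH."""
    return "RH" if ATTENDED_SYLLABLES[task] > 1 else "LH"


def task_demand_account(task: str) -> str:
    """Activity scales with difficulty in both hemispheres, so no asymmetry is predicted."""
    return "none"


def consistent_with_observations(account) -> bool:
    """True if the account's predicted laterality matches the observation for every task."""
    return all(account(task) == observed
               for task, observed in OBSERVED_PMFG_LATERALITY.items())


for account in (prosodic_unit_account, temporal_interval_account, task_demand_account):
    print(f"{account.__name__}: consistent={consistent_with_observations(account)}")
```

Only the task-demand account survives this check. The other two predict an asymmetry that never appears in the pMFG.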


Fig. 8. A random effects fMRI activation map obtained from comparison of discrimination judgments of intonation in one-syllable utterances (I1) relative to passive listening to the same stimuli (L1) for the Chinese group (zI1 vs. zL1). Left/right sagittal sections reveal increased mSTS activity in the RH, projecting both ventrally and dorsally into the MTG and STG, respectively. pSTG activity shows the opposite hemispheric effect, part of a continuous swath of activation extending caudally from middle regions of the STG/STS. Similar activation foci are also observed in T1 vs. L1, I3 vs. L3, and T3 vs. L3. See also Fig. 4.

The fronto-opercular region (FO, BA 45/13) is activated bilaterally in both language groups (Table 5), with similar activation levels across tasks (I1, I3, T1, T3). Recent neuroimaging studies (Meyer et al., 2002, 2003) also show bilateral FO activation in a prosodic speech condition in which a speech utterance is reduced to speech melody by removal of all lexical and syntactic information. Increased FO activity is presumed to reflect increased effort in extracting syntactic, lexical-semantic, or slow pitch information from degraded speech signals (Meyer et al., 2002, 2003), or in discriminating sequences of melodic pitch patterns (Zatorre et al., 1994). Similarly, our tasks require increased cognitive effort to extract tone and intonation from the auditory stream and to maintain this information in working memory.

Parietal lobe

There appear to be at least two distinct regions of activation in the parietal cortex: one located more superiorly (IPS), in the intraparietal sulcus and adjacent aspects of the superior parietal lobule, and another more inferiorly (IPL), in the anterior supramarginal gyrus (SMG) near the parietotemporal boundary (cf. Becker et al., 1999). Our findings show greater activation in the IPS bilaterally in T1 for the English group compared to the Chinese group (Table 4). It has been proposed that this area supports voluntary focusing and shifting of attentional scanning across activated memory representations (Chein et al., 2003; Corbetta and Shulman, 2002; Corbetta et al., 2000; Cowan, 1995; Mazoyer et al., 2002). The efficacy of selective attention depends on how external stimuli are encoded into internal phonological representations. English listeners experienced more difficulty in focusing and shifting attention in T1 because lexically relevant pitch variations do not occur in English monosyllables. In contrast, the Chinese group shows left-sided activity in the IPS for T3 (Table 5). This finding replicates our previous study of Chinese tone and intonation (Gandour et al., 2003), reinforcing the view that a left frontoparietal network is recruited for the processing of lexical tones (Li et al., 2003). In T1, listeners extract tone from isolated monosyllables. In T3, they extract tone from a fixed position in a sequence of syllables, which causes repeated shifts in attention from one item to another. These laterality differences between T3 and T1 indicate that selective attention to discrete linguistic constructs is a gradient neurophysiological phenomenon in the context of task-specific demands.

The Chinese group, as compared to the English group, shows greater activation across tasks (I1, I3, T1, T3) in the left ventral aspects of the IPL (BA 40) near the parietotemporal boundary (Table 4). Within the Chinese group, relatively greater IPL activation on the left is observed across tasks, without regard to prosodic unit (I, T) or temporal interval (1, 3). Perhaps it is the “categoricalness” or phonological significance of the auditory stimuli that triggers activation in this area (Jacquemot et al., 2003). This language-specific effect can be understood by conceptualizing the IPL as part of an auditory-motor integration circuit in speech perception (Hickok and Poeppel, 2000; Wise et al., 2001). Chinese listeners possess articulatory-based representations of Chinese tones and intonation; English listeners do not. Consequently, no activation of this area is observed in the English group. In the Chinese group, LH activity in the IPL co-occurs with a leftward asymmetry in the pSTG across tasks. This co-activation reinforces the view that the IPL is part of an auditory–articulatory processing stream that connects posterior temporal and inferior prefrontal regions. An alternative conceptualization is that the phonological storage component of verbal working memory resides in the IPL (Awh et al., 1996; Paulesu et al., 1993). This notion predicts that both passive listening and verbal working memory tasks should elicit activation in this region, since auditory verbal information has obligatory access to the store (Chein et al., 2003). Because I1, I3, T1, and T3 were all derived by subtracting their corresponding passive listening control conditions, the storage account would predict no increased IPL activation in these contrasts, contrary to what we observe.

Fig. 9. Laterality effects for ROIs in the Chinese group only, and in both Chinese and English groups, rendered on a three-dimensional LH template for common reference. In the Chinese group (top panel), IPL, aSTG, and pSTG are left-lateralized (LH > RH) across tasks; aMFG (I1, I3) and IPS (T3) are left-lateralized for specific tasks. In both language groups (bottom panel), mMFG and mSTS are right-lateralized (RH > LH) across tasks (bottom right panel). Other ROIs do not show laterality effects. No ROI elicited either a rightward asymmetry for the Chinese group only or a leftward asymmetry for both Chinese and English groups. See also Table 5.

Temporal lobe

The anterior superior temporal gyrus (aSTG) displays an LH advantage in the Chinese group across tasks (Table 5). Reduced RH activation, rather than increased LH activation, appears to underlie this aSTG asymmetry across all tasks. Since intelligible speech is used in all tasks, phonological input alone may be sufficient to explain the leftward asymmetry in the Chinese group (Scott and Johnsrude, 2003; Scott et al., 2000). It is also consistent with the notion that this region maps acoustic–phonetic cues onto linguistic representations as part of a larger auditory-semantic integration circuit in speech perception (Giraud and Price, 2001; Scott and Johnsrude, 2003; Scott et al., 2003). In contrast, English listeners have no knowledge of these prosodic representations. Consequently, they employ a nonlinguistic pitch processing strategy across tasks and fail to show any hemispheric asymmetry.

No language group effect is found in the hemispheric laterality of the mSTS (BA 22/21): both groups show greater RH activity in the mSTS across tasks (Table 5).
This suggests that this area is sensitive to different acoustic features of the speech signal irrespective of language experience. The rightward asymmetry may reflect shared mechanisms underlying early attentional modulation in the processing of complex pitch patterns. In this study, subjects were required to direct their attention to slow modulations of pitch patterns (i.e., ~300–1000 ms) underlying either Chinese tone or intonation. This interpretation is consistent with the hemispheric roles hypothesized for auditory processing of complex sounds in the temporal lobe: RH for spectral processing, LH for temporal processing (Poeppel, 2003; Zatorre and Belin, 2001; Zatorre et al., 2002). It is also consistent with the view that right auditory cortex is most important in the processing of dynamic pitch variation (Johnsrude et al., 2000). Because both groups show greater activation in the right mSTS, we infer that this activity reflects a complex aspect of pitch processing that is independent of language experience. A leftward asymmetry in activation of the posterior part of the superior temporal gyrus (pSTG; BA 22) across tasks is observed in the Chinese group only (Table 5). It has been suggested that the left pSTG, as part of a posterior processing stream, is involved in prelexical processing of phonetic cues and features (Scott, 2003; Scott and Johnsrude, 2003; Scott and Wise, 2003). English listeners, however, show no leftward asymmetry in the pSTG (Table 5); moreover, they show greater activation bilaterally relative to the Chinese group (Table 4). Therefore, auditory phonetic cues that are of phonological significance in one’s native language may be primarily responsible for this leftward asymmetry.


These findings collectively support the functional segregation of temporal lobe regions and their functional integration as part of a temporofrontal network (Davis and Johnsrude, 2003; Scott, 2003; Scott and Johnsrude, 2003; Specht and Reul, 2003). LH networks in the temporal lobe that are sensitive to phonologically relevant parameters of the auditory signal lie in anterior and posterior, as opposed to central, regions of the STG/STS (Giraud and Price, 2001). The anterior region appears to be part of an auditory-semantic processing stream, the posterior region part of an auditory-motor processing stream. Both processing streams, in turn, project to convergence areas in the frontal lobe.

Effects of task performance on hemispheric asymmetry

In this study, the BOLD signal magnitude depends on the participant’s proficiency in a particular phonological task (Chee et al., 2001). The two groups differ maximally in relative language proficiency: Chinese group, 100%; English group, 0%. As reflected in behavioral measures of task performance (Table 3), perceptual judgments of Chinese tones require more cognitive effort from English monolinguals because of their unfamiliarity with lexical tones. This unfamiliarity with the Chinese language results in greater BOLD activation for T1 and T3, either bilaterally or in the RH only (cf. Chee et al., 2001). The effect of minimal language proficiency applies only to lexical tone. Intonation, on the other hand, elicits bilateral activation for both groups in the posterior MFG, frontal operculum, and intraparietal sulcus (Table 4; Fig. 4). This common frontoparietal activity implies that processing of intonation requires similar cognitive effort for Chinese and English participants.

Conclusions

Cross-language comparisons provide unique insights into the functional roles of the different areas of this cortical network that are recruited for processing different aspects of speech prosody (e.g., auditory, phonological). By using tone and intonation tasks, we are able to distinguish the hemispheric roles of areas sensitive to linguistic levels of processing (LH) from those sensitive to lower-level acoustic processing (RH). Rather than attributing the processing of speech prosody exclusively to RH mechanisms, our findings suggest that lateralization is influenced by language experience that shapes the internal prosodic representation of an external auditory signal. This emerging model assumes a close interaction between the two hemispheres via the corpus callosum. In sum, we propose a more comprehensive model in which speech prosody perception is mediated primarily by RH regions for complex-sound analysis, but is lateralized to task-dependent regions in the LH when language processing is required.

Acknowledgments

Funding was provided by a research grant from the National Institutes of Health R01 DC04584-04 (JG) and an NIH postdoctoral traineeship (XL). We are grateful to J. Lowe, T. Osborn, and J. Zimmerman for their technical assistance in the MRI laboratory. Portions of this research were presented at the 11th annual meeting of the Cognitive Neuroscience Society, San Francisco, April 2004. Correspondence should be addressed to Jack Gandour, Department of Audiology and Speech Sciences, Purdue University, West Lafayette, IN 47907-2038, or via email: [email protected].

References

Awh, E., Jonides, J., Smith, E.E., Schumacher, E.H., Koeppe, R.A., Katz, S., 1996. Dissociation of storage and rehearsal in verbal working memory. Psychol. Sci. 7 (1), 25–31.
Baum, S., Pell, M., 1999. The neural bases of prosody: insights from lesion studies and neuroimaging. Aphasiology 13, 581–608.
Becker, J., MacAndrew, D., Fiez, J., 1999. A comment on the functional localization of the phonological storage subsystem of working memory. Brain Cogn. 41, 27–38.
Binder, J., Frost, J., Hammeke, T., Bellgowan, P., Springer, J., Kaufman, J., Possing, E., 2000. Human temporal lobe activation by speech and nonspeech sounds. Cereb. Cortex 10 (5), 512–528.
Blumstein, S., Cooper, W.E., 1974. Hemispheric processing of intonation contours. Cortex 10, 146–158.
Bosch, V., 2000. Statistical analysis of multi-subject fMRI data: assessment of focal activations. J. Magn. Reson. Imaging 11 (1), 61–64.
Brådvik, B., Dravins, C., Holtås, S., Rosen, I., Ryding, E., Ingvar, D., 1991. Disturbances of speech prosody following right hemisphere infarcts. Acta Neurol. Scand. 84 (2), 114–126.
Braver, T.S., Bongiolatti, S.R., 2002. The role of frontopolar cortex in subgoal processing during working memory. NeuroImage 15 (3), 523–536.
Buckner, R.L., Raichle, M.E., Miezin, F.M., Petersen, S.E., 1996. Functional anatomic studies of memory retrieval for auditory words and visual pictures. J. Neurosci. 16 (19), 6219–6235.
Burton, M., 2001. The role of the inferior frontal cortex in phonological processing. Cogn. Sci. 25 (5), 695–709.
Chee, M.W., Hon, N., Lee, H.L., Soon, C.S., 2001. Relative language proficiency modulates BOLD signal change when bilinguals perform semantic judgments. NeuroImage 13 (6 Pt 1), 1155–1163.
Chein, J.M., Ravizza, S.M., Fiez, J.A., 2003. Using neuroimaging to evaluate models of working memory and their implications for language processing. J. Neurolinguist. 16, 315–339.
Corbetta, M., 1998. Frontoparietal cortical networks for directing attention and the eye to visual locations: identical, independent, or overlapping neural systems? Proc. Natl. Acad. Sci. U. S. A. 95 (3), 831–838.
Corbetta, M., Shulman, G.L., 2002. Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev., Neurosci. 3 (3), 201–215.
Corbetta, M., Kincade, J.M., Ollinger, J.M., McAvoy, M.P., Shulman, G.L., 2000. Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nat. Neurosci. 3 (3), 292–297.
Cowan, N., 1995. Sensory memory and its role in information processing. Electroencephalogr. Clin. Neurophysiol., Suppl. 44, 21–31.
Cox, R.W., 1996. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. 29 (3), 162–173.
Davis, M.H., Johnsrude, I.S., 2003. Hierarchical processing in spoken language comprehension. J. Neurosci. 23 (8), 3423–3431.
D’Esposito, M., Postle, B.R., Rypma, B., 2000. Prefrontal cortical contributions to working memory: evidence from event-related fMRI studies. Exp. Brain Res. 133 (1), 3–11.
Dogil, G., Ackermann, H., Grodd, W., Haider, H., Kamp, H., Mayer, J., Riecker, A., Wildgruber, D., 2002. The speaking brain: a tutorial introduction to fMRI experiments in the production of speech, prosody and syntax. J. Neurolinguist. 15, 59–90.
Eng, N., Obler, L., Harris, K., Abramson, A., 1996. Tone perception deficits in Chinese-speaking Broca’s aphasics. Aphasiology 10, 649–656.
Friederici, A.D., Alter, K., 2004. Lateralization of auditory language functions: a dynamic dual pathway model. Brain Lang. 89 (2), 267–276.
Gandour, J., Dardarananda, R., 1983. Identification of tonal contrasts in Thai aphasic patients. Brain Lang. 18 (1), 98–114.

Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., Hutchins, G.D., 2000. A crosslinguistic PET study of tone perception. J. Cogn. Neurosci. 12 (1), 207–222.
Gandour, J., Wong, D., Lowe, M., Dzemidzic, M., Satthamnuwong, N., Tong, Y., Li, X., 2002. A cross-linguistic FMRI study of spectral and temporal cues underlying phonological processing. J. Cogn. Neurosci. 14 (7), 1076–1087.
Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L., Satthamnuwong, N., Lurito, J., 2003. Temporal integration of speech prosody is shaped by language experience: an fMRI study. Brain Lang. 84 (3), 318–336.
George, M.S., Parekh, P.I., Rosinsky, N., Ketter, T.A., Kimbrell, T.A., Heilman, K.M., Herscovitch, P., Post, R.M., 1996. Understanding emotional prosody activates right hemisphere regions. Arch. Neurol. 53 (7), 665–670.
Giraud, A.L., Price, C.J., 2001. The constraints functional neuroimaging places on classical models of auditory word processing. J. Cogn. Neurosci. 13 (6), 754–765.
Gruber, O., 2001. Effects of domain-specific interference on brain activation associated with verbal working memory task performance. Cereb. Cortex 11 (11), 1047–1055.
Hackett, T.A., Stepniewska, I., Kaas, J.H., 1999. Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Res. 817 (1–2), 45–58.
Hickok, G., Poeppel, D., 2000. Towards a functional neuroanatomy of speech perception. Trends Cogn. Sci. 4 (4), 131–138.
Hickok, G., Buchsbaum, B., Humphries, C., Muftuler, T., 2003. Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt. J. Cogn. Neurosci. 15 (5), 673–682.
Howie, J.M., 1976. Acoustical Studies of Mandarin Vowels and Tones. Cambridge University Press, New York.
Hsieh, L., Gandour, J., Wong, D., Hutchins, G.D., 2001. Functional heterogeneity of inferior frontal gyrus is shaped by linguistic experience. Brain Lang. 76 (3), 227–252.
Hughes, C.P., Chan, J.L., Su, M.S., 1983. Aprosodia in Chinese patients with right cerebral hemisphere lesions. Arch. Neurol. 40 (12), 732–736.
Ide, A., Dolezal, C., Fernandez, M., Labbe, E., Mandujano, R., Montes, S., Segura, P., Verschae, G., Yarmuch, P., Aboitiz, F., 1999. Hemispheric differences in variability of fissural patterns in parasylvian and cingulate regions of human brains. J. Comp. Neurol. 410 (2), 235–242.
Ivry, R., Robertson, L., 1998. The Two Sides of Perception. MIT Press, Cambridge, MA.
Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S., Dupoux, E., 2003. Phonological grammar shapes the auditory cortex: a functional magnetic resonance imaging study. J. Neurosci. 23 (29), 9541–9546.
Johnsrude, I.S., Penhune, V.B., Zatorre, R.J., 2000. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain 123 (Pt 1), 155–163.
Jonides, J., Schumacher, E.H., Smith, E.E., Koeppe, R.A., Awh, E., Reuter-Lorenz, P.A., Marshuetz, C., Willis, C.R., 1998. The role of parietal cortex in verbal working memory. J. Neurosci. 18 (13), 5026–5034.
Klein, D., Zatorre, R., Milner, B., Zhao, V., 2001. A cross-linguistic PET study of tone perception in Mandarin Chinese and English speakers. NeuroImage 13 (4), 646–653.
Knight, R.T., Staines, W.R., Swick, D., Chao, L.L., 1999. Prefrontal cortex regulates inhibition and excitation in distributed neural networks. Acta Psychol. (Amst.) 101 (2–3), 159–178.
Koechlin, E., Basso, G., Pietrini, P., Panzer, S., Grafman, J., 1999. The role of the anterior prefrontal cortex in human cognition. Nature 399 (6732), 148–151.
Li, X., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Lowe, M., Tong, Y., 2003. Selective attention to lexical tones recruits left dorsal frontoparietal network. NeuroReport 14 (17), 2263–2266.
MacDonald III, A.W., Cohen, J.D., Stenger, V.A., Carter, C.S., 2000. Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science 288 (5472), 1835–1838.

Mazoyer, P., Wicker, B., Fonlupt, P., 2002. A neural network elicited by parametric manipulation of the attention load. NeuroReport 13 (17), 2331–2334.
Mesulam, M.M., 1981. A cortical network for directed attention and unilateral neglect. Ann. Neurol. 10 (4), 309–325.
Meyer, M., Alter, K., Friederici, A.D., Lohmann, G., von Cramon, D.Y., 2002. fMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum. Brain Mapp. 17 (2), 73–88.
Meyer, M., Alter, K., Friederici, A.D., 2003. Functional MR imaging exposes differential brain responses to syntax and prosody during auditory sentence comprehension. J. Neurolinguist. 16, 277–300.
Moen, I., 1993. Functional lateralization of the perception of Norwegian word tones: evidence from a dichotic listening experiment. Brain Lang. 44 (4), 400–413.
Newman, S.D., Just, M.A., Carpenter, P.A., 2002. The synchronization of the human cortical working memory network. NeuroImage 15 (4), 810–822.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9 (1), 97–113.
Paulesu, E., Frith, C.D., Frackowiak, R.S., 1993. The neural correlates of the verbal component of working memory. Nature 362 (6418), 342–345.
Pell, M.D., 1998. Recognition of prosody following unilateral brain lesion: influence of functional and structural attributes of prosodic contours. Neuropsychologia 36 (8), 701–715.
Pell, M.D., Baum, S.R., 1997. The ability to perceive and comprehend intonation in linguistic and affective contexts by brain-damaged adults. Brain Lang. 57 (1), 80–99.
Petrides, M., Pandya, D.N., 1984. Association fiber pathways to the frontal cortex from the superior temporal region in the rhesus monkey. J. Comp. Neurol. 273, 52–66.
Plante, E., Creusere, M., Sabin, C., 2002. Dissociating sentential prosody from sentence processing: activation interacts with task demands. NeuroImage 17 (1), 401–410.
Poeppel, D., 2003. The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. 41 (1), 245–255.
Rauschecker, J.P., Tian, B., 2000. Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc. Natl. Acad. Sci. U. S. A. 97 (22), 11800–11806.
Romanski, L.M., Bates, J.F., Goldman-Rakic, P.S., 1999a. Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. J. Comp. Neurol. 403 (2), 141–157.
Romanski, L.M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P.S., Rauschecker, J.P., 1999b. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat. Neurosci. 2 (12), 1131–1136.
Schacter, D.L., Alpert, N.M., Savage, C.R., Rauch, S.L., Albert, M.S., 1996. Conscious recollection and the human hippocampal formation: evidence from positron emission tomography. Proc. Natl. Acad. Sci. U. S. A. 93 (1), 321–325.
Schwartz, J., Tallal, P., 1980. Rate of acoustic change may underlie hemispheric specialization for speech perception. Science 207, 1380–1381.
Scott, S., 2003. How might we conceptualize speech perception? The view from neurobiology. J. Phon. 31, 417–422.
Scott, S.K., Johnsrude, I.S., 2003. The neuroanatomical and functional organization of speech perception. Trends Neurosci. 26 (2), 100–107.
Scott, S.K., Wise, R., 2003. PET and fMRI studies of the neural basis of speech perception. Speech Commun. 41, 23–34.
Scott, S.K., Blank, C.C., Rosen, S., Wise, R.J., 2000. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123 (Pt 12), 2400–2406.
Scott, S.K., Leff, A.P., Wise, R.J., 2003. Going beyond the information given: a neural system supporting semantic interpretation. NeuroImage 19 (3), 870–876.
Shaywitz, B.A., Shaywitz, S.E., Pugh, K.R., Fulbright, R.K., Skudlarski, P., Mencl, W.E., Constable, R.T., Marchione, K.E., Fletcher, J.M., Klorman, R., et al., 2001. The functional neural architecture of components of attention in language-processing tasks. NeuroImage 13 (4), 601–612.
Shen, X.-N., 1990. The Prosody of Mandarin Chinese. University of California Press, Berkeley, CA.
Shipley-Brown, F., Dingwall, W.O., Berlin, C.I., Yeni-Komshian, G., Gordon-Salant, S., 1988. Hemispheric processing of affective and linguistic intonation contours in normal subjects. Brain Lang. 33 (1), 16–26.
Shulman, G.L., d’Avossa, G., Tansy, A.P., Corbetta, M., 2002. Two attentional processes in the parietal lobe. Cereb. Cortex 12 (11), 1124–1131.
Smith, E.E., Jonides, J., 1999. Storage and executive processes in the frontal lobes. Science 283, 1657–1661.
Specht, K., Reul, J., 2003. Functional segregation of the temporal lobes into highly differentiated subsystems for auditory perception: an auditory rapid event-related fMRI-task. NeuroImage 20 (4), 1944–1954.
Talairach, J., Tournoux, P., 1988. Co-planar Stereotaxic Atlas of the Human Brain: 3-Dimensional Proportional System: An Approach to Cerebral Imaging. Thieme Medical Publishers, New York.
Van Lancker, D., 1980. Cerebral lateralization of pitch cues in the linguistic signal. Pap. Linguist. 13 (2), 201–277.
Van Lancker, D., Fromkin, V., 1973. Hemispheric specialization for pitch and tone: evidence from Thai. J. Phon. 1, 101–109.
Wang, Y., Jongman, A., Sereno, J., 2001. Dichotic perception of Mandarin tones by Chinese and American listeners. Brain Lang. 78, 332–348.
Weintraub, S., Mesulam, M.M., Kramer, L., 1981. Disturbances in prosody. A right-hemisphere contribution to language. Arch. Neurol. 38 (12), 742–744.
Wildgruber, D., Pihan, H., Ackermann, H., Erb, M., Grodd, W., 2002. Dynamic brain activation during processing of emotional intonation: influence of acoustic parameters, emotional valence, and sex. NeuroImage 15 (4), 856–869.
Wise, R.J., Scott, S.K., Blank, S.C., Mummery, C.J., Murphy, K., Warburton, E.A., 2001. Separate neural subsystems within ‘Wernicke’s area’. Brain 124 (Pt 1), 83–95.
Yiu, E., Fok, A., 1995. Lexical tone disruption in Cantonese aphasic speakers. Clin. Linguist. Phon. 9, 79–92.
Yuan, J., Shih, C., Kochanski, G., 2002. Comparison of declarative and interrogative intonation in Chinese. In: Bel, B., Marlien, I. (Eds.), Proceedings of the First International Conference on Speech Prosody. Aix-en-Provence, France, pp. 711–714 (April).
Zatorre, R.J., Belin, P., 2001. Spectral and temporal processing in human auditory cortex. Cereb. Cortex 11 (10), 946–953.
Zatorre, R., Samson, S., 1991. Role of the right temporal neocortex in retention of pitch in auditory short-term memory. Brain 114 (Pt 6), 2403–2417.
Zatorre, R.J., Evans, A.C., Meyer, E., 1994. Neural mechanisms underlying melodic perception and memory for pitch. J. Neurosci. 14 (4), 1908–1919.
Zatorre, R.J., Mondor, T.A., Evans, A.C., 1999. Auditory attention to space and frequency activates similar cerebral systems. NeuroImage 10 (5), 544–554.
Zatorre, R.J., Belin, P., Penhune, V.B., 2002. Structure and function of auditory cortex: music and speech. Trends Cogn. Sci. 6 (1), 37–46.