Brain & Language 162 (2016) 46–59
Prosodic grouping at birth

Nawal Abboub a,*, Thierry Nazzi a,b, Judit Gervain a,b
a Laboratoire Psychologie de la Perception, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
b Laboratoire Psychologie de la Perception, CNRS, Paris, France
* Corresponding author at: Laboratoire Psychologie de la Perception (UMR 8242), CNRS & Université Paris Descartes, 45, rue des Saints Pères, Paris 75006, France. E-mail address: [email protected] (N. Abboub).
Article history: Received 16 June 2015; Revised 26 July 2016; Accepted 6 August 2016

Keywords: Newborn infants; Perceptual biases; Prosodic grouping; Prenatal exposure; Bilingualism; Near-infrared spectroscopy
Abstract

Experience with spoken language starts prenatally, as hearing becomes operational during the second half of gestation. While maternal tissues filter out many aspects of speech, they readily transmit speech prosody and rhythm. These properties of the speech signal then play a central role in early language acquisition. In this study, we ask how the newborn brain uses variation in duration, pitch and intensity (the three acoustic cues that carry prosodic information in speech) to group sounds. In four near-infrared spectroscopy (NIRS) studies, we demonstrate that perceptual biases governing how sound sequences are perceived and organized are present in newborns from monolingual and bilingual language backgrounds. Importantly, however, these prosodic biases are present only for acoustic patterns found in the prosody of their native languages. These findings advance our understanding of how prenatal language experience lays the foundations for language development.

© 2016 Elsevier Inc. All rights reserved.
1. Introduction

Learning about language depends critically on a complex interplay between neurobiologically constrained processing mechanisms, perceptual biases and linguistic input. At birth, infants possess many language-general abilities. They can discriminate between most speech sounds (Cheour-Luhtanen et al., 1995; Werker & Gervain, 2013) and between rhythmically different languages they have never heard before (Nazzi, Bertoncini, & Mehler, 1998; Nazzi & Ramus, 2003). Moreover, they prefer speech over a variety of non-linguistic sounds (Decasper & Spence, 1986; Vouloumanos & Werker, 2007) and infant- over adult-directed speech (Fernald & Kuhl, 1987). However, as hearing is operational from the 24th to the 28th week of gestation (Hepper & Shahidullah, 1994), experience with spoken language starts in the womb, and some evidence of prenatal learning is found at birth. Indeed, newborns prefer their mother's voice over other female voices (Decasper & Fifer, 1980), their native language over a rhythmically different unfamiliar language (Mehler et al., 1988; Moon, Panneton Cooper, & Fifer, 1993), and their communicative cries reflect the prosody of the language they heard in utero (Mampe, Friederici, Christophe, & Wermke, 2009). Moreover, it has been shown that newborns who received bilingual prenatal exposure recognize both languages as familiar and can discriminate them from a
rhythmically different unfamiliar language (Byers-Heinlein, Burns, & Werker, 2010). Additionally, newborns are able to recognize stories heard during pregnancy (Decasper & Spence, 1986) or melodies to which they were exposed prenatally (DeCasper, 1994; Granier-Deferre, Bassereau, Ribeiro, Jacquet, & Decasper, 2011). Taken together, these findings constitute evidence that infants start learning about language while still in the womb, and that speech heard in utero has a more important impact on the development of speech perception and language learning than hitherto believed. Speech experienced in utero, however, is different from broadcast speech transmitted through the air. Maternal tissues act as a low-pass filter, mainly transmitting sounds below 300–400 Hz (Gerhardt et al., 1992; Querleu, Renard, Versyp, Paris-Delrue, & Crèpin, 1988). As a consequence, prosody, the global melody and rhythm of speech, is relatively well preserved and transmitted to the fetal inner ear, whereas more detailed, phonetic aspects are disrupted (Querleu et al., 1988). Importantly, prosody is a powerful cue that infants have been shown to make use of during language acquisition. For instance, newborns rely on prosody to discriminate languages (Nazzi, Bertoncini, et al., 1998; Nazzi & Ramus, 2003), to detect boundaries in speech (Christophe, Dupoux, Bertoncini, & Mehler, 1994), differences in the pitch contour or lexical stress pattern of words (Nazzi, Floccia, & Bertoncini, 1998; Sansavini, Bertoncini, & Giovanelli, 1997) or even between function words and content words (Shi, Werker, & Morgan, 1999), on the basis of their different acoustic characteristics. They also use prosody to segment words out of the continuous speech stream (Johnson &
Jusczyk, 2001; Jusczyk, Houston, & Newsome, 1999; Kooijman, Hagoort, & Cutler, 2009; Mattys, Jusczyk, Luce, & Morgan, 1999; Nazzi, Iakimova, Bertoncini, Fredonie, & Alcantara, 2006; Nishibayashi, Goyet, & Nazzi, 2015) or to learn about the syntactic features of their native language (Hirsh-Pasek et al., 1987), such as its basic word order (Gervain & Werker, 2013; Nespor et al., 2008) or argument structure (Christophe, Gout, Peperkamp, & Morgan, 2003). Thus, the variations in pitch, intensity or duration that carry prosody in the speech signal serve as robust and particularly important cues to language learning. Yet, how infants perceive these three acoustic dimensions at birth has remained largely unexplored, and whether language experience shapes the perception of these acoustic cues is currently heatedly debated.

One issue at stake is the origin and developmental trajectory of the prosodic grouping bias known as the Iambic-Trochaic Law (ITL). Some authors have argued that the ITL is language-independent. Specifically, it has been claimed that the auditory system automatically groups sequences of sounds that differ in duration with the longest element in final position (i.e., prominence-final or iambic grouping), and sequences of sounds that differ in intensity or pitch with the loudest or highest element in initial position (i.e., prominence-initial or trochaic grouping). The ITL was initially proposed to explain the grouping of musical or non-linguistic sequences (Bolton, 1894; Cooper & Meyer, 1960; Woodrow, 1951). As a well-known example, people tend to perceive the fire truck siren as a sequence of two paired sounds, the first one being higher than the second one. This grouping principle was later extended to account for regularities in speech production and biases in speech perception in adults (Bion, Benavides-varela, & Nespor, 2011; Hay & Diehl, 2007; Hayes, 1995; Nespor et al., 2008).

The proposal that the ITL is language-general is supported by studies showing that adult speakers of prosodically and rhythmically different languages such as English and French show similar grouping preferences (Hay & Diehl, 2007). Moreover, trochaic grouping on the basis of a pitch contrast was found in Italian adults, in Italian and French infants, whose native language makes little use of pitch cues in its prosody (Abboub, Boll-Avetisyan, Bhatara, Hoehle, & Nazzi, 2016; Bion et al., 2011), as well as in rats (de la Mora, Nespor, & Toro, 2013), suggesting not only that prosodic grouping preferences might exist in the absence of language experience, but also that they might be shared by humans and other mammals.

However, a recent alternative hypothesis has emerged, according to which prosodic grouping biases might, at least in part, be influenced by language experience. Supporting this view, recent cross-linguistic research has shown that although English and Japanese adults group sequences varying in intensity trochaically, only English, but not Japanese, adults group sequences varying in duration iambically (Iversen, Patel, & Ohgushi, 2008). The two languages differ at the phrasal level, since Japanese has a trochaic rhythm (ˆTokyo ni, Tokyo to, 'to Tokyo', with prosodic prominence marked by higher pitch on the content word 'Tokyo' in initial position; Gervain & Werker, 2013), whereas English has an iambic rhythm (to Ro:me, with prosodic prominence marked by lengthened duration on the content word 'Rome' in final position).
Relatedly, while both German and French adults follow the ITL when presented with complex linguistic stimuli varying in intensity or duration, they nevertheless exhibit language-specific differences, German adults showing stronger ITL effects; moreover, effects based on pitch were found for German but not French adults (Bhatara, Boll-Avetisyan, Unger, Nazzi, & Höhle, 2013). Similar findings were found using complex non-linguistic stimuli (Bhatara, Boll-Avetisyan, Agus, Höhle, & Nazzi, 2015). The authors argue that these cross-linguistic differences reflect the fact that German has a predominantly trochaic word-level stress pattern, while French does not. Additionally, French is iambic at the phrasal
level, whereas German can have both rhythmic patterns. In infants, Japanese- and English-learning 7–8-month-olds (Yoshida et al., 2010) showed a pattern of results similar to the one found in adults (Iversen et al., 2008), and bilingual Spanish and Basque 9–10-month-olds (Molnar, Lallier, & Carreiras, 2014) also showed consistent grouping for intensity, but not for duration. These early cross-linguistic differences were found essentially for duration, suggesting, first, that the language environment might influence grouping preferences early on, and second, that the three acoustic cues are not affected in the same way by this cross-linguistic modulation. This raises the question of how and when during development language experience starts modulating perceptual grouping biases. Contributing to this debate, the current study will explore whether newborns already possess general perceptual mechanisms to group sounds according to prosodic cues, and whether these abilities are already modulated by the native language(s) heard in utero. If such perceptual biases are present early in development, they have the potential to help infants break into language.

A related point concerns the cerebral basis of prosodic grouping, which remains, to a large extent, unexplored. In adults, language comprehension, including morphosyntactic and semantic processing, is predominantly lateralized to the left hemisphere, while prosodic processing typically recruits a more dynamic network in the right hemisphere (Friederici, 2012; Hickok & Poeppel, 2007), although the lateralization of prosodic processing also depends on the functional relevance of prosody in the language studied and on the context. Left dominance may be observed if the prosodic cue used is lexically or morphosyntactically relevant, such as lexical tone in adults who speak a tonal language (Gandour et al., 2004; Kreitewolf, Friederici, & von Kriegstein, 2014; Sato, Sogabe, & Mazuka, 2007, 2010). In infants, few studies have investigated the neural basis of prosodic processing in general, and none have specifically looked at how prosodic grouping is processed across these three acoustic dimensions. The few existing optical imaging studies investigating prosodic processing in general reported that sleeping neonates and 3-month-olds showed a right hemispheric specialization (Homae, Watanabe, Nakano, & Taga, 2007; Sato et al., 2010; Telkemeyer et al., 2009), as do 4-year-olds (Wartenburger et al., 2007). Nevertheless, in these studies, sentential prosody was tested in its full acoustic complexity. Thus, it is still unclear how and where in the brain prosodic cues in isolation (i.e. variations in duration only, intensity only or pitch only) are processed and grouped. More specifically, we do not know whether, and if so, how grouping on the basis of a single acoustic cue is perceived and processed in the developing brain.

The current study therefore sought to answer two questions. First, we explored the earliest foundations of the crucial ability to detect and process prosodic patterns. In particular, no study has as yet tested newborns' prosodic grouping biases and their neural correlates, a gap that the present study intends to fill. Accordingly, we tested prosodic patterns that vary along one of the three acoustic dimensions characterizing speech prosody: duration, intensity, and pitch.
To do so, we used near-infrared spectroscopy (NIRS), an optical imaging technique ideally suited to test the youngest developmental populations (high motion tolerance, easy application, no carrier substance or magnetic field, no noise, etc.). NIRS has the advantage of providing good spatial localization, allowing us to identify the brain areas responsible for prosodic grouping. This technique has been widely used to explore the neural correlates of speech perception and language acquisition in newborns and young infants (Gervain, Berent, & Werker, 2012; Gervain, Macagno, Cogoi, Peña, & Mehler, 2008; Gomez et al., 2014; May, Byers-Heinlein, Gervain, & Werker, 2011; Peña et al., 2003; Telkemeyer et al., 2009). French, the language that our monolingual
participants were exposed to prenatally, uses mainly durational contrasts in its prosody, in particular final lengthening (Dell, Hirst, & Vergnaud, 1984; Nespor et al., 2008). Second, we investigated how these abilities are modulated by language exposure, comparing prosodic grouping biases for the pitch condition in monolingual French-exposed newborns and newborns exposed to French and another language making more systematic use of pitch in its prosody.

To address these two questions, we conducted four NIRS experiments. First, we explored the origins of prosodic grouping in French-exposed monolingual newborns in each of the three relevant acoustic dimensions (duration: Exp 1; intensity: Exp 2; pitch: Exp 3). Second, we tested French-other language bilingual newborns in the pitch contrast condition (Exp 4), the dimension for which even non-linguistic animals show a grouping preference. In all four experiments, newborns listened to sequences of pure tone pairs pertaining to three conditions: a consistent grouping condition, in which the pairs were consistent with the grouping predicted by the universal bias and/or by native language prosody (short-long, strong-weak and high-low); an inconsistent grouping condition, which presented the opposite grouping pattern (long-short, weak-strong, low-high); and a no-contrast condition, whereby tones in the pair were identical (equal duration, intensity or pitch). We predicted that if infants use variation in a given prosodic dimension to group sounds, differences in response amplitudes or localization should be found for the consistent and inconsistent conditions. Since no previous study has tested the cerebral basis of the prosodic grouping bias in any developmental population, we had no clear predictions regarding the localization of the response. Sentential prosody has been found to be processed in the right hemisphere early on in development (Homae et al., 2007; Sato et al., 2010; Telkemeyer et al., 2009). However, these studies all compared fully complex sentential prosody to a flattened, non-prosodic condition, whereas in our study, the two critical conditions (consistent and inconsistent grouping) both contain prosodic information. The crucial difference is in the well-formedness and sequential ordering of this information, i.e. in structural and sequential ordering aspects, which are typically processed in the
left hemisphere (Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002; Gervain et al., 2012; Gervain, Macagno, et al., 2008). Accordingly, we might either find an involvement of both hemispheres, or a left hemispheric advantage. Given the importance of duration variation in French, Experiment 1 presented newborns from monolingual French families with duration variations.

2. Experiment 1

2.1. Methods

2.1.1. Participants

Eighteen healthy, full-term newborns (13 females; mean age: 2.05 days; range: 1–4 days; Apgar score ≥ 8) born to French-speaking families contributed data to the final analyses. An additional 7 newborns were tested but were excluded from data analysis due to the infant becoming awake or fussy during the experiment (3), having thick hair (2) or failing to complete the procedure (2). All parents gave informed consent before the experiment. The present experiment was approved by the Conseil d'évaluation éthique pour les recherches en santé (CERES) ethics board (Université Paris Descartes).

2.1.2. Material and design

We used eight different pure tone pairs, whereby tones within a pair contrasted in duration (250 ms vs. 500 ms), but were identical in their intensity (70 dB) and pitch [any one of the eight notes of the Western octave, e.g. a pair could consist of two A (220 Hz) notes, two B (247 Hz) notes, two C (262 Hz) notes, two D (267 Hz) notes, two E (294 Hz) notes, two F (330 Hz) notes, two G (349 Hz) notes or two AA (440 Hz) notes]. The tone pair sequences were presented in three conditions (see Fig. 1a). In the consistent condition, the pairs were grouped such that they followed the grouping pattern found in French and/or predicted by the ITL (short-long). In the inconsistent condition, they followed the non-predicted grouping pattern (long-short). In the no-contrast condition, they did not provide any grouping cues (equal duration).
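For concreteness, the following minimal sketch generates one tone pair per condition for the duration contrast. It is illustrative only: the sampling rate, the onset/offset ramps and the mapping from presentation level to amplitude are our assumptions and are not taken from the procedure described above.

```python
import numpy as np

FS = 44100  # sampling rate in Hz (assumed; not reported in the text)

def pure_tone(freq_hz, dur_s, level_db, ramp_s=0.01):
    """Sine tone with short linear onset/offset ramps.
    The 70 dB presentation level is mapped to an amplitude of 1.0;
    actual calibration would depend on the playback chain."""
    t = np.arange(int(FS * dur_s)) / FS
    tone = np.sin(2 * np.pi * freq_hz * t)
    amplitude = 10 ** ((level_db - 70) / 20)
    n_ramp = int(FS * ramp_s)
    envelope = np.ones_like(tone)
    envelope[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)
    envelope[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)
    return amplitude * tone * envelope

# Duration-contrast pairs on the A (220 Hz) note at 70 dB, as in Experiment 1
consistent_pair   = [pure_tone(220, 0.250, 70), pure_tone(220, 0.500, 70)]  # short-long
inconsistent_pair = [pure_tone(220, 0.500, 70), pure_tone(220, 0.250, 70)]  # long-short
no_contrast_pair  = [pure_tone(220, 0.250, 70), pure_tone(220, 0.250, 70)]  # equal duration
```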
Fig. 1. Experimental design of the procedure used in Experiments 1, 2, 3 and 4. Each rectangle represents a block consisting of a series of six tone pairs of a given grouping type, separated by short pauses. (a) Duration condition (Exp. 1); (b) Intensity condition (Exp. 2); (c) Pitch condition (Exp. 3 and 4).
The experiment consisted of 8 blocks of each of the three conditions (consistent, inconsistent, no contrast), each comprising the presentation of 6 tone pairs, for a total of 24 blocks. Within blocks, tone pairs were separated by pauses (480 ms), yielding blocks of about 8–12 s. Blocks were separated by silent intervals of varying duration (26–32 s) to avoid inducing phase-locked brain responses. The 24 blocks were presented in an interleaved fashion, disallowing more than two consecutive blocks of the same condition; the order of the blocks was randomized and counterbalanced across subjects.
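The sketch below illustrates one way of generating a block order that satisfies these constraints; the rejection-sampling approach and all names in it are illustrative assumptions, not a description of the actual presentation software.

```python
import random

CONDITIONS = ("consistent", "inconsistent", "no_contrast")

def block_order(n_per_condition=8, max_run=2, seed=None):
    """Random order of 24 blocks with at most `max_run` consecutive blocks
    of the same condition, found by simple rejection sampling."""
    rng = random.Random(seed)
    blocks = list(CONDITIONS) * n_per_condition
    while True:
        rng.shuffle(blocks)
        if all(len(set(blocks[i:i + max_run + 1])) > 1
               for i in range(len(blocks) - max_run)):
            return blocks

rng = random.Random(0)
order = block_order(seed=0)
# Jittered silent inter-block intervals of 26-32 s, as described above
intervals = [rng.uniform(26, 32) for _ in order]
```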
2.1.3. Procedure

Infants were tested with a NIRx NIRScout 816 machine (source-detector separation: 3 cm; two wavelengths: 760 nm and 850 nm; sampling rate: approximately 10 Hz) at the maternity ward of the Robert Debré Hospital, Paris, France. The testing session lasted about 16–17 min. The optical probe cap was placed on newborns' heads targeting the fronto-temporo-parietal auditory areas. The optical sensors were inserted into a stretchy EEG cap and were placed bilaterally on the infants' head using surface anatomical landmarks (inion, nasion, vertex and the bilateral preauricular points; see Fig. 2). We approximated the location of the cortical regions underlying our NIRS channels following the procedure described in Lloyd-Fox et al. (2014), using age-appropriate, i.e. newborn, structural MRIs and stereotaxic atlases from Shi et al. (2011) (Fig. 2). This localization analysis suggests that, on average, channels 1, 2, 4 and 5 in the LH and channels 13, 14, 15 and 16 in the RH are positioned over the inferior-middle frontal area; channels 3, 6, 8 and 11 in the LH and channels 17, 19, 22 and 24 in the RH are mostly positioned over the temporal lobe, including the superior and middle temporal gyrus; while channels 7, 9, 10 and 12 in the LH, and channels 18, 20, 21 and 23 in the RH, are most often located over the central gyrus, supramarginal gyrus and the angular gyrus (see Fig. 3). Given the high variability of newborns'
Fig. 2. (A) Picture of a neonate with the cap placed upon the head (right view). (B) A scalp surface from MRI neonate templates (Shi et al., 2011) with the probe set. The channels in the probe set are projected down from the scalp surface to the cortical surface. Red circles indicate sources, while white circles indicate detectors.
Fig. 3. Configuration of probe sets overlaid on a schematic newborn brain. For each fNIRS channel in the probe set, the underlying brain area (based on the LPBA40 atlas) is indicated according to its localization. Blue channels mark positions over the frontal area, orange channels over the parietal area and purple channels over the temporal area of the infant head. Grey circles indicate sources, while black circles indicate detectors.
head shapes and the lack of precise information about the underlying brain anatomy for the specific infants we tested, these localizations are approximately correct at the group level only.

Testing was done with the infants lying in their cribs in a quiet room, in a state of sleep, with at least one parent present throughout the study. Sound stimuli were administered through two loudspeakers positioned at a distance of 1 m from the infant's head, at an angle of 30°, elevated to the same height as the crib. A Dell OptiPlex PC380 computer played the stimuli through E-Prime and an HP laptop computer recorded the NIRS signal. The NIRS machine used pulsed LED emitters.

2.1.4. Data processing and analysis

The NIRS machine measured the scattering and absorption of near-infrared light, from which the changes in the concentration of oxygenated hemoglobin (oxyHb) and deoxygenated hemoglobin (deoxyHb) were calculated as indicators of neural activity. OxyHb and deoxyHb concentrations were used in the data analysis. To eliminate high-frequency noise (e.g., heartbeat) and overall trends, the data were band-pass filtered between 0.01 and 0.7 Hz. Movement artifacts, defined as concentration changes exceeding 0.1 mmol·mm over 0.2 s, were removed by rejecting block-channel pairs in which artifacts occurred. For the non-rejected blocks, a baseline was linearly fitted between the means of the 5 s preceding the onset of the block and the 5 s starting 15 s after the offset of the block. Newborns were included in the analysis only if the amount of data rejected was less than 30%. All statistical analyses were carried out over oxyHb and deoxyHb.

We conducted two types of analyses: a cluster-based permutation analysis and more classical mean activation analyses, including channel-by-channel t-tests and ANOVAs. In order to compare the signal changes across the consistent and inconsistent conditions, we first carried out a cluster-based non-parametric permutation test (Cohen, 2014; Maris & Oostenveld, 2007) for each experiment. This analysis has two main advantages over more traditional channel-by-channel t-tests. First, it does not pose the problem of multiple comparisons, which typically plagues t-tests, since conducting uncorrected t-tests increases the chance of committing a type I error, while conducting corrected t-tests often reduces statistical power quite dramatically. Permutation tests avoid this problem altogether. Second, and quite importantly for the subsequent mean activation analyses, cluster-based permutation tests allow us to derive regions of interest (ROIs) from the data, which is very useful in the current situation, as we have no previous studies to rely on for choosing relevant ROIs. While data-driven, this analysis may be constrained anatomically and/or functionally by declaring channels as neighbors only if they are in the same anatomical or functional region. The cluster-based permutation analysis thus offers a data-driven, but anatomically informed, way to define ROIs. Given these advantages, several previous fNIRS studies used permutation tests in addition to t-tests (Edwards, Wagner, Simon, & Hyde, 2015; Ferry et al., 2016; Mahmoudzadeh et al., 2013; Vannasing et al., 2016). To perform the cluster-based permutation test, we first ran paired-sample t-tests between the consistent and inconsistent conditions for each pair of samples within the time course of the hemodynamic response in each channel.
Then all temporally and spatially adjacent pairs with a t-value greater than a standard threshold (we used t = 2) were grouped together into cluster candidates. Two pairs of samples were considered temporally adjacent if they were consecutive and spatially adjacent if they were in the same anatomical area, as defined by our localization analysis (Fig. 3), and if they were at a distance of 3 cm from one another. We calculated cluster-level statistics for each cluster candidate by summing the t-values from the t-tests for every data point included in the cluster candidate. We then identified the cluster
candidate with the largest t-value for each hemisphere. Then a permutation analysis evaluated whether this cluster-level statistic was significantly different from chance. This was done by randomly labeling the data as belonging to one or the other experimental condition. The same t-test statistic as before was computed for each random assignment, which allowed us to obtain its empirical distribution under the null hypothesis of no difference between conditions. Clusters were then formed as before. The proportion of random partitions that produce a cluster-level statistic greater than the actually observed one provides the p value of the test. In all, 500 permutations under the null hypothesis were conducted on the consistent and inconsistent time series for each hemisphere.

As a confirmation of the above novel analysis method, and to ensure comparability with other studies, we also conducted traditional parametric tests to compare the average activation for the consistent and inconsistent conditions. For these analyses, data were averaged in each block over a 27 s time window starting from the onset of the block (12 s stimulation + 15 s return to baseline). We conducted channel-by-channel paired-sample t-tests. Two types of t-test were run: first, signal changes under each condition (consistent, inconsistent and no-contrast) were tested against a zero baseline; second, we directly compared responses to the inconsistent condition against the consistent condition. Moreover, we conducted within-subject ANOVAs with factors Grouping (consistent/inconsistent), Hemisphere (LH/RH) and Region of Interest (ROI) for each experiment. The ROIs were chosen to be the clusters that emerged from the permutation tests.
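To make the logic of the cluster-based analysis explicit, the sketch below implements a simplified version of the permutation test on block-averaged time courses. It departs from the procedure described above in two respects: clusters are formed only over temporally adjacent samples within a channel (not across neighboring channels within a region), and the random relabeling of conditions is implemented as a within-subject sign flip of the condition difference. Array shapes and function names are our assumptions.

```python
import numpy as np
from scipy import stats

def max_cluster_stat(t_map, threshold=2.0):
    """Largest sum of supra-threshold t-values over temporally adjacent
    samples within a single channel (simplified clustering)."""
    best = 0.0
    for channel_t in t_map:                      # iterate over channels
        run = 0.0
        for t in channel_t:
            run = run + t if t > threshold else 0.0
            best = max(best, run)
    return best

def cluster_permutation_test(consistent, inconsistent, n_perm=500,
                             threshold=2.0, seed=0):
    """consistent, inconsistent: arrays of shape (n_subjects, n_channels,
    n_samples) holding block-averaged oxyHb time courses per condition.
    Returns the observed maximal cluster statistic and its permutation
    p value for the inconsistent > consistent direction."""
    rng = np.random.default_rng(seed)
    diff = inconsistent - consistent             # paired, within-subject difference
    t_obs = stats.ttest_1samp(diff, 0.0, axis=0).statistic
    observed = max_cluster_stat(t_obs, threshold)
    null = np.empty(n_perm)
    for k in range(n_perm):
        # Randomly relabeling the two conditions within each subject is
        # equivalent to flipping the sign of that subject's difference.
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1, 1))
        t_perm = stats.ttest_1samp(diff * signs, 0.0, axis=0).statistic
        null[k] = max_cluster_stat(t_perm, threshold)
    return observed, float(np.mean(null >= observed))
```

In the analysis reported here, spatial adjacency within the anatomical regions of Fig. 3 additionally allowed supra-threshold samples from neighboring channels to join the same cluster.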
2.2. Results and discussion

Fig. 4 presents the grand average results for oxyHb and deoxyHb concentration changes, averaged across all blocks for the three prosodic patterns (consistent, inconsistent and no-contrast). The permutation test revealed a greater hemodynamic response to the inconsistent condition than to the consistent condition in both hemispheres, mostly in the temporal-parietal areas, for oxyHb: a cluster in the LH including channels 9, 10 and 12 (cluster 1: t = 803, p < 0.0001) and a cluster in the RH including channels 17 and 19 (cluster 2: t = 625, p < 0.0001) (see Fig. 5 and Table 3 for statistics). No effect was found for deoxyHb. Channel-by-channel t-tests (Table 1 for oxyHb and Table 2 for deoxyHb) comparing the response in the consistent and inconsistent conditions for oxyHb showed significantly greater activation to the inconsistent grouping in three channels (CH 12 and 17: p < 0.01, d = 0.78 and 0.81; CH 10: p < 0.05, d = 0.79). These results thus converge with the permutation test results, and suggest that newborns are able to discriminate between the two different groupings.

We also performed a repeated measures analysis of variance (ANOVA) with Grouping (Consistent vs. Inconsistent), Hemisphere (LH vs. RH) and ROI (ROI 1/ROI 2) as within-subject factors, using oxyHb concentrations as the dependent measure. ROI 1 included the channels in Cluster 1 identified above and their RH homologues, while ROI 2 corresponded to Cluster 2 above and its LH homologues (see Fig. 4). These ROIs were thus chosen on the basis of the permutation test results. As a confirmation of the previous two analyses, the ANOVA showed a significant Grouping effect [F(1, 17) = 10.91, p = 0.004], with greater activation in the inconsistent condition (M = 0.027 mmol·mm for inconsistent vs. 0.003 mmol·mm for consistent). A three-way Grouping × Hemisphere × ROI interaction was also found [F(1, 17) = 5.24, p = 0.035]. No other effect was significant. Post-hoc Bonferroni comparisons revealed that the differential activation between inconsistent and consistent sequences was stronger in ROI 1 in the LH [F(1, 17) = 10.63, p = 0.005], but in ROI 2 in the RH [F(1, 17) = 10.91, p = 0.027]. No significant effects or interactions were found for deoxyHb.
Fig. 4. Grand average results of Experiment 1, monolingual newborns: Duration. Channels are plotted following the same placement as in Fig. 3. The x-axis represents time in seconds; the y-axis shows concentration in mmol·mm. The rectangle along the x-axis indicates the time of stimulation. The continuous red and blue lines represent oxyHb and deoxyHb concentrations, respectively, in response to the inconsistent sequences; the dashed red and blue lines, in response to the consistent sequences; and the dotted red and blue lines, in response to the no-contrast sequences. T-tests for inconsistent versus consistent sequences: **: p < 0.01; *: p < 0.05, uncorrected. The continuous black lines mark the significant clusters defining ROI 1 (CH 9, 10 and 12) and ROI 2 (CH 17 and 19), respectively, while the dashed black lines mark their contralateral counterparts.
Fig. 5. Statistical maps indicating cluster p-values (color coded) comparing the responses to consistent vs. inconsistent sequences in each experiment.
Table 1
Statistical results of significant t-tests in Experiments 1–4 for oxyHb: each condition (consistent, C; inconsistent, IC; no-contrast, N) tested against a zero baseline (O), and the inconsistent condition tested against the consistent one (IC vs C). punc: uncorrected p value; pFDR: p value corrected using the False Discovery Rate (FDR) procedure (Benjamini & Hochberg, 1995); CH: channel.

Exp. 1 Duration
CH    IC vs C punc (pFDR)    IC vs O punc (pFDR)
3     –                      0.012 (0.097)
4     –                      0.040 (0.118)
5     –                      0.044 (0.118)
9     –                      0.034 (0.118)
10    0.011 (0.089)          0.023 (0.118)
12    0.004 (0.089)          0.037 (0.118)
17    0.010 (0.089)          0.003 (0.073)
24    –                      0.030 (0.118)

Exp. 2 Intensity (CH 3, 4, 5, 7, 9, 16): IC vs C significant for CH 3 only, punc = 0.024 (pFDR = 0.588); the remaining channels showed significant effects only in the comparisons against baseline (IC vs O, N vs O).

Exp. 3 Pitch (CH 1, 2, 4, 8, 13, 19, 22): no channel reached significance for IC vs C; significant effects were restricted to the comparisons against baseline (C vs O, IC vs O, N vs O).

Exp. 4 Pitch bilingual (CH 7, 10, 12, 17, 21): IC vs C significant for CH 7, 10 and 21, punc = 0.021, 0.006 and 0.041 (pFDR = 0.258, 0.148 and 0.325); the remaining significant effects concerned the comparisons against baseline.
Table 2
Statistical results of significant t-tests in Experiments 1–4 for deoxyHb: each condition (consistent, C; inconsistent, IC; no-contrast, N) tested against a zero baseline (O), and the inconsistent condition tested against the consistent one (IC vs C). punc: uncorrected p value; pFDR: p value corrected using the False Discovery Rate (FDR) procedure (Benjamini & Hochberg, 1995); CH: channel.
These results show that the newborn brain clearly distinguishes between different prosodic grouping patterns based on durational variation. In particular, we find greater activation in response to the trochaic long-short pattern, which is inconsistent with the iambic grouping bias for durational contrasts predicted by the ITL, and opposite to the prosodic grouping found in French (phrase-final lengthening). We interpret this increased activation as a sign of
greater processing effort associated with the perceptually dis-preferred and unfamiliar grouping (Ding, Fu, & Lee, 2014). These findings thus clearly establish the presence of prosodic grouping preferences at birth. However, because duration is an acoustic correlate that plays a major role in prosodic prominence in French, the present results alone do not allow us to distinguish between a general auditory bias and the effects of prenatal experience with the native language. Therefore, in Experiment 2, we tested whether newborns could discriminate between
ITL-consistent and ITL-inconsistent grouping patterns using an intensity contrast, a cue that plays a minor role in French prosody.

3. Experiment 2

3.1. Methods

3.1.1. Participants

A new group of eighteen healthy, full-term newborns (11 females; mean age: 2.28 days; range: 1–4 days; Apgar score ≥ 8) born to French-speaking families contributed data to the final analyses. An additional 8 newborns were tested, but were excluded from data analysis due to the infant becoming awake or fussy. All parents gave informed consent before the experiment. The present experiment was approved by the Conseil d'évaluation éthique pour les recherches en santé (CERES) ethics board (Université Paris Descartes).

3.1.2. Material and design

We used eight different pure tone pairs, whereby tones within a pair contrasted in intensity (70 vs. 76 dB), but were identical in their duration (either both 250 ms or both 500 ms) and pitch (e.g. both A, both B, both C, both D, both E, both F, both G or both AA). The tone pair sequences were presented in three conditions (see Fig. 1b). In the consistent condition, the pairs were grouped such that they followed the grouping pattern predicted by the ITL (strong-weak). In the inconsistent condition, they followed the non-predicted grouping pattern (weak-strong). In the no-contrast condition, no grouping cue was present (equal intensity). The design of Experiment 2 was similar to the one used in Experiment 1.

3.1.3. Procedure, data processing and analysis

The procedure, data processing and analysis were identical to Experiment 1.
3.2. Results and discussion

Fig. 6 presents the grand average results for oxyHb and deoxyHb concentration changes, averaged across all blocks for the consistent, inconsistent and no-contrast prosodic patterns. The permutation analysis revealed no significant clusters. Channel-by-channel comparisons between the oxyHb responses to the consistent and inconsistent conditions revealed a significantly greater response to the inconsistent condition in CH 3 (p < 0.05, d = 0.58; Table 1 for oxyHb and Table 2 for deoxyHb). We again performed an ANOVA with Grouping, ROI and Hemisphere as within-subject factors, using oxyHb concentrations as the dependent measure, similarly to Experiment 1. Since neither the permutation test nor the t-tests revealed any significant clusters or groups of channels that could be defined as the ROIs specific to this experiment, we used the ROIs of Experiment 1. This is not to argue that the same brain regions respond to duration- and intensity-based groupings. However, since no ROIs emerged from the data, the least arbitrary option was to rely on the ROIs of Experiment 1. As a confirmation of the permutation and t-tests, the ANOVA showed no significant Grouping effect; only the interaction between Grouping and ROI was found to be significant [F(1, 17) = 5.11, p = 0.037]. Post-hoc Bonferroni comparisons revealed a greater differential activation to the inconsistent over the consistent grouping in ROI 1 than in ROI 2 (M = 0.016 mmol·mm in ROI 1 vs. 0.006 mmol·mm in ROI 2). No effect was found for deoxyHb.

The present results thus reveal much weaker responses to the grouping patterns than those found in Experiment 1 for duration. We argue that this could be due to the relatively minor role of intensity as a cue in French prosody. We also note that in the one channel in which differential activation was found between the conditions, the inconsistent grouping gave rise to greater activation than the consistent one, similarly to Experiment 1, confirming our interpretation that this pattern of response might be attributed to greater processing effort.
Fig. 6. Grand average results of Experiment 2, monolingual newborns: Intensity. All graphical conventions are the same as those in Fig. 4.
Since recent proposals have extended the ITL to the pitch dimension (Abboub et al., 2016; Bhatara et al., 2015, 2013; Bion et al., 2011; de la Mora et al., 2013), Experiment 3 explored whether the newborn brain processes ITL-consistent and inconsistent grouping patterns when pitch is varied, pitch being, like intensity, a less important cue in French prosody than duration.
4. Experiment 3

4.1. Methods

4.1.1. Participants

Eighteen healthy, full-term newborns (12 females; mean age: 2.86 days; range: 2–5 days; Apgar score ≥ 8) born to French-speaking families contributed data to the final analyses. An additional 18 infants were tested, but were excluded from data analysis due to the infant becoming awake or fussy (9), having thick hair (4), failing to complete the procedure (2) or due to equipment failure (3). All parents gave informed consent before the experiment. The present experiment was approved by the Conseil d'évaluation éthique pour les recherches en santé (CERES) ethics board (Université Paris Descartes).

4.1.2. Material and design

We used eight different pure tone pairs, whereby tones within a pair contrasted in pitch (they were an octave apart) but were identical in their duration (either both 250 ms or both 500 ms) and intensity (both 70 dB). The tone pair sequences were presented in three conditions (see Fig. 1c). In the consistent condition, the pairs were grouped such that they followed the grouping pattern predicted by the ITL (high-low). In the inconsistent condition, they followed the non-predicted grouping pattern (low-high). In the no-contrast condition, no grouping cue was present (equal pitch). The design of Experiment 3 was similar to the one used in Experiments 1 and 2.

4.1.3. Procedure, data processing and analysis

The procedure, data processing and analysis were identical to Experiments 1 and 2.

4.2. Results and discussion

Fig. 7 presents the grand average results for oxyHb and deoxyHb concentration changes, averaged across all blocks for the consistent, inconsistent and no-contrast prosodic patterns. The permutation analysis revealed no significant clusters. The channel-by-channel t-tests comparing the consistent and inconsistent conditions yielded no significant results either (Table 1 for oxyHb and Table 2 for deoxyHb). We again performed an ANOVA with Grouping, Hemisphere and ROI as within-subject factors, using oxyHb concentration changes as the dependent measure, similarly to Experiment 1. As a confirmation of the previous analyses, the ANOVA showed no significant effects or interactions for either oxyHb or deoxyHb.

Overall, as for the intensity cue in Experiment 2, no differences between the grouping patterns were found. Again, this might be due to the lesser role of pitch in French prosody as compared to duration, and can be related to French-speaking adults' difficulty in using pitch to group sounds according to the ITL (Bhatara et al., 2013). Taken together, Experiments 1–3 suggest that infants already show prosodic grouping preferences as predicted by the ITL at birth. This preference is very strong for duration, the cue most important in the language our newborns were exposed to prenatally (French), possibly indicating an effect of prenatal language experience. Little to no effects were found for the two other cues, suggesting that automatic perceptual biases for prosodic
Fig. 7. Grand average results of Experiment 3, monolingual newborns: Pitch. All graphical conventions are the same as those in Fig. 4.
grouping might be strongly modulated by language exposure. Importantly, across all three experiments, greater activation, whenever present, was always found for the grouping that was inconsistent with the universal biases and/or native prosodic patterns, confirming the assumption that the processing of these patterns is more effortful than the processing of well-formed or familiar patterns.

To strengthen our interpretation that newborns' responses in the previous experiments reflect the modulatory effect of prenatal language experience, we obtained data from a group of newborns exposed to a different linguistic environment. Accordingly, in Experiment 4, we tested whether newborns exposed to French and another language processed the ITL-consistent and inconsistent sequences differently. We chose to test them on the pitch contrast, since the second languages of our bilingual participants relied strongly on this cue in their prosody. If prenatal exposure does indeed have a modulatory effect, the bilingual group should show a stronger bias for pitch than the French-exposed monolinguals in Experiment 3.

5. Experiment 4

5.1. Methods

5.1.1. Participants

Eighteen healthy, full-term newborns (12 females; mean age: 2.17 days; range: 1–4 days; Apgar score ≥ 8) born to bilingual families speaking French and one other language contributed data to the final analyses. An additional 24 infants were tested, but were excluded from data analysis due to the infant becoming awake or fussy (13), having thick hair (8), or due to equipment failure (3). All parents gave informed consent before the experiment. The present experiment was approved by the Conseil d'évaluation éthique pour les recherches en santé (CERES) ethics board (Université Paris Descartes).

The second languages spoken by the mothers of the bilingual infants were: Arabic (8), Chinese (3), English (1), Bulgarian (1), Kabyle (1), Koyaka (1), Amerindian (1), Turkish (1), and Polish (1). The mothers of the bilingual infants reported speaking each language 40–60% of the time, with a mean of 47% exposure to French. All these languages make use of pitch in their prosody to a much greater extent than French does, although in different ways (for lexical stress, lexical tone or phrasal prominence; see details below). These newborns were thus exposed in utero to a qualitatively different language input than those in Experiments 1–3: in addition to French, they also heard a language with a different prosody.

5.1.2. Material and design

The material was identical to the one used in Experiment 3.

5.1.3. Procedure, data processing and analysis

The procedure, data processing and analysis were identical to Experiment 3. Additionally, to directly test for the influence of prenatal language exposure, and since the stimuli and procedure were identical in both experiments, we compared the results of Experiment 4 to those of Experiment 3. Importantly, since NIRS data cannot be directly compared for between-subject factors, as the differential path length factor used in the Beer-Lambert law to calculate Hb concentrations is unknown and varies across individuals, we transformed signal changes in oxyHb to Z-scores using the Fisher Z transformation (e.g. Matsuda & Hiraki, 2006; Otsuka et al., 2007). The Z-scores were averaged across participants for each condition in both experiments. Using these Z-scores as the dependent variable, we performed a repeated measures ANOVA
with Grouping (Consistent vs. Inconsistent) and ROI as within-subject factors and Prenatal Exposure (Monolingual, Experiment 3 vs. Bilingual, Experiment 4) as a between-subject factor. Since there was no significant grouping effect found in Experiment 3, the ROIs were chosen on the basis of the permutation test results for Experiment 4.

5.2. Results and discussion

Fig. 8 presents the grand average results for oxyHb and deoxyHb concentration changes, averaged across all blocks for the consistent, inconsistent and no-contrast prosodic patterns. The permutation analysis for oxyHb revealed a greater hemodynamic response for the inconsistent condition than for the consistent condition in one cluster in the LH, comprising channels 7, 9 and 10 (cluster 1: t = 1040, p < 0.0001), and in another cluster in the RH, including channels 19 and 21 (cluster 2: t = 435, p < 0.0001) (Fig. 5; see Table 4 for statistics). For deoxyHb, no significant clusters could be identified. The t-tests (Table 1 for oxyHb and Table 2 for deoxyHb) comparing the consistent and inconsistent conditions showed significantly greater responses to the inconsistent condition with oxyHb in three channels (CH 7, 10 and 21: p < 0.05; d = 0.65, 0.60 and 0.60, respectively).

We again performed an ANOVA with Grouping, Hemisphere and ROI as within-subject factors, using oxyHb concentration changes as the dependent measure. The ROIs were chosen on the basis of the results of the permutation test above (ROI 1: CH 7, 9 and 10, plus their contralateral equivalents CH 18, 20 and 21; ROI 2: CH 19 and 21, plus their contralateral equivalents CH 6 and 9; see Fig. 8). These ROIs are not exactly identical to, but largely overlapping with, those in Experiment 1. As a confirmation of the previous analyses, the ANOVA revealed a significant grouping effect [F(1, 17) = 5.79, p = 0.028] due to greater activation in the inconsistent condition (M = 0.027 mmol·mm for inconsistent vs. 0.005 mmol·mm for consistent). We also found an effect of Hemisphere [F(1, 17) = 3.27, p = 0.088], due to greater activation in the LH than in the RH (M = 0.019 mmol·mm vs. 0.002 mmol·mm). The analysis also revealed a significant Grouping × Hemisphere × ROI interaction [F(1, 17) = 6.54, p = 0.020]. Post-hoc Bonferroni comparisons showed that the differential activation in favor of the inconsistent grouping was stronger in ROI 2 in the LH [F(1, 17) = 8.93, p = 0.008], but in ROI 1 in the RH [F(1, 17) = 6.13, p = 0.024]. No other effect was significant. No effect was found for deoxyHb.

The present results strongly resemble those found in Experiment 1 for the duration contrast in French-exposed monolinguals, suggesting that the bilingual infants tested here show a stable grouping bias for pitch, in particular in the LH. Note that the region in which the effect was observed was not exactly identical, but highly similar, to that observed in Experiment 1, with two channels in the LH (CH 9 and 10) and one in the RH (CH 19) overlapping between the two experiments. The effect may thus be localized to the superior temporal/temporo-parietal regions in both experiments. We attribute the slight difference between the two ROIs to the fact that capping is based on surface landmarks, rendering channel locations approximate. To directly test for the effects of prenatal exposure, we performed an ANOVA with Grouping (Consistent vs. Inconsistent) and ROI (ROI 1 vs. ROI 2, as above) as within-subject factors and Prenatal Exposure (Monolingual, Experiment 3 vs.
Bilingual, Experiment 4) as a between-subject factor, using the Z-scores of the oxyHb concentration changes as the dependent measure. This ANOVA showed a marginally significant Grouping × Prenatal Exposure interaction [F(1, 34) = 3.26; p = 0.08], with a Bonferroni post-hoc test yielding greater differential activation in bilinguals (Experiment 4) than in monolinguals
Fig. 8. Grand average results of Experiment 4, bilingual newborns: Pitch. All graphical conventions are the same as those in Fig. 4, with the exception of the ROIs. The continuous black line represents the significant clusters defined as ROI 1 (CH 7, 9 and 10) and ROI 2 (CH 19 and 21), while the dashed black line indicates their contralateral counterparts.
(Experiment 3) (M = 0.51 in bilinguals vs. M = 0.18 in monolinguals). Additionally, an interaction was found between Grouping × Hemisphere × ROI × Prenatal Exposure [F(1, 34) = 3.78, p = 0.020]. Post-hoc Bonferroni comparisons revealed that the differential activation in favor of the inconsistent grouping was stronger in the bilingual group in ROI 1 in the RH [F(1, 34) = 3.2, p = 0.083], but in ROI 2 in the LH [F(1, 34) = 5.28, p = 0.028]. No other effect was significant.

When tested on identical stimuli, bilinguals exposed to French and another language making extensive use of pitch are thus more sensitive to pitch-based prosodic grouping than monolinguals exposed only to French. These results also show that the pitch contrast used in Experiments 3 and 4 is perceptually sufficiently salient to elicit a grouping bias, so the weak responses found in the French monolingual newborns in Experiment 3 cannot be attributed to the properties of the stimuli.

Table 3
Summary of findings of the cluster-based analysis in Exp. 1 (see Fig. 5) for oxyHb.

Cluster   Channel number   Max cluster size (samples)   Time window (s)   p value
1         9                73                           20.6–27.8         <0.0001
1         10               137                          15–28.6
1         12               106                          18.2–28.7
2         17               162                          21.6–37.7         <0.0001
2         19               91                           22.3–36.8

Table 4
Summary of findings of the cluster-based analysis in Exp. 4 (see Fig. 5) for oxyHb.

Cluster   Channel number   Max cluster size (samples)   Time window (s)   p value
1         7                125                          10.7–27.7         <0.0001
1         9                46                           12.4–16.9
1         10               164                          12.6–28.9
2         19               70                           15.4–22.3         <0.0001
2         21               103                          10.4–20.6

6. General discussion
We conducted four NIRS experiments testing whether newborns from different language backgrounds have prosodic grouping preferences at birth. To our knowledge, the present study is the first to show prosodic grouping at such a young age and in infants from monolingual and bilingual backgrounds. It thus extends previous studies that had tested infants’ sensitivity to the Iambic-Trochaic Law (ITL) from about 6 months onwards and typically from monolingual language environments (Abboub et al., 2016; Bion et al., 2011; Hay & Saffran, 2012; Yoshida et al., 2010). Our data show greater brain activations to grouping patterns inconsistent with the ITL and/or with native language experience as compared to consistent groupings, establishing prosodic grouping at birth. In French-exposed monolinguals, a stable bias was found for the duration cue, whereas little to no differential response was found for the intensity and pitch cues. In Frenchother language-exposed bilinguals, a stable differential response was found for the pitch cue, the only cue we tested. Duration is the cue most typically used in French phrase level prosody, while pitch is used in the bilingual infants’ other languages. Therefore, our results suggest that while universal biases may be present at birth, as the weak effect found in response to the intensity cue in Experiment 2 might indicate, they are strongly influenced by prenatal exposure. Several aspects of our study merit further discussion. First, it needs to be considered how our bilingual participants’ other language and its reliance on pitch modulate these infants’ grouping biases. All infants in our bilingual group were receiving balanced
input between French and the other language, namely between 40% and 60% exposure to each of their two languages. Hence, while the group was heterogeneous with respect to what other language each infant heard, the infants were similar in that they had all received extensive exposure to a language that used pitch to a considerable extent in its prosodic structure. In some languages (Kabyle, Arabic, Polish, Bulgarian), pitch was used at the lexical level (Angoujard, 1986; Chaker, 1995; Dimitrova, Redeker, & Hoeks, 2009; Malisz, 2013), in others (Turkish) at the phrasal level (Nespor et al., 2008), in yet others (Chinese, Koyaga), as part of the lexical tonal system (Creissels & Gregoire, 1993; Lee, Chen, Luke, & Shen, 2002). Thus, it appears likely that hearing pitch contrasts before birth may have an effect on pitch-based grouping preferences and may lead to a stable bias at birth, similarly to the one observed in our French-exposed monolinguals for duration. These results add to previous work on newborns born to mothers speaking two different languages. These infants showed an equal preference and successful discrimination for both languages (ByersHeinlein et al., 2010). These findings can be taken as evidence that repeated exposure to certain prosodic features of speech before birth, such as pitch or duration, leads to the development or reinforcement of perceptual biases at birth. There exists an alternative, and not necessarily mutually exclusive, interpretation of our findings. Our bilingual participants might show a stronger grouping bias for pitch than monolinguals not (only) because their other language uses pitch as a prosodic cue, but (also) because being bilingual confers cognitive and perceptual flexibility and a later perceptual commitment to the native language(s). Indeed, recent observations highlight the early cognitive effects of bilingual exposure, bilinguals showing a number of cognitive and perceptual advantages compared to monolinguals, for instance better attention to perceptual details that distinguish spoken languages (Byers-Heinlein et al., 2010), talking faces (Sebas tián-Gallés, Albareda-Castellot, Weikum, & Werker, 2012), individual phonemes (Sundara, Polka, & Molnar, 2008) and stress patterns (Abboub, Bijeljac-Babic, Serres, & Nazzi, 2015; Bijeljac-Babic, Serres, Höhle, & Nazzi, 2012). They are also better than monolinguals at using context to determine which language is being used (Werker, 2012) and have enhanced flexibility to switch attention to and learn multiple regularities (Kovács & Mehler, 2009a, 2009b). It might thus be the case that our bilingual newborns were more flexible or advanced in language processing in general, or were paying more attention to subtle prosodic cues than our monolingual newborns. While further studies will be needed to dissociate the two previous hypotheses, our results nevertheless clearly demonstrate that differences in language exposure modulate perceptual grouping biases at birth, confirming the early prenatal impact of native language. A second and related issue concerns the developmental origin of the weak biases found in monolinguals for the two unfamiliar acoustic cues. Although we tested infants very early, i.e. at birth, well before neural commitment and perceptual attunement to the native language are fully completed, it is still possible to interpret the weakness of the biases we observed for the two unfamiliar acoustic cues in two alternative ways. 
On the one hand, it might be the case that at the very onset of hearing, biases for all three cues are equally strong and that those used in the language heard in utero are maintained, while the others weaken. Alternatively, it is possible that universal, language-independent biases are initially quite weak, and language experience selectively strengthens those that appear in the input. Subsequent studies with fetuses or preterm infants will be needed to help decide between the two alternatives. Third, we observed an at least partly language-dependent grouping bias at this young age for a non-linguistic stimulus material. This is of significance, as most previous behavioral studies
found language-specific grouping effects for linguistically complex materials in adults (Bhatara et al., 2013) and by 7 months of age in infants (Abboub et al., 2016; Bion et al., 2011; Molnar et al., 2014; Yoshida et al., 2010). Our results thus imply that linguistic experience with prosody might influence not only speech, but also general auditory perception – a point that further research will need to confirm. Fourth, we observed, mainly in Experiments 1 and 4, that inconsistent sequences elicited significant activation as compared to consistent sequences in the classical auditory temporal areas (Gervain, Macagno, et al., 2008; Gervain, Nespor, Mazuka, Horie, & Mehler, 2008; May et al., 2011) and in the parietal areas, both associated with prosodic processing (Kreitewolf et al., 2014; Skeide & Friederici, 2016). Further, the strongest differences between consistent and inconsistent grouping patterns, both in Experiment 1 and in Experiment 4, were most pronounced in the left temporo-parietal channels particularly in Experiment 4 (possibly involving the Superior Temporal Gyrus, Middle Temporal Gyrus and the premotor cortex). The processing of musical melody and speech prosody has often been found to be located in the RH both in infants and adults (Friederici, 2012; Homae, Watanabe, Nakano, Asakawa, & Taga, 2006; Kreitewolf et al., 2014; Wartenburger et al., 2007), which might appear at odds with our results at first sight. Importantly, however, what distinguishes consistent and inconsistent patterns in our study is not the presence or absence of prosody, as both sequences contain simple, basic prosodic structures, but rather the well-formedness of the consistent as opposed to the inconsistent patterns. These results suggest that independently of the specific acoustic cues (duration and pitch), well-formedness or consistency with a perceptual bias is processed similarly and in roughly the same brain area, although slight differences in channel activated might have resulted from slightly different probe place or slightly different neuronal networks involved in processing the two cues. In our experiments, this processing was bilateral, with a slight advantage for the LH, which is indeed consistent with the idea that newborns responded not simply to the acoustic or prosodic cues, but also to their sequential arrangement. Indeed, previous research shows that the left temporal cortex is sensitive to structural constraints very early in life for speech sounds (Gervain, Macagno, et al., 2008; Gomez et al., 2014) as well as for tone sequences, as used in the current study (Gervain, Berent, & Werker, 2010). Recent work with adults also supports this view (Kreitewolf et al., 2014; Skeide & Friederici, 2016). Investigating the processing of grammatical/structural vs. emotional prosody in speech-recognition vs. speaker-recognition tasks, a series of fMRI experiments with adults found that prosodic processing involves a complex network relying on right-hemisphere sensitivity to melody/rhythm as well as left-hemisphere sensitivity to grammar/structure. This lends support to the idea that the slight left-lateralization we observe is a signature not of prosodic processing per se, but of the differentiation between structurally well- and ill-formed prosodic sequences. In conclusion, our results are the first to demonstrate that at birth bilingual and monolingual infants show prosodic grouping preferences as proposed by the Iambic-Trochaic Law (Bion et al., 2011; Hayes, 1995). 
Furthermore, these biases converge with the prosody of the language(s) heard in utero, suggesting that prenatal experience already shapes speech perception and language learning. Such early perceptual biases could constitute powerful language acquisition tools, helping infants tune into the lexical and grammatical structures of their native languages, for example by allowing them to segment incoming speech into meaningful units such as words, or to learn the word order of their native language (Gervain, Nespor, et al., 2008; Gervain & Werker, 2013; Nespor et al., 2008).
Acknowledgements

We thank the parents who graciously agreed to have their infants participate in the study and all of the personnel of Hôpital Robert Debré, Paris, France. We also thank Maria Clemencia Ortiz Barajas for her help with the statistical analysis. This work was supported by LABEX EFL (ANR-10-LABX-0083) to NA, TN and JG, a Fyssen Foundation Startup Grant, an Emergence(s) Programme Grant from the City of Paris and a Human Frontier Science Program Young Investigator Grant (RGY-0073-2014) to JG.

References

Abboub, N., Bijeljac-Babic, R., Serres, J., & Nazzi, T. (2015). On the importance of being bilingual: Word stress processing in a context of segmental variability. Journal of Experimental Child Psychology, 132, 111–120.
Abboub, N., Boll-Avetisyan, N., Bhatara, A., Höhle, B., & Nazzi, T. (2016). An exploration of rhythmic grouping of speech sequences by French- and German-learning infants. Frontiers in Human Neuroscience, 10, 1–12. http://dx.doi.org/10.3389/FNHUM.2016.00292.
Angoujard, J. P. (1986). Les hiérarchies prosodiques en arabe. Revue Québécoise de Linguistique, 16(1), 11.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.
Bhatara, A., Boll-Avetisyan, N., Agus, T., Höhle, B., & Nazzi, T. (2015). Language experience affects grouping of musical instrument sounds. Cognitive Science. http://dx.doi.org/10.1111/cogs.12300.
Bhatara, A., Boll-Avetisyan, N., Unger, A., Nazzi, T., & Höhle, B. (2013). Native language affects rhythmic grouping of speech. The Journal of the Acoustical Society of America, 134(5), 3828–3843.
Bijeljac-Babic, R., Serres, J., Höhle, B., & Nazzi, T. (2012). Effect of bilingualism on lexical stress pattern discrimination in French-learning infants. PLoS One, 7(2), e30843.
Bion, R. A. H., Benavides-Varela, S., & Nespor, M. (2011). Acoustic markers of prominence influence infants' and adults' segmentation of speech sequences. Language and Speech, 54(1), 123–140.
Bolton, T. L. (1894). Rhythm. The American Journal of Psychology, 6(2), 145–238.
Byers-Heinlein, K., Burns, T. C., & Werker, J. F. (2010). The roots of bilingualism in newborns. Psychological Science, 21(3), 343–348.
Chaker, S. (1995). Données exploratoires en prosodie berbère: I. L'accent en kabyle. Comptes Rendus du GLECS, 31(I), 27–54.
Cheour-Luhtanen, M., Alho, K., Kujala, T., Sainio, K., Reinikainen, K., Renlund, M., ... Näätänen, R. (1995). Mismatch negativity indicates vowel discrimination in newborns. Hearing Research, 82(1), 53–58.
Christophe, A., Dupoux, E., Bertoncini, J., & Mehler, J. (1994). Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of America, 95, 1570–1580.
Christophe, A., Gout, A., Peperkamp, S., & Morgan, J. L. (2003). Discovering words in the continuous speech stream: The role of prosody. Journal of Phonetics, 31(3–4), 585–598.
Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. Cambridge, MA: MIT Press. Retrieved from https://books.google.com/books?id=rDKkAgAAQBAJ&pgis=1.
Cooper, G., & Meyer, L. (1960). The rhythmic structure of music. Chicago: University of Chicago Press.
Creissels, D., & Gregoire, C. (1993). La notion de ton marqué dans l'analyse d'une opposition tonale binaire: le cas du mandingue. Journal of African Languages and Linguistics, 14(2), 107–154.
de la Mora, D. M., Nespor, M., & Toro, J. M. (2013). Do humans and nonhuman animals share the grouping principles of the iambic-trochaic law? Attention, Perception & Psychophysics, 75, 92–100.
DeCasper, A. J. (1994). Fetal reactions to recurrent maternal speech. Infant Behavior and Development, 17(2), 159–164.
DeCasper, A. J., & Fifer, W. P. (1980). Of human bonding: Newborns prefer their mothers' voices. Science, 208, 1174–1176.
DeCasper, A. J., & Spence, M. J. (1986). Prenatal maternal speech influences newborns' perception of speech sounds. Infant Behavior and Development, 9, 133–150.
Dehaene-Lambertz, G., Dehaene, S., & Hertz-Pannier, L. (2002). Functional neuroimaging of speech perception in infants. Science, 298(5600), 2013–2015.
Dell, F., Hirst, D., & Vergnaud, J.-R. (1984). Forme sonore du langage: Structure des représentations en phonologie (pp. 65–122). Paris: Hermann.
Dimitrova, D. V., Redeker, G., & Hoeks, J. C. J. (2009). Did you say a BLUE banana? The prosody of contrast and abnormality in Bulgarian and Dutch. In INTERSPEECH (pp. 999–1002).
Ding, X. P., Fu, G., & Lee, K. (2014). Neural correlates of own- and other-race face recognition in children: A functional near-infrared spectroscopy study. NeuroImage, 85, 335–344.
Edwards, L. A., Wagner, J. B., Simon, C. E., & Hyde, D. C. (2015). Functional brain organization for number processing in pre-verbal infants. Developmental Science, 1–13. http://dx.doi.org/10.1111/desc.12333.
Fernald, A., & Kuhl, P. (1987). Acoustic determinants of infant preference for motherese speech. Infant Behavior and Development, 10, 279–293.
Ferry, A. L., Fló, A., Brusini, P., Cattarossi, L., Macagno, F., Nespor, M., & Mehler, J. (2016). On the edge of language acquisition: Inherent constraints on encoding multisyllabic sequences in the neonate brain. Developmental Science, 19, 488–503. http://dx.doi.org/10.1111/desc.12323.
Friederici, A. D. (2012). The cortical language circuit: From auditory perception to sentence comprehension. Trends in Cognitive Sciences, 16(5), 262–268.
Gandour, J., Tong, Y., Wong, D., Talavage, T., Dzemidzic, M., Xu, Y., ... Lowe, M. (2004). Hemispheric roles in the perception of speech prosody. NeuroImage, 23(1), 344–357.
Gerhardt, K. J., Otto, R., Abrams, R. M., Colle, J. J., Burchfield, D. J., & Peters, A. J. (1992). Cochlear microphonics recorded from fetal and newborn sheep. American Journal of Otolaryngology, 13(4), 226–233.
Gervain, J., Berent, I., & Werker, J. F. (2012). Binding at birth: The newborn brain detects identity relations and sequential position in speech. Journal of Cognitive Neuroscience, 24(3), 564–574.
Gervain, J., Berent, I., & Werker, J. F. (2010). The encoding of identity and sequential position in newborns: An optical imaging study. Talk presented at the 17th ICIS, Mar 10–14, 2010, Baltimore, USA.
Gervain, J., Macagno, F., Cogoi, S., Peña, M., & Mehler, J. (2008). The neonate brain detects speech structure. Proceedings of the National Academy of Sciences of the United States of America, 105(37), 14222–14227.
Gervain, J., Nespor, M., Mazuka, R., Horie, R., & Mehler, J. (2008). Bootstrapping word order in prelexical infants: A Japanese-Italian cross-linguistic study. Cognitive Psychology, 57(1), 56–74.
Gervain, J., & Werker, J. F. (2013). Prosody cues word order in 7-month-old bilingual infants. Nature Communications, 4, 1490.
Gomez, D. M., Berent, I., Benavides-Varela, S., Bion, R. A. H., Cattarossi, L., Nespor, M., & Mehler, J. (2014). Language universals at birth. Proceedings of the National Academy of Sciences (21), 1–5.
Granier-Deferre, C., Bassereau, S., Ribeiro, A., Jacquet, A.-Y., & DeCasper, A. J. (2011). A melodic contour repeatedly experienced by human near-term fetuses elicits a profound cardiac reaction one month after birth. PLoS One, 6(2), e17304.
Hay, J. F., & Diehl, R. L. (2007). Perception of rhythmic grouping: Testing the iambic/trochaic law. Perception & Psychophysics, 69(1), 113–122.
Hay, J. F., & Saffran, J. R. (2012). Rhythmic grouping biases constrain infant statistical learning. Infancy, 17(6), 610–641. http://dx.doi.org/10.1111/j.1532-7078.2011.00110.x.
Hayes, B. (1995). Metrical stress theory: Principles and case studies. Chicago, IL: University of Chicago Press.
Hepper, P. G., & Shahidullah, B. S. (1994). Development of fetal hearing. Archives of Disease in Childhood, 71, 81–87.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402.
Hirsh-Pasek, K., Nelson, D. G. K., Jusczyk, P. W., Druss, K., Wright, C. B., & Kennedy, L. (1987). Clauses are perceptual units for young infants. Cognition, 26, 269–286.
Homae, F., Watanabe, H., Nakano, T., Asakawa, K., & Taga, G. (2006). The right hemisphere of sleeping infant perceives sentential prosody. Neuroscience Research, 54(4), 276–280.
Homae, F., Watanabe, H., Nakano, T., & Taga, G. (2007). Prosodic processing in the developing brain. Neuroscience Research, 59(1), 29–39.
Iversen, J. R., Patel, A. D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. The Journal of the Acoustical Society of America, 124(4), 2263–2271.
Johnson, E. K., & Jusczyk, P. W. (2001). Word segmentation by 8-month-olds: When speech cues count more than statistics. Journal of Memory and Language, 44(4), 548–567.
Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39, 159–207.
Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmentation: ERP evidence from Dutch ten-month-olds. Infancy, 14(6), 591–612.
Kovács, A. M., & Mehler, J. (2009a). Cognitive gains in 7-month-old bilingual infants. Proceedings of the National Academy of Sciences of the United States of America, 106(16), 6556–6560.
Kovács, A. M., & Mehler, J. (2009b). Flexible learning of multiple speech structures in bilingual infants. Science, 325(5940), 611–612.
Kreitewolf, J., Friederici, A. D., & von Kriegstein, K. (2014). Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition. NeuroImage, 102, 332–344.
Lee, W., Chen, F., Luke, K. K., & Shen, L. (2002). The prosody of bisyllabic and polysyllabic words in Hong Kong Cantonese. In Proc. SP-2002 (pp. 451–454). Aix-en-Provence.
Lloyd-Fox, S., Richards, J. E., Blasi, A., Murphy, D. G. M., Elwell, C. E., & Johnson, M. H. (2014). Coregistering functional near-infrared spectroscopy with underlying cortical areas in infants. Neurophotonics, 1(2), 025006. http://dx.doi.org/10.1117/1.NPh.1.2.025006.
Mahmoudzadeh, M., Dehaene-Lambertz, G., Fournier, M., Kongolo, G., Goudjil, S., Dubois, J., ... Wallois, F. (2013). Syllabic discrimination in premature human infants prior to complete formation of cortical layers. Proceedings of the National Academy of Sciences of the United States of America, 110(12), 4846–4851. http://dx.doi.org/10.1073/pnas.1212220110.
Malisz, Z. (2013). Speech rhythm variability in Polish and English: A study of interaction between rhythmic levels.
Mampe, B., Friederici, A. D., Christophe, A., & Wermke, K. (2009). Newborns' cry melody is shaped by their native language. Current Biology, 19(23), 1994–1997.
Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190. http://dx.doi.org/10.1016/j.jneumeth.2007.03.024.
Matsuda, G., & Hiraki, K. (2006). Sustained decrease in oxygenated hemoglobin during video games in the dorsal prefrontal cortex: A NIRS study of children. NeuroImage, 29(3), 706–711.
Mattys, S. L., Jusczyk, P. W., Luce, P. A., & Morgan, J. L. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology, 38, 465–494.
May, L., Byers-Heinlein, K., Gervain, J., & Werker, J. F. (2011). Language and the newborn brain: Does prenatal language experience shape the neonate neural response to speech? Frontiers in Psychology, 2, 222.
Mehler, J., Jusczyk, P. W., Lambertz, G., Halsted, N., Bertoncini, J., & Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition, 29, 143–178.
Molnar, M., Lallier, M., & Carreiras, M. (2014). The amount of language exposure determines nonlinguistic tone grouping biases in infants from a bilingual environment. Language Learning, 64(s2), 45–64.
Moon, C., Panneton Cooper, R., & Fifer, W. P. (1993). Two-day-olds prefer their native language. Infant Behavior & Development, 16, 495–500.
Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination by newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance, 24(3), 756–766.
Nazzi, T., Floccia, C., & Bertoncini, J. (1998). Discrimination of pitch contours by neonates. Infant Behavior & Development, 21(4), 779–784.
Nazzi, T., Iakimova, G., Bertoncini, J., Fredonie, S., & Alcantara, C. (2006). Early segmentation of fluent speech by infants acquiring French: Emerging evidence for crosslinguistic differences. Journal of Memory and Language, 54(3), 283–299.
Nazzi, T., & Ramus, F. (2003). Perception and acquisition of linguistic rhythm by infants. Speech Communication, 41(1), 233–243.
Nespor, M., Shukla, M., van de Vijver, R., Avesani, C., Schraudolf, H., & Donati, C. (2008). Different phrasal prominence realizations in VO and OV languages. Lingue e Linguaggio, 2, 1–29. http://dx.doi.org/10.1418/28093.
Nishibayashi, L.-L., Goyet, L., & Nazzi, T. (2015). Early speech segmentation in French-learning infants: Monosyllabic words versus embedded syllables. Language and Speech, 1–17.
Otsuka, Y., Nakato, E., Kanazawa, S., Yamaguchi, M. K., Watanabe, S., & Kakigi, R. (2007). Neural activation to upright and inverted faces in infants measured by near infrared spectroscopy. NeuroImage, 34, 399–406.
Peña, M., Maki, A., Kovačić, D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, F., & Mehler, J. (2003). Sounds and silence: An optical topography study of language recognition at birth. Proceedings of the National Academy of Sciences of the United States of America, 100(20), 11702–11705.
Querleu, D., Renard, X., Versyp, F., Paris-Delrue, L., & Crèpin, G. (1988). Fetal hearing. European Journal of Obstetrics and Gynecology and Reproductive Biology, 29(1), 191–212.
Sansavini, A., Bertoncini, J., & Giovanelli, G. (1997). Newborns discriminate the rhythm of multisyllabic stressed words. Developmental Psychology, 33(1), 3–11.
Sato, Y., Sogabe, Y., & Mazuka, R. (2007). Brain responses in the processing of lexical pitch-accent by Japanese speakers. NeuroReport, 18(18), 2001–2004.
Sato, Y., Sogabe, Y., & Mazuka, R. (2010). Development of hemispheric specialization for lexical pitch-accent in Japanese infants. Journal of Cognitive Neuroscience, 22(11), 2503–2513.
Sebastián-Gallés, N., Albareda-Castellot, B., Weikum, W. M., & Werker, J. F. (2012). A bilingual advantage in visual language discrimination in infancy. Psychological Science, 23(9), 994–999.
Shi, R., Werker, J. F., & Morgan, J. L. (1999). Newborn infants' sensitivity to perceptual cues to lexical and grammatical words. Cognition, 72(2), B11–B21.
Shi, F., Yap, P. T., Wu, G., Jia, H., Gilmore, J. H., Lin, W., & Shen, D. (2011). Infant brain atlases from neonates to 1- and 2-year-olds. PLoS One, 6(4). http://dx.doi.org/10.1371/journal.pone.0018746.
Skeide, M. A., & Friederici, A. D. (2016). The ontogeny of the cortical language network. Nature Reviews Neuroscience, 17(5), 323–332. http://dx.doi.org/10.1038/nrn.2016.23.
Sundara, M., Polka, L., & Molnar, M. (2008). Development of coronal stop perception: Bilingual infants keep pace with their monolingual peers. Cognition, 108(1), 232–242.
Telkemeyer, S., Rossi, S., Koch, S. P., Nierhaus, T., Steinbrink, J., Poeppel, D., ... Wartenburger, I. (2009). Sensitivity of newborn auditory cortex to the temporal structure of sounds. The Journal of Neuroscience, 29(47), 14726–14733.
Vannasing, P., González-Frankenberger, B., Florea, O., Tremblay, J., Paquette, N., Safi, D., ... Gallagher, A. (2016). Distinct hemispheric specializations for native and non-native languages in one-day-old newborns identified by fNIRS. Neuropsychologia, 84, 63–69. http://dx.doi.org/10.1016/j.neuropsychologia.2016.01.038.
Vouloumanos, A., & Werker, J. F. (2007). Listening to language at birth: Evidence for a bias for speech in neonates. Developmental Science, 10(2), 159–164. http://dx.doi.org/10.1111/j.1467-7687.2007.00549.x.
Wartenburger, I., Steinbrink, J., Telkemeyer, S., Friedrich, M., Friederici, A. D., & Obrig, H. (2007). The processing of prosody: Evidence of interhemispheric specialization at the age of four. NeuroImage, 34(1), 416–425.
Werker, J. F. (2012). Perceptual foundations of bilingual acquisition in infancy. Annals of the New York Academy of Sciences, 1251, 50–61.
Werker, J. F., & Gervain, J. (2013). Speech perception in infancy: A foundation for language acquisition. In P. Zelazo (Ed.), The Oxford handbook of developmental psychology (pp. 909–925). Oxford University Press.
Woodrow, H. (1951). Time perception. In S. Stevens (Ed.), Handbook of experimental psychology (pp. 1224–1236). New York: Wiley.
Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese-English cross-linguistic study. Cognition, 115(2), 356–361.