Journal of Memory and Language 62 (2010) 204–225
Prosodic disambiguation in child-directed speech
Vera Kempe a,*, Sonja Schaeffler b, John C. Thoresen c
a University of Abertay, Division of Psychology, Dundee DD1 1HG, United Kingdom
b Speech Science Research Centre, Queen Margaret University, Edinburgh, EH21 6UU, United Kingdom
c Department of Psychology, University of Durham, Durham DH1 3LE, United Kingdom
Article info
Article history: Received 14 July 2009; revision received 10 November 2009; available online 14 December 2009
Keywords: Prosodic disambiguation; Child-directed speech; Linguistic prosody; Affective prosody
Abstract
The study examines whether speakers exaggerate prosodic cues to syntactic structure when addressing young children. In four experiments, 72 mothers and 48 non-mothers addressed either real 2–4-year-old or imaginary children as well as adult confederates using syntactically ambiguous sentences like Touch the cat with the spoon, intending to convey either an instrument (high attachment) or a modifier (low attachment) interpretation. Mothers produced longer segments and pauses in child-directed speech (CDS) compared to adult-directed speech (ADS). However, in CDS, mothers lengthened post-nominal pauses in both the instrument and the modifier sentences to a similar extent, thereby failing to disambiguate between the two interpretations. In contrast, non-mothers provided reliable prosodic disambiguation cues in CDS by producing post-nominal pauses that were longer in instrument than modifier sentences. Experiment 5, using ratings from 50 participants, determined that expressed positive affect was higher in the CDS of mothers than of non-mothers. Negative correlations between vocal affect and degree of prosodic disambiguation in CDS compared to ADS suggest that there may be a trade-off between affective and linguistic prosody such that greater dominance of affective prosody may limit the informativeness of prosodic cues as markers of syntactic structure.
© 2009 Elsevier Inc. All rights reserved.
* Corresponding author. E-mail address: [email protected] (V. Kempe).
0749-596X/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.jml.2009.11.006
Introduction
Young children receive a large part of their language input in the form of child-directed speech (CDS), a speech register that is characterised by exaggerated prosody as well as lexical and morpho-syntactic simplification (e.g. Ferguson, 1977; Fernald et al., 1989). When addressing small children, adults tend to raise their pitch, widen their pitch range and reduce their speech rate (Fernald et al., 1989). It has been suggested that CDS prosody is a manifestation of universal caretaking behaviors such as regulating the child’s arousal, eliciting their attention, soothing and calming, signalling approval or disapproval and communicating positive affect (Fernald, 1989, 1992; Fernald & Simon, 1984; Lieberman, 1996; Locke, 2001; Singh, Morgan, & Best, 2002; Trainor, Austin, & Desjardins, 2000; Werker & McLeod, 1989). CDS prosody has also been credited with beneficial effects on language acquisition (Fisher & Tokura, 1996b). There are several ways in which CDS prosody may facilitate language learning. Recent analyses of speech directed to pre-linguistic infants have shown that CDS contains prosodic information that may help in distinguishing para-linguistic from linguistic input by placing prosodic boundaries between para-linguistic expressions (gasps, laughs, phatic expressions like oh) and informative speech (Soderstrom, Blossom, Foygel, & Morgan, 2008). Prosody may also help in segmenting words out of the speech stream by prosodically isolating them in 7–15% of instances (Brent & Siskind, 2001; Soderstrom et al., 2008). Somewhat more controversial is the issue as to whether CDS contains prosodic cues that facilitate the discovery of syntactic structure (Broen, 1972; Dale, 1974; Seidl, 2007).
While it is well established that small infants, probably even newborns, are sensitive to natural speech prosody (Sambeth, Ruohio, Alku, Fellman, & Huotilainen, 2008) and are able to detect misalignment between prosody and syntax (e.g. Christophe, Nespor, Guasti, & Van Ooyen, 2003; Hirsh-Pasek et al., 1987; Jusczyk et al., 1992; Seidl, 2007; Soderstrom, Seidl, Kemler Nelson, & Jusczyk, 2003), the issue as to how reliably CDS does contain such cues is far from clear. On the one hand, Fernald and McRoberts (1996) suggested three reasons for why CDS may not contain sufficiently reliable prosodic cues to syntax: firstly, prosodic cues like pauses, pitch modifications and segment lengthening tend to mark utterances, rather than clauses. Secondly, these utterances tend to be both non-clausal as well as clausal in nature, and even if syntactic clauses are present, in CDS they tend to be of non-canonical form such as in imperatives and interrogatives. Thirdly, there is little evidence for prosodic marking of syntactic structure below the clause level, which would be necessary if prosody is to aid the discovery of structures at the level of the phrase. These features may render CDS less, rather than more informative with respect to the underlying syntactic structure (Fernald & McRoberts, 1996). On the other hand, many researchers agree that even in adult-directed speech (ADS), prosodic and syntactic breaks do show sufficient alignment for prosodic cues to be informative (Ferreira, 1993; Nespor & Vogel, 1986; Price, Ostendorf, Shattuck-Hufnagel, & Fong, 1991; Selkirk, 1984; Shattuck-Hufnagel & Turk, 1996) although the degree of alignment depends on a host of factors related to the speaker, the communicative situation as well as to linguistic structure and content (Allbritton, McCoon, & Ratcliff, 1996; Kraljic & Brennan, 2005; Milotte, Wales, & Christophe, 2007; Schober & Brennan, 2003; Snedeker & Trueswell, 2003; Watson, Breen, & Gibson, 2006). 
Consequently, children could use prosody to bootstrap themselves into syntax regardless of whether prosodic cues are more prominent in CDS than in ADS. The logic of the Prosodic Bootstrapping Hypothesis (Morgan & Demuth, 1996) per se does not depend on greater prosodic clarity and greater alignment between prosody and syntax in CDS than in ADS. Still, there seems to be an implicit assumption that CDS is a richer source of linguistic information for children (Burnham, Kitamura, & Vollmer-Conna, 2002; Fernald, 1992). In order to better understand the role of caretaker–child interaction in language development it is important to find out whether prosodic features are indeed more pronounced and more reliable in CDS than in ADS.
Only a few studies have directly examined the reliability of prosodic cues to syntactic structure in CDS. For English and Japanese CDS addressed to 14-month-old infants, Fisher and Tokura (1996a) demonstrated the existence of reliable and prominent prosodic breaks at utterance and clause boundaries, which are characterised by longer pauses and vowels, increased amplitude before the boundary as well as increased changes in fundamental frequency across the boundary. However, only some prosodic cues tended to mark boundaries below the clause level. In English, only syllable length was a reliable indicator of boundaries between subject and verb phrases, while in Japanese such boundaries were indicated by pitch lowering. More
recently, Soderstrom et al. (2008) have shown that, for two English speaking mothers addressing pre-linguistic infants at 9 months of age, boundaries between subjects and verb phrases were prosodically marked in questions but not in declarative sentences. This may be a consequence of the frequent use of pronouns in CDS declarative sentences which tend to be prosodically grouped with the subsequent verb phrase. Interestingly, in questions, reliable prosodic cues differed between the two mothers: one mother relied more on differences in intonation and intensity across the boundary while the other mother also produced a durational cue, pre-boundary vowel lengthening (Soderstrom et al., 2008). Thus, while infants of this age certainly are sensitive to prosodic phrase-level boundary cues (Soderstrom et al., 2003), the presence of these cues in CDS seems to vary cross-culturally and interindividually. Fernald (1992) suggested that the function of infant-directed prosody is predominantly emotional and social which may render it not to be well suited to mark syntactic structure. One would expect that the reliability and salience of prosodic cues in general, and phrase-level boundary cues in particular, should increase as CDS becomes more complex (Huttenlocher, Vasilyeva, Waterfall, Vevea, & Hedges, 2007) and increasingly serves to support linguistic and cognitive development of the child. To our knowledge, there are no studies which have examined the prosodic structure of CDS addressed to linguistically more sophisticated children past 18 months of age. The present study aims to fill this gap. The prosodic structure of CDS can be examined in two ways. One way is to collect observational data by recording CDS in naturalistic settings, to code the syntactic structure and to measure the acoustic correlates of prosodic cues at clause and phrase-level boundaries as in the aforementioned studies of infant-directed speech prosody. 
While ecologically valid, this methodology often has other shortcomings such as limited generalizability due to small sample sizes (for a naturalistic study that has overcome this limitation see Huttenlocher et al. (2007)) and restricted coding reliability (Soderstrom et al., 2008). Moreover, explicitly testing the implicit assumption that CDS may contain more informative prosody than ADS would require comparing the prosody of naturalistic CDS with ADS of the same speakers. In the absence of such comparisons it remains unclear to what extent prosodic clarity is indeed a feature of CDS rather than a characteristic of the unique speaking style of the observed speakers. The alternative, pursued in the present study, is to elicit and directly compare ADS and CDS in situations in which clarifying the syntactic structure of an utterance is of communicative importance, as in the case of syntactically ambiguous sentences. Controlled elicitation of syntactically ambiguous sentences permits the testing of a larger number of speakers, while eliminating reliability issues associated with coding the intended syntactic structure in natural speech.
Prosodic disambiguation of syntactically ambiguous sentences
Lexical and syntactic ambiguities can often be resolved using prosodic information. For example, a sentence like
Touch the cat with the spoon contains a prepositional phrase that can be attached either high, i.e. to the verb phrase (touch) or low, i.e. to the direct object noun (the cat). Out of context, or in an ambiguous context, the phrase with the spoon can be taken either as instrument of the action, or as a modifier of the direct object. Speakers can employ prosodic cues to disambiguate between these two attachments. For example, if an instrument interpretation (high attachment) is intended, many accounts would predict that the sentence is divided into two intonational phrases (Touch the cat // with the spoon), the boundary between which is marked by a prosodic break after the direct object noun (Clifton, Carlson, & Frazier, 2002; Cooper & Paccia-Cooper, 1980; Price et al., 1991; Selkirk, 1984; Snedeker & Trueswell, 2003; Snedeker & Yuan, 2008; Watson & Gibson, 2004). This prosodic break can be indicated through stressing and lengthening the pre-boundary noun, through including a post-nominal pause, and through stressing the preposition with as in (1a). Stress is manifested by a combination of higher pitch and amplitude as well as vowel lengthening. Conversely, prosodic marking of the modifier interpretation (low attachment) would involve including the NP (the cat) and the PP (with the spoon) into one intonational phrase resulting in a prosodic break after the verb touch, indicated by stressing and lengthening the verb and including a post-verbal pause as in (1b), but no prosodic break after the direct object noun.
(1a) Touch the ca:t . . . WITH the spoon.
(1b) Tou:ch . . . the cat with the spoon.
There is considerable controversy as to how reliably speakers in adult–adult communication use prosodic cues to mark phrase boundaries in order to disambiguate such syntactically ambiguous sentences. Snedeker and Trueswell (2003) showed that speakers produce disambiguating prosodic cues only when they are aware of an ambiguity in the referential context (i.e.
if there are two spoons and two cats present with one cat holding one of the spoons). If the referential context is unambiguous (i.e. if there is only one cat) speakers tend to follow a more parsimonious strategy that does not entail providing redundant prosodic cues. On the other hand, Kraljic and Brennan (2005) demonstrated that, in an interactive situation, speakers use prosodic disambiguation regardless of whether the context is ambiguous or not, i.e. regardless of the addressee’s needs. In addition to methodological differences between the studies that may be responsible for this discrepancy such as differences in utterance length, dialogue interactivity, use of routine prosodic patterns and prosodic affordances of the presented sentences (Milotte et al., 2007), it has also been suggested that speaker variables such as a speaker’s cooperativeness can influence the use of prosodic disambiguation cues in ambiguous contexts (Schober & Brennan, 2003). Assuming that speakers are more cooperative when addressing small children, one would predict that they produce clearer prosodic disambiguation cues in CDS. In the present study, we used a modified version of the paradigm employed by Snedeker and Trueswell (2003). Adult female speakers were presented with syntactically
ambiguous sentences of the type Touch the cat with the spoon, and asked to instruct listeners to perform depicted actions that corresponded to either the instrument or the modifier interpretation. We made a few modifications to the original design:
(a) The original study used a number of different sentences with the same structure but different verbs and nouns. Because our procedure involved interaction with a small child, we had to keep the duration of the procedure short enough to maintain the children’s attention throughout. We therefore used only one sentence template containing the verb touch, which was deemed to signify an easy action for small children to perform.
(b) In our design, half of the syntactically ambiguous sentences could readily be disambiguated by the context whereas the other half was presented with ambiguous referential context. This manipulation, presented in different experiments by Snedeker and Trueswell (2003), was included to see whether speakers engage in audience design when addressing a small child.
Below, we will report on a series of experiments comparing the salience of prosodic cues to the intended syntactic structure (high attachment/instrument interpretation vs. low attachment/modifier interpretation) between CDS and ADS for different groups of speakers and addressees. Because conveying affect is one of the attested functions of CDS (Burnham et al., 2002; Fernald, 1992), and because genuine affect expression may alter speech prosody, we controlled for the degree of affective relationship between adult and child by comparing mothers, who have a strong emotional bond with their child, with non-mothers, who presumably do not experience such a bond when interacting with unfamiliar children. Furthermore, because the presence of a child addressee in referential communication tasks may impose additional demands, we compared speech directed towards real children with speech directed towards imaginary children.
The first experiment compared the informativeness of prosodic disambiguation cues in ADS and CDS by testing mothers addressing an adult confederate as well as their young child to examine whether there is clearer prosodic disambiguation of syntactically ambiguous sentences in CDS, at least when referential context is ambiguous.
Experiment 1a: mothers addressing their child and an adult
Method
Participants
Twenty-four mothers, all native speakers of English, mean age 35 years, range 23–46 years, and their children, mean age 2.7 years, range 1;9–3;9 years, were recruited from University playgroups, and received reimbursement of GBP 10.00.
Materials
We constructed two arrays of toy objects to provide a referential context for the sentences (Fig. 1). Each array contained two exemplars of the referent for the direct object of the sentence (the cat vs. the dog), one of which was
Fig. 1. Arrays of toys providing ambiguous and unambiguous context for the syntactically ambiguous sentences Touch the cat with the spoon, Touch the frog with the spoon, Touch the duck with the flower (left panel) and Touch the dog with the flower, Touch the fish with the flower, Touch the horse with the spoon (right panel).
holding a small object (the spoon vs. the flower). The arrays also contained a larger version of this object which could serve as an instrument with which to touch another toy. In addition, the arrays contained two other toy animals, one of which was also holding a small object. This allowed us to create sentences with prepositional phrase (PP) attachments that were syntactically and contextually ambiguous as in (1a) and (1b), and sentences with PP attachments that were syntactically ambiguous but were readily disambiguated by the referential context as in (2a–d). In addition, we created three filler sentences for each array which were either simple transitive sentences or sentences with a conjunctive transitive object (see ‘‘Appendix A”). Thus, there were always two critical sentences per array, one with ambiguous context and one with unambiguous context, resulting in four critical sentences in total.
(1a) Touch the cat with the spoon.
(1b) Touch the dog with the flower.
(2a) Touch the fish with the flower (instrument interpretation – high attachment).
(2b) Touch the duck with the flower (modifier interpretation – low attachment).
(2c) Touch the frog with the spoon (instrument interpretation – high attachment).
(2d) Touch the horse with the spoon (modifier interpretation – low attachment).
Using the toy objects, we created color photographs of the intended actions which the mothers were asked to describe with the target sentences (see Fig. 2). For the instrument interpretation, the photographs showed a hand holding the instrument (e.g. the spoon) touching the designated toy (e.g. the cat that does not hold the little spoon). For the modifier interpretation, the photographs showed a hand touching the designated toy (e.g. a hand touching the cat holding the little spoon). The intended actions for the sentences with unambiguous referential context and for the filler sentences were depicted accordingly. The photographs were color-printed on portrait A4 paper with the corresponding target sentence placed in 28 point font underneath. In order to counterbalance the modifier and instrument interpretations for each target sentence across participants, all photographs and sentences were assembled into two booklets which differed only in the intended action of the ambiguous sentences. Thus, if in Booklet 1 a critical sentence was presented with a depicted action corresponding to an instrument interpretation, then in Booklet 2 it was presented with a modifier interpretation, and vice versa. Order of presentation of the sentences was fixed (see ‘‘Appendix A”) to make sure
Fig. 2. Examples of depicted actions corresponding to the instrument (left panel) and modifier (right panel) interpretations for the syntactically ambiguous sentences, here given for the sentence Touch the cat with the spoon.
that the critical sentences were always followed by filler sentences.
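The counterbalancing of interpretations across the two booklets can be sketched as follows. This is a minimal illustration and not the authors' materials: the names `Page`, `flip` and `make_booklet2` are hypothetical, the assignment of interpretations in Booklet 1 is invented for the example, and only the two contextually ambiguous sentences (1a) and (1b) are modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Page:
    sentence: str
    interpretation: str  # "instrument" or "modifier"

def flip(interpretation):
    """Return the opposite attachment interpretation."""
    return {"instrument": "modifier", "modifier": "instrument"}[interpretation]

def make_booklet2(booklet1):
    """Booklet 2 shows the same sentences in the same order,
    each paired with the opposite depicted action."""
    return [Page(p.sentence, flip(p.interpretation)) for p in booklet1]

# Hypothetical Booklet 1 assignment (assumed, not taken from the paper).
booklet1 = [
    Page("Touch the cat with the spoon.", "instrument"),
    Page("Touch the dog with the flower.", "modifier"),
]
booklet2 = make_booklet2(booklet1)
```

Across the two booklets each sentence is paired once with each interpretation, so assigning half of the speakers to each booklet balances interpretation over sentences.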
Procedure The mothers were told that they were taking part in a study testing a game to be used for another experiment, and that the aim was to see from what age on children could follow instructions specified for this game. Debriefing after the experiment suggested that, as intended, the mothers had remained unaware of the fact that it was their speech that was under scrutiny in this study. The mothers were seated in front of the array of toys placed next to each other and were given the booklet with the depicted target actions and the instruction sentences printed underneath. In the CDS condition, the children were seated on their mother’s lap in front of the toys. The booklet was placed next to the mother but out of the view of the children. Seating the children on the mothers’ lap created a relaxed atmosphere, and at the same time prevented the use of non-verbal cues like gestures or gaze to aid the children in performing the actions ensuring that the mothers had to rely exclusively on speech when giving instructions. The mothers were told that their task was to instruct their children to perform the depicted actions using the toys in front of them, and that they were not allowed to change the words of the instruction printed underneath. They were encouraged to study the picture of the intended action carefully before producing the sentence. They were also told that the children may not be able to follow the action, in which case they should not reiterate or paraphrase the instructions but simply move onto the next item. The mothers were fitted with a JHS MUD-805 uni-directional headset microphone which was connected to an iRiver iHP-120 which allows uncompressed wave-format recordings. Sound files were recorded at a sampling rate of 44.1 kHz. As the emphasis of the study was on the verbal performance of the mothers, and since the children in many cases were not able to comply with the instructions, their actions were not recorded. 
In the ADS condition, the mothers were asked to instruct an adult confederate using the sentences printed underneath the depicted actions without altering the words. The confederate could not see the pictures and sentences. After the confederate had completed the target action, the mothers moved onto the next item. The order of CDS and ADS was counterbalanced across participants. Mothers who produced ADS first were told that this was necessary to familiarise them with the sentences in order to ensure smooth interaction with their child. Mothers who produced ADS second were told that this was necessary to check whether the instructions could in principle be followed by a linguistically competent interlocutor. All participants were told that they should speak normally to the adult confederate and that there was no need to pretend to speak to a child. While the mothers addressed the adult confederate the experimenter occupied the child with a drawing game in an adjacent room with the door open but the child out of their mothers’ view in order to minimise the possibility of child-directed interactions. The entire session lasted about 15–20 min.
Results and discussion Sentences with ambiguous PPs can be disambiguated by placing an intonational phrase break either after the verb for a modifier interpretation, or after the direct object noun for an instrument interpretation, to signal the appropriate phrase boundaries. Intonational breaks can be indicated by pauses, segment lengthening before the break and pitch accents. These cues, when placed in combination, have been shown to determine infants’ prosodic preferences (Seidl, 2007). We therefore measured the duration of the critical pauses, the pause after the verb touch (henceforth: post-verbal pause) and the pause after the first noun (henceforth: post-nominal pause). We also measured the duration of the vowel in the verb touch, i.e. the vowel preceding the first potential site of a pause, and of the vowels in the preposition with, i.e. the vowel following the second potential pause. We did not analyse the vowel durations of the first noun as in this position, different nouns containing vowels of different inherent lengths were used in the ambiguous and the unambiguous conditions. This was an unavoidable consequence of using the same array of objects for both contexts in order to minimise disruption to the play situation. Moreover, because differences in inherent vowel lengths can also affect subsequent pauses any effects of context ambiguity on the post-nominal pause durations need to be interpreted with caution. Finally, we measured the presence of a pitch accent on the preposition with. As pitch accents can only be determined relative to the intonation contour of the entire sentence, we defined a pitch accent as present if the mean F0 for the vowel in with was higher than the mean F0 for the stressed vowels in the first and the second noun. All duration and F0 measurements were performed using PRAAT (Boersma & Weenink, 2005). Note that due to occasional creaky voice or whispering some vowel duration measurements were not available. 
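The pitch-accent criterion described above can be sketched in code. This is an illustrative sketch under stated assumptions, not the authors' PRAAT procedure: F0 tracks are assumed to be lists of values in Hz with unvoiced frames coded as 0 or None, the function names are hypothetical, and "higher than the mean F0 for the stressed vowels in the first and the second noun" is read here as higher than each of the two noun means.

```python
from statistics import mean

def mean_f0(track_hz):
    """Mean F0 over voiced frames; None if no voiced frame is available
    (e.g. creaky voice or whispering)."""
    voiced = [f for f in track_hz if f is not None and f > 0]
    return mean(voiced) if voiced else None

def pitch_accent_on_with(with_track, noun1_track, noun2_track):
    """True if the vowel in 'with' has a higher mean F0 than the stressed
    vowel of each noun (assumed reading); None if a measurement is missing."""
    means = [mean_f0(t) for t in (with_track, noun1_track, noun2_track)]
    if any(m is None for m in means):
        return None
    with_f0, noun1_f0, noun2_f0 = means
    return with_f0 > noun1_f0 and with_f0 > noun2_f0

# Invented F0 values, for illustration only.
accented = pitch_accent_on_with([250, 260], [220, 230], [200, 210])
unaccented = pitch_accent_on_with([210, 215], [220, 230], [200, 210])
missing = pitch_accent_on_with([0, 0], [220, 230], [200, 210])
```

Returning None for missing measurements keeps unavailable observations out of the accent counts instead of treating them as unaccented.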
The number of available observations for each dependent variable is given in Table 1 and all subsequent tables. To ensure measurement reliability some duration and pitch measurements were performed by two independent coders. The correlations between the pitch measurements ranged from .95 to .97, all p’s < .001; the correlation for the durational measures was r = .72, p < .001, N = 192 for the post-verbal pause measurements, and r = .96, p < .001, N = 192 for the post-nominal pause measurements. The lower correlation for the post-verbal pauses is a result of discrepancies between the coders in identifying the exact offset of the fricative in the verb touch. In order not to underestimate the duration of the post-verbal pauses we adopted a conservative approach and used the set of measurements with longer durations and greater variability in the post-verbal pauses for this and all subsequent analyses. The means for vowel and pause durations as well as percent of pitch accents are given in Table 1. Since for each speaker, there was only one critical sentence in each condition, a necessary limitation to keep the duration of the experiment short enough to accommodate small children, analyses by items were not possible in this study. The last column of Table 1 gives the results for a 2 (Addressee: adult vs. child) × 2 (Ambiguity: ambiguous context vs.
unambiguous context) × 2 (Attachment: high vs. low) ANOVA by subjects for all dependent variables. A main effect of addressee was found for all durational parameters except for the duration of the short vowel in with, indicating that, as expected, mothers slowed their speech rate by lengthening vowels and pauses in CDS. There was evidence for prosodic disambiguation in that the vowels in with were longer in the instrument sentences compared to the modifier sentences, suggesting that speakers produced an accent in this position. Crucially, if mothers were to display enhanced prosodic disambiguation in CDS, one would expect to see a significant interaction between Addressee and Attachment because the prosodic cues signalling the two possible interpretations should be more divergent in CDS. Here, such an interaction was not obtained for any of the durational cues. Similarly, if the mothers used a pitch accent on the preposition to mark the instrument interpretation one would expect a larger percentage of pitch accents in instrument sentences compared to modifier sentences, and if this tendency was more pronounced in CDS the difference in pitch accents between the two types of attachment should be larger in CDS. Since the pitch accent data were not normally distributed we performed a set of planned comparisons between instrument and modifier conditions for sentences with ambiguous and unambiguous context in ADS and CDS using Wilcoxon’s Signed Ranks test. Overall, the mothers employed pitch accents 20% of the time in instrument sentences and 13% of the time in modifier sentences, but this difference was not reliable as none of these four comparisons yielded a significant difference (all p’s > .2), suggesting that pitch accents were not used for disambiguation and certainly not emphasised in CDS. Thus, while there was evidence for prosodic disambiguation using durational prosodic cues in general, we did not find any evidence for increased prosodic disambiguation in CDS. Instead, mothers lengthened pauses when addressing their child, a reflection of the slower speech rate typical for CDS. A longer post-nominal pause, however, is a misleading prosodic marker in modifier sentences. To make sure that we did not miss the effect of some other prosodic cues or of specific interactions between prosodic cues that were not captured by our analysis we devised Experiment 1b to examine how listeners interpreted these sentences.
Table 1 Mean durations of segments and pauses (and standard deviations in parentheses) as well as results of a 2 (Ambiguity) × 2 (Addressee) × 2 (Attachment) within-subjects ANOVA and percent of speakers with pitch peaks on the preposition with as a function of addressee, context ambiguity and attachment for mothers addressing their child and an adult confederate in Experiments 1a and 1b, *p < .05, **p < .01, ***p < .001.
| Measure | Context | Adult-directed: Instrument | Adult-directed: Modifier | Child-directed: Instrument | Child-directed: Modifier | ANOVA |
| Vowel duration verb touch (N = 20) | Ambiguous context | 73 (20) | 78 (26) | 82 (18) | 80 (24) | Addressee: F(1, 19) = 6.3, p < .05 |
| | Unambiguous context | 70 (16) | 69 (18) | 80 (22) | 77 (21) | |
| Post-verbal pause duration (N = 24) | Ambiguous context | 56 (15) | 58 (23) | 62 (27) | 58 (23) | Addressee: F(1, 23) = 5.2, p < .05 |
| | Unambiguous context | 54 (21) | 55 (19) | 69 (56) | 65 (27) | |
| Post-nominal pause duration (N = 24) | Ambiguous context | 117 (134) | 60 (40) | 132 (124) | 97 (73) | Addressee: F(1, 23) = 4.3, p < .05 |
| | Unambiguous context | 103 (73) | 111 (116) | 150 (116) | 178 (253) | |
| Vowel duration with (N = 20) | Ambiguous context | 41 (15) | 35 (12) | 42 (10) | 40 (15) | Attachment: F(1, 19) = 13.4, p < .01 |
| | Unambiguous context | 46 (16) | 37 (11) | 56 (34) | 39 (13) | |
| % Pitch peaks on with | Ambiguous context | 20.8 | 12.5 | 8.3 | 8.3 | |
| | Unambiguous context | 33.3 | 16.6 | 16.6 | 12.5 | |
| Comprehension: % correct | Ambiguous context | 51.9 (14.3) | 55.0 (11.5) | 54.3 (18.2) | 48.4 (18.4) | Ambiguity × Attachment: F(1, 23) = 10.2, p < .01; Addressee × Attachment: F(1, 23) = 4.4, p < .05 |
| | Unambiguous context | 55.7 (10.3) | 54.0 (10.3) | 59.9 (17.2) | 46.5 (14.9) | |
Experiment 1b: comprehension of speech of mothers addressing their child and an adult
Method
Participants
Twenty-four native speakers of English (eight males), mean age 27 years, range 16–51 years, participated in the study. Participation was voluntary and none of the participants had taken part in the previous experiment.
Materials
The recordings of the 192 syntactically ambiguous target sentences (i.e. four ADS and four CDS sentences produced by each mother) from Experiment 1a were combined with two pictures depicting an object being touched either by a hand or by an instrument (e.g. either by the spoon or by the flower). The context toys were removed from the pictures to eliminate the difference between ambiguous and unambiguous context. An example of the two pictures is given in Fig. 3.
Procedure
We used a forced-choice procedure to examine listeners’ comprehension performance. On each trial, participants
Fig. 3. Examples of pictures of instrument (left panel) and modifier (right panel) interpretation used in Experiments 1b and 3b, here given for the sentence Touch the cat with the spoon.
heard a syntactically ambiguous target sentence over Beyerdynamic DT 250 high-quality headphones while viewing the two pictures depicting the instrument and the modifier interpretation placed on the left and the right half of the computer screen. They were instructed to decide which one of the two pictures corresponded to the meaning of the sentence intended by the speaker, and to press the C-key for the left picture and the M-key for the right picture. The side of the instrument vs. modifier pictures was randomized. Participants were told that the speaker always had only one of the two possible interpretations in mind, and that they should guess if they were not sure. Participants received five training trials which were randomly selected from the total set of 192 trials. Participants then listened to the 192 target sentences. Once they had made their choice, the next trial followed after 500 ms. Participants were given one short break after 64 trials and another one after 128 trials. The whole procedure lasted about 20 min.

Results and discussion

For each sentence, we computed the percentage of correct responses, i.e. responses corresponding to the interpretation that the speakers were instructed to convey. These data are given at the bottom of Table 1, as are the results of a 2 (Ambiguity) × 2 (Addressee) × 2 (Attachment) ANOVA. Note that prosodic cues in general appeared not to be very informative, as the percentage of correctly identified interpretations did not stray far from chance (50%). Still, if the mothers had provided clearer prosodic cues in CDS, then we would expect a main effect of addressee on comprehension accuracy. This was not found. Instead, there was a significant interaction between Addressee and Attachment: in CDS, comprehension accuracy remained at the same level as in ADS for instrument sentences, p = .3, but decreased for modifier sentences, F(1, 23) = 5.3, p < .05.
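The scoring step described above (percentage of responses matching the interpretation the speaker was instructed to convey, computed per sentence) can be illustrated with a minimal sketch. The trial records, sentence identifiers and the function name `percent_correct` are hypothetical and serve only to show the computation; they are not the materials or software used in the study.

```python
from collections import defaultdict

def percent_correct(trials):
    """Return percent correct per sentence.

    trials: iterable of (sentence_id, intended, chosen) tuples, where
    `intended` is the interpretation the speaker was asked to convey and
    `chosen` is the picture selected by the listener (hypothetical format).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sentence_id, intended, chosen in trials:
        totals[sentence_id] += 1
        hits[sentence_id] += int(intended == chosen)
    return {sid: 100.0 * hits[sid] / totals[sid] for sid in totals}

# Hypothetical responses from several listeners to two recorded sentences.
trials = [
    ("s1", "instrument", "instrument"),
    ("s1", "instrument", "modifier"),
    ("s1", "instrument", "instrument"),
    ("s1", "instrument", "instrument"),
    ("s2", "modifier", "instrument"),
    ("s2", "modifier", "modifier"),
]
scores = percent_correct(trials)
CHANCE = 50.0  # two-alternative forced choice
for sid, score in scores.items():
    print(sid, score, "above chance" if score > CHANCE else "at or below chance")
```

With two response alternatives, a score of 50% corresponds to guessing, which is why accuracy in the experiments is interpreted relative to that level.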
This comprehension pattern corroborates the previous conclusion that the mothers’ general lengthening of the postnominal pauses in CDS resulted in a misleading cue in the modifier sentences which hampered comprehension. It also suggests that the lengthening of the post-nominal
pause in CDS instrument sentences was insufficient to improve prosodic clarity. We also found an interaction between Ambiguity and Attachment which was due to better comprehension of instrument sentences compared to modifier sentences in the unambiguous (F(1, 23) = 5.2, p < .05), but not in the ambiguous condition, p = .7. This is an unexpected result given that no such interaction, and no effect of ambiguity, had been found for any of the prosodic cues. It may suggest that the interplay of prosodic cues in the mothers’ speech may have marked syntactic structure in instrument sentences somewhat more clearly when the context was unambiguous. In any case, it does not suggest that disambiguation only took place when the context was ambiguous because speakers may have become aware of the ambiguity. On the contrary, lack of more pronounced prosodic marking in ambiguous sentences is in line with suggestions by Kraljic and Brennan (2005) that prosodic disambiguation is not a matter of audience design but a by-product of speech planning. In other words, awareness of ambiguity is not a prerequisite for prosodic disambiguation of syntactic ambiguity. However, it is possible that the cognitive load induced by the presence of their small child and the attempt to keep the child on task may have influenced the mothers’ speech planning. In order to reduce the task demands associated with interaction with a small child, we attempted a replication using CDS addressed to an imaginary child. We assumed that this may tap into a mother’s tacit knowledge about how she typically speaks to her own child. Experiment 2: mothers addressing an imaginary child and an adult Method Participants Twenty-four mothers of children between the ages of 1;6 and 4;0 years, all native speakers of English, mean age 35 years, range 27–42 years, were recruited from University playgroups, and received reimbursement of GBP 10.00. None of the mothers had participated in the previous experiment.
Materials

The arrays of toys were not used in this experiment. The booklets were identical to the ones used in Experiment 1.

Procedure

The mothers were recruited from the playgroup while their children were playing in an adjacent room. They were told that the purpose of the study was to develop a game to be used for an experiment with small children, and that the aim was to obtain clear recordings of child-directed instructions pre-specified for this game, which would then be played back to prospective child listeners in order to test how well children are able to understand the instructions. The mothers were informed that they were asked to participate in this study because of their current experience with CDS. They were given the booklet, fitted with the head-mounted microphone and asked to pretend to address a 2–4-year old child the way they ‘usually speak to a child’. As in Experiment 1, they were asked to use the instructions in the booklet verbatim. In the ADS condition, the mothers were asked to talk to an adult for practice purposes. Unlike in Experiment 1, the adult did not act out the instructions but simply listened to the sentences in order to maintain compatibility with the imaginary CDS condition, in which there also was no action feedback. Order of CDS and ADS was counterbalanced.

Results and discussion

Because of the satisfactory agreement between the coders in post-nominal pause duration and F0 measurements in Experiment 1, all measurements in this and all subsequent experiments were performed by one of the coders who was trained and experienced in using PRAAT. The means for vowel and pause durations as well as percent of pitch accents are given in Table 2, along with the results of the 2 (Addressee) × 2 (Ambiguity) × 2 (Attachment) ANOVA. The main effect of addressee in the duration of preposition vowels indicates segment lengthening associated with a slower speech rate in CDS. The main effect of attachment in the vowel durations of the verb appears to be due to longer vowels in instrument sentences. This is counter to what would be expected if the mothers provided disambiguating prosodic information, since breaks at this site should be more pronounced in the modifier condition. Finally, the mothers employed pitch accents 26% of the time in instrument sentences and 21% of the time in modifier sentences. The planned comparisons between instrument and modifier conditions in ambiguous and unambiguous context in ADS and CDS using Wilcoxon’s Signed Ranks test showed no significant differences (all p’s > .2). As in Experiment 1a, there was a tendency for the mothers to slow their speech, although only in the vowel of with, confirming that slowing of the speech rate is less pronounced when addressing an imaginary child than when addressing a real child (Jacobson, Boersma, Fields, & Olson, 1983). In general, however, there was no evidence for prosodic disambiguation. Thus, simplifying the experimental situation by removing the child did not result in prosodic disambiguation, either in ADS or in CDS. Before concluding that it is unlikely for CDS to contain more salient and reliable prosodic disambiguation cues than ADS, we devised Experiment 3 to see whether the lack of increased prosodic clarity was unique to maternal CDS or whether female speakers who are not mothers would show the same lack of increased prosodic clarity in CDS.
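The planned comparisons above rely on a signed-rank test for paired observations. The following is a minimal sketch of how such a statistic can be computed, assuming (hypothetically) that each speaker contributes a 0/1 code for the presence of a pitch accent in the instrument and in the modifier sentence of one condition. The data are invented, and the normal approximation shown omits tie and continuity corrections, so it is an illustration of the technique rather than a reproduction of the reported Z values.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Signed-rank statistics for paired samples x and y.

    Returns (W+, W-, z), where z uses the plain normal approximation
    without tie or continuity correction (a simplification).
    Pairs with zero difference are dropped.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0.0
    # Rank absolute differences, assigning average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, (w_plus - mean_w) / sd_w

# Hypothetical 0/1 pitch-accent codes for eight speakers in one condition.
instrument = [1, 1, 0, 1, 0, 1, 0, 0]
modifier = [0, 0, 0, 1, 0, 0, 1, 0]
w_plus, w_minus, z = wilcoxon_signed_rank(instrument, modifier)
print(w_plus, w_minus, round(z, 3))
```

With binary codes all non-zero differences tie in absolute size, so the statistic effectively compares how many speakers changed in each direction; statistical packages typically apply a tie correction in this situation.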
Experiment 3a: non-mothers addressing an imaginary child and an adult

Method

Participants

Twenty-four women who did not have children (henceforth: non-mothers), mainly University students and staff, mean age 27 years, range 21–42 years, participated in the study. All non-mothers were native speakers of English.
Table 2
Mean durations of segments and pauses (standard deviations in parentheses), results of a 2 (Ambiguity) × 2 (Addressee) × 2 (Attachment) within-subjects ANOVA, and percent of speakers with pitch peaks on the preposition with, as a function of addressee, context ambiguity and attachment, for mothers addressing an imaginary child and an adult confederate in Experiment 2. *p < .05, **p < .01, ***p < .001.

                                      Adult-directed            Child-directed
                                      Instrument   Modifier     Instrument   Modifier     Ambiguity × Addressee × Attachment ANOVA
Vowel duration verb touch (N = 23)
  Ambiguous context                   59 (20)      61 (21)      65 (12)      62 (17)      Attachment: F(1, 23) = 5.9, p < .05
  Unambiguous context                 63 (23)      56 (17)      69 (21)      63 (20)
Post-verbal pause duration (N = 24)
  Ambiguous context                   11 (21)      9 (15)       35 (65)      18 (24)      n.s.
  Unambiguous context                 26 (44)      22 (25)      19 (22)      31 (30)
Post-nominal pause duration
  Ambiguous context                   57 (61)      47 (53)      79 (83)      56 (94)      n.s.
  Unambiguous context                 58 (94)      90 (220)     97 (116)     83 (122)
Vowel duration with (N = 20)
  Ambiguous context                   29 (7)       29 (8)       31 (9)       31 (6)       Addressee: F(1, 19) = 6.7, p < .05
  Unambiguous context                 29 (8)       26 (7)       33 (11)      34 (7)
% Pitch peaks on with
  Ambiguous context                   20.8         25.0         33.3         25.0
  Unambiguous context                 25.0         20.8         25.0         12.5
Materials and procedure

Materials and procedure were identical to Experiment 2 except that the non-mothers were not told that they had been recruited because of their expertise with CDS. Thus, as in Experiment 2, the speakers did not receive action feedback following their speech but were told that the recordings of their speech would be played back to prospective child listeners.

Results and discussion

The means for vowel and pause durations as well as percent of pitch accents are given in Table 3, along with the results of the 2 (Addressee) × 2 (Ambiguity) × 2 (Attachment) ANOVA. As in the previous experiments, there were main effects of addressee indicating a lengthening of the post-nominal pause and of the vowels in the preposition. This slowing of the speech rate in CDS of non-mothers has been documented before (Kempe, 2009). Moreover, the main effect of attachment due to longer post-nominal pauses in instrument sentences constitutes evidence for prosodic disambiguation. Crucially, there was a significant interaction between Addressee and Attachment in the post-nominal pauses. Separate ANOVAs for instrument and modifier sentences revealed that the non-mothers lengthened their post-nominal pauses in CDS compared to ADS for instrument sentences, F(1, 23) = 10.8, p < .01, but not for modifier sentences, p = .2, thereby providing a clear prosodic cue signalling an instrument interpretation in CDS. Unlike the mothers in Experiment 1a, they did not provide a misleading prosodic cue because they did not lengthen the post-nominal pauses in modifier sentences. The percentage of non-mothers producing a pitch accent on the preposition with per condition is also given in
Table 3. Overall, the non-mothers employed pitch accents 19% of the time in instrument sentences and 17% of the time in modifier sentences. Planned comparisons between instrument and modifier conditions in ambiguous and unambiguous context in ADS and CDS using Wilcoxon’s Signed Ranks test showed that in the ambiguous ADS condition, the non-mothers produced more pitch accents in modifier sentences than in instrument sentences, Z = 2.0, p < .05, thus providing a misleading prosodic cue. However, in unambiguous CDS sentences, they produced more pitch accents in instrument sentences than in modifier sentences, Z = 2.2, p < .05, a prosodic pattern that helps to disambiguate between the instrument and modifier interpretations. No differences in the frequency of pitch accents between instrument and modifier interpretations were found in unambiguous ADS and in ambiguous CDS sentences, all p’s > .3. While it is not clear what may account for the misleading use of pitch accents in ADS, the crucial finding with respect to the aim of the study is that in CDS, the non-mothers used pitch accents in congruency with the durational prosodic cue. Experiment 3b examined whether the altered durational and pitch cues in the CDS of non-mothers were sufficient to improve comprehension compared to ADS.
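The Addressee × Attachment interaction in the post-nominal pauses can be read descriptively as a difference of differences: the CDS-minus-ADS lengthening in instrument sentences minus the corresponding lengthening in modifier sentences. The sketch below illustrates this with the post-nominal pause means of Table 3. The assignment of the extracted values to cells, and the averaging over the two context conditions, are assumptions of this illustration; the inferential test itself is the ANOVA reported in the text.

```python
# Post-nominal pause means from Table 3 as (ambiguous, unambiguous) context.
# The mapping of values to cells is an assumption of this sketch.
cell_means = {
    ("ADS", "instrument"): (99, 96),
    ("ADS", "modifier"): (71, 100),
    ("CDS", "instrument"): (168, 153),
    ("CDS", "modifier"): (102, 108),
}

def mean(values):
    return sum(values) / len(values)

# Collapse over context ambiguity (equal cell sizes assumed).
m = {cell: mean(values) for cell, values in cell_means.items()}

# Simple effects of addressee within each attachment type.
lengthening_instrument = m[("CDS", "instrument")] - m[("ADS", "instrument")]
lengthening_modifier = m[("CDS", "modifier")] - m[("ADS", "modifier")]

# Interaction contrast: difference of the two simple effects.
interaction_contrast = lengthening_instrument - lengthening_modifier

print(lengthening_instrument, lengthening_modifier, interaction_contrast)
```

A contrast near zero would correspond to indiscriminate lengthening in both sentence types, whereas a clearly positive contrast corresponds to lengthening that is selective for instrument sentences.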
Experiment 3b: comprehension of speech of non-mothers addressing an imaginary child and an adult

Participants

Twenty-four native speakers of English (eight males), mean age 22 years, age range 17–42 years, participated in the study. Participation was voluntary and none of the participants had taken part in the previous experiment.
Table 3
Mean durations of segments and pauses (standard deviations in parentheses), results of a 2 (Ambiguity) × 2 (Addressee) × 2 (Attachment) within-subjects ANOVA, and percent of speakers with pitch peaks on the preposition with, as a function of addressee, context ambiguity and attachment, for non-mothers addressing an imaginary child and an adult confederate in Experiment 3. *p < .05, **p < .01, ***p < .001.

                                      Adult-directed            Child-directed
                                      Instrument   Modifier     Instrument   Modifier     Ambiguity × Addressee × Attachment ANOVA
Vowel duration verb touch (N = 24)
  Ambiguous context                   66 (17)      65 (21)      66 (14)      67 (18)      n.s.
  Unambiguous context                 61 (18)      61 (14)      67 (13)      64 (15)
Post-verbal pause duration (N = 24)
  Ambiguous context                   49 (30)      55 (35)      48 (26)      59 (24)      n.s.
  Unambiguous context                 50 (24)      54 (33)      63 (86)      56 (24)
Post-nominal pause duration (N = 24)
  Ambiguous context                   99 (84)      71 (69)      168 (116)    102 (72)     Addressee: F(1, 23) = 8.7, p < .01
  Unambiguous context                 96 (74)      100 (77)     153 (126)    108 (115)    Attachment: F(1, 23) = 6.3, p < .05
                                                                                          Addressee × Attachment: F(1, 23) = 5.7, p < .05
Vowel duration with (N = 24)
  Ambiguous context                   35 (8)       36 (4)       39 (10)      35 (7)       Addressee: F(1, 23) = 4.6, p < .05
  Unambiguous context                 37 (8)       34 (7)       44 (17)      41 (13)
% Pitch peaks on with
  Ambiguous context                   0            16.6         29.2         20.8
  Unambiguous context                 12.5         16.6         33.3         12.5
Ambiguous context Unambiguous context
53.3 (10.9) 49.5 (10.4)
59.7 (13.0) 59.2 (11.8)
45.8 (10.9) 45.0 (14.7)
49.8 (12.8) 45.7 (13.1)
Addressee: F(1, 23) = 5.6, p < .05; Attachment: F(1, 23) = 12.4, p < .01; Addressee × Attachment: F(1, 23) = 11.4, p < .01
Materials

The recordings of the 192 syntactically ambiguous target sentences produced by the non-mothers in Experiment 3a were used in the same way as in Experiment 1b.

Procedure

The procedure was identical to Experiment 1b.

Results and discussion

The percentages of correct responses are given at the bottom of Table 3, as are the results of a 2 (Ambiguity) × 2 (Addressee) × 2 (Attachment) ANOVA. As in Experiment 1b, the percentage of correct interpretations stayed close to chance, suggesting that the prosodic cues in general were not very informative. However, in contrast to Experiment 1b, here we found a main effect of addressee suggesting that prosodic clarity improved in general when the non-mothers pretended to address a small child. There was also a main effect of attachment indicating that comprehension was in general better for instrument than for modifier sentences. Finally, these two main effects were qualified by an interaction which reflects the fact that in CDS, comprehension accuracy improved for instrument sentences, F(1, 23) = 21.8, p < .001, but not for modifier sentences, p = .3. This mirrors the pattern of post-nominal pause durations obtained in Experiment 3a, confirming that the non-mothers lengthened the post-nominal pause indicating high attachment (instrument interpretation) when addressing an imaginary child but left the duration of the post-nominal pause unchanged in modifier sentences. In sum, non-mothers displayed more salient and reliable prosodic marking of the intended syntactic structure when addressing an imaginary child. Since setting and instructions were identical to Experiment 2, the mere fact that speakers addressed imaginary interlocutors and did not receive action feedback cannot be responsible for the production differences between the mothers in Experiment 2 and the non-mothers in Experiment 3.
Still, although CDS directed to imaginary interlocutors is known to display many typical characteristics of CDS albeit in attenuated form (Jacobson et al., 1983), addressing imaginary children is fairly unnatural. In Experiment 4, we therefore decided to try to replicate this finding in the ecologically more valid situation of interaction of non-mothers with real children.
Experiment 4: non-mothers addressing a child and an adult

In this experiment, we recruited mothers with their 2–4-year old children, and matched each of them with a non-mother who was unacquainted with the child. The goal was to examine whether non-mothers provide exaggerated prosodic disambiguation cues when addressing a real child. However, since the mothers were already present in the laboratory, we used this opportunity to replicate the findings of Experiment 1a by examining whether the mothers again failed to provide clearer prosodic cues in CDS.

Participants

Twenty-four mothers, mean age 36 years, range 28–46 years, and their children, mean age 1;9 years, range 1;8–4;1 years, as well as 24 non-mothers, mean age 23 years, range 18–33 years, participated in the experiment. The mothers and their children were recruited in various day care centers and playgroups, and were paid GBP 10.00 for their participation. The non-mothers were recruited on site or by word of mouth. The non-mothers were not previously acquainted with the mothers and their children. All participants were native speakers of English and had not participated in any of the previous experiments.

Materials

The materials were identical to Experiment 1.

Procedure

The mothers were asked for their consent to have their child interact with the non-mother. Mother and non-mother were seated in front of a table with the two arrays of toys placed next to each other. The children were seated on their mothers’ laps. Both adults were told that the goal of the study was to see whether children between 2 and 4 years of age are able to follow pre-specified game instructions, and whether it made a difference if the instructions were given by a familiar person like the mother or by a stranger. First, the head-mounted microphone was fitted onto the non-mother and the booklet with the instruction sentences and the target action was placed next to her on a chair out of view of the child. Instructions to the non-mother were identical to the ones given in Experiment 1. In the ADS condition, the non-mother was instructed to address the mother, who was asked to act out the instructions. In the CDS condition, the non-mother was asked to address the child. After the non-mother had finished the interaction, the mother was fitted with the head-mounted microphone, and the instructions were repeated.
In the ADS condition, the mother was asked to address the non-mother, who was also asked to act out the instructions. In the CDS condition, the mother was asked to address her child. Order of ADS and CDS was counterbalanced for mothers and non-mothers using the same explanation as in Experiment 1a.

Results and discussion

Non-mothers

The means for vowel and pause durations as well as percent of pitch accents are given in Table 4, along with the results of the 2 (Addressee) × 2 (Ambiguity) × 2 (Attachment) ANOVA. Three pauses could not be analysed because the speakers paraphrased the sentences. There was again a main effect of addressee on the post-nominal pauses indicating slower speech in CDS. The main effect
Table 4
Mean durations of segments and pauses (standard deviations in parentheses), results of a 2 (Ambiguity) × 2 (Addressee) × 2 (Attachment) within-subjects ANOVA, and percent of speakers with pitch peaks on the preposition with, as a function of addressee, context ambiguity and attachment, for non-mothers addressing a real child and the child’s mother in Experiment 4. *p < .05, **p < .01, ***p < .001.

                                      Adult-directed            Child-directed
                                      Instrument   Modifier     Instrument   Modifier     Ambiguity × Addressee × Attachment ANOVA
Vowel duration verb touch (N = 22)
  Ambiguous context                   58 (15)      58 (15)      65 (35)      66 (40)      n.s.
  Unambiguous context                 60 (21)      57 (17)      65 (30)      58 (25)
Post-verbal pause duration (N = 22)
  Ambiguous context                   23 (24)      31 (28)      27 (31)      28 (27)      Attachment: F(1, 21) = 8.1, p < .01
  Unambiguous context                 31 (34)      49 (68)      28 (24)      50 (36)      Ambiguity: F(1, 21) = 6.7, p < .05
Post-nominal pause duration (N = 23)
  Ambiguous context                   49 (52)      58 (69)      105 (120)    61 (63)      Addressee: F(1, 22) = 6.5, p < .05
  Unambiguous context                 42 (40)      40 (35)      107 (147)    50 (49)      Attachment: F(1, 22) = 7.7, p < .05
                                                                                          Addressee × Attachment: F(1, 22) = 7.5, p < .05
Vowel duration with (N = 21)
  Ambiguous context                   29 (6)       29 (10)      36 (20)      34 (22)      n.s.
  Unambiguous context                 29 (12)      28 (6)       36 (20)      35 (24)
% Pitch peaks on with
  Ambiguous context                   37.5         33.3         4.2          4.2          n.s.
  Unambiguous context                 16.7         25.0         8.3          20.8
of attachment on the post-verbal pauses was due to longer post-verbal pauses in modifier sentences compared to instrument sentences, a pattern that is expected if speakers use these pauses for prosodic disambiguation although in the previous experiments we did not observe any effects of disambiguation in the post-verbal pauses. There was also a main effect of ambiguity indicating that post-verbal pauses generally were longer in sentences with unambiguous referential context, an unexpected finding that is difficult to interpret but not crucial to the main goal of the experiment. For the post-nominal pauses, in addition to the main effect of addressee there was a main effect of attachment due to longer post-nominal pauses in the instrument sentences, suggesting use of these pauses for prosodic marking. Crucially, as in Experiment 3a, the interaction between Addressee and Attachment was significant in the post-nominal pauses. Separate analyses showed that in CDS, non-mothers lengthened the post-nominal pauses in instrument sentences, F(1, 22) = 8.3, p < .01, but not for modifier sentences, p = .5. Looking at it another way, this means that the effect of attachment was significant in CDS, F(1, 23) = 9.3, p < .01, but not in ADS, p = .7. The percentage of non-mothers producing a pitch accent on the preposition with per condition is given in the bottom of Table 4. Overall, the non-mothers employed pitch accents 17% of times in instrument sentences and 21% of times in modifier sentences. The planned comparisons between instrument and modifier conditions for sentences with ambiguous and unambiguous context in ADS and CDS using Wilcoxon’s Signed Ranks test showed no significant differences (all p’s > .3). In sum, non-mothers showed evidence for prosodic disambiguation by lengthening post-verbal pauses in modifier sentences and post-nominal pauses in instrument sentences. 
Crucially, these speakers lengthened the postnominal pauses in the CDS instrument sentences but not in the CDS modifier sentences thereby providing a clear prosodic cue in the former and avoiding a misleading cue
in the latter. Unlike in Experiment 3a, in this experiment prosodic disambiguation was restricted to durational cues. Still, the post-nominal pause durations of the non-mothers replicate the findings of Experiment 3a in a more ecologically valid situation of direct interaction with a real child.
Mothers

Table 5 shows the means for vowel and pause durations as well as the percent of pitch accents and the results of the 2 (Addressee) × 2 (Ambiguity) × 2 (Attachment) ANOVA. One post-verbal and one post-nominal pause duration could not be measured as the speaker paraphrased the sentence. Again, we found main effects of addressee indicating longer vowel durations and longer post-nominal pauses in CDS. This replicates quite closely what was found for the mothers tested in Experiment 1a: mothers lengthened pauses and segments when addressing their children, irrespective of the intended attachment of the prepositional phrase, thereby providing a misleading cue in the modifier sentences. The lack of any interaction between Addressee and Attachment shows that mothers did not exaggerate prosodic disambiguation cues when addressing their child. The main effect of ambiguity in the post-verbal pauses was due to longer pauses when the context was unambiguous, a finding that is similar to what was found in the non-mothers addressing the child. This was qualified by an interaction between Ambiguity and Attachment. Separate ANOVAs for ambiguous and unambiguous sentences showed that the effect of attachment fell short of significance in the unambiguous sentences, F(1, 23) = 2.9, p = .1, indicating a trend towards longer post-verbal pauses in modifier sentences when the context was unambiguous. This suggests that there was a tendency to disambiguate between the two interpretations when the context allowed for only one interpretation, another result that is counter to what would be predicted if speakers engaged in audience design.
Table 5
Mean durations of segments and pauses (standard deviations in parentheses), results of a 2 (Ambiguity) × 2 (Addressee) × 2 (Attachment) within-subjects ANOVA, and percent of speakers with pitch peaks on the preposition with, as a function of addressee, context ambiguity and attachment, for mothers addressing their child and another adult in Experiment 4. *p < .05, **p < .01, ***p < .001.

                                      Adult-directed            Child-directed
                                      Instrument   Modifier     Instrument   Modifier     Ambiguity × Addressee × Attachment ANOVA
Vowel duration verb touch (N = 22)
  Ambiguous context                   58 (15)      58 (15)      76 (32)      70 (30)      Addressee: F(1, 21) = 12.1, p < .01
  Unambiguous context                 55 (11)      62 (19)      72 (28)      75 (27)
Post-verbal pause duration (N = 23)
  Ambiguous context                   23 (25)      16 (23)      38 (31)      33 (36)      Ambiguity: F(1, 22) = 4.6, p < .05
  Unambiguous context                 26 (28)      53 (84)      42 (55)      56 (58)      Ambiguity × Attachment: F(1, 22) = 4.6, p < .05
Post-nominal pause duration (N = 23)
  Ambiguous context                   55 (46)      53 (53)      68 (57)      103 (175)    Addressee: F(1, 22) = 5.6, p < .05
  Unambiguous context                 59 (78)      52 (54)      132 (261)    145 (195)
Vowel duration with (N = 20)
  Ambiguous context                   30 (8)       30 (7)       39 (22)      34 (9)       Addressee: F(1, 19) = 7.0, p < .05
  Unambiguous context                 32 (10)      30 (11)      45 (27)      41 (26)
% Pitch peaks on with
  Ambiguous context                   12.5         8.3          8.3          16.7
  Unambiguous context                 20.8         12.5         12.5         12.5
The percentage of mothers producing a pitch accent on the preposition with per condition is given at the bottom of Table 5. Overall, the mothers employed pitch accents 14% of the time in instrument sentences and 13% of the time in modifier sentences. Planned comparisons between instrument and modifier conditions for sentences with ambiguous and unambiguous context in ADS and CDS using Wilcoxon’s Signed Ranks test showed no significant differences (all p’s > .3). Thus, mothers did not use pitch accents to disambiguate between the two interpretations, either when addressing an adult or when addressing their child. Together, the findings of Experiment 4 showed that non-mothers lengthened post-nominal pauses differentially so as to provide a clearer prosodic cue to high attachment (the instrument interpretation), while mothers lengthened these pauses indiscriminately, thereby producing a misleading cue in the modifier sentences. Note that because the focus was on the speech of the non-mothers, the mothers were always tested second and had the opportunity to witness the use of disambiguating prosody by the non-mothers before producing the target sentences themselves. Still, in their own CDS, these mothers did not exaggerate the prosodic cues to the intended syntactic structure.
Joint Analysis of Experiments 1–4

In order to directly compare the effects of maternity and presence of child on the extent to which speakers provided more pronounced prosodic cues to syntactic structure in CDS, we combined the data of all 120 speakers and conducted a joint analysis with Maternity and Child Presence as between-subjects factors. This resulted in a 2 (Addressee: adult vs. child) × 2 (Ambiguity: ambiguous context vs. unambiguous context) × 2 (Attachment: high vs. low) × 2 (Maternity: mother vs. non-mother) × 2 (Child Presence: real vs. imaginary) mixed-type ANOVA which can potentially yield a number of effects and interactions, not all of which are of interest in this context. For example,
the interaction between Maternity and Child Presence, observed for three of the parameters, indicates simply that the speaker groups differed in overall speech rate. Specifically, mothers addressing a real child (Experiments 1a and 4) and non-mothers addressing an imaginary child (Experiment 3a) happened to exhibit a slower overall speech rate. Since evidence for more exaggerated prosodic disambiguation in CDS was found in both groups of non-mothers, and not in any of the three groups of mothers, the observed group differences in speech rate do not constitute confounds and cannot be responsible for the differences in prosodic disambiguation in CDS. Crucial for the exploration of differences in prosodic disambiguation between the speaker groups are the interactions of the within-subjects factors with the between-subjects factors, and these are the effects we will be focussing on below. Table 6 presents the results of the joint analysis for the various durational parameters. As expected, all parameters showed a main effect of addressee indicating that pauses and vowel durations were longer in CDS than in ADS. The most salient durational prosodic cue was the length of the post-nominal pause. Here, the interactions between Addressee and Attachment and between Attachment and Maternity were qualified by a significant three-way interaction between Addressee, Attachment and Maternity, which is depicted in Fig. 4. This interaction confirms that only the non-mothers lengthened post-nominal pauses selectively: they lengthened these pauses in CDS compared to ADS for instrument sentences (F(1, 46) = 19.3, p < .001) but not for modifier sentences, p = .1, while the mothers increased the length of post-nominal pauses in CDS compared to ADS both for instrument sentences (F(1, 71) = 10.2, p < .01) and for modifier sentences (F(1, 70) = 5.6, p < .05).
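The structure of the joint design can be made explicit with a short sketch that enumerates the within-subjects cells and tallies the speakers in each between-subjects group. The group sizes follow the participant counts reported for Experiments 1a, 2, 3a and 4 (24 speakers per sample); the variable names are illustrative and not taken from the original analysis.

```python
from itertools import product

# Within-subjects factors: every speaker contributes one value per cell.
within_factors = {
    "Addressee": ("adult", "child"),
    "Ambiguity": ("ambiguous context", "unambiguous context"),
    "Attachment": ("high", "low"),
}

# Between-subjects groups as (Maternity, Child Presence) -> number of speakers.
between_groups = {
    ("mother", "real"): 24 + 24,      # Experiments 1a and 4
    ("mother", "imaginary"): 24,      # Experiment 2
    ("non-mother", "imaginary"): 24,  # Experiment 3a
    ("non-mother", "real"): 24,       # Experiment 4
}

within_cells = list(product(*within_factors.values()))
total_speakers = sum(between_groups.values())
mothers = sum(n for (maternity, _), n in between_groups.items() if maternity == "mother")
non_mothers = total_speakers - mothers

print(len(within_cells), "within-subjects cells per speaker")
print(total_speakers, "speakers:", mothers, "mothers and", non_mothers, "non-mothers")
```

The tally shows that the group of mothers addressing a real child is twice the size of the other groups, so the between-subjects cells of the joint analysis are unbalanced.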
There was also an interaction between Ambiguity and Maternity in the post-nominal pauses indicating that the mothers produced longer postnominal pauses when the context was ambiguous than when it was unambiguous (F(1, 70) = 8.1, p < .01) while no such difference was present in the non-mothers, p = .7. However, due to different first nouns in the ambiguous
Table 6
Results of the joint analysis. *p < .05, **p < .01, ***p < .001.

Vowel touch (1, 107)
  Main effects: Addressee: F = 11.8**
Post-verbal pause (1, 113)
  Main effects: Ambiguity: F = 8.4**; Addressee: F = 8.7**
  Two-way interactions: Maternity × Child Presence: F = 36.4***
Post-nominal pause (1, 114)
  Main effects: Addressee: F = 16.0***; Attachment: F = 6.1*
  Two-way interactions: Ambiguity × Maternity: F = 4.5*; Attachment × Maternity: F = 4.4*; Addressee × Attachment: F = 5.1*; Maternity × Child Presence: F = 10.5**
  Three-way interactions: Addressee × Attachment × Maternity: F = 4.4*
Vowel with (1, 101)
  Main effects: Addressee: F = 15.2***; Attachment: F = 8.0**
  Two-way interactions: Addressee × Ambiguity: F = 4.3*; Maternity × Child Interl.: F = 18.4***
  >Three-way interactions: Five-way: F = 4.4*

Fig. 4. Duration of post-nominal pauses (in ms) of instrument and modifier sentences in ADS and CDS for the mothers (left panel) and the non-mothers (right panel). Error bars represent 1 SEM.
and unambiguous conditions, effects of ambiguity at this site may be spill-over effects associated with the different intrinsic vowel lengths of the first noun and should therefore be viewed with caution. To compare pitch accents on the preposition with, we also performed a series of planned comparisons between instrument and modifier interpretations in ambiguous and unambiguous sentences in the adult-directed and child-directed conditions for mothers and non-mothers separately using Wilcoxon’s Signed Ranks test. Neither for mothers nor for non-mothers did any of the comparisons reach significance, all p’s > .1. The planned comparisons for the real and the imaginary Child Presence conditions separately revealed significantly more pitch accents on the preposition with in instrument sentences than in modifier sentences when the context was unambiguous and the speakers addressed an imaginary child, Z = 2.5, p < .05. This is an interesting finding as it suggests that even though pitch accents were used very infrequently in this study, they tended to occur slightly more often in conditions in which they were needed the least, namely, when the context provided disambiguating information and when no real interlocutor was present. Overall, the speakers did not produce reliable prosodic disambiguation cues when addressing an adult interlocutor although, descriptively, the pattern of prosodic cues is syntactically appropriate for each of the intended interpretations. Still, a joint 2 (Ambiguity) × 2 (Attachment) × 2 (Maternity) × 2 (Child Presence) ANOVA conducted just for the post-nominal pauses (the most informative cue) in the ADS sentences separately showed that the main effect of attachment was not significant, p > .6. This lack of reliable disambiguation cues is in line with Allbritton et al. (1996) and in contrast to Snedeker and Trueswell (2003) and Kraljic and Brennan (2005). In Kraljic and Brennan (2005), speakers disambiguated prosodically regardless of context ambiguity; in Snedeker and Trueswell (2003), speakers disambiguated when the context was ambiguous. Because Kraljic and Brennan (2005) demonstrated that disambiguation can take place without ambiguity awareness, we do not believe that the failure to observe reliable prosodic disambiguation in ADS in this study can be explained by lack of ambiguity awareness on the speakers’ part. It should be pointed out, however, that there were main effects of attachment in the expected direction in one of the mother groups (Experiment 1) and both of the non-mother groups (Experiments 3 and 4) which fell short of significance when examined for the ADS conditions separately. This suggests that the trend towards prosodic disambiguation may have been too weak due to the restricted numbers of speakers and sentences in each experiment. To accommodate the children, we had presented a markedly lower number of target sentences, which may have made it more difficult for speakers
to adopt a strategy of producing consistent prosodic cues over the course of the experiment. Also, Experiments 2 and 3a did not provide explicit action feedback in the ADS condition which may have attenuated the speakers’ prosodic clarity. In addition, there may be dialectological differences in the use of prosody between the American English speakers in the studies cited above and our speakers who spoke different varieties of British English. Systematic differences in prosodic realizations of tonal contours and rhythmic structures of utterances between varieties of English (Bouzon, Auran, & Hirst, 2004) as well as differences in emotional display rules between the US and British culture, which may result in British speakers being prosodically more ‘reserved’, could have contributed to the general attenuation of prosodic cues, and particularly the low frequency of pitch accents, in ADS. This conjecture certainly needs further systematic research comparing the use of prosodic cues between different varieties of English and between different languages. It should be borne in mind, however, that examining prosodic disambiguation in ADS was not the aim of this study and that ADS was included to obtain a baseline against which to compare the prosody of CDS. In this respect, the joint analysis confirmed that only the non-mothers, but not the mothers, increased the difference in post-nominal pause duration between instrument and modifier sentences in CDS thereby providing a clear prosodic disambiguation cue.1 Mothers, in contrast, lengthened segments and pauses irrespective of underlying syntactic structure which resulted in misleading prosodic cues for modifier sentences.2 This is unlikely due to differences in ambiguity awareness between the two groups as the non-mothers did not show increased prosodic disambiguation in ADS. 
1 Recall that vowel duration on the first noun was not analysed due to the use of different nouns in the ambiguous and unambiguous sentences. However, joint analyses for ambiguous and unambiguous sentences separately confirmed that mothers did not use this cue for disambiguation while non-mothers showed a tendency towards longer first noun vowels in CDS instrument sentences in unambiguous sentences, F(1, 45) = 4.2, p < .05, and a trend towards disambiguation in the ambiguous sentences, F(1, 46) = 2.8, p = .1.
2 In this study, age and motherhood were confounded; mothers tended to be older than non-mothers. It could be argued that speaker age may have influenced the findings. However, matching mothers and non-mothers for age would have resulted in undesirable selection biases related to features of middle-aged non-mothers or of younger mothers. Still, to see whether there is an effect of age independently of motherhood, we performed joint analyses for mothers and non-mothers using age (below vs. above median) as a factor. These analyses showed that neither in the mothers nor in the non-mothers did age interact with addressee and attachment (both p’s > .32). This suggests that age differences were not responsible for the observed findings.
Moreover, the differences between mothers and non-mothers in CDS prosody were observed regardless of whether the child interlocutor was real or imaginary. Similarity between real and imaginary CDS has been documented previously (Jacobson et al., 1983), and suggests that speakers’ CDS may, in part, be determined by their acquired notions about appropriateness of certain speech modifications, their assumptions about the needs and cognitive abilities of small children, their attitudes towards childrearing practices (Simmons & Johnston, 2007) or, in the case of mothers, their habits in addressing their child. Such
similarities in communicative behavior addressed to real and imaginary interlocutors have been shown elsewhere (e.g. Ferreira, Slevc, & Rogers, 2005) and are not uncommon in referential communication tasks. Before discussing potential explanations for these differences, it is important to explore whether the use of prosodic cues in this study could have been discouraged by the structure and length of the sentences. Lack of prosodic disambiguation may have been a result of the limited length of the first potential prosodic constituent which encompassed only one syllable (touch) and therefore did not constitute a viable prosodic phrase. This may have rendered prosodic breaks after the verb infelicitous (Milotte et al., 2007), and mothers may have been especially reluctant to produce sentences with unnatural prosody. On the other hand, Snedeker and Trueswell (2003) observed clear prosodic breaks depending on sentence interpretation both after a short verb and after the noun when speakers were aware of the ambiguity, suggesting that ambiguity awareness may have led speakers to override natural prosodic phrasing. We would like to argue, however, that the duration of the first prosodic phrase need not affect the viability of the prosodic break after the noun. Speakers have the option to break the sentence after the noun into two intonational phrases of 3–4 syllables each or to maintain one phrase of 6–7 syllables length. If speakers intend to convey a modifier interpretation they should be more likely to avoid this prosodic break altogether. Thus, in the sentences tested here, the post-nominal prosodic break is the main prosodic cue of interest, and this cue, measured as duration of the post-nominal pause, was more informative in the CDS of non-mothers. 
In future studies, it would be desirable to examine longer sentences that do not discourage prosodic breaks after the verb as in You can feel the frog with the feather (Snedeker & Yuan, 2008) or Try to touch the cat with the spoon. One could also argue that if speakers use pauses to decrease the speech rate, as would be expected in CDS, and if the sentence structure permits only one prosodically appropriate site then speakers may opt to put the pause there even if it is syntactically inappropriate. However, for this to be an explanation for the differences in prosodic disambiguation in CDS between mothers and non-mothers one would have to show that mothers decreased their speech rate more than non-mothers when addressing the child, but there was no evidence for this as the lack of an interaction between Addressee and Maternity on any of the durational measures suggests. Note that there was also no difference between the mothers and the non-mothers in the overall lengthening of postnominal pauses in CDS compared to ADS, p = .8; what differed was how they distributed pause durations in relation to the intended sentence interpretation. So why did the mothers not provide prosodic cues that would differentiate between instrument and modifier interpretation? Mothers and non-mothers differ from one another, among other things, in two crucial characteristics: mothers tend to have a strong emotional bond with their child and they tend to have more experience in caretaking and using CDS. Jacobson et al. (1983) studied the effect of experience on CDS by comparing F0 and F0 variability in the CDS of parents and non-parents, and found no
differences between these two groups. In contrast to our study, the infant and child addressees in Jacobson et al. (1983) were not the parents’ own but unfamiliar children allowing for the assessment of experience with CDS independently from the affective bond between adult and child. Their findings suggest that experience per se may not have much of an effect on the prosodic characteristics of CDS and is not responsible for the differences between CDS of mothers and non-mothers.
We would like to suggest that differences in production of informative prosody may be related to how affectively charged a speaker’s CDS is. Neuro-imaging evidence showing activation in the orbito-frontal cortex and in reward-related regions of the brain confirms that the sight of their own child is a very potent positive mood inducer for mothers (Bartels & Zeki, 2004; Nitschke et al., 2004; Strathearn, Li, Fonagy, & Montague, 2008). Consequently, maternal CDS is predominantly speech expressing positive affect (Singh et al., 2002; Trainor et al., 2000). Indeed, when verbal affect is controlled, few differences remain between maternal CDS and emotional ADS, both in terms of acoustic features (Trainor et al., 2000) as well as with respect to infant preferences (Singh et al., 2002). If mothers display strong affective prosody in CDS this may take priority over linguistic prosody and influence the use of prosodic cues to syntactic structure, in contrast to non-mothers who are less likely to experience such a strong affective bond with the child. The finding that presence or absence of the child did not seem to affect the difference in prosodic disambiguation in CDS between mothers and non-mothers suggests that mothers may display strong positive affect in a variety of child-related settings which are not necessarily contingent on the presence of their own child.
To obtain more direct evidence for the existence of a potential trade-off between affective and linguistic prosody, in the final experiment, we obtained ratings of the degree of positive affect expression in CDS, and correlated them with a measure for the degree of prosodic disambiguation in CDS.

Experiment 5: rating expressed speaker happiness

Method

Participants
Fifty participants, mainly staff and students at various Psychology departments, participated in the rating study. All participants were native speakers of English.

Materials
The five sets of 192 target sentences produced during Experiments 1–4 were presented to five different groups of listeners consisting of ten participants each.

Procedure
Participants were seated in front of a computer, provided with Beyerdynamic DT 250 high-quality headphones, instructed to listen to the set of 192 sentences, and to rate each sentence with respect to how happy the speaker sounded on a scale from 1 (very unhappy) to 7 (very happy). Once participants had entered their rating
Table 7
Mean difference in happiness ratings (1–7) between CDS and ADS and standard deviations (in parentheses) for mothers and non-mothers as a function of Child Presence.

                  Mothers                Non-mothers
Real child        0.54 (0.56), N = 48    0.24 (0.48), N = 24
Imaginary child   0.40 (0.65), N = 24    0.24 (0.54), N = 24
using the corresponding number keys, the next item followed after an ISI of 500 ms. The 192 sentences per set were randomized individually for each participant. The whole procedure lasted about 10 min.

Results and discussion

For each sentence, we computed the mean happiness rating across the 10 participants who rated it. Next, for each of the 120 speakers from Experiments 1–4, we computed the mean difference in happiness ratings between CDS and ADS, which provided a measure of the extent to which the speakers expressed more positive affect in CDS compared to ADS. The means for this measure, for mothers and non-mothers as a function of child presence, are given in Table 7. A 2 (Maternity) × 2 (Child Presence) ANOVA with difference in happiness ratings between CDS and ADS as dependent variable yielded a main effect of maternity, F(1, 116) = 4.6, p < .05, indicating that the mothers displayed more positive affective prosody in CDS compared to the ADS baseline than the non-mothers, regardless of presence or absence of the child.
Next, we computed a measure of the extent to which speakers amplified durational prosodic disambiguation cues in CDS compared to ADS, focussing on the pauses since vowel durations had not shown any clear effects of prosodic disambiguation. For each sentence, we computed the difference between post-nominal pauses and post-verbal pauses. The obtained difference for modifier sentences was subtracted from the obtained difference for instrument sentences. We then subtracted the obtained value in ADS from the one in CDS to quantify whether the difference between instrument and modifier pause durations was more pronounced in CDS. If there is a trade-off between affective prosody and linguistic prosody, then this measure should be smaller the more speakers increase positive affect expression in CDS.
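The two per-speaker measures described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the data layout, the function names, and the choice to average the pause differences over sentences within each condition before subtracting are all assumptions, and the numbers are invented.

```python
from statistics import mean


def pause_contrast(pauses):
    """Mean (post-nominal - post-verbal) pause difference in ms.

    `pauses` is a list of (post_nominal_ms, post_verbal_ms) pairs,
    one pair per sentence (hypothetical layout).
    """
    return mean([nominal - verbal for nominal, verbal in pauses])


def disambiguation(register):
    """Instrument contrast minus modifier contrast within one register."""
    return pause_contrast(register["instrument"]) - pause_contrast(register["modifier"])


def disambiguation_gain(speaker):
    """Disambiguation in CDS minus disambiguation in ADS.

    Positive values mean the instrument/modifier pause difference
    was more pronounced in CDS than in ADS.
    """
    return disambiguation(speaker["CDS"]) - disambiguation(speaker["ADS"])


def happiness_gain(ratings):
    """Mean happiness rating (1-7) in CDS minus the mean rating in ADS."""
    return mean(ratings["CDS"]) - mean(ratings["ADS"])


# Hypothetical speaker; pause durations in ms are invented for illustration.
example_speaker = {
    "ADS": {"instrument": [(60, 20), (80, 40)], "modifier": [(50, 20), (50, 20)]},
    "CDS": {"instrument": [(140, 30), (120, 10)], "modifier": [(70, 30), (80, 40)]},
}
# Hypothetical per-sentence mean happiness ratings for the same speaker.
example_ratings = {"CDS": [5, 6, 5, 6], "ADS": [4, 5, 4, 5]}

print(disambiguation_gain(example_speaker), happiness_gain(example_ratings))
```

Under the trade-off hypothesis, speakers with a larger `happiness_gain` would tend to show a smaller `disambiguation_gain`.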
Indeed, we found a significant negative correlation between the relative increase in positive affect expression and the relative increase in prosodic disambiguation in CDS, r = −.28, p < .01, N = 120. This correlation, depicted in Fig. 5, supports the notion of a trade-off between how much speakers used prosody for positive affect expression and how much they used it for disambiguation.3
3 A second measure of relative increase in prosodic disambiguation, which normalized the pause difference for speech rate by dividing the differences between post-nominal and post-verbal pause durations by sentence duration, also yielded a negative correlation with positive affect expression, r = −.24, p < .01, N = 120.
Finally, we explored whether perceived happiness was related to pitch contours. In conversational ADS, imperative
[Figure 5. Scatterplot. X-axis: Happiness Rating (CDS-ADS). Y-axis: Disambiguation (CDS-ADS) in ms.]
Fig. 5. Scatterplot of difference in perceived happiness ratings between CDS and ADS vs. difference in degree of disambiguation using durational cues between CDS and ADS for all 120 speakers from Experiments 1–4. For calculation of degree of disambiguation see text.
sentences usually exhibit a declining intonation contour. In CDS, sentence-final pitch increase is a common feature which serves to capture the child’s attention. To quantify the amount of declining vs. rising pitch, we subtracted the mean F0 of the vowel of the first noun from the mean F0 of the stressed vowel of the second noun, and converted the difference to semitones to account for the non-linearity of pitch perception. A negative value indicates a declining intonation contour; a positive value indicates sentence-final F0 rise. In CDS, this measure ranged from −9.4 semitones to 10.9 semitones with a mean of 0.4 semitones, and was positively correlated with perceived speaker happiness, r = .43, p < .001, N = 120, and negatively correlated with degree of prosodic disambiguation, r = −.39, p < .001, N = 120. This indicates that positive affect expression was associated with sentence-final F0 rise, and that the trade-off between affective and linguistic prosody may manifest itself in a trade-off between pitch modification and use of durational prosodic cues: the use of exaggerated pitch contours in affectively colored speech may come at the expense of efficient use of durational cues and be associated with an indiscriminate lengthening of pauses regardless of sentence structure. The sentence-final F0 rise may also explain why pitch accents on the preposition with were not employed more frequently in CDS: we had defined pitch accents on with as instances in which the F0 of the vowel in with was higher than the F0 of the stressed vowels of the first and the second noun. A strong tendency towards an utterance-final F0 rise precludes the occurrence of pitch accents in the middle of the sentence.
The notion of a trade-off between affective and linguistic prosody requires that the obtained correlations do not just reflect group differences between mothers and non-mothers, which may be due to other factors, but also hold within each of the groups. For the mothers, increase in perceived happiness from ADS to CDS was negatively correlated with difference in amount of disambiguation between CDS and ADS, r = −.23, p < .05, N = 72. Final pitch increase in the CDS of mothers was positively correlated with perceived happiness in CDS, r = .43, p < .001, N = 72, and negatively correlated with amount of disambiguation in CDS, r = −.49, p < .001, N = 72. This pattern of correlations for the mothers confirms the trade-off between affective and linguistic prosody. For the non-mothers, the negative correlation between increase in perceived happiness in CDS and disambiguation fell short of significance, r = −.23, p = .1, N = 48, but there was a positive correlation between final pitch increase and perceived happiness in CDS, r = .42, p < .01, N = 48. There was no correlation between final pitch increase and disambiguation in CDS, which may be due to the smaller sample size and to a floor effect, since the non-mothers tended not to increase pitch towards the end of the sentence nearly as much as the mothers (the mean F0 difference between first and second noun in CDS was 1.2 semitones for the mothers, which was significantly higher than the mean difference of −0.9 semitones for the non-mothers, t(118) = 2.8, p < .01). Still, these correlations and trends provide evidence for a genuine trade-off between affective and linguistic prosody.
Finally, we were interested to see whether the features of CDS were linked to the age of the child given that the function of CDS is assumed to change from fostering an affective bond with pre-verbal infants to providing more support of linguistic and cognitive development once the children start producing their first words (Fernald, 1992).
The ages of the children of the 48 mothers tested in Experiments 1a and 4 ranged from 2;8 years to 4;1 years. Clearly, the children in our study were well past the prelinguistic stage and were all producing language to some extent. Still, it can be assumed that if mothers emphasise the didactic aspects of CDS that serve to support linguistic development with toddlers they will continue to increase these efforts the older the children get (Kitamura, Thanavishuth, Burnham, & Luksaneeyanawin, 2002). If this is correct, we would expect a decrease of expressed positive affect and an increase of prosodic disambiguation with increasing age of the child. Indeed, there was a negative correlation between child age and the difference between perceived happiness in maternal CDS and ADS, r = −.35, p < .05, N = 48, while the positive correlation between child age and prosodic disambiguation fell short of significance, r = .28, p = .05, N = 48. Thus, the older the children, the less mothers expressed positive affect in their CDS and the stronger the trend to provide prosodic disambiguation cues. The correlational data suggest that increasing affect expression in CDS is associated with reduced disambiguating prosody, and that this association weakens as children get older. It could be argued that if affect expression attenuates prosodic disambiguation then there should have been clear prosodic disambiguation in ADS. A number of reasons that may have precluded both mothers and non-mothers from producing very clear disambiguation cues in the presumably affectively more neutral ADS have already been discussed above.
However, given that at least a trend towards a syntactically appropriate distribution of durational cues was apparent in ADS and that it was not possible to control speaker mood in the adult-directed condition, the finding that the change in perceived affect expression from ADS to CDS was related to degree of prosodic disambiguation in CDS can be taken as preliminary evidence for a trade-off between affective and linguistic features of CDS prosody.
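The pitch-contour measure, the pitch-accent criterion, and the correlations reported for Experiment 5 can be sketched in a few lines. The text does not state which formula was used for the semitone conversion; the sketch assumes the standard one, 12 times the base-2 logarithm of the F0 ratio. Function names and example values are hypothetical.

```python
import math


def semitone_difference(f0_first_noun_hz, f0_second_noun_hz):
    """F0 change from the first to the second noun in semitones.

    Assumes the standard conversion 12 * log2(f2 / f1). Negative values
    indicate a declining contour, positive values a sentence-final rise.
    """
    return 12 * math.log2(f0_second_noun_hz / f0_first_noun_hz)


def has_pitch_accent_on_with(f0_with_hz, f0_first_noun_hz, f0_second_noun_hz):
    """Criterion from the text: F0 of 'with' exceeds F0 of both nouns."""
    return f0_with_hz > f0_first_noun_hz and f0_with_hz > f0_second_noun_hz


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical values for one utterance (Hz), invented for illustration.
rise = semitone_difference(200.0, 400.0)   # one octave up
fall = semitone_difference(200.0, 100.0)   # one octave down
print(rise, fall, has_pitch_accent_on_with(210.0, 200.0, 220.0))
```

Note that a strong final rise makes the second noun's F0 high, so the criterion in `has_pitch_accent_on_with` is harder to satisfy, which is the point made in the text about pitch accents on with being rare in CDS.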
General discussion

This study investigated whether speakers exaggerate prosodic cues to syntactic structure when addressing small children. In four experiments, we instructed female speakers to produce syntactically ambiguous sentences addressing an adult and a child. Only half of the sentences could be disambiguated by the referential context, the other half was fully ambiguous. As discussed above, in ADS prosodic disambiguation, although showing the syntactically appropriate pattern, was not reliable. In CDS, we observed reliable prosodic disambiguation when female non-mothers addressed small children. These speakers used durational prosodic cues for disambiguation such that they lengthened the post-nominal pause when an instrument interpretation was intended, in contrast to the modifier sentences, where post-nominal pauses were not lengthened compared to ADS. We also found some evidence that the non-mothers used pitch accents on the preposition with to signal an instrument interpretation, although this cue was not used as frequently and consistently as the
durational cue. Interestingly, the non-mothers provided prosodic disambiguation cues in CDS regardless of whether the context was ambiguous or not indicating that they did not engage in audience design, i.e. the production of prosodic cues was not related to awareness of an addressee’s need for disambiguation, a result that supports findings by Kraljic and Brennan (2005). Apparently, non-mothers used prosodic disambiguation as a general strategy to maximise communicative effectiveness when addressing a small child, imaginary or real. The mothers, in contrast, failed to produce reliable prosodic disambiguation cues in CDS. Instead, they lengthened pause durations in general. It could be argued that the mothers’ failure to disambiguate prosodically may be related to differences in socio-economic status or level of education. While we did not obtain data on these variables from our speakers we think this an unlikely explanation as all speakers were drawn from the same population, and there was a considerable number of university staff among the mothers as well as the non-mothers. Instead, we suggest that the nature of maternal CDS may differ from the CDS of non-mothers: mothers have a biological incentive to experience, express and maintain a strong affective bond with their child, and their CDS is governed by positive affect expression (Singh et al., 2002; Trainor et al., 2000). Affective prosody, in turn, may attenuate the production of prosodic cues to syntax. Specifically, linguistic prosody related to the marking of syntactic structure may be overridden by a general tendency to reduce the speech rate and to lengthen segments and pauses resulting in durational cues that may be misleading in certain contexts. Similarly, the tendency to raise pitch to communicate positive affect may limit the salience of pitch accents. 
Support for this comes from Experiment 5, where we found that positive affect expression, operationalized as perceived happiness of an utterance, was higher in the CDS of the mothers than that of the non-mothers, and that a negative correlation between expressed positive affect and final pitch raise on one hand and amount of prosodic disambiguation on the other hand could be demonstrated for all speakers and for mothers alone. Thus, speakers who emphasise affect expression in CDS may do so at the expense of linguistic prosody and emphasising affect expression is more common for mothers than for non-mothers. Our findings cast doubt on the idea that there is a qualitative shift in the function of maternal CDS from maintaining an emotional and social bond to supporting linguistic and cognitive development around the time when children start to speak. It seems that the affective features of CDS persist even when children start using language and may co-exist with features supporting language development. The idea of continuity of co-existing functions implies that features supporting language development may also be present in CDS directed to pre-linguistic infants, along with affective features. This is supported by the finding that maternal CDS addressed to pre-linguistic infants, which is rich in positive affect expression, also contains hyper-articulated vowels (Kuhl et al., 1997; Burnham et al., 2002), a feature that is thought to support the acquisition of phonetic categories (Liu, Kuhl, & Tsao, 2003).
Affective and linguistic prosody utilise the same suprasegmental features such as fundamental frequency of the voice, amplitude, segment and pause duration, but they are governed by different mechanisms. Neurophysiological evidence suggests that the neural substrate responsible for producing affective prosody is mainly located in the right hemisphere (Baum & Pell, 1999), while the generation of linguistic prosody is located in the left hemisphere (Wong, 2002). It has also been suggested that subcortical regions are involved in the processing of affective prosody and that lateralization of prosody depends on the acoustic domain (e.g. pitch vs. timing) and on the size of the structure (e.g. syllable, word or phrase) over which a certain prosodic feature is planned (Wong, 2002). Thus, despite some disagreement in detail, most evidence points towards different neural pathways for linguistic and affective prosody (see also Ziegler, 2003). Independence of affective and linguistic prosody is also supported by the observation that syllable-final F0 rise used to signal a question is diminished if F0 prominence is also used to convey emphatic stress, but not if F0 prominence is used to convey affect, presumably because affectively generated F0 raises are the result of increased subglottal pressure whereas linguistically motivated F0 rises are generated by contractions of laryngeal muscles (McRoberts, Studdert-Kennedy, & Shankweiler, 1995). Constraints on the temporal overlap of adjacent contractions are responsible for the diminishing F0 rises following emphatic stress. However, if different mechanisms are responsible for the generation of F0 modifications for the purposes of affective vs. linguistic prosody, one would expect that affectively charged CDS can also contain linguistically motivated F0 peaks and durational cues, as the different underlying mechanisms of affective and linguistic prosody should allow for both functions to be served simultaneously. 
It seems, then, that priority of affective vs. linguistic prosody may be related to differences in communicative goals of speakers even if these goals lie outside a speaker’s awareness. Given that interaction with their own small child is a strong positive mood inducer for mothers (Bartels & Zeki, 2004; Nitschke et al., 2004), mothers may prioritize affective prosody over prosodic cues to syntax. Situations in which affective features of maternal CDS come at the expense of features that are beneficial for language development have been observed before. For example, deaf mothers signing towards their children below 2 years of age tend to avoid facial expressions that are used to express wh-questions despite the fact that these expressions are required by the grammar of ASL because these expressions have negative affective connotations (Reilly & Bellugi, 1996). Papousek and Hwang (1991) showed that Chinese mothers reduce lexical tone information in order to preserve the affective intonation of CDS, a strategy that may compromise tonal properties thereby perhaps temporarily delaying their acquisition. Kitamura et al. (2002) found that if Thai mothers restricted the prosodic range of their CDS to preserve the tonal characteristics of their language they compensated by increasing the affective content of their speech. These findings support the idea that positive affect expression appears to be a ubiquitous feature of maternal CDS and
may sometimes be employed at the expense of linguistic information. This conclusion should not be taken to imply that affective prosody hinders language development as maternal CDS still contains sufficient prosodic information to support some form of prosodic bootstrapping (Fisher & Tokura, 1996a; Soderstrom et al., 2008). Moreover, there is evidence that some affective features of CDS can also facilitate language acquisition. For example, frequent use of diminutives to express endearment and affection has been shown to aid word segmentation (Kempe, Brooks, & Gillis, 2005; Kempe, Brooks, Gillis, & Samson, 2007) and acquisition of inflectional morphology in a number of languages (Kempe et al., 2009; Savickiene, Kempe, & Brooks, 2009; Ševa et al., 2007). Similarly, affective prosody may facilitate learning by capturing the child’s attention. For example, providing exaggerated prosodic contours associated with happy speech facilitates statistical learning of word boundary cues compared to neutral speech with identical informational content (Thiessen, Hill, & Saffran, 2005). Affective prosody may also support early word learning (Bhullar, 2008, but see Singh, Morgan, & White, 2004). Moreover, lack of affective prosody may be detrimental for learning. For example, associative learning of infants from clinically depressed mothers or fathers, who tend to produce prosodically flat CDS, is impaired if such parental CDS serves as stimulus (Kaplan, Bachorowski, & Zarlengo-Strauss, 1999; Kaplan, Sliter, & Burgess, 2007). It is conceivable that mothers favor affective over linguistic prosody because of their more intimate knowledge of their children’s ability (or lack thereof) to process prosodic cues. Our study was not designed to assess the children’s accuracy in carrying out the instructions. 
Preliminary observations indicated that the children greatly preferred modifier interpretations regardless of intended sentence structure; presumably because the associated actions were less complex (touching an object with your hand is simpler than picking up one object to touch another object). On the other hand, comprehension studies using a spoken language eye–gaze paradigm have demonstrated that 4–6-year old children are able to use prosodic cues to syntax of the type examined in this study, and that their mothers provide these cues in their CDS (Snedeker & Yuan, 2008). However, the children in our study were considerably younger, and we know of no studies testing the use of prosodic cues by children of this younger age range. Despite the evidence that even very young children prefer speech with prosodic cues at syntactically appropriate locations (e.g. Hirsh-Pasek et al., 1987; Jusczyk et al., 1992), toddlers may not yet be able to use this information to distinguish between different sentence interpretations. Mothers may have tacit knowledge of this, and may therefore opt not to provide prosodic information. This somewhat speculative suggestion clearly needs to be confirmed by further research into 2–4-year old children’s ability to process prosodic cues to syntax and into mothers’ intuitions about their children’s level of prosodic proficiency. Since even small infants are able to distinguish differences in communicative intent such as the difference between approval and disapproval (Fernald, 1993; Papousek, Bornstein, Nuzzo, Papousek, & Symmes, 1990)
or between comforting, approving and directive intonations (Kitamura & Lam, 2009), mothers may assume that their children are more likely to grasp those aspects of prosody that signal communicative intent rather than the fine-grained particulars of linguistic prosody. Indeed, the finding from Experiment 5 that maternal affect expression decreased, and prosodic disambiguation increased with age of the child lends some credence to the notion that prosodic disambiguation cues in maternal speech may depend on perceived linguistic competence of the child. In contrast, non-mothers with little caretaking experience may overestimate the children’s ability to handle linguistic prosody. Note that in our study, maternity was confounded with familiarity with the child. Thus, it is possible that non-mothers who are familiar with the child or with children of this age group, may have more accurate knowledge of the children’s abilities to process linguistic prosody, and may also fail to provide disambiguating cues if the children are judged to not be able to process them yet. An interesting question is whether more experienced non-mothers will also opt to emphasise affective prosody as mothers do. In future studies, this can be addressed by examining CDS of child minders, nannies, au pairs and nursery teachers. An additional factor that may have contributed to the mothers’ lack of prosodic disambiguation has been suggested by Snedeker (personal communication): mothers may have a greater desire for lexical clarity which may lead them to produce accents on both nouns. This would result in the production of two intonational phrases in CDS regardless of syntactic interpretation. Such a tendency to sacrifice syntactic clarity in favor of lexical clarity is certainly in line with the idea that there may be trade-offs between the different functions of CDS. 
However, in this particular study, it would not explain the observed relationship between prosodic disambiguation and perceived positive affect. We would like to conclude with two thoughts, one about the nature of CDS and one about the general relationship between emotion and communication. Firstly, the findings presented here may help to refine our notions of CDS in showing that functions and features of CDS can differ depending on the speaker’s relationship to the child. So far, studies of CDS prosody have predominantly examined the CDS of mothers (for an overview see Soderstrom (2007)), and, to a lesser extent that of other next-of-kin such as fathers (Fernald et al., 1989; Papousek, Papousek, & Haekel, 1987; Shute & Wheldall, 1999; Warren-Leubecker & Bohannon, 1984), siblings (Hoff-Ginsberg & Krueger, 1991; Shatz & Gelman, 1973) or grandparents (Shute & Wheldall, 2001). Considerably less is known about the CDS of non-kin speakers which is surprising given how much time children spend in the care of nursery teachers, child minders, baby sitters or nannies who provide a substantial part of language input (Soderstrom, 2007). Studying the CDS of these allo-parents, who may not have a strong affective bond with the child, may elucidate the nature of the knowledge that any given community of speakers shares about features and functions of this speech register (Jacobson et al., 1983). Our findings show that when non-kin speakers address toddlers they are likely
to produce prosodic information at the sub-clausal level that may help children around the age of 2 years and above to consolidate and expand their syntactic knowledge. Maternal CDS, on the other hand, may contain features that facilitate the social and emotional bond between mother and child but are not necessarily optimal for language development. At the very least, the results of this study should caution against the assumption that maternal CDS is always the most informative source of input when it comes to exposing structural features of language, and promote a more nuanced view of variants of this register, which is important if we want to fully understand the various ways in which CDS can affect language acquisition and child development.
Secondly, our findings invite some speculations about the interaction between emotion and language production in general. As suggested earlier, mothers addressing their child are speakers who experience and express strong positive affect (Bartels & Zeki, 2004; Nitschke et al., 2004). Such speakers may reduce the informativeness of their speech by letting affective prosody take priority over linguistically informative prosody. If this outcome is a consequence of the speaker’s emotional state, it raises the intriguing question of how valence of speaker emotion affects language production. Are happy speakers more or less informative in their communication? Is positive affective valence associated with more or less communicative clarity? To date, there is a dearth of studies exploring the relationship between affective valence and informational characteristics of speech. Some indirect evidence comes from social psychological studies which show that speakers who underwent positive mood induction formulated requests in a more direct, almost rude manner and were less persuasive. 
This was in contrast to speakers who underwent negative mood induction and formulated requests in a more indirect, polite manner, and who were also shown to be more persuasive (Forgas, 1999, 2007). Furthermore, speakers exhibiting extreme degrees of social anxiety, a condition that tends to show co-morbidity with negative affective states like depression, tended to be more realistic in the appraisal of their communicative success, specifically, the amount of prosodic disambiguation they had provided, than speakers scoring very low on measures of social phobia, who systematically overestimated their communicative success (Fay, Page, Serfaty, Tal, & Winkler, 2008). While these studies do not provide sufficient evidence for the conjecture that speakers who are experiencing positive affect are slightly less informative in their speech than speakers who are experiencing negative affect, they seem to hint at such a relationship. The present findings would certainly fit well into this overall picture: strong positive affect experienced by mothers appears to be associated with priority of affective prosody over more informative linguistic prosody. Due to the nature of our study, our findings are not suited to shed light on the effects of negative mood, which renders the suggestion of a link between affective valence and informativeness of speech a very tentative one at this point. Future studies will have to explore the relationship between affective valence and language production more directly, by using controlled mood induction and by
exploring, in addition to prosody, a wider range of ways to resolve ambiguities, for example through the use of optional function words to resolve syntactic ambiguities or of paraphrases and modifiers to resolve lexical ambiguities. If, as suggested earlier, positive affect expression by itself incurs other social and communicative benefits, such as establishing rapport or attracting an interlocutor’s attention, future research may illuminate how much informativeness, if any, speakers can afford to trade in for these benefits.
Acknowledgments
We gratefully acknowledge funding of this research by the University of Stirling, as well as by grants from the German Academic Exchange Service (DAAD) and the ESRC (PTA-026-27-1465) to the second author. We also would like to thank Gemma Potts, Kerstin Schillinger and Sandés Dindar for help in running the experiments, Felix Schaeffler for assistance with the acoustic analysis, and Jesse Snedeker and three anonymous reviewers for helpful comments on earlier drafts of the paper.
Appendix A. Sentences used in Experiments 1a, 2, 3a and 4.
Booklet 1
Array 1
Touch the snake (filler)
Touch the dog with the flower (ambiguous, modifier)
Touch the snake and the fish (filler)
Touch the fish with the flower (unambiguous, instrument)
Touch the fish (filler)
Array 2
Touch the pig (filler)
Touch the cat with the spoon (ambiguous, instrument)
Touch the pig and the frog (filler)
Touch the duck with the flower (unambiguous, modifier)
Touch the frog (filler)
Booklet 2
Array 1
Touch the snake (filler)
Touch the dog with the flower (ambiguous, instrument)
Touch the snake and the fish (filler)
Touch the horse with the spoon (unambiguous, modifier)
Touch the fish (filler)
Array 2
Touch the pig (filler)
Touch the cat with the spoon (ambiguous, modifier)
Touch the pig and the frog (filler)
Touch the frog with the spoon (unambiguous, instrument)
Touch the frog (filler)
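The two booklets appear to counterbalance the intended interpretation of the ambiguous sentences. The following minimal sketch makes that property explicit. The sentences and labels are taken from Appendix A, whereas the dictionary layout and the function name `is_counterbalanced` are illustrative assumptions and not part of the original materials.

```python
# Ambiguous items from Appendix A, keyed by booklet.
# The data layout and function name are hypothetical, chosen for illustration only.
AMBIGUOUS_ITEMS = {
    "Booklet 1": {
        "Touch the dog with the flower": "modifier",
        "Touch the cat with the spoon": "instrument",
    },
    "Booklet 2": {
        "Touch the dog with the flower": "instrument",
        "Touch the cat with the spoon": "modifier",
    },
}

BOTH = {"instrument", "modifier"}


def is_counterbalanced(items):
    """Check two properties of the design:
    (1) across booklets, every ambiguous sentence occurs with both interpretations;
    (2) within each booklet, each interpretation occurs exactly once.
    """
    sentences = set()
    for assignment in items.values():
        sentences.update(assignment)

    across = all(
        {assignment.get(sentence) for assignment in items.values()} == BOTH
        for sentence in sentences
    )
    within = all(
        sorted(assignment.values()) == sorted(BOTH)
        for assignment in items.values()
    )
    return across and within


print(is_counterbalanced(AMBIGUOUS_ITEMS))
```

Under these assumptions the check succeeds for the booklets listed above, and it would fail if, for example, both booklets assigned the same interpretation to a sentence.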
References
Allbritton, D. W., McKoon, G., & Ratcliff, R. (1996). Reliability of prosodic cues for resolving syntactic ambiguity. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 714–735. Bartels, A., & Zeki, S. (2004). The neural correlates of maternal and romantic love. NeuroImage, 21, 1155–1166. Baum, S., & Pell, M. (1999). The neural bases of prosody: Insights from lesion studies and neuroimaging. Aphasiology, 13, 581–608. Bhullar, N. (2008). Effects of facial and vocal emotion on word recognition in 11-to-13-month-old infants. Dissertation Abstracts International: Section B: The Sciences and Engineering, 69, 3293. Boersma, P., & Weenink, D. (2005). Praat: Doing phonetics by computer (Version 4.3.14).
Retrieved 26.05.05 [Computer program]. Bouzon, C., Auran, C., & Hirst, D. (2004). Comparative approaches to prosody across varieties of English. Tribune des Langues Vivantes, 36, 123–138. Brent, M. R., & Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary development. Cognition, 81, B33–B44. Broen, P. (1972). The verbal environment of the language-learning child. Monograph of the American Speech and Hearing Association, No. 17. Washington, DC: American Speech and Hearing Society. Burnham, D., Kitamura, C., & Vollmer-Conna, U. (2002). What’s new, pussycat? On talking to babies and animals. Science, 296, 1435. Christophe, A., Nespor, M., Guasti, M. T., & Van Ooyen, B. (2003). Prosodic structure and syntactic acquisition: The case of the head-direction parameter. Developmental Science, 6, 211–220. Clifton, Ch., Carlson, K., & Frazier, L. (2002). Informative prosodic boundaries. Language and Speech, 45, 87–114. Cooper, W. E., & Paccia-Cooper, J. (1980). Syntax and speech. Cambridge: Harvard University Press. Dale, P. S. (1974). Hesitations in maternal speech. Language and Speech, 17, 74–181. Fay, N., Page, A. C., Serfaty, C., Tal, V., & Winkler, C. (2008). Speaker overestimation of communication effectiveness and fear of negative evaluation: Being realistic is unrealistic. Psychonomic Bulletin & Review, 15, 1160–1165. Ferguson, C. A. (1977). Baby talk as a simplified register. In C. A. Ferguson & C. Snow (Eds.), Talking to children: Language input and acquisition. New York: Cambridge University Press. Fernald, A. (1989). Intonation and communicative intent in mothers’ speech to infants: Is the melody the message? Child Development, 60, 1497–1510. Fernald, A. (1992). Human vocalizations to infants as biologically relevant signals: An evolutionary perspective. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 391–428). New York: Oxford University Press. Fernald, A. 
(1993). Approval and disapproval: Infant responsiveness to vocal affect in familiar and unfamiliar languages. Child Development, 64, 657–674. Fernald, A., & McRoberts, G. (1996). Prosodic bootstrapping: A critical analysis of the argument and the evidence. In J. L. Morgan & K. Demuth (Eds.), Signal to syntax: Bootstrapping from speech to grammar in early acquisition (pp. 365–388). Hillsdale: Lawrence Erlbaum Associates. Fernald, A., & Simon, T. (1984). Expanded contours in mothers’ speech to newborns. Developmental Psychology, 20, 104–113. Fernald, A., Taeschner, T., Dunn, J., Papousek, M., Boysson-Bardies, B., & Fukui, I. (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language, 16, 477–501. Ferreira, F. (1993). Creation of prosody during sentence production. Psychological Review, 100, 233–253. Ferreira, V. S., Slevc, R. L., & Rogers, E. S. (2005). How do speakers avoid ambiguous linguistic expressions? Cognition, 96, 263–284. Fisher, C., & Tokura, H. (1996a). Acoustic cues to grammatical structure in infant-directed speech: Cross-linguistic evidence. Child Development, 67, 3192–3218. Fisher, C., & Tokura, H. (1996b). Prosody in speech to infants: Direct and indirect cues to syntactic structure. In J. L. Morgan & K. Demuth (Eds.), Signal to syntax: Bootstrapping from speech to grammar in early acquisition (pp. 343–364). Hillsdale: Lawrence Erlbaum Associates. Forgas, J. P. (1999). On feeling good and being rude: Affective influences on language use and request formulations. Journal of Personality and Social Psychology, 76, 928–939. Forgas, J. P. (2007). When sad is better than happy: Negative affect can improve quality and effectiveness of persuasive messages and social
influence strategies. Journal of Experimental Social Psychology, 43, 513–528. Hirsh-Pasek, K., Kemler Nelson, D., Jusczyk, P., Wright, K., Druss, B., & Kennedy, L. (1987). Clauses are perceptual units for prelinguistic infants. Cognition, 26, 269–286. Hoff-Ginsberg, E., & Krueger, W. (1991). Older siblings as conversational partners. Merrill-Palmer Quarterly, 37, 465–482. Huttenlocher, J., Vasilyeva, M., Waterfall, H. R., Vevea, J. L., & Hedges, L. V. (2007). The varieties of speech to children. Developmental Psychology, 43, 1062–1083. Jacobson, J. L., Boersma, D. C., Fields, R. B., & Olson, K. L. (1983). Paralinguistic features of adult speech to infants and small children. Child Development, 54, 436–442. Jusczyk, P. W., Hirsh-Pasek, K., Kemler-Nelson, D., Kennedy, L., Woodward, A., & Piwoz, J. (1992). Perception of acoustic correlates of major phrasal units by young infants. Cognitive Psychology, 24, 252–293. Kaplan, P. S., Bachorowski, J.-A., & Zarlengo-Strauss, P. (1999). Child-directed speech produced by mothers with symptoms of depression fails to promote associative learning in 4-month-old infants. Child Development, 70, 560–570. Kaplan, P. S., Sliter, J. K., & Burgess, A. P. (2007). Infant-directed speech produced by fathers with symptoms of depression: Effects on infant associative learning in a conditioned-attention paradigm. Infant Behavior and Development, 30, 535–545. Kempe, V. (2009). Child-directed speech prosody in adolescents: Relationship to 2D:4D, empathy, and attitudes towards children. Personality and Individual Differences, 47, 610–615. Kempe, V., Brooks, P. J., & Gillis, S. (2005). Diminutives in child-directed speech supplement metric with distributional word segmentation cues. Psychonomic Bulletin & Review, 12, 145–151. Kempe, V., Brooks, P. J., Gillis, S., & Samson, G. (2007). Diminutives facilitate word segmentation in natural speech: Cross-linguistic evidence. Memory & Cognition, 35, 762–773. Kempe, V., Ševa, N., Brooks, P. 
J., Mironova, N., Pershukova, A., & Fedorova, O. (2009). Elicited production of case-marking in Russian and Serbian children: Are diminutive nouns easier to inflect? First Language, 29, 147–165. Kitamura, C., & Lam, C. (2009). Age-specific preferences for infant-directed affective intent. Infancy, 14, 77–100. Kitamura, C., Thanavishuth, C., Burnham, D., & Luksaneeyanawin, S. (2002). Universality and specificity in infant-directed speech: Pitch modifications as a function of infant age and sex in a tonal and nontonal language. Infant Behavior and Development, 24, 372–392. Kraljic, T., & Brennan, S. E. (2005). Prosodic disambiguation of syntactic structure: For the speaker or for the addressee? Cognitive Psychology, 50, 194–231. Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., et al. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277, 684–686. Lieberman, P. (1996). Some biological constraints on the analysis of prosody. In J. L. Morgan & K. Demuth (Eds.), Signal to syntax: Bootstrapping from speech to grammar in early acquisition (pp. 55–65). Hillsdale: Lawrence Erlbaum Associates. Liu, H., Kuhl, P. K., & Tsao, F. (2003). An association between mothers’ speech clarity and infants’ speech discrimination skills. Developmental Science, 6, F1–F10. Locke, J. (2001). First communication: The emergence of vocal relationships. Social Development, 10, 294–308. McRoberts, G., Studdert-Kennedy, M., & Shankweiler, D. (1995). The role of fundamental frequency in signalling linguistic stress and affect: Evidence for a dissociation. Perception & Psychophysics, 159–174. Millotte, S., Wales, R., & Christophe, A. (2007). Phrasal prosody disambiguates syntax. Language and Cognitive Processes, 22, 898–909. Morgan, J. L., & Demuth, K. (1996). Signal to syntax: Bootstrapping from speech to grammar in early acquisition. Hillsdale: Lawrence Erlbaum Associates. Nespor, M., & Vogel, I. (1986). 
Prosodic phonology. Dordrecht: Foris. Nitschke, J., Nelson, E., Rusch, B., Fox, A., Oakes, T., & Davidson, R. (2004). Orbitofrontal cortex tracks positive mood in mothers viewing pictures of their newborn infants. NeuroImage, 21, 583–592. Papousek, M., Bornstein, M. H., Nuzzo, C., Papousek, H., & Symmes, D. (1990). Infant responses to prototypical melodic contours in parental speech. Infant Behavior and Development, 13, 539–545. Papousek, M., & Hwang, S.-F. C. (1991). Tone and intonation in Mandarin baby talk to presyllabic infants: Comparison with registers of adult conversation and foreign language instruction. Applied Psycholinguistics, 12, 481–504.
Papousek, M., Papousek, H., & Haekel, M. (1987). Didactic adjustments in fathers’ and mothers’ speech to their 3-month-old infants. Journal of Psycholinguistic Research, 16, 491–516. Price, P., Ostendorf, M., Shattuck-Hufnagel, S., & Fong, C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90, 2956–2970. Reilly, J. S., & Bellugi, U. (1996). Competition on the face: Motherese in ASL. Journal of Child Language, 23, 219–239. Sambeth, A., Ruohio, K., Alku, P., Fellman, V., & Huotilainen, M. (2008). Sleeping newborns extract prosody from continuous speech. Clinical Neurophysiology, 119, 332–341. Savickiene, I., Kempe, V., & Brooks, P. J. (2009). Acquisition of gender agreement in Lithuanian: Exploring the effect of diminutive usage in an elicited production task. Journal of Child Language, 36. Schober, M. F., & Brennan, S. E. (2003). Processes of interactive spoken discourse: The role of the partner. In A. C. Graesser, M. A. Gernsbacher, & S. R. Goldman (Eds.), Handbook of discourse processes (pp. 123–164). Hillsdale, NJ: Lawrence Erlbaum. Seidl, A. (2007). Infants’ use and weighting of prosodic cues in clause segmentation. Journal of Memory and Language, 57, 24–48. Selkirk, E. O. (1984). Phonology and syntax. The relation between sound and structure. Cambridge, MA: MIT Press. Ševa, N., Kempe, V., Brooks, P. J., Mironova, N., Pershukova, A., & Fedorova, O. (2007). Cross-linguistic evidence for the diminutive advantage: Gender agreement in Russian and Serbian children. Journal of Child Language, 34, 111–131. Shattuck-Hufnagel, S., & Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research, 25, 193–247. Shatz, M., & Gelman, R. (1973). The development of communication skills: Modifications in the speech of young children as a function of listener. Monographs of the Society for Research in Child Development, 38, 1–37. Shute, B., & Wheldall, K. (1999). 
Fundamental frequency and temporal modifications in the speech of British fathers to their children. Educational Psychology, 19, 221–233. Shute, B., & Wheldall, K. (2001). How do grandmothers speak to their grandchildren? Fundamental frequency and temporal modifications in the speech of British grandmothers to their grandchildren. Educational Psychology, 21, 493–503. Simmons, N., & Johnston, J. (2007). Cross-cultural differences in beliefs and practices that affect the language spoken to children: Mothers with Indian and Western heritage. International Journal of Language and Communication Disorders, 42, 445–465. Singh, L., Morgan, J. L., & Best, C. (2002). Infants’ listening preferences: Baby talk or happy talk? Infancy, 3, 365–394. Singh, L., Morgan, J. L., & White, K. S. (2004). Preference and processing: The role of speech affect in early spoken word recognition. Journal of Memory and Language, 51, 173–189. Snedeker, J., & Trueswell, J. (2003). Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language, 48, 103–130. Snedeker, J., & Yuan, S. (2008). Effects of prosodic and lexical constraints on parsing in young children (and adults). Journal of Memory and Language, 58, 574–608. Soderstrom, M. (2007). Beyond baby talk: Re-evaluating the nature and content of speech input to preverbal infants. Developmental Review, 27, 501–532. Soderstrom, M., Blossom, M., Foygel, R., & Morgan, J. L. (2008). Acoustical cues and grammatical units in speech to two preverbal infants. Journal of Child Language, 35, 869–902. Soderstrom, M., Seidl, A., Kemler Nelson, D. G., & Jusczyk, P. W. (2003). The prosodic bootstrapping of phrases: Evidence from prelinguistic infants. Journal of Memory and Language, 49, 249–267. Strathearn, L., Li, J., Fonagy, P., & Montague, P. R. (2008). What’s in a smile? Maternal responses to infant facial cues. Pediatrics, 122, 40–51. Thiessen, E. D., Hill, E. A., & Saffran, J. R. (2005). 
Infant-directed speech facilitates word segmentation. Infancy, 7, 53–71. Trainor, L., Austin, C., & Desjardins, R. (2000). Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Science, 11, 180–195. Warren-Leubecker, A., & Bohannon, J. N. (1984). Intonation patterns in child-directed speech: Mother–father differences. Child Development, 55, 1379–1385. Watson, D., Breen, M., & Gibson, E. (2006). The role of obligatoriness in the production of intonational boundaries. Journal of Experimental Psychology: Learning, Memory and Cognition, 32, 1045–1056. Watson, D., & Gibson, E. (2004). The relationship between intonational phrasing and syntactic structure in language production. Language and Cognitive Processes, 19, 713–755.
Werker, J. F., & McLeod, P. J. (1989). Infant preference for both male and female infant-directed talk: A developmental study of attentional affective responsiveness. Canadian Journal of Psychology, 43, 230–246.
Wong, P. (2002). Hemispheric specialization of linguistic pitch patterns. Brain Research Bulletin, 59, 83–95. Ziegler, W. (2003). Speech motor control is task-specific: Evidence from dysarthria and apraxia of speech. Aphasiology, 17, 3–36.