Journal of Phonetics 54 (2016) 51–67
Journal of Phonetics journal homepage: www.elsevier.com/locate/phonetics
Research Article
Accounting for variability in North American English /ɹ/: Evidence from children's articulation
Lyra Magloughlin
University of Ottawa, Department of Linguistics, Arts Hall, 70 Laurier Ave. East, Room 401, Ottawa, ON, Canada K1N 6N5
Article info
Article history: Received 17 July 2014; Received in revised form 24 July 2015; Accepted 25 July 2015; Available online 6 October 2015
Keywords: Variability; Articulation; Ultrasound; Rhotic; English; Acquisition
Abstract
This acoustic and articulatory pilot study examines the North American English /ɹ/ productions of English-speaking children during acquisition, and compares their early- and later-stage productions with /ɹ/ allophony patterns reported in previous studies with adults. Ultrasound imaging is used to investigate the articulatory behavior of four children, aged 3–6 years, during production of familiar lexical items containing prevocalic, postvocalic, and syllabic /ɹ/. Shape analysis of the tongue is conducted using a technique that is highly robust against rotational and translational differences from token to token. Participants exhibited behaviors consistent with those of adults in previous studies, showing both intra- and inter-speaker variability, and similar patterns of allophony based on syllable position, consonant place of articulation, and vowel quality. For three participants, variable behavior occurred prevocalically, in contexts where adults tend to exhibit the greatest amount of allophonic variation. Variable behavior during acquisition of an articulatorily complex speech sound provides a plausible explanation for the variability that has been previously reported with adults. If a child's dominant strategy for reaching adult-like targets proves ineffective in certain contexts, that may motivate exploratory behavior that could lead to a stable alternative strategy in those contexts over time. Participants' later-stage productions mirror allophony patterns observed with adults in previous studies. The current research adds to the literature on children's articulatory behavior during acquisition, and to the body of accumulated knowledge on North American English /ɹ/.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
This acoustic and articulatory pilot study examines the North American English /ɹ/ productions of four English-speaking children during acquisition and compares their early production strategies with /ɹ/ allophony patterns observed in previous studies with adults. North American English /ɹ/ is of interest in adult populations because it exhibits acoustic stability (e.g. low F3) despite considerable articulatory variability both within and between speakers (Delattre & Freeman, 1968; Guenther et al., 1999; Mielke, Baker, & Archangeli, 2010, 2016; Westbury, Hashi, & Lindstrom, 1998). In children, /ɹ/ is often one of the last sounds to be acquired (Sander, 1972; Smit, 1993), especially in prevocalic position (McGowan, Nittrouer, & Manning, 2004; Smit, Hand, Freilinger, Bernthal, & Bird, 1990; Stoel-Gammon, 1985). Tiede, Boyce, Espy-Wilson, and Gracco (2011) have suggested that children might attempt different vocal tract configurations during an “exploratory period” (p. 65) in acquisition, particularly in contexts where the articulatory demands are greater. Variable behavior during acquisition of an articulatorily complex speech sound provides a plausible explanation for the /ɹ/ allophony patterns observed in previous studies with adults. The current research uses ultrasound imaging to investigate the articulatory behavior of four English-speaking children, aged 3–6 years, during production of familiar lexical items containing /ɹ/. Children's early-stage /ɹ/ productions are examined and compared with their later-stage productions. Shape analysis of the tongue is conducted using a technique that is highly robust against rotational and translational differences from token to token (Ménard, Aubin, Thibeault, & Richard, 2012). Children's variable behavior is interpreted in reference to previously published work on tongue shapes used by adults when producing /ɹ/.
Findings suggest that adult variability may emerge in childhood during an exploratory period in acquisition, as proposed by Tiede et al. (2011). Not only do participants exhibit behaviors that are consistent with those of adults in previous studies (e.g. Mielke et al., 2010, 2016; Westbury et al., 1998), they also demonstrate variable behavior in contexts where they have not yet acquired adult-like /ɹ/, and where adults tend to show the greatest amount of allophony (Mielke et al., 2010, 2016). While there is a growing body of literature on articulatory variability in adult production of /ɹ/ (see Mielke et al., 2016), and on /ɹ/ production in children experiencing phonological delay (e.g. Adler-Bock, Bernhardt, Gick, & Bacsfalvi, 2007; Bacsfalvi, 2010; Bernhardt, Gick, Bacsfalvi, & Adler-Bock, 2005; Davidson, Klein, & Grigos, 2007; Klein, McAllister Byun, Davidson, & Grigos, 2013; McAllister Byun & Hitchcock, 2012; McAllister Byun, Hitchcock, & Swartz, 2014), there remain virtually no articulatory data on children's production of /ɹ/ during typical development. This research makes a small contribution to the literature on children's articulation of this speech sound, and to the body of accumulated knowledge on North American English /ɹ/.
0095-4470/$ - see front matter © 2015 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.wocn.2015.07.007
2. Background
North American English /ɹ/ is an approximant speech sound that is often one of the last to be acquired by children (Sander, 1972; Smit, 1993), particularly in prevocalic position (McGowan et al., 2004; Smit et al., 1990; Stoel-Gammon, 1985). Its articulation involves two points of lingual constriction, in the pharyngeal space and along the palate, and (frequently) a third point of constriction at the lips, making it an articulatorily complex sound (Delattre & Freeman, 1968; Gick, 1999). Studies with adults show considerable inter- and intra-speaker articulatory variability during production (Delattre & Freeman, 1968; Mielke et al., 2016; Westbury et al., 1998). As illustrated in the /ɹ/ taxonomy presented in Fig. 1 (Delattre & Freeman, 1968), adults use different tongue shapes to produce /ɹ/, ranging from bunched postures with the tongue tip pointing down, to more retroflex postures with the tongue tip pointing up. Indeed, the bunched (tip-down) vs. retroflex (tip-up) distinction has remained an important one (Derrick & Gick, 2011; Hagiwara, 1995; Mielke, Baker, & Archangeli, 2016; Stavness, Gick, Derrick, & Fels, 2012), and while some speakers use exclusively one tongue shape across contexts, others employ different tongue shapes in different contexts (Mielke et al., 2016). Despite this articulatory variability both within and between speakers, /ɹ/ exhibits a strong degree of acoustic stability (Delattre & Freeman, 1968).
2.1. Acoustics of adult /ɹ/
One of the earliest acoustic studies of North American English /ɹ/ was conducted by Lehiste (1962), who performed a spectrographic analysis of the /ɹ/ productions of five Midwestern American speakers. Participants produced 135 target words, and midpoint formant values (F1–F3) were collected for each. Lehiste found that, even across a range of contexts, allophones of /ɹ/ had enough features in common to describe them as “phonetically similar” (p. 109).
These common features included a low third formant and a relatively small difference in frequency between F2 and F3. In a larger-scale acoustic and articulatory examination of /ɹ/, Delattre and Freeman (1968) used cineradiography and magnetic tape recording to capture the /ɹ/ productions of 46 men and women living in different regions of the United States. They proposed eight tongue shapes (Fig. 1) to describe the various articulatory strategies employed by participants during production of 32 common English words containing /ɹ/. Acoustically, the study confirmed what had already been attested (Delattre, 1951; Lehiste, 1962; Potter, Kopp, & Kopp, 1947): that low F3 serves as an important acoustic characteristic of /ɹ/. Unlike Lehiste (1962), however, who had speculated that acoustic similarities may be due to retroflexion, Delattre and Freeman found that the F1, F2, and F3 frequencies of /ɹ/s produced with bunched vs. retroflex tongue shapes were not significantly different from one another.
Fig. 1. Adapted from Delattre and Freeman (1968) /ɹ/ taxonomy (p. 41, images have been rotated to be rightward facing); types 4 and 7 illustrate the classic bunched/retroflex distinction (types 1, 2, and 8 represent shapes observed in non-rhotic varieties of English).
More recently, Zhou et al. (2008) conducted an acoustic and MRI study of higher-order formants in the /ɹ/ productions of six speakers (three retroflexers and three bunchers), and compared these results with those obtained from computer vocal tract modeling. Results showed that F4 and F5 were consistently further apart in retroflex /ɹ/ than in bunched /ɹ/. The authors attributed this difference to the fact that F3, F4, and F5 are back cavity resonances, and that the size and shape of the back cavity change depending on whether a retroflex or bunched /ɹ/ is produced. However, the authors suggested that while the spacing of F4–F5 may be a reliable way to distinguish between tongue shapes, F1–F3 remain critical for the perception of /ɹ/. This observation is consistent with the work of Twist, Baker, Mielke, and Archangeli (2007), who found that bunched and retroflex tongue shapes are perceptually difficult to distinguish, possibly because the most salient acoustic marker of /ɹ/, namely low F3, is observed across tongue shapes (Delattre & Freeman, 1968; Lehiste, 1962; Mielke et al., 2016; Thomas, 1947; Westbury et al., 1998).
2.2. Acoustics of child /ɹ/
One of the earliest acoustic studies of children's /ɹ/ was conducted by Dalston (1975), who compared mean F1, F2, and F3 values for the English sonorants /w/, /ɹ/, and /l/ in adults and children. Five adult participants read a list of 29 stimulus words, and 10 children between the ages of 3 and 4 identified the stimuli presented as pictures. Dalston reported that children's correct productions of word-initial targets showed formants with the same distinguishing features as adults', including low mean F3 values for /ɹ/ (∼2500 Hz). In a longitudinal study of typically developing children aged 14–31 months, McGowan et al. (2004) measured average F2 and F3 frequencies and F3–F2 frequency distances in children's prevocalic, postvocalic, medial syllabic, and final syllabic /ɹ/ productions over time.
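McGowan et al. (2004) summarize each context with two quantities: mean F3 and the mean F3–F2 distance. As a minimal sketch of that bookkeeping, assuming midpoint F2 and F3 have already been measured for each token, the Python below groups tokens by context and averages both quantities. The `Token` record, the context labels, and any example values are hypothetical; they are not data from McGowan et al. or from the present study.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Token:
    context: str  # hypothetical label, e.g. "prevocalic", "postvocalic", "syllabic"
    f2: float     # midpoint F2 (Hz)
    f3: float     # midpoint F3 (Hz)


def summarize_by_context(tokens):
    """Map each context to (mean F3, mean F3-F2 distance), both in Hz."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[tok.context].append(tok)
    return {
        ctx: (mean(t.f3 for t in toks), mean(t.f3 - t.f2 for t in toks))
        for ctx, toks in groups.items()
    }
```

A context with both a lower mean F3 and a smaller mean F3–F2 distance than another is the pattern McGowan et al. read as more advanced /ɹ/ production in that context.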
The authors reported lower mean F3 values and smaller F3–F2 distances in children's postvocalic and syllabic /ɹ/s than in their prevocalic productions, suggesting that certain aspects of /ɹ/ production were acquired earlier in postvocalic and syllabic contexts. More recently, in work with children experiencing phonological delay, Adler-Bock et al. (2007) reported that adolescents receiving treatment for misarticulation of /ɹ/ showed a drop in midpoint F3 in their post-treatment, on-target productions of /ɹ/, in line with previous acoustic research on adult production of /ɹ/ (e.g. Lehiste, 1962; Delattre & Freeman, 1968). Idemaru and Holt (2013) examined children's perception of synthesized /l/ and /ɹ/ during acquisition and found that F3 was a more robust acoustic cue than F2 in distinguishing between the two phonemes for children as young as 4 years of age. In addition, the authors found that the use of F2 as a secondary acoustic cue emerged at a much later stage of development, beginning as late as 8 or 9 years of age. Taken together, these findings identify F3 as an important acoustic cue for children during acquisition of /ɹ/, in both production and perception.
2.3. Adult articulation of /ɹ/
Delattre and Freeman (1968) proposed eight tongue shapes to describe the variable articulatory strategies employed by the participants in their study (Fig. 1). Although not considered an exhaustive representation of tongue shapes used to produce /ɹ/ (see, e.g. Westbury et al., 1998), the Delattre and Freeman taxonomy remains a useful framework for describing the distinguishing articulatory features of /ɹ/, including: the presence or absence of pharyngeal and labial constriction, the part of the tongue (dorsum, blade, or tip) used to produce a palatal constriction, the presence or absence of tongue concavity, and a bunched posture with the tongue tip pointing down (away from the palate) vs. a retroflex posture with the tongue tip pointing up (toward the palate).
2.3.1. Inter- and intra-speaker articulatory variability
The Delattre and Freeman (1968) study was the first of its kind to present clear evidence of inter- and intra-speaker articulatory variability in /ɹ/ production across phonetic contexts. The authors reported, for example, that many Americans used exclusively bunched (type 4) tongue shapes in prevocalic and postvocalic position, while others bunched only postvocalically; and although some speakers produced retroflex (type 7) postures only word-initially, one speaker used retroflex tongue shapes in all positions. Hagiwara (1995) also observed variability within and between speakers in a study with 15 American English participants. The study involved inserting a cotton swab into each participant's mouth during /ɹ/ production in order to determine whether contact was made on the surface (bunched, tip-down), underside (retroflex, tip-up), or tip (blade-up) of the tongue. Hagiwara reported that nine out of 15 participants produced retroflex, tip-up tongue shapes across contexts; five participants produced bunched, tip-down tongue shapes in syllabic and final contexts and blade-up shapes in initial contexts; and one participant produced bunched, tip-down tongue shapes in initial contexts and blade-up shapes in syllabic and final contexts. In an x-ray microbeam study of 53 Midwestern American English speakers, Westbury et al. (1998) also reported inter- and intra-speaker variability during production of /ɹ/, but found no evidence of a link between tongue shape and oral cavity size, gender, dialect, or formant frequencies. While the authors did observe speakers with bunched and retroflex tongue shapes across a variety of contexts, they argued against making typological inferences based on data from a limited pool of speakers (p. 221). More recently, and consistent with previous studies, Mielke et al. (2010) reported articulatory variability both within and between speakers. In an ultrasound study of 27 American English participants, two produced exclusively retroflex, tip-up /ɹ/, 14 produced only tip-down /ɹ/, and the remaining 11 used different combinations of both bunched and retroflex /ɹ/.
2.3.2. Contextual constraints on variability
Many of the studies described above also reported contextual constraints on variability, with more retroflexion observed in onsets and next to back vowels, and less retroflexion near linguals. Delattre and Freeman (1968, pp. 60–66) reported retroflexion rates that were highest in prevocalic position and next to labials, and lowest postvocalically and next to velars. Westbury et al. (1998, p. 222) reasoned that lingual stops before /ɹ/, which engage the tongue, may lead speakers to adopt alternate /ɹ/ production strategies in those contexts, in contrast with labial stops preceding /ɹ/, which do not engage the tongue and allow for “greater coarticulatory freedom”. Mielke et al. (2010, 2016) found that participants who retroflexed were far more likely to do so in prevocalic contexts and after labials, and least likely to do so before high front vowels and next to linguals, especially coronals, where bunching was more common. The authors also observed that bunched /ɹ/s often occurred next to segments produced with a similarly bunched tongue shape (e.g. /ʃ/, /k/, and /i/), a finding consistent with Ong and Stone (1998), who demonstrated that vowel context affected whether one participant bunched (between front vowels) or retroflexed (between back vowels). More recently, Stavness et al. (2012) used biomechanical modeling to measure articulatory cost (defined as tongue displacement, relative strain, and relative muscle stress) during production of bunched and retroflex /ɹ/. The authors reported lower articulatory cost for a retroflex /ɹ/ in the context of /ɑ/, a back vowel, and for a bunched /ɹ/ in the context of /i/, a front vowel.
2.3.3. Explaining variability
Although reported adult allophony patterns suggest constraints on the types of tongue postures used to produce /ɹ/ based on syllable position, place of articulation, and vowel quality, they do not explain why there is such tremendous inter- and intra-speaker variability across contexts. Tiede et al. (2011) have argued that children might attempt different vocal tract configurations during acquisition of an articulatorily complex speech segment, particularly in contexts where the articulatory demands are greater. In a palatal perturbation study in which adults were fitted with false palates to increase articulatory demands during speech, Tiede et al. (2011) found that participants employed a range of strategies during production of [ara], [iri], and [uru], once they had determined that their established articulations for /ɹ/ were no longer effective. The authors proposed that participants may have been returning to strategies attempted during an exploratory stage in childhood.
2.4. Child articulation of /ɹ/
Children's acquisition trajectory toward adult-like articulation of /ɹ/ is variable (Sander, 1972; Smit et al., 1990; Templin, 1957). There is evidence to suggest that children show the greatest difficulty producing /ɹ/ in prevocalic position, although the research is inconsistent. Templin (1957) reported that children aged 3–8 years showed correct production of final /ɹ/ before initial /ɹ/, and Stoel-Gammon (1985) found that /ɹ/ almost always appeared first word-finally for 34 normally developing children between the ages of 15 and 24 months. Similarly, in a study with children ranging in age from 3 to 9 years, Smit et al. (1990) observed slower rates of acquisition for word-initial vs. postvocalic /ɹ/, and McGowan et al. (2004) observed that children had attained adult-like postvocalic and syllabic /ɹ/ by 31 months, but showed no evidence of producing prevocalic /ɹ/ at that age. In contrast, Hoffman, Schuckers, and Daniloff (1980) reported that, as a group, children aged 42–62 months tended to acquire consonantal (prevocalic) /ɹ/ before unstressed vocalic /ɚ/, but also observed that two of their eight participants demonstrated the opposite developmental pattern, showing a lag in accurate productions of consonantal /ɹ/.
In a study of children's misarticulation of /ɹ/, Curtis and Hardy (1959) found that children had less difficulty producing prevocalic /ɹ/ when it was part of a consonant blend. Although very little articulatory research has been conducted on typically developing children's production of /ɹ/, there is a growing literature on atypical /ɹ/ development in children (e.g. Adler-Bock et al., 2007; Bacsfalvi, 2010; Bernhardt et al., 2005; Davidson et al., 2007; Klein et al., 2013; McAllister Byun et al., 2014; McAllister Byun & Hitchcock, 2012). In a longitudinal ultrasound study of two children with phonological delay, Davidson et al. (2007), and later Klein et al. (2013), observed the development of two distinct strategies, one bunched and one retroflex, in their participants, despite similar treatment approaches. McAllister Byun et al. (2014) found that children with phonological delay showed greater improvement when provided with opportunities to explore different tongue shapes, rather than being directed to use one variant over another. There is some evidence to suggest that children as young as 11 months of age are articulatorily capable of reaching complex adult-like /ɹ/ targets. Gick et al. (2008) used ultrasound to demonstrate that one precocious 11-month-old's production of adult-like /ɹ/ in postvocalic ‘bear’ showed a correspondingly adult-like bunched tongue shape, complete with concavity at the tongue dorsum and a tip-down posture. The current research builds on these findings, adding to the literature on children's articulatory development and to the body of accumulated knowledge on North American English /ɹ/.
3. Methods
This acoustic and articulatory study examines the North American English /ɹ/ productions of four English-speaking children over two separate sessions during acquisition, and compares their early production strategies with /ɹ/ allophony patterns observed in adults in previous studies.
3.1. Participants
Participants were four monolingual English speakers with no reported speech or hearing disorders and no known language delays, ranging in age from 3 to 6 years (Table 1). The female participant (Female 1) was born in British Columbia, and lived in Ontario at the time of the study. She spoke English at home (monolingual English parents). The three male participants (Male Twin 1, Male Twin 2, Male 3) were siblings, and included one set of twins whose zygosity was undetermined. The male participants were born in British Columbia, and lived in Ontario at the time of the study. They spoke English at home (monolingual English mother, Dutch/English bilingual father), and had a ‘passable’ understanding of Dutch, as determined by a language background questionnaire. The oldest male participant (Male 3) had been attending a French immersion program (2.5 h per day) for 6 months at Session 1, and 10 months at Session 2.
Table 1
Age of participants at Sessions 1 and 2 (year;month).
Participant | Session 1 | Session 2
Female 1 | 4;3 | 4;9
Male Twin 1 | 3;8 | 4;0
Male Twin 2 | 3;8 | 4;0
Male 3 | 5;8 | 6;0
Fig. 2. Visible physical landmarks used to maintain ultrasound probe alignment along the mid-sagittal plane during recording.
Table 2
Stimuli, organized by context and prevocalic subcontext.
Context | Subcontext | Stimuli
Prevocalic | Word-initial | rainbow, raisins, red, road, rocks, rooster
Prevocalic | Post-labial | bread, broom, frog, toothbrush, zebra
Prevocalic | Post-coronal | mushroom, Shrek, string, stripe, tractor, tree, truck
Prevocalic | Post-dorsal | cracker, crayons, crown, grass, green
Postvocalic | N/A | ear, car, chair, door, forks, horse, pear
Syllabic | N/A | bird, cracker, flower, giraffe, pepper, rooster, squirrel, tractor, water
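Table 1 gives ages in year;month notation. The small Python helper below, whose names are hypothetical, converts these to months so that the interval between sessions can be read off directly; because ages are recorded only to the month, the computed intervals are accurate to about one month either way.

```python
def age_in_months(age: str) -> int:
    """Convert a 'year;month' age string such as '4;3' to total months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


# Ages transcribed from Table 1: (Session 1, Session 2).
AGES = {
    "Female 1": ("4;3", "4;9"),
    "Male Twin 1": ("3;8", "4;0"),
    "Male Twin 2": ("3;8", "4;0"),
    "Male 3": ("5;8", "6;0"),
}


def session_interval(participant: str) -> int:
    """Months elapsed between Session 1 and Session 2 for one participant."""
    first, second = AGES[participant]
    return age_in_months(second) - age_in_months(first)
```

For Male 3, the computed 4-month interval matches the difference between the 6 and 10 months of French immersion reported at the two sessions.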
3.2. Procedure
The experiment was conducted in a soundproof booth at the University of Ottawa Sound Patterns Laboratory. Upon arrival, children were given a small toy they could keep, and parents were asked to complete a consent form and language background questionnaire for each child prior to participation. During the experiment, parents were seated in an optometry chair and participants were seated on their parent's lap with head resting against their parent's chest for stability. A hand-held transducer was positioned under the participant's chin by the researcher and held by the parent or researcher during recording, in order to capture mid-sagittal views of the tongue. The probe was repositioned during the experiment as necessary, in order to maintain visibility of the mid-sagittal plane through visual identification of physical markers such as the hyoid bone and mandible shadows, and the genioglossus muscle (Fig. 2). A computer screen displaying visual images of target stimuli was situated directly opposite the participant, and target items were presented one at a time. The participant was asked to identify photographs of the stimuli and was prompted with e.g. ‘this is a…’. A movie clapper was ‘clapped’ at the beginning of each recording for data synchronization purposes. Ethics approval of the research procedure was obtained through the University of Ottawa Research Ethics Board (REB) and the experiment was conducted in compliance with REB laws and guidelines.
[Fig. 3 plot: All participants: F3, by perceptual coding. Box plots of F3 at midpoint (Hz), axis range 2000–4500 Hz, for the perceptual coding categories adult, near, and non.]
Fig. 3. Midpoint F3 for participants across sessions, by perceptual coding of target /ɹ/s as adult-like, near-adult-like, and non-adult-like. (Boxes represent the interquartile range of values, solid horizontal lines in boxes represent the median, and whiskers extend to the most extreme data points falling within 1.5 times the interquartile range of the box. Circles depict outliers, and asterisks denote significant differences between categories.)
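The box-and-whisker conventions in the Fig. 3 caption can be made concrete with a short Python function. It is a sketch under an explicit assumption: quartiles are computed with `statistics.quantiles(method="inclusive")`, i.e. linear interpolation between order statistics, whereas the plotting software used for Fig. 3 is not stated and may define quartiles slightly differently. The function name is hypothetical, and no values from the study are used.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Tukey-style box-plot summary: quartiles, 1.5 * IQR whiskers, outliers.

    Assumes the 'inclusive' (linear interpolation) quartile definition.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme point within the lower fence
        "whisker_high": max(inside),   # most extreme point within the upper fence
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

Applied to the midpoint F3 values of one perceptual category, this yields the box, whisker ends, and outlier circles for that category's column.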
3.3. Stimuli
Modelled after an articulatory study on adult production of North American English /ɹ/ (Mielke et al., 2010, 2016), 36 target words with /ɹ/ in prevocalic, postvocalic, and syllabic position were chosen (Table 2), in order to compare children's production patterns with allophonic patterns observed in adults in previous studies (e.g. Delattre & Freeman, 1968; Mielke et al., 2010, 2016; Westbury et al., 1998). Due to limitations on the number of words visually identifiable by children of this age, stimuli included both monosyllabic and bisyllabic words with /ɹ/ in stressed and unstressed syllables and across a wider range of vowel contexts than the Mielke et al. (2016) study. Stimuli consisted of familiar words that were predicted to be in the participants' active vocabularies, with /ɹ/ in prevocalic (e.g. road), postvocalic (e.g. bear), and syllabic (e.g. water) position. Prevocalic targets were further subcategorized into syllable-initial (e.g. rock), post-labial (e.g. frog), post-coronal (e.g. truck), and post-dorsal (e.g. green) positions. The decision to subcategorize the prevocalic context was based on findings from earlier studies showing this context's special status, both in terms of adult allophony patterns (e.g. Delattre & Freeman, 1968; Mielke et al., 2010, 2016) and children's acquisition trajectories (McGowan et al., 2004; Smit et al., 1990; Stoel-Gammon, 1985). Given the paucity of prevocalic, post-coronal tokens, target words like mushroom and Shrek were treated as equivalent, despite obvious differences: the target /ɹ/ in mushroom is in an unstressed syllable, and the post-coronal segment it follows is in the preceding syllable. These differences were not shown to be relevant with respect to /ɹ/ tongue shape during analysis, nor were differences in stress (whether the target /ɹ/ was in a stressed or unstressed syllable).
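The context categories of Table 2 amount to a rule over the segments immediately before and after the target rhotic. The Python sketch below encodes that rule for a word given as a list of segments. The segment inventories and transcriptions are simplified assumptions (broad IPA, with /ɚ/ standing for syllabic /ɹ/), and the function name is hypothetical. Like the study, it ignores syllable boundaries, so mushroom is classed as post-coronal just as Shrek is.

```python
# Simplified, assumed segment inventories (broad IPA).
VOWELS = {"i", "ɪ", "e", "ɛ", "æ", "ɑ", "ɔ", "o", "ʊ", "u", "ʌ", "ə", "aɪ", "aʊ", "ɔɪ"}
LABIALS = {"p", "b", "m", "f", "v", "w"}
CORONALS = {"t", "d", "n", "s", "z", "ʃ", "ʒ", "θ", "ð", "l", "tʃ", "dʒ"}
DORSALS = {"k", "g", "ɡ", "ŋ"}


def classify_rhotic(segments, i):
    """Label the rhotic at segments[i] with a Table 2 context."""
    seg = segments[i]
    if seg == "ɚ":
        return "syllabic"
    if seg != "ɹ":
        raise ValueError(f"segment {seg!r} at index {i} is not a rhotic")
    prev = segments[i - 1] if i > 0 else None
    nxt = segments[i + 1] if i + 1 < len(segments) else None
    if nxt not in VOWELS:
        # No following vowel: postvocalic if a vowel precedes it.
        return "postvocalic" if prev in VOWELS else "other"
    if prev is None:
        return "word-initial"
    for label, place in (("post-labial", LABIALS),
                         ("post-coronal", CORONALS),
                         ("post-dorsal", DORSALS)):
        if prev in place:
            return label
    return "other"  # e.g. intervocalic /ɹ/, not a Table 2 category
```

For example, `classify_rhotic(["g", "ɹ", "i", "n"], 1)` labels green as post-dorsal, and `classify_rhotic(["w", "ɑ", "t", "ɚ"], 3)` labels water as syllabic.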
3.4. Recording configuration
Audio was recorded using a Shure KSM44/SL large-diaphragm condenser microphone connected to a Sound Devices USBPre microphone preamp and A/D converter; 16-bit audio was captured in mono at a sampling rate of 44.1 kHz using Audacity software. Ultrasound was recorded in real time using a Terason T3000 portable ultrasound machine and an 8MC4 microconvex ultrasound transducer. Video was recorded with an Imaging Source DFK 21BU04 closed-circuit TV camera. Ultraspeech software (Hueber, Chollet, Denby, & Stone, 2008), running on the Terason T3000, was used to capture ultrasound images and video images directly, as separate .bmp files, at a rate of 30 fps.
3.5. Data synchronization
Video and ultrasound images were captured separately using Ultraspeech (Hueber et al., 2008), and synchronized with audio post-recording. This process involved identifying critical points in time in the audio (.wav), video (.bmp), and ultrasound (.bmp) files, based on information obtained from the recorded ‘clap’ of a movie clapper, which provided precise timing information that was both audible and visible. Ultrasound and video images were then time-aligned and merged into single JPEG images (Figs. 4–10) using a Python script, so that tongue shape information could be compared with lip shape information during data analysis (Mielke, 2015).
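The clap supplies one event that is visible in both streams, so aligning them reduces to an offset plus a scale factor. Below is a minimal Python sketch, assuming frames were captured at a constant 30 fps with none dropped (an assumption the clap alone cannot verify). The function names are hypothetical, and this is not the Python script cited above (Mielke, 2015).

```python
FPS = 30.0            # ultrasound/video frame rate (Section 3.4)
SAMPLE_RATE = 44_100  # audio sampling rate in Hz (Section 3.4)


def audio_time_to_frame(t_audio, clap_time_audio, clap_frame):
    """Index of the image frame nearest to an audio time (seconds).

    clap_time_audio: time of the clap in the .wav file (s)
    clap_frame: index of the image frame in which the clapper closes
    """
    return clap_frame + round((t_audio - clap_time_audio) * FPS)


def frame_to_audio_time(frame, clap_time_audio, clap_frame):
    """Audio time (seconds) corresponding to the start of an image frame."""
    return clap_time_audio + (frame - clap_frame) / FPS
```

At 30 fps each frame spans about 33.3 ms, or 1470 samples of 44.1 kHz audio, so an /ɹ/ gesture spanning two to four frames (Section 3.6.3) lasts roughly 67–133 ms.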
Fig. 4. Female 1, Session 1, postvocalic target horse, exhibiting a classic type 4 (Delattre & Freeman, 1968) tongue shape, with concavity at the tongue dorsum and a bunched, tip-down posture.
Fig. 5. Male 3, Session 2, syllabic target water, exhibiting a tip-down tongue shape, with concavity at the tongue dorsum and a bunched, tip-down posture.
Fig. 6. Female 1, Session 1, non-adult-like prevocalic post-labial target word frog, showing a backing of the tongue body and constriction at the velum, consistent with what we would expect to see for /w/ – a common articulatory substitution for /ɹ/ (e.g. Smit, 1993).
3.6. Data analysis
Data analysis was conducted in three stages: (1) audio (.wav) files of the target words were perceptually coded, (2) these same .wav files were analyzed acoustically, and (3) merged video and ultrasound images were visually coded using the Delattre and Freeman (1968) /ɹ/ taxonomy for initial classification.
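Stage (2) rests on LPC formant estimation, which Section 3.6.2 carries out in Praat. The Python sketch below illustrates the underlying idea only and is not Praat's implementation: it fits an all-pole model with Burg's recursion and reads formant frequencies and bandwidths off the roots of the prediction polynomial. Praat additionally resamples the sound to twice the chosen maximum formant frequency (the setting that matters for children's voices), which this sketch omits. The Hamming window, 25 ms window length, 50 Hz pre-emphasis, and the `min_freq` and `max_bandwidth` cut-offs are assumptions, all function names are hypothetical, and NumPy is required.

```python
import numpy as np


def burg_lpc(x, order):
    """LPC coefficients [1, a1, ..., a_order] fitted with Burg's method."""
    x = np.asarray(x, dtype=float)
    f = x.copy()  # forward prediction errors
    b = x.copy()  # backward prediction errors
    a = np.array([1.0])
    for m in range(order):
        ef = f[m + 1:]  # forward errors from the previous stage
        eb = b[m:-1]    # backward errors, delayed by one sample
        k = -2.0 * np.dot(ef, eb) / (np.dot(ef, ef) + np.dot(eb, eb))
        # Compute both updates before overwriting (ef and eb are views).
        f_new = ef + k * eb
        b_new = eb + k * ef
        f[m + 1:] = f_new
        b[m + 1:] = b_new
        # Levinson-style update of the prediction polynomial.
        a_ext = np.concatenate([a, [0.0]])
        a = a_ext + k * a_ext[::-1]
    return a


def lpc_formants(a, fs, min_freq=90.0, max_bandwidth=400.0):
    """Formant frequencies and bandwidths (Hz) from an LPC polynomial's roots."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_freq) & (bandwidths < max_bandwidth)
    idx = np.argsort(freqs[keep])
    return freqs[keep][idx], bandwidths[keep][idx]


def midpoint_formants(signal, fs, start, end, n_formants=5, win_s=0.025):
    """Formants in a short window centred on the midpoint of [start, end] (s)."""
    mid = int(round((start + end) / 2 * fs))
    half = int(round(win_s * fs / 2))
    frame = np.asarray(signal[max(mid - half, 0):mid + half], dtype=float)
    alpha = np.exp(-2 * np.pi * 50.0 / fs)  # assumed pre-emphasis from 50 Hz
    frame = np.append(frame[0], frame[1:] - alpha * frame[:-1])
    frame = frame * np.hamming(len(frame))  # assumed window shape
    return lpc_formants(burg_lpc(frame, 2 * n_formants), fs)
```

The order of 2 × `n_formants` follows the usual rule of two poles per formant. Root-picking of this kind can merge or miss formants, which is why visual confirmation and hand-correction, as described in Section 3.6.2, remain necessary.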
3.6.1. Perceptual coding
Audio tokens of children's /ɹ/ productions were impressionistically coded by the researcher and one other experienced coder (both native speakers of North American English) in separate, independent sessions. Children's productions of target words were presented one at a time, in random order, using a multiple forced-choice design created in Praat (Boersma & Weenink, 2007, “ExperimentMFC”), which prompted coders to: “Listen to the /r/ sound in each word and judge it to be adult-like, near-adult-like, or non-adult-like. If you don't know, press the ?. Click to start.” Presentation of tokens was self-paced, and there was no limit on the number of times a coder could replay an individual token sound file prior to coding. Categorizations were compared across coders, and only tokens on which the two coders agreed were included in the analysis (92% of tokens).
Fig. 7. Female 1, Session 1, [ʃ] in target mushrooms (top), and prevocalic post-coronal /ɹ/ in target mushrooms (bottom), perceptually coded as near-adult-like, with a tongue shape similar to preceding coronal and not unlike a bunched /ɹ/.
Fig. 8. Adult-like postvocalic /ɹ/ in target word pear (Session 2): Male Twin 1 (top) exhibiting a bunched, tip-down tongue posture, and Male Twin 2 (bottom) exhibiting a retroflex, tip-up tongue posture.
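The agreement filter described in Section 3.6.1 can be sketched in Python as follows, with each coder's judgments stored as a dictionary from token ID to category. Two details are assumptions not stated in the text: a shared "?" response is treated as uncodable and dropped, and Cohen's kappa is included only as an optional chance-corrected companion to raw agreement; the study reports the share of tokens retained (92%), not kappa. Function names are hypothetical.

```python
from collections import Counter


def agreed_tokens(coder_a, coder_b):
    """Tokens both coders placed in the same category.

    Assumption: a shared '?' ("don't know") response names no category,
    so it is dropped rather than counted as agreement.
    """
    shared = coder_a.keys() & coder_b.keys()
    return {t: coder_a[t] for t in shared
            if coder_a[t] == coder_b[t] and coder_a[t] != "?"}


def percent_retained(coder_a, coder_b):
    """Percentage of jointly coded tokens kept for analysis."""
    shared = coder_a.keys() & coder_b.keys()
    return 100.0 * len(agreed_tokens(coder_a, coder_b)) / len(shared)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement over all jointly coded tokens."""
    shared = sorted(coder_a.keys() & coder_b.keys())
    n = len(shared)
    p_obs = sum(coder_a[t] == coder_b[t] for t in shared) / n
    counts_a = Counter(coder_a[t] for t in shared)
    counts_b = Counter(coder_b[t] for t in shared)
    p_exp = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)  # undefined if p_exp == 1
```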
Fig. 9. Male Twin 2, Session 1, adult-like prevocalic post-coronal /ɹ/ in target word Shrek (top), and Male Twin 2, Session 2, adult-like prevocalic post-dorsal /ɹ/ in target word crayon (bottom), exhibiting a bunched, tip-down tongue posture and deviating from an otherwise retroflex, tip-up strategy in these contexts.
Fig. 10. Male Twin 2, Session 1, non-adult-like prevocalic post-dorsal target word green, showing a backing of the tongue body, constriction at the velum, and an undifferentiated /ɹ/ tongue shape.
3.6.2. Acoustic analysis
Audio recordings were segmented manually in Praat (Boersma & Weenink, 2007) at both the word and phoneme level. First, second, and third formant values at the midpoint of each segmented /ɹ/ token were extracted using linear predictive coding (LPC) via the ‘Sound: To Formant (burg)...’ command in Praat, which is suitable for analyzing children's higher-pitched voices because it allows the maximum formant frequency (Hz) to be specified. Formant values were confirmed through visual inspection, and hand-coded where necessary to correct erroneous peak determination (11.3% of tokens). For the purposes of this study, low F3 values (∼2500 Hz; Dalston, 1975) served as an acoustic measure of /ɹ/-ness across contexts.
3.6.3. Articulatory analysis
Mid-sagittal images of the tongue appear as a dark region (the tongue) bordered by a white line (the tongue surface), which results from the scattering of ultrasound waves at the tongue–air interface (Stone, 2005). During the articulatory analysis stage, merged video and ultrasound images of /ɹ/ gestures, which typically spanned between two and four frames, were inspected visually to identify the peak (most /ɹ/-like point) of each gesture by examining movement of the tongue (body, root, or tip) and lips (Mielke et al., 2016). This technique is highly robust against rotational and translational differences from token to token (Ménard et al., 2012). Misaligned images were identified as those in which the hyoid bone, mandible (front-to-back alignment), and/or genioglossus muscle (side-to-side alignment) were not visible, or in which a split tongue surface was visible; either condition indicated that the probe was off-centre at that point in the recording, so that the image being captured was no longer representative of a mid-sagittal section of the tongue.
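Why a shape measure can be robust to probe placement is easy to show with a generic example. The Python sketch below computes two quantities from an ordered tongue contour: the signed turning angle at each interior point, and the maximum distance of the contour from its end-to-end chord divided by the chord length. Both depend only on relative positions along the contour, so rotating or translating the whole contour in the image plane leaves them unchanged. These are illustrative measures chosen for this sketch, not necessarily the indices defined by Ménard et al. (2012), and the function names are hypothetical. No in-plane measure can compensate for a probe that has drifted off the mid-sagittal plane, which is why misaligned images have to be identified separately.

```python
import math


def turning_angles(points):
    """Signed turning angle (radians) at each interior point of a contour.

    The angle between successive segment vectors depends only on their
    relative orientation, so it is unchanged by rotating or translating
    the whole contour.
    """
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ux, uy = x1 - x0, y1 - y0
        vx, vy = x2 - x1, y2 - y1
        angles.append(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))
    return angles


def bulge_ratio(points):
    """Max distance of the contour from its end-to-end chord, over chord length."""
    (ax, ay), (bx, by) = points[0], points[-1]
    chord = math.hypot(bx - ax, by - ay)
    heights = [abs((px - ax) * (by - ay) - (py - ay) * (bx - ax)) / chord
               for px, py in points]
    return max(heights) / chord
```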
L. Magloughlin / Journal of Phonetics 54 (2016) 51–67
Table 3 Perceptual coding of participants' /ɹ/ productions, categorized as adult-like, near-adult-like, and non-adult-like, and grouped by context (prevocalic, postvocalic, syllabic) and session. F1 = Female 1; MT1 = Male Twin 1; MT2 = Male Twin 2; M3 = Male 3.
Perceptual coding, grouped by context

Session 1
Participant  Context      Adult-like     Near-adult-like  Non-adult-like
F1           Prevocalic   2/41 (5%)      14/41 (34%)      25/41 (61%)
F1           Postvocalic  26/26 (100%)   –                –
F1           Syllabic     13/13 (100%)   –                –
MT1          Prevocalic   14/17 (82%)    1/17 (6%)        2/17 (12%)
MT1          Postvocalic  9/9 (100%)     –                –
MT1          Syllabic     7/7 (100%)     –                –
MT2          Prevocalic   14/20 (70%)    1/20 (5%)        5/20 (25%)
MT2          Postvocalic  5/5 (100%)     –                –
MT2          Syllabic     7/7 (100%)     –                –
M3           Prevocalic   31/31 (100%)   –                –
M3           Postvocalic  8/8 (100%)     –                –
M3           Syllabic     5/7 (71%)      2/7 (29%)        –

Session 2
Participant  Context      Adult-like     Near-adult-like  Non-adult-like
F1           Prevocalic   28/30 (93%)    1/30 (3%)        1/30 (3%)
F1           Postvocalic  10/10 (100%)   –                –
F1           Syllabic     9/9 (100%)     –                –
MT1          Prevocalic   25/27 (93%)    1/27 (4%)        1/27 (4%)
MT1          Postvocalic  8/8 (100%)     –                –
MT1          Syllabic     8/8 (100%)     –                –
MT2          Prevocalic   18/23 (78%)    2/23 (9%)        3/23 (13%)
MT2          Postvocalic  8/8 (100%)     –                –
MT2          Syllabic     8/8 (100%)     –                –
M3           Prevocalic   31/31 (100%)   –                –
M3           Postvocalic  8/8 (100%)     –                –
M3           Syllabic     10/10 (100%)   –                –
Articulatory data were first categorized in independent sessions by the researcher and one other experienced ultrasound coder, based on the Delattre and Freeman (1968) /ɹ/ taxonomy (presented in Fig. 1). Following Westbury et al. (1998), the taxonomy was not treated as an exhaustive list of possible /ɹ/ tongue shapes, but as a useful guide for initial classification. Tongue shapes at the peak of each /ɹ/ gesture were coded as bunched, tip-down (Delattre & Freeman, 1968, type 4); blade-up, tip-down (types 5 and 6); or retroflex, tip-up (type 7). Tongue shapes showing a backing of the tongue body and no discernible /ɹ/ shape were coded as undifferentiated. During subsequent stages of analysis, bunched and blade-up tongue shapes were collapsed into a single tip-down category, and an additional category, resembles adjacent, was added to describe tokens that had been perceptually coded as near-adult-like and had tongue shapes that looked similar to the preceding coronal segment and not unlike a bunched /ɹ/.
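The two-stage coding scheme above amounts to a pair of lookup tables. The Python sketch below is illustrative only: the label strings and function names are hypothetical, and it encodes just the mappings stated in the text (type 4 as bunched, types 5 and 6 as blade-up, type 7 as retroflex, with bunched and blade-up later collapsed into tip-down). It assumes that undifferentiated and resembles adjacent are assigned by the coders directly rather than derived from a type number. Because the study treated the taxonomy as a guide rather than an exhaustive list, a type outside this subset raises an error here, on the assumption that such a token would be reviewed by hand.

```python
# First-pass codes keyed by Delattre & Freeman (1968) type number,
# restricted to the types named in the text.
INITIAL_CODE_BY_TYPE = {
    4: "bunched",    # bunched, tip-down
    5: "blade-up",   # blade-up, tip-down
    6: "blade-up",   # blade-up, tip-down
    7: "retroflex",  # retroflex, tip-up
}

# Final categories after bunched and blade-up are collapsed into tip-down.
FINAL_CATEGORY = {
    "bunched": "tip-down",
    "blade-up": "tip-down",
    "retroflex": "tip-up",
    "undifferentiated": "undifferentiated",
    # Added at the later stage for near-adult-like tokens whose shape
    # resembles the preceding coronal; assigned by the coders directly.
    "resembles adjacent": "resembles adjacent",
}


def initial_code(df_type=None, shape_discernible=True):
    """First-pass code for one token's tongue shape at the /ɹ/ peak."""
    if not shape_discernible:
        return "undifferentiated"  # tongue-body backing, no /ɹ/ shape
    try:
        return INITIAL_CODE_BY_TYPE[df_type]
    except KeyError:
        raise ValueError(f"type {df_type!r} is not one of the coded types") from None


def final_category(code):
    """Map a first-pass or coder-assigned code to its final category."""
    return FINAL_CATEGORY[code]


for df_type in (4, 5, 6, 7):
    code = initial_code(df_type)
    print(df_type, code, "->", final_category(code))
```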
4. Results

4.1. Perceptual coding summary

A total of 191 Session 1 tokens and 180 Session 2 tokens were included for analysis during the perceptual coding stage. Each participant produced between 32 and 49 tokens per session,¹ at least 25 of which were adult-like. Tokens were grouped by participant and context (Table 3). In Session 1, participants produced predominantly adult-like /ɹ/s in all but the prevocalic context, where most of their near-adult-like and non-adult-like tokens occurred. This result was not unexpected, given that prevocalic /ɹ/ is often the last to be acquired by children (McGowan et al., 2004; Smit et al., 1990; Stoel-Gammon, 1985). Male 3, the oldest participant, produced no tokens coded as non-adult-like and only 2 tokens coded as near-adult-like (Table 5), both in syllabic contexts. By Session 2, participants were producing adult-like /ɹ/s across contexts, although a small percentage of prevocalic tokens coded as near-adult-like and non-adult-like were still being produced as well.

Tokens in the prevocalic context were further subcategorized into syllable-initial (e.g. rocks), post-labial (e.g. frog), post-coronal (e.g. truck), and post-dorsal (e.g. green) subcontexts (Table 4). A total of 109 prevocalic tokens produced during Session 1 were subcategorized, along with 111 prevocalic tokens produced during Session 2 (between 17 and 31 prevocalic tokens per participant, per session). For every participant who produced them, prevocalic tokens coded as near-adult-like and non-adult-like occurred primarily in post-coronal and post-dorsal subcontexts. In contrast with the other participants, Female 1 also produced non-adult-like tokens in syllable-initial and post-labial subcontexts, whereas Male 3 produced exclusively adult-like tokens across subcontexts. A summary of all tokens perceptually coded as near-adult-like and non-adult-like, by participant, is presented in Table 5.

1 A greater number of tokens (80 in total, including 41 in the prevocalic context) were collected from Female 1, who produced the stimuli twice in Session 1.

Table 4 Perceptual coding of participants' prevocalic /ɹ/ productions, categorized as adult-like, near-adult-like, and non-adult-like, and grouped by prevocalic subcontext (syllable-initial, post-labial, post-coronal, post-dorsal) and session.
Perceptual coding, grouped by prevocalic subcontext

Session 1
Participant  Subcontext        Adult-like     Near-adult-like  Non-adult-like
F1           Syllable-initial  –              –                6/6 (100%)
F1           Post-labial       –              –                6/6 (100%)
F1           Post-coronal      –              14/23 (61%)      9/23 (39%)
F1           Post-dorsal       2/6 (33%)      –                4/6 (67%)
MT1          Syllable-initial  5/5 (100%)     –                –
MT1          Post-labial       2/2 (100%)     –                –
MT1          Post-coronal      5/5 (100%)     –                –
MT1          Post-dorsal       2/5 (40%)      1/5 (20%)        2/5 (40%)
MT2          Syllable-initial  7/7 (100%)     –                –
MT2          Post-labial       2/2 (100%)     –                –
MT2          Post-coronal      3/4 (75%)      1/4 (25%)        –
MT2          Post-dorsal       2/7 (29%)      –                5/7 (71%)
M3           Syllable-initial  8/8 (100%)     –                –
M3           Post-labial       7/7 (100%)     –                –
M3           Post-coronal      7/7 (100%)     –                –
M3           Post-dorsal       9/9 (100%)     –                –

Session 2
Participant  Subcontext        Adult-like     Near-adult-like  Non-adult-like
F1           Syllable-initial  8/9 (89%)      –                1/9 (11%)
F1           Post-labial       6/6 (100%)     –                –
F1           Post-coronal      8/8 (100%)     –                –
F1           Post-dorsal       6/7 (86%)      1/7 (14%)        –
MT1          Syllable-initial  6/6 (100%)     –                –
MT1          Post-labial       6/6 (100%)     –                –
MT1          Post-coronal      7/8 (88%)      1/8 (13%)        –
MT1          Post-dorsal       6/7 (86%)      –                1/7 (14%)
MT2          Syllable-initial  6/6 (100%)     –                –
MT2          Post-labial       4/4 (100%)     –                –
MT2          Post-coronal      4/5 (80%)      1/5 (20%)        –
MT2          Post-dorsal       4/8 (50%)      1/8 (13%)        3/8 (38%)
M3           Syllable-initial  8/8 (100%)     –                –
M3           Post-labial       5/5 (100%)     –                –
M3           Post-coronal      8/8 (100%)     –                –
M3           Post-dorsal       10/10 (100%)   –                –

Table 5 Tokens perceptually coded as near-adult-like and non-adult-like, for each participant, across sessions (see Table 2 for a complete list of stimuli).
Summary of near-adult-like and non-adult-like tokens

Session 1
Participant  Near-adult-like                        Non-adult-like
F1           mushroom, Shrek, stripe, tree, truck   all remaining prevocalic tokens
MT1          green                                  green
MT2          mushroom                               grass, green
M3           squirrel, water                        –

Session 2
Participant  Near-adult-like                        Non-adult-like
F1           green                                  rooster
MT1          tree                                   green
MT2          tree                                   green
M3           –                                      –

4.2. Acoustic results

In keeping with the literature on adult /ɹ/ (Delattre, 1951; Delattre & Freeman, 1968; Lehiste, 1962; Potter et al., 1947), low F3 served as an acoustic measure of /ɹ/-ness across contexts. It was expected that participants' adult-like /ɹ/ productions would exhibit the lowest mean F3 values, and their non-adult-like productions the highest. A one-way ANOVA and post hoc tests were conducted to determine whether tokens that differed in perceptual coding (e.g. adult-like vs. non-adult-like) exhibited the corresponding predicted differences in formant frequency. As illustrated in Fig. 3, participants' adult-like /ɹ/s showed correspondingly low F3 values (mean: 2549 Hz, SD: 258 Hz), compared with their near-adult-like (mean: 3176 Hz, SD: 421 Hz) and non-adult-like productions (mean: 3573 Hz, SD: 323 Hz). The main effect of perceptual coding was significant [F(2, 313) = 244.4, p < 0.001], and post hoc comparisons using the Tukey HSD test showed that mean F3 values for participants' adult-like, near-adult-like, and non-adult-like productions were all significantly different from one another (p < 0.001). This general pattern also held for individual participants across sessions (Table 6). These findings are consistent with Dalston (1975), who observed low F3 values (∼2500 Hz) in the /ɹ/ productions of his child participants.

It was expected that F3 values would be highest prevocalically, given that this is a context children often acquire last (McGowan et al., 2004; Smit et al., 1990; Stoel-Gammon, 1985). As illustrated in Table 7, F3 values were highest prevocalically, but only for the participants who were producing near-adult-like and non-adult-like /ɹ/s in this context (Session 1: Female 1, Male Twin 1, Male Twin 2; Session 2: Male Twin 2).

Table 6 Summary of mean F3 values (in Hertz), for all participants, across sessions, organized by perceptual coding categorization (adult-like, near-adult-like, non-adult-like). Numbers in parentheses show standard deviation.
Mean F3 in Hz (with SD), by perceptual coding

Session 1
Participant  Adult-like   Near-adult-like  Non-adult-like
F1           2640 (254)   3377 (361)       3701 (275)
MT1          2577 (337)   3023 (–)         3269 (154)
MT2          2697 (325)   3003 (–)         3147 (159)
M3           2403 (259)   2559 (111)       –

Session 2
Participant  Adult-like   Near-adult-like  Non-adult-like
F1           2512 (182)   2871 (–)         3400 (–)
MT1          2526 (227)   2600 (–)         3213 (–)
MT2          2580 (185)   3050 (71)        3646 (244)
M3           2477 (196)   –                –

Table 7 Summary of mean F3 values (in Hertz) for all participants, across sessions, organized by context (prevocalic, postvocalic, syllabic). Numbers in parentheses show standard deviation.
Mean F3 in Hz (with SD), by context

Session 1
Participant  Prevocalic   Postvocalic  Syllabic
F1           3557 (357)   2643 (250)   2583 (242)
MT1          2738 (301)   2658 (403)   2346 (376)
MT2          2926 (330)   2501 (196)   2546 (189)
M3           2394 (291)   2437 (48)    2443 (240)

Session 2
Participant  Prevocalic   Postvocalic  Syllabic
F1           2587 (259)   2403 (144)   2523 (57)
MT1          2503 (283)   2594 (132)   2630 (165)
MT2          2807 (388)   2580 (182)   2406 (98)
M3           2449 (209)   2447 (114)   2605 (151)
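To make the one-way ANOVA in Section 4.2 more concrete, the following sketch computes the F statistic from group sizes, means, and sample standard deviations using only the Python standard library. It implements the standard textbook formulas and is not the analysis script used in the study; the per-category token counts behind F(2, 313) are not broken down here, so the example runs on made-up F3 values. Full Tukey HSD p-values require the studentized range distribution, which the standard library does not provide, so the sketch stops at the pairwise q statistic (in its Tukey-Kramer form, which allows unequal group sizes).

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Reduce raw F3 measurements (Hz) to an (n, mean, sd) tuple.
    sd is the sample standard deviation (n - 1 denominator)."""
    return len(values), mean(values), stdev(values)


def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) tuples, one per group.

    Returns (F, df_between, df_within, ms_within).
    """
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares recovered from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def tukey_kramer_q(group_a, group_b, ms_within):
    """Studentized range statistic for one pair of groups (Tukey-Kramer form).

    Significance requires comparing q with a critical value of the
    studentized range distribution for k groups and df_within degrees of
    freedom, taken from a table or a statistics package.
    """
    (n_a, m_a, _), (n_b, m_b, _) = group_a, group_b
    return abs(m_a - m_b) / sqrt(ms_within / 2 * (1 / n_a + 1 / n_b))


# Made-up F3 values (Hz) for three perceptual categories; not study data.
adult = summarize([2450, 2520, 2600, 2480, 2550])
near = summarize([3050, 3180, 3300])
non = summarize([3500, 3620, 3580, 3700])

f_stat, df_b, df_w, msw = anova_from_summary([adult, near, non])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
print(f"q(adult-like vs non-adult-like) = {tukey_kramer_q(adult, non, msw):.2f}")
```

Working from (n, mean, SD) summaries rather than raw values is a convenience: it is algebraically equivalent, and it matches the way the category means and SDs are reported in the text and in Table 6.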
4.3. Articulatory results

The merged video and ultrasound images that follow² are rightward facing (tongue tip at right, tongue root at left), and show a mid-sagittal section of the tongue. Female 1 and Male 3 produced adult-like /ɹ/s in postvocalic and syllabic /ɹ/ contexts using a bunched, tip-down strategy (Figs. 4 and 5), with characteristic concavity at the tongue dorsum, forming a trough between the pharyngeal and palatal constrictions. This strategy remained stable for both participants across sessions. While Male 3 also employed a bunched strategy in prevocalic contexts, Female 1's prevocalic /ɹ/s were generally non-adult-like in Session 1, except after coronals, where they were mostly near-adult-like. Her non-adult-like productions exhibited a backing of the tongue body with constriction at the velum (Fig. 6). This posture is consistent with the fact that these productions sounded like [w], a common articulatory substitution for /ɹ/ during acquisition (Smit, 1993), and had correspondingly low F2 values, an acoustic characteristic of [w] (see Dalston, 1975). In contrast, her near-adult-like productions, which appeared in prevocalic contexts after coronals, showed tongue shapes similar to the preceding coronal (Fig. 7). By Session 2, she had extended her bunched, tip-down strategy across contexts.

In both sessions, Male Twin 1 and Male Twin 2 produced adult-like /ɹ/s in postvocalic and syllabic contexts, as well as in syllable-initial and post-labial prevocalic contexts, but showed a lag in adult-like production in prevocalic contexts after coronals and dorsals, where they produced near- and non-adult-like /ɹ/s. Male Twin 1 used predominantly bunched, tip-down tongue shapes to achieve adult-like targets (Fig. 8, top). Male Twin 2 showed a consistent retroflex, tip-up strategy across contexts (Fig. 8, bottom), with one notable exception. In prevocalic post-dorsal and post-coronal contexts, he deviated from an otherwise retroflex strategy and produced bunched tongue shapes (Fig. 9), in contexts where he had initially shown a production lag (Fig. 10).

A summary of the coding of articulatory images for each participant, by context and perceptual categorization, is presented in the tables that follow (token counts are in parentheses). Each participant adopted a dominant articulatory strategy to produce adult-like /ɹ/s, and showed a production lag in some contexts, producing near- and non-adult-like /ɹ/s instead (Table 8). For all but one participant (Male 3), this lag occurred prevocalically. Three participants (Female 1, Male Twin 1, and Male 3) adopted a dominant bunched strategy, and one participant (Male Twin 2) a retroflex strategy, with evidence of a secondary bunched strategy (Table 9). Female 1 produced non-adult-like /ɹ/s across prevocalic contexts, except after coronals, where she produced many near-adult-like /ɹ/s. Male Twin 1 and Male Twin 2 also produced non- and near-adult-like /ɹ/s, but only prevocalically, after coronals and dorsals. Male 3 produced adult-like /ɹ/s across contexts (Table 10).³

2 Of the 191 Session 1 tokens and 180 Session 2 tokens analyzed during perceptual coding and acoustic analysis, 45 images from Session 1 and 18 images from Session 2 were excluded from the articulatory analysis due to probe misalignment during recording.

Table 8 Participants' dominant and secondary /ɹ/ production strategies. Contexts in which participants showed a lag in adult-like production (producing some near- and non-adult-like /ɹ/s) are indicated with x.

Participant  Dominant strategy  Secondary strategy
F1           Bunched            –
MT1          Bunched            –
MT2          Retroflex          Bunched
M3           Bunched            –

Production lag by context (syllable-initial, post-labial, post-coronal, and post-dorsal are prevocalic subcontexts)
Participant  Syllable-initial  Post-labial  Post-coronal  Post-dorsal  Postvocalic  Syllabic
F1           x                 x            x             x            –            –
MT1          –                 –            x             x            –            –
MT2          –                 –            x             x            –            –
M3           –                 –            –             –            –            x

Table 9 Articulatory coding of participants' /ɹ/ productions as tip-down (bunch/blade), tip-up (retroflex), resembles adjacent, or undifferentiated, grouped by context (prevocalic, postvocalic, and syllabic), and by session.
Articulatory coding of ultrasound images (context)

Session 1
Participant  Context  Tip-down (bunch/blade)  Tip-up (retroflex)  Resembles adjacent  Undifferentiated
F1           Pre      13% (3/24)              –                   33% (8/24)          54% (13/24)
F1           Post     100% (20/20)            –                   –                   –
F1           Syll     100% (11/11)            –                   –                   –
MT1          Pre      75% (9/12)              –                   –                   25% (3/12)
MT1          Post     75% (6/8)               13% (1/8)           13% (1/8)           –
MT1          Syll     100% (6/6)              –                   –                   –
MT2          Pre      7% (1/15)               67% (10/15)         –                   27% (4/15)
MT2          Post     –                       100% (5/5)          –                   –
MT2          Syll     –                       100% (7/7)          –                   –
M3           Pre      100% (26/26)            –                   –                   –
M3           Post     100% (5/5)              –                   –                   –
M3           Syll     100% (7/7)              –                   –                   –

Session 2
Participant  Context  Tip-down (bunch/blade)  Tip-up (retroflex)  Resembles adjacent  Undifferentiated
F1           Pre      91% (21/23)             –                   4% (1/23)           4% (1/23)
F1           Post     100% (8/8)              –                   –                   –
F1           Syll     100% (8/8)              –                   –                   –
MT1          Pre      88% (23/26)             4% (1/26)           4% (1/26)           4% (1/26)
MT1          Post     86% (6/7)               14% (1/7)           –                   –
MT1          Syll     100% (9/9)              –                   –                   –
MT2          Pre      32% (7/22)              55% (12/22)         –                   14% (3/22)
MT2          Post     –                       100% (8/8)          –                   –
MT2          Syll     –                       100% (8/8)          –                   –
M3           Pre      100% (29/29)            –                   –                   –
M3           Post     100% (6/6)              –                   –                   –
M3           Syll     100% (8/8)              –                   –                   –

Table 10 Articulatory coding of participants' /ɹ/ productions as tip-down (bunch/blade), tip-up (retroflex), resembles adjacent, or undifferentiated, grouped by perceptual categorization (adult-like, near-adult-like, and non-adult-like), and by session.
Articulatory coding of ultrasound images (perceptual)

Session 1
Participant  Tongue shape            Adult-like     Near-adult-like  Non-adult-like
F1           Tip-down (bunch/blade)  60% (33/55)    2% (1/55)        –
F1           Tip-up (retroflex)      –              –                –
F1           Resembles adjacent      –              15% (8/55)       –
F1           Undifferentiated        –              –                24% (13/55)
MT1          Tip-down (bunch/blade)  81% (21/26)    –                –
MT1          Tip-up (retroflex)      4% (1/26)      –                –
MT1          Resembles adjacent      –              4% (1/26)        –
MT1          Undifferentiated        4% (1/26)      –                8% (2/26)
MT2          Tip-down (bunch/blade)  4% (1/27)      –                –
MT2          Tip-up (retroflex)      81% (22/27)    –                –
MT2          Resembles adjacent      –              –                –
MT2          Undifferentiated        –              –                15% (4/27)
M3           Tip-down (bunch/blade)  100% (38/38)   –                –
M3           Tip-up (retroflex)      –              –                –
M3           Resembles adjacent      –              –                –
M3           Undifferentiated        –              –                –

Session 2
Participant  Tongue shape            Adult-like     Near-adult-like  Non-adult-like
F1           Tip-down (bunch/blade)  95% (37/39)    –                –
F1           Tip-up (retroflex)      –              –                –
F1           Resembles adjacent      –              3% (1/39)        –
F1           Undifferentiated        –              –                3% (1/39)
MT1          Tip-down (bunch/blade)  90% (38/42)    –                –
MT1          Tip-up (retroflex)      5% (2/42)      –                –
MT1          Resembles adjacent      –              2% (1/42)        –
MT1          Undifferentiated        –              –                2% (1/42)
MT2          Tip-down (bunch/blade)  13% (5/38)     5% (2/38)        –
MT2          Tip-up (retroflex)      74% (28/38)    –                –
MT2          Resembles adjacent      –              –                –
MT2          Undifferentiated        –              –                8% (3/38)
M3           Tip-down (bunch/blade)  100% (43/43)   –                –
M3           Tip-up (retroflex)      –              –                –
M3           Resembles adjacent      –              –                –
M3           Undifferentiated        –              –                –
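The cells in Tables 3, 4, 9 and 10 are token counts over a total with a whole-number percentage, and the x marks in Table 8 flag any context with at least one near- or non-adult-like token, which matches the tables above when both sessions are pooled. The Python sketch below shows both computations under stated assumptions: the (participant, context, category) tuple layout and all function names are hypothetical, and percentages are rounded half up, so that 1/8 becomes 13%. Python's built-in round() rounds halves to the nearest even integer and would print 12% for the same fraction, so the rounding rule should be fixed explicitly when such tables are generated.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def percent_half_up(count, total):
    """Whole-number percentage with halves rounded up (1/8 -> 13).
    Exact fractions avoid float error at the .5 boundary."""
    return int(Fraction(100 * count, total) + Fraction(1, 2))


def tally(tokens):
    """Count coded tokens per (participant, context).

    `tokens` is an iterable of (participant, context, category) tuples,
    a hypothetical layout for the coded data.
    """
    table = defaultdict(Counter)
    for participant, context, category in tokens:
        table[(participant, context)][category] += 1
    return table


def format_cell(count, total):
    """Render a cell as 'count/total (pct%)', or an en dash if empty."""
    if count == 0:
        return "–"
    return f"{count}/{total} ({percent_half_up(count, total)}%)"


def lag_contexts(table):
    """Keys with at least one near- or non-adult-like token (the x marks)."""
    return {
        key for key, counts in table.items()
        if counts["near-adult-like"] + counts["non-adult-like"] > 0
    }


# Example: Male Twin 2's Session 2 post-dorsal and post-labial counts (Table 4).
tokens = (
    [("MT2", "post-dorsal", "adult-like")] * 4
    + [("MT2", "post-dorsal", "near-adult-like")]
    + [("MT2", "post-dorsal", "non-adult-like")] * 3
    + [("MT2", "post-labial", "adult-like")] * 4
)
table = tally(tokens)
counts = table[("MT2", "post-dorsal")]
total = sum(counts.values())
for category in ("adult-like", "near-adult-like", "non-adult-like"):
    print(category, format_cell(counts[category], total))
print(sorted(lag_contexts(table)))
```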
3 Due to probe misalignment, articulatory images of Male 3's only near-adult-like productions (2 syllabic tokens, Session 1; Table 5) could not be included in the articulatory analysis.
5. Discussion

The aim of this pilot study was to examine the North American English /ɹ/ productions of English-speaking children during acquisition, and to compare their early- and later-stage productions with /ɹ/ allophony patterns reported in previous studies with adults. The basic findings are consistent with the literature on child /ɹ/ and on adult /ɹ/ allophony patterns. Participants produced primarily adult-like /ɹ/s in postvocalic and syllabic contexts. In prevocalic contexts, three of the four participants also produced near-adult-like and non-adult-like /ɹ/s, which showed correspondingly higher F3 values. Three of the four participants used bunched /ɹ/s to reach adult-like targets, and one participant used retroflex /ɹ/s in most contexts. Children's articulatory behavior was more variable in contexts where they showed a production lag, and the discussion focusses on how these findings may shed light on adult /ɹ/ allophony patterns.

The main results were as follows. Male 3 and Female 1 exhibited a stable, adult-like bunched /ɹ/ strategy in both sessions, but Female 1 also produced a large proportion of tokens that were not categorized as adult-like, especially in Session 1. These tokens were acoustically and articulatorily similar to [w], except after coronals, where her /ɹ/s were typically categorized as near-adult-like. These post-coronal /ɹ/s were acoustically more similar to adult /ɹ/, and articulatorily similar to the preceding coronal. By Session 2, Female 1 had extended her bunched strategy across contexts. Male Twins 1 and 2 both produced adult-like /ɹ/s across contexts, but with different (bunched vs. retroflex) tongue shapes. Both also produced near- and non-adult-like /ɹ/s in prevocalic contexts following lingual consonants.
By Session 2, in contexts where both twins had initially shown a production lag, the retroflexing twin deviated from his dominant retroflex strategy and produced bunched tongue shapes instead, while the bunching twin extended his bunched strategy across contexts, as Female 1 did.

Variable behavior during acquisition of an articulatorily complex speech sound provides a plausible explanation for the variability that has been observed in adults' production of /ɹ/. It is argued here that an allophonic pattern involving more than one tongue shape can arise when a child's initial (dominant) strategy proves effective in some but not all contexts, leading to variable or exploratory behavior. The dominant /ɹ/ strategy seems especially likely to be ineffective in contexts that are less compatible with it, where compatibility is inferred from observed rates of production by adults and from articulatory modeling studies. Although the current study involved only four participants, three of them exhibited a production lag and variable behavior in the prevocalic contexts where adults show the most allophonic variation (Delattre & Freeman, 1968; Mielke et al., 2010, 2016). Male 3's exploratory period with respect to /ɹ/ was apparently complete before the first session, so his consistent bunched /ɹ/ provides no information beyond what would be learned from an adult study. In contrast, Female 1 showed a production delay across all prevocalic contexts in Session 1. Most of her prevocalic /ɹ/s were produced with [w]-like tongue shapes, but she showed a partially successful interim production strategy after coronals. Her post-coronal /ɹ/s were produced with tongue shapes resembling the preceding consonant, and many of these tokens were categorized as near-adult-like.⁴ Prevocalic position may be articulatorily challenging, but coronals in particular may have played a facilitative role for this participant.
As noted by Gick (1999, p. 51), [ʃ] and tip-down [ɹ] are both produced by raising the blade of the tongue and rounding the lips. This means that a learner can produce a more adult-like /ɹ/ simply by maintaining the gestures of a preceding [ʃ]. This possibility is not available after consonants that are not articulatorily similar to adult /ɹ/. Another possible account of the near-adult-like categorizations in post-coronal contexts is a perceptually facilitative effect of affrication in pre-/ɹ/ coronal stops: adult listeners may perceive affrication as a cue to a more adult-like /ɹ/ even if the /ɹ/ itself is [w]-like. However, Female 1 also showed lower F3 values in her near-adult-like /ɹ/s than in her non-adult-like /ɹ/s. This acoustic difference suggests that the context is indeed articulatorily facilitative, although it may be perceptually facilitative as well.

The most frequent /ɹ/ retroflexing context for adults is prevocalic, especially when not preceded by a lingual consonant (e.g. Mielke et al., 2010, 2016). Given Female 1's production delay in prevocalic contexts, a plausible outcome would have been for her to adopt an alternative strategy in prevocalic contexts not facilitated by a preceding coronal. This would have resulted in a common adult allophonic pattern. In the Mielke et al. (2016) study, 78% of the 27 adults either bunched exclusively (n = 16) or retroflexed only in a subset of prevocalic contexts (n = 5). If Female 1's Session 1 represents a common acquisition state, then these two groups may share bunched /ɹ/ as a dominant strategy, but differ in whether they resolved the problem of challenging prevocalic contexts by extending the dominant bunched strategy (as Female 1 did) or by developing a new retroflex strategy. Male Twin 1 followed a similar trajectory, producing near- and non-adult-like /ɹ/s after coronals and dorsals in Session 1, before extending his dominant bunched strategy across contexts by Session 2.
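The 78% figure cited from Mielke et al. (2016) follows directly from the two reported group sizes, which together account for 21 of the 27 adults:

```latex
\frac{16 + 5}{27} = \frac{21}{27} \approx 0.778 \approx 78\%
```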
In contrast, Male Twin 2 exhibited a dominant retroflex, tip-up strategy, but experienced a production lag in the same prevocalic contexts as Male Twin 1, after coronals and dorsals. These are both contexts where adult retroflexers who bunch in some contexts are likely to bunch (Mielke et al., 2016). Coronals seem to facilitate bunching, for the reasons discussed above, and the post-dorsal context is also an environment that favours bunching over retroflexion in adults (Delattre & Freeman, 1968; Mielke et al., 2010, 2016). In addition, Male Twin 2's production lag occurred before high front vowels, a context that is antagonistic to retroflexion (Ong & Stone, 1998; Stavness et al., 2012). By Session 2, Male Twin 2 had adopted a secondary bunched, tip-down strategy in prevocalic contexts after coronals and dorsals, and maintained his dominant retroflex strategy elsewhere. This allophony pattern has been reported with adults (Mielke et al., 2016). Notably, the twins' production delay in post-lingual contexts appears to be independent of dominant /ɹ/ tongue shape. Male Twin 2 arrived at a bunched tongue shape after coronals and dorsals, which is consistent with the idea that adult allophony patterns arise when there is a production lag in contexts that are less compatible with the dominant tongue shape. It is also possible for learners to develop a single strategy across all contexts, which may be more likely if the lag is in a context that is compatible with the dominant strategy. This interpretation remains largely speculative until acquisition data are available for more children.

The question of whether a child's developmental path for /ɹ/ is deterministic is an important one. The twins in this study adopted different articulatory strategies to reach adult-like targets, which could suggest that development is non-deterministic. Data from sets of monozygotic twins with identical vocal tracts but diverging articulatory strategies would support the view that /ɹ/ allophony patterns emerge during an exploratory period in acquisition, as suggested by Tiede et al. (2011), rather than as a result of some predetermined production pattern.

This pilot research confirms the methodological approach as viable for collecting /ɹ/ production data from participants as young as 3 years of age. A small percentage of tokens (17%) had to be excluded from the articulatory analysis due to ultrasound probe misalignment, which was not unexpected given the difficulty of head stabilization in research with young children. Shape analysis of the tongue was conducted using a technique that is highly robust against rotational and translational differences from token to token (Ménard et al., 2012). Future research aims to bring children into the lab at earlier stages of /ɹ/ development, in order to capture more variable and exploratory behavior during acquisition. While research in this area should obviously seek a more balanced stimulus list and higher token counts, there are limits imposed by the range of words with /ɹ/ that small children can recognize from pictures, and by the length of time they are willing to sit still in a strange environment with unfamiliar technology.

4 Female 1's near-adult-like /ɹ/s resembled the preceding coronal whether they occurred in the same syllable or across a syllable boundary, and regardless of stress (e.g., mushroom and Shrek showed no difference).

6. Conclusion

This pilot study takes an important first step in examining the relationship between children's articulatory behavior during acquisition of North American English /ɹ/ and the variability patterns previously reported with adults. Participants exhibited behaviors that were consistent with an exploration-based account of adult allophony patterns. Three of the participants acquired prevocalic targets last and showed more variable behavior in this context, providing a plausible explanation for why allophonic variation in adults is commonly found here as well. For one participant, variable behavior led to the development of a secondary bunched strategy in the prevocalic contexts where his dominant retroflex strategy proved ineffective for reaching adult targets, mirroring a reported adult allophony pattern (Mielke et al., 2016). Another participant exhibited variable behavior in contexts where she had a production lag, but ultimately extended an already dominant bunched strategy across contexts. Given that bunching is the most common articulatory strategy employed by adults, variable behavior in childhood that leads to an exclusively bunched strategy is not unexpected; when variability instead results in the adoption of a new articulatory strategy, an adult-like allophony pattern can develop. The twins in this study employed very different (bunched vs. retroflex) strategies to achieve adult targets, although further research with sets of twins whose monozygosity has been confirmed will be necessary before any claims about non-determinism can be made.

Acknowledgments

This research was supported in part by the Social Sciences and Humanities Research Council through Canada Graduate Scholarships at the Master's and Doctoral levels.
Aspects of the research were presented at Ultrafest VI (Edinburgh, 2013), the International Congress on Acoustics (Montreal, 2013), the International Child Phonology Conference (Minneapolis, 2012), and the Canadian Acoustical Association – Acoustics Week in Canada (Quebec City, 2011). Thank you to Jeff Mielke and three anonymous reviewers for their tremendously helpful comments. Thank you also to Marc Brunelle, Robin Dodsworth, Yaroslav Konar, Laura Sabourin, Tania Zamuner, and conference attendees for feedback at various stages, and to members of the University of Ottawa Sound Patterns Laboratory, and the participants and their parents.

References

Adler-Bock, M., Bernhardt, B. M., Gick, B., & Bacsfalvi, P. (2007). The use of ultrasound in remediation of North American English /r/ in 2 adolescents. American Journal of Speech-Language Pathology, 16, 128–139.
Bacsfalvi, P. (2010). Attaining the lingual components of /r/ with ultrasound for three adolescents with cochlear implants / Établissement des composantes linguales du son /r/ à l'aide d'ultrasons chez trois adolescents avec un implant cochléaire. Revue canadienne d'orthophonie et d'audiologie, 34, 206.
Bernhardt, B., Gick, B., Bacsfalvi, P., & Adler-Bock, M. (2005). Ultrasound in speech therapy with adolescents and adults. Clinical Linguistics & Phonetics, 19, 605–617.
Boersma, P., & Weenink, D. (2007). Praat: Doing phonetics by computer [Computer program].
Curtis, J. F., & Hardy, J. C. (1959). A phonetic study of misarticulation of /r/. Journal of Speech, Language, and Hearing Research, 2, 244–257.
Dalston, R. M. (1975). Acoustic characteristics of English w, r, l spoken correctly by young children and adults. Journal of the Acoustical Society of America, 57, 462–469.
Davidson, L., Klein, H., & Grigos, M. (2007). Perceptual, kinematic, and ultrasound measurement of /r/ development in children with phonological delay. Paper presented at Ultrafest 4.
Delattre, P. (1951). The physiological interpretation of sound spectrograms (Vol. LXVI). New York: Modern Language Association of America.
Delattre, P., & Freeman, D. C. (1968). A dialect study of American r's by x-ray motion picture. Linguistics, 6, 29–68.
Derrick, D., & Gick, B. (2011). Individual variation in English flaps and taps: A case of categorical phonetics. Canadian Journal of Linguistics, 56, 307–319.
Gick, B. (1999). A gesture-based account of intrusive consonants in English. Phonology, 16, 29–54.
Gick, B., Bacsfalvi, P., Bernhardt, B. M., Oh, S., Stolar, S., & Wilson, I. (2008). A motor differentiation model for liquid substitutions in children's speech. In Proceedings of Meetings on Acoustics (Vol. 1, p. 060003). Acoustical Society of America.
Guenther, F., Espy-Wilson, C., Boyce, S., Matthies, M., Zandipour, M., & Perkell, J. (1999). Articulatory tradeoffs reduce acoustic variability during American English /r/ production. Journal of the Acoustical Society of America, 105(5), 2854–2865.
Hagiwara, R. (1995). Acoustic realizations of American /r/ as produced by women and men. In UCLA Working Papers in Phonetics (Vol. 90, pp. 1–187). Los Angeles, CA: Department of Linguistics, UCLA.
Hoffman, P. R., Schuckers, G. H., & Daniloff, R. G. (1980). Developmental trends in correct /r/ articulation as a function of allophone type. Journal of Speech, Language, and Hearing Research, 23, 746–756.
Hueber, T., Chollet, G., Denby, B., & Stone, M. (2008). Acquisition of ultrasound, video and acoustic speech data for a silent-speech interface application. In Proceedings of ISSP 2008 (pp. 365–369). Strasbourg, France: ISSP.
Idemaru, K., & Holt, L. L. (2013). The developmental trajectory of children's perception and production of English /r/-/l/. The Journal of the Acoustical Society of America, 133, 4232–4246.
Klein, H. B., McAllister Byun, T., Davidson, L., & Grigos, M. (2013). A multidimensional investigation of children's /r/ productions: Perceptual, ultrasound, and acoustic measures. American Journal of Speech-Language Pathology, 22, 540–553.
Lehiste, I. (1962). Acoustical characteristics of selected English consonants. Ann Arbor, MI: The University of Michigan Communication Sciences Laboratory.
McAllister Byun, T., & Hitchcock, E. R. (2012). Investigating the use of traditional and spectral biofeedback approaches to intervention for /r/ misarticulation. American Journal of Speech-Language Pathology, 21, 207–221.
McAllister Byun, T., Hitchcock, E. R., & Swartz, M. T. (2014). Retroflex versus bunched in treatment for rhotic misarticulation: Evidence from ultrasound biofeedback intervention. Journal of Speech, Language, and Hearing Research, 57, 2116–2130.
McGowan, R. S., Nittrouer, S., & Manning, C. J. (2004). Development of [ɹ] in young, Midwestern, American children. Journal of the Acoustical Society of America, 115, 871–884.
Ménard, L., Aubin, J., Thibeault, M., & Richard, G. (2012). Measuring tongue shapes and positions with ultrasound imaging: A validation experiment using an articulatory model. Folia Phoniatrica et Logopaedica, 64, 64–72.
Mielke, J. (2015). An ultrasound study of Canadian French rhotic vowels with polar smoothing spline comparisons. The Journal of the Acoustical Society of America, 137, 2858–2869.
Mielke, J., Baker, A., & Archangeli, D. (2010). Variability and homogeneity in American English /ɹ/ allophony and /s/ retraction. In C. Fougeron, B. Kühnert, M. D'Imperio, & N. Vallée (Eds.), Variation, detail, and representation (LabPhon 10) (pp. 699–719). Berlin: Mouton de Gruyter.
Mielke, J., Baker, A., & Archangeli, D. (2016). Individual-level contact limits phonological complexity: Evidence from bunched and retroflex ɹ. Language, 92.
Ong, D., & Stone, M. (1998). Three-dimensional vocal tract shapes in /r/ and /l/: A study of MRI, ultrasound, electropalatography, and acoustics. Phonoscope, 1, 1–13.
Potter, R. K., Kopp, G. A., & Green, H. C. (1947). Visible speech. New York: Van Nostrand.
Sander, E. (1972). When are speech sounds learned? Journal of Speech and Hearing Disorders, 37, 55–63.
Smit, A. B. (1993). Phonological error distributions in the Iowa–Nebraska articulation norms project: Consonant singletons. Journal of Speech and Hearing Research, 36, 533–547.
Smit, A. B., Hand, L., Freilinger, J. J., Bernthal, J. E., & Bird, A. (1990). The Iowa articulation norms and its Nebraska replication. Journal of Speech and Hearing Disorders, 55, 779–798.
Stavness, I., Gick, B., Derrick, D., & Fels, S. (2012). Biomechanical modeling of English /r/ variants. Journal of the Acoustical Society of America Express Letters, 131, EL355–EL360.
Stoel-Gammon, C. (1985). Phonetic inventories, 15–24 months: A longitudinal study. Journal of Speech, Language, and Hearing Research, 28, 505–512.
Stone, M. (2005). A guide to analyzing tongue motion from ultrasound images. Clinical Linguistics and Phonetics, 19, 455–502.
Templin, M. (1957). Certain language skills in children. Minnesota: University of Minnesota.
Thomas, C. K. (1947). An introduction to the phonetics of American English. New York: The Ronald Press Company.
Tiede, M. K., Boyce, S. E., Espy-Wilson, C. Y., & Gracco, V. L. (2011). Variability of North American English /r/ production in response to palatal perturbation. In B. Maassen, & P. van Lieshout (Eds.), Speech motor control: New developments in basic and applied research (pp. 53–67). Oxford: Oxford University Press.
Twist, A., Baker, A., Mielke, J., & Archangeli, D. (2007). Are ‘covert’ /ɹ/ allophones really indistinguishable? Penn Working Papers in Linguistics, 13.2.
Westbury, J., Hashi, M., & Lindstrom, M. (1998). Differences among speakers in lingual articulation for American English ɹ. Speech Communication, 26, 203–226.
Zhou, X., Espy-Wilson, C. Y., Boyce, S., Tiede, M., Holland, C., & Choe, A. (2008). A magnetic resonance imaging-based articulatory and acoustic study of “retroflex” and “bunched” American English /r/. Journal of the Acoustical Society of America, 123, 4466–4481.