Journal of Phonetics 73 (2019) 91–112
Special Issue: Plasticity of Native Phonetic and Phonological Domains, eds. de Leeuw & Celata
Socio-indexical phonetic features in the heritage language context: Voiceless stop aspiration in the Calabrian community in Toronto
R. Nodari a,*, C. Celata a, N. Nagy b
a Scuola Normale Superiore, Italy
b University of Toronto, Canada
Article info
Article history: Received 11 January 2018; received in revised form 3 December 2018; accepted 11 December 2018
Keywords: VOT; Heritage languages; Plasticity; Phonological attrition; Sociophonetics; Spontaneous speech; Italian
Abstract
This study examines cross-generational transmission of a sociophonetic variable in a heritage language context. Voiceless stop aspiration is a sociophonetic variable in Calabrian Italian, indexing socio-cultural values about the speaker’s social and geographical origin. We investigate the production of voiceless stops by three generations of Calabrian Italians (immigrants and the next two generations) in Toronto, via acoustic and auditory analysis of nearly 5000 tokens from conversational speech in Calabrian Italian. Both Italian and English use long-lag VOT, but they differ in its phonological distribution: long-lag VOT is preferentially associated with pre-tonic, word-initial stops in English and with post-tonic, post-sonorant or geminate stops in Calabrian Italian. We show that, in heritage Calabrian Italian in Toronto, both phonetic implementation (cued by VOT duration) and phonological distribution of aspiration (as cued by perceived aspiration rate across phonological contexts) change cross-generationally, but some changes are non-linear, as third generation speakers appear to reproduce some patterns attested in the speech of first generation speakers. External variables such as the sex of the speakers modulate the cross-generational effects, with males producing more aspirated stops and exhibiting a more conservative behavior in certain phonetic contexts. © 2018 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail address: [email protected] (R. Nodari).
¹ The three authors jointly developed the research presented here and collaboratively edited the entire paper. For academic purposes, RN takes responsibility for the annotation of all items of the VOT corpus according to the methodology described in Section 2.2 and for drafting Section 1.2; CC drafted Section 1 (with the exclusion of Section 1.2), Section 2.2, half of Sections 2.3, 3.1 and 4; NN drafted Section 2.1, half of Sections 2.3 and 3.2; NN is also principal investigator of the HLVC project, which collected and curated the speech and survey data analyzed here.
https://doi.org/10.1016/j.wocn.2018.12.005
0095-4470/© 2018 Elsevier Ltd. All rights reserved.
1. Introduction¹
1.1. Heritage language phonology
The pronunciation patterns of heritage languages spoken in ethnic minority settings have been shown to differ from those of monolinguals and to additionally display changes according to whether they are spoken by first generation immigrants or their descendants (Amengual, 2016; Au, Knightly, Jun, & Oh, 2002; Au, Oh, Knightly, Jun, & Romo, 2008; Chang, Yao, Haynes, & Rhodes, 2008; Evans, Mistry, & Moreiras, 2007; Flores & Rato, 2016; Flores, Rinke, & Rato, 2017; Knightly, Jun, Oh, & Au, 2003; Mayr & Siddika, 2016; McCarthy, Evans, & Mahon, 2013; Nagy & Kochetov, 2013; Sharma & Sankaran, 2011). First generation immigrants acquired the heritage language as their L1 (either monolingually or not) and then experienced contact with the host language in adulthood, yet often retain strong cultural and personal connections with the homeland. Their children, born in the migratory setting, have experience with the minority language from birth in a familial setting and with the host language when they start school, often shifting from heritage to host language dominance during childhood (McCarthy, Mahon, Rosen, & Evans, 2014; Montrul, 2008; Rothman, 2009). Third generation speakers are more similar to second than to first generation speakers inasmuch as the host language is the language of primary education, and are said to show even stronger traces of the dominant language in their heritage language phonetics and phonology (Mayr & Siddika, 2016; Nagy & Kochetov, 2013). Heritage language speakers are generally said to be more accurate in pronunciation and to show a phonological advantage with respect to L2 learners, likely due to early exposure to the heritage language (Amengual, 2016; Au et al., 2002; Chang & Yao, 2016; Chang, Yao, Haynes, & Rhodes, 2011; Flores et al., 2017), although their
range of proficiency in the heritage language is reported to vary considerably (Amengual, 2016; Evans et al., 2007; Mayr & Siddika, 2016; Nagy & Kochetov, 2013; Sharma & Sankaran, 2011). The dominant language can also be produced with a foreign accent by heritage language speakers (Darcy & Krüger, 2012; Stangen, Kupisch, Ergün, & Zielke, 2015), and bi-directional effects of the heritage language over the pronunciation of the host language and of the host language over the phonetics of the heritage language have been documented (Mayr & Siddika, 2016). Arising from these observations, the study of heritage language pronunciation patterns provides insight into the plasticity of the native phonetic and phonological domains. Little is known about sociophonetic variation in the heritage language context. The retention of heritage language phonetic features by newer generations to index ethnicity and sociocultural belonging has been documented for regional Asian accents of English (e.g. Alam & Stuart-Smith, 2011; Kirkham, 2011; McCarthy, Evans, & Mahon, 2011; Sharma & Sankaran, 2011). In a partly different context, new speakers of Gaelic who acquire the minority language through revitalization programs acquire some features of phonetic variation (e.g. intonation) better than others (e.g. segmental contrasts; Nance, 2014), and are able to exploit phonetic variants for socio-stylistic and identity-related purposes (Khattab, 2013; Nance, McLeod, O’Rourke, & Dunmore, 2016). However, none of these studies investigate the possibility that the phonetic features convey sociolinguistic or communicative meanings in the homeland variety (understandably, as they were all looking at use of the host rather than the heritage language). The issue is relevant not only for the transmission and long-term maintenance of heritage languages, but also for current developmental approaches to sociolinguistic variation, that mostly have monolingual transmission mechanisms in their focus (e.g. 
Chevrot & Foulkes, 2013; Labov, 2014; Nardy, Chevrot, & Barbu, 2013); in this research domain, heritage languages and phonological acquisition in conditions of impoverished input are much less debated, but relevant in terms of our knowledge of plasticity in the native languages. This study is one of the very few sociophonetic cross-generational studies focused on an Italian regional variety as a heritage language, cf. Celata and Cancila (2010), Avesani et al. (2017). We investigate the production of VOT by three generations of Calabrian Italians (immigrants and the next two generations) in Toronto, via acoustic and auditory analysis of conversational speech in Calabrian Italian, bringing together standard sociolinguistic and acoustic phonetic practices to provide a full account of the transmission process. Both the heritage and the host language possess long-lag VOT in their phonetic repertoire, but they differ in the phonological distribution of the feature, since long-lag VOT is preferentially associated with pre-tonic, word-initial stops in English and with post-tonic, post-sonorant or geminate stops in Calabrian Italian. Another fundamental difference between the two languages is that voiceless stop aspiration is a sociophonetic variable in Calabrian Italian, indexing information about speaker attitudes and socio-cultural features, while it has not been shown to have any such role in Toronto English. This study is therefore the first to examine cross-generational transmission of an internal sociophonetic variable and contrast it with a
contact-induced phonetic variable. Contact-induced variation can be defined as reflecting increased similarity with the majority language in correlation with increased contact, while internal changes are independent of the majority language’s structure and of the degree of contact, being rooted in the sociolinguistic patterns of the heritage language instead. We analyse the two kinds of variability in light of a series of alternative hypotheses about if and how socio-indexical features are passed from one generation to the next in a minority language setting.
1.2. Linguistic background: VOT of voiceless stops in Calabrian Italian and English
Since the investigation by Lisker and Abramson (1964), the VOT of voiceless stop consonants has been extensively studied in many languages (e.g. Cho & Ladefoged, 1999; see Abramson & Whalen for a useful review of achievements and complications in this research domain). In this section, we review the main facts about VOT in the two contact languages of the study, Calabrian Italian and Canadian English. After a brief sketch of major differences and similarities, we discuss linguistic, individual and social factors potentially affecting VOT variation, with a focus on those relevant to our investigation. Studies on VOT production by bilinguals are also briefly reviewed. Calabrian Italian is characterized by an allophonic rule of voiceless stop aspiration that preferentially applies in unstressed syllables, either when the stop is preceded by a sonorant, e.g. [ˈstaŋkʰo] stanco ‘tired’, or when it is a geminate, e.g. [ˈstakkʰo] stacco ‘I disconnect’ (Falcone, 1976). Aspiration optionally applies to stops in stressed syllables too, but never to post-pausal or post-vocalic singletons; for instance, a paroxytone word such as tutto ‘all’ may optionally be realised with aspiration in the first syllable if it is preceded by a consonant-final word such as proclitic prepositions (as in per tutto [perˈtʰutːʰo] ‘for all’) or under syntactic doubling (as in può tutto [pwoˈtːʰutːʰo] 's/he can do everything'), but never when it is produced in isolation or after a vowel-ending word that does not trigger syntactic doubling. By contrast, in English, voiceless stops are produced with long-lag VOT when they are onsets of stressed syllables, and preferentially in word initial position, e.g. [pʰɪn] 'pin'. 
Thus, a first difference between Calabrian Italian and English is that, in Calabrian Italian, aspiration of voiceless stops typically applies to stops in unstressed C.CV contexts (the first 'C' representing a sonorant or the first portion of a geminate stop), whereas in English, it applies preferentially to stops in stressed CV contexts. Cross-linguistic research on stop consonant production has shown that VOT variations can be modelled as prominence effects, with e.g. languages such as Dutch showing VOT shortening in prosodically strong position (stressed, accented or boundary syllables) in contrast to languages such as English showing VOT lengthening in the same prosodically strong positions (Cho & McQueen, 2005; Cho, Lee, & Kim, 2014; Kim, Kim, & Cho, 2018). The comparison between aspiration contexts in Calabrian Italian and English thus provides additional evidence of language-specific phonetic implementation of prosodic strengthening, but with the additional constraint, for Calabrian Italian, that long-lag VOT is not confined to unstressed
syllables but can apply to stops in stressed syllables too, provided that they are not intervocalic, as explained above. Another difference between Calabrian Italian and English is that voiceless stop aspiration conveys social meaning only in Calabrian Italian (Falcone, 1976; Nodari, 2017a, 2017b), not Canadian English (although socially-induced VOT variation has been reported for other varieties of English, e.g. Scobbie, 2006; Docherty, Watt, Llamas, Hall, & Nycz, 2011). Nodari (2017a, 2017b) has shown both phonetic and extralinguistic conditioning of VOT. Young speakers from the Calabrian town of Lamezia Terme show fine-grained differences in the realization of stop aspiration. VOT showed a strong effect of attitude towards schooling and an interaction between this factor and the sex factor, with males who hold a negative attitude toward schooling producing significantly longer VOT, and females with a positive attitude producing the least aspirated variants. An important role in predicting long-lag VOT was also played by the speakers’ attitude towards the local culture, with more aspiration produced by speakers with more positive attitudes. Among the phonetic variables, the constriction location of the stop significantly predicted the duration of aspiration (shortest in /p/, longest in /k/), but when the stops were geminated (or preceded by a rhotic), the VOT of the alveolar stops was as long as the VOT of the velar stops (Nodari, 2017a: 219 ff.). An in-depth analysis of geminate and postrhotic /t/ aspiration subsequently revealed that some external factors that were not significant in the dataset as a whole, such as the age of the speakers, their socio-economic status and the type of school that they attended (vocational or classics), governed VOT variation in the coronals (Nodari, 2017a: 222). 
Differences related to sex, the type of school attended and socio-economic status were significant only in sentence-reading tasks, whereas in spontaneous conversations, there was less polarization and more variation among speakers. In sum, the study by Nodari (2017a, 2017b) has shown that voiceless stop aspiration in the community under investigation acts as a sociophonetic variable and alveolar stop aspiration is particularly invested with covert prestige to index the speakers' local and regional identity, group membership and social distinctiveness. Calabrian Italian and English VOT are also reported to differ in mean duration, with long-lag VOT longer in English than in Calabrian. This can be inferred from the relevant literature which includes, for Calabrian Italian, Sorianello (1996) and Nodari (2017a), summarized in Table 1. Although general cross-linguistic comparisons can be taken as indicative of a tendency only, the values for Calabrian Italian are all lower than the values reported for English, also shown in Table 1. Among the phonetic predictors of VOT variation, the place of articulation of the stop is one of the most important. Cho and Ladefoged's (1999) typological survey established a hierarchy in which bilabials have the shortest VOT, velars have the longest and coronals are in between. This pattern is not observed in all languages equally. For example, in varieties of English, some studies have found that the VOT in bilabials is significantly shorter than in alveolars and velars but the latter two are not significantly different (e.g. Docherty, 1992 for British English, Fowler, Sramko, Ostry, Rowland, & Halle, 2008 for Canadian English). In contrast, the three-way distinction is found in Glaswegian English by Stuart-Smith,
Sonderegger, Rathcke, and Macdonald (2015) and in Canadian English by Caramazza, Yeni-Komshian, Zurif, and Carbone (1973). In Calabrian Italian, the three places of articulation are significantly different according to both Sorianello (1996) and Nodari (2017a). The same holds true for other non-Calabrian varieties of Italian according to Esposito (2002: 210) and Stevens and Hajek (2010). However, distinct from other Italian varieties, the Calabrian variety spoken in Catanzaro shows an exceptionally long VOT for /t/, almost comparable to that of /k/ (Stevens & Hajek, 2010: 1560). As explained above, Nodari (2017b) additionally shows that VOT for /t/ and /k/ is not significantly different in two out of four aspiration contexts (i.e. after a rhotic and when geminated), and the VOT of the alveolar stop correlates much more strongly with certain sociophonetic variables (such as attitudes towards the local culture, attitudes towards schooling, sex and age differences etc.) compared to the VOT of /p/ and /k/. Vowel height is also a relevant phonetic predictor of VOT variation. VOT is longer before high vowels than before low vowels (cf. Klatt, 1975; Ohala, 1981; Morris, McCrea, & Herring, 2008; Esposito, 2002 for Italian). According to a common physiological explanation, contraction of the genioglossus muscle causes forward movement of the hyoid bone and increased contraction of the extrinsic laryngeal musculature, whose effect may be that of delaying voicing by increasing phonation threshold pressure (Klatt, 1975). Since, however, high vowels are shorter than low vowels, there is the additional implication that the shorter the vowel, the longer the VOT (e.g. Esposito, 2002: 218; Farnetani, 1989 for Italian). Based on these observations, authors have convincingly argued that VOT should be considered more a property of the syllable, or of the CV sequence, rather than of the stop itself (see e.g. Maddieson, 1999; Nance & Stuart-Smith, 2013). 
Finally, individual variables such as the speakers’ age and sex have been found to influence VOT. Age and sex are complex factors that can be said to be biological and cultural categories simultaneously. Physiological aging has been found to impact VOT, although the picture emerging from different studies is inconsistent. VOT is shorter in older speakers than in younger speakers according to Docherty et al. (2011), Benjamin (1982), Ryalls, Simon, and Thomason (2004), Nance and Stuart-Smith (2013). Other studies have found non-significant age-related differences (e.g. Neiman, Klich, & Shuey, 1983) and more variance in elderly speakers compared to younger people (e.g. Petrosino, Colcord, Kurcz, & Yonker, 1993). Finally, age effects differ across populations in studies such as Torre and Barlow (2009), thus reducing the strength of age as an indicator of change in progress in VOT production. Different results for VOT values between adults and children could be the effect of a lack of normalization, with children showing slower speech rates and, consequently, longer VOT values (Lee, Potamianos, & Narayanan, 1999), but could also be caused by developmental effects (Lowenstein & Nittrouer, 2008). From a sociolinguistic perspective, real and apparent-time studies of language change have shown that, when local speech rate is controlled (thus allowing comparison between older and younger speakers), changes in the VOT values of a linguistic community can reflect a phonological change in progress, and not the results of physiological aging (Stuart-Smith et al., 2015).
Table 1. Mean VOT measurements (in ms) reported for Calabrian Italian and English (in their respective long-lag contexts).

Variety                          Speech style    p       t       k       Source
English
Canadian English monolinguals    Word list       62 ms   70 ms   90 ms   Caramazza et al. (1973)
Canadian English monolinguals    Sentences       60 ms   76 ms   78 ms   Fowler et al. (2008)
Ohio English                     Conversation    51 ms   53 ms   58 ms   Yao (2009)
Glaswegian English               Conversation    38 ms   45 ms   54 ms   Stuart-Smith (2007)
Canadian English bilinguals      Word list       39 ms   48 ms   67 ms   Caramazza et al. (1973)
Canadian English bilinguals      Sentences       53 ms   70 ms   70 ms   Fowler et al. (2008)
Calabrian Italian
Cosenza                          Sentences       24 ms   38 ms   47 ms   Sorianello (1996)
Lamezia Terme                    Conversation    34 ms   41 ms   48 ms   Nodari (2017a, 2017b)
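The two observations made about Table 1 (that the Calabrian Italian means are all lower than the English means, and that mean VOT tends to increase from bilabial to velar place) can be checked mechanically. The following Python sketch does so using only the values listed in Table 1; the variable and function names are illustrative and are not part of the original study, and the comparison remains indicative only, since the sources differ in speech style and speaker population.

```python
# Mean VOT values (ms) as listed in Table 1:
# (language, variety, speech style, p, t, k).
TABLE_1 = [
    ("English", "Canadian English monolinguals", "Word list", 62, 70, 90),
    ("English", "Canadian English monolinguals", "Sentences", 60, 76, 78),
    ("English", "Ohio English", "Conversation", 51, 53, 58),
    ("English", "Glaswegian English", "Conversation", 38, 45, 54),
    ("English", "Canadian English bilinguals", "Word list", 39, 48, 67),
    ("English", "Canadian English bilinguals", "Sentences", 53, 70, 70),
    ("Calabrian Italian", "Cosenza", "Sentences", 24, 38, 47),
    ("Calabrian Italian", "Lamezia Terme", "Conversation", 34, 41, 48),
]


def follows_place_hierarchy(p, t, k):
    """True if mean VOT does not decrease from bilabial to alveolar to velar."""
    return p <= t <= k


def values_by_place(language):
    """Collect the p, t and k means reported for one language."""
    rows = [row[3:] for row in TABLE_1 if row[0] == language]
    return {place: [row[i] for row in rows] for i, place in enumerate("ptk")}


def calabrian_below_english():
    """True if, for each place, every Calabrian mean is below every English mean."""
    cal = values_by_place("Calabrian Italian")
    eng = values_by_place("English")
    return all(max(cal[place]) < min(eng[place]) for place in "ptk")


if __name__ == "__main__":
    print(all(follows_place_hierarchy(*row[3:]) for row in TABLE_1))
    print(calabrian_below_english())
```

Note that the hierarchy check uses a non-strict comparison, so a row in which the alveolar and velar means are equal still counts as consistent with the bilabial-to-velar ordering.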
Like age, the speakers’ sex potentially affects VOT but again results differ across studies. For English-speaking adults (including Canadians), females have been reported to have longer VOT than males (Sweeting & Baken, 1982; Swartz, 1992; Ryalls, Zipprer, & Baldauff, 1997; Whiteside & Irving, 1997; Morris et al., 2008). For other languages, there may be the opposite pattern, with males exhibiting longer VOT than females (e.g. Oh, 2011; Kang & Nagy, 2016 for Korean). Speakers of different sexes can also be differently judged by listeners from the same speech community with respect to the way they implement stop category distinctions (e.g. Khattab, Al-Tamimi, & Heselwood, 2006). Differences among individuals can indeed index gender variation imbued with sociophonetic values (Robb, Gilbert, & Lerman, 2005; Oh, 2011) and, furthermore, speakers can show talker-specific realization of VOT patterns (Allen, Miller, & DeSteno, 2003; Chodroff & Wilson, 2017). Bilingual speakers have been shown to produce VOT differently from monolinguals; the role of social factors in bilingual production has been little investigated. VOT production has been most frequently investigated for late bilinguals whose L1 and L2 both possess a two-way phonemic voicing distinction: this is the case for bilinguals acquiring English as a second language late in life and speaking French (Flege, 1987), Portuguese (Major, 1992; Sancier & Fowler, 1997), Japanese (Harada, 2003), Spanish (Lord, 2008; Flege & Eefting, 1987b), Dutch (Flege & Eefting, 1987a; Mayr, Price, & Mennen, 2012), Italian (Nagy & Kochetov, 2013) or Russian (or other Slavic languages) (Dmitrieva, Jongman, & Sereno, 2010; Nagy & Kochetov, 2013) as a first language. In these studies, bilinguals react in a variety of ways to contact with an L2. In some cases, they assimilate the L1 phonetic properties to those of the L2, producing intermediate VOT values (e.g. 
Flege, 1987; Harada, 2003; Lord, 2008; Major, 1992; Sancier & Fowler, 1997). In some other cases, they exaggerate the L1 phonetic properties to maximally differentiate them from the properties of the L2 (e.g. Flege & Eefting, 1987a, 1987b), or enhance L1 contrasts by transferring some acoustic cues that are typical of the L2 into the L1 (Dmitrieva et al., 2010). A few studies focused on speakers of English and Korean, two languages that differ in the number of phonological categories distinguished along the VOT continuum (two and three, respectively). In these studies, late bilinguals show bidirectional cross-linguistic effects and a reduction in the number of phonological contrasts (Kang & Guion, 2006), sometimes with sex-related differences (Kang & Nagy, 2016).
Cross-generational changes are also investigated in a study of VOT production in heritage Russian, Ukrainian and Calabrian Italian, for speakers in Toronto (Nagy & Kochetov, 2013). This study examined only the word-initial, stressed-syllable context in which aspiration is the norm according to English rules. In that study, Ukrainian and Russian speakers showed a lengthening of VOT in voiceless stops across generations, drifting toward English-like VOT in their heritage varieties. However, for the 11 Calabrian Italian speakers, the authors did not report cross-generational increase in VOT. This difference was not related to length of residence in Canada, ethnic orientation or language use and the authors provisionally interpreted it as an effect of accidental factors such as the presence of greater institutional support of Italian in Toronto (which could have led the second and third generation speakers to attend classes of Italian as a heritage language).
1.3. Experimental hypotheses
Inspired by the results of Nagy and Kochetov (2013) as well as by the literature on VOT production in second and heritage languages reviewed above, the current study more broadly investigates voiceless stop aspiration in the Calabrian community of Toronto, with the aim of addressing three research questions.
The first question is whether the apparent lack of cross-generational change in VOT realization in stressed syllables, attested in Nagy and Kochetov (2013), can be generalized to unstressed syllables, which are, as we have seen, the preferential context for Calabrian sociophonetic aspiration to occur. To answer this question, stressed CV and unstressed C.CV contexts will be considered separately. In addition, besides analysing aspects of phonetic implementation (VOT duration), we will look at the distribution of aspiration in the dataset: to this aim, the percentages of tokens perceived (by native Italian speakers) to be aspirated in the two contexts were calculated. The second research question is therefore whether, even in the absence of any apparent change at the phonetic implementation level (VOT duration), there can be a cross-generational shift at the distributional level, from preferential aspiration in unstressed C.CV contexts (as typical for Calabrian Italian) to preferential aspiration in stressed CV contexts (as typical for English).
A third research question is whether the socio-indexical values that characterize voiceless stop aspiration in homeland Calabrian Italian are maintained in heritage Calabrian Italian, and transmitted from first generation immigrants to their children and grandchildren. Specifically, we want to test three competing hypotheses. A first hypothesis predicts that socio-indexical features, being rooted by definition in the social dynamics of the linguistic community, change when speakers of that community move to another community (in which these speakers represent a minority group). According to this hypothesis, changes in the distribution and/or phonetic implementation of Calabrian Italian aspiration should be evident in the speech of first generation immigrants already, in comparison with homeland speech. Alternatively, a second hypothesis predicts that socio-indexical uses of speech variation are maintained when speakers move to another social environment, but as the language in the new environment is spoken by only a minority of the speakers within a multilingual setting and with mutated social conditions, sociophonetic variables are no longer transmitted to subsequent generations. According to this hypothesis, then, first-generation immigrants should differ significantly from their descendants in the use of the relevant sociophonetic variable. Finally, a third hypothesis would predict that both first generation immigrants and their descendants maintain similar use of the sociophonetic variable, under the assumption that there is full cross-generational transmission of socio-indexical features in heritage language settings. In other words, it could be hypothesized that, by virtue of their special status as indices of extra-linguistic meaning, in addition to purely linguistic meaning, sociophonetic variables fare differently in cross-generational transmission of heritage languages and are less subject to the influence of the majority language, compared to phonetic or phonological features that do not carry sociolinguistic information. 
By additionally looking at the speakers' sex, we further hypothesize that maintenance or loss of sociophonetic features in the heritage situation is not homogeneous within each generational group, being determined, among other things, by identity-related characteristics of the speakers. We address this third question by contrasting three contexts in which voiceless stop VOT varies. Two have already been mentioned: the unstressed C.CV context, where variation is sociolinguistically relevant (or socio-indexical) for homeland Calabrian Italian speakers, and the stressed CV context, where it is not, but where heritage Italian may be subject to contact-induced effects from English. The third context is the unstressed CV context, where aspiration is not expected in either Calabrian Italian or English, but where heritage Italian may be subject to a behavioural shift if contact-induced effects from English generalize to singleton consonants, irrespective of whether they are pre- or post-tonic. We did not include the fourth logical possibility, i.e. the stressed C.CV context, because both the heritage and the majority language allow aspiration in this kind of syllable, making it difficult to interpret in terms of either contact-induced change or retention of a sociophonetic feature. In sum, by analysing the incidence and the phonetic realization of voiceless stop aspiration in the speech of three generations of speakers, we can track transmission of this sociophonetic feature, investigate aspects of L1 plasticity after migration in the speech of first generation immigrants, and further document the complex L1-L2 dynamics regulating the speech production patterns of heritage language speakers.
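The distributional measure referred to in the second research question, namely the percentage of tokens perceived as aspirated in each context, can be sketched as follows. This is a minimal Python illustration and not the authors' procedure: the token records are invented placeholders, and the record layout (generation, context, perceived aspiration) is an assumption made for the example.

```python
from collections import defaultdict


def aspiration_rates(tokens):
    """Return {(generation, context): percentage of tokens perceived as aspirated}.

    Each token is a (generation, context, aspirated) triple, where
    aspirated is True if the stop was judged to be aspirated.
    """
    counts = defaultdict(lambda: [0, 0])  # [aspirated, total] per cell
    for generation, context, aspirated in tokens:
        cell = counts[(generation, context)]
        cell[0] += int(aspirated)
        cell[1] += 1
    return {key: 100.0 * asp / total for key, (asp, total) in counts.items()}


# Invented placeholder tokens (NOT data from the study).
toy_tokens = [
    ("Gen 1", "C.CV", True),
    ("Gen 1", "C.CV", True),
    ("Gen 1", "C.CV", False),
    ("Gen 1", "CV", False),
    ("Gen 3", "C.CV", True),
    ("Gen 3", "CV", True),
    ("Gen 3", "CV", False),
]

if __name__ == "__main__":
    for (generation, context), rate in sorted(aspiration_rates(toy_tokens).items()):
        print(f"{generation} {context}: {rate:.1f}% perceived as aspirated")
```

Under this layout, a shift from Calabrian-like to English-like distribution would show up as a falling rate in the C.CV cells and a rising rate in the CV cells across generations, independently of any change in VOT duration.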
2. Materials and methodology
2.1. Speakers from the HLVC corpus
Data for this study come from the Heritage Language Variation and Change in Toronto Project (HLVC, Nagy, 2009, 2011). Participants provide data via two tasks. The first is a Labovian-style sociolinguistic interview (Labov, 1984), a structured conversation, conducted in the heritage language, in which the interviewer seeks to elicit a relaxed speech style by asking questions on topics of interest to the participant. Interviewers are heritage speakers of Calabrian Italian. Topics include the speaker’s background, social practices, and experiences with languages. These interviews are orthographically transcribed in full, producing time-aligned transcripts in which the data representing variable linguistic patterns (such as, in this case, VOT) is later annotated. The second task is a verbally-administered ethnic orientation questionnaire (EOQ) which asks open-ended questions about the speaker’s language use practices and preferences, cultural practices and preferences, and experiences as a speaker of their particular language (here Italian). This task is also conducted in the heritage language. Responses are digitally recorded and then coded into three categories: 2 points = oriented toward the Heritage language/culture; 1 point = in between, or both; 0 points = oriented toward the English language or Canadian culture. This scale provides numeric scores that can be compared across speakers or groups. The HLVC corpus contains data for three generations of heritage speakers of eight languages, including Calabrian Italian. First generation speakers (Gen 1) were born in the homeland (here Calabria) and moved to the Greater Toronto Area (hereafter, Toronto) after age 18. They had lived in Toronto at least 20 years at the time of recording. Second generation speakers (Gen 2) were born in Toronto (or came there before age 6) and their parents qualify as Gen 1 speakers. 
Finally, third generation speakers (Gen 3) were born in Toronto and their parents qualify as Gen 2 speakers. The sample examined in this paper consists of excerpts from 23 speakers, distributed by generation and sex as shown in Table 2.
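The generation criteria and the EOQ coding scheme described above lend themselves to a compact sketch. The Python code below is illustrative only: the function names, argument names and category labels are hypothetical, speakers not born in Toronto are assumed to have been born in the homeland, and averaging the coded responses into a single score is an assumption, since the text states only that the scale yields numeric scores.

```python
from statistics import mean
from typing import Iterable, Optional

# Points per coded EOQ response, following the three categories in the text.
# The string labels are hypothetical shorthand for those categories.
EOQ_POINTS = {"heritage": 2, "both": 1, "english": 0}


def assign_generation(born_in_toronto: bool,
                      age_at_arrival: Optional[int],
                      years_in_toronto: int,
                      parents_generation: Optional[int]) -> Optional[int]:
    """Return 1, 2 or 3 following the criteria in the text, else None.

    age_at_arrival is None for speakers born in Toronto;
    parents_generation is None when unknown or not applicable.
    """
    # Gen 1: born in the homeland (assumed for anyone not born in Toronto),
    # moved after age 18, and at least 20 years in Toronto.
    if not born_in_toronto and age_at_arrival is not None:
        if age_at_arrival > 18 and years_in_toronto >= 20:
            return 1
    # Gen 2: born in Toronto or arrived before age 6, with Gen 1 parents.
    arrived_early = age_at_arrival is not None and age_at_arrival < 6
    if (born_in_toronto or arrived_early) and parents_generation == 1:
        return 2
    # Gen 3: born in Toronto, with Gen 2 parents.
    if born_in_toronto and parents_generation == 2:
        return 3
    return None


def eoq_score(coded_responses: Iterable[str]) -> float:
    """Average the 0/1/2 points over a speaker's coded responses.

    Using the mean as the summary score is an assumption of this sketch.
    """
    return mean(EOQ_POINTS[response] for response in coded_responses)


if __name__ == "__main__":
    print(assign_generation(False, 25, 30, None))
    print(eoq_score(["heritage", "both", "english", "heritage"]))
```

A per-speaker mean keeps scores comparable even when speakers answer different numbers of questions, which is one reason to prefer it over a raw sum in a sketch like this.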
2.2. The dataset: aspiration contexts
The data come from the first HLVC task, the sociolinguistic interview. 4973 words containing one of the three stops /p t k/ (around 215 words per speaker on average), were selected and annotated. The stops were categorized as either potential targets of Calabrian-like aspiration (as defined above, Section 1.2: post-sonorant or geminate stops in unstressed syllables, or the C.CV context for short), or of English-like aspiration (intervocalic singletons in stressed syllables or the CV context for short) or even as targets of neither (unstressed CV contexts or CV_U). These three contexts will be examined separately throughout. The selection of tokens began after the first 5 min of each interview. A number of caveats were observed for the selection of tokens, mostly related to the specificities of conversational speech. A first caveat concerns the definition of ‘stressed’
96
R. Nodari et al. / Journal of Phonetics 73 (2019) 91–112
Table 2. Speaker sample for this study. For details on speaker codes, see Nagy (2009).

| Generation | Females | Males |
|---|---|---|
| Gen 1 | I1F59A, I1F65A | I1M60A, I1M61A, I1M62A, I1M75A |
| Gen 2 | I2F32A, I2F44A, I2F45A, I2F53A, I2F57A | I2M14A, I2M19A, I2M42A, I2M53A |
| Gen 3 | I3F21A, I3F21B, I3F23A, I3F33A | I3M18B, I3M22B, I3M27A, I3M28A |
| Total | 11 | 12 |
and ‘unstressed’. We relied on relative prominence in selecting the relevant syllables for the two subsets. Since perceived prominence is related to several frequency and amplitude parameters (Terken, 1991), we assessed the relative prominence of syllables both auditorily and spectrographically, based on pitch and intensity differences. The data set of the current analysis included only broad focus and contrastive focus statements (no questions). Based on the typology of intonation labelling developed within the autosegmental-metrical approach (e.g. D'Imperio, 2002 for southern varieties of Italian), only pre-nuclear pitch accents were counted as stressed. Sentence-final words were excluded, to avoid syllables that could have undergone pre-boundary lengthening. Similarly excluded were English or dialectal Calabrian words (borrowings, code-mixing units) as well as isolated words and words pronounced under emphatic stress or with hesitation. A further caveat concerned syntactic doubling (SD), which is present in the Calabrian variety of Italian under investigation. SD was much more frequent in Gen 1 speakers than elsewhere. When SD produced geminates before unstressed vowels (e.g. si fa capace '(s/he) becomes able' produced as [sifakːaˈpatʃe]), the stop was included, and labeled as a geminate (as opposed to post-sonorant) in the C.CV context. Geminates were however only a minority of the tokens (188 out of 1455 total C.CV tokens). When SD produced geminates before stressed vowels (as in e.g. [eˈpːoi] e poi 'and then'), the stop was not included in the stressed CV dataset, which contained only singletons. Stops could be followed by any of the five vowels of Calabrian Italian. Due to the uneven distribution of different types of vowels across different consonants, we reduced the number of levels in this factor. The best fit to the data made a distinction only between [+high] and [-high] vowels.
In addition, stops followed by glides, unvoiced or breathy voice vowels were excluded. After being selected, each token underwent an acoustic and an auditory annotation. Segmentations and annotations were performed manually in PRAAT 6.0.36 (Boersma & Weenink, 2015). The acoustic annotation identified the closure duration of each stop, its VOT, and the duration of the following vowel. Closure duration was defined as the interval between the offset of F2 energy of the preceding vowel or sonorant, and the beginning of the stop release. VOT was defined as the duration from the onset of the stop burst to the first zero-crossing of the first (quasi)periodic wave of the following vowel (Turk, Nakai, & Sugahara, 2006; see also Abramson & Whalen, 2017: 83, who
explicitly mention that separating the end of the burst from the beginning of the aspiration is a challenging task with poor theoretical interest for VOT analysis). Tokens with no clear burst were discarded. When regions with both voicing and aspiration were found (Abramson & Whalen, 2017), they were assigned to the vowel. A PRAAT script automatically extracted duration values for the three intervals. In addition to raw values, normalized VOT duration was calculated. There is a debate in the scientific literature about whether and how VOT duration should be normalized (see e.g. Nakai & Scobbie, 2016). In agreement with the view in which VOT is not a property of a single segment but of the stop-vowel sequence (see Section 1.2 above), we opted for normalization in terms of CV duration (i.e., the VOT was divided by the duration of the entire CV sequence). The auditory coding classified each item as aspirated or non-aspirated, based on the traditional view that languages distinguish stops according to a limited number of categories (usually two to four), as well as on the assumption that many phonetic dimensions jointly concur to define stop categories in a language-specific way, including f0 at vowel onset, closure duration, F1 frequency and transition duration at vowel onset in addition to VOT (e.g. Cho, Jun, & Ladefoged, 2002; Green, Zampini, & Clarke, 1998; Llanos, Dmitrieva, Francis, & Shultz, 2013; Stevens & Klatt, 1974). These dimensions covary in cross-linguistically different ways and, although VOT can be considered among the most important cues in stop discrimination and identification (e.g. Liberman, Harris, Kinney, & Lane, 1961), the definition of stop categories is the result of the simultaneous contribution of different cues. 
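The normalization described above amounts to a single ratio, sketched below. The text does not spell out which intervals make up the "entire CV sequence"; the helper `cv_duration` assumes closure + VOT + vowel, which is an assumption of this sketch rather than a documented detail of the study, and both function names are hypothetical.

```python
def cv_duration(closure_ms: float, vot_ms: float, vowel_ms: float) -> float:
    """Duration of the CV sequence.

    Assumption: the sequence spans closure + VOT + following vowel.
    Swap in a different definition if the CV sequence is delimited otherwise.
    """
    return closure_ms + vot_ms + vowel_ms


def normalized_vot(vot_ms: float, cv_ms: float) -> float:
    """VOT divided by the duration of the whole CV sequence (unitless ratio)."""
    if cv_ms <= 0:
        raise ValueError("CV duration must be positive")
    return vot_ms / cv_ms


# Illustrative values only (not taken from the corpus).
example = normalized_vot(40.0, cv_duration(60.0, 40.0, 100.0))
```

Because the ratio is unitless, it can be compared across tokens that differ in speech rate, which is the motivation given in the text for normalizing.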
As we were interested in the distribution of aspiration in different contexts as a cue to potential changes in progress, two native speakers of a Tuscan variety of Italian annotated each selected syllable as either aspirated or non-aspirated. The annotators were aware that the speech samples had been produced by Calabrian immigrants in anglophone Canada of different ages and generations. The choice of Tuscan Italian annotators was motivated by the need that the annotators have both a precise understanding of the speech input (thus necessarily having Italian as their L1) and a familiarity with non-phonemic voiceless stop aspiration (such as in the case of gorgia toscana or gemination for Tuscan Italian speakers, see Stevens & Hajek, 2010; Marotta, 2008), at the same time not being of Calabrian origin, because expectations about the dialectal distribution of aspiration as well as implicit knowledge of the socioindexical values associated to it might bias their annotation procedure (see e.g. Niedzielski, 1999 for the effects of social information on the perception of phonetic features). The annotation was performed independently by each annotator. Inconsistencies between annotators corresponded to 7% of the cases (348 tokens). After observation that most of these were in the speech of those speakers that had been annotated first by both annotators, these 348 tokens were re-annotated independently by both authors and the inconsistencies dropped to 2.7% of the dataset. These 135 tokens were removed from the dataset. Auditory coding of this nature has long been the standard in variationist sociolinguistic analysis as a way of providing categorical judgments, relevant to perception and transmission, across continuously variable contexts.
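The disagreement figures reported above can be checked with a short sketch. The function `disagreement_rate` is a hypothetical helper for comparing two annotators' label lists; the token counts (4973, 348, 135) are the ones given in the text.

```python
def disagreement_rate(labels_a, labels_b):
    """Share of tokens on which two annotators assigned different labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same tokens")
    if not labels_a:
        raise ValueError("no tokens to compare")
    mismatches = sum(a != b for a, b in zip(labels_a, labels_b))
    return mismatches / len(labels_a)


TOTAL_TOKENS = 4973                  # full dataset, as reported in the text
first_pass = 348 / TOTAL_TOKENS      # inconsistencies after the first annotation
second_pass = 135 / TOTAL_TOKENS     # inconsistencies left after re-annotation

# Toy illustration with made-up labels: two of four tokens differ.
toy = disagreement_rate(["asp", "non", "asp", "asp"],
                        ["asp", "asp", "asp", "non"])
```

Under these counts the two rates come out at roughly 7% and 2.7%, in line with the percentages reported.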
Table 3 contains summary data describing the dataset. Due to the nature of the source data (conversational speech), there was an inescapably uneven distribution of items across places of articulation (more /t/ than /p/ and /k/) and vowel heights (more non-high vowels than high vowels). This, of course, reflects the uneven distribution of exposure for speakers as well. Relative frequency of occurrence may well be relevant to transmission patterns.
2.3. Factors in the analysis
We analysed voiceless stop aspiration produced by the HLVC Calabrian Italian speakers in two ways: acoustic and auditory. Analyses were run separately for the three contexts (C.CV, CV and CV_U). All six analyses tested the following independent variables. Two variables were related to the phonetic properties of the signal. These were the type of the consonant (three-level variable CPlace: /p/, /t/, /k/) and the height of the following vowel (Vheight: high, non-high). Another two variables were social: the speaker’s generation (Gen 1, Gen 2, Gen 3) and sex (male, female). We wanted to include the speakers' age as an additional factor, as it proves to be one of the most interesting and controversial predictors of VOT variation (see above, Section 1.2). However, age and generation are collinear,2 forcing us to exclude one of them from the models. We opted to keep generation, which is directly relevant to our hypotheses. In the acoustic analysis, the fact of being part of a syllable that had been auditorily judged as aspirated or not was included as an additional factor (Auditory_Rating: aspirated, non-aspirated). To understand how transmission of the variable operates, it is critical to know the contexts in which aspiration is and is not perceived, a fact that is not derivable from acoustic measures. The model of the acoustic variable compares mean VOT values over contexts while the auditory analysis distinguishes whether, e.g., a short mean VOT in a particular speaker group indicates that all (or the majority of) those speakers consistently produce short VOTs or whether it is rather the case that a few speakers produce longer VOTs and others (or most) produce no or very short aspiration. That provides two quite different snapshots of a change in progress.
3. Results
3.1. VOT duration
3.1.1. Data distribution
Fig. 1 shows the distributional density of VOT values before (left graph) and after (right graph) time normalization, for the three relevant contexts pooled across all speakers. In all contexts, voiceless stops are realized predominantly with relatively long VOT, in the range associated with audible aspiration, but tokens with short-lag VOT are also present, indicating that some stops are unaspirated. These are present in particular in the CV and CV_U contexts. A difference between these two contexts emerges in the normalized VOT graph due to overall shorter unstressed than stressed vowels, which produces longer normalized VOT in the former and shorter normalized VOT in the latter context. VOT is on average longer in C.CV (41 ms, s.d. 22) than in CV (26 ms, s.d. 17) and CV_U (28 ms, s.d. 10) contexts (all differences were statistically significant in the linear mixed model, see analysis below). This could either mean that more tokens were produced with long-lag VOT in the C.CV contexts (i.e., syllables are more often aspirated here than in the CV contexts), or that VOT reached on average longer values (independently from aspiration rate). The analyses in the following subsections will show that both are true. When looking at place of articulation distinctions across generations (Figs. 2, 3 and 4), we start appreciating the presence of some cross-generational change, at least as far as the raw data are concerned. Fig. 2 shows that, in the C.CV context, the /p/-/t/-/k/ distinction is present in all three generations, but with values concentrated over progressively shorter intervals from first to second and third generation. By contrast, in the CV context (Fig. 3), there is an opposite trend relating first and third generation, with VOT distribution occupying a progressively wider range of values. The second generation is more similar to the first than to the third, particularly for having very few occurrences of VOT in the long-lag range (between 30 and 100 ms) for /p/. In the CV_U context (Fig. 4), the pattern resembles that of CV contexts, inasmuch as the third generation shows many more occurrences of long-lag VOTs (between 30 and 80 ms) in /p/ and /t/; /k/ is also more centred to the right of the graph. Finally, Fig. 5 provides an overview of cross-individual variation with respect to place of articulation. The distribution of raw VOT values for the three consonants of interest is plotted for each speaker separately, pooled across all three contexts.
2 In the speaker sample the average age is 64 years for Gen 1, 40 for Gen 2, and 24 for Gen 3.
The first number after “I” in the speaker code indicates the generation of the speaker. Some of the speakers show a large amount of variation for all three consonants, often failing to distinguish the places of articulation from one another (e.g. I1F59A, I1F65A, I2M42A, I3F33A, I3M28A). Other plots (e.g. I1M61A, I2F57A, I2M14A, I3M27A) suggest that the speakers are more consistent across contexts in differentiating the three places of articulation. Plots of both types occur for speakers of all three generations, suggesting that cross-individual variation is stronger than cross-generational variation if we look at the data from this point of view. Another source of cross-individual variation comes from the fact that for most speakers, the distribution of VOT values for /p/ is concentrated over the range of the shortest durations, but this is not true for a few of the speakers (e.g. I1F65A, I3F21B).
3.1.2. Analysis of the residuals and general model with main effects
The 188 tokens with a geminate were excluded henceforth from the analysis of VOT variation. With the remaining 4785 tokens, we performed a linear mixed effects analysis (Baayen, Davidson, & Bates, 2008) using R (version 3.4.3; R Core Team, 2017) and the lme4 package (Bates, Maechler, Bolker, & Walker, 2015). Normalized VOT was the dependent variable. As fixed effects we entered Generation, CPlace, Vheight, Auditory_Rating, Sex, as well as the distinction between the three contexts. Participant was included as a random effect to account for pervasive speaker-related variation and prevent any outliers from having undue effect on the
Table 3. The dataset (n = 4973). Mean VOT values (in ms, s.d. in parentheses) are reported for each context (see Section 2.2). CV = stressed CV contexts (e.g. post-pausal tutto 'all'); C.CV = unstressed C.CV contexts (e.g. stanco 'tired'); CV_U = unstressed CV contexts (e.g. Calabria ‘Calabria’).

| Stop | Following segment | CV: n | CV: Mean VOT (s.d.) | C.CV: n | C.CV: Mean VOT (s.d.) | CV_U: n | CV_U: Mean VOT (s.d.) | Total n |
|---|---|---|---|---|---|---|---|---|
| /p/ | high V | 217 | 37 (27) | 73 | 37 (24) | 35 | 23 (10) | 325 |
| /p/ | non-high V | 476 | 19 (13) | 212 | 32 (19) | 469 | 21 (11) | 1157 |
| /p/ | Total | 693 | | 285 | | 504 | | 1482 |
| /t/ | high V | 162 | 27 (12) | 174 | 47 (22) | 132 | 27 (10) | 468 |
| /t/ | non-high V | 524 | 23 (12) | 579 | 37 (21) | 446 | 25 (9) | 1549 |
| /t/ | Total | 686 | | 753 | | 578 | | 2017 |
| /k/ | high V | 42 | 42 (18) | 38 | 53 (22) | 69 | 31 (14) | 149 |
| /k/ | non-high V | 543 | 31 (15) | 374 | 45 (21) | 408 | 29 (10) | 1325 |
| /k/ | Total | 585 | | 412 | | 477 | | 1474 |
Fig. 1. Density plot for the VOT of voiceless stops in C.CV (n = 1237), CV (n = 1921) and CV_U (n = 1815) contexts, raw values (in ms) on the left, time normalized values on the right.
speaker groups in which they belong.3 We report the significance levels of main effects using deviation coding; for three-level factors (such as CPlace and Generation) we additionally looked at least-squares mean (lsmean) differences, in order to compare levels among each other. Fig. 6 shows that, overall, VOT slightly decreased from Gen 1 to Gen 2 and Gen 3, but the differences are never statistically significant (p > .05). Concerning CPlace, /k/ has the longest VOT and /p/ the shortest, whereas /t/ is intermediate but closer to /p/ than to /k/; both comparisons are statistically significant (/p/-/t/ with p < .05; /t/-/k/ with p < .01). VOT is significantly longer in C.CV than in CV_U, and also longer in CV_U than in CV (p < .01 for both comparisons). VOT is also significantly longer before high than before non-high vowels (p < .01), and in syllables that are rated as aspirated than in those that are rated as non-aspirated (p < .01). Sex-related differences are not statistically significant (p > .05).
3 While it would be similarly helpful to include Word as a random effect, that was not possible in this sample from spontaneous speech, where many words are used only once.
Fig. 2. Density plot for the VOT of voiceless stops in C.CV contexts as a function of generation.
This preliminary exploration suggested that some factors strongly impact VOT variation in the dataset as a whole and can be expected to contribute to modeling VOT variation in all three contexts of investigation. Other factors, such as Generation and Sex, did not emerge as significant main effects and were therefore expected to play different roles (if any) across contexts. We thus proceeded with fitting mixed effects models for the CV, C.CV and CV_U contexts. Before that, 101 tokens with values more than 2.5 SD from the mean were excluded, corresponding to 2.1% of the dataset. The trimmed dataset comprised 4684 tokens.
3.1.3. Factors predicting VOT variation: CV, C.CV and CV_U models
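The trimming step applied before the context-specific models are fit can be sketched as follows. The text does not state whether the mean and SD were computed over the pooled data or per context, nor which SD estimator was used; this sketch assumes a single pooled mean and the sample standard deviation, and the function name `trim_outliers` is hypothetical.

```python
from statistics import mean, stdev


def trim_outliers(values, n_sd=2.5):
    """Keep only values within n_sd sample standard deviations of the mean.

    Assumption: one pooled mean/SD over all values passed in.
    """
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


# Toy data (not from the corpus): twenty ordinary values and one extreme one.
toy_values = [1.0] * 10 + [2.0] * 10 + [50.0]
trimmed = trim_outliers(toy_values)

# Bookkeeping with the counts reported in the text.
tokens_before, tokens_removed = 4785, 101
tokens_after = tokens_before - tokens_removed
share_removed = tokens_removed / tokens_before
```

With the reported counts, the trimmed dataset has 4684 tokens and the removed share is about 2.1%, as stated above.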
A separate linear mixed effects model (Baayen et al., 2008) was fit to the normalized VOT data for each context. In each model, Generation, CPlace, Vheight, Auditory_Rating and Sex were entered as fixed effects. Participant was treated in the analysis as a crossed random effect; the effects of Generation and Consonant were also modelled as random slopes of the Participant random term. To find the model that best fitted the data, the step function of the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2017) was used; it performs automatic elimination of all non-significant effects by comparing the AIC (Akaike information criterion) improvements from dropping each candidate variable, and adding each candidate variable between the upper and lower bound regressor sets supplied by the model, and by dropping or adding the variable that gives the best AIC improvement. Starting from an initial model including all variables, elimination of all non-significant effects proceeds one variable at a time. Elimination of the random part is performed first, followed by elimination of the fixed part. Elimination of the random part is done by using the likelihood ratio test; if a correlation is present between the slope and the intercept, then the simplified model will retain just the intercept. Linear mixed models were fit by REML (residual maximum likelihood); the p-values for the fixed effects were calculated from F tests based on Satterthwaite’s approximation, whereas the p-values for the random effects were based on the likelihood ratio test. The mixed-effect models included a random intercept term for Participant with 0.0004, 0.0015 and 0.0005 values of variance for the CV, C.CV and CV_U contexts, respectively. Maximum correlations among main effects were always <0.35 for all three contexts. Table 4 reports significant main effects and interactions for the CV model. With deviation coding, the intercept refers to the VOT grand mean across the dataset. Two-level variables are deviation coded, to show the main effect of those variables (e.g. for a two-level variable such as Sex, we test the null hypothesis that
Fig. 3. Density plot for the VOT of voiceless stops in CV contexts as a function of generation.
VOT does not differ between males and females). Interactions between factors are deviation coded as well. For main effects of three-level variables, Table 4 reports the contrasts (least-squares mean (lsmean) differences) instead, in order to compare levels among each other. Only significant effects and interactions are shown. The results show that the null hypothesis could be rejected in the cases of Sex, Vheight and Auditory_Rating, since stops before non-high vowels had significantly shorter VOT than stops before high vowels, males showed significantly longer VOT values than females, and syllables perceived as aspirated had a longer VOT than syllables perceived as non-aspirated. However, Sex and Vheight were found to interact. An analysis of contrasts revealed that the difference between males and females was significant only in the high vowel context. CPlace was a powerful predictor too, with /t/ differing from /k/ as well as from /p/ (p < .001 in both contrasts). The interaction between CPlace(t-p) and Auditory_Rating further suggested that the /t/-/p/ difference was not equally distributed in syllables perceived as aspirated and in syllables perceived as non-aspirated. In fact, inspection of contrasts showed that the difference was significant in perceived non-aspirated syllables, whereas it was not significant (p > .05) in the perceived aspirated ones. This however reduces only slightly the predictiveness of CPlace, since all other contrasts turned out to be significant. Moreover, the significant interactions between CPlace and Vheight suggested that the comparison between each consonant and the category grand mean could be different in the high vs non-high vowel context. As we are more interested in how the three constriction locations differ from each other, we looked at the lsmeans differences between consonants in the two vowel contexts. The contrasts showed that, when vocalic contexts were separately considered, the difference between /t/ and /p/ was significant only in the high vowel condition (p < .001), whereas the difference between /t/ and /k/ was significant only in the non-high vowel condition. In general, the effects of CPlace, Vheight and Auditory_Rating (and their interactions) were consistent with phonetic-physiological explanations, whilst the effect of Sex was consistent with what is reported in the literature for languages other than English (see above, Section 1.2). By contrast, there was no main effect of Generation, thus indicating that VOT did not significantly change across generations in the CV context. However, Generation did modulate the effect of CPlace, suggesting that speakers of different generations realize the /p/-/t/-/k/ contrast in different ways. Again, analysis of lsmeans differences revealed that the difference between /p/ and /t/ was significant in Gen 2 and Gen 3 (p < .001 in both cases), but not Gen 1 (p > .01): for the latter
Fig. 4. Density plot for the VOT of voiceless stops in CV_U contexts as a function of generation.
speakers, VOT varied according to a dichotomous distinction between dorsal and non-dorsal consonants. Table 5 reports significant main effects and interactions for the C.CV model. Again, there are significant effects of Auditory_Rating and of the phonetic variables (Vheight, CPlace), as expected; Sex was however not included in the optimized model. Interestingly, Generation exerted a significant effect both alone, in the comparison between Gen 1 and Gen 3 speakers (the latter having shorter VOT), and in interaction with CPlace. Post-hoc analysis of lsmeans shows that neither Gen 1 nor Gen 3 speakers differentiated /t/ from /p/ (p > .05 in both cases), whereas Gen 2 speakers did (p < .001). Finally, Table 6 reports significant main effects and interactions for the CV_U model. As in the CV model, there is a main effect for Vheight, CPlace and Auditory_Rating, but no main effect of Generation. The speakers’ generation, however, modulates the realization of place- and vowel-related distinctions. By looking at lsmeans, it is found that, as in the C.CV model above, first and third generation speakers made significant distinctions between /p/ and /k/ and between /t/ and /k/ (p < .0001 in both cases), whereas for both groups of speakers /p/ and /t/ are not significantly different; by contrast, second generation speakers significantly differentiate between /p/ and /t/ (p < .05), in addition to /p/-/k/ and /t/-/k/ (p < .0001 in both cases). There was an effect of Generation on Vheight distinctions: only Gen 3 speakers have significantly longer VOT values before high than non-high vowels. Finally, Sex had no effect when considered alone, but modulated the results with respect to CPlace distinctions: females did not distinguish between /p/ and /t/, whereas males did. To sum up, VOT variation is accounted for by phonetic and socio-individual factors that are partly different in the three contexts. The most powerful predictors are phonetic-physiological. The height of the following vowel and the place of articulation of the stop are significant in all models, as predicted by the literature. There is also a clear correspondence between the acoustic duration of VOT and the auditory judgment of the syllable as aspirated or non-aspirated in all contexts. On the other hand, socio-individual factors such as the sex and the generation of the speakers play a different role across contexts. Sex is absent from the C.CV model, whereas Generation is absent (as a main effect) from the CV and CV_U models.
3.2. Perceived aspiration
In this section, we first look at the range of patterns broadly, then examine the patterns via graphs. Then we present optimized logistic mixed effects models for each of the three
Fig. 5. Density plot for the VOT of voiceless stops as a function of the speaker.
contexts. Finally, for a more detailed perspective on individual differences related to how the patterns are transmitted from one generation to the next, we look at the effects of linguistic attitudes and practices by seeking correlations between individuals’ EO scores and rates of perceived aspiration. The whole dataset (4973 items, including geminates) was used for this analysis. Table 7 lists the speakers and indicates how often they were perceived to aspirate in the C.CV, CV and CV_U contexts.
3.2.1. General distributional patterns
Table 7 shows clear trends of cross-generational change for all three contexts, differing from the previous analysis of VOT (where no cross-generational effects were perceived in the
CV and CV_U contexts). Perceived aspiration in the C.CV context was produced either categorically (I1F59A), very frequently (I1F65A, I1M60A, I1M62A, I1M75A) or frequently (I1M61A) by Gen 1 speakers. However, only three of nine speakers in Gen 2 and two of eight speakers in Gen 3 showed rates of aspiration comparable to Gen 1. In contrast, perceived aspiration in the CV-stressed context was rarely or never produced by Gen 1 speakers, whereas four Gen 2 speakers and all but one Gen 3 speaker showed rates higher than 15%. Every Gen 1 speaker is perceived to aspirate much more frequently in the C.CV than in the CV context; all Gen 2 speakers retain this contrast but to a less stark degree. In contrast, half the speakers in Gen 3 have reversed the contrast, being
Fig. 6. Plots of the main effects predicted by the general model (n = 4785).
Table 4. Results of the best-fitting linear mixed model for the CV context (significant effects and interactions only; number of observations: 1938, subjects: 23). Main effects: deviation coding for two-level variables (Sex, Vheight, Auditory_Rating), lsmeans differences for three-level variables (Generation, CPlace). Interactions: deviation coding. See text for further details and explanations.

| Effect | Estimate | Std. Error | t Value | Pr(>\|t\|) | Sig. |
|---|---|---|---|---|---|
| Intercept | 0.385 | 0.009 | 36.550 | 0.00001 | *** |
| Sex(M) | 0.194 | 0.001 | 2.564 | 0.01807 | * |
| Vheight(non-high) | 0.031 | 0.013 | 13.333 | 0.00001 | *** |
| Auditory_Rating(non-aspirated) | 0.011 | 0.014 | 23.343 | 0.00001 | *** |
| CPlace: /t/-/p/ | | 0.004 | 4.41 | 0.00001 | *** |
| CPlace: /t/-/k/ | | 0.005 | 5.47 | 0.00001 | *** |
| CPlace: /p/-/k/ | | 0.005 | 8.99 | 0.00001 | *** |
| Generation(1)*CPlace(p) | 0.005 | 0.070 | 2.476 | 0.01335 | * |
| CPlace(p)*Vheight | 0.243 | 0.009 | 4.311 | 0.00068 | *** |
| CPlace(t)*Vheight | 0.248 | 0.004 | 3.206 | 0.00136 | ** |
| CPlace(p)*Auditory_Rating | 0.379 | 0.008 | 3.310 | 0.00094 | *** |
| Sex*Vheight | 0.494 | 0.004 | 2.293 | 0.02197 | * |
perceived to aspirate more often in the CV context than in the C.CV context, suggesting an influence of English distributional patterns on their Italian. Fig. 7 further shows that the sex of the speakers and the consonant modulated the distribution of perceived aspiration across the two contexts. /t/ tokens were perceived as aspirated more often than /p/ and /k/ in Gen 1 and Gen 2 speakers, for both sexes, in the C.CV (Calabrese aspiration) context.
In the CV (English aspiration) context, the opposite generational pattern was apparent and /t/ tokens were not perceived as aspirated more often than /p/ and /k/ in any group. In the CV_U context, where aspiration is not predicted to occur frequently by the rules of either language, we see low rates of perceived aspiration, though still with a gradual increase from Gen 1 to Gen 2 to Gen 3. Here, aspiration is perceived most often for /p/ and least for /k/.
Table 5. Results of the best-fitting linear mixed model for the C.CV context (significant effects and interactions only; number of observations: 1217, subjects: 23).

| Effect | Estimate | Std. Error | t Value | Pr(>\|t\|) | Sig. |
|---|---|---|---|---|---|
| Intercept | 0.427 | 0.001 | 26.197 | 0.00001 | *** |
| Vheight(non-high) | 0.055 | 0.004 | 7.057 | 0.00001 | *** |
| Auditory_Rating(non-aspirated) | 0.020 | 0.004 | 11.424 | 0.00001 | *** |
| Generation: Gen 1-Gen 3 | | 0.022 | 2.12 | 0.046 | * |
| CPlace: /t/-/p/ | | 0.006 | 2.75 | 0.006 | ** |
| CPlace: /t/-/k/ | | 0.005 | 7.57 | 0.00001 | *** |
| CPlace: /p/-/k/ | | 0.007 | 8.28 | 0.00001 | *** |
| Generation(1)*CPlace(p) | 0.001 | 0.012 | 2.256 | 0.02424 | * |
| Generation(2)*CPlace(p) | 0.712 | 0.011 | 2.692 | 0.00719 | ** |
Table 6. Results of the best-fitting linear mixed model for the CV_U context (significant effects and interactions only; number of observations: 1529, subjects: 23).

| Effect | Estimate | Std. Error | t Value | Pr(>\|t\|) | Sig. |
|---|---|---|---|---|---|
| Intercept | 0.525 | 0.005 | 32.080 | 0.00001 | *** |
| Vheight(non-high) | 0.373 | 0.013 | 3.806 | 0.00014 | *** |
| Auditory_Rating(non-aspirated) | 0.027 | 0.006 | 10.710 | 0.00001 | *** |
| CPlace: /p/-/t/ | | 0.007 | 3.445 | 0.0017 | ** |
| CPlace: /p/-/k/ | | 0.008 | 14.875 | 0.00001 | *** |
| CPlace: /t/-/k/ | | 0.007 | 11.789 | 0.00001 | *** |
| Generation(2)*CPlace(p) | 0.002 | 0.041 | 2.522 | 0.01176 | * |
| Generation(2)*Vheight | 0.554 | 0.005 | 2.370 | 0.01791 | * |
| CPlace(p)*Sex | 0.203 | 0.013 | 5.275 | 0.00021 | *** |
| CPlace(t)*Sex | 0.313 | 0.015 | 4.393 | 0.00032 | *** |
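Table 7. Rates (%) of perceived aspiration by speaker and context (n = 4973).

| Generation | Speaker | C.CV | CV | CV_U |
|---|---|---|---|---|
| Gen 1 | I1F59A | 100 | 0 | 0 |
| Gen 1 | I1F65A | 88 | 9 | 0 |
| Gen 1 | I1M60A | 88 | 5 | 5 |
| Gen 1 | I1M61A | 67 | 0 | 0 |
| Gen 1 | I1M62A | 99 | 0 | 4 |
| Gen 1 | I1M75A | 83 | 9 | 0 |
| Gen 1 | Average for generation | 87 | 5 | 2 |
| Gen 2 | I2F32A | 50 | 17 | 4 |
| Gen 2 | I2F44A | 76 | 4 | 7 |
| Gen 2 | I2F45A | 13 | 0 | 0 |
| Gen 2 | I2F53A | 42 | 3 | 5 |
| Gen 2 | I2F57A | 10 | 9 | 0 |
| Gen 2 | I2M14A | 21 | 15 | 0 |
| Gen 2 | I2M19A | 91 | 37 | 31 |
| Gen 2 | I2M42A | 90 | 26 | 3 |
| Gen 2 | I2M53A | 47 | 0 | 7 |
| Gen 2 | Average for generation | 49 | 13 | 6 |
| Gen 3 | I3F21A | 7 | 6 | 4 |
| Gen 3 | I3F21B | 13 | 60 | 20 |
| Gen 3 | I3F23A | 4 | 60 | 7 |
| Gen 3 | I3F33A | 81 | 23 | 34 |
| Gen 3 | I3M18B | 11 | 18 | 0 |
| Gen 3 | I3M22B | 18 | 16 | 16 |
| Gen 3 | I3M27A | 41 | 82 | 27 |
| Gen 3 | I3M28A | 81 | 29 | 15 |
| Gen 3 | Average for generation | 32 | 37 | 16 |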
3.2.2. Factors predicting perceived aspiration: CV, C.CV and CV_U models
In order to see which factors significantly correspond to the differences in perceived aspiration, and to explore the possibility of interactions among the factors, a number of mixed effects models were compared, again according to AIC values. A requirement of the models was the exclusion of categorical contexts. Therefore, the speaker who aspirated in the C.CV context 100% of the time, the five speakers who never aspirated in the CV context, and the eight speakers who never aspirated in the CV_U context were excluded from the datasets used to construct the models below. This means that the generational effects are even stronger than what is shown, given that the majority of categorical speakers were in Gen 1; they further weight the results in the directions shown below. A separate binomial (logit) mixed effects model (Baayen et al., 2008) was fit to the binary dependent variable of Perceived Aspiration for each context. In each model, CPlace, Vheight, Generation and Sex were entered as fixed effects. Categorical predictors were contrast coded and Participant was treated as a random intercept.
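The exclusion of categorical speakers described above can be illustrated with the rounded rates from Table 7. This is a sketch under one stated assumption: a speaker is treated as categorical in a context when the rounded rate is exactly 0 or 100, so a speaker with a true rate below 0.5% would also be flagged. The names `RATES` and `categorical_speakers` are hypothetical.

```python
# Rounded rates (%) of perceived aspiration from Table 7: (C.CV, CV, CV_U).
RATES = {
    "I1F59A": (100, 0, 0), "I1F65A": (88, 9, 0), "I1M60A": (88, 5, 5),
    "I1M61A": (67, 0, 0), "I1M62A": (99, 0, 4), "I1M75A": (83, 9, 0),
    "I2F32A": (50, 17, 4), "I2F44A": (76, 4, 7), "I2F45A": (13, 0, 0),
    "I2F53A": (42, 3, 5), "I2F57A": (10, 9, 0), "I2M14A": (21, 15, 0),
    "I2M19A": (91, 37, 31), "I2M42A": (90, 26, 3), "I2M53A": (47, 0, 7),
    "I3F21A": (7, 6, 4), "I3F21B": (13, 60, 20), "I3F23A": (4, 60, 7),
    "I3F33A": (81, 23, 34), "I3M18B": (11, 18, 0), "I3M22B": (18, 16, 16),
    "I3M27A": (41, 82, 27), "I3M28A": (81, 29, 15),
}
CONTEXTS = ("C.CV", "CV", "CV_U")


def categorical_speakers(rates, context):
    """Speakers whose rounded rate in the given context is 0% or 100%."""
    idx = CONTEXTS.index(context)
    return sorted(code for code, r in rates.items() if r[idx] in (0, 100))


excluded = {ctx: categorical_speakers(RATES, ctx) for ctx in CONTEXTS}
# Speakers left for the CV model once categorical speakers are dropped.
remaining_cv = len(RATES) - len(excluded["CV"])
```

On these rounded rates the sketch flags one speaker in C.CV, five in CV and eight in CV_U, matching the counts given in the text.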
The initial model includes all variables, while the final model, in Table 8, rejects Sex as a predictor. Given our hypotheses, this simple model was compared with models including interactions of Generation and Sex, and of Generation and CPlace; these were rejected in favor of the simpler model, which has a lower AIC value. Interactions did not reach significance. Generalized linear mixed models were fit by maximum likelihood using the Laplace approximation. The mixed-effect models included a random intercept term for speaker, with variances of 2.1, 1.023 and 0.52 for the C.CV, CV and CV_U contexts, respectively. Maximum correlations among main effects were low: <0.36, <0.23 and <0.04 for the C.CV, CV and CV_U contexts, respectively. Table 8 reports the main effects for the CV model. Reference levels are marked "(ref.)". Significance levels are indicated in the rightmost column. With contrast coding, the intercept models the mean rate of perceived aspiration at the reference values of the predictors. Its estimate is −2.5, so the fitted mean is 1/(1 + e^{2.5}) ≈ 8%. Variables are contrast coded; to show the main effect of each variable (e.g. a two-level variable such as Sex), we test the null hypothesis that the rate of perceived aspiration at each listed level does not differ from that at the reference level. For example, the row labelled CPlace(P) indicates a significant difference in the rate of perceived aspiration between /p/ tokens and /k/ tokens.
Fig. 7. Perceived aspiration rate by context, generation and sex of the speakers. Light grey bars: /p/; dark grey: /t/; striped: /k/.
Table 8. Results of the best-fitting logistic mixed model for the CV context (number of observations: 1629, subjects: 18). See text for further details.
Predictor             n     % Perceived aspirated  Estimate  Std. Error  z Value  Pr(>|z|)  Sig.
Intercept                                          −2.50     0.73        −3.43    0.00      ***
CPlace(K) (ref.)      482   20%
CPlace(P)             587   29%                    0.41      0.18        2.28     0.02      *
CPlace(T)             560   21%                    0.05      0.19        0.28     0.78
Vheight(high) (ref.)  366   34%
Vheight(non-high)     1263  21%                    −1.04     0.17        −6.27    0.00      ***
Generation(1) (ref.)  416   8%
Generation(2)         581   15%                    0.89      0.75        1.18     0.24
Generation(3)         632   41%                    2.16      0.73        2.94     0.00      **
Sex(F) (ref.)         778   22%
Sex(M)                851   25%                    0.60      0.51        1.17     0.24
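The conversion from a logit-scale estimate to a fitted rate can be sketched as below. This is an illustrative calculation, assuming that the CV intercept is −2.5 on the log-odds scale (as the reported 8% implies) and that the Generation(3) estimate of 2.16 is simply added to the intercept with all other predictors at their reference levels; the function name `inv_logit` is introduced here for the example.

```python
import math


def inv_logit(log_odds: float) -> float:
    """Map a log-odds value to a probability: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Values taken from the CV model; the negative sign of the intercept is an
# assumption consistent with the 8% fitted mean stated in the text.
INTERCEPT = -2.5
GENERATION_3 = 2.16

p_reference = inv_logit(INTERCEPT)             # all predictors at reference levels
p_gen3 = inv_logit(INTERCEPT + GENERATION_3)   # same, but third generation

print(f"reference: {p_reference:.3f}, Gen 3: {p_gen3:.3f}")
```

Under these assumptions the reference-level rate is about 8%, and shifting only Generation to its third level raises the fitted rate to roughly 42%.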
The results show that the null hypothesis could be rejected in the cases of CPlace, Vheight and Generation. /p/ tokens were perceived as aspirated significantly more often than /k/ tokens, whereas /t/ tokens were perceived as aspirated at a non-significantly higher rate than /k/ tokens. Stops before non-high vowels were perceived as aspirated significantly less often than stops before high vowels. Third generation speakers were perceived to aspirate in this context significantly more than first generation speakers, while second generation speakers were also perceived to aspirate more than first generation speakers, though not significantly so. Finally, we cannot reject the null hypothesis that sex has no effect. The effect of Vheight is consistent with phonetic-physiological explanations, but the finding that /p/ tokens are perceived as aspirated more often than /k/ tokens seems counter to the expectation that velar stops have longer VOT than bilabial stops. The main effect of Generation indicates that the rate of perceived aspiration increases across generations in the CV context, but the difference is only significant between the first and third generations. This confirms the distributional results in Fig. 3.
Table 9 reports the main effects for the C.CV model. Again, there were significant effects of CPlace and Generation, as expected, but Sex and Vheight were not included in the optimized model. Here, differences between all three levels of Generation are found, with less aspiration perceived in speakers of later generations. Of note, in this context, /t/ is modeled as being perceived as aspirated significantly more often than /p/ and /k/, while no significant difference emerges between the latter two. Models with interactions were explored but again rejected.
Table 10 reports the main effects for the CV_U model. Again, there were significant effects of CPlace, Vheight and Generation, but Sex was not included in the optimized model.
Here, we see a difference between the first and third generation, with less aspiration perceived in speakers of earlier generations. In this respect, this context exhibits the same type of behaviour as CV, where we expect influence of English aspiration rules. As the CV_U context includes many word-initial stops, that effect is expected: English speakers preferentially aspirate word-initial voiceless stops, even when unstressed. Of note, in this context, /p/ and /t/ are modelled as being perceived as aspirated significantly more often than /k/. In contrast, in this context, aspiration is perceived more often with a following non-high vowel, the opposite of the effect seen in the C.CV and CV contexts. The intercept and the estimates are, overall, smaller than in the two contexts where aspiration was predicted by either English or Calabrian Italian rules, reflecting the low likelihood of aspiration (being perceived) in this context.
3.2.3. Effects of ethnic orientation
We examined the effects of speakers' attitudes and cultural choices by considering the correlation of each EO score to the intercepts for individual speakers generated as random effects in the best model, for the three contexts. Speakers who were categorically perceived to aspirate all the time (in the C.CV context) or never (in the CV contexts) were assigned dummy intercept values that were the next integer value above (or below) the highest (or lowest) intercept calculated for any of
the variable speakers. Correlations are calculated using Spearman’s rank correlation test, so the exact values assigned to the extreme speakers are unimportant. Only two scores in the CV dataset were correlated with rates of perceived aspiration, as shown in Table 11. The more one (reports that s/he) uses Italian and the more one (reports that s/he) chooses Italian-oriented cultural activities (Lg_use and Cult_choices, respectively), the less one is perceived to aspirate in CV contexts. In contrast, the lack of any correlation between these social factors and the rate of perceived aspiration in the C.CV context highlights an important distinction: the internal variable is not affected by these measures of contact with, or attitudes toward, the English language, while the contact-induced variable does respond to the amount of use of each language as well as to choices about participating in Calabrian/Italian cultural activities. Four of six correlations emerge as significant in the CV_U context; this reflects the fact that seven Gen1/Gen2 speakers were never perceived to aspirate in this context, but only one Gen3 speaker behaved that way. Gen1/Gen2 speakers have higher scores than Gen3 for all EO measures.
4. Discussion
This study has shown that Calabrian Italian, spoken as a heritage language in Toronto, is evolving from generation to generation. First generation immigrants, who were born in Calabria and acquired Calabrian Italian from birth, experienced contact with Toronto English as adults, after Calabrian Italian was fully acquired, including its system of socio-indexical relations. By contrast, their descendants were born in Toronto and experienced Calabrian Italian as a heritage language from immigrant Calabrian parents (2nd generation speakers) or Toronto-born parents and immigrant grandparents (3rd generation speakers). Our study has shown that aspiration of voiceless stops, a sociophonetic variable of Calabrian Italian, changes across generations in a complex way, and a series of language-internal and language-external factors intervene in modulating the patterns of change. We analysed VOT duration and the rate of perceived aspiration across three phonotactic contexts: geminates or post-sonorant stops in unstressed syllables, where aspiration is expected to occur according to Calabrian norms, as opposed to singletons in stressed syllables, where aspiration is expected according to English norms, and singletons in unstressed syllables, where aspiration is not expected according to the norms of either language. We reasoned that evidence of cross-generational change could be expected either at the phonetic implementation level, with VOT duration progressively increasing from Gen 1 to Gen 2 and Gen 3, as previous studies had hypothesized (Nagy & Kochetov, 2013), or at a phonological distributional level, with aspiration shifting from the phonotactic contexts that are typical of Calabrian Italian to those contexts that are typical of English. Our analysis thus focused on both aspects, i.e. implementation and distribution.
Consistent with previous studies, we found no evidence of cross-generational change (as a main effect) at the implementation level in the stressed syllable context that is typical of English aspiration; however, we did find evidence of change in the unstressed context that is typical of Calabrian aspiration.
Table 9. Results of the best-fitting logistic mixed model for the C.CV context (number of observations: 1368, subjects: 22).
Predictor             n     % Perceived aspirated  Estimate  Std. Error  z Value  Pr(>|z|)  Sig.
Intercept                                          1.10      0.88        1.24     0.21
CPlace(K) (ref.)      389   56%
CPlace(P)             261   60%                    0.08      0.21        0.38     0.70
CPlace(T)             718   59%                    0.59      0.18        3.37     0.00      ***
Vheight(high) (ref.)  269   65%
Vheight(non-high)     1099  57%                    0.10      0.19        0.54     0.59
Generation(1) (ref.)  510   84%
Generation(2)         404   53%                    −1.84     0.88        −2.11    0.04      *
Generation(3)         454   34%                    −2.99     0.89        −3.38    0.00      ***
Sex(F) (ref.)         490   35%
Sex(M)                878   71%                    1.06      0.67        1.58     0.11
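The z values and p-values reported for each predictor can be related to the estimates and standard errors with a Wald test. The sketch below is illustrative, not the authors' code: it takes the C.CV estimate for CPlace(T) (0.59, standard error 0.18) as input, and the function names are assumptions. Because the published figures are rounded, the recomputed z differs slightly from the tabulated one.

```python
import math


def wald_z(estimate: float, std_error: float) -> float:
    """Wald statistic: the estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# CPlace(T) in the C.CV model, using the rounded values given in the table.
z = wald_z(0.59, 0.18)
p = two_sided_p(z)

print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value of this size rounds to 0.00 at two decimals, which is how it appears in the table.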
Table 10. Results of the best-fitting logistic mixed model for the CV-unstressed context (significant effects and interactions only; number of observations: 1090, subjects: 15).
Predictor             n     % Perceived aspirated  Estimate  Std. Error  z Value  Pr(>|z|)  Sig.
Intercept                                          −5.70     0.92        −6.21    0.00      ***
CPlace(K) (ref.)      328   6%
CPlace(P)             360   23%                    1.64      0.28        5.96     0.00      ***
CPlace(T)             402   10%                    0.61      0.30        2.04     0.04      *
Vheight(high) (ref.)  159   5%
Vheight(non-high)     931   14%                    1.07      0.40        2.71     0.01      **
Generation(1) (ref.)  164   5%
Generation(2)         386   10%                    1.00      0.77        1.30     0.19
Generation(3)         540   18%                    1.85      0.76        2.45     0.01      *
Sex(F) (ref.)         476   12%
Sex(M)                1090  13%                    0.57      0.47        1.23     0.22
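Logit-scale estimates are often easier to read as odds ratios. The following sketch, offered as an illustration rather than as part of the original analysis, exponentiates the CV_U estimate for CPlace(P) (1.64, standard error 0.28) and adds an approximate 95% Wald interval; the function name and the 1.96 critical value are assumptions of this example.

```python
import math


def odds_ratio_ci(estimate: float, std_error: float, z_crit: float = 1.96):
    """Return (odds ratio, lower bound, upper bound) for a logit coefficient."""
    ratio = math.exp(estimate)
    low = math.exp(estimate - z_crit * std_error)
    high = math.exp(estimate + z_crit * std_error)
    return ratio, low, high


# CPlace(P) versus the /k/ reference level in the CV_U model (rounded inputs).
ratio, low, high = odds_ratio_ci(1.64, 0.28)

print(f"odds ratio = {ratio:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

An interval that excludes 1 corresponds to a significant Wald test at the 5% level, in line with the significance reported for this predictor.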
Table 11. Correlations between Ethnic Orientation scores and the rates of perceived aspiration (factor weights for individual speakers). Significant correlations (p < 0.05) are marked with an asterisk.
Context  Measure  Ethnic_ID  Lg_choices  Cult_env  Lg_use  Cult_choices  Discrim
CV       ρ        0.37       0.22        0.28      0.55*   0.47*         0.38
         p        0.10       0.35        0.21      0.01    0.04          0.10
C.CV     ρ        0.38       0.19        0.16      0.23    0.02          0.5
         p        0.09       0.39        0.48      0.33    0.92          0.88
CV_U     ρ        0.01       0.65*       0.55*     0.61*   0.60*         0.28
         p        0.98       0.00        0.00      0.00    0.00          0.22
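The reason the exact dummy intercepts assigned to categorical speakers do not matter under Spearman's rank correlation can be shown with a small sketch. All data below are hypothetical and the helper functions are written for this illustration; only the ranking logic reflects the procedure described in the text.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical EO scores and speaker intercepts (not data from the study).
eo_scores = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
intercepts = [1.2, 0.4, 0.9, -0.3, -1.1, -0.8]

# One categorical non-aspirator gets a dummy intercept below the observed minimum.
rho_near = spearman_rho(eo_scores + [4.0], intercepts + [-2.0])
rho_far = spearman_rho(eo_scores + [4.0], intercepts + [-50.0])

print(rho_near, rho_far)
```

Both dummy values place the categorical speaker at the lowest rank, so the two coefficients are identical.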
In addition, we found clear evidence of a shift at the distributional level, with aspiration progressively moving from unstressed C.CV (in Gen 1 speakers) to stressed CV syllables (in Gen 3 speakers). In what follows, we discuss each result in detail, showing the impact of linguistic, individual and social factors on the cross-generational transmission of this Calabrian Italian feature. The analysis of VOT duration (Section 3.1) provides a negative answer to our first research question (Section 1.3) of whether the previously-reported lack of cross-generational change in VOT in stressed syllables can be generalized to unstressed syllables. Nagy and Kochetov (2013) had shown that, unlike other heritage language speakers in Toronto, heritage Calabrian speakers did not lengthen VOT in stressed syllables as a function of generation and of increased contact with English. This finding was confirmed here as far as main effects on VOT in the stressed CV context are concerned, the context in which aspiration usually occurs according to English pronunciation norms. However, VOT duration was found to decrease from Gen 1 speakers to Gen 3 speakers in the C.CV context, where Calabrian aspiration usually occurs. This finding suggests that, here, Calabrian aspiration is progressively changing in the speech of successive generations of heritage language speakers. Importantly, this cross-generational change affects the prototypical context of post-consonantal stops in unstressed syllables only (C.CV), and does not generalize to intervocalic stops in unstressed syllables (CV_U), as might be expected under the hypothesis that contact with English induces generalization of aspiration to all unvoiced singletons, irrespective of stress position. We conclude that the change that we observe for VOT in C.CV contexts is a change of the Calabrian sociophonetic variable across generations of heritage language speakers, and not a general structural change due to contact with the host language. This finding is consistent with other reported cross-generational changes in phonetic detail (e.g. Mayr & Siddika, 2016; Celata & Cancila, 2010) and with studies suggesting that such changes may be gradual and incremental across successive generations (e.g. Sharma & Sankaran, 2011). However, Gen 2 speakers were found to perform more like Gen 1 speakers in our data. More abrupt changes in the phonetic implementation of VOT occur instead between the 2nd and the 3rd generation. One possible explanation is that, when the input in the heritage language is provided by speakers who grew
up in the homeland, in a monolingual environment, pronunciation features are better maintained in the speech of the subsequent generations than when the input is provided by both immigrant and Toronto-born speakers. With respect to the traditional view, according to which the biggest differences in the heritage languages are found between the speakers who acquired the language as an L1 and came in contact with the L2 during adulthood (1st generation immigrants), on the one hand, and the subsequent generations, on the other, the current results suggest a partially different picture: 3rd generation speakers may show qualitatively different patterns compared to 2nd generation speakers (see also Mayr & Siddika, 2016 for similar results). We considered the effects of phonetic factors on VOT production. Broadly speaking, in our dataset, a stop is more likely to have a proportionally longer VOT if it is a dorsal /k/, and if it is followed by a high vowel. This was found in all contexts, which is not surprising, since a large body of phonetic literature predicts a similar outcome on the basis of typological comparisons and in-depth descriptive studies of particular languages (Section 1.2). An interaction between place and generation provided a more nuanced picture. In all three contexts, Gen 1 speakers differentiated VOT duration between dorsal and non-dorsal consonants. In contrast, Gen 2 speakers differentiated three classes. Gen 3 speakers had a mixed pattern: they differentiated only between dorsal and non-dorsal in C.CV and CV_U (thus, whenever the syllable was unstressed) but differentiated the three classes in the stressed CV context. The dorsal vs. non-dorsal distinction was a rather unexpected result, inconsistent with Sorianello's (1996) report on Calabrian read-speech data, which shows a three-way distinction, as well as with Nodari’s (2017a, 2017b) conversational speech study, which reports that /t/ is produced with a VOT almost as long as the VOT of /k/.
In our data, the dorsal-nondorsal distinction is a robust feature of Gen 1 speech, disappears from Gen 2 speech, and reappears in the unstressed contexts (C.CV and CV_U) in Gen 3 speech. Under a linear development, we would have expected the Gen 1 pattern to be either maintained in Gen 2 and lost in Gen 3, or lost in both Gen 2 and 3. One possible account for the deviant pattern observed in the data is that Gen 3 speakers reproduce pronunciation features that they perceive as typical of traditional homeland pronunciation because they hear them in Gen 1 speech but not in Gen 2 speech. Thus Gen 3 speakers, having in their input the phonetic realizations of both native and heritage speakers of Calabrian Italian, would emphasize those exogenous features that they perceive as more strongly characterizing authentic Calabrian Italian speech, as opposed to the English-accented speech of their parents. In some cases, females would be the leaders of such change (recall the significant CPlace*Sex interaction in the CV_U model). This scenario would emphasize the role that young adults play in reproducing vernacular forms to index group membership (recall that the mean age of our Gen 3 speakers is 24, as opposed to 40 and 64 for Gen 2 and Gen 1, respectively), a pattern that has been widely reported in the sociolinguistic and sociophonetic literature, although often with respect to incoming, rather than long-standing, vernacular forms, and particularly for adolescents (Alam & Stuart-Smith, 2011; Kirkham, 2011; Tagliamonte & D'Arcy, 2009; van Hofwegen & Wolfram,
2010). The fact that Gen 3 speakers do not maintain the binary dorsal-nondorsal distinction in the CV contexts might reinforce our hypothesis, because that is the prototypical context where aspiration is produced according to English norms (and a distinction among three classes of stops is coherently expected). To sum up, besides our three initial hypotheses about cross-generational maintenance of socio-indexical patterns (Section 1.3), a 'u-shaped' development emerges as another option for cross-generational transmission of heritage languages. To the best of our knowledge, no similar u-shaped patterns are attested in studies of heritage phonetic transmission or phonetic-phonological attrition; however, this could be due to the dearth of studies comparing 2nd and 3rd generation speech patterns. Another potential explanation for the absence of parallels in the literature is that no study so far has investigated the transmission of socio-indexically relevant features in a heritage language. Further investigation of different phonetic variables and different heritage languages is needed in order to assess the validity of the pattern hypothesized here and to provide an explanation for non-linear phenomena of cross-generational transmission potentially present in heritage language acquisition. The sex of the speakers was found to predict VOT duration only in stressed syllables, with longer VOT values for males than females. The fact that the effect of sex was visible in the stressed CV contexts, where it may be seen as a contact-induced effect, and not in the unstressed C.CV context suggests that VOT production in the contexts of Calabrian sociophonetic aspiration was less impacted by social characteristics of the speakers. That is, our research design allows us to partially disentangle biological from socio-indexical effects (Fuchs, 2017). We turn next to the distributional patterns of perceived aspiration.
Clear evidence of cross-generational change was found for all contexts: while the rate of Calabrian-like aspiration in unstressed syllables decreased, the rate of English-like aspiration in stressed syllables increased cross-generationally. The rate of aspiration in unstressed CV syllables was very low overall, but higher in Gen 3 than in the earlier generations. Unlike VOT analysis, which provides information on fine phonetic changes across the dataset as a whole, the analysis of perceived aspiration shows that the distribution of voiceless stop aspiration in the lexicon has changed, and significant differences exist between each generation. Thus, for the second research question, we conclude that the answer is affirmative: there is cross-generational shift at the distributional level, from preferential aspiration in unstressed C.CV contexts (as is typical for Calabrian Italian) to preferential aspiration in stressed CV contexts (as is typical for English). The phonetic factor of vowel height explains much of the variation in both CV contexts, although it did not account for the distribution in the context of sociophonetic Calabrian (C.CV) aspiration, thus reinforcing the view that voiceless stop aspiration in the C.CV context is less influenced by phonetic-physiological factors than in the CV contexts. By contrast, consonant place of articulation was powerful in explaining the rate of aspiration in all three contexts. It is worth noting that, in the Calabrian aspiration context, perceived aspiration was much more frequent in Gen 1 speakers (compared to Gen 2 and Gen 3), and with /t/ (compared to /p/ and /k/). This suggested
that, in realizing Calabrian aspiration, Gen 1 speakers reproduced the pattern, found by Nodari (2017a, 2017b) for young speakers of Calabrian Italian, which targets the alveolar stop more consistently than the other places of articulation. Preferential aspiration of /t/ in unstressed syllables dramatically decreases in the later generations, suggesting gradual replacement of Calabrian pronunciation norms with English ones. An interesting similarity between the results of VOT and of perceived aspiration is that, in both analyses, the sex of the speakers influenced the production of stops in the CV contexts. Males produce longer VOTs and are perceived to aspirate more often than females. The difference is not found for the two other contexts, thus suggesting, as already anticipated, that aspiration in the English contexts is partly governed by extra-linguistic variables that are less relevant to the Calabrian contexts. This view is further confirmed by the correlations with ethnic orientation scores: the more one reports using and preferring to use the heritage language and makes cultural choices that are consistent with the Italian culture, the less one is perceived to aspirate in the English aspiration context (and in the CV_U context), with, again, no effect in the Calabrian context. In the language contact situation described here, the introduction of the aspiration patterns of the host language into the heritage language is significantly predicted by the socio-individual orientation of the speakers, whereas the maintenance of the heritage language aspiration patterns is not. Based on the cumulative evidence discussed here, we are now able to provide a synthetic tentative answer to our third research question, asking how the voiceless stop aspiration that is a socio-indexical feature in homeland Calabrian is transmitted cross-generationally in the heritage language context (Section 1.3). 
The three hypotheses considered were:
Hypothesis 1: Homeland ≠ Heritage (no transfer to the heritage context due to different social conditions)
Hypothesis 2: Homeland = Gen1 ≠ Gen 2 = Gen3 (no transmission beyond the immigrant generation)
Hypothesis 3: Homeland = Gen1 = Gen 2 = Gen3 (full transmission of the pattern)
Although we can only provide an indirect comparison with homeland speech, based on previous literature (Nodari, 2017a; Sorianello, 1996), Hypothesis 1 is not supported, as our Gen 1 speakers did show a strong preference for producing aspiration in unstressed syllables, compared to stressed ones, and are perceived to aspirate more often with /t/ than with /p/ or /k/, which is known to characterize the sociophonetics of Calabrian aspiration. Because this effect dissipated in the speech of subsequent generations, we can equally reject Hypothesis 3. Hypothesis 2 predicted a scenario that most closely resembles what we found: first-generation immigrants differ significantly from their descendants as far as the use of the relevant sociophonetic variable is concerned. However, we found that aspiration patterns for Gen 2 and Gen 3 differed from each other, suggesting that the departure from the Gen 1 pattern is gradual across generations (see also Mayr & Siddika, 2016; Sharma & Sankaran, 2011), and can eventually show a u-shaped development with Gen 3 patterns partly reflecting Gen 1 patterns.
We briefly mention several limitations to the current study that we hope to address in future work. The analysis would benefit from assessing the aspiration patterns in use in a comparable group of homeland Calabrian speakers, matching Gen 1 speakers for age, sex, local origin and sociolinguistic features, and matching the methods of elicitation and analysis used in the two sites. This will improve knowledge of relevant aspects of voiceless stop production in the native speakers of Calabrian Italian still resident in the region, building on existing phonetic (e.g. Falcone, 1976; Sorianello, 1996) and sociophonetic literature (Nodari 2017a, 2017b), the latter providing an interesting picture but necessarily limited to the speakers' sample under investigation (young speakers from one town). As significant changes may have occurred in the language of the country of origin since the time in which the immigrants left, further inquiry should seek evidence of changes in progress in the homeland variety as well (see e.g. de Leeuw, Mennen, & Scobbie, 2012: 688 for a similar criticism of cross-sectional methodology in determining L1 attrition). Another point which is not directly assessed in the present study, but which could prove of fundamental relevance in future investigations, is the degree of interaction between members of successive generations. In order to assess the quality and quantity of the input provided to second and third generations, it would be useful to track changes in the pronunciation patterns within families. In a similar vein, an investigation of VOT in the host language as produced by the same speakers of heritage Calabrian would be revealing of potential bidirectional interactions between the two languages (see e.g. Darcy & Krüger, 2012; Mayr & Siddika, 2016; Stangen et al., 2015), thus providing insight on the plasticity of the bilingual phonetic and phonological repertoire. 
Finally, future investigations could also target voiced stops in the same set of conditions, and additional phonetic properties of voiceless stop aspiration. Extending the analysis to voiced stops might reveal how subsequent generations of Calabrian speakers maintain the phonological contrast between voiceless and “true voiced” consonants, in the process of increased contact with a non-true-voicing language such as English. Regarding instead additional properties of voiceless stop aspiration, phonetic aspects such as f0 at the vowel onset or the spectral properties of the stop burst have been shown to work as secondary cues in the perception/production of voicing distinctions cross-linguistically (e.g. Holt, Lotto, & Kluender, 2001; Llanos et al., 2013) and to convey both linguistic and sociolinguistic information (e.g. Themistocleous, 2016). Bilinguals have been shown to co-vary primary and secondary cues in a different way compared to monolinguals (see Llanos et al., 2013 for an example concerning stop voicing distinctions). A closer look at non-durational properties of aspiration is therefore expected to further explain the cross-generational transmission patterns of socio-indexical aspiration in heritage Calabrian, with particular reference to the phonetic implementation of /t/-aspiration and the apparent discrepancy between the results of perceived aspiration (pointing at /t/ being more often perceived as aspirated than /p/ and /k/) and the results of VOT analysis (which, in contrast, do not show longer VOT values for /t/). The analyses presented here for both VOT and distributional patterns in perceived aspiration show cross-generational change. This is evident as a main effect for the Calabrian (unstressed C.CV) pattern but is subtler for the English (stressed CV) context, where it is modulated by interactions with additional factors, highlighting a difference between internal vs. contact-induced sociolinguistic variables.
Acknowledgements
Individuals: We are grateful to the speakers who volunteered their time and knowledge to this project, and to the research assistants who interviewed, transcribed, and helped code the data. They are listed at http://projects.chass.utoronto.ca/ngn/HLVC/3_2_active_ra.php and http://projects.chass.utoronto.ca/ngn/HLVC/3_3_former_ra.php. The contribution of C. Bertini (Scuola Normale Superiore, Pisa) to the analysis is also acknowledged.
Funding
This work was supported by the Scuola Normale Superiore, Pisa [grant number 1702-SNS15C_A_Celata], Social Science and Humanities Council of Canada [grant numbers 410-20092330 and 435-2016-1430], and the University of Toronto.
Appendix A. Excerpts from ethnic orientation questionnaire analyzed in this study
Responses to the following questions were numerically coded, as described in Section 2.1.
A. Identità etnica / Ethnic Identity
A1. Ti identifichi come Italiano? Canadese? Italo-Canadese? Do you think of yourself as Italian, Canadian or Italian-Canadian?
A2. La maggioranza dei tuoi amici è italiana? Are most of your friends Italian?
A3. La gente nel tuo quartiere è italiana? Are people in your neighbourhood Italian?
A4. I tuoi colleghi sono anche italiani? Are the people you work with Italian?
A5. Quando eri piccolo/a o più giovane i tuoi compagni di scuola erano italiani? I tuoi amici? E la gente che abitava nel tuo quartiere? When you were growing up, were the kids in your school Italian? Were your friends? The kids in your neighbourhood?
B. Lingua / Language
B1. Parli italiano? Parli bene? A che livello diresti? Parli italiano spesso? Quante volte per giorno/settimana/mese? Se no: capisci l’italiano? Do you speak Italian? How well? How often? If no: Can you understand Italian?
B2. Dove hai imparato l’italiano? A casa? A scuola? 3) Preferisci parlare italiano o in inglese? Where did you learn Italian? At home? In school?
B4. Preferisci scrivere e leggere in italiano o in inglese? Leggi dei giornali italiani o delle riviste italiane? Quali? Do you prefer to read and write in Italian or English? Do you read Italian magazines and newspapers? Which ones?
B5. Preferisci ascoltare la radio o guardare programmi televisivi in italiano o in inglese? Do you prefer to listen to the radio or watch TV in Italian or English?
C. Scelta delle lingue / Language choice
C1. Che lingua parla la tua famiglia quando siete tutti insieme? What language does your family speak when you get together?
C2. Che lingua parli con i tuoi amici? What language do you speak with your friends? How about when you were younger?
C3. Che lingua usi quando parli di cose personali? Quando sei arrabbiato/a? What language do you speak when you're talking about something personal? When you’re angry?
C4. Parli con i tuoi genitori in italiano? Con i tuoi nonni? Did/do you speak to your parents in Italian? Your grandparents?
E. Genitori / Parents
E1. I tuoi genitori si identificano come italiani, canadesi, italo-canadesi? Do your parents think of themselves as Italian, Canadian or Italian-Canadian?
G. La cultura italiana / Italian culture
G1. Secondo te, i bambini italo-canadesi dovrebbero imparare l’italiano e degli aspetti della cultura italiana (storia, geografia, ecc.)? Should Italian-Canadian kids learn Italian? Italian culture?
G2. Preferiresti abitare in un quartiere italiano? Would you rather live in an Italian neighbourhood?
G3. Gli italiani dovrebbero sposarsi solo con altri italiani? Should Italians only marry other Italians?
H. Discriminazione / Discrimination
H1. Hai mai avuto un problema ad ottenere lavoro perché sei italiano/a? Have you ever had a problem getting a job because you're Italian?
H4. Ti hanno mai trattato/a male perché sei italiano/a? Have you ever been treated badly because you're Italian?
H5. C’è molta discriminazione contro gli italiani? Is there a lot of discrimination against Italians?
References
Abramson, A. S., & Whalen, D. (2017). Voice Onset Time (VOT) at 50: Theoretical and practical issues in measuring voicing distinctions. Journal of Phonetics, 63, 75–86.
Alam, F., & Stuart-Smith, J.
(2011) Identity and ethnicity in /t/ in Glasgow-Pakistani high school girls. Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS XVII) (pp. 216–219). Hong Kong. Allen, J. S., Miller, J. L., & DeSteno, D. (2003). Individual talker differences in voiceonset-time. Journal of the Acoustical Society of America, 113(1), 544–552. Amengual, M. (2016). Acoustic correlates of the Spanish tap-trill contrast: heritage and L2 Spanish speakers. Heritage Language Journal, 13(2), 88–112. Au, T. K., Knightly, L., Jun, S. A., & Oh, J. (2002). Overhearing a language during childhood. Psychological Science, 13, 238–243. Au, T. K., Oh, J., Knightly, L., Jun, S. A., & Romo, L. (2008). Salvaging a childhood language. Journal of Memory and Language, 58, 998–1011. Avesani, C., Galatà, V., Best, C., Vayra, M., Di Biase, B., & Ardolino, F. (2017). Phonetic details of coronal consonants in the Italian spoken by Italian-Australians from two areas of Veneto. In C. Bertini, C. Celata, G. Lenoci, C. Meluzzi, & I. Ricci (Eds.),
Fattori sociali e biologici nella variazione fonetica - Social and biological factors in speech variation (pp. 289–314). Milano: Officinaventuno. Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modelling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390–412. Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. Benjamin, B. (1982). Phonological performance in gerontological speech. Journal of Psycholinguistic Research, 11, 159–167. Boersma, P., & Weenink, D. (2015). Praat: doing phonetics by computer [Computer program]. Version 6.0.46, retrieved 3 January 2015 from http://www.praat.org/. Caramazza, A., Yeni-Komshian, G., Zurif, E. B., & Carbone, E. (1973). The acquisition of a new phonological contrast: The case of stop consonants in French-English bilinguals. Journal of the Acoustical Society of America, 54, 421–428. Celata, C., & Cancila, J. (2010). Phonological attrition and the perception of geminate consonants in the Lucchese community of San Francisco (CA). International Journal of Bilingualism, 14(2), 185–209. Chang, C. B., & Yao, Y. (2016). Toward an understanding of heritage prosody: Acoustic and perceptual properties of tone produced by heritage, native, and second language speakers of Mandarin. Heritage Language Journal, 13(2), 134–160. Chang, C. B., Yao, Y., Haynes, E. F., & Rhodes, R. (2008). Production of phonetic and phonological contrast by heritage speakers of Mandarin. Journal of the Acoustical Society of America, 129(6), 3964–3980. Chang, C. B., Yao, Y., Haynes, E. F., & Rhodes, R. (2011). Production of phonetic and phonological contrast by heritage speakers of Mandarin. Journal of the Acoustical Society of America, 129, 3964–3980. Chevrot, J. P., & Foulkes, P. (2013) (eds.). Language acquisition and sociolinguistic variation.
Special issue of Linguistics 51(2), 251–254. Cho, T., Jun, S. A., & Ladefoged, P. (2002). Acoustic and aerodynamic correlates of Korean stops and fricatives. Journal of Phonetics, 30(2), 193–228. Cho, T., & Ladefoged, P. (1999). Variation and universals in VOT: Evidence from 18 languages. Journal of Phonetics, 27(2), 207–229. Cho, T., Lee, Y., & Kim, S. (2014). Prosodic strengthening on the /s/-stop cluster and the phonetic implementation of an allophonic rule in English. Journal of Phonetics, 46, 128–146. Cho, T., & McQueen, J. (2005). Prosodic influences on consonant production in Dutch: Effects of prosodic boundaries, phrasal accent and lexical stress. Journal of Phonetics, 33(2), 121–157. Chodroff, E., & Wilson, C. (2017). Structure in talker-specific phonetic realization: Covariation of stop consonant VOT in American English. Journal of Phonetics, 61, 30–47. Darcy, I., & Krüger, F. (2012). Vowel perception and production in Turkish children acquiring L2 German. Journal of Phonetics, 40, 568–581. de Leeuw, E., Mennen, I., & Scobbie, J. (2012). Dynamic systems, maturational constraints and L1 phonetic attrition. International Journal of Bilingualism, 17(6), 683–700. D'Imperio, M. (2002). Italian intonation: An overview and some questions. Probus, 14(1), 37–69. Dmitrieva, O., Jongman, A., & Sereno, J. (2010). Phonological neutralization by native and non-native speakers: The case of Russian final devoicing. Journal of Phonetics, 38, 483–492. Docherty, G. (1992). The timing of voicing in British English obstruents. Berlin/New York: Foris. Docherty, G., Watt, D., Llamas, C., Hall, D., & Nycz, J. (2011). Variation in voice onset time along the Scottish – English border. Paper presented at 17th international congress of phonetic sciences, Hong Kong, China, August 2011. Esposito, A. (2002). On vowel height and consonantal voicing effects: Data from Italian. Phonetica, 59(4), 197–231. Evans, B., Mistry, A., & Moreiras, C. (2007). 
An acoustic study of first- and second generation Gujurati immigrants in Wembley: Evidence for accent convergence? Proceedings of the 16th International Congress of Phonetic Sciences (ICPhS XVI) (pp. 1741–1744). Saarbrücken, Germany. Falcone, G. (1976). Calabria. Pisa: Pacini. Farnetani, E. (1989). Acoustic correlates of linguistic boundaries in Italian: a study on duration and fundamental frequency. Eurospeech '89 Conference on Speech Communication and Technology, 2, 332–335. Flege, J. E. (1987). The production of ‘new’ and ‘similar’ phones in a foreign language: evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47–65. Flege, J. E., & Eefting, W. (1987a). Production and perception of English stops by native Spanish speakers. Journal of Phonetics, 15, 67–83. Flege, J. E., & Eefting, W. (1987b). Cross-language switching in stop consonant perception and production by Dutch speaker of English. Speech Communication, 6, 185–202. Flores, C., & Rato, A. (2016). Global accent in the Portuguese speech of heritage returnees. Heritage Language Journal, 13(2), 161–183. Flores, C., Rinke, E., & Rato, A. (2017). Comparing the outcomes of early and late acquisition of European Portuguese: An analysis of morpho-syntactic and phonetic performance. Heritage Language Journal, 14(2), 124–149. Fowler, C. A., Sramko, V., Ostry, D. J., Rowland, S. A., & Halle, P. (2008). Cross language phonetic influences on the speech of French-English bilinguals. Journal of Phonetics, 36, 649–663. Fuchs, S. (2017). Changes and challenges in explaining speech variation: A brief overview. In C. Bertini, C. Celata, G. Lenoci, C. Meluzzi, & I. Ricci (Eds.), Fattori sociali e biologici nella variazione fonetica - Social and biological factors in speech variation (pp. 29–44). Milano: Officinaventuno.
111
Green, K. P., Zampini, M. L., & Clarke, C. M. (1998). The role of preceding closure interval and voice onset time in the perception of voicing: A comparison of English versus Spanish-English bilinguals. The Journal of the Acoustical Society of America, 104(3), 1835. Harada, T. (2003). L2 influence on L1 speech in the production of VOT. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th international congress of phonetic sciences (pp. 1085–1088). Barcelona, Spain: Causal Productions. Holt, L. L., Lotto, A. J., & Kluender, K. R. (2001). Influence of fundamental frequency on stop-consonant voicing perception: A case of learned covariation or auditory enhancement? Journal of the Acoustical Society of America, 109, 764–774. Kang, K. H., & Guion, S. G. (2006). Phonological systems in bilinguals: Age of learning effects on the stop consonant systems of Korean-English bilinguals. Journal of the Acoustical Society of America, 119(3), 1672–1683. Kang, Y., & Nagy, N. (2016). VOT merger in Heritage Korean in Toronto. Language Variation and Change, 28(2), 249–272. Khattab, G. (2013). Phonetic convergence and divergence strategies in English-Arabic bilingual children. Linguistics, 51(2), 439–472. Khattab, G., Al-Tamimi, F., & Heselwood, B. (2006). Acoustic and auditory differences in the /t/- /T/ opposition in male and female speakers of Jordanian Arabic. In S. Boudelaa (Ed.), Perspectives on Arabic Linguistics XVI: Papers from the sixteenth annual symposium on Arabic linguistics (pp. 131–160). Cambridge, UK: John Benjamins. Kim, S., Kim, J., & Cho, T. (2018). Prosodic-structural modulation of stop voicing contrast along the VOT continuum in trochaic and iambic words in American English. Journal of Phonetics, 71, 65–80. Kirkham, S. (2011). The acoustics of coronal stops in British Asian English. Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS XVII) (pp. 1102– 1105). Hong Kong. Klatt, D. (1975). 
Voice Onset Time, frication, and aspiration in word-initial consonant clusters. Journal of Speech, Language, and Hearing Research, 18, 686–706. Knightly, L., Jun, S. A., Oh, J., & Au, T. K. (2003). Production benefits of childhood overhearing. Journal of the Acoustical Society of America, 114, 465–474. Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest Package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. Labov, W. (1984). Field methods of the project on linguistic change and variation. In J. Baugh & J. Scherzer (Eds.), Language in use: Readings in sociolinguistics (pp. 28–53). Englewood Cliffs, NJ: Prentice Hall. Labov, W. (2014). The sociophonetic orientation of the language learner. In C. Celata & S. Calamai (Eds.), Advances in sociophonetics (pp. 17–29). Amsterdam/ Philadelphia: John Benjamins Publishing Company. Lee, S., Potamianos, A., & Narayanan, N. (1999). Acoustics of children’s speech: developmental changes of temporal and spectral parameters. Journal of the Acoustical Society of America, 105(3), 1455–1468. Liberman, A. M., Harris, K. S., Kinney, J., & Lane, H. (1961). The discrimination of relative onset-time of the components of certain speech and non-speech patterns. Journal of Experimental Psychology, 61, 379. Lisker, L., & Abramson, A. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20(3), 527–565. Llanos, F., Dmitrieva, O., Francis, A., & Shultz, A. (2013). Auditory enhancement and second language experience in Spanish and English weighting of secondary voicing cues. Journal of the Acoustical Society of America, 134(3), 2213–2224. Lord, G. (2008). Second language acquisition and first language phonological modification. In J. B. de Garavito & E. Valenzuela (Eds.), Selected proceedings of the 10th Hispanic linguistics symposium. Somerville (pp. 184–193). MA: Cascadilla Proceedings Project. Lowenstein, J. H., & Nittroues, S. (2008). 
Patterns of acquisition of native voice onset time in English-learning children. Journal of the Acoustical Society of America, 124 (2), 1180–1191. Maddieson, I. (1999). Phonetic universals. In W. Hardcastle & J. Laver (Eds.), The handbook of phonetic sciences (pp. 619–639). Oxford: Blackwell. Major, R. C. (1992). Losing English as a first language. The Modern Language Journal, 76(2), 190–208. Marotta, G. (2008). Lenition in Tuscan Italian (Gorgia Toscana). In J. Brandão de Carvalho, T. Scheer, & Ph. Ségéral (Eds.), Lenition and fortition (pp. 235–272). Mouton de Gruyter: Berlin. Mayr, R., Price, S., & Mennen, I. (2012). First language attrition in the speech of DutchEnglish bilinguals: The case of monozygotic twin sisters. Bilingualism: Language and Cognition, 15(4), 687–700. Mayr, R., & Siddika, A. (2016). Inter-generational transmission in a minority language setting: Stop consonant production by Bangladeshi heritage children and adults. International Journal of Bilingualism. McCarthy, K., Evans, B., & Mahon, M. (2011). Detailing the phonetic environment: A sociophonetic study of the London Bengali community. Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS XVII) (pp. 1354–1357). Hong Kong. McCarthy, K., Evans, B., & Mahon, M. (2013). Acquiring a second language in an immigrant community: The production of Sylheti and English stops and vowels by London-Bengali speakers. Journal of Phonetics, 41, 344–358. McCarthy, K., Mahon, M., Rosen, S., & Evans, B. (2014). Speech perception and production by sequential bilingual children: A longitudinal study of voice onset time acquisition. Child Development, 85, 1965–1980. Montrul, S. (2008). Incomplete acquisition in bilingualism. Re-examining the Age Factor. Amsterdam: John Benjamins. Morris, R. J., McCrea, C. R., & Herring, K. D. (2008). Voice onset time differences between adult males and females: Isolated syllables. Journal of Phonetics, 36, 308–317.
112
R. Nodari et al. / Journal of Phonetics 73 (2019) 91–112
Nagy, N. (2011). A multilingual corpus to explore geographic variation. Rassegna Italiana di Linguistica Applicata, 43, 65–84. Nagy, N., & Kochetov, A. (2013). Voice onset time across the generations. A crosslinguistic study of contact-induced change. In P. Siemund, I. Gogolin, M. E. Schulz, & J. Davydova (Eds.), Multilingualism and language diversity in urban areas. Acquisition, identities, space, education (pp. 19–38). Amsterdam/Philadelphia: John Benjamins Publishing Company. Nagy, N. (2009). Exploring heritage language variation in Toronto. http://projects.chass. utoronto.ca/ngn/HLVC. Accessed 4 December 2017. Nakai, S., & Scobbie, J. (2016). The VOT category boundary in word- initial stops: counter-evidence against rate normalization in English spontaneous speech. Laboratory Phonology, 7(1), 1–31. Nance, C. (2014). Phonetic variation in Scottish Gaelic laterals. Journal of Phonetics, 47, 1–17. Nance, C., McLeod, W., O’Rourke, B., & Dunmore, S. (2016). Identity, accent aim, and motivation in second language users: New Scottish Gaelic speakers’ use of phonetic variation. Journal of Sociolinguistics, 20(2), 164–191. Nance, C., & Stuart-Smith, J. (2013). Pre-aspiration and post-aspiration in Scottish Gaelic stop consonants. Journal of the International Phonetic Association, 43(2), 129–152. Nardy, A., Chevrot, J.-P., & Barbu, S. (2013). The acquisition of sociolinguistic variation: looking back and thinking ahead. Linguistics, 51(2), 255–284. Neiman, G., Klich, R., & Shuey, E. (1983). Voice onset time in young and 70-year-old women. Journal of Speech and Hearing Research, 26, 118–123. Niedzielski, N. (1999). The effects of social information on the perception of sociolinguistic variables. Journal of Language and Social Psychology, 18, 62–85. Nodari, R. (2017a). L'italiano degli adolescenti: Aspirazione delle occlusive sorde in Calabria e percezione della varieta locale PhD dissertation. Scuola Normale Superiore di Pisa. Nodari, R. (2017b). 
Indexicality and aspiration in Calabrian Italian: a sociophonetic approach. Paper presented at XIII Convegno Nazionale AISV, Pisa, Italy. https:// www.researchgate.net/publication/321975484_Indexicality_and_aspiration_in_ Calabrian_Italian_a_sociophonetic_approach. Oh, E. (2011). Effects of speaker gender on voice onset time in Korean stops. Journal of Phonetics, 39, 59–67. Ohala, J. J. (1981). Articulatory constraints on the cognitive representation of speech. In T. Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp. 111–122). Amsterdam: North Holland. Petrosino, L., Colcord, R., Kurcz, K., & Yonker, R. (1993). Voice onset time of velar stop productions in aged speakers. Perceptual and Motor Skills, 76, 83–88. R Core Team (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. URL https://www.R-project.org/. Robb, M., Gilbert, H., & Lerman, J. (2005). Influence of gender and environmental setting on voice onset time. Folia phoniatrica et logopedica, 57, 125–133. Rothman, J. (2009). Understanding the nature and outcomes of early bilingualism: Romance languages as Heritage Languages. International Journal of Bilingualism, 13(2), 145–155. Ryalls, J., Simon, M., & Thomason, J. (2004). Voice Onset Time production in older Caucasian- and African-Americans. Journal of Multilingual Communication Disorders, 2, 61–67.
Ryalls, J., Zipprer, A., & Baldauff, P. (1997). A preliminary investigation of the effects of gender and race on voice onset time. Journal of Speech, Language and Hearing Research, 40(3), 642–645. Sancier, M. L., & Fowler, C. A. (1997). Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics, 25(4), 421–436. Scobbie, J. (2006). Flexibility in the face of incompatible English VOT systems. Laboratory phonology 8, varieties of phonological competence. Phonology and phonetics (4(2), pp. 367–392). de Gruyter: Berlin & New York. Sharma, D., & Sankaran, L. (2011). Cognitive and social forces in dialect shift: Gradual change in London Asian speech. Language Variation and Change, 23, 399–428. Sorianello, P. (1996). Indici fonetici delle occlusive sorde nel cosentino. Rivista Italiana di Dialettologia, 20, 123–159. Stangen, I., Kupisch, T., Ergün, A. L., & Zielke, M. (2015). Foreign accent in heritage speakers of Turkish in Germany. In H. Peukert (Ed.), Transfer effects in multilingual language development (pp. 87–108). Amsterdam/Philadelphia: John Benjamins. Stevens, M., & Hajek, J. (2010). Post-aspiration in standard Italian: Some first crossregional acoustic evidence. Proceedings of Interspeech 2010 (pp. 1557–1560), Makuhari, Japan. Stevens, K. N., & Klatt, D. H. (1974). the role of formant transitions in the voice-voiceless distinction for stops. Journal of the Acoustical Society of America, 55(3), 653. Stuart-Smith, J. (2007). Empirical evidence for gendered speech production: /s/ in Glaswegian. In J. Cole & J. I. Hualde (Eds.), Laboratory phonology 9 (pp. 65–86). Berlin & New York: de Gruyter. Stuart-Smith, J., Sonderegger, M., Rathcke, T., & Macdonald, R. (2015). The private life of stops: VOT in a real-time corpus of spontaneous Glaswegian. Laboratory Phonology, 6(3–4), 505–549. Swartz, B. L. (1992). Gender difference in voice onset time. Perceptual and Motor Skills, 75, 983–992. Sweeting, P. M., & Baken, R. J. (1982). 
Voice onset time in a normal-aged population. Journal of Speech and Hearing Research, 25, 129–134. Tagliamonte, S. A., & D'Arcy, A. (2009). Peaks beyond phonology: Adolescence, incrementation, and language change. Language, 85(1), 58–108. Terken, J. (1991). Fundamental frequency and perceived prominence of accented syllables. The journal of the Acoustical Society of America, 89, 1768–1776. Themistocleous, C. (2016). The burst of stops can convey dialectal information. Journal of the Acoustical Society of America, 140, EL334. Torre, P., & Barlow, J. (2009). Age-related changes in acoustic characteristics of adult speech. Journal of Communication Disorders, 42(5), 324–333. Turk, A., Nakai, S., & Sugahara, M. (2006). Acoustic segment durations in prosodic research: A practical guide. In S. Sudhoff, D. Lenertova, R. Meyer, S. Pappert, P. Augurzky, I. Mleinek, N. Richter, & J. Schliesser (Eds.), Methods in Empirical Prosody Research (pp. 1–27). Berlin & New York: de Gruyter. van Hofwegen, J., & Wolfram, W. (2010). Coming of age in African American English: A longitudinal study. Journal of Sociolinguistics, 14(4), 427–455. Whiteside, S. P., & Irving, C. J. (1997). Speakers' sex differences in voice onset time: some preliminary findings. Perceptual and Motor Skills, 85(2), 459–463. Yao, Y. (2009). Understanding VOT variation in spontaneous speech. In M. Pak (Ed.), Current numbers in unity and diversity of languages (pp. 1122–1137). Seoul: Linguistic Society of Korea.