Journal of Phonetics 52 (2015) 70–88
Research Article

Continuous versus categorical aspects of Japanese consecutive devoicing

Kuniko Y. Nielsen



Oakland University, Linguistics Department, 1025 Human Health Building, Rochester, MI 48309-4401, USA

Abstract

The phenomenon of high vowel devoicing is almost obligatory in the Tokyo dialect, except for some environments in which complete devoicing is often blocked. One such case is so-called consecutive devoicing, where two or more consecutive vowels are in devoicing environments. Although several accounts of consecutive devoicing have been proposed, its linguistic nature is still being debated. The current study presents a detailed investigation of the nature of consecutive devoicing in Japanese, examining the influence of various factors on its likelihood, speaker variability, and phonetic realization. Twenty-four native speakers of Tokyo Japanese produced 30 words containing consecutive devoicing environments. Mixed-effects modeling revealed that several phonetically and phonologically motivated factors simultaneously contribute to the likelihood of consecutive devoicing and the pattern of partial devoicing. Wide intra- and inter-speaker variability was observed, indicating that realization of consecutive devoicing is not always consistent within a speaker, and that the relative weight of conditioning factors may vary across speakers. Vowel duration in consecutive devoicing environments (which reflects partial devoicing) showed a bimodal distribution, indicating that realization of consecutive devoicing cannot be determined solely by its phonetic environment. Taken together, the results demonstrate both continuous and categorical aspects of consecutive devoicing in Japanese.

Article history: Received 19 December 2013; received in revised form 13 May 2015; accepted 25 May 2015.

Keywords: Japanese vowel devoicing; consecutive devoicing; speaker variability

1. Introduction

In the standard and many other dialects of Japanese, short high vowels /i/ and /u/ tend to be devoiced or deleted when they occur between two voiceless consonants (Han, 1962; McCawley, 1968; Vance, 1987).

(1)

/hikaku/ > [çi̥kakɯ] / [çkakɯ]   'comparison'
/kokusai/ > [kokɯ̥sai] / [koksai]   'international'
/kikai/ > [ki̥kai] / [kʲkai]   'machine'

High vowel devoicing is almost obligatory in the Tokyo dialect, except in some environments where complete devoicing is often blocked. One such case is so-called consecutive devoicing, where two or more consecutive vowels are in devoicing environments. Although several accounts of consecutive devoicing have been proposed (e.g., Kondo, 2005; Tsuchida, 1997; Yoshida, 2004), its linguistic nature, in particular whether it is a phonetically driven continuous process or a phonologically driven categorical process, is still being debated. The current study aims to elucidate the nature of consecutive devoicing by examining the relative contribution of various factors that influence its likelihood, its phonetic realization, and inter-speaker variability. This introductory section is structured as follows: in Section 1.1, existing theoretical accounts of Japanese devoicing are discussed, followed by a brief overview of the phonetic realization of Japanese vowel devoicing in Section 1.2. Conditioning factors of Japanese vowel devoicing are reviewed in Section 1.3, and Section 1.4 presents an overview of consecutive devoicing and its theoretical accounts, followed by a review of reported individual variability in Section 1.5. Lastly, Section 1.6 summarizes the issues and motivates the current experiment. Sections 2 and 3 present the experimental methods and results, respectively, and Section 4 discusses the theoretical implications of the observed results.



1.1. Theoretical accounts of Japanese vowel devoicing

The phenomenon of Japanese vowel devoicing was traditionally considered a phonological assimilation of the feature [±voice] (e.g., McCawley, 1968). More recently, alternative analyses were proposed by Tsuchida (2001) and Varden (1998) employing Optimality Theory (Prince & Smolensky, 1993) and Feature Geometry (Clements, 1985), respectively, in which Japanese voiceless vowels are specified as [+spread glottis] instead of [−voice]. Despite the difference in featural representations, these accounts consider Japanese vowel devoicing a phonological assimilation process.

Alternatively, Jun and Beckman (1993) proposed an account of Japanese vowel devoicing as a gradient phonetic process involving overlap of glottal gestures. According to this view, Japanese vowel devoicing is the result of extreme overlap and blending of the vowel's voiced glottal gesture by the adjacent consonant's voiceless glottal gesture. Their proposal was based on the following observations: (1) the mean duration of devoiced syllables is shorter than that of the same type of syllables with voiced vowels (Beckman, 1982); (2) in their spectral analysis, the initial part of the Japanese word /suki/ 'like' with devoiced /u/ was identical to the /sk/ consonant cluster in the English word ski, and furthermore, there was no resemblance between the devoiced /su/ syllable in /suki/ and the quintessential devoiced vowel /h/ in the English word who. This gestural overlap view was supported by Imaizumi, Hayashi, and Deguchi (1995), who showed that professional teachers of hearing-impaired children reduced their vowel devoicing in order to improve their listeners' comprehension. Imaizumi et al. (1995) proposed three possible ways in which devoicing can be achieved as a result of gestural overlap: (1) the devoicing gestures of the voiceless obstruents can be shifted toward the vowel due to an increase in speech rate; (2) the size of the devoicing gestures can be amplified; or (3) a combination of the two.

More recently, Tsuchida (1997) and Fujimoto, Murano, Niimi, and Kiritani (1998) have proposed that Japanese devoicing is a combination of both phonological and phonetic processes. Tsuchida (1997), by examining the muscular activation for glottal abduction and glottal opening patterns for various voicing patterns of vowels using electromyography (EMG), demonstrated that there are two distinct patterns of glottal gesture in devoicing, and proposed that there are two distinct mechanisms for Japanese devoicing: phonological and phonetic. Phonological devoicing is regular and complete, and includes devoicing of high vowels between voiceless consonants (except between voiceless fricatives and in consecutive devoicing). Phonetic devoicing is the result of gestural overlap or undershoot, and thus is irregular in occurrence and gradient in degree, depending on phonetic factors. She noted that devoicing of non-high vowels, high vowels between voiceless fricatives, and consecutive devoicing fall in this category. The acoustic analysis presented in Varden (1998) supports Tsuchida's proposal that both phonological devoicing and phonetic loss of voicing are simultaneously at work in Japanese.
Fujimoto (2004), by comparing the devoicing patterns of Tokyo speakers and Osaka speakers using photoelectric glottography (PGG), argued that devoicing in the Tokyo dialect is phonologically driven, where no voicing gesture was observed for the devoiced vowel (the arytenoids in a midsagittal image retracted and the glottis in a cross sectional image continuously opened during the devoiced /CVC/ sequence), and occurred regularly without speaker variation. On the other hand, devoicing in the Osaka dialect is phonetically driven, in which a voicing gesture is observed for the devoiced vowel (and there are two glottal opening peaks for the two consonants surrounding the voicing gesture) even if the voicing is not realized, and the rate of devoicing varied among speakers from close to zero up to 100%. Examining the data from consecutive devoicing cases, Kondo (2005) argued that Japanese vowel devoicing is part of a vowel weakening process, and that the vowel weakening process may be influenced by Japanese syllable structure. She further proposed that two different mechanisms, namely phonetic and phonological processes, control Japanese vowel devoicing depending on the environment. Cross-linguistically speaking, devoicing tends to affect high vowels that are in prosodically weak positions and adjacent to voiceless consonants, and is often considered to be a part of the vowel reduction process where vowels are first reduced in duration and centralized in quality, then eventually devoiced. In European Portuguese, devoicing is restricted to high vowels in pre-pausal and unstressed positions (Cruz Ferreira, 1999), and to unstressed high vowels in Modern Greek (Dauer, 1980) and Quebec French (Walker, 1984). Similarly, recent work on French vowel reduction (e.g., Torreira & Ernestus, 2011) has shown that unaccented high vowel devoicing (especially /u/) in the phrase-medial position is common. At the same time, Smith (2003) showed that vowel devoicing in standard French also occurs in the prosodically prominent sentence-final position, indicating that devoicing in French is unlikely to be a pure reduction process. Although devoicing is more probable in unaccented vowels in Japanese (Kitahara, 2001), previous studies suggest that devoicing is not a canonical reduction process in Japanese, either. First, devoicing does not involve apparent centralization of vowels in Japanese (Kondo, 2005). Second, devoicing is not suppressed in clear/careful speech, which is unlikely if its basis is vowel reduction: Martin, Utsugi, and Mazuka (2014) showed that speakers increased devoicing rate in careful, read speech compared to adult-directed speech, and in Fais, Kajikawa, Amano, and Werker (2010), virtually the same devoicing rate was observed between infant-directed speech and adult-directed speech. Further, Ogasawara and Warner (2009) showed that devoicing might facilitate word processing: in word recognition tasks, listeners performed better when vowels were devoiced in the environment where vowel devoicing was expected. Lastly, a negative correlation between devoicing rate and lexical frequency (i.e., more devoicing among lowfrequency words) has been reported in large corpus studies (Kilbourn-Ceron, 2015; Maekawa & Kikuchi, 2005), which also suggests that Japanese vowel devoicing is unlikely to be a reduction process, given that more probable words are reduced more in general (Jurafsky, Bell, Gregory, & Raymond, 2001).


Fig. 1. Waveforms and spectrograms of the word /akisame/ 'autumn rain' produced by the same speaker, with an undevoiced (= voiced) vowel [i] (left), and with vowel deletion (right).

1.2. Phonetic realization of Japanese vowel devoicing

Despite the widely used term 'vowel devoicing', previous studies have reported that the phonetic realization of Japanese vowel devoicing is not homogeneous, ranging from a shortened/reduced vowel, to a devoiced vowel, to complete vowel deletion (Beckman, 1982; Kondo, 1997; Maekawa & Kikuchi, 2005; Tsuchida, 1997). It has also been noted that vowels in devoicing environments are often deleted or reduced as opposed to 'devoiced' (Beckman, 1982; Keating & Huffman, 1984; Ogasawara & Warner, 2009), suggesting that the term 'vowel devoicing' is not phonetically accurate. Fig. 1 presents waveforms and spectrograms of the word /akisame/ 'autumn rain' with an undevoiced (= voiced) vowel [i] (left panel), and with complete vowel deletion (right panel), produced by the same speaker. As seen, there is no trace of a phonetically devoiced vowel (i.e., energy excited at the frequencies of the vowel formants with an aspiration source) between [kʲ] and [s] in either example. Vance (2008), in describing an example of devoiced /i/ in the word /kitai/ 'expectation', noted that the supposedly devoiced vowel (i.e., the interval between the release of [kʲ] and the closure for the [t]) is acoustically just like the voiceless dorso-palatal fricative [ç], indicating that there is no 'devoiced' vowel in the token, and that the aspiration following the stop release is replaced by supra-laryngeal frication noise. The same pattern, namely the phonetic realization of an underlying sequence /ki/ as a dorso-palatal affricate [kʲç] (with no trace of a devoiced vowel), can also be seen in Fig. 1 (right panel). Vance (2008) further noted that the term 'devoiced' is used to comply with Japanese phonotactics: "[i]t's possible to say that some or even all devoiced vowels are phonetically absent, but this analysis complicates the phonemic inventory and the phonotactics of Tokyo Japanese in a way that native speakers find highly counterintuitive" (p. 209). That is, analyzing Japanese vowel devoicing as a phonological vowel deletion process has profound consequences for Japanese phonotactics, which disallows consonant clusters and coda obstruents (except for geminates).

However, there is little functional consequence of high vowel deletion in Japanese, such as risk of perceptual ambiguity, due to the allophonic distribution of Japanese obstruents. For example, of the two voiceless fricatives in Japanese, /h/ is realized as [ç] before /i/ and /j/, and as [ɸ] before /u/, and /s/ is realized as [ɕ] before /i/ and /j/, and as [s] before /u/. Similarly, /k/ is realized as [kʲ] before /i/ and /j/, and as [k] before /u/, and /t/ is realized as [tɕ] before /i/ and /j/, and as [ts] before /u/. The voiceless bilabial stop /p/ and the palatalized consonants are the only voiceless obstruents that do not show a clear allophonic distinction before /i/ and /u/. As for /p/, it seldom appears in a devoicing environment because singleton /p/ is only allowed post-nasally in Sino-Japanese or Yamato (native Japanese) words. As for the distinction between obstruents followed by /i/ and palatalized obstruents followed by /u/ (e.g., /kasi/ [kaɕi] 'snack' versus /kasʲu/ [kaɕɯ] 'singer'), Japanese listeners have been shown to use coarticulatory vestiges to identify devoiced vowels at much better than chance accuracy (Beckman & Shoji, 1984; Ostreicher & Sharf, 1976).
In short, previous studies report that the phonetic realization of Japanese devoicing is more often deletion than devoicing, although this has not been systematically examined with a large sample size. Further, phonetic deletion of high vowels (following voiceless obstruents) does not put phonemic contrasts at risk in Japanese, because of their effects in conditioning allophony in the preceding consonant.

1.3. Conditioning factors of Japanese vowel devoicing

A number of both phonetically and phonologically motivated factors have been reported to affect the likelihood of vowel devoicing in Japanese. The consonantal context of the vowel, especially the manner of articulation of the preceding and following consonants, has been reported to affect the devoicing rate (Han, 1962; Maekawa, 1989; Tsuchida, 1997). In Maekawa and Kikuchi (2005), the most prominent effect was the manner of the following consonant; in particular, vowels followed by fricatives showed a distinctly lower devoicing rate than vowels followed by stops or affricates. On the other hand, the rate of devoicing was highest when vowels were preceded by fricatives. They found significant main effects of the preceding consonant manner and the following consonant manner


as well as an interaction between the two: the overall highest devoicing rate was observed when high vowels are preceded by fricatives and followed by stops (F-S), and the second highest devoicing rate was observed when fricatives are followed by affricates (F-A), while the lowest devoicing rate was observed when high vowels are preceded and followed by fricatives (F-F). Further, it has been reported that likelihood of devoicing is particularly low when the vowel is followed by the phoneme /h/ (Fujimoto, 2004; Maekawa & Kikuchi, 2005). Speech rate also influences likelihood of devoicing. Kuriyagawa and Sawashima (1989) analyzed nine pairs of disyllabic words produced as fast and slow speech (elicited as “at a slower and a faster tempo than the subject’s own speech”), and showed that average rate of devoicing was higher in fast speech compared to slow speech. The effect of speech rate has been observed in a number of studies, including the corpus study of spontaneous speech by Maekawa and Kikuchi (2005). At the same time, Takeda, Sagisaka, and Kuwabara (1989) showed that devoicing is not limited to fast or casual speech, as it is found in most careful read speech (see also Kondo, 1997). Another factor known to influence the likelihood of devoicing is the presence of pitch accent: it has been reported that accented vowels are less likely to be devoiced compared to unaccented vowels (Han, 1962; Kuriyagawa & Sawashima, 1989). McCawley (1968) showed that devoicing in accented morae triggers accent shift, and that morphological stems are the domain of the process. However, a number of more recent studies (Kondo, 1997; Kitahara, 2001; Varden, 1998; Yoshida, 2006) show that devoicing does occur in accented vowels without necessarily triggering accent shift, especially in fast speech (Kuriyagawa & Sawashima, 1989). Vance (1987) noted that a morpheme boundary could suppress devoicing. However, all of his examples are compounds, and more recently Yoshida (2004) showed that the effect of morpheme boundary (not compound boundary) does not suppress devoicing. These findings suggest that morphologically motivated boundaries (stems or morphemes) do not influence the likelihood of devoicing. In addition, the moraic position of vowels in the devoicing environment has also been reported to affect the likelihood of devoicing: vowels in the 1st mora (of two-mora words) are more likely to be devoiced than the vowels in the 2nd mora (Kuriyagawa & Sawashima, 1989; Varden, 1998). However, the likelihood of devoicing in subsequent morae is unknown. The vowel quality and height also affect the likelihood of devoicing: the data from Maekawa and Kikuchi (2005) show that devoicing rate was highest for high vowels (89% for /i/, 84% for /u/), followed by /o/ (3%), /e/ (3%), and /a/ (2%). Further, Yoshida (2004, 2006) reported that the rate of devoicing is higher when the following vowel is /a/ as opposed to /i/ or /u/. Lastly, sociolinguistic register is another factor which influences devoicing, as the occurrence of devoicing differs considerably across dialects and speech style (Beckman, 1994; Imaizumi, Fuwa, & Hosoi, 1999). Above all, it has been reported that the most important factor affecting the likelihood of devoicing is whether the devoicing environment is single or consecutive, i.e., whether there is another devoicing environment in adjacent syllables (Kondo, 1997). 
Kondo (1997) reported that high vowels were almost always devoiced in the single devoicing environment, whereas devoicing appeared to be an optional process in the consecutive devoicing environment. This observation was later confirmed by Varden (1998).

1.4. Consecutive devoicing and its theoretical account

Previous studies agree that Japanese vowel devoicing in the canonical devoicing environments (i.e., high vowels between two voiceless obstruents) is highly likely with little variability, except for a few environments in which complete devoicing is often blocked (Maekawa & Kikuchi, 2005). One of the most studied cases is so-called consecutive devoicing, where two or more consecutive vowels are in devoicing environments, as shown below.

(2)

/kikikaesu/ > [ki̥ki̥kaesɯ] / [kʲkʲkaesɯ]   'ask again'
/zokusʲutu/ > [zokɯ̥ɕɯ̥tsɯ] / [zokɕʷtsɯ]   'one after another'
/tikuseki/ > [tɕi̥kɯ̥seki] / [tɕkseki]   'accumulation'

It has been claimed that consecutive devoicing may be avoided in order to maintain perceptual ease (Maekawa, 1989; Vance, 1987). However, the precise reason for its low likelihood is not well understood. According to the analysis by Kondo (2005), which assumes that vowel devoicing in Japanese is a categorical phonological process that occurs before the phonetic implementation of a word is planned, consecutive devoicing is blocked by Japanese syllable structure: vowels in consecutive devoicing sites become devoiced only when resyllabification of the remaining consonants is possible. When the devoiced vowel loses its sonority, the preceding consonant in the devoiced syllable cannot constitute a syllable on its own, and as a consequence the syllable structure of the word is altered (e.g., #CV.C(V).CV > #CVC.CV). Kondo argued that a triple consonant cluster in onset position (e.g., /futukajoi/ > [fɯ̥tsɯ̥kajoi], 'hangover') is ill-formed in Japanese because it creates a trimoraic syllable which is unresyllabifiable (3c). According to her analysis, simplex onset clusters (CCV) are bimoraic and are allowed word-initially (3b), while complex onset clusters (CCCV) are trimoraic and not allowed word-initially (3c), because they cannot be resyllabified into two syllables without a preceding syllable. Kondo further argued that none of the phonetically motivated factors known to influence devoicing (e.g., the manners of the surrounding consonants, the presence of pitch accent) consistently blocks devoicing in single devoicing sites, and thus that phonetic analyses cannot explain the low likelihood of consecutive devoicing.


In contrast, Yoshida (2004) and Tsuchida (1997) proposed that the phonetic environment of the vowels determines the likelihood of consecutive devoicing. Yoshida (2004) reported that the manners of the surrounding consonants had a clear effect on the rate of consecutive devoicing, while the effect of position in the word was relatively weak. Tsuchida (1997) argued that consecutive devoicing is a phonetic and continuous process whose likelihood depends on various phonetic factors, based on her EMG observation that single-site devoicing was complete and regular while consecutive devoicing was gradient and irregular. In addition, Kuriyagawa and Sawashima (1989) reported that the likelihood of consecutive devoicing increased considerably in fast speech, while it seldom occurred in slow speech, revealing that a non-categorical factor (i.e., speech rate) strongly influences the likelihood of consecutive devoicing.

It is also possible that the likelihood of consecutive devoicing is determined by a combination of both continuous (or phonetic) and categorical (or phonological) factors. Ellis and Hardcastle (2002) examined inter- and intra-speaker variability of alveolar-to-velar assimilation in English using EPG and EMA, and their results revealed that gradient and categorical patterns co-exist within a speech community: one group of speakers showed a gradual pattern of assimilation, while the other group showed categorical assimilation involving a segmental substitution of place of articulation.

1.5. Speaker variability of consecutive devoicing

Maekawa and Kikuchi (2005) examined Japanese vowel devoicing by analyzing a large corpus (the Corpus of Spontaneous Japanese; Maekawa, Koiso, Furui, & Isahara, 2000), and reported that consecutive devoicing was observed in 26.4% of all eligible cases. A similar number is reported in Kuriyagawa and Sawashima (1989), where consecutive devoicing was observed in 25% of cases in fast speech, which was elicited "at a faster tempo than the subject's own speech". What their data also showed is clear individual differences among six speakers in both slow and fast speech. In fast speech, the rates of consecutive devoicing for two speakers were over 50% (41 out of 80 disyllabic words) for the high-low accent type, while the rate of another speaker in the same setting was 2.5% (2 out of 80 disyllabic words). Mimatsu, Fukumori, Sugai, Utsugi, and Shimada (1999) also reported that the occurrence of consecutive devoicing differed to a high degree across their two participants (i.e., 47.8% and 74.8%). However, in previous studies on Japanese devoicing (except for the corpus study by Maekawa & Kikuchi, 2005), data were typically collected from a relatively small number of subjects and words, which makes it difficult to examine the effects of multiple phonetic/phonological factors and speaker variability simultaneously. In addition, the phonetic realization of Japanese devoicing has not been investigated across speakers, and thus its speaker variability is unknown.

1.6. Goals of the current study

While previous research has revealed a number of phonetic and phonological factors that affect the likelihood of vowel devoicing in Japanese, the relative importance of these factors for consecutive devoicing is not well understood.
In addition, although it has been reported that the phonetic realization of Japanese vowel devoicing is not homogeneous, ranging from complete deletion to vowel shortening, the distribution of its precise implementation, especially for consecutive devoicing, has not been empirically investigated. Further, speaker variability in the likelihood of consecutive devoicing has also been reported, although the number of participants in previous studies is relatively small (e.g., six speakers in both Kuriyagawa & Sawashima, 1989, and Kondo, 2005) and thus the degree and distribution of the variability is not known. By addressing these issues, the present study aims to elucidate the linguistic nature of consecutive devoicing. We believe that a careful examination of these issues is crucial for informing the debate on the nature of consecutive devoicing in Japanese. We conducted an experiment with a larger number of participants as well as lexical items compared to previous studies, and examined (1) the relative contribution of previously reported phonetically and phonologically motivated factors on the likelihood of consecutive devoicing, (2) the acoustic realization of consecutive devoicing, and (3) the inter- and intra-speaker variability in the likelihood of consecutive devoicing as well as the realization of devoiced vowels. Based on the existing literature, we can identify several different factors that may interact with consecutive devoicing: (1) manners of consonants surrounding the second vowel in the devoicing environment (i.e., stop, fricative, or affricate), (2) position of consecutive devoicing site within a word (i.e., which mora does the vowel in the potential devoicing environments occupy), (3) quality of the vowel following consecutive devoicing site, and (4) presence of pitch accent in the devoicing mora.


Based on previous findings, we expect each of these factors to have the following influence. As for the manners of the consonants surrounding the second vowel in the consecutive devoicing environment, we expect combinations such as fricative-stop, affricate-stop, and stop-stop to show a higher occurrence of consecutive devoicing than fricative-fricative, as previously reported (Fujimoto, 2004; Maekawa & Kikuchi, 2005; Yoshida, 2006). This prediction has a phonetic basis, because vowels followed by voiceless stops are phonetically shorter than vowels followed by fricatives (Umeda, 1975; Van Santen, 1992) and thus are more vulnerable to devoicing. As for the position of consecutive devoicing, if consecutive devoicing is phonetically driven as argued in Yoshida (2004) and Tsuchida (1997), we would expect to see a higher occurrence in word-initial position, because maximum glottal apertures for the voiceless obstruents are greater in word-initial position than in word-medial position (Fujimoto, 2004; Kuriyagawa & Sawashima, 1989), and in turn vowels in word-initial morae are shorter than those in subsequent morae (and consonants in word-initial morae are longer than their counterparts in subsequent morae) (Ota, Ladd, & Tsuchiya, 2003). On the other hand, if consecutive devoicing requires resyllabification of a consonant cluster as proposed in Kondo (2005), we would expect a higher likelihood of consecutive devoicing in word-medial positions, because a trimoraic syllable can be resyllabified into two syllables to avoid forming a superheavy syllable when it occurs word-medially (e.g., #CV.CCCV > #CVC.CCV), while it cannot be resyllabified if it occurs word-initially (#CCCV). Consecutive devoicing sites without a pitch accent are also expected to show a higher rate of consecutive devoicing than sites with a pitch accent, because the realization of pitch accent requires elevated fundamental frequency (as all Japanese pitch accents are High–Low), which is known to decrease the likelihood of devoicing in general (Kuriyagawa & Sawashima, 1989; Fujimoto, 2004; Yoshida, 2004).¹ As for the quality of the vowel following the consecutive devoicing site (= V3), we expect that low vowels in V3 position would show a higher occurrence of consecutive devoicing than high vowels, as shown in Yoshida (2004). Having another high vowel in the following mora could create an environment for tri-moraic consecutive devoicing, which is reported to be extremely rare. If the phonetic environment for the following high vowel is more optimal for devoicing, the second vowel might not be devoiced in order to avoid tri-moraic consecutive devoicing. On the other hand, there is no possibility of tri-moraic consecutive devoicing if the following vowel is not a high vowel. Regarding the phonetic realization of consecutive devoicing, if Japanese devoicing is a phonological assimilation process of features such as [−voice] or [+spread glottis], we would expect vowels to be either voiced or devoiced, as opposed to complete deletion or a combination of the two (partially devoiced and voiced). In terms of the distribution of vowel duration, if devoicing is strictly categorical in nature, we would expect to see clearly defined multiple peaks (e.g., indicating complete devoicing and full voicing), which in turn would show that speakers' production varies between vowel voicing and devoicing in an optional yet quantal fashion.
On the other hand, if devoicing is strictly continuous in nature, we would expect to see a gradient and normal distribution from complete devoicing (= 0 ms) to the canonical range of high vowel durations.

2. Method

2.1. Participants

Twenty-four native speakers of Tokyo Japanese (age 18–50, mean 28.25, 12 females) with no reported speech, hearing, or language disorders served as participants for this experiment. None of the participants had lived outside of the greater Tokyo area (Tokyo, Chiba, Kanagawa, and Saitama), and none of them spoke a foreign language on a daily basis. They were recruited from the Tokyo University of Foreign Studies population, and were paid for their participation. All participants were recruited and tested according to approved IRB protocols for the treatment of human subjects.

2.2. Stimuli selection

The production list consisted of 110 Japanese words, including (1) 30 test words with a consecutive devoicing environment, (2) 60 words with a single devoicing environment, and (3) 20 filler words containing no devoicing environment. Lexical frequency and familiarity were determined from the NTT database (Lexical Properties of Japanese; Amano & Kondo, 2003). All words had high familiarity (> 5.0 on a 7-point scale). Test words were classified according to the following factors: (1) moraic position (i.e., in which mora the consecutive devoicing site is located), (2) C2 (Stop, Affricate, Fricative), (3) C3 (Stop, Affricate, Fricative), (4) V3 (/i/, /u/, /e/, /o/, and /a/), and (5) Pitch Accent (Accented versus Unaccented). For a complete list of the test words in this experiment, see Appendix A.

2.3. Procedure

The experiment was conducted at the Research Institute for Languages and Cultures of Asia and Africa (ILCAA) at the Tokyo University of Foreign Studies in Japan. After a brief explanation of the procedure by the author, each participant was seated in front of a computer in a sound booth. The experimental stimuli were presented using Matlab (The MathWorks Inc.).

¹ Amplitude does not have a significant role in Japanese prosodic structure the way it does in English (Beckman, 1986), and the amplitude of a pitch-accented mora is not necessarily greater than that of an unaccented mora (Sugito, 1982; Weitzman, 1969).


Fig. 2. An example of consecutive devoicing where two consecutive vowels are deleted (/sukitooru/ ‘be transparent’ – labels are phonological).

The visual instruction (in Japanese) read as follows: "Please read aloud the words presented on the computer screen. Please pronounce them as naturally as possible." The words in the production list were visually presented one at a time on a computer screen every 2.5 s, in random order. For each participant, the list was presented twice in order to elicit two tokens of each word. A word-naming task without a carrier sentence was chosen in order to keep the speech rate at a relatively slow tempo, as Kuriyagawa and Sawashima (1989) showed a large effect of speech rate on devoicing. Each participant's speech was digitally recorded into a computer (sampling rate = 22,050 Hz; 16-bit quantization) through a Logitech A-0356A headset microphone.

2.4. Analysis and hypotheses

Speech signals were first segmented by the author and two research assistants using both waveforms and spectrograms in Praat (Boersma & Weenink, 2007). Consonants preceding vowels in the devoicing environment often displayed spectral coloring, in which frication noise energy was excited near the vowels' resonant frequencies throughout the consonant. These cases were classified as consonants as opposed to voiceless vowels, unless there was a visible transition from frication to aspiration to separate the two. In order to examine the data phonologically as well as phonetically, the degree of consecutive devoicing was quantified in two ways: (1) as a binary measure, obtained by coding the presence (or absence) of consecutive devoicing for each test item, and (2) as a gradient measure, obtained by measuring the duration of voiced vowels in devoicing environments. The basis of the binary judgment was whether there were three consecutive consonants without any vocalization (cyclic glottal pulsing) in between. Note that this methodology does not distinguish phonetic vowel deletion (i.e., complete absence of a vocalic gesture or voiceless vowel in the phonetic realization) from vowel devoicing (i.e., presence of energy excited at the frequencies of the vowel formants without voicing, acoustically distinct from adjacent consonants); our goal in the binary analysis was to determine the presence of consecutive devoicing, so the distinction between vowel deletion and devoicing is irrelevant here. In addition to the binary and gradient measures, the distribution of phonetically devoiced vowels was examined separately to further elucidate the phonetic realization of consecutive devoicing.

Fig. 2 illustrates the waveform and spectrogram of the word /sukitooru/ 'be transparent', which shows an example of consecutive devoicing. As seen, there is no vocalization until the third mora (i.e., /to/), and the first two underlying vowels (i.e., /u/ and /i/) are replaced by the frication noise of [s] and [kʲç]. Furthermore, the frication noise is fairly uniform and there is no clear transition into a voiceless vowel (i.e., noise energy excited at the vowels' resonant frequencies with an aspiration source without voicing, as in /h/), indicating that these vowels were phonetically deleted rather than devoiced. The lowering of the frication noise toward the end of /s/ is most likely due to anticipatory coarticulation with /k/, as the same pattern was observed regardless of the devoiced vowel.

Statistical analysis for the binary measure was based on generalized linear mixed-effects modeling using R (R Development Core Team, 2008) with the lme4 package, version 0.999999-0 (Bates, Maechler, & Bolker, 2012).
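As a minimal illustration of how the two measures described above could be coded, the following R sketch derives both from a segmentation table; the file name and column names (participant, word, repetition, voiced_dur_ms) are hypothetical and are not taken from the author's materials.

# Sketch only: deriving the binary and gradient measures from segmentation output.
# Assumes one row per devoiceable vowel, with the voiced portion's duration in ms
# (0 ms = no cyclic glottal pulsing visible for that vowel).
seg <- read.csv("segmented_vowels.csv")   # hypothetical file

# Binary measure: a token counts as consecutively devoiced when neither
# devoiceable vowel shows any voicing (three voiceless consonants in a row).
binary <- aggregate(voiced_dur_ms ~ participant + word + repetition,
                    data = seg,
                    FUN  = function(d) as.integer(all(d == 0)))
names(binary)[names(binary) == "voiced_dur_ms"] <- "consec_devoiced"

# Gradient measure: the voiced duration of each devoiceable vowel itself.
seg$vowel_dur_ms <- seg$voiced_dur_ms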
The binary measure, namely the presence or absence of consecutive devoicing, was fitted to a model with Participant and Word as random effects, and (1) Moraic Position, (2) C2 manner, (3) C3 manner, (4) V3, (5) Pitch Accent, and (6) Gender as fixed effects.² P-values were estimated using the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2013). As discussed in Section 1.6, if consecutive devoicing is a phonetically driven process, we would expect to see an effect of moraic position (i.e., a higher rate of consecutive devoicing for words that have a devoiceable sequence in the word-initial mora than in subsequent morae), because maximum glottal apertures for voiceless obstruents are greater in word-initial position than in word-medial position. We would also expect to see an effect of C3 (i.e., a higher devoicing rate for stops in C3 position), because vowels followed by voiceless stops are phonetically shorter than vowels followed by fricatives. On the other hand, given the previous finding that the first vowel in a consecutive devoicing site is almost always devoiced (Maekawa & Kikuchi, 2005), we predict that the effect of C2 is weaker than that of C3.

² The preliminary analysis showed that there was no relationship between rate of devoicing and speaker age.
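For concreteness, the binary model described above corresponds roughly to the following lme4 call; the data-frame and column names are assumptions for illustration, not the author's original script.

# Sketch of the binary (presence/absence) analysis with crossed random effects.
library(lme4)

m_binary <- glmer(
  consec_devoiced ~ MoraicPosition + C2manner + C3manner + V3 + PitchAccent +
    Gender + (1 | Participant) + (1 | Word),
  data   = devoicing,        # hypothetical data frame, one row per test-word token
  family = binomial
)
summary(m_binary)            # Wald z tests, as reported in Table 1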


Table 1
Summary of generalized linear mixed-effects modeling for the binary analysis with all factors, including the interaction between C2 and C3.

                  Estimate   Std. error   z value   Pr(>|z|)
(Intercept)       −3.0148    1.5254       −1.976    0.0481     *
Moraic position    0.7212    0.3665        1.968    0.04906    *
C2 Affricate      −0.1704    0.5809       −0.293    0.76924
C2 Stop            0.7964    1.1474        0.694    0.48762
C3 Affricate      −0.3116    1.0614       −0.294    0.76908
C3 Stop            2.5849    1.1849        2.182    0.02915    *
V3 /u/            −0.4777    0.4732       −1.010    0.31273
V3 /e/             1.0031    0.6151        1.631    0.10294
V3 /o/            −0.3184    0.5473       −0.582    0.56078
V3 /a/             0.6528    0.4137        1.578    0.11459
PA                −1.3692    0.4636       −2.953    0.00314    **
Gender            −0.8843    0.562        −1.573    0.11561
C2A:C3A            0.3126    1.0626        0.294    0.76863
C2S:C3A            0.3169    1.1148        0.284    0.7762
C2S:C3S           −1.9122    1.3615       −1.405    0.16018

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1.

Fig. 3. Effect of moraic position on consecutive devoicing.

In addition to testing these hypotheses, we also aim to examine the relative importance of each predictor, to see which factors affect the likelihood of consecutive devoicing more strongly, as well as the intra- and inter-speaker variability of consecutive devoicing.

For the gradient analysis, the distribution of (voiced) vowel duration in devoicing environments was first examined, and then fitted to a linear mixed-effects regression model using R with the lme4 package. The fixed effects entered in the gradient analysis were (1) preceding consonant, (2) following consonant, (3) pitch accent, (4) vowel, (5) moraic position of the vowel, and (6) gender of the speaker. (Note that consonant type was used instead of manner of consonant in the gradient analysis in order to examine more fine-grained distinctions between consonants.) P-values were estimated using the lmerTest package (Kuznetsova et al., 2013), and skewness and kurtosis were estimated using the 'moments' package (Komsta & Novomestky, 2012). If Japanese devoicing is a phonological assimilation process, we would expect vowels to be either voiced or devoiced without much variability in duration, resulting in a bimodal distribution with relatively tight peaks at 0 ms and around the canonical range for high vowels, without a gradient transition between the two. On the other hand, if it is a gradient phonetic process involving overlap of glottal gestures, we would expect to see a gradient and normal distribution of vowel durations from complete devoicing (= 0 ms) to full vowel durations. As for the factors examined, we expect all factors (except for speaker gender) to affect vowel duration. The identity of the surrounding consonants and the presence of pitch accent are known to influence the likelihood of devoicing, so if devoicing involves overlap of glottal gestures, vowels that are more likely to be devoiced (e.g., vowels with no pitch accent or followed by stops) should still have shorter durations even when they are not devoiced. The quality of the vowel (/i/ versus /u/) should also influence the duration, given that /u/ is inherently shorter than /i/. Similarly, the moraic position of the vowel should affect vowel durations, as glottal apertures for voiceless obstruents are greater in word-initial position (Fujimoto, 2004) and thus vowels in word-initial morae are expected to be shorter than those in subsequent morae.
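A sketch of the gradient analysis, under the same assumed naming conventions as above (one row per devoiceable vowel), might look as follows.

# Sketch of the duration (gradient) model and the distributional measures.
library(lmerTest)   # wraps lme4's lmer() and adds Satterthwaite p-values
library(moments)    # skewness() and kurtosis()

m_dur <- lmer(
  duration_ms ~ PreC + FolC + PitchAccent + Vowel + Mora + Gender +
    (1 | Participant) + (1 | Word),
  data = vowels                      # hypothetical data frame
)
summary(m_dur)                       # t tests with estimated df (cf. Table 4)

# Shape of the duration distribution in devoicing environments
skewness(vowels$duration_ms)
kurtosis(vowels$duration_ms)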


Fig. 4. Effect of C2–C3 manner (manner of articulation for the consonants surrounding the second devoiceable vowel) on consecutive devoicing.

Fig. 5. Effect of C3 manner on consecutive devoicing.

3. Results

3.1. Binary analysis

The overall rate of consecutive devoicing was 27.1%, confirming previous studies that showed a low likelihood of consecutive devoicing in Tokyo Japanese. This result is also comparable to the likelihood of consecutive devoicing reported in Maekawa and Kikuchi (2005) (26.4%), revealing that consecutive devoicing is not limited to connected speech, and occurs as frequently in elicited laboratory speech as in spontaneous speech.

Table 1 summarizes the results of the generalized linear mixed-effects modeling. The fixed effects for the binary analysis were: (1) Moraic Position, (2) C2 manner, (3) C3 manner, (4) V3, (5) Pitch Accent, and (6) Gender. The effect of Moraic Position on the consecutive devoicing rate was significant [z = 1.97, p < 0.05]. As can be seen in Fig. 3, the rate of consecutive devoicing was highest when the first vowel occupied the word-initial mora (27%), followed by the second mora (24%) and the third mora (22%). At the same time, the difference between the three positions was relatively small, and so was the estimated effect size from the mixed-effects modeling, suggesting that the position of the devoicing site is not a robust predictor of consecutive devoicing. As for the effect of the manner of articulation for C2 and C3, only C3 had a significant effect in the model, where stops in C3 position showed a higher rate of consecutive devoicing. This confirms previous studies, which noted that devoicing between two fricatives appears to be unlikely, while devoicing between a fricative and a stop, as well as between an affricate and a stop, is highly probable (e.g., Kondo, 1997; Maekawa & Kikuchi, 2005; Maekawa, 1989; Tsuchida, 1997). Fig. 4 shows the effect of C2 and C3 (= the manner of articulation for the consonants surrounding V2), and Fig. 5 presents the effect of C3 on the rate of consecutive devoicing. As seen, the rate of consecutive devoicing was notably higher for Affricate-Stop and Fricative-Stop than for the other conditions, indicating that the manner of C3 has a stronger influence on the rate of consecutive devoicing than that of C2.

To examine the relative contribution of the six fixed effects examined in the binary analysis, seven fitted models (six models, each of which lacked one of the six factors, and the full model with no interaction) were compared using ANOVA. The model without C3 showed the worst fit, indicating that C3 is the most important factor determining the likelihood of consecutive devoicing. The second-most important factor was Pitch Accent, followed by V3, Gender, Moraic Position, and C2. The first three factors (i.e., C3, PA, and V3) showed significant improvement to the model [p < 0.0001], confirming the previous findings that the manner of the following consonant and the presence of pitch accent affect the likelihood of devoicing. This is also consistent with Yoshida (2004, 2006), who reported that the rate of (single) devoicing was higher when the following vowel was /a/ as opposed to /i/ or /u/.
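The drop-one comparison described above amounts to nested-model likelihood-ratio tests; the sketch below reuses the hypothetical m_binary model from Section 2.4 and is illustrative only.

# Sketch: compare the full binary model against versions lacking one predictor.
m_noC3 <- update(m_binary, . ~ . - C3manner)
m_noPA <- update(m_binary, . ~ . - PitchAccent)

anova(m_noC3, m_binary)        # likelihood-ratio test: contribution of C3
anova(m_noPA, m_binary)        # contribution of Pitch Accent

AIC(m_binary); BIC(m_binary)   # criteria also consulted for the final model (Table 2)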


Table 2
Summary of generalized linear mixed-effects modeling for the binary analysis (final model).

               Estimate   Std. error   z value   Pr(>|z|)
(Intercept)    −1.8622    0.5382       −3.460    0.00054      ***
C3 Affricate   −0.3303    0.4706       −0.702    0.48277
C3 Stop         1.4258    0.3418        4.171    3.03E−05     ***
PA +           −1.3139    0.4514       −2.911    0.00361      **
V3 /u/         −0.4475    0.4177       −1.071    0.28411
V3 /e/          1.3824    0.6653        2.078    0.03773      *
V3 /o/         −0.8528    0.5671       −1.504    0.13265
V3 /a/          0.6102    0.4415        1.382    0.16689

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1.

Fig. 6. Speaker variability in the rate of consecutive devoicing. The average rate of consecutive devoicing (y-axis) is plotted for each speaker from the highest (66%) to the lowest (3%).

Our data showed higher rates of consecutive devoicing when the consecutive devoicing site was followed by /e/ or /a/ (54% and 40%, respectively) than when it was followed by a high vowel (/i/ = 21%, /u/ = 15%). The contribution of moraic position was shown to be insignificant, despite the significant main effect in our first model. Lastly, the data showed a trend toward a higher rate of consecutive devoicing for male speakers (31%) than for female speakers (22%), although the effect of Gender was not significant [p > 0.1]. Table 2 presents the summary of the generalized linear mixed-effects modeling for the final model, which was hand-fitted for the best fit. In order to avoid possible overfitting, both AIC (Akaike information criterion) and BIC (Bayesian information criterion) were used in the model selection. The final model does not include fixed effects or interactions that did not improve the fit of the model.

The binary analysis also revealed that the two random-effect variables, Participant and Word, had significant effects on the rate of consecutive devoicing. The model with Participant as a random factor had a significantly better fit than a model without it [p < 0.0001], and so did the model with Word compared with a model without it [p < 0.0001]. The average rate of consecutive devoicing per participant ranged from 3% to 66%, revealing that there is considerable inter-speaker variability in the rate of consecutive devoicing. The average rate of consecutive devoicing also varied across words, ranging from 0% (sukihoudai 'self-indulgence' and tsukusu 'exhaust') to 75% (futsukayoi 'hangover'), again revealing that the phonetic and phonological environment of the consecutive devoicing site strongly influences its likelihood. Fig. 6 presents each speaker's average rate of consecutive devoicing from the highest (66%) to the lowest (3%), which shows large speaker variability.

In addition to the observed inter-speaker variability, the binary analysis also revealed intra-speaker variability of consecutive devoicing. As noted earlier, each speaker produced two repetitions of each word in the production list. To examine whether speakers produced consistent patterns of consecutive devoicing across the two repetitions of the same word, the 30 test words were classified into three categories ("none", "once", and "both") based on how many times a given word was consecutively devoiced by a given speaker. Table 3 summarizes the intra-speaker variability analysis: the "None" category shows the number of words that were not devoiced consecutively in either repetition by the given speaker, "Once" shows the number of words that were consecutively devoiced in one repetition only, and "Both" shows the number of words that were consecutively devoiced in both repetitions. In other words, the "Once" category (middle column of Table 3) shows the number of words for which a speaker produced inconsistent patterns of consecutive devoicing across the two repetitions. As seen, most speakers produced at least some words inconsistently, showing that the realization of consecutive devoicing for a given word was not always consistent within a speaker. In addition, the proportion of inconsistent words varied greatly across speakers, showing considerable inter-speaker variability regarding how categorically consecutive devoicing is produced.
Further, those speakers who are more likely to split their productions (e.g., AK, AR, KH, ST, TY, and US) tend to produce consecutive devoicing in the middle range of the distribution (20–40%, as seen in Fig. 6), as would be predicted in a stochastic process.
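The per-speaker consistency counts in Table 3 could be tabulated along these lines, again using the hypothetical binary coding from Section 2.4 (two repetitions per word per speaker).

# Sketch: for each speaker x word, count in how many of the two repetitions
# consecutive devoicing occurred (0, 1, or 2), then tabulate per speaker.
n_devoiced <- xtabs(consec_devoiced ~ participant + word, data = binary)

consistency <- apply(n_devoiced, 1, function(x)
  table(factor(x, levels = 0:2, labels = c("None", "Once", "Both"))))

consistency   # 3 x 24 matrix of counts, corresponding to Table 3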


Table 3
Intra- and inter-speaker variability of consecutive devoicing. Thirty test words were classified based on how many times a given word was consecutively devoiced by a given speaker (each speaker produced two repetitions of each word). For example, for speaker AK, five words were devoiced in both repetitions, 10 words were devoiced only once, and 15 words were never devoiced.

ID     None   Once   Both
AK     15     10     5
AR     9      12     9
GS     14     4      12
GT     18     6      6
HY2    9      3      18
JM     26     4      0
KH     17     8      5
KK     11     5      14
MM     24     3      3
MO     26     0      4
NM     23     1      6
NT     28     2      0
NT2    25     2      3
OT     8      6      16
OY     28     2      0
ST     16     8      6
TH     18     5      7
TH2    26     4      0
TM     19     4      7
TT     26     3      1
TY     17     10     3
US     18     8      4
UT     25     3      2
YH     26     4      0

Fig. 7. An example of tri-moraic consecutive devoicing. The first three vowels in a word kuchikiki ‘middleman’ were deleted, resulting in CCCCVC. (Labels are phonological.)

Fig. 8. Distribution of Japanese high vowel duration in non-devoicing environments (left) and devoicing environments (right).

Lastly, the binary examination of data also revealed that ten speakers (out of 24) produced tri-moraic consecutive devoicing (e.g., denshikiki ‘electronics’; kishukusha ‘dormitory’; kikuchisan ‘Mr./Ms. Kikuchi’; kuchikiki ‘middleman’). Fig. 7 shows an example of kuchikiki: as seen, the first three morae show no vocalization. This result shows that although tri-moraic consecutive devoicing is far less common than bimoraic consecutive devoicing, it does occur in elicited laboratory speech for some speakers.

3.2. Gradient analysis

Our gradient-measure data, obtained by measuring the duration of voiced vowels in devoicing environments, showed that 59% of vowels in consecutive devoicing environments were completely devoiced (/i/ = 52%; /u/ = 74%). Fig. 8 shows the distribution of high vowel durations in non-devoicing environments (left) and devoicing environments (right). The duration of voiced vowels in devoicing environments had a clear unimodal distribution, and was shorter and more tightly distributed compared with vowels in non-devoicing environments (devoicing environment: mean = 50.31 ms, skewness = 0.23, kurtosis = 2.81; non-devoicing environment: mean = 73.8 ms, skewness = 0.99, kurtosis = 4.06). The mean duration of high vowels in non-devoicing environments was comparable to the values reported in Hirata and Tsukada (2009) for 'normal' speech rate.

To determine whether the realization of devoicing in consecutive devoicing environments is continuous or categorical at the individual level, the distribution of vowel durations for each speaker was examined. If consecutive devoicing is a strictly continuous process, we would expect to see a wide range of vowel durations with a gradient transition between complete devoicing and voiced vowels. On the other hand, if consecutive devoicing is strictly categorical, speakers should vary between vowel voicing and devoicing in a categorical fashion, resulting in a bimodal distribution with relatively tight peaks and no gradient transition between the two.
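The by-environment summaries above (mean, skewness, kurtosis) could be computed along these lines, reusing the hypothetical vowels data frame with an assumed env_type column marking devoicing versus non-devoicing environments.

# Sketch: distribution summaries of voiced vowel duration by environment type.
library(moments)

stats_by_env <- aggregate(duration_ms ~ env_type, data = vowels,
                          FUN = function(x) c(mean = mean(x),
                                              skew = skewness(x),
                                              kurt = kurtosis(x)))
stats_by_env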


Fig. 9. Distribution of high vowel durations in consecutive devoicing environments, presented separately for each speaker.

Fig. 9 shows the distribution of vowel durations in consecutive devoicing environments, presented separately for each speaker. These histograms show a bimodal distribution for each speaker, namely a combination of a high peak for complete devoicing and a normal distribution for voiced vowels, indicating that vowel voicing and devoicing are expressed in a categorical manner. The high occurrence of complete devoicing (= 0 ms) across speakers shows that even for the infrequent consecutive devoicers, a large proportion of vowels in the devoicing environment are regularly devoiced. Further, the gap between complete devoicing and voiced vowels for each speaker shows that there is a tendency to avoid producing extremely short vowels, most likely due to the complex relationship between articulatory and aerodynamic constraints, indicating that vowel duration cannot be determined solely by its phonetic environment. At the same time, the distribution of voiced vowels is widely spread and normal in most cases, revealing that most speakers produce partially devoiced vowels in a gradient rather than categorical manner. Together, these patterns suggest that the realization of consecutive devoicing is both continuous and categorical.

The data also revealed that the realization of consecutive devoicing shows inter- and intra-speaker variation. As seen in Fig. 9, the location, spread, and shape of the voiced-vowel distribution vary across speakers, which suggests that speakers differ in terms of the relative balance between the phonetic and phonological factors that determine the phonetic implementation of vowel devoicing. For example, speaker JM shows a broad distribution without clearly defined peaks, indicating that her vowel durations are heavily influenced by phonetic environments. On the other hand, speaker MO shows a relatively compact distribution, indicating that her vowel durations are categorically distributed and relatively stable regardless of phonetic environment. Speakers with a higher likelihood of complete devoicing (= a high bar at 0 ms) tend to have shorter mean durations of voiced vowels (e.g., OT: 38 ms, HY2: 43 ms), while speakers with a lower likelihood of complete devoicing show longer mean durations of voiced vowels (e.g., TT: 65 ms). Further, each speaker's voiced vowel duration varied continuously, indicating gradient within-speaker variation.

Next, the duration of vowels in consecutive devoicing environments was fitted to a linear mixed-effects regression model, with (1) Preceding Consonant, (2) Following Consonant, (3) Pitch Accent, (4) Vowel, (5) Mora (= moraic position of the vowel in the word), and (6) Gender as fixed effects, and Participant and Word as random effects. Table 4 summarizes the results of the linear mixed-effects model that includes all predictors and no interactions between them. As seen in Table 4, the model revealed significant main effects of Pitch Accent, Vowel, and Mora. Vowels with pitch accent were longer than vowels without [p < 0.0001], and /i/ was longer than /u/ on average [p < 0.0001]. Vowels in the first mora were consistently shorter than those in the second mora (confirming Kuriyagawa & Sawashima, 1989; Tsuchida, 1997; Varden, 1998), and vowels in the third mora were also shorter than those in the second mora [p < 0.0001]. The effect of gender was not significant [p > 0.1].
In order to determine the relative contribution of the six fixed effects examined in the gradient analysis, seven models (six models, each of which lacked one of the six factors, and the full model shown in Table 4) were compared using ANOVA. The result showed that all predictors except for Gender significantly improved the fit of the model [p < 0.0001], and that the contribution of Preceding Consonant to the model was the greatest among the six factors examined, followed by Following Consonant, Mora, Pitch Accent, Vowel, and Gender. This result shows that (devoiceable) vowel duration is affected more strongly by the identity of the preceding and following consonants and by moraic position than by the presence of pitch accent or vowel quality. Next, the interactions between the fixed effects were examined in additional linear mixed-effects regression models. The model which included interaction terms for the five significant predictors (i.e., excluding Gender) showed a better fit than the other models with interaction terms for subsets of the predictors [p < 0.001]. Table 5 summarizes the results of the final model, presenting only the significant effects.
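A sketch of the interaction step, under the same assumed names as the earlier duration model; in practice some interaction cells may be sparse, so this is illustrative only.

# Sketch: add two-way interactions among the five significant predictors
# and compare against the main-effects model.
m_main <- lmer(duration_ms ~ PreC + FolC + PitchAccent + Vowel + Mora +
                 (1 | Participant) + (1 | Word), data = vowels)

m_int  <- lmer(duration_ms ~ (PreC + FolC + PitchAccent + Vowel + Mora)^2 +
                 (1 | Participant) + (1 | Word), data = vowels)

anova(m_main, m_int)   # do the two-way interactions improve the fit?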


Table 4 Summary of linear mixed-effects modeling for the gradient analysis with all fixed effects, including the non-significant predictor ‘gender’.

Predictor      Estimate    Std. error   df      t-value   Pr(>|t|)
(Intercept)    23.92       4.614        142     5.184     7.35E−07   ⁎⁎⁎
pre #f         −9.426      3.88         2990    −2.429    0.01519    ⁎
pre #k         0.009       3.401        3405    0.003     0.99778
pre #s         30.58       7.036        524     4.346     1.66E−05   ⁎⁎⁎
pre #sh        −5.088      3.801        2856    −1.339    0.18075
pre #ts        −12.52      4.443        2802    −2.817    0.00487    ⁎⁎
pre ch         1.317       4.057        3051    0.325     0.74546
pre f          −26.19      16.61        28      −1.577    0.12609
pre h          −25.19      6.041        1489    −4.169    3.24E−05   ⁎⁎⁎
pre k          2.141       3.916        3016    0.547     0.58452
pre sh         −33.17      4.68         2347    −7.086    1.82E−12   ⁎⁎⁎
pre ts         −24.84      5.688        1748    −4.366    1.34E−05   ⁎⁎⁎
follow f       21.4        16.19        25      1.322     0.19812
follow h       45.42       5.552        579     8.181     1.78E−15   ⁎⁎⁎
follow k       −13.22      2.18         3286    −6.066    1.46E−09   ⁎⁎⁎
follow s       −17.63      3.344        2431    −5.274    1.45E−07   ⁎⁎⁎
follow sh      −13.76      2.704        2295    −5.089    3.90E−07   ⁎⁎⁎
follow t       15.65       7.249        523     2.158     0.03136    ⁎
follow ts      −18.59      2.735        2382    −6.798    1.33E−11   ⁎⁎⁎
PA (+)         12.5        1.639        2452    7.628     3.38E−14   ⁎⁎⁎
vowel /u/      −5.777      1.334        3071    −4.329    1.54E−05   ⁎⁎⁎
mora 2         27.98       2.214        2876    12.64     2.00E−16   ⁎⁎⁎
mora 3         16.19       2.116        3175    7.652     2.62E−14   ⁎⁎⁎
gender F       4.037       2.84         22      1.422     0.16917

Signif. codes: 0 ‘⁎⁎⁎’ 0.001 ‘⁎⁎’ 0.01 ‘⁎’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 5 Summary of linear mixed-effects modeling for the gradient analysis with all fixed effects (excluding ‘gender’) and interactions between them. Only significant effects are shown, as the full model includes over 50 rows.

Predictor              Estimate    Std. error   df     t-value   Pr(>|t|)
(Intercept)            3.3351      7.2938       24.2   0.457     0.65158
pre #s                 −70.1445    17.9293      26.3   −3.912    0.000579   ⁎⁎⁎
pre #sh                −61.3551    27.3161      31     −2.246    0.031972   ⁎
pre f                  −64.0641    27.7536      20.4   −2.308    0.031546   ⁎
follow f               45.497      20.4158      21.1   2.229     0.03685    ⁎
follow h               43.2974     12.637       24.2   3.426     0.002196   ⁎⁎
follow sh              100.8363    31.8708      23.6   3.164     0.004251   ⁎⁎
pre ts:follow k        −70.9179    20.8328      27.3   −3.404    0.002067   ⁎⁎
pre k:follow sh        −97.6427    31.5879      22.5   −3.091    0.005236   ⁎⁎
follow k:PA+           32.064      12.1164      25.8   2.646     0.013688   ⁎
follow s:PA+           29.1925     9.9525       43.5   2.933     0.005337   ⁎⁎
pre #sh:vowel /u/      −46.5414    18.2243      23.9   −2.554    0.017457   ⁎
mora 2:vowel /u/       −48.3605    18.5054      23.3   −2.613    0.015441   ⁎

Signif. codes: 0 ‘⁎⁎⁎’ 0.001 ‘⁎⁎’ 0.01 ‘⁎’ 0.05 ‘.’ 0.1 ‘ ’ 1

significant effects. The most significant interaction was found between preceding [ts] and following [k]: the voicing duration of vowels preceded by [ts] was shorter when the vowel was also followed by [k].
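The model-comparison procedure described before Table 5 can be illustrated as follows. This is a sketch under the same assumptions as the earlier one, not the author's script: m_full and cd_data are the hypothetical objects introduced above, only three of the six drop-one comparisons are shown for brevity, and the interaction model simply crosses the five significant predictors.

# Drop-one comparisons: refit the model without one fixed effect at a time
# and compare each reduced model to the full main-effects model with a
# likelihood-ratio test.
drop_pre <- update(m_full, . ~ . - pre_c)
drop_fol <- update(m_full, . ~ . - fol_c)
drop_pa  <- update(m_full, . ~ . - accent)
anova(m_full, drop_pre)  # contribution of the preceding consonant
anova(m_full, drop_fol)  # contribution of the following consonant
anova(m_full, drop_pa)   # contribution of pitch accent

# Interaction model over the five significant predictors (Gender excluded),
# compared against the corresponding main-effects model.
m_main <- update(m_full, . ~ . - gender)
m_int  <- lmer(
  duration ~ (pre_c + fol_c + accent + vowel + mora)^2 +
    (1 | participant) + (1 | word),
  data = cd_data, REML = FALSE
)
anova(m_main, m_int)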

3.3. Devoiced vowels

Lastly, acoustically devoiced vowels were analyzed in order to examine their likelihood and distribution. In the current analysis, the term "devoiced vowel" refers to a segment in which the frequencies of the vowel formants are excited by an aspiration source (replacing the underlying voice source) that is acoustically distinct from the adjacent consonants (cf. Fant, 1970). It was apparent from the earliest stage of the analysis that cases of devoiced vowels were extremely rare in our data. In fact, out of 1440 consecutive devoicing environments (30 words × 2 repetitions × 24 speakers), there was only one token in which we observed devoiced vowels. Fig. 10 presents waveforms and spectrograms of the word kishukusha 'dormitory', produced by four speakers. The devoiced vowel example is shown in Fig. 10(a): unlike (c) or (d), there is a clear attenuation of amplitude between the first two consonants [kʲ] and [ɕ], where the formants for [i] are visible (except for F1). Similarly, the transition from [ɕ] to [k] is rather abrupt in (b), (c), and (d), while it is gradual in (a), where formants for [u] are visible right before the stop closure.

Beyond the consecutive devoicing environments, we observed 82 cases of devoiced vowels in our data, which include single devoicing sites and non-devoicing sites (n = 9370). Among them, 69 cases were vowels in word-initial morae. There were 58 cases of devoiced mid vowels ([o] = 37, [e] = 21), 20 cases of devoiced low vowels, and four cases of devoiced high vowels ([i] = 2, [u] = 2). Out of the 82 cases, 22 of the devoiced vowels were partially devoiced, either preceded or followed by a voiced portion.


Fig. 10. Waveforms and spectrograms of the word kishukusha ‘dormitory’, produced by four speakers. Devoiced vowels were observed in (a): ➀ and ➁ show examples of devoiced [i] and [u], respectively.

Fig. 11. An example of partial vowel devoicing of /o/ in sokai ‘evacuation’. The first portion of /o/ is devoiced.

Fig. 11 presents an example of a partially devoiced [o] in the word sokai 'evacuation': as seen, the devoiced portion of the vowel is followed by regular voicing. Our data revealed that the overwhelming majority of high vowel devoicing tokens (over 99%) were phonetically realized as vowel deletions. As seen in Fig. 10(c) and (d), the transition between [kʲ] and [ɕʷ] does not involve an apparent vocalic gesture, and the two consonants appear to form a consonant cluster. Further, visual inspection of the data revealed that these seemingly deleted vowels were typically accompanied by lengthening of the preceding stop release or frication (e.g., [kʲ] in Fig. 10(a) and (b) compared to (c) and (d)). This observation is consistent with Kondo (2005), who reported that consonants in devoiced morae were longer than consonants in morae with voiced vowels.

As described in the Introduction, deleted high vowels are perceptually easily recoverable due to their allophonic distribution, which involves strong spectral coloring on the preceding consonants: consonants which precede deleted /i/ typically displayed spectral coloring around 3000 Hz and above, which was clearly higher than its F2. On the other hand, consonants which precede deleted /u/ often displayed spectral coloring around 1500 Hz (which is roughly where its F2 is seen; Keating & Huffman, 1984), except for [ts] or [s], which showed little spectral coloring. As seen in Figs. 1, 2, 7, and 10(b), (c), and (d), the frication energy of consonants which precede deleted vowels appears to be robust and spectrally stable, and the spectral coloring of the deleted vowel often extends over the entire segment (rather than being confined to the edge of the segment, as in place coarticulation). Lastly, as noted in the Introduction, voiceless velar stops were usually produced as affricates when they precede deleted vowels (i.e., [kʲç]


before deleted [i], and [kx] before deleted [u]), which makes these segments acoustically more salient and more easily distinguishable from each other (see Figs. 7 and 10). This stop-to-affricate change also increases the duration of these consonants, and might contribute to our observation that phonetic vowel deletion was often accompanied by lengthening of the preceding consonant.

4. Discussion

4.1. Nature of consecutive devoicing and conditioning factors

Results from the binary analysis showed that of all the fixed-effect factors examined, the manner of articulation of the consonant following the second vowel in a consecutive devoicing site (= C3) is the strongest predictor of the likelihood of consecutive devoicing. It was followed by the presence of pitch accent within the devoicing site (= PA) and the quality of the vowel following the consecutive devoicing site (= V3). These three factors contributed significantly to the final model, while the other factors, namely the gender of the speaker, the moraic position of the devoicing site in a word, and the manner of articulation of the consonant preceding the second vowel in a consecutive devoicing site (= C2), did not show significant effects. The three significant factors (i.e., C3, PA, and V3) could be considered categorical or phonological, yet they all have a strong phonetic basis: different manners of articulation, namely stops versus fricatives and affricates, involve different amounts of airflow, and the realization of pitch accent in Japanese involves various phonetic factors such as longer mora duration and increased pitch (Haraguchi, 1988). Vowel quality is also known to affect vowel duration and oral airflow (Erickson, 2000; Higgins, Netsell, & Schulte, 1998). Further, the observed effect of V3 (i.e., the quality of the vowel that follows a consecutive devoicing site affects the likelihood of consecutive devoicing) suggests that consecutive devoicing is not simply due to overlap of gestures in low-level phonetic implementation, but is sensitive to the articulatory planning of following vowel gestures. Note that the clear effect of C3, combined with the lack of a significant effect of C2, confirms the previous finding that the likelihood of consecutive devoicing largely depends on whether the second vowel in the consecutive site can be devoiced, as the first vowel is most likely devoiced (Maekawa & Kikuchi, 2005).

Our gradient analysis revealed that the duration of voiced vowels in the devoicing environment is significantly influenced by the following factors (listed in order of effect size): (1) preceding consonant, (2) following consonant, (3) moraic position of the vowel, (4) pitch accent, and (5) vowel quality. The finding of a moraic position effect is consistent with previous studies (Kuriyagawa & Sawashima, 1989; Varden, 1998) as well as with the trend observed in our binary analysis, which showed that consecutive devoicing is more likely when the first devoiceable mora is in word-initial position. These results show that the phonetic realization of vowel devoicing in consecutive environments is determined by various phonetically motivated factors. More importantly, our gradient analysis revealed that there are clearly two types of processes involved in the realization of Japanese consecutive devoicing: (1) the continuous variation between the longest voiced vowels and the shortest (one or two glottal pulses only) voiced vowels, and (2) the relatively discrete (or quantal) alternation between the presence of the (almost invariably voiced) glottal energy source following the consonant and the supra-laryngeal (turbulent) energy source of the consonant without a voicing source.

4.2. Inter- and intra-speaker variability

Our results also revealed considerable inter- and intra-speaker variability in the likelihood of consecutive devoicing among the 24 participants. The average rate of consecutive devoicing per speaker (across all test words) ranged from 3% to 66%, and speaker as a random-effect factor was shown to be a crucial element in our mixed-effects modeling. This finding is consistent with Kuriyagawa and Sawashima (1989), who reported clear individual differences in the rate of consecutive devoicing among six speakers. Similarly, the gradient analysis showed that the realization of partial devoicing varied greatly across speakers. Previous studies have shown individual differences in laryngeal function (e.g., Hanson, 1997; Koenig, Mencl, & Lucero, 2005), indicating that speakers use different articulatory strategies to achieve the same phonological goals. The observed inter-speaker variability shows that the patterns of Japanese consecutive devoicing cannot be accounted for by a set of phonetic or phonological factors without factoring in individual differences.

Furthermore, our binary and gradient analyses revealed intra-speaker variation in consecutive devoicing patterns: the realization of (binary) complete consecutive devoicing for a given word was not always consistent within a speaker, and the distribution of vowel duration was widespread and gradient for all speakers. Although the observed inter-speaker variability by itself cannot challenge a categorical interpretation of consecutive devoicing, as many categorical (or phonological) phenomena can show inter-speaker variability, this intra-speaker variability of consecutive devoicing cannot be accounted for by a strictly categorical interpretation, and thus suggests that both the likelihood of complete consecutive devoicing and the realization of partial devoicing in consecutive sites are at least in part determined by continuous factors. This co-existence of continuous and categorical patterns within a speech community is consistent with Ellis and Hardcastle (2002). However, unlike the alveolar-to-velar assimilation in Ellis and Hardcastle (2002), none of the speakers in the current study demonstrated a strictly continuous or a strictly categorical pattern. In other words, despite the wide inter-speaker variability, all our speakers showed both continuous and categorical aspects of consecutive devoicing. This result of intra-speaker variability provides support for the dual-mechanism view of Japanese devoicing proposed by Tsuchida (1997) and Fujimoto (2004).
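The per-speaker rates reported above (3%–66%) are simple proportions of complete consecutive devoicing per participant. A minimal sketch of how such a summary could be tabulated is given below, assuming a hypothetical data frame bin_data with a participant column and a binary devoiced column (1 = complete consecutive devoicing, 0 = otherwise); this is for illustration only.

# Per-speaker rate of complete consecutive devoicing (hypothetical data frame).
rates <- tapply(bin_data$devoiced, bin_data$participant, mean)
round(100 * range(rates))  # spread across speakers, e.g., the reported 3%-66%
sort(round(100 * rates))   # speakers ranked by their rate of complete devoicing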


4.3. Consecutive devoicing and syllable structure

As discussed in Section 1, Kondo (2005) proposed that consecutive devoicing is blocked by Japanese syllable structure, and that consecutive devoicing is possible only if the devoiced morae can be resyllabified. According to her analysis, consecutive devoicing in word-initial position is phonologically ill-formed because it creates a trimoraic syllable (CCCV) that is unresyllabifiable. That is, simplex onset clusters (CCV) are bimoraic and are allowed word initially in Japanese, while complex onset clusters (CCCV) are trimoraic and are not allowed word initially. This view makes an explicit prediction regarding the distribution of consecutive devoicing, namely, that consecutive devoicing is favored in word-medial position compared to word-initial position, because a trimoraic syllable can be resyllabified into two syllables when it occurs word medially (e.g., #CV.CCCV > #CVC.CCV), while it cannot be resyllabified if it occurs word initially (#CCCV). Contrary to this prediction, our results showed a higher rate of consecutive devoicing in word-initial position, although the effect was not a significant predictor in our final binary model.

In addition, it is well known that Japanese is a mora-timed language, in which the basic phonological unit is the mora rather than the syllable or stress (Kubozono, 1989, 1999; Poser, 1990; Otake, Hatano, Cutler, & Mehler, 1993). Although it has been reported that there is a tendency towards equalizing the duration of morae in Japanese (Port, Dalby, & O'Dell, 1987), more recent studies suggest that the mora is not a timing unit, but rather an abstract unit which plays a structural role in Japanese phonology (Warner & Arai, 2001). If we consider the mora as the fundamental phonological unit in Japanese at which vowel devoicing takes place, the devoicing process may be viewed simply as a loss of mora sonority at the phonetic implementation level. In addition, if we assume the syllable structure proposed by Kondo, our finding of trimoraic consecutive devoicing would mean that some speakers produced a syllable with four morae, which is cross-linguistically highly marked (Gordon, 2006). Furthermore, except for geminates, onsets are not known to contribute to syllable weight, and an open syllable with a quadruple consonant cluster is an unlikely form of superheavy syllable. Based on these observations, it is unlikely that constraints motivated by Japanese syllable structure are responsible for the low likelihood of consecutive devoicing.

4.4. On the mechanism of Japanese devoicing

As described in Section 1, Japanese vowel devoicing has traditionally been considered a phonological feature-assimilation process operating on the basis of phonological constraints (e.g., McCawley, 1968; Kondo, 2005; Vance, 1987). Alternatively, Jun and Beckman (1993) argued that Japanese devoicing is a gradient phonetic process, and proposed a gestural-overlap account of devoicing based on Browman and Goldstein (1989). More recently, Tsuchida (1997) and Fujimoto (2004) have proposed that Japanese devoicing is a combination of both phonological and phonetic processes. Our results provide overall support for the dual-mechanism view of Japanese devoicing. The strongest predictors of consecutive devoicing were C3 and pitch accent, which are both phonetically and phonologically motivated.
The likelihood of consecutive devoicing was shown to vary continuously across phonetic/phonological environments as well as across and within speakers. At the same time, the data also showed that most high vowels in canonical single devoicing sites, as well as one of the two vowels in consecutive devoicing sites, are devoiced regularly with very little variability. These findings suggest that devoicing is categorical in environments that are optimal for devoicing, while devoicing is continuous in environments where one or more factors could reduce the likelihood of devoicing, such as the presence of pitch accent or a vowel surrounded by fricatives.

Given the low likelihood of consecutive devoicing, additional factors that do not affect single-site devoicing are apparently at work. Recall that our binary analysis showed a trend in which the rate of consecutive devoicing was higher when the first devoiceable vowel was in the word-initial mora. Our gradient analysis also showed an effect of moraic position, in which vowels were shorter in the first and third morae than in the second mora. Given the high likelihood of devoicing in word-initial position, the observed effect of moraic position might be a combination of a phonetically driven preference for word-initial devoicing and the OCP on devoiced vowels proposed by Tsuchida (2001), in which devoiced vowels are specified as [+s.g.] while voiceless consonants are not, and co-occurrence of the feature is restricted. Alternatively, we could speculate that prosodic constraint(s) motivated by the Japanese bimoraic foot may influence the likelihood of consecutive devoicing. That is, if Japanese has a constraint which penalizes a bimoraic foot without a voiced vowel or a sonority peak, its effect would be seen only in consecutive devoicing, and should be weakly realized in word-initial positions due to the preference for word-initial devoicing. Given that bimoraic foot boundaries often coincide with morpheme boundaries in Japanese, the previously reported effect of morpheme boundaries (e.g., Vance, 1987), which Yoshida (2004) later argues against, might be attributable to the effect of the bimoraic foot. Previous studies have shown the prominent status of the bimoraic foot in Japanese prosodic structure (e.g., Itô & Mester, 2007; Poser, 1990; Shinohara, 2000). Given that sonority is a prerequisite for prosodic prominence, it would not be surprising if consecutive devoicing, which would eliminate all voicing/sonority within a bimoraic foot, were penalized in Japanese. However, these constraints are most likely weakly realized for most speakers, and thus consecutive devoicing does occur when the environment surrounding the consecutive site has no other constraint that reduces the likelihood of devoicing. On the other hand, when combined with the constraint(s) specific to consecutive devoicing, those constraints which would not regularly block devoicing in single devoicing sites (e.g., pitch accent, a vowel between fricatives) are likely to block devoicing, resulting in the observed low likelihood.

Lastly, our data also revealed that the majority of complete devoicing tokens were realized as phonetic vowel deletion, without an apparent transition from (consonant) frication to a devoiced vowel (i.e., energy excited at the frequencies of the vowel formants by an aspiration source).
This finding is in agreement with previous physiological studies that showed the absence of vocalic gestures during devoiced /CVC/ sequences produced by Tokyo speakers (e.g., Fujimoto et al., 1998; Fujimoto, 2004; Sawashima, 1969), as well as with acoustic


studies that reported the absence of phonetically devoiced vowels (e.g., Beckman, 1982; Keating & Huffman, 1984; Ogasawara & Warner, 2009; Vance, 2008). Alternatively, one could argue that Japanese vowel devoicing is an endpoint of vowel reduction resulting from extreme overlap of consonant gestures (cf. Jun & Beckman, 1993), and that the spectral coloring of voiceless consonants arises from the reduction of vocalic gestures. Although this view might account for vowel devoicing in many other languages, it is unlikely to be the sole explanation for vowel devoicing in Tokyo Japanese. The strongest counterevidence for this view comes from the aforementioned physiological studies, which showed a single (mono-modal) glottal opening pattern during the devoiced /CVC/ sequence (e.g., Fujimoto et al., 1998; Fujimoto, 2004; Sawashima, 1969; Tsuchida, 1997). If vowel reduction were the basis of devoicing in Tokyo Japanese, we would expect to see a bimodal glottal opening pattern (which Fujimoto (2004) observed for Osaka speakers), even if the voicing is not realized acoustically. Secondly, Kondo (2005) reported that consonants in devoiced morae were longer than consonants in morae with voiced vowels, and we observed a similar pattern in our data. This is highly unlikely if devoicing is the result of vowel reduction. Further, our data showed that the spectral coloring of voiceless consonants does not necessarily match the formants of devoiced vowels: as discussed in Section 3.3, consonants which precede devoiced /i/ typically displayed spectral coloring around 3000 Hz and above, which was clearly higher than its F2 (see [k(i)] in Fig. 7). At the same time, our gradient analysis showed that many speakers indeed produced very short vowels in devoicing environments, which supports the overlap-and-reduction view of devoicing.

Taken together, these observations provide additional support for the dual-mechanism view of Japanese devoicing (Fujimoto, 2004; Tsuchida, 1997). As discussed in Section 1, Tsuchida (1997) noted that phonological devoicing is regular and complete, while phonetic devoicing is a result of gestural overlap or undershoot, which is irregular and gradient. In single devoicing sites, devoicing seems mostly categorical (or phonological) and is realized as phonetic vowel deletion. This type of devoicing must account for the single glottal opening pattern during the devoiced /CVC/ sequence observed in the previous physiological studies (e.g., Sawashima, 1969; Tsuchida, 1997; Fujimoto et al., 1998), which suggests that the vocalic gesture is not planned. At the same time, when the devoicing environment is not optimal due to factors such as the presence of pitch accent or a subsequent devoiceable vowel, the voicing gesture is planned, and is realized as a reduced or devoiced vowel. It is important to recall that there is little functional consequence of the phonetic deletion of high vowels in Japanese, given that those (phonetically) deleted high vowels were perceptually easily recoverable due to their allophonic distribution, which involves strong spectral coloring on the preceding consonants (Beckman & Shoji, 1984).3 This indicates that despite the absence of a vocalic gesture, devoiced vowels in Japanese are not deleted at the underlying level, and are unlikely to trigger sound changes that involve the categorical loss of vowels. At the same time, the phonetic realization of Japanese devoicing as phonetic vowel deletion could have theoretical implications for Japanese phonological structure.
One possible account is to consider Japanese devoicing as a loss of sonority at the mora level. As discussed earlier, if we consider the mora as the fundamental phonological unit at which Japanese devoicing takes place (instead of the segment/vowel, as traditionally assumed), devoicing could be viewed as a loss of mora sonority. Although this account may seem speculative and is a drastic departure from the conventional analysis of Japanese devoicing, given the attested importance of moraic representation in Japanese (e.g., Kubozono, 1989; Otake et al., 1993; Poser, 1990) as well as the clear (phonetic) vowel deletion patterns observed in our data and previous studies (e.g., Beckman, 1982; Fujimoto et al., 1998; Keating & Huffman, 1984; Tsuchida, 1997), we believe it is a possibility worth investigating in future research.

5. Conclusions

The current study investigated the linguistic nature of consecutive devoicing in Japanese by examining the roles played by various factors in its likelihood, inter- and intra-speaker variability, and phonetic realization. Mixed-effects modeling revealed that several phonetically and phonologically motivated factors simultaneously contribute to the likelihood of consecutive devoicing as well as to the pattern of partial devoicing. Wide intra- and inter-speaker variability was observed in both the binary and gradient analyses, suggesting that the realization of consecutive devoicing (for a given word) is not always consistent within a speaker, and that the relative weight of conditioning factors may vary across speakers. At the same time, the duration of vowels in the consecutive devoicing environment showed a bimodal distribution, indicating that the realization of consecutive devoicing cannot be strictly continuous. Taken together, the results demonstrate that consecutive devoicing in Japanese is both a phonetically driven continuous process and a phonologically driven categorical process.

Acknowledgments

This study was supported by an NSF Dissertation Improvement Grant (BCS-0547578, PI: Patricia Keating) and a UCLA Dissertation Year Fellowship. The author would like to thank Mary Beckman and Andries Coetzee for their constructive comments, and Yuko Abe and Peri Bhaskararao for their generous help with data collection in Japan.

3 Note that this functional load of (phonetic) vowel deletion is even smaller for words with consecutive devoicing sites, as they are at least trimoraic, and thus have few phonological neighbors. From a word recognition point of view, there is little need to recover the supposedly lost contrast between /i/ and /u/.


Appendix A. List of test words.

Word              Japanese      Moraic position   V1   V2   V3   Pitch accent   Mora #   C2–C3
chichikata        父方          1                 i    i    a    0              4        AS
chikuseki         蓄積          1                 i    u    e    0              4        SF
denshikiki        電子機器      3                 i    i    i    1              5        SS
fukushi           福祉          1                 u    u    i    1              3        SF
fukushikikokyuu   腹式呼吸      1                 u    u    i    1              7        SF
fukusuu           複数          1                 u    u    u    1              4        SF
fukutsu           不屈          1                 u    u    u    0              3        SA
futsukayoi        二日酔い      1                 u    u    a    0              5        AS
kashitsuchishi    過失致死      2                 i    u    i    1              5        FA
kashitsuke        貸し付け      2                 i    u    e    0              4        FA
kikikaesu         聞き返す      1                 i    i    a    1              5        SS
kikikomi          聞き込み      1                 i    i    o    0              4        SS
kikuchi-san       菊池さん      1                 i    u    i    0              5        SA
kishukusha        寄宿舎        1                 i    u    u    1              4        FS
kousakukikai      工作機械      4                 u    i    a    1              7        SS
kuchikiki         口利き        1                 u    i    i    0              4        AS
kushikatsu        串カツ        1                 u    i    a    0              4        FS
shikichi          敷地          1                 i    i    i    0              3        SA
shikikin          敷金          1                 i    i    i    1              4        SS
shikisha          指揮者        1                 i    i    a    1              3        SF
shikitsumeru      敷き詰める    1                 i    i    u    1              5        SA
shishutsu         支出          1                 i    u    u    0              3        FA
shouhishishutsu   消費支出      3                 i    i    u    1              3        FF
shukushaku        縮尺          1                 u    u    a    0              4        SF
shukushou         縮小          1                 u    u    o    0              4        SF
sukihoudai        好き放題      1                 u    i    o    1              6        SF
sukitouru         透き通る      1                 u    i    o    1              5        SS
tekishutsu        摘出          2                 i    u    u    0              4        SF
tsukisasu         突き刺す      1                 u    i    a    1              4        SF
zokushutsu        続出          2                 u    u    u    0              4        FA

References

Amano, S., & Kondo, T. (2003). Nihongo-no Goi-Tokusei (Lexical properties of Japanese) (Vols. 1–7), CD-ROM version. Tokyo: Sanseido.
Bates, D., Maechler, M., & Bolker, B. (2012). lme4: Linear mixed-effects models using S4 classes. R package version 0.999999-0. Computer software. Retrieved from 〈http://CRAN.R-project.org/package=lme4〉.
Beckman, M. E. (1982). Segment duration and 'mora' in Japanese. Phonetica, 39, 113–135.
Beckman, M. E. (1986). Stress and non-stress accent (Vol. 7). Walter de Gruyter.
Beckman, M. E. (1994). When is a syllable not a syllable? Ohio State Working Papers in Linguistics No. 44, Papers from the Linguistics Laboratory (pp. 50–69). Columbus, OH: The Ohio State University.
Beckman, M. E., & Shoji, A. (1984). Spectral and perceptual evidence for CV coarticulation in devoiced /si/ and /syu/ in Japanese. Phonetica, 41, 61–71.
Boersma, P., & Weenink, D. (2007). Praat: Doing phonetics by computer (Version 4.5.25) [Computer program]. 〈http://www.praat.org/〉 Retrieved 07.05.07.
Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201–251.
Clements, G. N. (1985). The geometry of phonological features. Phonology Yearbook, 2, 225–252.
Cruz-Ferreira, M. (1999). Portuguese (European). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet (pp. 126–130).
Dauer, R. M. (1980). The reduction of unstressed high vowels in Modern Greek. Journal of the International Phonetic Association, 10(1–2), 17–27.
Ellis, L., & Hardcastle, W. J. (2002). Categorical and gradient properties of assimilation in alveolar to velar sequences: Evidence from EPG and EMA data. Journal of Phonetics, 30, 373–396.
Erickson, M. L. (2000). Simultaneous effects on vowel duration in American English: A covariance structure modeling approach. The Journal of the Acoustical Society of America, 108, 2980.
Fais, L., Kajikawa, S., Amano, S., & Werker, J. F. (2010). Now you hear it, now you don't: Vowel devoicing in Japanese infant-directed speech. Journal of Child Language, 37(02), 319–340.
Fant, G. (1970). Acoustic theory of speech production. Walter de Gruyter.
Fujimoto, M. (2004). Effects of consonant type and syllable position within a word on vowel devoicing in Japanese. In Proceedings of Speech Prosody 2004 (pp. 625–628).
Fujimoto, M., Murano, E., Niimi, S., & Kiritani, S. (1998). Correspondence between the glottal gesture overlap pattern and vowel devoicing in Japanese. In Proceedings of ICSLP.
Gordon, M. (2006). Syllable weight: Phonetics, phonology, typology. New York: Routledge.
Han, M. S. (1962). Unvoicing of vowels in Japanese. Onsei no Kenkyuu, 10, 81–100.
Hanson, H. M. (1997). Glottal characteristics of female speakers: Acoustic correlates. Journal of the Acoustical Society of America, 101, 466–481.
Haraguchi, S. (1988). Pitch accent and intonation in Japanese. Autosegmental Studies on Pitch Accent, 123–150.
Higgins, M. B., Netsell, R., & Schulte, L. (1998). Vowel-related differences in laryngeal articulatory and phonatory function. Journal of Speech, Language and Hearing Research, 41(4), 712.
Hirata, Y., & Tsukada, K. (2009). Effects of speaking rate and vowel length on formant frequency displacement in Japanese. Phonetica, 66(3), 129–149.
Imaizumi, S., Hayashi, A., & Deguchi, T. (1995). Listener adaptive characteristics of vowel devoicing in Japanese dialogue. Journal of the Acoustical Society of America, 98, 768–778.
Imaizumi, S., Fuwa, K., & Hosoi, H. (1999). Development of adaptive phonetic gestures in children: Evidence from vowel devoicing in two different dialects of Japanese. Journal of the Acoustical Society of America, 106, 1033–1044.
Itô, J., & Mester, A. (2007). Prosodic adjunction in Japanese compounds. In Formal approaches to Japanese linguistics (Vol. 4, pp. 97–111).
Jun, S., & Beckman, M. E. (1993). A gestural-overlap analysis of vowel devoicing in Japanese and Korean. Paper presented at the 1993 annual meeting of the LSA, Los Angeles, 7–10 January.


Jurafsky, D., Bell, A., Gregory, M., & Raymond, W. D. (2001). Probabilistic relations between words: Evidence from reduction in lexical production. Typological Studies in Language, 45, 229–254.
Keating, P. A., & Huffman, M. K. (1984). Vowel variation in Japanese. Phonetica, 41, 191–207.
Kilbourn-Ceron, O. (2015). Categorical and variable allophony? The case of Japanese high vowel devoicing in spontaneous speech. McGill Working Papers in Linguistics, 25.
Kitahara, M. (2001). Category structure and function of pitch accent in Tokyo Japanese (Doctoral dissertation). Indiana University.
Koenig, L. L., Mencl, W. E., & Lucero, J. C. (2005). Multidimensional analysis of voicing offsets and onsets in female subjects. Journal of the Acoustical Society of America, 118, 2535–2550.
Komsta, L., & Novomestky, F. (2012). Moments, cumulants, skewness, kurtosis and related tests. R package version 0.13.
Kondo, M. (1997). Mechanisms of vowel devoicing in Japanese (Doctoral dissertation). University of Edinburgh.
Kondo, M. (2005). Syllable structure and its acoustic effects on vowels in devoicing environments. In J. van de Weijer, K. Nanjo, & T. Nishihara (Eds.), Voicing in Japanese (pp. 229–245). Mouton de Gruyter.
Kubozono, H. (1989). The mora and syllable structure in Japanese: Evidence from speech errors. Language and Speech, 32, 249–278.
Kubozono, H. (1999). Mora and syllable. In N. Tsujimura (Ed.), The handbook of Japanese linguistics (pp. 31–61). Malden, MA: Blackwell Publishers.
Kuriyagawa, F., & Sawashima, M. (1989). Word accent, devoicing and duration of vowels in Japanese. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics (Vol. 23, pp. 85–108). University of Tokyo.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2013). lmerTest: Tests for random and fixed effects for linear mixed effect models (lmer objects of lme4 package). R package version 2(6).
Maekawa, K. (1989). Boin no museika [Vowel devoicing] in Japanese. In M. Sugito (Ed.), Koza Nihongo to Nihongo Kyoiku, Vol. 2, Nihongo no Onsei, On'in (pp. 135–153). Tokyo: Meiji Shoin.
Maekawa, K., & Kikuchi, H. (2005). Corpus-based analysis of vowel devoicing in spontaneous Japanese: An interim report. In J. van de Weijer, K. Nanjo, & T. Nishihara (Eds.), Voicing in Japanese (pp. 205–228). Mouton de Gruyter.
Maekawa, K., Koiso, H., Furui, S., & Isahara, H. (2000). Spontaneous speech corpus of Japanese. In Proceedings of the second international conference on language resources and evaluation (LREC) (Vol. 2, pp. 947–952).
Martin, A., Utsugi, A., & Mazuka, R. (2014). The multidimensional nature of hyperspeech: Evidence from Japanese vowel devoicing. Cognition, 132(2), 216–228.
McCawley, J. D. (1968). The phonological component of a grammar of Japanese. The Hague: Mouton.
Mimatsu, K., Fukumori, T., Sugai, K., Utsugi, A., & Shimada, T. (1999). Vowel devoicing in Japanese: An acoustical investigation of consecutive vowel devoicing of disyllabic words in Tokyo Japanese. Journal of General Linguistics, (2), 73–101.
Ogasawara, N., & Warner, N. (2009). Processing missing vowels: Allophonic processing in Japanese. Language and Cognitive Processes, 24, 376–411.
Ostreicher, H. J., & Sharf, D. J. (1976). Effects of coarticulation on the identification of deleted consonant and vowel sounds. Journal of Phonetics, 4, 285–301.
Ota, M., Ladd, D. R., & Tsuchiya, M. (2003). Effects of foot structure on mora duration in Japanese? In Proceedings of ICPhS'03.
Otake, T., Hatano, G., Cutler, A., & Mehler, J. (1993). Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language, 32, 253–278.
Port, R. F., Dalby, J., & O'Dell, M. (1987). Evidence for mora timing in Japanese. Journal of the Acoustical Society of America, 81(5), 1574–1585.
Poser, W. J. (1990). Evidence for foot structure in Japanese. Language, 78–105.
Prince, A., & Smolensky, P. (1993). Optimality theory: Constraint interaction in generative grammar, ms. New Brunswick and Boulder: Rutgers University and University of Colorado.
R Development Core Team (2008). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. URL http://www.R-project.org, ISBN 3-900051-07-0.
Sawashima, M. (1969). Devoiced syllables in Japanese. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, 3, 35–41.
Shinohara, S. (2000). Default accentuation and foot structure in Japanese: Evidence from Japanese adaptations of French words. Journal of East Asian Linguistics, 9, 55–96.
Smith, C. L. (2003). Vowel devoicing in contemporary French. Journal of French Language Studies, 13, 177–194.
Sugito, M. (1982). Nihongo Akusento no Kenkyuu [Studies on Japanese accent]. Tokyo: Sanseido.
Takeda, K., Sagisaka, Y., & Kuwabara, H. (1989). On sentence-level factors governing segmental duration in Japanese. The Journal of the Acoustical Society of America, 86, 2081–2087.
Torreira, F., & Ernestus, M. (2011). Realization of voiceless stops and vowels in conversational French and Spanish. Laboratory Phonology, 2(2), 331–353.
Tsuchida, A. (1997). Phonetics and phonology of Japanese vowel devoicing (Doctoral dissertation). Cornell University.
Tsuchida, A. (2001). Japanese vowel devoicing: Cases of consecutive devoicing environments. Journal of East Asian Linguistics, 10, 225–245.
Umeda, N. (1975). Vowel duration in American English. The Journal of the Acoustical Society of America, 58, 434–445.
Van Santen, J. P. (1992). Contextual effects on vowel duration. Speech Communication, 11(6), 513–546.
Vance, T. J. (1987). An introduction to Japanese phonology. Albany, NY: State University of New York Press.
Vance, T. J. (2008). The sounds of Japanese. Cambridge University Press.
Varden, J. K. (1998). On high vowel devoicing in standard modern Japanese: Implications for current phonological theory (Doctoral dissertation). University of Washington.
Walker, D. C. (1984). The pronunciation of Canadian French. Ottawa, ON: University of Ottawa Press.
Warner, N., & Arai, T. (2001). The role of the mora in the timing of spontaneous Japanese speech. The Journal of the Acoustical Society of America, 109, 1144–1156.
Weitzman, R. S. (1969). Japanese accent: An analysis based on acoustic-phonetic data (Doctoral dissertation). University of California.
Yoshida, N. (2004). A phonetic study of Japanese vowel devoicing (Doctoral dissertation). Kyoto University.
Yoshida, N. (2006). A phonetic study of Japanese vowel devoicing. On'in Kenkyuu (Phonological Studies), 12, 173–180.
Speech segmentation in Japanese. Journal of Memory and Language, 32, 253–278. Port, R. F., Dalby, J., & O’Dell, M. (1987). Evidence for mora timing in Japanese. Journal of the Acoustical Society of America, 81(5), 1574–1585. Poser, W. J. (1990). Evidence for foot structure in Japanese. Language, 78–105. Prince, A., & Smolensky, P. (1993). Optimality theory: Constraint Interaction in generative grammar, ms. New Brunswick and Boulder: Rutgers University and University of Colorado. Sawashima, M. (1969). Devoiced syllables in Japanese. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, 3, 35–41. R Development Core Team. (2008). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, URL http://www.R-project.org, ISBN 3-900051-07-0. Shinohara, S. (2000). Default accentuation and foot structure in Japanese: Evidence from Japanese adaptations of French words. Journal of East Asian Linguistics, 9, 55–96. Sugito, M. (1982). Nihongo Akusento no Kenkyuu [Studies on Japanese accent]. Sanseido, Tokyo, Japan. Smith, C. L. (2003). Vowel devoicing in contemporary French. Journal of French Language Studies, 13, 177–194. Takeda, K., Sagisaka, Y., & Kuwabara, H. (1989). On sentence-level factors governing segmental duration in Japanese. The Journal of the Acoustical Society of America, 86, 2081–2087. Torreira, F., & Ernestus, M. (2011). Realization of voiceless stops and vowels in conversational French and Spanish. Laboratory Phonology, 2(2), 331–353. Tsuchida, A. (1997). Phonetics and phonology of Japanese vowel devoicing (Doctoral dissertation). Cornell University. Tsuchida, A. (2001). Japanese vowel devoicing: Cases of consecutive devoicing environments. Journal of East Asian Linguistics, 10, 225–245. Umeda, N. (1975). Vowel duration in American English. The Journal of the Acoustical Society of America, 58, 434–445. Van Santen, J. P. (1992). Contextual effects on vowel duration. Speech Communication, 11(6), 513–546. Vance, T. J. (1987). An introduction to Japanese phonology. Albany, NY: State University of New York Press. Vance, T. J. (2008). The sounds of Japanese. Cambridge University Press. Varden, J. K. (1998). On high vowel devoicing in standard modern Japanese: Implications for current phonological theory (Doctoral dissertation). University of Washington. Walker, D. C. (1984). The pronunciation of Canadian French. Ottawa, ON: University of Ottawa Press. Warner, N., & Arai, T. (2001). The role of the mora in the timing of spontaneous Japanese speech. The Journal of the Acoustical Society of America, 109, 1144–1156. Weitzman, R. S. (1969). Japanese accent: An analysis based on acoustic-phonetic data Doctoral dissertation. University of California. Yoshida, N. (2004). A phonetic study of Japanese vowel devoicing (Doctoral dissertation). Kyoto University. Yoshida, N. (2006). A phonetic study of Japanese vowel devoicing. On’in Kenkyuu (Phonological Studies), 12, 173–180.