Journal of Phonetics 40 (2012) 625–638
Journal of Phonetics journal homepage: www.elsevier.com/locate/phonetics
Weak voicing in fricative production
Cátia M.R. Pinho a,b,1, Luis M.T. Jesus a,b,*,2, Anna Barney c,3
a Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal
b School of Health Sciences (ESSUA), University of Aveiro, 3810-193 Aveiro, Portugal
c Institute of Sound and Vibration Research (ISVR), University of Southampton, Southampton SO17 1BJ, UK
Article info
Article history: Received 14 September 2011; received in revised form 13 June 2012; accepted 14 June 2012; available online 7 July 2012
Abstract
Understanding of the production mechanisms of voiced fricatives lags significantly behind that of other phonemic categories of speech. This paper presents a new voicing classification criterion to distinguish the voicing in fricatives from that of their contextual vowels in VCV tokens: weak vs strong voicing. The criterion is based on the oral airflow, distinguishing it from previous criteria based jointly on the acoustic and EGG signals. Aerodynamic and EGG recordings were analysed for four normal adult speakers (two females and two males) producing a speech corpus of 9 isolated words with the European Portuguese (EP) voiced fricatives /v, z, ʒ/ in word-initial, -medial and -final position, and the same 9 words embedded in 42 different real EP carrier sentences. Fricatives were characterised in terms of oral airflow, fundamental frequency, first formant intensity level and glottal open quotient, in absolute terms and relative to the values found in their surrounding vowels. The voicing during fricative production presented properties distinct from the voicing of the contextual vowels, leading to the development of a classification criterion based on the relative amplitude of the oscillations in the oral airflow signal. This helps to distinguish voicing in fricatives from the modal voicing of the vowels. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction
It is well known that voiced fricatives are less frequent than other classes of speech sounds in the world’s languages (Ohala & Solé, 2010; Solé, 2010), but the reason why is still an open problem (Shadle, 2010). The multiple aerodynamic, articulatory and acoustic interactions that govern the production principles involved are not as well understood as for their voiceless counterparts (Shadle, 2010). The phonetic realisation of voicing in different languages is highly variable, and most definitions of voicing are based on properties of the acoustic signal and use articulatory terms (Möbius, 2004, p. 5). Despite the fundamental interaction of voicing mechanisms with airflow, the differences in aerodynamic behaviour (Barney, Jesus, & Santos, 2008) have not been used, to our knowledge, to investigate voicing in vowel–fricative–vowel (VFV) productions. Stevens (1991) indicates that maintenance of voicing in fricatives can be problematic as the constriction required to
* Corresponding author at: Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, 3810-193 Aveiro, Portugal. Tel.: +351 234370530; fax: +351 234370545.
E-mail addresses: [email protected] (C.M.R. Pinho), [email protected] (L.M.T. Jesus), [email protected] (A. Barney).
1 Tel.: +351 234370530; fax: +351 234370545.
2 Tel.: +351 234401558; fax: +351 234401597.
3 Tel.: +44 2380592294; fax: +44 2380593190.
0095-4470/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.wocn.2012.06.002
produce frication noise acts to reduce the trans-glottal pressure drop. A strong noise source can only be achieved during fricative production at the expense of voice efficiency. If strong voicing is to be maintained, the amplitude of the noise source becomes small. Smith (1997) and Pirello, Blumstein, and Kurowski (1997) observed high percentages of devoicing for American English fricatives. Jesus and Shadle (2002, 2003) showed that the devoicing rate was very high for European Portuguese (EP) fricatives, especially when compared with studies of other languages, and that devoicing occurred more often in word-final than word-initial position. Both the acoustic signal and the electroglottograph (EGG) signal have been used in the past (Jesus & Shadle, 2002) to determine manually whether a fricative was devoiced. In a study of devoicing of EP fricatives, Jesus and Shadle (2003) used a criterion based on the ratio of variances of the EGG signal during the vowel–fricative (VF) transition and during the fricative to classify the examples into two categories (voiced/devoiced). Jesus and Jackson (2008) proposed, based on the acoustic signal, an automatic method for phonetic analysis of the durational characteristics of voicing and frication features in EP and British English. The final output was an objective annotation of voiced and unvoiced frication to 1 ms resolution, from which duration statistics were obtained. The oral airflow, however, being more closely related to the laryngeal mechanisms of voice production (in contrast to the more perceptually related acoustic output of the vocal tract), may offer
new insight into the maintenance or cessation of voicing under different phonological conditions. We therefore wished to explore the differences that might occur from phone to phone in the oral airflow during the production of fricatives in inter-vocalic contexts. Our previous work (Pinho, Jesus, & Barney, 2009) suggested that measurements of this kind might lead, in the longer term, to a more in-depth understanding of the conditions required for the maintenance or cessation of glottal vibrations during voiced fricative production. Accordingly, we report here the outcomes of recordings made with a Rothenberg (1973) mask, a device which provides a well-established and straightforward means to measure the oral airflow, together with recordings of laryngeal activity from an EGG signal. Our goal was to establish systematically the airflow characteristics under different phonetic conditions of voiced fricative production relative to those of a preceding and following contextual vowel. Specifically, we considered absolute values of oral airflow oscillation amplitude, fundamental frequency, first formant intensity level and glottal open quotient.
In addition to using a different measurement regime to previous studies, we asked our subjects to produce real EP words and phrases for our data collection. The main motivation for using real words in a rich variety of contexts (resulting in multiple within-word and cross-word interaction effects) was evidence from the literature (Fitch, 1990; Higgins, Netsell, & Schulte, 1994; Netsell, Lotz, & Shaughnessy, 1984) that “more natural speaking tasks … might reveal subtle speech production mechanisms … because of the different physiological demands placed upon the phonatory mechanism” (Higgins et al., 1994, p. 38). Fitch (1990) has shown that natural speaking tasks result in less between-trial variability than other speech. Further, Higgins et al. (1994, p. 39) suggest that “Speaking tasks that result in minimal normal variability while still approximating the dynamics of normal speech production would be the most desirable…”.
The next section of this paper describes in detail our data collection methods, the corpus used to elicit speech from our subjects and the analysis methods we adopted for the oral airflow and EGG signals. We then report the results of our measurements and analyse them in terms of the changes in parameters between phones in VCV sequences, with a statistical analysis to identify significant differences. We also report on the suitability of classical measures of devoicing defined for the acoustic signal and adapted here for application to the oral airflow. Finally, we identify a mode of voicing, weak voicing, prevalent during voiced fricatives, that differs from the more modal voicing found during the contextual vowels, and define a threshold value for the relative amplitude of the oral airflow oscillations between the fricative and its surrounding vowels which is characteristic of this voicing mode.
2. Data collection and analysis
2.1. Subjects
Data were collected from two adult female (JG and HV) and two adult male (LJ and RS) speakers of EP with an age range of 20–39 years. None of the subjects had reported speech, language or hearing disorders. They were assessed by an experienced Speech and Language Therapist (SLT) using a standardised evaluation protocol (Jesus et al., 2009) and all were found to have phonatory behaviour that lay within normal parameters.
2.2. Corpus
The corpus was designed so that a large number of real EP sentences (grammatically possible and appropriate) was used as the basis for the analysis.
Speakers were recorded producing 51 utterances:
– 9 isolated words containing the EP voiced fricatives /v, z, ʒ/ in word-initial, word-medial and word-final positions;
– the same 9 words embedded in 42 different carrier sentences of the form: “Diga X Y por favor”.
Here X is one of the 9 words and Y is a segment starting with a word whose initial phone was chosen to represent one of the possible consonantal (taps, laterals, stops and nasals) or vocalic (close, open front and back vowels) real EP contexts (see Appendix A for details of the words and carrier sentences used). The 9 isolated words containing the EP voiced fricatives /v, z, ʒ/ in word-initial, word-medial and word-final positions included vocalic contexts representing the different vowel heights used in EP: open, open-mid, close-mid and close. The carrier phrases represented a change from the carrier sentence “Diga X por favor” used by Jesus and Shadle (2002) and Lousada, Jesus, and Hall (2010). The vowels in the initial syllable of the first word in the Y segments were divided into two groups according to their height: group 1: /i, ɨ, u, e, o/; group 2: /ɛ, ɔ, ɐ, a/.
A rich variety of phonetic contexts using real EP words was selected to study the most relevant phoneme variants (Jesus & Shadle, 2002) and to describe fully the aerodynamic properties of EP fricatives. Words were chosen and sentences were built following language-specific phonological rules (Jesus & Shadle, 2002), for example: vowels /ɐ/ and /u/ can occur in the tonic syllable; vowels /ɐ/, /ɨ/ and /u/ can occur before and after the tonic syllable; the fricatives /v, z, ʒ/ can all occur in initial and medial positions; /ʒ/ is the only fricative that can occur in word-final position. Phonetically, any fricative can be found in word-final position as a consequence of deletion of unstressed vowels (Jesus & Shadle, 2002). The corpus was designed to include word and cross-word contexts that might elicit devoicing (or help maintain voicing) in fricatives. The following phonological environments were used.
– Environment 1 (files 001–009; 9 tokens): A baseline is established by producing the nine (9) words without a frame sentence. The fricatives are better controlled and easier to analyse than those occurring in frame sentences.
– Environment 2 (files 019–030; 12 tokens): Words with fricatives in final position are produced in a frame sentence where the word that follows the fricative has the following initial phonemic context: vowel from group 2 (/ɛ, ɔ, ɐ, a/) followed by a lateral (019, 023 and 027) and a tap (020, 024 and 028); vowel from group 1 (/i, ɨ, u, e, o/) followed by a lateral (021, 025 and 029) and a tap (022, 026 and 030).
– Environment 3 (files 031–039; 9 tokens): The nine (9) words with fricatives are embedded in the frame sentence “Diga X por favor” previously used by Jesus and Shadle (2002) and Lousada et al. (2010), to facilitate comparisons.
– Environment 4 (files 040–048; 9 tokens): Words with fricatives in final position are produced in a frame sentence where the word that follows the fricative has the following initial phonemic context: voiced velar stop (040, 043 and 046); vowel from group 1 (/i, ɨ, u, e, o/) followed by voiced velar stop (041, 044 and 047); vowel from group 2 (/ɛ, ɔ, ɐ, a/) followed by voiced velar stop (042, 045 and 048).
– Environment 5 (files 049–054; 6 tokens): Words with fricatives in final position are produced in a frame sentence where the word that follows the fricative has one of the following initial phonemic contexts: vowel from group 2 (/ɛ, ɔ, ɐ, a/) followed by nasal stop (049, 051 and 053); vowel from group 1 (/i, ɨ, u, e, o/) followed by nasal stop (050, 052 and 054).
– Environment 6 (files 055–060; 6 tokens): Words with fricatives in final position are produced in a frame sentence where the word that follows the fricative has one of the following initial phonemic contexts: vowel from group 2 (/ɛ, ɔ, ɐ, a/) followed by voiced postalveolar fricative (055, 057 and 059); vowel from group 1 (/i, ɨ, u, e, o/) followed by voiced postalveolar fricative (056, 058 and 060).
A number of different factors have been reported in the literature as having an influence on the maintenance of voicing. These have determined our choice of word and sentence contexts, namely:
– Place of articulation: voicing may cease earlier for more posterior places of articulation (Jesus & Shadle, 2002; Keating, Linker, & Huffman, 1983; Ohala & Riordan, 1979).
– Word position: Westbury and Keating (1985) showed that, from an aerodynamic point of view, a voiced obstruent is more likely to be produced in medial position, whereas devoiced items are more probable in utterance-initial and utterance-final position.
– Consonant duration: Ohala and Sprouse (2003) showed in a vented valve experiment that it was difficult to maintain voicing for longer than around 60 ms, i.e., the longer the consonant duration, the more probable it is that voicing ceases.
– Context: For English, Ohala and Riordan (1979) observed that obstruents coarticulated with high vowels maintained voicing longer than when coarticulated with low vowels. They explained the results by the enlarged pharyngeal cavity for high vowels. For German, Pape, Mooshammer, Hoole, and Fuchs (2006) confirmed this vowel-dependency, with a higher obstruent devoicing percentage when followed by any low vowel. This result was obtained even when the vowel-stop cluster spanned a word boundary.
However, despite the reported vowel-dependent results regarding obstruent voicing for other languages, this has not been observed in previous studies for EP fricatives. Jesus and Shadle (2002) found that “…there seems to be no particular vowel context that is primarily associated with devoicing…” and further that “…devoicing occurs more often in word-final than word-initial position, but is unrelated to a particular vowel context…”. Pape and Jesus (2011) analysed the effect of the height of the following vowel on devoicing and found no difference for three of the four speakers in their study. Although the evidence from Ohala and Riordan (1979) suggests that stops coarticulated with high vowels are less likely to
devoice than stops coarticulated with non-high vowels, Ohala (1983) reviewed additional experimental evidence related to voicing in stops and did not find “…much support for this in the phonological literature.” (Ohala, 1983, p. 198). Ohde’s (1984) analysis based on acoustic signals is one of the very few pieces of work that revealed a significant effect of vowels on the change of f0 from the first to the second glottal periods following VOT of American English stops. However, the degree of articulatory constraint (DAC) model of speech production predicts that the fricatives /z, ʒ/ (that we focus on in this study) are resistant to coarticulation effects (Recasens & Espinosa, 2009).
2.3. Recordings
Recordings were made in a quiet room (ABS-AUD.45.1, Absorsor, Portugal; 45 dB sound reduction) using a Rothenberg mask (Rothenberg, 1973) and a PTW-1 pressure transducer (Glottal Enterprises, USA) for measuring the airflow at the mouth. An EGG signal was also collected using an EGG processor (model EG2-PCX, Glottal Enterprises, USA). The oral airflow and EGG signals were recorded with an MS-110 electronics unit (Glottal Enterprises, USA), connected via an external sound card to a laptop computer running WaveviewPro Version 2.2.6 (16 bits, 44.1 kHz sampling frequency). Airflow calibration and zero-setting of signals were undertaken before each recording session using a Glottal Enterprises FC-1 airflow calibrator and WaveviewPro Version 2.2.6 standard procedures. Speakers were asked to read the words and phrases from a cue card placed in front of them with normal effort and as close as possible to their natural speech. Speakers spent a short time acclimatising to speaking wearing the mask prior to recording.
2.4. Corpus annotation
The time waveforms of all the corpus words were manually annotated using Praat Version 5.0.43 (Boersma, 2001) to detect the start of phone 1 (the vowel or silence preceding the fricative), the start and end of the fricative (phone 2) and the end of phone 3 (the vowel or silence following the fricative). Manual annotation was used to guarantee a consistent segmentation between utterances and subjects, as recommended by Jesus and Shadle (2002, p. 442). The start of phone 1 (preceding the fricative) was defined by the presence of periodicity in the airflow waveform (see Fig. 1) and checked against the spectrogram for the visible presence of
Fig. 1. Oral airflow signal. The vertical green lines represent the start of phone 1 and the end of phone 3, the surrounding vowels. The vertical blue lines represent the start and end of phone 2, the fricative. The bidirectional arrows represent the location and typical duration of the analysis windows. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
F2. The start of phone 2 (the fricative) was considered to occur when there was a simultaneous decrease in airflow amplitude (see Fig. 1), a cessation of voicing (for devoiced cases) and an onset of frication noise. For phone 3 the start was defined as the point where one or all of the following criteria were satisfied: frication noise ceased; voicing restarted (for devoiced or partially devoiced phone 2); there was an increase in oral airflow amplitude (see Fig. 1). The end of phone 3 was established by listening to the time-derivative of the airflow and checking the spectrogram for the presence of F2.
For each metric discussed below, average values were calculated from 20 ms windows centred within phones 1, 2 and 3 (see Fig. 1). We chose a window duration of 20 ms based on previous work by Stevens, Blumstein, Glicksman, Burton, and Kurowski (1992) and Pirello et al. (1997). Both these studies suggest that in any fricative perceived as voiced there will be a minimum of 20 ms where periodicity can be detected in the acoustic speech signal. In vowels this corresponds approximately to a steady-state portion of the utterance. In the fricatives the signal is not always stationary during this analysis window, but it nevertheless represents a portion of the phone which excludes the phonemic transition regions.
2.5. Absolute measures
Oral airflow (OA, cm³/s): The mean value of the amplitude of the oscillations in the oral airflow over the analysis window was extracted for phones 1, 2 and 3 from all the recordings and for every speaker. Matlab scripts were used to extract the peaks and troughs of the signal cycle-by-cycle. The mean amplitude of the oral airflow fluctuations within each analysis window was obtained by averaging the cycle-by-cycle values. We describe this measure as the absolute oral airflow (OA).
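The windowing and cycle-by-cycle averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab scripts: the function names, the definition of oscillation amplitude as the drop from each peak to the following trough, and the simple turning-point detection (which assumes a smooth, low-noise signal) are all assumptions made for the example.

```python
import math


def centred_window(start_s, end_s, width_s=0.020):
    """Return (t0, t1) for a window of width_s seconds centred within a phone."""
    midpoint = 0.5 * (start_s + end_s)
    return midpoint - width_s / 2.0, midpoint + width_s / 2.0


def local_extrema(samples):
    """Return index lists (peaks, troughs) of the local maxima and minima."""
    peaks, troughs = [], []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        if cur > prev and cur >= nxt:
            peaks.append(i)
        elif cur < prev and cur <= nxt:
            troughs.append(i)
    return peaks, troughs


def mean_oscillation_amplitude(flow, fs, start_s, end_s, width_s=0.020):
    """Mean peak-to-trough amplitude inside the centred analysis window.

    Assumption: one cycle's amplitude is the peak value minus the value of
    the next trough. Returns 0.0 when no complete peak-trough pair is found.
    """
    t0, t1 = centred_window(start_s, end_s, width_s)
    i0 = max(0, int(round(t0 * fs)))
    i1 = min(len(flow), int(round(t1 * fs)))
    segment = flow[i0:i1]
    peaks, troughs = local_extrema(segment)
    amplitudes = []
    for p in peaks:
        later_troughs = [t for t in troughs if t > p]
        if later_troughs:
            amplitudes.append(segment[p] - segment[later_troughs[0]])
    if not amplitudes:
        return 0.0
    return sum(amplitudes) / len(amplitudes)


# Synthetic check: a 100 Hz oscillation of +/-50 cm^3/s around 200 cm^3/s.
fs = 10000
flow = [200.0 + 50.0 * math.sin(2.0 * math.pi * 100.0 * n / fs)
        for n in range(1000)]
oa = mean_oscillation_amplitude(flow, fs, start_s=0.0, end_s=0.1)
```

On real airflow recordings some smoothing or a minimum peak separation would probably be needed before the turning points are located, since frication noise adds spurious local extrema.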
Fundamental frequency (f0, Hz): Absolute mean fundamental frequency was calculated for the analysis windows of the three phones from all the recordings and for every speaker using the Praat Version 5.0.43 autocorrelation method with the following parameters: 0 s time step; 75 Hz pitch floor; 600 Hz pitch ceiling.
First formant intensity level (IF1, dB/Hz): The amplitude and decay rate of the oscillatory patterns in the oral airflow are related to the amplitude and broadness of the corresponding formant peaks in the spectrum of the acoustic speech signal (Hertegard & Gauffin, 1995). We therefore refer, in this study, to the first spectral peak extracted from the oral airflow as the first formant (F1) and use an extraction method designed for use with the acoustic signal. A Praat Version 5.0.43 split Levinson algorithm was applied to the oral airflow signal to extract the absolute first formant frequency track (F1) sampled over the analysis windows for each phone with the following parameters: time step of 25% of analysis window length; maximum number of formants 5; maximum formant frequency 5500 Hz; 0.025 s analysis window length; 50 Hz pre-emphasis. This gave an estimate of F1 at each sampling point. Next, a spectrogram was generated from the raw oral airflow and the power spectral density (PSD) (cm⁶ s⁻²/Hz) was extracted at the F1 frequency corresponding to each sampling point. The parameters for the spectrogram were: 0.005 s window length; maximum frequency 5000 Hz; 0.002 s time step; 20 Hz frequency step; Gaussian window shape. The mean F1 intensity for the window was calculated by averaging over all PSD estimates. The mean F1 intensity level (IF1) is the mean F1 intensity expressed in dB.
Open quotient (OQ, %): The open quotient, a parameter used to characterise the nature of the quasiperiodic vocal fold vibrations, is defined as the ratio of the duration of the glottal open phase to that of the fundamental period.
OQ changes can be correlated to physiological constraints in different phonation types (Bouzid & Ellouze, 2007). The mean open quotient over the analysis windows for each phone was extracted from the EGG signal using custom Matlab scripts based on the open source software MOQ interface (Henrich, 2007). The method DEGG DECOM, based on the raw and differentiated EGG (DEGG) signals, was used to estimate f0 and OQ. Henrich, Alessandro, Doval, and Castellengo (2004) reported that this method was an improvement on those proposed by Rothenberg and Mahshie (1988) and Howard (1995). For each phone we extracted every valid OQ value to obtain the average OQ over the analysis window. In cases where the peaks in the DEGG do not stand out clearly and the OQ cannot be calculated reliably, it does not make sense to talk of an OQ at all (Henrich, 2007). Taking this into account, zero and nonsense OQ values (e.g., corresponding to f0 values greater than 400 Hz) were not considered. To be consistent we did not make an OQ analysis when phone 1 or phone 3 was silence.
2.6. Relative measures
There is compelling evidence (Ladefoged, 2005; Ladefoged & Johnson, 2011; Lahiri, Gewirth, & Blumstein, 1984) that absolute values are not linguistically relevant and that they only “convey information about the speaker’s age, sex, emotional state, and attitude toward the topic under discussion” (Ladefoged & Johnson, 2011, p. 24). Ladefoged (2005) hypothesises that the goals of speech production are to achieve certain aerodynamic targets and shows that the phonetic property stress is correlated to relative (to adjacent syllables) respiratory energy. For bilabial and dental/alveolar stops it has long been known (Lahiri et al., 1984) that they may be differentiated effectively by time-varying and relative acoustic properties. We therefore consider here not only the absolute values of the parameters defined in Section 2.5 but also their relative values between phones.
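As a rough illustration of such a relative measure, the sketch below expresses the change in a windowed mean between the fricative and an adjacent vowel as a percentage of the vowel value. The function names and the example numbers are hypothetical, and the normalisation by the vowel mean is an assumption about the intended form rather than a statement of the authors' implementation.

```python
def relative_change_percent(vowel_mean, fricative_mean):
    """Percentage drop from the vowel mean to the fricative mean.

    Assumption: the reference value is the mean over the 20 ms window of the
    adjacent vowel (phone 1 or phone 3), and the fricative is phone 2.
    """
    if vowel_mean == 0:
        raise ValueError("vowel mean must be non-zero (e.g. phone is silence)")
    return (vowel_mean - fricative_mean) / vowel_mean * 100.0


def relative_measures(mean_phone1, mean_phone2, mean_phone3):
    """Return (phone(1-2) %, phone(2-3) %) for one token and one metric."""
    return (
        relative_change_percent(mean_phone1, mean_phone2),
        relative_change_percent(mean_phone3, mean_phone2),
    )


# Hypothetical oral airflow amplitudes (cm^3/s) for one VCV token.
rel_12, rel_23 = relative_measures(300.0, 75.0, 250.0)
```

A positive value means the metric is lower in the fricative than in the neighbouring vowel; the same function can be applied to OA, f0, IF1 or OQ window means.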
For each metric described above (OA, f0, IF1 and OQ) a relative measure between phones 1 and 2 (phone (1–2)%) and between phones 2 and 3 (phone (2–3)%) was calculated using (Pinho et al., 2009)

$$\mathrm{Phone}\,(1\text{--}2)\,(\%) = \frac{\mathrm{Mean}(\mathrm{Phone}\ 1)_{W\,20\,\mathrm{ms}} - \mathrm{Mean}(\mathrm{Phone}\ 2)_{W\,20\,\mathrm{ms}}}{\mathrm{Mean}(\mathrm{Phone}\ 1)_{W\,20\,\mathrm{ms}}} \times 100 \qquad (1)$$

$$\mathrm{Phone}\,(2\text{--}3)\,(\%) = \frac{\mathrm{Mean}(\mathrm{Phone}\ 3)_{W\,20\,\mathrm{ms}} - \mathrm{Mean}(\mathrm{Phone}\ 2)_{W\,20\,\mathrm{ms}}}{\mathrm{Mean}(\mathrm{Phone}\ 3)_{W\,20\,\mathrm{ms}}} \times 100 \qquad (2)$$

2.7. Analyses
Voicing classification: A classification of the voicing category of fricatives was proposed by Jesus and Shadle (2002) and has been adapted for this study. They used an acoustic speech signal, recorded with a microphone located 1 m in front of the subject’s mouth, and an EGG signal. A study by Pirello et al. (1997) used an alternative measure of presence or absence of voicing based on the acoustic signal: when there was a difference in amplitude greater than 10 dB between a fricative and its preceding vowel, the fricative was classified as devoiced. It was not evident a priori, however, whether the same threshold value would be suitable for use with the oral airflow. In this study we used the durations proposed by Jesus and Shadle (2002) for voicing classification but applied them to the oral airflow and EGG signals. Their criteria, adapted to our study, may be summarised as follows. A fricative was called:
– devoiced when less than one-third of the frication interval showed periodic structure;
– partially devoiced when more than one-third but less than half of the frication interval contained steady periodic cycles;
– voiced when more than half of the frication interval showed steady periodic cycles, even if the amplitude was much lower than in the vowel.
This classification was automatically extracted with a Praat Version 5.0.43 script developed for this purpose. When the EGG and oral airflow-based estimates disagreed (a minority of cases), a visual inspection of the signals was performed. Most of these disagreements happened for signals where the proportion of voicing was close to the category margin. In all cases of disagreement the oral airflow-based estimate was given precedence.
Analysis-of-variance (ANOVA): Repeated measures 2-way analysis of variance was used to determine the statistical significance of differences between mean metrics for phones 1, 2 and 3 at the 95% level. Factors were subject (4 levels, random effect), phone (3 levels, repeated measure) and fricative ID (3 levels, repeated measure). The dependent variable was one of OA, f0, IF1 and OQ. The repeated measures design was required because mean values of a given metric for phones 1, 2 and 3 for a given subject and fricative may not be independent. The ANOVA was implemented using the algorithm RMAOV2 written in Matlab (Trujillo-Ortiz, Hernandez-Walls, & Trujillo-Perez, 2004). In the case of each of the dependent variables, analysis of the data sets did not support an assumption of sphericity according to Mauchly’s test (Mauchly, 1940; Trujillo-Ortiz & Hernandez-Walls, 2003), so prior to significance testing the number of degrees of freedom for each factor was adjusted using the Huynh–Feldt epsilon (Huynh, 1978; Trujillo-Ortiz, 2006).
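The duration-based voicing categories of Section 2.7 lend themselves to a short sketch. The Python below is illustrative only and is not the authors' Praat script: the function names are hypothetical, the inputs are assumed to be the periodic and total frication durations of one token, and the treatment of values lying exactly on a boundary (one-third or one-half) is an assumption, since the criteria in the text do not specify it.

```python
def classify_voicing(voiced_duration_s, frication_duration_s):
    """Classify a fricative from the proportion of its frication that is periodic.

    Assumption: a proportion of exactly 1/3 counts as partially devoiced and
    exactly 1/2 counts as voiced; the source leaves these edge cases open.
    """
    if frication_duration_s <= 0:
        raise ValueError("frication duration must be positive")
    proportion = voiced_duration_s / frication_duration_s
    if proportion < 1.0 / 3.0:
        return "devoiced"
    if proportion < 0.5:
        return "partially devoiced"
    return "voiced"


def resolve_label(airflow_label, egg_label):
    """Combine the two estimates, giving the oral airflow estimate precedence.

    The second return value flags disagreement so that the token can be
    inspected visually, as was done in the study.
    """
    return airflow_label, airflow_label != egg_label


# Hypothetical token: 100 ms of frication, 40 ms of it periodic in the airflow
# and 60 ms of it periodic in the EGG.
airflow_label = classify_voicing(0.040, 0.100)
egg_label = classify_voicing(0.060, 0.100)
final_label, needs_inspection = resolve_label(airflow_label, egg_label)
```

Returning the disagreement flag alongside the label keeps the precedence rule and the visual-inspection step separate, which mirrors the procedure described in the text.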
3. Results
3.1. Voicing classification
A classification of the voicing category of fricatives produced by the four speakers was made using an adapted version of the criteria (see Section 2.7) proposed by Jesus and Shadle (2002). Only 21% (/v/ 7%; /z/ 11%; /ʒ/ 3%) of recorded fricatives were classified as devoiced or partially devoiced (see Table 1). This is a much lower percentage of tokens classified as devoiced and partially devoiced than was reported in Jesus and Shadle (2002) for their corpus (/v/ 61%; /z/ 89%; /ʒ/ 91%). However, as described above, they based their classification on the acoustic speech signal and the EGG signal, whereas we used the oral airflow and EGG. Our data suggest that when a voicing decision is based on the oral airflow, the large majority of fricative tokens will be classified as voiced.

Table 1. Voicing category classification of fricatives /v, z, ʒ/, based on Jesus and Shadle’s (2002) criteria.

Speaker        Category             /v/ (%)   /z/ (%)   /ʒ/ (%)   /v, z, ʒ/ (%)
LJ ♂           Devoiced             0         6         0         2
               Partially devoiced   0         12        6         6
               Voiced               100       82        94        92
RS ♂           Devoiced             12        0         0         4
               Partially devoiced   0         6         0         2
               Voiced               88        94        100       94
JG ♀           Devoiced             0         6         0         2
               Partially devoiced   12        6         6         8
               Voiced               88        88        94        90
HV ♀           Devoiced             0         0         0         0
               Partially devoiced   6         12        0         6
               Voiced               94        88        100       94
All (♂ and ♀)  Devoiced             3         3         0         2
               Partially devoiced   4         9         3         5
               Voiced               93        88        97        93

3.2. Oral airflow waveforms
Figs. 2–4 show the oral airflow waveforms across phones 1, 2, and 3 for male speaker LJ. We observed that the oral airflow during phone 2 (the fricative) had a lower oscillation amplitude than during the preceding and following phones (when these were voiced). Further, during the fricative the harmonic complexity of the signal was reduced; in particular the formant excitation, frequently visible in the surrounding vowels, was rarely visible in the fricative. Measurements for other subjects showed the same overall patterns of reduced oscillation amplitude and harmonic complexity of the oral airflow in phone 2 compared to phones 1 and 3, although the fine details of the voicing pattern for different utterances often differed substantially between subjects.
3.3. Absolute oral airflow
The values for the mean absolute oral airflow and its standard deviation are shown in Table 2 for each of the 4 subjects. The absolute values for the oscillation amplitude varied widely between subjects, although the general pattern of reduced oral airflow oscillation amplitude during phone 2 was common to all subjects. An analysis of variance, as described in Section 2.7, was used to investigate the statistical significance of differences between the mean absolute oral airflow values in phones 1, 2 and 3. There was a significant difference in the mean absolute oral airflow measured between phones 1, 2, and 3 after accounting for the differences between subjects (F(2,6) = 23.251, p = 0.02).
Post hoc testing based on the 95% confidence interval for the difference in the means showed that for subjects LJ, RS and HV phones 1 and 2 have significantly different mean oral airflow, phones 2 and 3 have significantly different mean oral airflow and phones 1 and 3 have significantly different mean oral airflow. For subject JG the contrast between phones 1 and 3 was not significant for the mean oral airflow. There was no significant difference in mean oral airflow for the factor fricative ID once differences due to the subjects had been accounted for (F(2,6) = 1.445, p = 0.32). For the interaction effect fricative ID vs. phone, the overall contribution to the variance was not significant at the 95% level (F(4,12) = 0.365, p = 0.83), indicating that the significant main effect found for phone was not modified by the choice of fricative.
3.4. Absolute fundamental frequency (f0)
The raw data for the estimates of f0 from the oral airflow show the expected gender-related differences in mean f0. The mean absolute f0 estimated from the oral airflow for each subject and its standard deviation are shown in Table 3. The mean fundamental frequency derived for phone 2 was lower than for either phone 1 or phone 3, although each subject had a minority of cases where f0 stayed the same or increased during phone 2. An analysis of variance, as described in Section 2.7, was used to investigate the statistical significance of differences between the mean f0 values in phones 1, 2 and 3. There was a significant difference in the mean f0 measured between phones 1, 2, and 3 after accounting for the differences between subjects (F(2,6) = 31.118, p ≤ 0.001). Post hoc testing
Fig. 2. Oral airflow waveforms from words with the fricative /v/ (17 files), recorded by male speaker LJ. The green vertical lines indicate the start of phone 1 and the end of phone 3. The blue vertical lines represent the phone 1–phone 2 and phone 2–phone 3 boundaries where phone 2 is the fricative. The x and y axes were normalised. See also Appendix B. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Oral airflow waveforms from words with the fricative /z/ (17 files), recorded by male speaker LJ. The green vertical lines indicate the start of phone 1 and the end of phone 3. The blue vertical lines represent the phone 1–phone 2 and phone 2–phone 3 boundaries where phone 2 is the fricative. The x and y axes were normalised. See also Appendix B. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
based on the 95% confidence interval for the difference in the means shows that for subjects LJ and HV there were significant differences in mean f0 between phones 1 and 2 and between
phones 2 and 3 but not between phones 1 and 3. Subject RS had no significant differences in mean f0 between any phone pairs and subject JG had a significant difference between phones 1 and
Fig. 4. Oral airflow waveforms from words with the fricative /ʒ/ (17 files), recorded by male speaker LJ. The green vertical lines indicate the start of phone 1 and the end of phone 3. The blue vertical lines represent the phone 1–phone 2 and phone 2–phone 3 boundaries where phone 2 is the fricative. The x and y axes were normalised. See also Appendix B. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 2. Mean ± std (standard deviation) absolute values of oral airflow (cm3/s).

          /v/                                /z/                                /ʒ/
Subject   Phone 1    Phone 2    Phone 3     Phone 1    Phone 2    Phone 3     Phone 1    Phone 2    Phone 3
♂ LJ      297 ± 43   107 ± 27   243 ± 41    297 ± 34   76 ± 58    241 ± 17    264 ± 20   72 ± 32    239 ± 31
♂ RS      459 ± 60   106 ± 43   376 ± 73    407 ± 66   147 ± 57   358 ± 64    497 ± 57   129 ± 50   372 ± 76
♀ JG      174 ± 54   40 ± 23    131 ± 43    145 ± 24   27 ± 15    126 ± 35    126 ± 19   49 ± 20    113 ± 40
♀ HV      284 ± 58   56 ± 24    229 ± 64    295 ± 55   38 ± 22    208 ± 73    251 ± 26   59 ± 25    214 ± 76
2 only. Note that despite gender difference, LJ and HV had similar f0 strategies across the phones, although their absolute values were different. There was no significant difference in f0 for the factor fricative ID once differences due to the subjects had been accounted for (F(2,6) = 2.443, p = 0.17). For the interaction effect fricative ID vs. phone, the overall contribution to the variance was significant (F(4,12) = 3.792, p = 0.03). The significant main effect found for phone was qualified according to the choice of fricative. After subject differences were accounted for, contrasts based on the 95% confidence interval of difference in the mean values showed that for each of /v/, /z/ and /ʒ/ phones 1 and 2 were significantly different, phones 2 and 3 were also significantly different, but phones 1 and 3 were not significantly different.

Table 3. Mean ± std (standard deviation) absolute values of f0 (Hz).

          /v/                                /z/                                /ʒ/
Subject   Phone 1    Phone 2    Phone 3     Phone 1    Phone 2    Phone 3     Phone 1    Phone 2    Phone 3
♂ LJ      137 ± 5    128 ± 5    133 ± 4     137 ± 7    125 ± 6    133 ± 6     134 ± 5    124 ± 5    135 ± 6
♂ RS      117 ± 18   110 ± 18   116 ± 14    109 ± 13   104 ± 13   112 ± 11    100 ± 8    95 ± 9     104 ± 13
♀ JG      217 ± 20   202 ± 17   212 ± 18    209 ± 15   193 ± 18   208 ± 14    205 ± 16   199 ± 21   205 ± 22
♀ HV      204 ± 21   188 ± 16   201 ± 18    206 ± 19   189 ± 15   205 ± 10    201 ± 21   190 ± 21   204 ± 24
3.5. Absolute IF1 intensity

The pattern of variation of IF1 between phones 1, 2 and 3 was much more diverse and frequently F1 could not be estimated from the phone 2 analysis window. Only subject RS produced measurable F1 values for the majority of phone 2 tokens. For the other subjects and in particular the female ones, F1 for phone 2 was substantially reduced or absent for a majority of measured utterances.
Table 4 shows the mean and standard deviation of IF1, where they were measurable, for each subject. The symbol “–” (in Table 4) for JG’s phone 2 fricative /z/ indicates that this subject produced no measurable F1 in any production of this fricative. An analysis of variance, as described in Section 2.7, was used to investigate the statistical significance of differences between the mean IF1 values in phones 1, 2 and 3. There was an overall significant difference in the mean IF1 measured between phones 1, 2, and 3 after accounting for the differences between subjects (F(2,6) = 49.354, p = 0.004). Post hoc testing based on the 95% confidence interval for the difference in the means shows that for all subjects, phones 1 and 2 had significantly different mean IF1 as did phones 2 and 3. Phones 1 and 3, however, did not have significantly different IF1. To clarify this finding: the IF1 of the fricative, when measurable, always differed significantly from that of the surrounding phones; in the majority of tokens, however, IF1 was not measurable. There was no significant difference in IF1 for the factor fricative ID once differences due to the subjects had been accounted for (F(2,6) = 1.028, p = 0.43). For the interaction effect fricative vs. phone, the overall contribution to the variance was not significant at the 95% level (F(4,12) = 2.041, p = 0.15), indicating that the significant main effect found for phone was not modified by the choice of fricative.

3.6. Open quotient (OQ)

Mean OQ values over the data analysis windows extracted from the EGG data are shown in Table 5 with their standard deviations. Only data from those cycles of the EGG where a valid OQ could be extracted by the analysis software were used. In a majority of cycles the value of f0, extracted from the EGG signal, could not be determined and therefore OQ could not be calculated. Where OQ could be determined, most subjects showed a pattern of slightly increasing OQ on average during the fricatives, though JG had a decrease in OQ in all cases. An analysis of variance, as described in Section 2.7, was used to investigate the statistical significance of differences between the mean OQ values in phones 1, 2 and 3. Unlike the other dependent variables considered in this study, the difference between subjects for the OQ was not significant at the 95% level (F(2,6) = 0.239, p = 0.68). Furthermore, for the OQ, none of the other effects was significant at the 95% level (Fricative ID: F(2,6) = 2.339, p = 0.22; Fricative ID vs phone: F(4,12) = 0.918, p = 0.48).
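The per-cycle OQ calculation and the exclusion of cycles without a valid estimate can be illustrated with a minimal sketch. It assumes the usual definition of OQ as the open-phase duration divided by the fundamental period, and it assumes that glottal closing and opening instants have already been detected. The function names and the example instants are hypothetical and are not taken from the analysis software used in this study.

```python
from statistics import mean


def open_quotients(closing_instants, opening_instants):
    """Return one OQ value (%) per glottal cycle, or None when the cycle has
    no single, unambiguous opening instant (no valid OQ for that cycle)."""
    values = []
    # A cycle is taken to run from one closing instant to the next.
    for start, end in zip(closing_instants, closing_instants[1:]):
        period = end - start
        openings = [t for t in opening_instants if start < t < end]
        if period <= 0 or len(openings) != 1:
            values.append(None)
            continue
        open_phase = end - openings[0]  # glottis open until the next closure
        values.append(100.0 * open_phase / period)
    return values


def mean_open_quotient(closing_instants, opening_instants):
    """Average OQ over the cycles that yielded a valid estimate."""
    valid = [v for v in open_quotients(closing_instants, opening_instants)
             if v is not None]
    return mean(valid) if valid else None


# Hypothetical instants in ms; the third cycle has no detectable opening.
closings = [0, 8, 16, 24]
openings = [4, 11]
print(open_quotients(closings, openings))
print(mean_open_quotient(closings, openings))
```

In this sketch a cycle without a clear opening peak contributes nothing to the mean, which mirrors the restriction to cycles where a valid OQ could be extracted.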
3.7. Relative measures

The mean relative oral airflow and its standard deviation, for each subject, are shown in Table 6. The mean value for phone (1–2)% was calculated by taking the mean oral airflow oscillation amplitude for each token for a given subject and fricative, substituting the absolute oral airflow values into Eq. (1) and then averaging over all the phone (1–2)% values so calculated. A corresponding process was undertaken for phone (2–3)% for each subject and fricative using Eq. (2). There was no significant difference between the relative oral airflow values of males and females when tested by a t-test (t(22) = 1.356, p = 0.189). The relative mean f0 values (calculated from Eqs. (1) and (2) analogously to the method for oral airflow) varied from 3% to 13%, except for the female speaker JG who seemed to use a different f0 adjustment during fricative production with much larger relative changes, as shown in Table 7. The difference in relative measures between genders was significant for f0 (t(22) = 3.297, p = 0.003) even when the data for subject JG was excluded from the test.
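The token-wise averaging described above can be sketched as follows. Eqs. (1) and (2) are defined earlier in the paper and are not reproduced in this section, so the sketch assumes that they express the reduction in amplitude from the contextual vowel to the fricative as a percentage of the vowel value. This assumed form is consistent with the tabulated means (for LJ's /v/, (297 − 107)/297 ≈ 64%, the phone (1–2)% value in Table 6). The function names and the token values are hypothetical.

```python
from statistics import mean, stdev


def relative_reduction(vowel_value, fricative_value):
    """Assumed form of Eqs. (1) and (2): reduction from the vowel to the
    fricative, as a percentage of the vowel value."""
    return 100.0 * (vowel_value - fricative_value) / vowel_value


def summarise_tokens(tokens):
    """tokens: list of (phone1, phone2, phone3) mean amplitudes, one per token.
    Returns ((mean, std) for phone (1-2)%, (mean, std) for phone (2-3)%)."""
    r12 = [relative_reduction(p1, p2) for p1, p2, _ in tokens]
    r23 = [relative_reduction(p3, p2) for _, p2, p3 in tokens]
    return (mean(r12), stdev(r12)), (mean(r23), stdev(r23))


# Hypothetical tokens (cm3/s) for one subject and one fricative.
tokens = [(300, 100, 250), (280, 120, 240)]
print(summarise_tokens(tokens))
```

The relative value is computed per token and then averaged, so the mean of the ratios need not equal the ratio of the mean absolute values.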
Table 4. Mean ± std (standard deviation) absolute values of IF1 amplitude (dB/Hz).

          /v/                                /z/                                /ʒ/
Subject   Phone 1    Phone 2    Phone 3     Phone 1    Phone 2    Phone 3     Phone 1    Phone 2    Phone 3
♂ LJ      74 ± 4     45 ± 15    72 ± 5      73 ± 3     33 ± 2     74 ± 4      65 ± 6     30 ± 5     73 ± 6
♂ RS      79 ± 5     40 ± 8     75 ± 5      77 ± 2     54 ± 20    75 ± 5      69 ± 4     54 ± 16    74 ± 6
♀ JG      69 ± 7     33 ± 0     71 ± 7      70 ± 4     –          67 ± 10     58 ± 5     29 ± 3     67 ± 9
♀ HV      77 ± 3     46 ± 19    73 ± 3      77 ± 2     39 ± 6     73 ± 6      73 ± 2     54 ± 15    71 ± 6
Table 5. Mean ± std (standard deviation) absolute values of OQ (%).

          /v/                                /z/                                /ʒ/
Subject   Phone 1    Phone 2    Phone 3     Phone 1    Phone 2    Phone 3     Phone 1    Phone 2    Phone 3
♂ LJ      51 ± 4     63 ± 6     54 ± 6      50 ± 3     61 ± 7     51 ± 6      49 ± 4     69 ± 8     53 ± 7
♂ RS      52 ± 6     66 ± 10    55 ± 6      53 ± 13    58 ± 8     54 ± 7      63 ± 16    69 ± 7     62 ± 9
♀ JG      56 ± 6     41 ± 24    52 ± 7      55 ± 5     38 ± 22    52 ± 5      57 ± 10    35 ± 29    55 ± 7
♀ HV      49 ± 9     62 ± 17    50 ± 4      44 ± 6     57 ± 19    57 ± 10     50 ± 7     56 ± 21    54 ± 7
Table 6. Mean ± std (standard deviation) relative values of oral airflow (%).

            ♂ LJ                        ♂ RS                        ♀ JG                        ♀ HV
Fricative   Phone (1–2)  Phone (2–3)   Phone (1–2)  Phone (2–3)   Phone (1–2)  Phone (2–3)   Phone (1–2)  Phone (2–3)
/v/         64 ± 10      53 ± 16       76 ± 9       70 ± 9        80 ± 13      69 ± 21       81 ± 12      75 ± 16
/z/         74 ± 23      74 ± 13       69 ± 17      61 ± 18       82 ± 10      72 ± 11       87 ± 10      73 ± 29
/ʒ/         74 ± 15      70 ± 16       76 ± 15      72 ± 17       65 ± 20      57 ± 20       76 ± 11      67 ± 24
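The gender comparison for relative oral airflow reported in Section 3.7 can be checked against the Table 6 means. The sketch below assumes that the test was a pooled-variance two-sample t-test on the 24 tabulated cell means (12 for the male and 12 for the female speakers). The paper does not state which values entered the test, so this is an inference; it does reproduce the reported statistic. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Table 6 means (%): phone (1-2) and phone (2-3) for /v/, /z/ and the
# postalveolar fricative, listed speaker by speaker.
male = [64, 74, 74, 53, 74, 70,      # LJ
        76, 69, 76, 70, 61, 72]      # RS
female = [80, 82, 65, 69, 72, 57,    # JG
          81, 87, 76, 75, 73, 67]    # HV


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(b) - mean(a)) / standard_error, df


t, df = pooled_t(male, female)
print(f"t({df}) = {t:.3f}")  # t(22) = 1.356
```

The p-value is left out because it requires the t distribution, which is not available in the Python standard library.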
Table 7. Mean ± std (standard deviation) relative values of f0 (%).

            ♂ LJ                        ♂ RS                        ♀ JG                        ♀ HV
Fricative   Phone (1–2)  Phone (2–3)   Phone (1–2)  Phone (2–3)   Phone (1–2)  Phone (2–3)   Phone (1–2)  Phone (2–3)
/v/         7 ± 3        3 ± 3         7 ± 3        6 ± 10        36 ± 44      24 ± 40       13 ± 24      12 ± 25
/z/         9 ± 3        6 ± 5         3 ± 5        8 ± 12        29 ± 43      14 ± 27       13 ± 24      8 ± 7
/ʒ/         7 ± 1        8 ± 3         4 ± 7        8 ± 8         27 ± 44      25 ± 41       10 ± 25      6 ± 4
Neither relative mean IF1 nor relative mean OQ was considered as the number of tokens for which each could be reliably calculated was low.
4. Discussion

Voicing maintenance during consonant production requires well known articulatory (Möbius, 2004, p. 24) and aerodynamic (Stevens, 1991) conditions. These alone do not, however, allow us to account for the frequent occurrences of devoicing observed in EP (Jesus & Shadle, 2002) and other languages (Smith, 1997). However, the voicing category results (Section 3.1), obtained using the criteria proposed by Jesus and Shadle (2002) with an adaptation to use the oral airflow in place of the acoustic signal, identified very few devoiced samples. There are different possible explanations for this. Subjects could have voiced more samples as an effect of the experimental setting (e.g., effect of the Rothenberg mask or the choice of stimuli). However, when the same corpus was recorded with a microphone and EGG for subject LJ and the criteria were applied jointly to those two signals, a much higher percentage of devoicing was found. Although these results are not directly comparable to those of Jesus and Shadle (2002) because the corpus was different, LJ was also studied by Jesus and Shadle (2002) using EGG and acoustic signals and in that study also showed a significant tendency to devoice. We suggest therefore that the criteria used to define voicing categories based on the acoustic signal (as in Jesus and Shadle, 2002) are not adequate for our new framework based on the oral airflow signal. We hypothesise that the small oscillations seen in the fricative oral airflow (which relate to production) do not generally produce significant acoustic excitation (which relates to perception) and that these low amplitude oscillations, often with little higher frequency content (perhaps resulting from minute mucosal oscillations for an open glottis), are only weakly related to measurable periodicity in the acoustic signal.
Our observations indicate that the quality of the voicing during the fricatives generally differs from that observed during the contextual vowels. The mean absolute oral airflow amplitude, though varying widely between subjects, conforms to a general pattern of reduced oral airflow during phone 2 regardless of which fricative a subject produces. Previously, place of articulation has been shown to have an effect on the number of devoiced fricative tokens: “… the percentage of devoicing increased as the place of articulation moved posteriorly.” (Jesus & Shadle, 2003, p. 5). Place of articulation has also been shown to have an effect on the aerodynamic properties of stops (Ohala, 1983). Some of these patterns have also been observed for our tokens, but the results were not statistically significant, which may be a result of analysing devoicing in terms of the oral airflow rather than the acoustic signal. There were also no significant differences in the f0 estimate for the different fricatives. For all subjects except RS, for any given fricative, phone 2 had a significantly lower f0 value than the
other two phones. Voiced obstruents have previously been shown (Ohde, 1984, p. 226) to have lower f0 values than adjacent vowels and the lowering of the laryngeal structures has long been known (Ohala, 1972) to decrease the value of f0. Löfqvist, Baer, McGarr, and Story (1989) present f0 values and cricothyroid muscle activity levels for vowels following obstruents and infer “…that an increased longitudinal tension of the vocal folds…” (Löfqvist et al., 1989, p. 1320) could account for voicing distinctions (lower f0 during voiced obstruent production). We therefore believe that the significant differences that we have observed in f0 result from different voicing strategies between vowels and obstruents as a result of some combination of laryngeal lowering and vocal fold abduction. Vibration of the vocal folds can only occur when two physiological and aerodynamic conditions are met. First, the vocal folds must be suitably adducted and tensed. Second, a sufficient trans-glottal pressure gradient is needed to cause enough positive airflow through the glottis to maintain vibration. There is a recognised difficulty in supporting vibration during fricative production because in order to produce adequate frication the oral pressure should be high but voicing can only be maintained with a sufficient trans-laryngeal pressure drop (Ohala, 1983, p. 201). Therefore, it could be argued that the particular glottal constraints that have been reported in previous studies (Löfqvist et al., 1989; Ohala, 1972; Ohde, 1984; Watson, 1998) are related to mechanisms used to provide the oral pressure differential required for frication and result in a less constricted glottis (Ohala, 1983, p. 202). In a majority of the tokens analysed, IF1 could not be estimated for phone 2. Only subject RS produced measurable F1 values for the majority of phone 2 tokens.
For the other subjects and in particular the female ones, F1 for phone 2 was substantially reduced or absent for a majority of measured utterances. Nevertheless, we could conclude that, where F1 was detectable, IF1 during fricative production was always on average reduced compared to the surrounding phones. Given that we expect voicing to be more damped in fricatives (in our data a longer OQ indicates the likelihood of this), lower F1 amplitude in phone 2 was perhaps to be expected as the observed acoustic excitation pattern for the vocal tract. To extract the open quotient of the signal, the method DEGG DECOM (Henrich et al., 2004) uses peaks at glottal opening and closure in the differentiated EGG (DEGG) signals to estimate f0 and hence OQ. At opening, Henrich et al. reported instances of imprecise peaks in some subjects which may prevent good estimates of OQ. At closing, all the samples they presented were for something approximating modal voice, where there is a clear glottal closed period. In our signals, for a perhaps more abducted glottis, the closure may be less definite or even incomplete, leading to imprecise peaks at closing and hence poor detection of f0. This was the case for a majority of fricative tokens, leading us to the conclusion that an OQ of 100%, or at least a very tentative glottal closure, was quite a common occurrence. Where we could estimate a valid OQ, our results showed a slight increase in this parameter during the fricatives. Howard (1995, p. 166) suggested an increase in OQ value is related to a less physically efficient voice. He argued that this could be explained by
the fact that an unobstructed acoustic tube from the lungs to the lips (through an open glottis) maintained for a greater portion of a glottal cycle resulted in higher subglottal damping. He also noted that more air from the lungs is expelled in each glottal cycle due to the longer open phase compromising “the efficiency of power source usage” and reinforcing a perception of breathiness (Howard, 1995, p. 166). It is notable that airflow in the fricative portion of the tokens (see Figs. 2–4) does not decay to zero during the “closed” phase of the glottal cycle. This may relate to the increase in OQ observed, but, according to Cranen and Schroeter (1995, p. 166), leak airflow cannot usually be explained solely on the basis of an incomplete adduction. A realistic model of airflow arising from a leaky glottis must also take into account a relatively gradual change in airflow during opening and closing and hence a weak acoustic excitation of the sub- and supra-glottal vocal tract. We have thus a situation in the oral airflow of voiced fricatives where traditional measures of voicing category designed for use with the acoustic and EGG signals are not valid since periodicity in the oral airflow continues even when associated only weakly with acoustic excitation of the vocal tract. The voicing regime represented by the oral airflow appears, however, to be substantially different to that found in the modal voicing of vowels, having a smaller mean amplitude for the oral airflow fluctuations, lower fundamental frequency, a simpler harmonic structure with little or no excitation of the first formant of the vocal tract and a lengthened (frequently 100%) open quotient. We propose that this kind of laryngeal mode, which is expected to lead to a much lower level of acoustic excitation than the modal voicing of vowels, be designated weak voicing.

4.1. Defining weak voicing in obstruents

Weak voicing has been regarded in most linguistics literature as what happens in voiced fricatives, stops and affricates. Weakly voiced obstruents have been described for Dutch (Ernestus & Baayen, 2007, pp. 3–5) as: “…voiceless obstruents that possess some acoustic characteristics of genuine voiced obstruents…”. Some voiced stops in Xhosa have been described as having “…short and weak voicing…” (Jessen & Roux, 2002, p. 1), resulting from “…the absence of closure voicing in most contexts…” (Jessen & Roux, 2002, p. 4). Jessen and Roux (2002) observed higher f0 values for vowels following “slack voicing” in stops and measured a number of spectral parameters (f0, H1, H2, F1, A1, F2, A2, F3 and A3) none of which seemed to provide information that could be clearly related to this voicing mechanism. However, their discussion about slack voicing (Jessen & Roux, 2002, pp. 38–39) hypothesised larynx lowering as the physiological mechanism responsible for most glottal leakage and devoicing during obstruent production, also resulting in lower f0 and F1 frequency. In fact, voicing in stops has long been regarded (Kingston & Diehl, 1995, pp. 8–9) as a low frequency acoustic property that can be observed “…during or shortly after the consonant constriction…” and results in “…a low f0 and F1 at the edges of vowels next to that interval…”. We now wish to further characterise weak voicing in obstruents on the basis of observed changes to the oral airflow waveform. Our results suggest that activity measurable in the oral airflow waveform may be only weakly related to events in the acoustic waveform due to the incomplete or tentative glottal closure and the consequent weak excitation of the vocal tract formants (see Fig. 5 where the secondary peaks in each cycle, present in the vowel but not in the fricative, relate to the vocal tract formants). Therefore, focus on the oral airflow waveform may give additional insight into laryngeal mechanisms in the generation of voiced obstruents that is not available from the output speech sound.
With a definition of weak voicing based on the oral airflow signal alone, we could, for example, revisit specific EGG analysis techniques (Howard, Lindsey, & Allen, 1990; Rothenberg & Mahshie, 1988) for cases with a very low amplitude periodicity in the EGG signal (see Fig. 6). This might provide a new view on the classic problem of defining the meaning, in terms of physical
Fig. 5. Oral airflow and EGG signals during [ezɐ] production.
Fig. 6. Oral airflow and EGG signals during [aʒi] production.
activity in the larynx, of changes in the EGG signal during fricative and stop production (Childers, Hicks, Moore, Eskenazi, & Lalwani, 1990; Childers & Larar, 1984; Howard et al., 1990). One possibility is to define weak voicing as voicing with, say, 100% (or perhaps >80%) open quotient. However, this may pose a technical problem in distinguishing voiced fricatives from breathy vowels, voiceless fricatives and stops, or any speech sound produced with a significant glottal chink (typical, for example, in female speakers). Weakly voiced speech frames could also be classified on the basis of an absolute oral airflow amplitude criterion of x cm3/s lower than the surrounding vowels. However, absolute values for the oral airflow and EGG vary significantly from subject to subject and are not ideal for setting definitive inter-subject criteria. We therefore propose to use a relative measure of oral airflow amplitude as the classification criterion since our measurements suggest this parameter does not differ significantly between subjects or by gender. Detailed analysis of our experimental data led to the empirical definition of an oral airflow amplitude threshold that could be used to define an epoch of weak voicing. This was based on the relative level of the oral airflow as defined by Eqs. (1) and (2). As can be seen from Fig. 7, the mean of the relative airflow values for subject HV and fricative /v/, taking data for phone (1–2)% and phone (2–3)% together, is predominantly greater than 70%. Data for the other fricatives show a similar pattern and again the means are predominantly greater than 70%. Through similarly considering the mean value of all the relative measures of oral airflow amplitude from all the tokens for all speakers we therefore defined a threshold of 70%, at or above which voicing in a fricative was considered to be weak compared to the strong voicing found in the surrounding contextual vowels.
The outcomes of classifying all the tokens of the four speakers with this criterion are presented in the first five columns of Table 8. Here, a fricative is classified as weakly voiced if the mean of phone (1–2)% and phone (2–3)% is greater than or equal to 70%.
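The classification rule stated above can be written as a short sketch. The 70% threshold and the use of the mean of phone (1–2)% and phone (2–3)% come from the text; the function name, the constant name and the example values are hypothetical.

```python
WEAK_VOICING_THRESHOLD = 70.0  # percent, as defined in the text


def classify_voicing(rel_12, rel_23, threshold=WEAK_VOICING_THRESHOLD):
    """Classify one fricative token from its relative oral airflow values.

    rel_12 and rel_23 are the phone (1-2)% and phone (2-3)% values of the
    token; the token is weakly voiced when their mean reaches the threshold.
    """
    mean_relative = (rel_12 + rel_23) / 2.0
    return "weak" if mean_relative >= threshold else "strong"


# Hypothetical tokens: (phone (1-2)%, phone (2-3)%)
for rel_12, rel_23 in [(81, 75), (64, 53), (70, 70)]:
    print(rel_12, rel_23, classify_voicing(rel_12, rel_23))
```

A token whose mean is exactly 70% is classed as weakly voiced, following the "greater than or equal to" wording of the criterion.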
When comparing these results with those obtained for the same data using Jesus and Shadle’s (2002) criteria, presented in Table 1 and reproduced in columns 6–10 in Table 8, we observed a lower percentage of strong voicing (compared to the category voiced in Table 1) and a large number of weak voicing samples (compared to the sum of the categories partially devoiced and devoiced in Table 1). The numbers in the weak voicing category overall compare more favourably with the results presented by Jesus and Shadle (2002) as devoiced + partially devoiced (/v/ 61%; /z/ 89%; /ʒ/ 91%), though it must be borne in mind when making this comparison that for their study both the subjects (for the most part) and the corpus differed from ours. Classification based on a 70% threshold using values for oral airflow amplitude from phone (1–2)% alone gives the same or a slightly higher percentage of fricatives classified as having weak voicing (/v/ 60%; /z/ 71%; /ʒ/ 53% over all subjects) and using phone (2–3)% alone reduces the number classed as weakly voiced (/v/ 43%; /z/ 51%; /ʒ/ 38% over all subjects).
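The weak voicing percentages for the new criterion in Table 8 are consistent with 17 tokens per speaker and fricative, the number of files given in the captions of Figs. 2–4 for speaker LJ. The sketch below makes that arithmetic explicit. The token counts are back-calculated from the published percentages under this assumption and are not values reported in the paper; the names are hypothetical and the key "zh" stands for the postalveolar fricative.

```python
TOKENS_PER_CELL = 17  # assumption: 17 tokens per speaker and fricative

# Weak-voicing counts inferred from the Table 8 percentages
# (illustrative back-calculation; the paper reports percentages only).
weak_counts = {
    "LJ": {"v": 4, "z": 12, "zh": 9},
    "RS": {"v": 11, "z": 6, "zh": 9},
    "JG": {"v": 9, "z": 15, "zh": 6},
    "HV": {"v": 13, "z": 15, "zh": 12},
}
FRICATIVES = ("v", "z", "zh")


def percent(count, total):
    """Percentage rounded to the nearest integer, as tabulated."""
    return round(100.0 * count / total)


def speaker_row(speaker):
    """Per-fricative percentages plus the pooled value for one speaker."""
    counts = weak_counts[speaker]
    row = [percent(counts[f], TOKENS_PER_CELL) for f in FRICATIVES]
    pooled = percent(sum(counts.values()), len(FRICATIVES) * TOKENS_PER_CELL)
    return row + [pooled]


def all_speakers_row():
    """Percentages pooled over all speakers."""
    n = len(weak_counts)
    row = [percent(sum(c[f] for c in weak_counts.values()), n * TOKENS_PER_CELL)
           for f in FRICATIVES]
    total = sum(sum(c.values()) for c in weak_counts.values())
    return row + [percent(total, n * len(FRICATIVES) * TOKENS_PER_CELL)]


for speaker in weak_counts:
    print(speaker, speaker_row(speaker))
print("All", all_speakers_row())
```

Under this assumption the pooled columns and the "All" rows follow from pooling tokens rather than from averaging the rounded percentages.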
5. Conclusions and future work

Our application of voicing category rules designed for the acoustic signal (Alphen & Smits, 2004; Jesus & Shadle, 2002) to the oral airflow suggests that new criteria are required to describe voicing during fricatives. This is largely due to the persistence of periodicity in the oral airflow signal that may be only weakly associated with acoustic excitation. Using the oral airflow signal to characterise behaviour during voiced fricatives may allow investigation of laryngeal mechanisms related to phone production that cannot easily be deduced from the acoustic signal. We have shown that the oral airflow signals in VFV sequences in real words can be characterised as having, during the production of the fricative, lower average oscillation amplitude, reduced amplitude of the formant
Fig. 7. Airflow relative values for HV’s /v/. The numbering in the x axis of the figure corresponds to the numbering of the files in the Corpus (see Appendix A).
Table 8. Fricatives classified (based on airflow) as having weak and strong voicing, applying the 70% threshold. Juxtaposition of this new classification criterion with the results presented in Table 1 (based on Jesus and Shadle’s (2002) criteria).

New classification criterion (airflow threshold of 70%)                        Jesus and Shadle’s (2002) criteria
Subject and category         /v/ (%)  /z/ (%)  /ʒ/ (%)  /v, z, ʒ/ (%)         Subject and category                     /v/ (%)  /z/ (%)  /ʒ/ (%)  /v, z, ʒ/ (%)
LJ ♂ Weak voicing            24       71       53       49                    LJ ♂ Devoiced + partially devoiced       0        18       6        8
LJ ♂ Strong voicing          76       29       47       51                    LJ ♂ Voiced                              100      82       94       92
RS ♂ Weak voicing            65       35       53       51                    RS ♂ Devoiced + partially devoiced       12       6        0        6
RS ♂ Strong voicing          35       65       47       49                    RS ♂ Voiced                              88       94       100      94
JG ♀ Weak voicing            53       88       35       59                    JG ♀ Devoiced + partially devoiced       12       12       6        10
JG ♀ Strong voicing          47       12       65       41                    JG ♀ Voiced                              88       88       94       90
HV ♀ Weak voicing            76       88       71       78                    HV ♀ Devoiced + partially devoiced       6        12       0        6
HV ♀ Strong voicing          24       12       29       22                    HV ♀ Voiced                              94       88       100      94
All (♂ and ♀) Weak voicing   54       71       53       59                    All (♂ and ♀) Devoiced + part. devoiced  7        12       3        7
All (♂ and ♀) Strong voicing 46       29       47       41                    All (♂ and ♀) Voiced                     93       88       97       93
oscillations, a longer open quotient and a generally lower f0 compared to that in the surrounding phones. Clearly, voicing in the fricatives of VFV sequences is qualitatively and quantitatively different to the modal voicing found in the surrounding vowels and we have denoted these changes collectively by the term weak voicing. A new and more general criterion for the definition of weak voicing, based on the analysis of the airflow signal, was presented. Our definition of this voicing mode is based on factors more closely related to phone production (laryngeal behaviour) than to the more perceptually orientated acoustic signal. Our classification criterion for weak voicing (≥70% relative reduction in the amplitude of oral airflow oscillations between the fricative and the surrounding vowels) has the advantage that it is based on the oral airflow signal alone and therefore does
not rely on subjective interpretation of the relationship between the EGG signal and laryngeal activity. Our future work will focus on further defining the laryngeal mechanisms and the vibration mode during weak voicing and in exploring the relationship between these and the observed EGG signal. Parallels with the corresponding conditions in unilateral vocal fold paralysis (UVFP) patients have been presented in Pinho, Jesus, and Barney (in press).
Acknowledgements

The authors would like to thank Helena Vilarinho, Ricardo Santos, JG and Aníbal Ferreira. This work was supported by Fundação para a Ciência e a Tecnologia, Portugal (Research and Development Project PTDC/SAU-BEB/67384/2006 FCOMP-019124-FEDER-007470: Acoustic and Aerodynamic Analysis of Speech Production by Patients with Unilateral Vocal Fold Paralysis). The authors would also like to thank Kenneth de Jong and three anonymous reviewers for their helpful suggestions. This work was partially funded by FEDER through the Operational Program Competitiveness Factors (COMPETE) and by National Funds through FCT (Foundation for Science and Technology) in the context of the Project FCOMP-01-0124-FEDER-022682 (FCT reference PEst-C/EEI/UI0127/2011).
Appendix A See Table A1.
Table A1. Corpus of voiced fricatives.

Fricative   Word       IPA          File
/v/         “vara”     [ˈva.ɾɐ]     001
/v/         “cava”     [ˈka.vɐ]     002
/v/         “teve”     [ˈte.v]      003
/z/         “Zezé”     [zeˈze]      004
/z/         “mesa”     [ˈme.zɐ]     005
/z/         “doze”     [ˈdo.z]      006
/ʒ/         “jacto”    [ˈʒa.tu]     007
/ʒ/         “haja”     [ˈa.ʒɐ]      008
/ʒ/         “age”      [ˈa.ʒ]       009
Fricative   Sentence
/v/         “Diga teve alegre por favor”
/v/         “Diga teve arqueado por favor”
/v/         “Diga teve iluminado por favor”
/v/         “Diga teve irónico por favor”
/z/         “Diga doze alas por favor”
/z/         “Diga doze arcas por favor”
/z/         “Diga doze iluminações por favor”
/z/         “Diga doze irmãos por favor”
/ʒ/         “Diga age aliviado por favor”
/ʒ/         “Diga age armado por favor”
/ʒ/         “Diga age ilegalmente por favor”
/ʒ/         “Diga age ironicamente por favor”
/v/         “Diga vara por favor”
/v/         “Diga cava por favor”
/v/         “Diga teve por favor”
/z/         “Diga Zezé por favor”
/z/         “Diga mesa por favor”
/z/         “Diga doze por favor”
/ʒ/         “Diga jacto por favor”
/ʒ/         “Diga haja por favor”
/ʒ/         “Diga age por favor”
/v/         “Diga teve garra por favor”
/v/         “Diga teve igual por favor”
/v/         “Diga teve agoniado por favor”
/z/         “Diga doze gotas por favor”
/z/         “Diga doze igrejas por favor”
/z/         “Diga doze águias por favor”
/ʒ/         “Diga age guiado por favor”
/ʒ/         “Diga age igual por favor”
/ʒ/         “Diga age agora por favor”
/v/         “Diga teve amigos por favor”
/v/         “Diga teve inaceitável por favor”
/z/         “Diga doze amoras por favor”
/z/         “Diga doze imóveis por favor”
/ʒ/         “Diga age amoroso por favor”
/ʒ/         “Diga age imediatamente por favor”
/v/         “Diga teve ajuda por favor”
/v/         “Diga teve higiene por favor”
/z/         “Diga doze ajudas por favor”
/z/         “Diga doze ejectores por favor”
/ʒ/         “Diga age ajudando por favor”
/ʒ/         “Diga age higienizando por favor”

IPA and File columns, in the same order as the sentences above:
" " " " " ["di.cP "te.v P l"e.cN puN " fP voN] " ["di.cP "te.v PN kja.du puN" fP voN] " " [ di.cP te.v i.lu.mi na.du puN fP voN] " " " " " [ di.cP te.v i NL.ni.ku puN fP voN] " " " " " ["di.cP "do.z "a.lPP puN " fP voN] " ["di.cP "do.z aN.kPP puN " fP "voN] " ~ ["di.cP "do.z i.lu.mi.nP s ojP puN fP voN] " " " [ di.cP do.z iN mP*wP puN fP voN] " " " " " ["di.cP "a.W P.li" vja.du" puN fP" voN] ["di.cP "a.W PN ma.du puN " " fP voN] " [ di.cP aW i.lX.cal me~.t puN fP voN] " " " " " [ di.cP a.W i.NL.ni.kP me~ .t puN fP voN] " " " " ["di.cP "va.NP "puN fP "voN] ["di.cP "ka.vP" puN fP " voN] [ di.cP te.v puN fP voN] " " " " ["di.cP z " e ze puN " fP voN] " ["di.cP "me.zP" puN fP " voN] [ di.cP do.z puN fP voN] " " " " ["di.cP "Wa.tu" puN fP " voN] ["di.cP "a.WP" puN fP " voN] [ di.cP a.W puN fP voN] " " " " " ["di.cP "te.v ca.CP " "puN fP "voN] ["di.cP "te.v i cwal " puN "fP voN] " [ di.cP te.v P.cu nja.du puN fP voN] " " " " " ["di.cP "do.z co.tPP puN " " fP voN] " ["di.cP "do.z "i cNP.WPP puN fP " " voN] [ di.cP do.z a.cjPP puN fP voN] " " " " " ["di.cP "a.W cja.du " " puN fP" voN] [ di.cP a.W i cwal puN fP voN] " " " " " [ di.cP a.W P cL.NP puN fP voN] " " " " " ["di.cP "te.v P mi.cuP " puN "fP voN]" [ di.cP te.v i.nP.sPj ta.vel puN fP voN] " " " " " [ di.cP do.z P mL.NPP puN fP voN] " " " " " [ di.cP do.z i mL.vPjP puN fP voN] " " " " " ["di.cP "a.W P.mu No.zu" puN "fP voN]" [ di.cP a.W i.mX.dja.tP me~.t puN fP voN] " " " " " ["dicP "te.v P "Wu.dP " puN fP" voN] [ di.cP te.v i Wje.n puN fP voN] " " " " " ["di.cP "do.z P Wu.dPP " "puN fP "voN] [ di.cP do.z i.We to.NP puN fP voN] " " " " " ["di.cP "a.W P.Wu dP*.du puN " " fP voN] " [ di.cP a.W i.Wje.ni zP*.du puN fP voN]
019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050 051 052 053 054 055 056 057 058 059 060
Appendix B. Supporting information Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.wocn.2012.06.002.
References

Alphen, P., & Smits, R. (2004). Acoustical and perceptual analysis of the voicing distinction in Dutch initial plosives: The role of prevoicing. Journal of Phonetics, 32, 455–491.
Barney, A., Jesus, L. M. T., & Santos, R. (2008). Investigation of the mechanisms of voicing onset. Journal of the Acoustical Society of America, 123(5), 3576.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5, 341–345.
Bouzid, A., & Ellouze, N. (2007). Open quotient measurements based on multiscale product of speech signal wavelet transform. Research Letters in Signal Processing, 2007, 5.
Childers, D., Hicks, D., Moore, G., Eskenazi, L., & Lalwani, A. (1990). Electroglottography and vocal fold physiology. Journal of Speech and Hearing Research, 33, 245–254.
Childers, D., & Larar, J. (1984). Electroglottography for laryngeal function assessment and speech analysis. IEEE Transactions on Biomedical Engineering, 31, 807–817.
Cranen, B., & Schroeter, J. (1995). Modeling a leaky glottis. Journal of Phonetics, 23, 165–177.
Ernestus, M., & Baayen, H. (2007). Paradigmatic effects in auditory word recognition: The case of alternating voice in Dutch. Language and Cognitive Processes, 22, 1–24.
Fitch, J. L. (1990). Consistency of fundamental frequency and perturbation in repeated phonations of sustained vowels, reading, and connected speech. Journal of Speech and Hearing Disorders, 55, 360–363.
Henrich, N. (2007). MOQ interface manual. Département Parole et Cognition, GIPSA-lab, Grenoble, France.
Henrich, N., Alessandro, C., Doval, B., & Castellengo, M. (2004). On the use of the derivative of electroglottographic signals for characterization of nonpathological phonation. Journal of the Acoustical Society of America, 115, 1321–1332.
Hertegard, S., & Gauffin, J. (1995). Glottal area and vibratory patterns studied with simultaneous stroboscopy, flow glottography, and electroglottography. Journal of Speech and Hearing Research, 38, 85–100.
Higgins, M. B., Netsell, R., & Schulte, L. (1994). Aerodynamic and electroglottographic measures of normal voice production: Intrasubject variability within and across sessions. Journal of Speech and Hearing Research, 37, 38–45.
Howard, D. M. (1995). Variation of electrolaryngographically derived closed quotient for trained and untrained adult female singers. Journal of Voice, 9, 163–172.
Howard, D. M., Lindsey, G., & Allen, B. (1990). Toward the quantification of vocal efficiency. Journal of Voice, 4, 205–212.
Huynh, H. (1978). Some approximate tests for repeated measurement designs. Psychometrika, 43, 161–175.
Jessen, M., & Roux, J. (2002). Voice quality differences associated with stops and clicks in Xhosa. Journal of Phonetics, 30, 1–52.
Jesus, L. M. T., Barney, A., Santos, R., Caetano, J., Jorge, J., & Couto, P. S. (2009). Universidade de Aveiro’s Voice Evaluation Protocol. In 10th annual conference of the international speech communication association (Interspeech 2009) (pp. 971–974). Brighton, UK.
Jesus, L. M. T., & Jackson, P. (2008). Frication and voicing classification. In: A. Teixeira, V. Lima, L. Oliveira, & P. Quaresma (Eds.), Computational processing of the Portuguese language (pp. 11–20). Berlin: Springer-Verlag.
Jesus, L. M. T., & Shadle, C. H. (2002). A parametric study of the spectral characteristics of European Portuguese fricatives. Journal of Phonetics, 30, 437–464.
Jesus, L. M. T., & Shadle, C. H. (2003). Devoicing measures of EP fricatives. In: N. Mamede, J. Baptista, I. Trancoso, & M. Nunes (Eds.), Computational processing of the Portuguese language. Berlin: Springer.
Keating, P., Linker, W., & Huffman, M. (1983). Patterns in allophone distribution for voiced and voiceless stops. Journal of Phonetics, 11, 277–290.
Kingston, J., & Diehl, R. (1995). Intermediate properties in the perception of distinctive feature values. In: B. Connell, & A. Arvanniti (Eds.), Phonology and phonetic evidence: Papers in laboratory phonology IV (pp. 7–27). Cambridge: Cambridge University Press.
Ladefoged, P. (2005). Speculations on the control of speech. In: W. Hardcastle, & J. Beck (Eds.), A figure of speech: A Festschrift for John Laver (pp. 3–22). Mahwah: LEA.
Ladefoged, P., & Johnson, K. (2011). A course in phonetics (sixth ed.). Scarborough: Wadsworth, Cengage Learning.
Lahiri, A., Gewirth, L., & Blumstein, S. (1984). A reconsideration of acoustic invariance for place of articulation in diffuse stop consonants: Evidence from a cross-language study. Journal of the Acoustical Society of America, 76, 391–404.
¨ Lofqvist, A., Baer, T., McGarr, N., & Story, R. (1989). The cricothyroid muscle in voicing control. Journal of the Acoustical Society of America, 85, 1314–1321. Lousada, M., Jesus, L. M. T., & Hall, A. (2010). Temporal acoustic correlates of the voicing contrast in European Portuguese stops. Journal of the International Phonetic Association, 40, 261–275. Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. The Annals of Mathematical Statistics, 11, 204–209. ¨ Mobius, B. (2004). Corpus-based investigations on the phonetics of consonant voicing. Folia Linguistica, 38, 5–26. Netsell, R., Lotz, W. K., & Shaughnessy, A. L. (1984). Laryngeal aerodynamics associated with selected voice disorders. American Journal of Otolaryngology, 5, 397–403. Ohala, J. J. (1972). How is pitch lowered?. Journal of the Acoustical Society of America, 52, 124. Ohala, J. J. (1983). The origin of sound patterns in vocal tractconstraints. In: P. MacNeilage (Ed.), The production of speech (pp. 189–216). New York: Springer-Verlag. Ohala, J. J., & Riordan, C. (1979). Passive vocal tract enlargement during voiced stops. In: J. Wolf, & D. Klatt (Eds.), Speech communication papers (pp. 89–92). New York: Acoustical Society of America. Ohala, J. J., & Sole´, M. J. (2010). Turbulence and phonology. In: S. Fuch, M. Toda, & M. Zygis (Eds.), Turbulent sounds: An interdisciplinary guide (pp. 37–101). Berlin: De Gruyter Mouton. Ohala, J. J., & Sprouse, R. (2003). Effects on speech of introducing aerodynamic perturbations. In Proceedings of the 15th international congress of phonetic sciences (ICPhS 2003) (pp. 2913–2916). Barcelona, Spain. Ohde, R. (1984). Fundamental frequency as an acoustic correlate of stop consonant voicing. Journal of the Acoustical Society of America, 75, 224–230. Pape, D., & Jesus, L. M. T. (2011). Devoicing of phonologically voiced obstruents: Is European Portuguese different from other Romance languages? 
In 17th international congress of phonetic sciences (ICPhS 2011) (pp. 1566–1569). Hong Kong, China. Pape, D., Mooshammer, C., Hoole, P., & Fuchs, S. (2006). Devoicing of word-initial stops: A consequence of the following vowel? In: J. Harrington, & M. Tabain (Eds.), Speech production: models, phonetic processes, and techniques. New York: Psychology Press. Pinho, C. M. R., Jesus, L. M. T., & Barney, A. (2009). Aerodynamics of fricative production in European Portuguese. In 10th annual conference of the international speech communication association (Interspeech 2009) (pp. 472–475). Brighton, UK. Pinho, C. M. R., Jesus, L. M. T., & Barney, A. Aerodynamic measures of speech in unilateral vocal fold paralysis (UVFP) patients. Logopedics Phoniatrics Vocology, DOI: http://dx.doi.org/10.3109/14015439.2012.696138, in press. Pirello, K., Blumstein, S. E., & Kurowski, K. (1997). The characteristics of voicing in syllable–initial fricatives in American English. Journal of the Acoustical Society of America, 101, 3754–3765. Recasens, D., & Espinosa, A. (2009). An articulatory investigation of lingual coarticulatory resistance and aggressiveness for consonants and vowels in Catalan. Journal of the Acoustical Society of America, 125, 2288–2298. Rothenberg, M. (1973). A new inverse-filtering technique for deriving the glottal air flow waveform during voicing. The Journal of the Acoustical Society of America, 53, 1632–1645. Rothenberg, M., & Mahshie, J. J. (1988). Monitoring vocal fold abduction through vocal fold contact area. Journal of Speech and Hearing Research, 31, 338–351. Shadle, C. H. (2010). Aerodynamics of speech, and the puzzle of voiced fricatives. In Conference on phonetic universals. Leipzig, Germany. Smith, C. L. (1997). The devoicing of /z/ in American English: Effects of local and prosodic context. Journal of Phonetics, 25, 471–500. Sole´, M. J. (2010). Effects of syllable position on sound change: An aerodynamic study of final fricative weakening. 
Journal of Phonetics, 38, 289–305. Stevens, K. (1991). Vocal-fold vibration for obstruent consonants. In: Gauffin, & Hammarberg (Eds.), Vocal fold physiology: Acoustic, perceptual, and physiological aspects of voice mechanisms (pp. 29–36). Singular. Stevens, K., Blumstein, S., Glicksman, L., Burton, M., & Kurowski, K. (1992). Acoustic and perceptual characteristics of voicing in fricatives and fricative clusters. Journal of the Acoustical Society of America, 91, 2979–3000. Trujillo-Ortiz, A. (2006). Anfactpc: Factor Analysis by the Principal Components Method. /http://www.mathworks.com/matlabcentral/fileexchange/10601anfactpcS. Trujillo-Ortiz, A., & Hernandez-Walls, R. (2003). Cochtest: Cochran’s test for homogeneity of variances for equal or unequal sample sizes. /http://www.math works.com/matlabcentral/fileexchange/loadFile.do?objectId=3292&objectType= FILES. Trujillo-Ortiz, A., Hernandez-Walls, R., & Trujillo-Perez, R. A. (2004). RMAOV2: Two-way repeated measures ANOVA. /http://www.mathworks.com/matlab central/fileexchange/loadFile.do?objectId=5578S. Watson, B. C. (1998). Fundamental frequency during phonetically governed devoicing in normal young and aged speakers. Journal of the Acoustical Society of America, 103, 3642–3647. Westbury, J. R., & Keating, P. A. (1985). On the naturalness of stop consonant voicing. Journal of Linguistics, 22, 145–166.