Interarticulator Timing Control During Inspiratory Phonation

*Manwa L. Ng, †Yang Chen, *Stephen Wong, and *Steve Xue, *Sai Ying Pun, Hong Kong and †Pittsburgh, Pennsylvania

Summary: Objective. The ability of Cantonese speakers to produce aspirated and unaspirated stops, and stops at different places of articulation, using expiratory phonation (EP) and inspiratory phonation (IP) was compared. Interarticulator timing during stop production using EP and IP was examined. Voice onset time (VOT) associated with EP and IP stops was compared with stop identification scores by naïve listeners. Subjects and Methods. Aspirated and unaspirated voiceless stops (/ph, th, kh, p, t, k/) followed by the vowel /a/ were produced by 15 male and 15 female Cantonese speakers using EP and IP. VOT values were measured, and isolated speech samples of stop productions were identified by 10 naïve listeners. Percent correct identification of stops was obtained from the 10 listeners. Results. Perceptual data showed that production of IP stops was associated with reduced accuracy in stop identification, with predominant errors in aspiration perception. Acoustic analysis showed that IP stops were generally produced with significantly shorter VOT than their EP counterparts. In addition, an effect of place of articulation on VOT was found for both IP and EP stops, notably with velar stops being associated with significantly longer VOT values than bilabial and alveolar stops. Conclusions. The finding that IP stops were produced with shorter VOT than EP stops implies that the articulatory-phonatory coordination during IP was not the same as that during EP, causing a discrepancy in the timing control between articulators. Key Words: Reverse phonation–Inspiratory phonation–Cantonese–VOT.
INTRODUCTION

Normal or expiratory phonation (EP) occurs when pulmonic air flows outward from the airway, driven by a positive pressure differential across the glottis.1,2 Reverse or inspiratory phonation (IP), however, requires a reverse flow of air, in which the airstream is drawn into the lungs through the glottis when subglottal pressure is lower than supraglottal pressure.3–5 IP can be either voluntary or involuntary.6–10 Involuntary IP is often associated with a prolapsed vocal tract airway,6–10 supraglottic or subglottic obstruction, a physical condition often seen in patients with asthma or obstructive sleep apnea.7 This frequently results in stridor or wheezing because of its unique phonatory characteristics.4 Voluntary IP, in contrast, takes place when the vocal mechanism is orchestrated in a deliberate attempt to phonate with an ingressive airflow.11 This unusual phonatory manner, however, has been widely used for various clinical purposes. For instance, in diagnostic radiology, patients frequently have to make use of IP to help radiologists distinguish supraglottic and glottic tumors from transglottic growth during an x-ray examination.9 Otolaryngologists often rely on IP for proper laryngoscopic examinations, especially for possible subglottic abnormalities.12 IP is also being used in voice therapy as a behavioral assessment or a facilitative technique for a variety of voice disorders, including ventricular phonation, spasmodic
Accepted for publication January 5, 2010. From the *Speech Science Laboratory, Division of Speech and Hearing Sciences, The University of Hong Kong, Sai Ying Pun, Hong Kong; and the †Department of Speech-Language Pathology, Duquesne University, Pittsburgh, Pennsylvania. Address correspondence and reprint requests to Manwa L. Ng, Speech Science Laboratory, Division of Speech and Hearing Sciences, The University of Hong Kong, 5/F Prince Philip Dental Hospital, 34 Hospital Road, Sai Ying Pun, Hong Kong, SAR, China. E-mail: [email protected]
Journal of Voice, Vol. 25, No. 3, pp. 319-325 0892-1997/$36.00 © 2011 The Voice Foundation doi:10.1016/j.jvoice.2010.01.001
dysphonia, and psychogenic voice disorders.9,13–15 Given the wide range of clinical applications of IP, it is important to understand the exact mechanism of IP and the timing relationship of activities involved in IP. The physiological rationale of IP as a facilitative approach for dysphonia is largely based on the fact that IP, in certain circumstances, triggers an appropriate physiology for dysphonic patients. In a study of IP by female speakers, Kelly and Fisher16 observed significant reduction in membranous vocal fold contact length, yielding a unique posterior glottal chink during IP. This finding was in line with earlier results reported by Orlikoff et al,9 in which a similar vocal fold contact pattern was noticed in male speakers. With the reduced force of vocal fold approximation, IP can facilitate effortless phonation in patients with vocal hyperfunction and adductory spasmodic dysphonia, and it has been accordingly used as either a preparatory step for easy voice production using EP or a preferred mode of speaking for severely dysphonic patients.3,14 The properties of the unique voicing mechanism of (voluntary) IP have been investigated visually, acoustically, and aerodynamically. For instance, Orlikoff et al9 examined vowels sustained using IP by normal individuals using a combination of acoustic, electroglottographic, and stroboscopic measurements. It was found that, IP was associated with increased airflow, decreased amount of vocal fold contact, and higher fundamental frequency (F0) when compared with EP. Kelly and Fisher16 examined the acoustic and stroboscopic data obtained from sustained vowel /i/ using IP and found similar results. They also noted that the intensity level of IP vowels failed to vary consistently with the increase of F0. Robb et al4 extended the research of IP by studying the interaction between articulation and phonation during production of the vowels /i, a, u/ using IP and EP. 
In accordance with earlier studies,9,16 Robb et al4 found that IP was associated with significantly higher F0 than EP. The main findings of the
study, however, indicated different vowel spectral characteristics (F1 and F2) between IP and EP, notably for /u/ and /a/.4 This finding was attributed, albeit with limited detail, to a possible difference in articulatory-phonatory coordination between IP and EP. In view of the available data on IP, it appears that the articulatory-phonatory mechanism of IP is far from being sufficiently understood. Notably, previous studies mainly focused on IP production of isolated vowels,4,9,16 and the mechanism of IP production in a consonant-vowel context is yet to be investigated. Additional information regarding the mechanism of IP production in a consonant-vowel context is needed to address the effects of IP on the coordination between phonation and articulation, because insufficient understanding of IP, as suggested by Orlikoff et al9 and Robb et al,4 would limit its clinical applicability.

Interarticulator timing: voice onset time

One widely accepted approach to investigating the coordination between the phonatory and articulatory systems is to examine voice onset time (VOT), a parameter defined as the temporal separation between the release of an oral occlusion and the onset of glottal pulsing.17 A longer VOT, for example, implies a greater latency between articulatory movement and onset of phonation.18 In addition, other factors have been found to affect VOT, including the lexical tone and place of articulation of an oral stop.19–21 By examining the VOT characteristics associated with stops of different languages, researchers have found that such interarticulator timing, along with several other important perceptual cues for voicing and aspiration (eg, aspiration noise levels/duration, formant frequency transitions, and burst characteristics22), relates to perceptual differentiation of voicing and aspiration.23 For instance, English voiceless stops /p, t, k/ and voiced stops /b, d, g/ can be distinguished based on VOT differences, with voiceless stops being associated
with longer VOT values than voiced stops.22 As for languages with only voiceless stops that are distinguished by aspiration contrast, such as Mandarin and Cantonese, a similar pattern of cueing between aspirated and unaspirated stops via different VOT values has been found.20,23 In both Mandarin and Cantonese, aspirated stops /ph, th, kh/ were associated with longer VOT than the unaspirated counterparts /p, t, k/. In addition to being a perceptual cue for voicing and aspiration, VOT has also been found to interact with place of consonant articulation.19,24 Evidence of the place-dependent nature of VOT can be found in a comprehensive study by Cho and Ladefoged,19 who examined the VOT differences between aspirated and unaspirated stops in 18 languages. The study not only demonstrated the voicing-specific or aspiration-specific nature of VOT, but also found an average VOT difference of approximately 18 milliseconds between coronal and velar stops. Results from these studies suggested that changes in VOT relate to changes in the perception of voicing or aspiration and of place of articulation. However, all of these VOT studies were based on data obtained from normal EP productions. The VOT characteristics associated with IP are not known. As such, there is a lack of information regarding how IP affects the timing control between the articulatory system and the phonatory system. If IP is to be used for different purposes, such as in facilitating
phonation for dysphonic patients, timing characteristics between different speech systems during IP should be better understood. In addition, knowledge of the IP mechanism may shed light on normal and pathological aspects of speech production. The present study examined the timing coordination between the phonatory and articulatory systems during EP and IP. VOT values obtained from Cantonese consonant-vowel (CV) syllables were compared with listeners' perceptual identification of stops of different aspiration contrasts and places of articulation produced using EP and IP. In Cantonese, all stops are voiceless and contrasted by aspiration and three places of articulation (bilabial, alveolar, and velar), yielding a total of six different stops.25,26 It was expected that the effect of phonation (EP versus IP) on the control of interarticulator timing would be revealed by comparing the VOT characteristics associated with the six Cantonese stops produced using EP and IP. Additionally, VOT data obtained from the present study would provide a more comprehensive appraisal and lead to a better understanding of the mechanism of IP, which would ultimately help design a more effective voice therapy regimen involving the use of IP.

METHODS

Participants

Speakers. Fifteen male and 15 female speakers aged 19 to 28 years (mean age 22.5 years) participated in the study. They were healthy native speakers of Hong Kong Cantonese with no known history of respiratory, vocal, and/or auditory problems. The speakers were local Hong Kong residents who had completed at least high school education. An important inclusion criterion was that the speakers be proficient in sustaining the vowel /a/ steadily using IP for at least 2 seconds.

Listeners. Five male and five female adult Cantonese speakers participated in the listening experiment.
Their ages ranged from 19 to 24 years, with a mean age of 20.4 years. None of the listeners had a reported history of speech and/or hearing problems or any formal training in voice therapy or phonetics, and all were considered naïve listeners because they had no prior experience with sound production using IP.

Speech materials

Three aspirated (/ph, th, kh/) and three unaspirated (/p, t, k/) voiceless stops followed by the vowel /a/ were used in the study. This provided a complete set of phonological contrasts of voiceless unaspirated and voiceless aspirated stops at three places of articulation (bilabial, alveolar, and velar) in Cantonese. All CV syllables were produced at the high-level tone, as lexical tone is beyond the focus of the study. This yielded six meaningful Cantonese monosyllabic words: /pha/ "to grovel," /tha/ "he," /kha/ "compartment," /pa/ "father," /ta/ "dozen," and /ka/ "to add." To increase the naturalness of productions, the six Cantonese words were embedded individually in a carrier phrase: /tYk ___ tsi/ meaning "Read ___ word." With this carrier phrase, the monosyllabic word was easily identified on the waveform
during acoustic analysis because it was preceded and followed by a brief period of silence associated with the stop /k/ and affricate /ts/, respectively.

Recording procedures

To make sure that the speakers were able to produce the CV syllables proficiently using both EP and IP, sufficient practice time and training were provided. The experiment began only when the speakers were able to sustain the vowel /a/ using IP for at least 2 seconds. Before the experiment, the speakers were given instructions on how to produce IP, and they were allowed to practice with the speech materials. During the recording, the speakers were seated in a sound booth and instructed to produce the six Cantonese words in the carrier phrase three times at a comfortable pitch and loudness level using EP and IP. The order in which the six CV words were produced was randomized to avoid any order effect. The Chinese character corresponding to each CV syllable was printed with the carrier phrase on an index card. A total of 18 (6 stops × 3 productions) cards were shuffled and presented to each speaker during recording in a random sequence of phonation mode (ie, EP versus IP). Two investigators ensured that each target CV was produced using the designated phonation mode. When a target CV token was judged not to have been produced in the designated phonation mode, the participant was asked to repeat the token until it was produced correctly. Speech samples were recorded by using a high-quality microphone (SM 58; Shure Incorporated, Niles, IL) with a 10-cm mouth-to-microphone distance and a preamplification unit (MobilePre USB; M-Audio, Arcadia, CA). Acoustic signals were digitized at a sampling rate of 22.05 kHz and a quantization of 16 bits by using the MobilePre USB interface. The digitized signals were stored in a computer for later analyses.
On completion of the recording, a total of 1080 productions (6 stops × 3 trials × 2 phonation types × 30 speakers) had been recorded. Only the last two trials were chosen for VOT measurement and the listening experiment, giving a total of 720 productions (6 stops × 2 trials × 2 phonation types × 30 speakers).

Perceptual experiment

To assess the intelligibility associated with EP and IP, the target CV syllables isolated from the carrier phrase were perceptually identified by the listeners. Brief instructions were given to the listeners before the listening experiment to acquaint them with the experimental procedure. The recorded CV tokens produced by the speakers using EP and IP were randomized and presented to the listeners, who were seated in a quiet room, at a comfortable loudness level (approximately 70 dB SPL) via high-quality headphones (HD 435; Sennheiser Electronic Corporation, Old Lyme, CT). After listening to each speech sample, the listeners selected the word corresponding to their perceived syllable on a preissued answer sheet. When needed, the listeners were allowed to listen to the speech stimulus again. A maximum of two presentations was allowed for each stimulus. An interstimulus pause of 3 seconds was provided for the listeners to make their selection.
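The production counts given at the end of the recording procedures follow directly from the factorial design. As a quick check, the sketch below enumerates every production; the string labels and variable names are illustrative assumptions and are not taken from the study's own materials.

```python
from itertools import product

# Design factors as described in the recording procedures.
# The labels and names below are illustrative assumptions.
STOPS = ["ph", "th", "kh", "p", "t", "k"]
TRIALS = [1, 2, 3]
PHONATION = ["EP", "IP"]
SPEAKERS = range(1, 31)  # 15 male and 15 female speakers

# Every recorded production: stop x trial x phonation type x speaker.
recorded = list(product(STOPS, TRIALS, PHONATION, SPEAKERS))

# Only the last two trials were kept for VOT measurement and listening.
analyzed = [token for token in recorded if token[1] in (2, 3)]

print(len(recorded))  # 6 * 3 * 2 * 30 productions
print(len(analyzed))  # 6 * 2 * 2 * 30 productions
```

Enumerating the tokens explicitly, rather than only multiplying the factor sizes, also gives a list that could be shuffled to build a randomized presentation order.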
Acoustic analysis and statistical procedures

VOT values were measured by marking the waveform of the speech samples with the help of a wideband spectrogram (window size = 25 milliseconds) using Praat (P. Boersma & D. Weenink, Amsterdam, The Netherlands). The first cursor was placed on the waveform to indicate the release of the oral occlusion. The second cursor was placed at the onset of the vocalic segment, which was identified based on the first vertical striation extending upward through the first (F1) and second formant (F2) without interruption. This technique of VOT measurement has been reported to yield consistent results.27 From the perceptual experiment, average percent correct identification was calculated for the different stops, and confusion matrices were generated to depict the error pattern of stop identification. The present study adopted a repeated-measures factorial design with phonation type, aspiration, and place of articulation as within-subjects variables. Effects of these variables on VOT were examined by using analysis of variance (ANOVA).

Reliability measures

A set of 72 speech samples (10% of the entire data corpus) was randomly selected for assessment of intrainvestigator reliability of VOT measurement, and the VOT values of these samples were remeasured by the investigator. Pearson product-moment correlation and absolute percent error values were calculated based on the VOT values obtained from the first and second measurements of this set of speech samples. Intrainvestigator reliability is evidenced by a strong correlation (r = 0.976, P < 0.01) with an absolute percent error of 11.93% between the first and second measurements. For interinvestigator reliability, VOT values were measured over a second set of 72 randomly selected speech samples by a second investigator.
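The VOT measurement and the reliability statistics described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis script: the cursor times and VOT values are made-up numbers, and because the text gives no formula for absolute percent error, the mean of |second - first| / first × 100 is assumed here.

```python
from math import sqrt

def vot_ms(release_s, voicing_onset_s):
    """VOT in milliseconds from two cursor times given in seconds."""
    return (voicing_onset_s - release_s) * 1000.0

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def absolute_percent_error(first, second):
    """Mean absolute difference relative to the first measurement (assumed formula)."""
    ratios = [abs(b - a) / a for a, b in zip(first, second)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical first and second VOT measurements (ms) of the same tokens.
first_pass = [95.0, 16.0, 38.0, 27.0, 102.0, 20.0]
second_pass = [97.0, 15.0, 36.0, 29.0, 100.0, 22.0]

print(round(vot_ms(0.250, 0.345), 1))
print(round(pearson_r(first_pass, second_pass), 3))
print(round(absolute_percent_error(first_pass, second_pass), 2))
```

A high correlation can coexist with a sizable percent error, because short VOT values make small absolute differences proportionally large; reporting both statistics, as the study does, covers both aspects of agreement.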
Interinvestigator reliability is evidenced by a strong correlation (r = 0.939, P < 0.01) between the two measurements, with an absolute percent error of 16.14%. A set of 72 target CV syllables (36 EP samples and 36 IP samples) randomly selected from the original speech samples was presented to the listeners a second time for assessment of intralistener reliability of stop identification. A total of 720 identification scores were recorded from the listeners (10 listeners × 72 tokens). The percentage of consistency was calculated based on the identification scores obtained from the first and second listening experiments. The overall percentage of consistency in stop identification was 80.42% between the first and second listening tests.

RESULTS

Perception

Percent correct identification of aspirated and unaspirated stops produced using EP and IP is summarized in Tables 1 and 2. As shown in Table 1, identification of all Cantonese stops produced using EP achieved a near-perfect level of accuracy (>93%). Among the perceptual errors indicated in EP, misperception of place of articulation appeared to be the predominant error,
TABLE 1. Confusion Matrix of Identification (%) of Cantonese Stops Produced Using Expiratory Phonation

Rows: Cantonese stop presented. Columns: Cantonese stop perceived.

Presented   /ph/    /th/    /kh/    /p/     /t/     /k/
/ph/        98.0    2.0     0.0     0.0     0.0     0.0
/th/        2.5     94.0    3.1     0.0     0.2     0.2
/kh/        0.2     1.0     98.5    0.0     0.3     0.0
/p/         0.0     0.2     0.0     98.5    1.3     0.0
/t/         0.2     0.5     0.0     4.3     93.5    1.5
/k/         0.2     0.0     0.0     0.2     0.6     99.0
with few errors in aspiration. However, all Cantonese stops produced using IP were identified at a considerably lower level of accuracy (<65%) (Table 2). The most common error was confusion of aspiration, whereas place of articulation was identified with slightly higher accuracy. Listeners demonstrated difficulty in distinguishing between aspirated and unaspirated stops produced at the same place of articulation using IP. They also misidentified the place of articulation of Cantonese stops produced using IP, notably for productions of the unaspirated /k/ sound.

VOT values

Average VOT values of the aspirated and unaspirated stops produced at different places of articulation using EP and IP are shown in Figures 1 and 2, respectively. A three-way repeated-measures ANOVA was used to assess the possible effects of phonation type, aspiration, and place of articulation on VOT. As a significant interaction was found between aspiration and phonation (F(1,59) = 689.047, P < 0.001), two 2 × 3 (aspiration × place of articulation) two-way ANOVAs and two 2 × 3 (phonation × place of articulation) two-way ANOVAs were subsequently carried out to test for the differences between aspirated and unaspirated stops and between EP and IP, respectively.
FIGURE 1. Average and standard deviation values of VOT of aspirated and unaspirated stops produced at different places of articulation using EP.

Expiratory phonation

The average VOT values of aspirated and unaspirated stops produced using EP were 96.86 and 15.66 milliseconds, respectively (Figure 1). The two-way ANOVA results showed no significant interaction between aspiration and place of articulation (F(2,118) = 0.046, P = 0.955). However, significant main effects were found for aspiration (F(1,59) = 1737.685, P < 0.01) and place of articulation (F(2,118) = 49.062, P < 0.01). Aspirated stops /ph, th, kh/ exhibited significantly longer VOT values than unaspirated stops /p, t, k/. Pairwise multiple comparisons revealed that velar stops /kh, k/ were associated with significantly longer VOT values than alveolar stops /th, t/ (P < 0.01) and bilabial stops /ph, p/ (P < 0.01). Alveolar stops /th, t/ were associated with significantly longer VOT values than bilabial stops /ph, p/ (P < 0.05).

Inspiratory phonation

The average VOT values of aspirated and unaspirated stops produced using IP were 37.30 and 28.24 milliseconds, respectively (Figure 2). The two-way ANOVA results showed no significant
TABLE 2. Confusion Matrix of Identification (%) of Cantonese Stops Produced Using Inspiratory Phonation

Rows: Cantonese stop presented. Columns: Cantonese stop perceived.

Presented   /ph/    /th/    /kh/    /p/     /t/     /k/
/ph/        50.2    12.3    1.2     30.5    4.6     1.2
/th/        1.5     64.0    7.0     0.3     20.9    6.3
/kh/        0.7     6.5     57.0    0.2     1.8     33.8
/p/         27.3    8.4     2.3     56.7    4.5     0.8
/t/         1.2     44.2    1.5     0.8     48.8    3.5
/k/         50.2    12.3    1.2     30.5    4.6     1.2
FIGURE 2. Average and standard deviation values of VOT of aspirated and unaspirated stops produced at different places of articulation using IP.
interaction between aspiration and place of articulation (F(2,118) = 0.964, P = 0.384). However, significant main effects were found for aspiration (F(1,59) = 25.286, P < 0.01) and place of articulation (F(2,118) = 21.842, P < 0.01). Aspirated stops were found to exhibit significantly longer VOT values than unaspirated counterparts. Velar stops /kh, k/ were associated with significantly longer VOT values than alveolar stops /th, t/ (P < 0.01) and bilabial stops /ph, p/ (P < 0.01). In addition, no significant difference in VOT was found between bilabial stops /ph, p/ and alveolar stops /th, t/ (P = 0.141).

Inspiratory phonation versus expiratory phonation

Results of two-way ANOVA showed no significant interaction between phonation and place of articulation for both aspirated (F(2,118) = 0.220, P = 0.803) and unaspirated stop categories (F(2,118) = 0.610, P = 0.545). However, significant main effects were found in phonation for both aspirated (F(1,59) = 534.340, P < 0.01) and unaspirated stop categories (F(1,59) = 99.047, P < 0.01). VOT values of aspirated stops produced using EP were significantly longer than those produced using IP, whereas VOT values for unaspirated stops produced using EP were significantly shorter than those produced using IP.

DISCUSSION

Perception

Table 1 reveals that identification of Cantonese stops produced using EP was nearly perfect (ie, with perceptual accuracy greater than 90%). This is consistent with results reported from previous studies.20,23 The lack of perfect identification of EP stops may result from the use of isolated CV syllables in the listening experiment, which may have deprived the listeners of other important cues for correct stop perception because of the absence of contextual cues. As for identification of stops in IP mode, the present results revealed a very low percent correct identification of stops with aspiration confusion as the predominant error.
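One way to quantify the error pattern noted above is to split the responses to each presented stop into correct identifications, aspiration-only errors, place-only errors, and errors in both features. The sketch below does this for the responses to /th/ produced using IP, taking the corresponding Table 2 values and assuming that each data row of the table lists the responses to one presented stop. The function and the feature encoding are illustrative assumptions and were not part of the original analysis.

```python
# Each stop is encoded as (place, aspirated). This encoding is an
# illustrative assumption, not something defined in the study.
FEATURES = {
    "ph": ("bilabial", True),
    "th": ("alveolar", True),
    "kh": ("velar", True),
    "p": ("bilabial", False),
    "t": ("alveolar", False),
    "k": ("velar", False),
}

def classify_responses(presented, responses):
    """Sum response percentages by error type for one presented stop."""
    place, aspirated = FEATURES[presented]
    totals = {"correct": 0.0, "aspiration_only": 0.0,
              "place_only": 0.0, "both": 0.0}
    for perceived, percent in responses.items():
        perceived_place, perceived_asp = FEATURES[perceived]
        if perceived_place == place and perceived_asp == aspirated:
            kind = "correct"
        elif perceived_place == place:
            kind = "aspiration_only"  # same place, wrong aspiration
        elif perceived_asp == aspirated:
            kind = "place_only"       # same aspiration, wrong place
        else:
            kind = "both"
        totals[kind] += percent
    return totals

# Responses (%) to /th/ produced using IP, as listed in Table 2.
ip_th = {"ph": 1.5, "th": 64.0, "kh": 7.0, "p": 0.3, "t": 20.9, "k": 6.3}

for kind, percent in classify_responses("th", ip_th).items():
    print(f"{kind}: {percent:.1f}%")
```

For this stop, aspiration-only confusions make up the largest share of the errors, which matches the qualitative description given in the text.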
The predominant confusion between aspirated and unaspirated stops in IP mode is in line with the smaller VOT difference between aspirated and unaspirated stops obtained from IP production as opposed to EP production (eg, average VOT differences between aspirated and unaspirated stops were 81.20 and 9.06 milliseconds for EP and IP, respectively). Most researchers agree that VOT is related to the perception of aspiration.23,27,28 The greater VOT difference between aspirated and unaspirated EP stops may provide a better perceptual cue for listeners to correctly perceive aspiration of EP stops. In contrast, it is likely that a small VOT contrast rendered aspirated and unaspirated IP stops less distinguishable, which led to the aspiration confusion. It should be noted that previous studies revealed that aspiration noise and formant transition information were also important cues for perception of aspiration and place of articulation in stops.23 Some even suggested that the presence of aspiration noise was a more important perceptual cue for aspiration perception than VOT. Given this, it appears that investigations of other acoustic cues in the perception of
aspiration and place of articulation of IP stops in addition to the presently examined VOT values are warranted.

Phonation

Results of this study indicated that, in general, aspirated EP stops were associated with significantly longer VOT than IP counterparts, whereas unaspirated EP stops were produced with shorter VOT than IP counterparts. To explain the shorter VOT in aspirated IP stops, the physiological process and possibly the subsequent aeroacoustic difference between EP and IP consonant productions need to be reviewed closely. Aerodynamically, production of EP stops requires a sufficiently high (around 7.5 cm H2O)29 intraoral pressure to be accumulated somewhere in the oral cavity (depending on the place of oral occlusion) to open the oral occlusion formed in the vocal tract. The subglottal pressure can be considered as approximately equivalent to the intraoral pressure at the moment of (EP) burst.30 This in turn results in a momentary zero transglottal pressure differential (TPD), which is a key aerodynamic parameter defined as the difference between the subglottal or pulmonic pressure and supraglottal or intraoral pressure.31 Recall that TPD needs to be maintained at around 5–30 cm H2O.32,33 The oral pressure drops as the stop is released and the accumulated intraoral air dissipates into the atmosphere. It has been previously reported that VOT is related to the time it takes to adduct the vocal folds34 (perhaps for the purpose of building up subglottal pressure to overcome the supraglottal pressure). However, it can also be the case, derived from studies by Shadle33,35 in which intraoral pressure variation during the production of a stop and its following vowel is clearly illustrated, that VOT depends on the time elapsed between the drop of supraglottal pressure (from its peak) and the moment when adequate TPD is achieved.
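To make the pressure argument concrete, the sketch below applies the TPD definition to the release of an EP stop. It assumes, purely for illustration, that subglottal pressure stays fixed at 7.5 cm H2O while intraoral pressure falls linearly to zero over a hypothetical 80 milliseconds. Neither the linear decay nor the 80-millisecond duration comes from the study, and the 5 cm H2O threshold is simply the lower bound of the range cited above.

```python
SUBGLOTTAL_CM_H2O = 7.5      # assumed constant during the release
PEAK_INTRAORAL_CM_H2O = 7.5  # roughly equal to subglottal pressure at the burst
DECAY_MS = 80.0              # hypothetical time for oral pressure to reach zero
MIN_TPD_CM_H2O = 5.0         # lower bound of the cited 5-30 cm H2O range

def intraoral_pressure(t_ms):
    """Intraoral pressure after release, assuming a linear decay."""
    remaining = max(0.0, 1.0 - t_ms / DECAY_MS)
    return PEAK_INTRAORAL_CM_H2O * remaining

def tpd(t_ms):
    """Transglottal pressure differential: subglottal minus intraoral."""
    return SUBGLOTTAL_CM_H2O - intraoral_pressure(t_ms)

def time_to_adequate_tpd(step_ms=1.0):
    """First time step (ms after release) at which TPD reaches the threshold."""
    t_ms = 0.0
    while tpd(t_ms) < MIN_TPD_CM_H2O:
        t_ms += step_ms
    return t_ms

print(tpd(0.0))                # TPD at the moment of the burst
print(time_to_adequate_tpd())  # delay before TPD is adequate for voicing
```

Under these assumptions the delay scales with how quickly intraoral pressure dissipates, which is the sense in which VOT may reflect the time needed to reach an adequate TPD.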
Unlike EP stop production, the adequate TPD for the phonation of vowel sound following the release of oral occlusion during IP stop production is probably more readily available owing to the nature that release of oral occlusion during IP is facilitated by a sufficiently negative intraoral pressure behind the occlusion. This can be achieved by the combination of: (1) an inhalatory effort mainly rendered by lowering and flattening the diaphragm,13,32 (2) dilation of airway,36 and (3) distention of laryngeal ventricle.11,37 Meanwhile, the synchronized physiological activities further lead to an inward flow of air because of the negative TPD established by the more negative subglottal pressure in comparison with the supraglottal pressure.9 However, more aerodynamic data is needed to fully understand how such negative TPD is achieved to initiate IP. Yet another possibility to consider is the likelihood of reduced vocal fold contact and/or increase of glottal chink,14,16,38 as well as reduced ventricular compression.3,9,11,39 In addition, there is a possibility of reduced vocal fold adductory force during IP as opposed to EP based on previous reports of Orlikoff et al9 and Kelly and Fisher.16 These unique vocal vibratory patterns have been confirmed in a more recent study on reverse phonation conducted by Finger and Cielo,13 who suggested possible instability and hypoadduction of the vocal folds during IP, which yields a breathy IP quality. Meanwhile, the obvious difference of the vocal tract geometric dimension or dynamics
between EP and IP (eg, the outward movement of the tongue during the EP burst of /th/ versus the inward movement of the tongue during the IP burst of /th/), inferred from Shadle,40 may also affect source parameters and sound production. Altogether, it is not unreasonable to conclude that the combination of possible reduced vocal fold contact, reduced vocal adductory force, vocal hypofunction, and distinctive vocal tract geometry in IP may have contributed to the shorter VOT in aspirated consonant production using IP than using EP. These suggested reasons for the VOT difference between EP and IP aspirated stop productions should be confirmed with further aerodynamic and acoustic data in future research. Although all the physiological accounts mentioned previously seem to provide convincing explanations for the finding that aspirated stops produced using EP demonstrate longer VOT than those produced using IP, they do not work equally well in explaining the opposite pattern of VOT in unaspirated stop productions, with unaspirated IP stops showing longer VOT than unaspirated EP stops. The VOT difference between EP and IP unaspirated stops may suggest different aerodynamic mechanisms, which are likely to arise from the relative differences in vocal tract geometry and their impact on sound production.40 Moreover, as reasoned by Kelly and Fisher,16 the different biomechanical and aerodynamic factors may have contributed to a longer VOT associated with unaspirated IP stops as opposed to unaspirated EP stops. Recall that the pressure at the leading edge or upper lip of the vocal folds is the ambient atmospheric pressure during IP, whereas the pressure at the leading edge or lower lip of the vocal folds is the subglottal pressure during EP.
The directional difference of mucosal wave propagation between IP and EP, as elaborated by Kelly and Fisher,16 may have altered the entire dynamic features of the vocal folds, such as the vectors of force against and displacement of the vocal folds. Lastly, it is reasonable to infer that the differences in the physiological processes by which IP and EP yield adequate TPD, as discussed above, would require different neuromuscular control mechanisms for the two phonation modes. Put together, the difference in vocal fold biomechanics and dynamics along with the possible differences in neuromuscular control between IP and EP may render a less favorable or more effortful condition for unaspirated stop production in IP than in EP, which ultimately requires a longer time (thus a longer VOT) to generate adequate TPD for driving the vocal folds into vibration during IP.

Aspiration

For both EP and IP, aspirated and unaspirated Cantonese stops were associated with significantly different VOT values. This is consistent with the data reported in the literature regarding long and short lag VOTs.22,23 Voiceless unaspirated stops in CV syllables are associated with short lag VOTs, as they are produced with glottal vibration shortly after the stop occlusion is released. In the production of voiceless aspirated stops, it was suggested that the glottal width is relatively larger to allow greater transglottal airflow before the actual glottal vibration occurs, resulting in a longer VOT. The present data, however, indicate a smaller difference in VOT between aspirated and unaspirated
Journal of Voice, Vol. 25, No. 3, 2011
IP stops when compared with their EP counterparts. Again, as previously discussed, this may be attributed to possible differences in aerodynamics, vocal fold biomechanics and dynamics, and neuromuscular control.16 The finding of a smaller VOT difference associated with IP agrees well with the relatively poor perceptual distinction between aspirated and unaspirated IP stops obtained from the listening experiment, which once again substantiates the idea that VOT, an interarticulator timing measure, serves as a perceptual cue for aspiration.22

Place of articulation

Regardless of phonation type, our data indicated that velar stops (/kh, k/) were associated with significantly longer VOT than their bilabial and alveolar counterparts (/ph, p, th, t/). Similar results have been reported previously in studies of EP stops in different languages.19,20,22 In general, the VOT of EP stops increases significantly as the place of articulation moves from bilabial to velar. Several factors have been suggested to explain this trend in VOT with place of articulation, including (1) the intraoral pressure behind the occlusion, (2) the speed of articulator movement, and (3) the extent of the articulatory contact area.19,20 For example, during EP production of the aspirated velar stop /kh/, a high intraoral pressure is established behind the occlusion. Such high pressure delays the onset of glottal vibration, as it takes relatively longer for the intraoral pressure to reach an appropriate phonation threshold pressure. In addition, as the major articulator for /kh/, the tongue dorsum moves relatively more slowly than the lips for bilabial stops or the tongue tip for alveolar stops. During the release of the velar stop occlusion, the wider area of constriction leads to a slower rate of reduction of supraglottal pressure. All of these factors favor longer VOT values for velar stops produced using EP.
It appears that these notions can also be applied to IP stops, at least in the case of factors (2) and (3). More information, including aerodynamic data, is needed to determine whether and how intraoral pressure affects the onset of vocal fold vibration in the production of an IP stop.

Clinical implication

This pioneering study of Cantonese IP stops provides preliminary data and fundamental information for understanding the physiology and interarticulator timing characteristics associated with IP stop production. The present results reveal a difference between IP and EP in how the articulatory and phonatory systems interact, which is consistent with previously reported data.4,9,13,16 The present data are in line with the suggestion made by Finger and Cielo13 that voice therapy making use of IP should start with vowels, as it is relatively easier for patients to acquire blunt explosive phonemes in an initial position, such as /p/, during production training. In addition, it is suggested that speech therapists use their discretion in deciding whether unaspirated stops are appropriate speech materials for Cantonese patients who adopt IP as part of their therapy regimen.

CONCLUSION

The present study investigated the effect of place of articulation and aspiration in production of Cantonese stops using EP and
IP. Results indicated that listeners confused aspiration more often when identifying IP stops than when identifying their EP counterparts. VOT analysis revealed that IP aspirated stops were associated with significantly shorter VOT values than EP aspirated stops, and that IP unaspirated stops were produced with significantly longer VOT than EP unaspirated stops. This discrepancy in the IP-EP VOT difference between aspirated and unaspirated productions may suggest differences in aerodynamics, vocal fold biomechanics and dynamics, vocal tract geometry, and neuromuscular control between the two modes of phonation. Regardless of phonation type, aspirated stops exhibited longer VOT values than unaspirated stops, and velar stops were associated with longer VOT values than bilabial and alveolar stops. These findings support the view that the articulatory-phonatory interaction in productions using IP is not the same as that in productions using EP.

REFERENCES

1. Ladefoged P. Preliminaries to Linguistic Phonetics. Chicago, IL: The University of Chicago Press; 1981.
2. Titze I. Principles of Voice Production. Englewood Cliffs, NJ: Prentice Hall; 1994.
3. Colton RH, Casper JK, Leonard R. Understanding Voice Problems. A Physiological Perspective for Diagnosis and Treatment. 3rd ed. Baltimore, MD: Williams & Wilkins; 2006.
4. Robb MP, Chen Y, Gilbert HR, Lerman JW. Acoustic comparison of vowel articulation in normal and reverse phonation. J Speech Lang Hear Res. 2001;44:118–127.
5. Shulman S. Symptom modification for abductor spasmodic dysphonia: inhalation phonation. In: Stemple JC, ed. Voice Therapy: Clinical Studies. 2nd ed. San Diego, CA: Singular Publishing Group; 2000.
6. Aronson A. Clinical Voice Disorders. 3rd ed. New York, NY: Thieme Inc.; 1990.
7. Gavriely N, Palti Y, Alroy G, Grotberg J. Measurement and theory of wheezing breath sounds. J Appl Physiol. 1984;57:481–492.
8. Ludlow C. Treatment of speech and voice disorders with botulinum toxin. JAMA. 1990;264:2671–2675.
9. Orlikoff RF, Baken RJ, Kraus DH. Acoustic and physiologic characteristics of inspiratory phonation. J Acoust Soc Am. 1997;102:1838–1845.
10. Wilson K. Voice Problems of Children. 3rd ed. Baltimore, MD: Williams and Wilkins; 1987.
11. Lehmann QH. Reverse phonation: a new maneuver for eliminating the larynx. Radiology. 1965;84:215–222.
12. Sulica L, Behrman A, Roark R. The inspiratory maneuver: a simple method to assess the superficial lamina propria during endoscopy. J Voice. 2005;19:481–484.
13. Finger LS, Cielo CA. Reverse phonation: physiologic and clinical aspects of this speech voice therapy modality. Braz J Otorhinolaryngol. 2007;73:271–277.
14. Harrison GA, Davis P, Troughear RH, Winkworth A. Inspiratory speech as a management option for spastic dysphonia: case study. Ann Otol Rhinol Laryngol. 1992;101:375–382.
15. Maryn Y, de Bodt MS, van Cauwenberge P. Ventricular dysphonia: clinical aspects and therapeutic options. Laryngoscope. 2003;13:859–886.
16. Kelly CL, Fisher KV. Stroboscopic and acoustic measures of inspiratory phonation. J Voice. 1999;13:389–402.
17. Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice. 2nd ed. San Diego, CA: Singular Publishing Group; 2000.
18. Abramson AS. Laryngeal timing in consonant distinctions. Phonetica. 1977;34:295–303.
19. Cho T, Ladefoged P. Variation and universals in VOT: evidence from 18 languages. J Phon. 1999;27:207–229.
20. Liu H, Ng M, Wan M, Wang S, Zhang Y. Effects of place of articulation and aspiration on voice onset time in Mandarin esophageal speech. Folia Phoniatr Logop. 2007;59:147–154.
21. Liu H, Ng M, Wan M, Wang S, Zhang Y. The effect of tonal changes on voice onset time in Mandarin esophageal speech. J Voice. 2008;22:210–218.
22. Lisker L, Abramson AS. A cross-language study of voicing in initial stops: acoustical measurements. Word. 1964;20:384–422.
23. Tsui IYH, Ciocca V. Perception of aspiration and place of articulation of Cantonese initial stops by normal and sensorineural hearing-impaired listeners. Int J Lang Commun Disord. 2000;35:507–525.
24. Peterson GE, Lehiste I. Duration of syllable nuclei in English. J Acoust Soc Am. 1960;32:693–703.
25. Bauer RS, Benedict PK. Modern Cantonese Phonology. (Trends in Linguistics. Studies and Monographs 102). Berlin, NY: Mouton de Gruyter; 1997.
26. Zee E. Chinese (Hong Kong Cantonese), Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge, UK: Cambridge University Press; 1999:58–60.
27. Francis AL, Ciocca V, Yu JMC. Accuracy and variability of acoustic measures of voicing onset. J Acoust Soc Am. 2003;113:1025–1032.
28. Lisker L, Abramson AS. Some effects of context on voice onset time in English stops. Lang Speech. 1967;10:1–28.
29. Blomgren M, Chen Y, Ng M, Gilbert HR. Acoustic, aerodynamic, physiologic, and perceptual properties of modal and vocal fry registers. J Acoust Soc Am. 1998;103:2649–2658.
30. Smitheran J, Hixon T. A clinical method for estimating laryngeal airway resistance during vowel production. J Speech Hear Disord. 1981;46:138–146.
31. Ohala JJ. Respiratory activity in speech. In: Hardcastle W, Marchal A, eds. Speech Production and Speech Modeling. Dordrecht, The Netherlands: Kluwer; 1990:23–53.
32. Zemlin WR. Speech and Hearing Science: Anatomy and Physiology. 4th ed. Boston, MA: Allyn and Bacon; 1998.
33. Shadle CH. The aerodynamics of speech. In: Hardcastle WJ, Laver J, eds. Handbook of Phonetic Sciences. Malden, MA: Blackwell Publishers; 1997:33–64.
34. Catford JC. Fundamental Problems in Phonetics. Edinburgh, UK: Edinburgh University Press; 1977.
35. Shadle CH, Scully C. An articulatory-acoustic-aerodynamic analysis of [s] in VCV sequences. J Phon. 1995;23:53–66.
36. Wyke BD. Neuromuscular control system in voice production. In: Bless DM, Abbs JH, eds. Vocal Fold Physiology: Contemporary Research and Clinical Issues. San Diego, CA: College-Hill; 1977:71–76.
37. Powers WE, Holtz S, Ogura J. Contrast examination of the larynx and pharynx: inspiratory phonation. Am J Roentgenol Radium Ther Nucl Med. 1964;92:40–42.
38. Moore P, Von Leden H. Dynamic variation of the vibratory pattern in the normal larynx. Folia Phoniatr (Basel). 1958;10:205–238.
39. Boone DR, McFarlane SC. The Voice and Voice Therapy. 5th ed. Englewood Cliffs, NJ: Prentice Hall; 1994.
40. Shadle CH. The effect of geometry on source mechanisms of fricative consonants. J Phon. 1991;19:409–424.