ARTICLE IN PRESS

Responses of Middle-Frequency Modulations in Vocal Fundamental Frequency to Different Vocal Intensities and Auditory Feedback

*Shao-Hsuan Lee, ‡Tuan-Jen Fang, †Jen-Fang Yu, and §,¶Guo-She Lee, *†‡Taoyuan City and §¶Taipei, Taiwan

Summary: Objectives and Background. Auditory feedback can evoke reflexive responses in sustained vocalizations. Among them, the middle-frequency power of F0 (MFP) may provide a sensitive index to assess the subtle changes under different auditory feedback conditions. Materials and Methods. Phonatory airflow temperature was obtained from 20 healthy adults at two vocal intensity ranges under four auditory feedback conditions: (1) natural auditory feedback (NO); (2) binaural speech noise masking (SN); (3) bone-conducted feedback of self-generated voice (BAF); and (4) SN and BAF simultaneously. The modulations of F0 in low-frequency (0.2 Hz–3 Hz), middle-frequency (3 Hz–8 Hz), and high-frequency (8 Hz–25 Hz) bands were acquired using power spectral analysis of F0. Acoustic and aerodynamic analyses were used to acquire vocal intensity, maximum phonation time (MPT), phonatory airflow, and MFP-based vocal efficiency (MBVE). Results. SN and high vocal intensity decreased MFP and raised MBVE and MPT significantly. BAF showed no effect on MFP but significantly lowered MBVE. Moreover, BAF significantly increased the perception of voice feedback and the sensation of vocal effort. Conclusions. Altered auditory feedback significantly changed the middle-frequency modulations of F0. MFP and MBVE could detect these subtle responses of audio-vocal feedback well.

Key Words: Auditory feedback–Vocal fundamental frequency–Middle-frequency modulation–Power spectral analysis–Vocal efficiency.

INTRODUCTION

Accurate control of vocal production is essential for humans to convey diverse messages in vocal communication.1,2 It is generally believed that vocal control relies on concurrent processing of vocal output and auditory input.
The state feedback control theory has suggested that vocal motor responses to sensory perturbations involve the feedback control in the central nervous system (CNS) rather than only the function of the lower motor system. When there exists an alteration of auditory feedback, an adaptive motor command will be triggered in sensory cortices and then sent to motor-related cortical areas for driving subsequent motor corrections in vocal acts.3,4 The model of DIVA (Directions into Velocities of Articulators) also represents a neuroanatomical account for the process of neural mapping and remapping between speech sensory inputs and motor outputs.5–7 The neural model mainly consists of the feedforward and feedback subsystems. When speakers attempt to make a phonation, the laryngeal movements are first initiated according to the learned motor templates stored in the feedforward subsystem. These acquired motor skills allow the speaker to produce a speech sound at a preset level without the contributions of sensory feedback. Accepted for publication January 30, 2017. Disclosures of conflict of interest: None to declare. From the *Ph.D. Program in Biomedical Engineering, College of Engineering, ChangGung University, Taoyuan City, Taiwan; †Institute of Medical Mechatronics, College of Engineering, Chang-Gung University, Taoyuan City, Taiwan; ‡Department of Otorhinolaryngology, Chang-Gung Memorial Hospital, Taoyuan City, Taiwan; §Department of Otorhinolaryngology, Faculty of Medicine, School of Medicine, National Yang-Ming University, Taipei, Taiwan; and the ¶Department of Otolaryngology, Taipei City Hospital Renai Branch, Taipei, Taiwan. Address correspondence and reprint requests to Guo-She Lee, Department of Otolaryngology, National Yang-Ming University, No.155, Sec.2, Li-Nong Street, Bei-Tou District, Taipei City 112, Taiwan. E-mail:
[email protected] Journal of Voice, Vol. ■■, No. ■■, pp. ■■-■■ 0892-1997 © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jvoice.2017.01.015
During speech, however, the sensory feedback control subsystem located in the temporal lobe and parietal lobe continuously compares the inputs of auditory and somatosensory feedback with the reference targets copied from the feedforward subsystem. The detection of auditory errors would activate the auditory feedback control area in the superior temporal cortex for generating corrective motor commands. These auditory feedback-based commands are then subsumed in the feedforward controller in order to make the default feedforward commands properly tuned. The closed loop embedded in the speech chain has long been considered as an essential mechanism underlying the learning and stabilization of phonation and speech articulation.8,9 For the last few decades, many investigations have been conducted to enhance our understanding of the interactions between human vocal characteristics and auditory feedback. The results of several studies have provided evidence that vocal motor control is not only determined by what speakers attempt to express, but is also indirectly regulated by the audio-vocal feedback control system. 
For instance, the well-known phenomenon called Lombard effect manifests an adjustment of vocal intensity that occurs involuntarily as background noise level changes.10 The loud speech responding to realistic environmental noise also gives rise to higher F0, longer phonation time, and noticeable deterioration of voice stability, stress, and roughness.11 Moreover, when the voice feedback is pitch-shifted artificially during sustained phonation, many speakers show a compensatory adjustment of vocal fundamental frequency (F0) in the opposite direction to the pitch-shifted stimulus.12,13 In addition, delayed auditory feedback has been found to result in various types of speech disturbance as well, and the alterations are especially observed in speech articulation and fluency.14,15 With respect to the effect of hearing loss on vocal features, many investigations have presented obvious changes in the nonarticulatory aspects of the
speech of hearing-impaired individuals, most often a reduced overall speech rate, higher average F0, increased vocal intensity, and erratic or inappropriate pitch variations within a phrase or sentence.16–18 The deviations of intonation patterns measured from the hearing-impaired also show a greater variability of vocal fundamental frequency as a function of auditory feedback.19 As outlined above, vocal motor control in response to altered auditory feedback has been generally regarded as a compensatory mechanism leading to immediate adjustments in vocal pitch, loudness, and voice quality. Even though the effect of altered auditory feedback on vocal production usually can be perceived and deliberately modified by speakers, those who have high vocal demands in a noisy workplace still frequently exhibit vocal fatigue, vocal stress, and voice loss, and may be predisposed to developing vocal fold abnormalities.11,20,21 Therefore, these results seem to suggest that it is difficult for speakers to completely overcome the vocal disturbances from auditory interference even when they try to keep their voice output as constant and stable as possible. The rhythms underlying vocal F0 were observed long ago in studies analyzing the chaos or aperiodicity in vocal fold vibration.22,23 However, the origin and functional meaning of these rhythmic movements of vocal fold oscillation are not completely clear. In our previous studies, we decomposed the power spectrum of F0 into three components, namely the low-frequency (0.2 Hz–3 Hz), middle-frequency (3 Hz–8 Hz), and high-frequency (8 Hz–25 Hz) ranges.24 The changes in the power of each frequency band may reflect different physiological information associated with the audio-vocal feedback control loop.
For instance, the power of low-frequency modulations (LFP) in F0 increased remarkably in many speakers when there was a lack of sufficient auditory feedback of their own voice due to noise masking or sensorineural hearing loss.25–27 The power increment of the low-frequency F0 rhythm indicated that vocal fold vibration was susceptible to alterations in the auditory feedback condition, and that noise masking may result in a greater variability of vocal fold oscillations than a quiet environment. Although the LFP of F0 has been found to respond sensitively to perturbations in the audio-vocal feedback status, we have limited knowledge about the middle-frequency power (MFP) and high-frequency power (HFP) of F0 modulations. In terms of neuromuscular control of the vocal tract, essential tremor between 4 Hz and 8 Hz commonly appears as a vocal symptom in speakers with Parkinson’s disease (PD), spasmodic dysphonia, and other neurologic disorders.28,29 Because there is a large overlap between the frequency ranges of the middle-frequency F0 modulation and pathologic vocal tremor, F0 variability within 3 Hz–8 Hz may have the potential to reflect central neural regulation of laryngeal movements during voice production. Our previous work dealing with the vocal signals from PD patients also suggests a relationship between F0 variability and neuromuscular control of the larynx.30 Even though PD usually leads to laryngeal sensory deficits and difficulty in immediately modifying existing motor behaviors according to the current sensory feedback status, speakers with PD can still exhibit sensorimotor adaptations in speech production when there is a mismatch
between actual and expected speech sounds.31,32 Levodopa therapy has been proven to be an effective treatment for early PD to reduce the extent of vocal tremor and laryngeal rigidity.33,34 Our previous study has shown a significant reduction in the MFP of F0 rhythm as the speech sensorimotor control of PD patients was improved after levodopa treatment.35 This result implies that changes in the F0 rhythms between 3 Hz and 8 Hz may reflect the adjustment of laryngeal muscle activity associated with central sensory-motor integration, and the decrease of MFP might point to a reduced variability of vocal fold vibration in order to achieve greater vocal efficiency. Power spectral analyses of the low-frequency and high-frequency rhythms of F0 were also included in the current study to identify whether or not the MFP can better demonstrate the laryngeal adjustments for different levels of vocal demands or altered audio-vocal feedback status than the high-frequency rhythms. In the present study, we hypothesized that the MFP of F0 rhythms may be an acoustic indicator of the fine motor control over the phonation apparatus under different auditory feedback conditions and vocal intensity ranges. Because the change in the middle-frequency rhythm of vocal fold vibration was considered to be associated with the auditory-induced laryngeal regulations of vocalization, the measure of MFP was expected to contribute to a new and noninvasive measurement of vocal efficiency, as it would reflect a compensatory mechanism to improve vocal function. To provide support for our hypothesis, we used power spectral analysis of F0 to examine whether the MFP shows a significant difference when vocalizing under controlled auditory feedback conditions or with different vocal intensity ranges.
Moreover, phonatory airflow temperature was simultaneously collected during each vocalization for estimating the mean airflow rate, which was later used to calculate the MFP-based vocal efficiency (MBVE) along with the values of MFP and output vocal power.

PATIENTS AND METHODS

Participants

Twenty native Mandarin speakers (9 males, 11 females) aged between 20 and 40 years participated in this research. The participants reported no medical history of neurologic disorder, speech and language problem, or recent upper respiratory infection prior to the study. All participants passed the hearing screening test using pure-tone stimuli of 500, 1000, 2000, and 4000 Hz at an intensity of 25 dB hearing level (HL). Individuals with experience in voice or singing training were excluded. The research procedure was approved by the institutional review board of Taipei City Hospital, Taipei, Taiwan (TCHIRB-1030802-E), and written informed consent was obtained from each participant.

Measurements of phonation airflow using thermistor

Collection of exhaled airflow by an oronasal face mask requires a small dead space within the mask for improved accuracy in ventilatory measurement. However, the requirement of a face seal makes it difficult to collect the voice signals and phonatory airflow simultaneously and accurately. The voice signals
would be greatly attenuated if a microphone is set outside the mask. On the other hand, face seal leakage may take place if a microphone is inserted inside the mask; the closed space would also form an extra resonance cavity, which may lead to changes in the acoustic characteristics of phonation. Therefore, a ventilation mask with a calibrated NTC (negative temperature coefficient) thermistor (5 kΩ at 20°C, time constant = 0.23 seconds) fixed in the vent hole of the mask was used to collect phonatory airflow and to measure the temperature alteration as a surrogate of airflow variation during vocalization. The temperature signals obtained from the NTC thermistor were amplified and digitally sampled at a rate of 200 Hz, and water baths at 20°C and 40°C were used to calibrate the thermistor. The temperature signals of sustained vowel [α] from two healthy volunteers were acquired using the mask with the NTC thermistor. The mask was also connected to an airflow meter to obtain the airflow rate of the phonation. Although the vocal intensity could not be accurately measured because the acoustic signals were substantially changed by the mask seal, the vocalizations produced at different voice efforts would still yield different vocal intensity levels. The time-temperature curve and mean phonation airflow of each sustained vocalization were obtained. Using linear regression, the 2-second temperature signal after voice onset was used to acquire the temperature slope of vocalization; a typical example is shown in Figure 1A, which demonstrates a slope of 1.47 (°C/s) for 200 mL/s. The relationships between mean airflow rate and 2-second temperature slope were then analyzed for all vocalizations using linear regression. The correlation was excellent (Figure 1B; R² = 0.84, P < 0.001), and the equation is listed below:

$$F = 55.1 + 96.6\,\delta \tag{1}$$

where F represents the mean airflow rate in mL/s and δ refers to the linear slope of the 2-second temperature signal after voice onset.

FIGURE 1. (A) The linear regression of 2-second temperature slope of phonatory airflow rate; and (B) the relationship between mean phonatory airflow rate and 2-second temperature slope.

Recording of voice and phonation airflow

All signals were recorded in a double-walled soundproof room. The ambient noises were less than 40 dBA throughout the experiment, and the room temperature was maintained at 20 ± 0.5°C. The participants were asked to sustain the vowel [α] at a low intensity (70~80 dBC) and at a high intensity (90~100 dBC) as long as possible and as steadily as possible in pitch and loudness. Voice signals were digitally recorded in the format of 44.1 kHz and 16 bits using a standard microphone (IEC 651 Type II, Tenmars Electronics, Taipei, Taiwan) fixed in front of the vent hole of the ventilation mask (Figure 2). Vocal intensity was digitally sampled at 200 Hz and was displayed in real time on a laptop computer as visual feedback to help the speakers maintain their vocal intensity within a given range.

FIGURE 2. The experimental schematic.

Meanwhile, four auditory feedback conditions were introduced to the participants: (1) natural auditory feedback (NO); (2) binaural speech noise masking through supra-aural headphones (SN); (3) bone-conducted auditory feedback of self-generated voice through a bone vibrator (BAF); and (4) providing speech noise masking and the bone-conducted feedback simultaneously (SN + BAF). The speech noise (equal intensity between 250 Hz and 1000 Hz, 12-dB roll-off per octave) used to block auditory feedback was generated by a computer program and the built-in sound adapter
of a laptop computer (Asus A43S/Realtek High Definition audio, Taipei, Taiwan). A pair of supra-aural headphones (Telephonics, TDH-50) was used to transmit the noise to the speakers for the SN and SN + BAF conditions. Calibrations were carried out using a standard sound level meter and a 6-cc coupler at an intensity of 80 dBC (Larson Davis System 824, New York), and the noise intensity was set at 80 dBC throughout the tests. In the BAF and SN + BAF conditions, the vocal signals collected from the sound level meter were relayed to a sound amplifier (HA3D, Superlux Inc., New Taipei City) and then were output to a bone vibrator (B81, Radioear Corporation, New Eagle, PA) fixed at the right mastoid area of the speakers for bony feedback of self-generated voice. The maximal output range of the bone vibrator was from 52 dB HL to 92 dB HL between 250 Hz and 8000 Hz. Calibration of the bone vibrator was performed using a 1-kHz pure tone and an artificial mastoid connected to the sound level meter (Type 4930 and Type 2235, Brüel & Kjær Sound & Vibration Measurement A/S, Copenhagen). Because the amplification gain of the bone vibrator was set at 10 dB, the speakers would hear their own voice through bony vibrations at an intensity level 10 dB louder than the voice they actually made. The vocalizations under the four auditory feedback conditions at two levels of vocal intensity were arranged randomly for all participants. The recordings of voice and airflow temperature were repeated once in each condition, and the analytic results were averaged for statistical analysis.
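The airflow estimate described for the thermistor recordings (a linear fit to the first 2 seconds of temperature after voice onset, followed by Equation 1) can be sketched as follows. This is a minimal illustration rather than the authors' LabVIEW implementation: the function names are hypothetical, the trace is assumed to start exactly at voice onset, and the example trace is synthetic.

```python
def least_squares_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def estimate_airflow_ml_per_s(temperature_c, sample_rate_hz=200.0, window_s=2.0):
    """Apply Equation 1 to the first window_s seconds of a temperature trace.

    Assumption: temperature_c[0] is the sample taken at voice onset.
    """
    n = int(round(window_s * sample_rate_hz))
    if len(temperature_c) < n:
        raise ValueError("temperature trace is shorter than the analysis window")
    times = [i / sample_rate_hz for i in range(n)]
    slope = least_squares_slope(times, temperature_c[:n])  # degrees C per second
    return 55.1 + 96.6 * slope  # Equation 1, result in mL/s


# Synthetic trace rising at 1.47 degrees C per second (the slope quoted for Figure 1A).
trace = [30.0 + 1.47 * (i / 200.0) for i in range(600)]
flow = estimate_airflow_ml_per_s(trace)
```

For this synthetic slope, Equation 1 gives roughly 197 mL/s, which is close to the 200 mL/s quoted for the Figure 1A example.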
The perceptual evaluations of self-generated voice and vocal effort

To evaluate the effect of audio-vocal feedback on the auditory perception of self-generated voice and the sensation of vocal effort, the speakers were asked to subjectively rate both the auditory clarity of their own voice and the sensation of vocal effort immediately after each vocalization on a scale of −2, −1, 0, 1, and 2. The ratings described the auditory clarity of their own voice and the level of vocal effort, where −2 denotes “greatly decreased,” −1 denotes “slightly decreased,” 0 denotes “no obvious difference,” 1 denotes “slightly increased,” and 2 denotes “greatly increased,” compared with the perception of vocalization with natural auditory feedback (NO condition).

Maximum phonation time and MFP-based vocal efficiency

Maximum phonation time (MPT) was manually determined using software that displayed the vocal waveforms offline. Moreover, the 2-second temperature signals after voice onset were used to get the temperature slope of phonation, and the estimated phonation airflow (PFEST) was derived from the slope value using Equation 1. Van den Berg first defined vocal efficiency as the ratio of the acoustic power to the subglottal power, and the subglottal power is the product of the subglottal pressure in kilopascals and the transglottal airflow rate in milliliters per second.36 However, the clinical application of vocal efficiency assessment is limited because the accurate measurement of subglottal pressure requires an invasive puncture through the cricothyroid membrane, which carries a potential risk for the subject.37 According to the literature, the phonation onset pressure was well correlated with the activations of laryngeal muscles in the canine model,38 and the subglottal pressure was generally regarded as a response to longitudinal tension and medial adductory compression of the vocal folds.39,40 Moreover, our previous investigation also revealed that the neuromuscular control of vocal fundamental frequency could be evaluated by examining the MFP in speakers with Parkinsonism.30 Therefore, in this study, given the relationships among MFP, subglottal pressure, and the modulations of intrinsic laryngeal muscles during phonation, we proposed a new index, MFP-based vocal efficiency (MBVE), as a noninvasive evaluation of vocal efficiency based on the formula proposed by Van den Berg.36 The equation to acquire MBVE is as follows:
$$\mathrm{MBVE} = \frac{V}{\mathrm{MFP} \cdot \mathrm{PF}_{\mathrm{EST}}} \tag{2}$$

where V is the acoustic power in watt/m² calculated from the C-weighted sound pressure level obtained in front of the mouth during sustained vowel phonation, MFP is the middle-frequency power of the F0 spectrum, and PFEST is the airflow rate estimated from the temperature signals of the exhaled air.

Power spectral analysis of F0 contour

Because the phonation signals may be unstable at the very beginning of each phonation, the signals of the first second from voice onset were bypassed, and the 5-second voice signal starting at 1 second was used for further signal processing. The vocal fundamental period was extracted by detecting the maximum of the autocorrelation function of the data in a 20-millisecond window that included at least two glottal cycles. The period corresponds to the interval of a glottal cycle that repeats itself. Then the analytic window was shifted forward by the fundamental period to acquire the next period. After acquiring all the fundamental periods of the 5-second signal, the F0s were obtained from the reciprocals of the fundamental periods. To acquire a smooth contour of F0 and allow later power spectral analysis, the F0s of each 5-second vocalization were resampled by linear interpolation at intervals of 20 milliseconds and then normalized by cent conversion according to the mean value of the resampled F0s. The power spectra of F0 were then divided into three powers: an LFP (0.2 Hz–3 Hz), an MFP (3 Hz–8 Hz), and an HFP (8 Hz–25 Hz). The powers were calculated as the summation of the power amplitudes within each frequency range of the F0 spectrum and were expressed in decibels (dB), with the reference power of 1 cent² as 0 dB. The details for signal processing and power calculation of F0 rhythms can be reviewed in our previous works.24–27 The software for data acquisition and analysis was developed using LabVIEW for Windows (Version 6.0i, National Instruments, Austin, TX).
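The resampling, cent conversion, and band-power steps described above can be sketched in plain Python. This is an illustrative reading of the text, not the authors' LabVIEW code: the function names are hypothetical, the spectrum is assumed to be a one-sided periodogram scaled so that a sinusoid of amplitude A cents contributes A²/2 cent², the band edges are assumed to be lower-inclusive, and near-zero powers are floored at 1e-12 cent² to keep the logarithm finite.

```python
import math


def resample_linear(times_s, values, step_s, duration_s):
    """Linearly interpolate (times_s, values) onto a uniform grid starting at 0."""
    n_out = int(round(duration_s / step_s))
    out, j = [], 0
    for i in range(n_out):
        t = i * step_s
        while j + 1 < len(times_s) - 1 and times_s[j + 1] < t:
            j += 1
        t0, t1 = times_s[j], times_s[j + 1]
        w = min(max((t - t0) / (t1 - t0), 0.0), 1.0)  # clamp outside the data range
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def to_cents(f0_hz):
    """Express each F0 value in cents relative to the mean F0 of the contour."""
    mean_f0 = sum(f0_hz) / len(f0_hz)
    return [1200.0 * math.log2(f / mean_f0) for f in f0_hz]


def band_powers_db(cents, sample_rate_hz=50.0):
    """Sum the periodogram within each band; return dB re 1 cent^2 (assumed scaling)."""
    n = len(cents)
    mean_c = sum(cents) / n
    x = [c - mean_c for c in cents]
    bands = {"LFP": (0.2, 3.0), "MFP": (3.0, 8.0), "HFP": (8.0, 25.0)}
    power = {name: 0.0 for name in bands}
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        p = (re * re + im * im) / (n * n)
        if 2 * k != n:
            p *= 2.0  # fold in the negative-frequency half, except at Nyquist
        for name, (lo, hi) in bands.items():
            if lo <= freq < hi or (name == "HFP" and freq == hi):
                power[name] += p
    return {name: 10.0 * math.log10(max(p, 1e-12)) for name, p in power.items()}


# Synthetic 5-second contour sampled every 20 ms: 200 Hz with a 5-Hz, 10-cent modulation.
step = 0.02
f0 = [200.0 * 2 ** (10.0 * math.sin(2 * math.pi * 5.0 * i * step) / 1200.0)
      for i in range(250)]
powers = band_powers_db(to_cents(f0))
```

With a 5-second window sampled every 20 milliseconds, the frequency resolution is 0.2 Hz and the highest analyzable frequency is 25 Hz, which matches the lower edge of the LFP band and the upper edge of the HFP band. In the synthetic example nearly all of the power falls in the MFP band, at about 17 dB (10·log10 of 50 cent²).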
Analysis software and statistical analyses

A three-way repeated-measures analysis of variance (three-way RM ANOVA) with pairwise comparisons using the Bonferroni test (vocal intensity × noise × bone-conduction feedback) was used
to analyze the effects of the independent variables on mean vocal intensity, F0, MPT, LFP, MFP, HFP, and PFEST. Pearson’s correlation analysis was performed to determine the correlation coefficient between PFEST and each of the acoustic variables obtained from the same vocalization. Moreover, Spearman’s correlation analysis was used to evaluate the strength of relationships between the acoustic properties of the sustained voices and the scores of subjective assessments. Statistical analyses were performed using SPSS for Windows, Version 17.0 (SPSS Inc., Chicago, IL), and P < 0.05 was considered statistically significant. All values are expressed as means ± standard error of the mean.
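The two correlation measures named above can be illustrated with a short standard-library sketch. The study itself used SPSS; the function names and the sample values below are hypothetical and serve only to show how Pearson's coefficient (whose square gives R²) differs from Spearman's rank-based coefficient, which suits ordinal ratings such as the −2 to 2 scale.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical values for illustration only (not data from the study).
airflow = [120.0, 150.0, 180.0, 260.0, 300.0]
intensity = [72.0, 75.0, 79.0, 93.0, 97.0]
effort_scores = [-1, 0, 0, 1, 2]

r_squared = pearson_r(airflow, intensity) ** 2
rho = spearman_rho(effort_scores, intensity)
```

Averaging the ranks of tied values matters here because a five-point rating scale produces many ties.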
RESULTS

Vocal intensity, F0, MPT, and PFEST

The mean vocal intensity expressed in sound power (watt/cm²) was significantly different between the low- and the high-intensity vocalizations [F(1, 19) = 249.79, P < 0.001, three-way RM ANOVA with post hoc pairwise comparisons of Bonferroni correction] (Figure 3A). BAF significantly decreased the vocal intensity [F(1, 19) = 11.41, P < 0.01] (Figure 3A), whereas SN only led to a nonsignificant increasing trend of vocal intensity [F(1, 19) = 3.65, P = 0.07] (Figure 3A). For vocal fundamental frequency, there was a significant increase in F0 when vocalizing with high vocal intensity [F(1, 19) = 48.72, P < 0.001]. However, SN and BAF did not induce a significant change in the mean F0. For the duration of vocalization, the speech noise significantly lengthened the mean MPT (Figure 3B) [F(1, 19) = 17.9, P < 0.001], and on the contrary the BAF condition did not produce a significant effect on vocal F0. With respect to phonation airflow, the mean PFEST (Figure 3C) was significantly different only between the phonations at different intensities [F(1, 19) = 132.02, P < 0.001]. Neither speech noise nor BAF had a significant influence on PFEST.

LFP of F0 power spectrum

The three-way RM ANOVA revealed significant effects of vocal intensity [F(1, 19) = 30.89, P < 0.001] and noise masking [F(1, 19) = 9.08, P < 0.01] on the LFP of F0 (Figure 4A). The mean LFP was significantly decreased at high-intensity voicing but was significantly increased when vocalizing under noise masking. Moreover, in the low-intensity phonations, vocalizations with both noise masking and BAF showed a decreasing trend in the LFP as compared with the LFP of vocalizations under the noise condition. These results are in accordance with our previous findings that LFP is an indicator of the status of audio-vocal feedback and is predisposed to increase or decrease when the auditory feedback of the speaker’s own voice is altered by noise.

MFP and HFP of F0 power spectrum

The three-way RM ANOVA showed that the MFP of F0 significantly decreased when the speakers vocalized at high intensity [F(1, 19) = 10.9, P < 0.01] and under speech noise [F(1, 19) = 16.81, P < 0.01] (Figure 4B). Nonetheless, HFP was significantly reduced only in the high-intensity phonations [F(1, 19) = 16.83, P < 0.01] (Figure 4C), and the speech noise did not change the HFP of F0 significantly. Although the roles of the middle- and high-frequency rhythms in phonation physiology have not been clearly identified, our previous and current studies have suggested that the degree of F0 rhythm variability in the low- and middle-frequency ranges defined in this study should be associated with the active neuromuscular controls and the vibratory mechanics of vocal folds during phonation, respectively.30,41

MFP-based vocal efficiency

Figure 5 demonstrates the MBVE of different phonations. The high-intensity phonations showed a greater MBVE than the low-intensity phonations [F(1, 19) = 226.89, P < 0.001, three-way RM ANOVA]. The value of MBVE was significantly increased by noise masking [F(1, 19) = 8.45, P < 0.01], whereas the BAF significantly decreased it [F(1, 19) = 6.43, P < 0.05]. Moreover, the
FIGURE 3. The vocal intensity (A), maximum phonation time (MPT) (B), and phonatory airflow rate (PFEST) (C) at low and high vocal intensity levels in four auditory conditions: (1) natural auditory feedback (NO); (2) binaural speech noise masking (SN); (3) bone-conducted auditory feedback of self-generated voice (BAF); and (4) providing SN and BAF simultaneously (SN + BAF). *P < 0.05 versus phonations at low intensity using three-way repeated measures ANOVA and multiple pairwise comparisons with Bonferroni correction. †P < 0.05 versus phonations without SN and ‡P < 0.05 versus phonations without BAF in the same vocal intensity range using two-way repeated measures ANOVA and multiple pairwise comparisons with Bonferroni correction. ANOVA, analysis of variance.
FIGURE 4. The mean low-frequency power (LFP) (A), middle-frequency power (MFP) (B), and high-frequency power (HFP) (C) of F0 at low and high intensities in auditory conditions of NO, BAF, SN, and SN + BAF. *P < 0.05 versus phonations at low intensity by three-way repeated measures ANOVA and multiple pairwise comparisons with Bonferroni correction. †P < 0.05 versus phonations without SN in the same vocal intensity range using two-way repeated measures ANOVA and multiple pairwise comparisons with Bonferroni correction. Refer to caption of Figure 3 for definitions of acronyms NO, SN, and BAF. ANOVA, analysis of variance.
change in MBVE resulting from vocal intensity variation was much greater than that from the auditory masking and BAF.

Correlation between PFEST and acoustic signals

There was an excellent correlation between PFEST and vocal intensity (R² = 0.74, P < 0.001, Pearson’s correlation analysis). Additionally, there were fair and negative correlations of PFEST with LFP (R² = 0.13, P < 0.001), MFP (R² = 0.12, P < 0.001), and HFP (R² = 0.13, P < 0.001).
FIGURE 5. MFP-based vocal efficiency (MBVE) at low and high vocal intensities in auditory conditions of NO, SN, BAF, and SN + BAF. *P < 0.05 versus phonations at low intensity using three-way repeated measures ANOVA and multiple pairwise comparisons with Bonferroni correction. †P < 0.05 versus phonations without SN and ‡P < 0.05 versus phonations without BAF in the same vocal intensity range using two-way repeated measures ANOVA and multiple pairwise comparisons with Bonferroni correction. Refer to caption of Figure 3 for definitions of acronyms NO, SN, and BAF. ANOVA, analysis of variance; MFP, middle-frequency power.
Subjective assessment of auditory perception and vocal effort

For the perceptions of vocal effort, the three-way RM ANOVA revealed that the vocal effort was significantly greater in phonations with high vocal intensity [F(1, 19) = 5.65, P < 0.01] and with BAF [F(1, 19) = 5.65, P < 0.01]. For the perception of the speakers’ own voice, the speech noise significantly decreased the scores of auditory perceptions [F(1, 19) = 10.72, P < 0.01], whereas BAF significantly increased the scores of auditory perceptions [F(1, 19) = 10.72, P < 0.01]. The relationships between the subjective assessment scores and acoustic variables were analyzed using Spearman’s correlation coefficients. There were significant correlations of the scores of vocal effort with the vocal F0s (R = 0.1, P < 0.001, Spearman’s correlation) as well as the vocal jitter (R = −0.26, P < 0.05). In addition, the subjective score of auditory perception was significantly correlated only with the LFP of F0 (R = 0.03, P < 0.04).

DISCUSSION

In this study, sustained vowel phonations from 20 healthy speakers were collected to understand the relationships of the variability in F0 rhythms with vocal functions and audio-vocal feedback condition. The acoustic and aerodynamic signals were collected under four auditory feedback conditions and at two vocal intensity levels. The current study revealed that the middle-frequency rhythm of vocal fold oscillations was significantly affected by vocal intensity and noise masking. Vocal efficiency represented by MPT and the MBVE was also significantly increased by higher vocal intensity and noise masking. Furthermore, as BAF significantly increased the perceptions of self-generated voice and vocal effort, the enhanced auditory feedback of one’s own voice through a bone vibrator significantly decreased the vocal intensity and MBVE.
Moreover, the audio-vocal feedback turned out to be better, as revealed by the significant decrease of LFP in the high-intensity vocalizations or in those without noise interference. Lastly, the
HFP of F0 was significantly lower at high vocal intensity than at low intensity, whereas the altered auditory feedback did not cause any significant difference in HFP. These results suggested that there might be different laryngeal sensorimotor mechanisms for different frequency ranges of F0 rhythms, and that the middle-frequency rhythm of F0 reflects the motor adjustments of vocal fold vibration in response to different auditory feedback or vocal demands. Although these results were obtained from tonal language speakers, we believe that the endogenous rhythms of vocal fold vibrations exist universally among individuals with different language experience because the internal periodical fluctuations underlying F0 were first revealed from the sustained vowel [α] of native English speakers.22 The long-term periodicity of the F0 contour was considered as an involuntary vibrato or tremor that the speakers could not suppress.22 Based on the feedforward and feedback mechanisms proposed by the DIVA model,6 when speech production is disturbed in some way, corrective feedback commands induced by the feedback errors in both the auditory and somatosensory domains contribute to motor cortex tuning for the intended speech output. Using pitch-shift stimuli with a sinusoid rhythm and 100-millisecond delay, a vocal vibrato was artificially induced in healthy participants.42 The vowel was sustained with the F0s also swaying in a 5-Hz sinusoid pattern. This pitch-shift response is controlled by a closed-loop negative feedback reflex.42 In this study, the MFP of F0 showed a significant difference as the auditory feedback condition was altered by noise masking or high vocal intensity. The results suggested that the rhythmic modulation of vocal fold vibrations is likely associated with the sensorimotor integration of vocalization.
The decrease of MFP not only indicates a reduced F0 variability between 3 Hz and 8 Hz but may also reflect a regulatory control of vocal fold vibration that compensates for feedback alterations. In other words, the modulations are presumably an involuntary adjustment. When MFP is viewed together with MBVE, the results provide evidence that auditory feedback masking significantly influences the intrinsic stability of vocal fold vibration, and that the changes in the middle-frequency F0 rhythms may indicate a neuromuscular regulation by the auditory feedback control system that makes vocal production more efficient. The results were consistent with our study hypothesis and compatible with the existing literature on the cortical sensorimotor integration of the auditory feedback system. As mentioned in the Introduction and Results sections, alterations in audio-vocal feedback can result in obvious adjustments of vocal pitch, vocal intensity, and the stability of vocal fold vibration. Changes in these voice variables represent the modulation of vocal efficiency that ensues from auditory interference. Glottal efficiency is traditionally defined as the ratio of radiated acoustic power to aerodynamic power, the latter being the product of subglottic pressure and glottal flow.43 This energy conversion process is regulated by glottal resistance associated with vocal fold vibration.44,45 The frequency and amplitude of vocal fold vibration are loss factors during power transfer, which means that greater frequency and amplitude of vibration increase the power absorbed by the vocal folds.46 Therefore, even though the origin and aeromechanics of F0 rhythms have not been identified, the amplitude of the rhythms
underlying vocal fold vibration may also play a role in the energy conversion efficiency. Several studies have examined the laryngeal neuromuscular control of vocal intensity or vocal effort, and the results demonstrated that an increased glottal resistance or medial adductory compression of the vocal folds is a function of laryngeal nerve modulation rather than of glottal airflow. Increasing airflow was found to generate larger open quotients of the vocal folds and thus lead to a greater decay in sound energy. Producing a louder phonation by increasing vocal fold resistance was more efficient than by increasing airflow.47–49 The findings herein also showed that the estimated phonatory airflow did not vary significantly with auditory conditions; however, MPT lengthened significantly in response to noise masking, and a longer MPT is generally regarded as an indicator of greater glottal efficiency.50,51 Additionally, the auditory masking significantly reduced MFP and improved MBVE, which suggests that the audio-vocal feedback control system reinforces glottal efficiency by stabilizing the underlying rhythms of vocal fold vibration when there is a discrepancy between the anticipated and the actually received speech sounds. For noninvasive and quantitative measurement of vocal efficiency, MFP and MBVE could be viewed as sensitive indicators of neuromuscular control of the larynx in response to altered auditory feedback. The significant difference of MBVE across auditory feedback conditions provides evidence that MBVE could be particularly useful for evaluating audio-vocal interaction. These regulatory actions of the larynx presumably serve to maintain or foster vocal efficiency for different vocal demands or auditory feedback. The subtle changes hidden in vocal fold oscillation could be displayed using MFP and MBVE even when loudness and pitch were held constant for each vocalization.
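For reference, the aerodynamic quantities invoked in this part of the discussion are related, in the common textbook formulation, as sketched below. The symbols are assumptions introduced here for illustration ($P_s$ for subglottic pressure, $U$ for mean glottal airflow, $P_{\mathrm{ac}}$ for radiated acoustic power); the definition of MBVE is given elsewhere in the paper and is not reproduced here.

```latex
\begin{align}
R_g &= \frac{P_s}{U} && \text{(glottal resistance)} \\
P_{\mathrm{aero}} &= P_s \, U && \text{(aerodynamic power)} \\
E_g &= \frac{P_{\mathrm{ac}}}{P_{\mathrm{aero}}} = \frac{P_{\mathrm{ac}}}{P_s \, U} && \text{(glottal efficiency)}
\end{align}
```

Under these relations, a louder phonation achieved with unchanged airflow implies a higher $P_{\mathrm{ac}}$ at a given $U$, so whether efficiency rises depends on how much $P_s$ must increase to produce it.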
Our previous studies indicate that the LFP of F0 is sensitive to changes in audio-vocal feedback status. LFP increased significantly when the audio-vocal feedback was disrupted by noise louder than the speaker's vocal intensity24,27 and in participants with severe hearing impairment.25,26,30 Moreover, the low-frequency fluctuations of F0 were significantly decreased by a rise in vocal intensity24 or by hearing aid amplification.26 In this study, BAF did not significantly reduce LFP, which might have resulted from inadequate amplification by the bone vibrator, as its maximal output was 52–80 dB HL in the low tones of 250–500 Hz. In addition, the significant decrease of vocal intensity by BAF might also have reduced the feedback to a certain extent. Taken together, the responses of LFP and MFP to altered auditory feedback indicate that the detection of auditory errors and the subsequent audio-motor control over vocal production are manifested separately in F0 fluctuations of different frequency ranges. The difference in the response time of the intrinsic rhythms of vocal fold vibration also implies distinct neural circuits underlying the auditory feedback control system.

CONCLUSIONS
In conclusion, the current study showed that vocal efficiency was associated with the rhythmic adjustments of vocal fold vibration and would slightly increase with a disturbed audio-vocal feedback even though there was no obvious change in vocal pitch
or loudness. This auditory-induced modulation of vocal fold vibration during sustained vowel production could be quantitatively measured by analyses of MFP and MBVE, which are noninvasive, simple, and objective methods for evaluating voice functions. Future studies will be necessary to explore the origin of the low- and middle-frequency rhythms underlying vocal fold vibration and to examine the development of vocal fold pathologies in relation to varying F0 rhythms.
Acknowledgment
This study was supported by a grant from the Ministry of Science and Technology, Taiwan (104-2314-B-010-055-MY3).
REFERENCES
1. Ito T, Kimura T, Gomi H. The motor cortex is involved in reflexive compensatory adjustment of speech articulation. Neuroreport. 2005;16:1791–1794. 2. Jürgens U. The neural control of vocalization in mammals: a review. J Voice. 2009;23:1–10. 3. Houde JF, Nagarajan SS. Speech production as state feedback control. Front Hum Neurosci. 2011;5:1–14. 4. Todorov E. Optimality principles in sensorimotor control. Nat Neurosci. 2004;7:907–915. 5. Guenther FH. Cortical interactions underlying the production of speech sounds. J Commun Disord. 2006;39:350–365. 6. Guenther FH, Perkell JS. A neural model of speech production and its application to studies of the role of auditory feedback in speech. In: Perkell JS, Maassen B, Kent RD, et al., eds. Speech Motor Control in Normal and Disordered Speech. 1st ed. New York: Oxford University Press; 2007:29–49. 7. Tourville JA, Guenther FH. The DIVA model: a neural theory of speech acquisition and production. Lang Cogn Process. 2011;26:952–981. 8. Denes PB, Pinson EN. The Speech Chain: The Physics and Biology of Spoken Language. 2nd ed. San Francisco: W. H. Freeman and Company; 1993. 9. Tourville JA, Reilly KJ, Guenther FH. Neural mechanisms underlying auditory feedback control of speech. Neuroimage. 2008;39:1429–1443. 10. Lombard E. Le signe de l’élévation de la voix. Ann Malad Oreille Larynx Nez Pharynx. 1911;37:25. 11. Södersten M, Ternström S, Bohman M. Loud speech in realistic environmental noise: phonetogram data, perceptual voice quality, subjective ratings, and gender differences in healthy speakers. J Voice. 2005;19:29–46. 12. Burnett TA, Freedland MB, Larson CR, et al. Voice F0 responses to manipulations in pitch feedback. J Acoust Soc Am. 1998;103:3153–3161. 13. Chen SH, Liu H, Xu Y, et al. Voice F0 responses to pitch-shifted voice feedback during English speech. J Acoust Soc Am. 2007;121:1157–1163. 14. Fairbanks G, Guttman N. Effects of delayed auditory feedback upon articulation. J Speech Hear Res.
1958;1:12–22. 15. Hashimoto Y, Sakai KL. Brain activations during conscious self-monitoring of speech production with delayed auditory feedback: an fMRI study. Hum Brain Mapp. 2003;20:22–28. 16. Lejska M. Voice field measurements—a new method of examination: the influence of hearing on the human voice. J Voice. 2004;18:209– 215. 17. Metz DE, Whitehead RL, Whitehead BH. Mechanics of vocal fold vibration and laryngeal articulatory gestures produced by hearing-Impaired speakers. J Speech Lang Hear Res. 1984;27:62–69. 18. Osberger MJ, McGarr NS. Speech production characteristics of the hearing impaired. In: Lass N, ed. Speech and Language: Advances in Basic Research and Practice, Vol. 8. New York: Academic Press; 1982:227–267.
Journal of Voice, Vol. ■■, No. ■■, 2017 19. Peng SC, Tomblin JB, Turner CW. Production and perception of speech intonation in pediatric cochlear implant recipients and individuals with normal hearing. Ear Hear. 2008;29:336–351. 20. Macleod J, Kalinowski J, Stuart A, et al. Effect of single and combined altered auditory feedback on stuttering frequency at two speech rates. J Commun Disord. 1995;28:217–228. 21. Neils LR, Yairi E. Effects of speaking in noise on vocal fatigue and vocal recovery. Folia Phoniatr Logop. 1987;39:104–112. 22. Titze IR. A model for neurologic sources of aperiodicity in vocal fold vibration. J Speech Hear Res. 1991;34:460–472. 23. Titze IR. Vocal Fold Physiology: Frontiers in Basic Science. San Diego: Singular Publishing Group; 1993. 24. Lee G-S, Hsiao T-Y, Yang CCH, et al. Effects of speech noise on vocal fundamental frequency using power spectral analysis. Ear Hear. 2007;28:343–350. 25. Lee G-S. Variability in voice fundamental frequency of sustained vowels in speakers with sensorineural hearing loss. J Voice. 2012;26:24–29. 26. Lee G-S, Liu C, Lee S-H. Effects of hearing aid amplification on voice F0 variability in speakers with prelingual hearing loss. Hear Res. 2013;302:1–8. 27. Lee S-H, Hsiao T-Y, Lee G-S. Audio-vocal responses of vocal fundamental frequency and formant during sustained vowel vocalizations in different noises. Hear Res. 2015;324:1–6. 28. Rubin JS, Sataloff RT, Korovin GS. Diagnosis and Treatment of Voice Disorders. San Diego: Plural Publishing; 2014. 29. Thenganatt MA, Louis ED. Distinguishing essential tremor from Parkinson’s disease: bedside tests and laboratory evaluations. Expert Rev Neurother. 2012;12:687–696. 30. Lee G-S, Lin S-H. Changes of rhythm of vocal fundamental frequency in sensorineural hearing loss and in Parkinson’s disease. Chin J Physiol. 2009;52:446–450. 31. Hammer MJ, Barlow SM. Laryngeal somatosensory deficits in Parkinson’s disease: implications for speech respiratory and phonatory control. 
Exp Brain Res. 2010;201:401–409. 32. Snider SR, Fahn S, Isgreen WP, et al. Primary sensory symptoms in parkinsonism. Neurology. 1976;26:423. 33. Gallena S, Smith PJ, Zeffiro T, et al. Effects of Levodopa on laryngeal muscle activity for voice onset and offset in Parkinson disease. J Speech Lang Hear Res. 2001;44:1284–1299. 34. Jiang J, Lin E, Wang J, et al. Glottographic measures before and after levodopa treatment in Parkinson’s disease. Laryngoscope. 1999;109:1287–1294. 35. Lee G-S, Wang C-P, Fu S. Evaluation of hypernasality in vowels using voice low tone to high tone ratio. Cleft Palate Craniofac J. 2009;46:47–52. 36. Van den Berg J. Direct and indirect determination of the mean subglottal pressure. Folia Phoniatr Logop. 1956;8:1–24. 37. Isshiki N. Phonosurgery: Theory and Practice. Tokyo: Springer-Verlag; 1989:46–48. 38. Chhetri DK, Neubauer J, Berry AD. Neuromuscular control of fundamental frequency and glottal posture at phonation onset. J Acoust Soc Am. 2012;131:1401–1412. 39. Blomgren M, Chen Y, Ng ML, et al. Acoustic, aerodynamic, physiologic, and perceptual properties of modal and vocal fry registers. J Acoust Soc Am. 1998;103:2649–2658. 40. Plant RL, Younger RM. The interrelationship of subglottic air pressure, fundamental frequency, and vocal intensity during speech. J Voice. 2000;14:170–177. 41. Hsiao T-Y, Solomon NP, Luschei ES, et al. Modulation of fundamental frequency by laryngeal muscles during vibrato. J Voice. 1994;8:224–229. 42. Leydon C, Bauer JJ, Larson CR. The role of auditory feedback in sustaining vocal vibrato. J Acoust Soc Am. 2003;114:1575–1581. 43. Schutte HK. The Efficiency of Voice Production. Groningen: Research Institute for Neurosciences and Healthy Ageing, Faculty of Medical Sciences, University of Groningen; 1980. 44. Gelfer MP, Pazera JF. Maximum duration of sustained /s/ and /z/ and the s/z ratio with controlled intensity. J Voice. 2006;20:369–379. 45. Laukkanen AM, Lindholm P, Vilkman E.
On the effects of various vocal training methods on glottal resistance and efficiency. Folia Phoniatr Logop. 1995;47:324–330.
46. Titze IR. Principles of Voice Production. Englewood Cliffs: Prentice-Hall; 1994. 47. Berke GS, Hanson DG, Gerratt BR, et al. The effect of air flow and medial adductory compression on vocal efficiency and glottal vibration. Otolaryngol Head Neck Surg. 1990;102:212–218. 48. Koyama T, Kawasaki M, Ogura JH. Mechanics of voice production: regulation of vocal intensity. Laryngoscope. 1969;79:337–354.
49. Moore DM, Berke GS. The effect of laryngeal nerve stimulation on phonation: a glottographic study using an in vivo canine model. J Acoust Soc Am. 1988;83:705–715. 50. Eckel FC, Boone DR. The S/Z ratio as an indicator of laryngeal pathology. J Speech Hear Disord. 1981;46:147–149. 51. Sabol JW, Lee L, Stemple JC. The value of vocal function exercises in the practice regimen of singers. J Voice. 1995;9:27–36.