Differences in Shimmer Across Formant Regions


*Isabelle Leclerc, †Hilmi R. Dajani, and *Christian Giguère, *†Ottawa, Ontario, Canada

Summary: Objectives. Objective acoustic measures used to analyze phonatory dysfunction include shimmer and jitter. These measures are limited in that they do not take into account auditory processing. However, previous studies have indicated that shimmer may be processed differently along the tonotopic axis of the ear and, in particular, may be perceptually and physiologically significant around the third and fourth formants.
Methods. This study investigated the relationship between shimmer around the first four formants (F1–F4) and in the broadband unfiltered speech waveform for 18 normal speakers from the voice disorders database of KayPENTAX. The voice samples were filtered around each formant with a bandwidth of 400 Hz and then shimmer was assessed using five different built-in measures from Praat software.
Results. Comparisons of means tests revealed that shimmer increases significantly with formant frequency from F1 to F4, for all shimmer measures. Furthermore, for all shimmer measures, shimmer in the unfiltered speech was significantly and more strongly correlated with shimmer around F1 (r = 0.45–0.61) and F2 (r = 0.69–0.74), significantly but more weakly correlated with F4 (r = 0.42–0.47), and not significantly correlated with F3.
Conclusions. The findings indicate that there are differences in the shimmer found around the different formants and that shimmer information around F3 and F4 is not well captured in standard shimmer measurements based on the broadband unfiltered waveform.
Key Words: Shimmer–Formants–Tonotopic processing.
Accepted for publication May 3, 2013.
From the *Audiology and Speech-Language Pathology Program, University of Ottawa, Ottawa, Ontario, Canada; and the †School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada.
Address correspondence and reprint requests to Hilmi R. Dajani, School of Electrical Engineering and Computer Science, University of Ottawa, 161 Louis Pasteur, Ottawa, Ontario, Canada K1N 6N5. E-mail: [email protected]
Journal of Voice, Vol. 27, No. 6, pp. 685-690
0892-1997/$36.00
© 2013 The Voice Foundation
http://dx.doi.org/10.1016/j.jvoice.2013.05.002

INTRODUCTION

A variety of acoustic voice assessment techniques have been investigated by scientists and clinicians as aids to voice disorders diagnosis and treatment.1 The acoustic measurements include the average voice energy and fundamental frequency or pitch (F0) in sustained vowels or during conversational speech, rapid perturbations in voice amplitude and frequency (shimmer and jitter), amplitude and frequency modulations (tremors), estimates of the proportion of aperiodicity (signal-to-noise ratio), and more advanced methods based on cepstral analysis, nonlinear dynamics, and chaos theory.2–4 In studies involving clinical voice patients, shimmer and jitter remain the most commonly applied acoustic voice measures.5,6 Both measures have been extensively used to detect or characterize voice pathologies in patients with vocal fold polyps, cysts, and nodules, vocal fold edema, spasmodic dysphonia, Parkinson’s disease, and surgical scarring.7–13 Shimmer and jitter values above a certain threshold are considered to be related to pathologic voices that are usually perceived by humans as breathy, rough, or hoarse. Shimmer and jitter can also reflect the emotional state of the speaker14 and may be useful as an additional acoustic feature for speaker verification systems.15

Shimmer assesses the involuntary variation (or perturbation) in voice amplitude from one vibratory cycle of the vocal folds (pitch period) to the next.16 Computer-based software such as Praat (http://www.praat.org/) and the Multi-Dimensional Voice Program (MDVP) from KayPENTAX (Montvale, NJ) are often used for acoustic voice analysis. Typically, with these tools, the shimmer is calculated on a stable vowel of 3–4 seconds in duration, where the patient is asked to produce mostly an /a/ or /i/ at a comfortable vocal effort. The amplitude or peak energy is first extracted from each voice cycle and then the mean shimmer value is calculated from the resulting cycle-to-cycle amplitude contours using one or more smoothing functions. Praat, for instance, features different shimmer parameters that can be calculated over 1, 3, 5, or 11 cycles (a more detailed description is given in the Methods section below).

These shimmer measurements are directly computed from the broadband (ie, unfiltered) speech waveform recorded from the patient. No frequency-dependent analysis of the shimmer is performed although it is known that the pitch of complex acoustic signals is processed differently in different regions of the auditory tonotopic axis. In auditory nerve single fiber recordings in animals, the representation of the pitch of complex stimuli varies as a function of frequency content,17 whereas in the perceptual domain, the pitch of a complex of harmonics is more salient for lower frequency (resolved) harmonics than for unresolved harmonics.18

Recent results suggest that shimmer may be represented differently by the auditory system along the tonotopic dimension, that is, depending on where the acoustic energy is located across the frequency axis of the ear.14,19,20 In these studies, shimmer was measured separately in different frequency regions of the speech signal, over a few bands centered on the formants or over a large number of overlapping bands to give a finely grained tonotopic representation of shimmer, instead of being extracted from the broadband unfiltered waveform only. Ito14 mentions that shimmer in speech that is band-pass filtered around the third formant (F3) could be a possible cue for judging the social relationship between Japanese speakers.
Dajani and Giguere20 found that the correlation between the shimmer in speech-evoked brainstem responses and the shimmer in the acoustic waveform was higher when the analysis of correlation was based on the acoustic waveform filtered around F3 or F4 than when it was based on the unfiltered waveform or the waveform filtered around F1 or F2. This finding was unexpected because the power of the unfiltered speech and speech filtered around F1 or F2 was considerably higher than that filtered around F3 or F4. For example, the correlation coefficient between the speech filtered around F1 and the evoked brainstem response was 0.35, whereas the correlation coefficient between the speech filtered around F4 and the evoked brainstem response was 0.8, although the power in the speech filtered around F1 was 8.4 dB higher than that filtered around F4. Dajani et al19 proposed a tonotopic shimmer spectral distribution to describe such frequency-dependent representation of shimmer in the auditory system.

One pending question is whether the frequency-dependence of the shimmer found in the perceptual and electrophysiological studies mentioned above is the result of auditory-specific processing or if it is already present in the acoustic waveform during speech production. Shimmer may be produced during speech production due to several causes: (1) cycle-by-cycle variations in the vibratory amplitude or shape of the glottal waveform, (2) aspiratory noise added to the periodic vibration, (3) cycle-by-cycle variations in the pitch period (jitter), and (4) nonphonatory changes such as fine articulatory motion. The first cause is the usual interpretation for the origin of shimmer. Although variations in the cycle-by-cycle amplitude of the vocal folds without associated change in the glottal waveform will induce equivalent amplitude variations in the voice fundamental and harmonics, changes in the shape of the opening or closure phases of the vocal folds will affect the tilt of the glottal spectrum on each cycle, which is expected to contribute to greater amplitude variations in the high-order voice harmonics than in the lower harmonics or the fundamental. The second cause will contribute small random fluctuations to the voice fundamental and harmonics in each cycle.
The last two causes can produce shimmer through variations in the harmonic amplitude distribution at the output of the frequency-dependent vocal tract transfer function. Jitter effects, in particular, could also explain how shimmer may vary across different frequency regions. High-order harmonics that move under the envelope of a formant resonance are subject to more change in amplitude than low-order harmonics because the absolute frequency deviation they experience is higher. For a male talker with a voice fundamental of 100 Hz, for example, a 1% jitter will induce a 1 Hz frequency deviation at F0 but a 10 Hz deviation in the 10th harmonic. Given the sharpness of the formant resonances, higher formants are thus likely to produce higher amounts of harmonic amplitude variations than lower formants under jitter conditions. This effect would not be expected to be completely offset by the increasing bandwidths of higher formants. On the other hand, the contribution of all the phonatory and nonphonatory effects above to the overall and frequency-dependent shimmer is further complicated by the fact that the strength of the glottal waveform harmonics decreases with frequency at a rate of approximately 12 dB/octave and that formant energy is generally lower in the higher formants.

Given these issues, this study aimed to investigate the relationship between the shimmer in the first four formant regions (F1–F4) of the acoustic speech waveform and the shimmer in the broadband unfiltered waveform commonly measured in the voice clinic, for a normal population of talkers. If acoustic shimmer measurements exhibited significant differences across the different formant regions of speech or in comparison with the broadband shimmer, then the standard clinical measurement of shimmer may not fully describe perceptually significant speech amplitude perturbations or match internal representations of speech as found in electrophysiological studies.

METHODS

Database
Voice samples were selected from the voice disorders database developed by the Massachusetts Eye and Ear Infirmary Voice and Speech Laboratory and available on CD-ROM through KayPENTAX (Disordered Voice Database and Program, model 4337). The database comprises a set of sustained /a/ vowel samples of 3-second duration from 53 different adult normal talkers (21 males and 32 females). Each voice sample from the KayPENTAX database is stored in the proprietary NSP file format at a sampling frequency of 50 kHz and 16-bit resolution. All voice samples were converted into WAV format for use with Cool Edit Pro, Version 1.2 (Syntrillium Software Corporation, Scottsdale, AZ) and Praat software, Version 4.1.9 (http://www.praat.org).

Sample selection
Inclusion criteria were used to select a subset of voice samples from the database containing adequate frequency separation between formants and stable formant frequencies over the recording duration. This preselection was necessary in the present study to ensure that subsequent shimmer measurements could be associated with clearly defined formant frequency regions. The criteria were (1) a minimum frequency separation of 400 Hz between adjacent formants (F2-F1, F3-F2, and F4-F3) and (2) a standard deviation for each formant contour (F1 to F4) no more than three times the standard deviation in formant frequency calculated over all the 53 voice samples.
All voice samples were analyzed using Praat, and the formant-listing feature was used to extract the first four formant frequency contours over the 3-second vowel duration. The average formant frequency separation between talkers was 507, 1336, and 917 Hz for F2-F1, F3-F2, and F4-F3, respectively. The average within-talker standard deviation in the formant frequency contours was 28.1, 76.0, 121.0, and 146.3 Hz for formants F1–F4, respectively. In all, 24 voice samples met the criteria and were retained for the experiment. There were seven males ranging in age from 28 to 45 years (mean 35 years) and 17 females ranging in age from 22 to 52 years (mean 36 years).

Filtering
Because the energy of a vowel is contained mainly in the formant regions, we filtered the selected voice samples around each formant to be able to produce a frequency-dependent representation of shimmer. Instead of obtaining one shimmer measure for the total waveform, four frequency-specific measures, one for each formant, could then be obtained.

TABLE 1. Mean and (Standard Deviation) of Five Shimmer Measures for Each of the Formant-Filtered and Unfiltered Waveforms

Voice Waveform   Local          apq3          apq5          apq11         dda
Formant 1        1.84 (0.97)    0.99 (0.56)   1.10 (0.60)   1.47 (0.74)   2.98 (1.70)
Formant 2        4.17 (2.54)    2.30 (1.36)   2.54 (1.64)   3.18 (2.01)   6.90 (4.09)
Formant 3        5.69 (2.74)    3.22 (1.54)   3.30 (1.72)   5.51 (1.94)   9.66 (4.62)
Formant 4        10.04 (3.30)   5.45 (1.73)   6.33 (2.19)   8.08 (3.03)   16.35 (5.19)
Unfiltered       1.80 (1.00)    1.00 (0.61)   1.07 (0.58)   1.34 (0.65)   3.00 (1.85)

Notes: Local, apq3, apq5, apq11, and dda are different measures of shimmer (refer to the Methods section for details on how they are calculated).

The filter bandwidth was chosen to be 400 Hz. This choice of bandwidth is a compromise: the filter must be sufficiently narrow to isolate the speech signal around the formant, yet sufficiently wide to reflect the information that is available to higher auditory centers from a combination of multiple narrow cochlear filters at low tonotopic frequencies.20 This bandwidth of 400 Hz was also used by Ito14 to filter the speech signal around the third formant region. Each voice sample was then filtered with a 400 Hz Butterworth bandpass filter of order 8 around the F1, F2, F3, and F4 formant frequencies using Cool Edit Pro.
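The filtering step described above was carried out in Cool Edit Pro. As a rough illustration only, a comparable band-pass operation could be sketched in Python with SciPy. The function name `bandpass_around_formant` is hypothetical, and how Cool Edit Pro counts the order of its "order 8" filter is not stated in the text, so the order handling below is an assumption.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def bandpass_around_formant(signal, fs, formant_hz, bandwidth_hz=400.0, order=8):
    """Band-pass `signal` in a band of `bandwidth_hz` centred on `formant_hz`.

    Assumption: `order` is the overall order of the band-pass filter.
    SciPy doubles the prototype order for band-pass designs, so order // 2
    is passed to `butter`.
    """
    low = formant_hz - bandwidth_hz / 2.0
    high = formant_hz + bandwidth_hz / 2.0
    if low <= 0 or high >= fs / 2.0:
        raise ValueError("band edges must lie strictly between 0 and fs/2")
    # Second-order sections are numerically safer than (b, a) coefficients
    # for narrow bands at a high sampling rate.
    sos = butter(order // 2, [low, high], btype="bandpass", fs=fs, output="sos")
    # Single-pass (causal) filtering, so the filter order is not doubled again.
    return sosfilt(sos, np.asarray(signal, dtype=float))
```

With the 50 kHz sampling rate of the database, a call such as `bandpass_around_formant(x, 50000, f1_hz)` would give the F1-filtered waveform, where `x` and `f1_hz` stand for a voice sample and its measured mean F1 frequency.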

Shimmer measurements
Shimmer measurements were carried out using the Voice Report feature in Praat for each formant-filtered waveform in addition to the original unfiltered voice sample. Shimmer contours were very stable within subjects, for the unfiltered and formant-filtered waveforms. Nonetheless, to ensure that shimmer measurements were exempt from artifacts due to poor F0 extraction in some subjects, F0 contours were visually inspected for each formant-filtered waveform processed. Shimmer extractions that were associated with F0 contours that deviated from the unfiltered F0 reference contours more than 50% of the time were rejected as a precautionary measure to ensure data quality. The results from six subjects were rejected in this way, leaving 18 subjects for the remainder of the analysis. There were seven males ranging in age from 28 to 45 years (mean = 35 years) and 11 females ranging in age from 22 to 44 years (mean = 34 years).

Shimmer was extracted using five different shimmer measures available from Praat (version 4.1.9) as follows:
• Shimmer (local): This is the average absolute difference between the amplitudes of consecutive periods, divided by the average amplitude. In MDVP, this parameter is called Shim and a value above a threshold of 3.810% is considered abnormal.21
• Shimmer (apq3): This is the three-point amplitude perturbation quotient, the average absolute difference between the amplitude of a period and the average of the amplitudes of its neighbors, divided by the average amplitude.
• Shimmer (apq5): This is the five-point amplitude perturbation quotient, the average absolute difference between the amplitude of a period and the average of the amplitudes of it and its four closest neighbors, divided by the average amplitude.
• Shimmer (apq11): This is the 11-point amplitude perturbation quotient, the average absolute difference between the amplitude of a period and the average of the amplitudes of it and its 10 closest neighbors, divided by the average amplitude. In MDVP, this parameter is called APQ (Amplitude Perturbation Quotient) and a value above a threshold of 3.070% is considered abnormal.21
• Shimmer (dda): This is Praat’s original get shimmer. The value is three times APQ3.

In addition to the shimmer measures above, the energy level (dB) associated with each formant-filtered waveform and the unfiltered waveform was calculated using Praat.
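The five definitions listed above can be illustrated with a minimal Python sketch that works on a list of per-period peak amplitudes. This is not Praat's implementation: the function names are hypothetical, period detection is assumed to have been done already, and the k-point window is assumed to be centred on the period and to include it, which is the reading under which dda equals three times apq3.

```python
def _mean(values):
    return sum(values) / len(values)


def shimmer_local(amps):
    """Mean absolute difference between consecutive amplitudes / mean amplitude."""
    if len(amps) < 2:
        raise ValueError("need at least 2 periods")
    diffs = [abs(a - b) for a, b in zip(amps, amps[1:])]
    return _mean(diffs) / _mean(amps)


def shimmer_apq(amps, k):
    """k-point amplitude perturbation quotient (k odd, e.g. 3, 5, or 11).

    Assumption: each period is compared with the mean of a window of k
    periods centred on it (the period itself included).
    """
    if k % 2 == 0 or len(amps) < k:
        raise ValueError("k must be odd and no larger than the number of periods")
    half = k // 2
    devs = []
    for i in range(half, len(amps) - half):
        window = amps[i - half:i + half + 1]
        devs.append(abs(amps[i] - _mean(window)))
    return _mean(devs) / _mean(amps)


def shimmer_dda(amps):
    """Mean absolute second difference of the amplitudes / mean amplitude."""
    if len(amps) < 3:
        raise ValueError("need at least 3 periods")
    second = [abs((c - b) - (b - a)) for a, b, c in zip(amps, amps[1:], amps[2:])]
    return _mean(second) / _mean(amps)
```

The functions return fractions; multiplying by 100 gives percentages comparable to the MDVP thresholds quoted above. Under these assumptions, `shimmer_dda(amps)` equals `3 * shimmer_apq(amps, 3)` up to floating-point rounding.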

TABLE 2. Correlation Coefficient r Values Between the Shimmer in F1, F2, F3, and F4 Regions and the Shimmer in the Unfiltered Waveform, Over All the Subjects, for Each of the Five Shimmer Measures, Local, apq3, apq5, apq11, and dda (Refer to the Methods Section for Details on How They are Calculated)

                  Correlation Coefficient (r)
Shimmer Measure   F1-Unfiltered   F2-Unfiltered   F3-Unfiltered   F4-Unfiltered
Local             0.49*           0.72***         0.30            0.46*
apq3              0.45*           0.69***         0.29            0.42*
apq5              0.57**          0.74***         0.34            0.45*
apq11             0.61**          0.73***         0.26            0.47*
dda               0.45*           0.69***         0.29            0.42*

***P < 0.001; **P < 0.01; *P < 0.05.


FIGURE 1. The shimmer measure apq11 averaged across all subjects (n = 18) in the unfiltered speech waveform and in the speech waveform band-pass filtered around the first four formants. The error bar corresponds to the standard error. Means with different letters indicate a significant difference between the filters at P < 0.05. Means with the same letters indicate pairs of filters that are not significantly different. Refer to the Methods section for details on how shimmer apq11 is calculated.

Statistical tests
Comparisons of means tests were performed in SPSS v. 19 (IBM, New York, NY) to test if each shimmer measure exhibited significant differences between the different types of filters (F1, F2, F3, F4, and the unfiltered voice sample). The assumption of equal variances was tested with the Levene Test for Equality of Variances. Subsequent post hoc analyses were conducted to find which frequency regions differed significantly from one another for each shimmer measure. Linear regressions were also performed in Excel to determine the correlation of shimmer measurements between F1, F2, F3, F4 regions, and the unfiltered voice sample.
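The correlation analysis was done in Excel; as an illustration only, the underlying arithmetic can be sketched in Python. The function names are hypothetical. The sketch computes a Pearson correlation coefficient and the associated t statistic, then compares it against one-tailed critical values for 16 degrees of freedom (n = 18 subjects), which are taken here from a standard Student t table rather than from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# One-tailed critical t values for 16 degrees of freedom (standard t table).
CRITICAL_T_DF16 = {0.001: 3.686, 0.01: 2.583, 0.05: 1.746}


def significance_label(r, n=18):
    """Return the strictest one-tailed level reached, or 'ns'."""
    if n != 18:
        raise ValueError("critical values are tabulated for n = 18 only")
    t = t_statistic(r, n)
    for alpha in (0.001, 0.01, 0.05):
        if t > CRITICAL_T_DF16[alpha]:
            return f"P < {alpha}"
    return "ns"
```

For example, `significance_label(0.47)` evaluates t ≈ 2.13 against the tabulated thresholds and returns `"P < 0.05"`.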


RESULTS

The results are based on the 18 subjects who met the inclusion criteria and who were not rejected because of erratic F0 extraction from the formant-filtered waveforms. The results of all five shimmer measures (local, apq3, apq5, apq11, and dda) are reported in tabular form (Tables 1 and 2). However, because all five shimmer measures exhibited very similar behavior, the results are illustrated graphically only for shimmer apq11, the 11-point amplitude perturbation quotient used in MDVP (Figures 1 and 2).

Table 1 lists the mean and standard deviation of the five shimmer measures by formant and for the unfiltered speech waveform. The comparison of means tests indicated unequal variances; therefore, Welch tests were used to compare means. The P values of all the Welch robust tests of equality of means were under 0.05, thereby establishing statistically significant differences among the means of the various types of filters for each shimmer measure. Post hoc tests were performed with the Tamhane T2 procedure for multiple comparisons, as this method does not assume equal variances. These statistical tests showed the same outcome for all shimmer measures. For shimmer apq11, for example, the post hoc test revealed significant between-filters differences for: Filters F1 and F2 (P < 0.05), Filters F1 and F3 (P < 0.001), Filters F1 and F4 (P < 0.001), Filters F2 and F3 (P < 0.05), Filters F2 and F4 (P < 0.001), Filter F2 and the unfiltered voice sample (P < 0.05), Filter F3 and the unfiltered voice sample (P < 0.001), and Filter F4 and the unfiltered voice sample (P < 0.001) (Figure 1). There were no statistically significant between-filters differences for Filters F3 and F4 as well as for Filter F1 and the unfiltered voice sample (Figure 1). Statistical comparisons of the means of all the other shimmer measures (local, apq3, apq5, and dda) gave an identical grouping to that found for shimmer apq11, with Filter F1 and the unfiltered voice sample in one group, Filter F2 in a second, and Filter F3 and Filter F4 in a third.

Linear regressions and one-tailed statistical tests on the correlation coefficients were performed between shimmer in the unfiltered voice sample and that in the four filtered samples. Filters F1 and F2 were found to be the most highly correlated with the unfiltered sample. For shimmer apq11, the correlation coefficients (r) for F1, F2, F3, and F4 were 0.61 (P < 0.01), 0.73 (P < 0.001), 0.26 (ns), and 0.47 (P < 0.05), respectively (Figure 2). A similar pattern of correlations and significance levels was also found for the other shimmer measures (Table 2).

As for the energy level for each formant-filtered waveform, it decreased from F1 to F4, with F1 and F2 being fairly close to the unfiltered waveform energy, whereas F3 and F4 were at least 20 dB lower (Table 3).

FIGURE 2. Linear correlation in the shimmer measure apq11 between the speech waveforms filtered around the first, second, third, and fourth formants (F1 Shimmer, F2 Shimmer, F3 Shimmer, and F4 Shimmer) and the unfiltered speech waveform (Unfiltered Shimmer). Refer to the Methods section for details on how shimmer apq11 is calculated.

TABLE 3. Mean Energy Level and Standard Deviation Over All Subjects, for the Formant-Filtered Waveforms and the Unfiltered Waveform

                          F1     F2     F3     F4     Unfiltered
Mean energy level (dB)    75.7   69.6   57.2   54.1   78.5
Standard deviation (dB)   5.2    4.0    5.1    6.1    4.1

DISCUSSION AND CONCLUSIONS

This study provides evidence that the acoustic shimmer measurements for normal talkers exhibit significant differences across the different formant regions of speech and also compared with the broadband unfiltered waveform, both in terms of mean differences and correlation analysis. The mean shimmer in the different formant regions of the acoustic speech waveform increases from F1 to F4 (Table 1). It should be noted that shimmer, as calculated here using Praat, is a relative measure of voice amplitude perturbation (ie, relative to the underlying acoustic waveform energy). Although relative shimmer is high around formants F3 and F4, in absolute terms, these amplitude perturbations are small when we consider the sharp decrease in acoustic energy with formant frequency for the group of talkers and vowel in this study (Table 3). However, although the shimmer around F3 and F4 is small in absolute terms, this does not necessarily mean that it is of little perceptual importance, given that the auditory system processes sound in somewhat independent frequency channels.
The mean shimmer in the unfiltered waveform is different from that in all formant-filtered waveforms except for F1, the formant containing most of the acoustic energy within the vowel. These findings were independent of the shimmer measure used. For all shimmer measures studied, the results also indicate that shimmer in the unfiltered speech was significantly and more strongly correlated with shimmer around F1 (r = 0.45–0.61) and F2 (r = 0.69–0.74), while it was significantly but more weakly correlated with F4 (r = 0.42–0.47) and not significantly correlated with F3 (r = 0.26–0.34) (Table 2), although F1 and F2 are the two dominant formants in terms of acoustic energy (Table 3). This suggests that shimmer extracted from F3 and F4 may provide different information from that extracted from the unfiltered speech (or from F1 and F2). Thus, the shimmer around F3 and F4 is not reflected in standard measurements of shimmer in the clinic, which are uniquely based on the broadband unfiltered speech waveform.

In a recent study by Dajani and Giguere20 on the relationship between the acoustic stimulus and brainstem response evoked by a natural vowel stimulus, stronger correlations were obtained between the amplitude contour of the brainstem evoked response and of the speech signal filtered around F3 and F4 than with the speech signal filtered around F1 and F2, a finding that fits with the higher shimmer values found around F3 and F4 in this study. This suggests that the shimmer is processed differently from the acoustic stimulus to the brainstem in the different formant frequency regions. Taken together, the conclusion from Dajani and Giguere20 and the results from this study suggest that the shimmer around F3 and F4 may be important from an auditory perspective, yet it is not well represented in standard measurements of shimmer.

In vowel perception, the first two formants are thought to play a primary role. However, higher formants can modulate the perceptual value of F1 and F2, for example, by influencing the boundaries between vowel categories.22 Further psychoacoustic studies are needed to examine the relative perceptual impact of the different sources of shimmer across the different frequency regions and how they are processed by the human auditory system.

REFERENCES

1. Murugappan S, Boyce S, Khosla S, Kelchner L, Gutmark E. Acoustic characteristics of phonation in "wet voice" conditions. J Acoust Soc Am. 2010;127:2578–2589.
2. Maccallum J, Zhang Y, Jiang J. Vowel selection and its effects on perturbation and nonlinear dynamic measures. Folia Phoniatr Logop. 2010;63:88–97.
3. Mehta D, Hillman R. Voice assessment: updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods. Curr Opin Otolaryngol Head Neck Surg. 2008;16:211–215.
4. Titze IR. Workshop on Acoustic Voice Analysis: Summary Statement. Iowa City, IA: National Center for Voice and Speech, USA; 1995.
5. Brockmann-Bauser M, Drinnan MJ. Routine acoustic voice analysis: time to think again? Curr Opin Otolaryngol Head Neck Surg. 2011;9:165–170.
6. Buder EH, Strand EA. Quantitative and graphic acoustic analysis of phonatory modulations: the modulogram. J Speech Lang Hear Res. 2003;46:475–490.
7. Klingholz F. Acoustic recognition of voice disorders: a comparative study running speech versus sustained vowels. J Acoust Soc Am. 1990;87:2218–2224.
8. Kreiman J, Gerratt BR, Kempster G, Erman A, Berke GS. Perceptual evaluation of voice quality: review, tutorial, and a framework for future research. J Speech Hear Res. 1993;36:21–40.
9. Herzel H, Berry DA, Titze IR, Saleh M. Analysis of vocal disorders with methods from nonlinear dynamics. J Speech Hear Res. 1994;37:1008–1019.
10. Vieira M, McInnes F, Jack M. Comparative assessment of electroglottographic and acoustic measures of jitter in pathological voices. J Speech Lang Hear Res. 1997;40:170–182.
11. Jiang JJ, Zhang Y, McGilligan C. Chaos in voice, from modeling to measurement. J Voice. 2006;20:2–17.
12. Zhang Y, Jiang JJ. Chaotic vibrations of a vocal fold model with a unilateral polyp. J Acoust Soc Am. 2004;115:1266–1269.
13. Zhang Y, Jiang JJ, Biazzo L, Jorgensen M. Perturbation and nonlinear dynamic analysis of voices from patients with unilateral laryngeal paralysis. J Voice. 2005;19:519–528.
14. Ito M. Politeness and Voice Quality - The Alternative Method to Measure Aspiration Noise. Proceedings of Speech Prosody, Nara, Japan; 2004. Available at: http://www.isca-speech.org/archive_open/sp2004/sp04_213.pdf. (Last accessed May 26, 2013).
15. Farrus M, Hernando J. Using Jitter and Shimmer in speaker verification. IET Signal Process. 2009;3:247–257.
16. Colton RH, Casper JK, Leonard R. Understanding Voice Problems: A Physiological Perspective for Diagnosis and Treatment. 3rd ed. Baltimore, MD: Lippincott Williams & Wilkins; 2005.
17. Cariani PA, Delgutte B. Neural correlates of the pitch of complex tones I. Pitch and pitch salience. J Neurophysiol. 1996;76:1698–1716.
18. Larsen E, Cedolin L, Delgutte B. Pitch representations in the auditory nerve. Two concurrent complex tones. J Neurophysiol. 2008;100:1301–1319.
19. Dajani HR, Giguere C, Wong W, Kunov H. Auditory-Inspired Estimation of Jitter and Shimmer Spectra. Proceedings of the 16th International Congress on Sound and Vibration, Krakow, Poland; 2009.
20. Dajani HR, Giguere C. Tonotopic Variation in the Correlation Between the Shimmer of a Natural Vowel and That of the Evoked Response. Proceedings of the 20th International Congress on Acoustics, Sydney, Australia; 2010. Available at: http://www.acoustics.asn.au/conference_proceedings/ICA2010/cdrom-ICA2010/papers/p622.pdf. (Last accessed May 26, 2013).
21. Boersma P, Weenink D. Praat: Doing Phonetics by Computer (Version 4.1.9). Computer Program and Manual; 2009. Available at: http://www.praat.org. (Last accessed May 26, 2013).
22. Johnson K. Speech normalization in speech perception. In: Pisoni DB, Remez RE, eds. The Handbook of Speech Perception. Malden, MA: Blackwell Publishing; 2005.