Perception of Pitch and Roughness in Vocal Signals with Subharmonics


Journal of Voice Vol. 15, No. 2, pp. 165–175 © 2001 The Voice Foundation

Perception of Pitch and Roughness in Vocal Signals with Subharmonics Christine C. Bergan and Ingo R. Titze Department of Speech Pathology and Audiology, National Center for Voice and Speech, The University of Iowa, Iowa City, Iowa

Summary: Pitch and roughness were rated according to the extent of amplitude modulation (AM) and frequency modulation (FM) of a subharmonic [fundamental frequency (F0)/2]. The objective was to determine the identification boundaries for pitch and roughness and to discover how both kinds of modulation affect these boundaries. Another objective was to judge the reliability between subjects when identifying subharmonic-related pitch and roughness. Three procedures were used: ABX comparisons, method of adjustment, and rating of roughness. Results indicated that the crossover point to the lower pitch (associated with the subharmonic) occurred between 10% and 30% modulation, depending on modulation type and F0. Subjects demonstrated highly variable perceptions of pitch and roughness, with poor intersubject reliability. Key Words: Voice—Pitch—Roughness—Subharmonics—Modulation.

INTRODUCTION

The ability to evaluate the perceptual qualities of the human voice is of practical interest to voice clinicians and singing teachers. If diagnosis and therapy are to improve, it is crucial that perceptual ratings of broad categories such as pitch, loudness, roughness, resonance, and breathiness can be broken down into finer subcategories. Ideally, the subcategories would reflect specific vocal fold oscillation properties. For example, roughness resulting from a subharmonic may be distinguishable from roughness resulting from aperiodicity. Aside from the existing inadequacy of current terminology for clinical judgment of the voice,1-4 there is a lack of parametric studies of specific waveform characteristics and their effect on vocal qualities. Recent research2,5 has demonstrated poor intersubject reliability and validity when rating such qualities as roughness and breathiness. “Listeners agreed very poorly in the midrange of scales for breathiness and roughness, and mean ratings in the midrange of such scales did not represent the extent to which a voice possesses a quality, but served only to indicate that the listeners disagreed.”5

DeBodt and colleagues studied the effect of experience and professional background on perceptual rating of voice quality using the GRBAS (Grade, Roughness, Breathiness, Asthenia, Strain) scale.3 They presented nine pathological voices to speech language pathologists and otolaryngologists with a 14-day test-retest interval. The test-retest reliability was moderate, with best agreement occurring for the G (grade) parameter and the worst for the S (strained) parameter. They concluded that professional background had a greater impact on perceptual rating than experience.

Accepted for publication August 4, 2000. Address correspondence and reprint requests to Ingo Titze, 330–SHC, the University of Iowa, Iowa City, IA 52242, USA. Portions of this paper were presented as a poster session at the 27th Voice Foundation Symposium, Philadelphia, PA, June 1999.


Gerratt and colleagues compared internal and external standards in voice quality judgments through the use of standard rating scales versus scales in which a set of anchors was presented prior to rating.4 Their results showed that the use of an anchored scale (when compared to an unanchored scale) significantly increases both intersubject and intrasubject reliability. Other research6 has also shown poor reliability of ratings.

A wealth of literature exists concerning the general perception of pitch and roughness. According to the American National Standards Institute (ANSI) standard of 1994,7 “Pitch is that attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high.” When the intensity increases, the pitch of a low tone decreases, whereas the pitch of a high tone increases.8 Beerends9 found that the pitch of short tones is less salient than the pitch of long tones. Pitch depends mainly on the frequency content of the sound stimulus, but it also depends on the intensity, duration, and spectrum of a voiced sound.

The sensation of two interfering simple tones (sinusoids) is characterized as roughness in speech and tonal dissonance in music. Maximal roughness and maximal dissonance are sometimes considered synonymous. The degree of interference among the harmonics of two sinusoids determines the amount of dissonance or consonance perceived for simple frequency ratios of the fundamentals. Perception of roughness may also result from either amplitude modulation or frequency modulation of the tone. Terhardt10 found that the roughness of a sinusoidally amplitude-modulated tone depends primarily on the relative modulation amplitude (fluctuation of the temporal envelope relative to the carrier). In a complex tone, each partial produces potential pitches corresponding to subharmonics of that partial and the perceived pitch is that for which the number of coincidences is greatest.

Roughness seems to be strongly correlated with the envelope fluctuations of a sound that has passed through a critical band filter. A parameter would be the ratio of fluctuation amplitude to a steady-state mean value. The resulting perception of roughness is composed of the partial roughnesses which are contributed by adjacent critical bands. When subharmonics exist in the source waveform, due to a variety of possible pathologies or unusual vibration patterns, roughness is likely to be perceived because the fundamental and a subharmonic are in a critical band. Critical band may be defined as a bandwidth divided by the center frequency and is typically around 10% of the center frequency if the presented frequency is above 300 Hz; however, if the frequency is below 300 Hz, the critical band remains at a constant value of approximately 80 Hz. When two tones interact within the same critical band, they can cause irregular nerve firing patterns that correspond to envelope fluctuations. Some nerves will respond to both tones. If the two tones are farther apart than a critical band, the nerves may be responding to other subharmonics (besides the F0/2) corresponding to either of the two presented tones. The resulting perception is often that of roughness. As we move higher in frequency, the two presented tones [fundamental frequency (F0) and F0/2] move farther apart and interact less within a critical band. This results in fewer envelope fluctuations and therefore should produce less perception of roughness. Our results confirm this hypothesis (100/50 Hz combinations produced greater perceptions of roughness than 300/150 Hz combinations).

There may be another factor, other than auditory filtering, that influences the audibility of partials (e.g., subharmonics) in complex tones. Moore and Ohgushi11 suggested that the pitches of individual components of a complex tone may be partly coded through phase locking (time patterns of neural activity) in the auditory nerve. They found phase locking to be more precise below 1000 Hz than above 1000 Hz. Due to their relative proximity, the perceptual result is often roughness. Furthermore, the pitch may become uncertain. Wendahl12 showed that for a given jitter (usually thought of as a random frequency modulation), a signal with a low fundamental frequency (F0) tended to be perceived as rougher than a signal with a high F0.
In a 50 Hz tone, for example, even a sinusoidal (rather than random) modulation may be enough to create the perception of roughness and a lowered pitch. This will be borne out in our present study.

PURPOSE AND RESEARCH QUESTIONS

The purpose of this study was to determine what perceptual judgments occur as a direct result of an F0/2 subharmonic, both as a frequency modulation (FM) and an amplitude modulation (AM). We wished to determine an identification boundary between the F0 related pitch and the F0/2 related pitch and to investigate whether this perceptual boundary is discrete (categorical) or continuous. Subjects were presented with tokens of a synthetic [a] vowel created with a voice simulation model described below. The research questions to be answered were:

(1) Where does the identification boundary occur as a function of modulation extent?
(2) Is it the same for FM and AM?
(3) How does the modulation affect the perception of roughness?
(4) Does overall F0 have an effect on the perception of pitch and roughness?
(5) How variable is the subjects’ ability to identify pitch and roughness?
(6) Is there a significant difference in intersubject and intrasubject variability between musicians and nonmusicians?

PROCEDURES

Ten subjects were recruited. All of them successfully passed a hearing test and were considered to have normal hearing. The ages of the subjects ranged from 22 to 57 years, with a mean age of 33 years. Amount of musical background or training ranged from none (1 subject) to moderate (5 subjects) to professional vocalists (4 subjects).

Synthetic stimuli were generated with a computer model, Speak, a computational glottal flow and vowel articulation model based on the linear source-filter theory.13 In this model, both the glottal source and the filter parameters can be manipulated and controlled. The glottal parameters are peak flow, fundamental frequency, open quotient, skewing quotient, the size of the area of the epilaryngeal tube, and frequency and extent of AM and FM (imposed on the first two parameters). The filter is defined by the area function of the vocal tract. Table 1 represents all available inputs, with typical default values and ranges. All parameters remained as default values, with the exception of fundamental frequency (which was chosen to be 100, 200, or 300 Hz), F0 modulation frequency (which was always chosen to be half of the F0; for example 50, 100, or 150 Hz), and either the amount of amplitude modulation (%) or frequency modulation (%), which was varied from 2% to 98%. Figure 1 shows a 200 Hz simulated glottal waveform with 20% amplitude modulation. Note the alternation of successive amplitudes. Frequency modulation of the same extent is difficult to see on a time waveform, so we show a magnitude spectrum in Figure 2 for 20% FM. Note that the subharmonics here are about 10-15 dB below the harmonics in two coexisting series.

TABLE 1. Parameters for Simulation Model “Speak”

Parameter                          Default      Range      Perceptual Effect of Changes
Flow amplitude                     375 cm3/s    200-700    Loudness
Fundamental frequency              100 Hz       50-1200    Pitch
Open quotient (Qo)                 0.6          0.1-1.0    Tightness/breathiness
Skewing quotient (QS)              1.7          1-5        Timbre
DC flow added                      0 cm3/s      0-300      Breathiness
Area of epilarynx tube             0.5 cm2      0-1        Resonance/ring
F0 modulation, frequency           0 Hz         0-12       Vibrato (rate)
F0 modulation, extent              0%           0-12       Vibrato (amount)
Amplitude modulation, frequency    0 Hz         0-12       Vibrato (rate)
Amplitude modulation, extent       0%           0-12       Vibrato (amount)
Nasal coupling area                0 cm2        0-1        Nasality

Abbreviations: DC, direct current; F0, fundamental frequency.

FIGURE 1. Glottal flow waveform with 20% amplitude modulation and F0 = 200 Hz.

FIGURE 2. Magnitude spectrum of glottal flow with 20% frequency modulation and F0 = 200 Hz.

For the method of adjustment task to match pitch, triangular waves (rather than pure tones) were generated by a function generator (Heath IG-1271, Benton Harbor, MI) to present the subject a complex (but modulated) spectrum. Ideally, the same Speak synthesizer would have been used, but a real-time version with simple dial control of F0 was not yet available. The intensity of the signal was kept at a comfortable dB sound pressure level (SPL) for each subject, averaging about 60 dB SPL. Subjects were allowed to adjust the frequency until they found what they believed to be a “match” to the presented token. Frequencies chosen by the subjects during the method of adjustment task were measured and recorded by the investigator (not in view of the subjects) using a frequency counter with a range of 5 Hz to 80 MHz (B&K 1805, Chicago, IL).

Three different listening tasks were requested of the subjects. All tasks involved the presentation of tokens of 1 second duration of the vowel [a] presented at 100, 200, or 300 Hz. All three fundamental frequencies were presented either in a nonmodulated form (first procedure only) or were systematically modulated with their corresponding subharmonic (F0/2) by varying the extent of AM or FM.

The first task involved presentation of the above tokens in an ABX forced alternative format in which the subjects first heard the nonmodulated fundamental frequency F0 as A; then they heard a similar nonmodulated signal, but with a fundamental frequency of F0/2 as B; and finally they heard the token with the modulation as X. The subjects were asked to listen to A and B and determine if the pitch of X was more like A or B. There was a 2.5-second pause between A and B and between B and X, followed by a 5-second pause for response. All possible tokens were presented three times to increase the power and reliability of our results and to check for intrasubject reliability.

The second task involved the presentation of single tokens of the modulated tones, with the option to repeat the stimulus as many times as desired. The subject was allowed to adjust the frequency dial of the wave generator until he or she believed that a pitch to “match” the given token had been found. When this match had been found, the choice was displayed on a frequency counter and was recorded by the investigator. All tokens were presented twice to check for intrasubject reliability.

The third task involved the presentation of the above stimuli again as single tokens. Subjects were instructed to rate each in terms of roughness on a scale of 1 (very smooth) to 10 (very rough). The subjects were given a set of anchors prior to this task to provide them with a frame of reference for the subsequent ratings. These anchors represented the entire range of all possible amplitude and frequency modulated tones. All tokens were presented three times to check for intrasubject reliability.
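The F0/2 modulation applied to the stimuli can be illustrated with a minimal sketch. It is not the Speak model: it assumes a plain sinusoidal carrier in place of the simulated glottal flow and vocal tract filter, and it assumes that the modulation extent scales a cosine modulator running at F0/2. The function names, the sampling rate, and the short token duration are hypothetical choices made for illustration only.

```python
import math

def am_token(f0, extent, fs=16000, dur=0.05):
    """Sine carrier at f0 with its amplitude modulated at f0/2 (assumed form)."""
    n = int(fs * dur)
    return [
        (1.0 + extent * math.cos(2 * math.pi * (f0 / 2) * i / fs))
        * math.sin(2 * math.pi * f0 * i / fs)
        for i in range(n)
    ]

def fm_token(f0, extent, fs=16000, dur=0.05):
    """Sine carrier whose instantaneous frequency swings around f0 at a rate of f0/2."""
    n = int(fs * dur)
    phase, out = 0.0, []
    for i in range(n):
        inst_freq = f0 * (1.0 + extent * math.cos(2 * math.pi * (f0 / 2) * i / fs))
        phase += 2 * math.pi * inst_freq / fs  # integrate frequency to obtain phase
        out.append(math.sin(phase))
    return out

def repeat_error(x, lag):
    """Largest difference between the signal and a copy shifted by `lag` samples."""
    return max(abs(x[i + lag] - x[i]) for i in range(len(x) - lag))

fs, f0 = 16000, 200
am = am_token(f0, 0.20, fs)
fm = fm_token(f0, 0.20, fs)
f0_period = fs // f0        # samples in one cycle of F0
sub_period = 2 * f0_period  # samples in one cycle of F0/2

# Each signal repeats every two F0 cycles but not every single cycle.
print(repeat_error(am, f0_period), repeat_error(am, sub_period))
print(repeat_error(fm, f0_period), repeat_error(fm, sub_period))
```

In this sketch both signals repeat exactly at the F0/2 period while successive F0 cycles differ, which is the waveform property that makes the lower-octave pitch available to the listener.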


RESULTS AND DISCUSSION

F0 boundary identification was influenced by both overall F0 and type of modulation. For frequency modulation (Figure 3), the identification boundary (5 of 10 selections) appeared to occur with less than 10% modulation for 100 Hz, 25% for 200 Hz, and 35% for 300 Hz. For amplitude modulation (Figure 4), the identification boundary appeared to occur at 20% for 100 Hz, and at approximately 50% for both 200 and 300 Hz.

FIGURE 3. ABX comparisons for FM (vertical axis: number of times F0/2 selected).

FIGURE 4. ABX comparisons for AM (vertical axis: number of times F0/2 selected).

In the method of adjustment task (Figures 5 and 6), there was a high preference for choosing the subharmonic pitch as the true pitch, but intermediate pitches between the two octaves were systematically chosen, particularly by nonmusicians. Thus, there was not a pitch dichotomy, but a gradual variation. This would suggest a continuous rather than discrete pitch perception of these modulated signals. The crossover to the subharmonic for FM was at 10% modulation for 100 Hz, increasing to about 20% for 300 Hz. For AM, the crossover occurred at about 20% modulation for all fundamental frequencies. From the above results, it appears that FM yields identification of the subharmonic at a lower modulation extent than does AM. Also, when modulation is increased, lower F0 appears to produce earlier identification of the subharmonic as the true pitch.

FIGURE 5. Method of adjustment comparisons for FM.

FIGURE 6. Method of adjustment comparisons for AM.

Perception of roughness was also influenced by F0 and type of modulation (Figures 7 and 8). Frequency modulated tones received ratings of roughness of 5 or greater (out of 10) for 10% modulation, while amplitude modulated tones did not receive roughness ratings of 5 or greater until 20% modulation was reached. Thus, it appears that FM tones cause a somewhat higher rating of roughness (generally) than do AM tones, but statistical significance was not reached in our study. The lowest F0 (100 Hz) was perceived as rougher than the other two F0s (200 and 300 Hz) at all modulation amounts and for both AM and FM. This supports the findings of Wendahl.12

FIGURE 7. Roughness rating for FM.

FIGURE 8. Roughness rating for AM.

With regard to the question of subject variability, at least half of the subjects (those with little or no musical training) showed poor pitch matching abilities; their “matches” appeared to be random and inconsistent. This was demonstrated by their selection of neither the F0, the F0/2, nor any other harmonic component (Table 2, Appendix). The musicians showed greater pitch matching ability by choosing either the F0 or its subharmonic with approximately 88% accuracy (Figures 9 and 10) while the nonmusicians matched pitch at approximately 52%. A “match” was defined as selecting a frequency within ±10% of either the F0 or its subharmonic.

For perception of pitch in both AM and FM, the subject variance generally increased with increasing F0. For FM, the variance was particularly great for 100 Hz at the 95% and 98% modulations. There was a slight overall decrease in the variance at the 90% level for all three F0s and for both AM and FM. It appears that there is poor intersubject reliability in the perception of F0 in even the most “trained” ears and even less reliability in the “untrained” ears. This is not to discredit the listeners, but simply to underscore the fact that a single pitch does not exist and their choice is based on ambiguity.

In perception of roughness, the lowest F0 (100 Hz) generally had greater variance in the ratings than the other two F0s (Table 3, Appendix). This was especially true for AM. Variance in roughness ratings for 200 and 300 Hz seemed to dramatically increase at about 10% and then tended to remain fairly high up to 98%. These findings support those of Kreiman et al5 and Gerratt et al.4 Intersubject variability was high for all three procedures, for both AM and FM, and for both pitch identification and roughness rating.

A very strong relationship was observed between selected F0 and presented F0, and between selected F0 and type of modulation. For the ABX task, the comparison of selected F0 with presented F0 yielded a chi-square value of 37.63 (P ≤ 0.001). The comparison of selected F0 by type of modulation for AM yielded a chi-square value of 103.50 (P ≤ 0.001). The comparison of selected F0 with the type of modulation for FM yielded a chi-square of 54.906 (P ≤ 0.001). The musicians showed markedly greater intrasubject agreement than the nonmusicians in many cases.

The chi-squared test of independence was performed on the ABX task to compare intrasubject reliability differences between the musicians and the nonmusicians (Table 4, Appendix). Significant differences (P < 0.05) in reliability occurred in the following cases: 100/50 Hz combinations at 90% FM and 10% AM; 200/100 Hz combinations at 50% FM (especially relevant as all five musicians agreed with themselves 100% of the time while all five nonmusicians disagreed with themselves 100% of the time) and 80% AM; and 300/150 Hz combinations at 50%, 80%, 90%, and 95% FM and 95% AM.
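The arithmetic behind such a test of independence can be sketched for a single token. The sketch assumes a 2 x 2 layout (group by self-consistent versus not self-consistent) and the uncorrected Pearson statistic; the paper does not spell out either choice, but this layout reproduces the value of 10.000 listed in Table 4 for the case in which all five musicians agreed with themselves and all five nonmusicians did not. The function and constant names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return None  # a zero margin leaves the statistic undefined
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Critical value of chi-square for P = 0.05 with 1 degree of freedom.
CRITICAL_05_DF1 = 3.841

# Rows: musicians, nonmusicians. Columns: self-consistent, not self-consistent.
all_vs_none = chi_square_2x2(5, 0, 0, 5)  # every musician consistent, no nonmusician
all_vs_most = chi_square_2x2(5, 0, 4, 1)  # only one nonmusician inconsistent
all_vs_all = chi_square_2x2(5, 0, 5, 0)   # everyone consistent

print(round(all_vs_none, 3), round(all_vs_most, 3), all_vs_all)  # 10.0 1.111 None
```

Under these assumptions, the third case corresponds to the table entries for which the statistic could not be computed because all subjects gave the same answer, and only values above 3.841 count as significant at P < 0.05.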


Interestingly, it appeared that the musicians tended to select the F0 (either A or B, the fundamental or its subharmonic) that most closely corresponded with their natural speaking F0. For example, sopranos and tenors tended to choose “A,” the fundamental, more frequently (and for larger amounts of modulation) than did the altos and baritones, who tended to choose “B,” the subharmonic, more frequently (and for larger amounts of modulation). This may warrant further investigation as no statistically significant correlation was found at this point.

Analysis of variance for comparison of selected F0 versus presented F0 for the method of adjustment (MOA) task showed an F-value of 12.38 (P ≤ 0.0001) for FM tokens and an F-value of 14.35 (P ≤ 0.0001) for AM tokens. Significant differences in selected F0 were observed for both FM (F = 2.21; P < 0.001) and AM (F = 2.92; P < 0.001). However, caution must be taken in interpretation of the relationship between selected F0 and presented F0 for this method since the selected F0 values were mean values and contained great variability.

The chi-squared test of independence was performed on the MOA task to compare the overall reliability of musicians versus nonmusicians in their ability to choose a pitch within ±10% of either the fundamental or the subharmonic. For the 300/150 Hz combination, they would need to choose a response between 270 and 330 Hz or 135 and 165 Hz to be considered a “hit” or a “match.” The results (Figures 9 and 10) showed greatly increased accuracy or number of “hits” from the musicians when compared to the nonmusicians. In amplitude-modulated tokens (Figure 9) the musicians scored between 75% and 100% accuracy with a mean accuracy of 89% across all trials, while the nonmusicians scored between 40% and 68% accuracy with a mean accuracy of 51%. In frequency-modulated tokens (Figure 10), the musicians again ranged from 75% to 100% accuracy with a mean accuracy of 87% in ability to come within ±10% of either the F0 or the F0/2, while the nonmusicians ranged from 32% to 70% accuracy with a mean accuracy of 54% across all trials. This analysis represents the variability better than the analysis of variance (ANOVA) because it controls for trivial differences in the absolute frequency chosen by the listeners.
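The ±10% “match” criterion can be written as a short check. This is a sketch of the rule as stated in the text; the function names and the list of responses are hypothetical and are not data from the study.

```python
def is_match(selected_hz, f0_hz, tolerance=0.10):
    """True if the response lies within +/- tolerance of F0 or of F0/2."""
    for target in (f0_hz, f0_hz / 2):
        if abs(selected_hz - target) <= tolerance * target:
            return True
    return False

def hit_rate(responses_hz, f0_hz):
    """Fraction of responses that count as a match to F0 or its subharmonic."""
    hits = sum(is_match(r, f0_hz) for r in responses_hz)
    return hits / len(responses_hz)

# Hypothetical responses to a 300/150 Hz token (illustration only).
responses = [300, 272, 150, 164, 200, 240, 331, 134]
print(hit_rate(responses, 300))  # 0.5: four responses fall in 270-330 or 135-165 Hz
```

Scoring responses this way discards how far a miss lies from either target, which is why it is less sensitive than the ANOVA to small differences in the absolute frequency chosen.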


FIGURE 9. Musician versus nonmusician differences in match to F0 or subharmonic for amplitude modulation, averaged across frequencies.

FIGURE 10. Musician versus nonmusician differences in match to F0 or subharmonic for frequency modulation, averaged across frequencies.

Statistical analysis of ratings of roughness yielded an F-value of 3.81 (P ≤ 0.001) for comparison of the selected versus presented F0 in frequency-modulated tokens, while an F-value of 6.45 (P ≤ 0.0001) occurred for the comparison of selected versus presented F0 in amplitude-modulated tokens. Comparisons between type of modulation and selected rating, however, were not as significant, yielding an F-value of 1.29 (P ≤ 0.1889) for frequency-modulated tokens and an F-value of 0.67 (P ≤ 0.8548) for amplitude-modulated tokens. A strong relationship was observed between presented F0 and rating of roughness (F = 3.81; P < 0.001 for frequency-modulated tokens and F = 6.45; P < 0.001 for amplitude-modulated tokens), but a weak relationship was observed between type of modulation and rating of roughness (F = 1.29; P < 0.1889 for frequency-modulated tokens and F = 0.67; P < 0.8348 for amplitude-modulated tokens).

Levene’s test for equality of variances and independent-samples t-tests for equality of means were performed to compare the differences in mean ratings and variability between musicians and nonmusicians. The results yielded significant differences in mean scores for tokens at 300/150 Hz at 5% FM and 5% AM, with the musicians rating the tokens as “rougher” than the nonmusicians. Significant differences in variability were found for tokens at 200/100 Hz at 80% FM and 80% AM and for tokens at 300/150 Hz at 50% and 80% FM and at 5%, 50%, and 80% AM. Without exception, the nonmusicians showed significantly greater variability than the musicians in the above cases.


CONCLUSIONS AND IMPLICATIONS FOR FUTURE RESEARCH

This study has brought into question the perception of pitch and roughness for vowel sounds containing F0 and a subharmonic F0/2. Our findings suggest that there is not a single “fundamental pitch” when listening to such sounds. A modulation of as little as 10% can cause the perceived pitch to drop toward a lower octave associated with the subharmonic. However, intersubject variability is high. This may be explainable on the basis of the “best-fit” (or template-fitting) model.14 According to this model, subjects pick the best-fitting F0 that corresponds to a highly individualized stereotype. There may also be a tendency for subjects to select a pitch that most closely matches their own mean speaking pitch, as mentioned above.

Significant differences were noted in both intrasubject and intersubject reliability and variability when comparing musicians to nonmusicians. The musicians generally showed greater intrasubject and intersubject agreement than the nonmusicians in all three listening tasks and for both pitch-matching ability and rating of roughness reliability.

Further research might involve the investigation of the perception of lower-order subharmonics (F0/n, where n = 1, 2, 3, ...) or modulations that have no integer relation to F0. Other possible areas might include the perception of other vocal qualities (ring, breathiness, resonance, pressed-lax) using similar experimental paradigms. One possible goal might be the development of a set of training tapes that contain a great variety of anchors for modulations typically observed in both normal and pathological voice signals. These tapes could be used by clinicians and vocal pedagogues who wish to find a more objective, accurate, and standardized means of perceiving vocal qualities. The increased reliability and decreased variability demonstrated by the musicians in this study would seem to support the plausibility of the effectiveness of training.

Acknowledgments: We would like to express our sincere appreciation to Eric Hunter, Dr. Roger Chan, Xuyang Zhang, and Dr. Greg Flamme for their invaluable and most generous assistance in the preparation of this study. Funding for this study was from Grant No. 5 R01 DC 02532 from the National Institutes of Health.

REFERENCES

1. Jensen PJ. Adequacy of terminology for clinical judgment of voice quality deviation. Eye Ear Nose Throat Month. 1965;44:77-82.
2. Rabinov CR, Kreiman J, Gerratt BR, Bielamowicz S. Comparing reliability of perceptual ratings of roughness and acoustic measures of jitter. J Speech Hear Res. 1995;38:26-32.
3. DeBodt MS, Wuyts FL, Vande Heyning PH, Croux C. Test-retest study of the GRBAS scale: Influence of experience and professional background on perceptual rating of voice quality. J Voice. 1997;11:74-80.
4. Gerratt BR, Kreiman J, Antonanzas-Barroso N, Berke GS. Comparing internal and external standards in voice quality judgements. J Speech Hear Res. 1993;36:14-20.
5. Kreiman J, Gerratt BR. Validity of rating scale measures of voice quality. J Acoust Soc Am. 1998;104:1598-1608.
6. Ebel R. Estimation of the reliability of ratings. Psychometrika. 1951;16:407-424.
7. ANSI (ASA-111-1994). American National Standard Acoustical Terminology. New York: American National Standards Institute; 1994.
8. Stevens SS. The relation of pitch to intensity. J Acoust Soc Am. 1935;6:150-154.
9. Beerends JG. The influence of duration of pitch in single and simultaneous complex tones. J Acoust Soc Am. 1989;86:1835-1844.
10. Terhardt E. Pitch, consonance, and harmony. J Acoust Soc Am. 1974;55:1061-1069.
11. Moore BCJ, Ohgushi K. Audibility of the partials in inharmonic complex tones. J Acoust Soc Am. 1993;93:452-461.
12. Wendahl R. Some parameters of auditory roughness. Folia Phoniatr. 1966a;15:241-250.
13. Titze I, Mapes S, Story B. Acoustics of the tenor high voice. J Acoust Soc Am. 1994;95:1133-1142.
14. Goldstein JL. An optimal processor theory for the central formation of the pitch of complex tones. J Acoust Soc Am. 1973;54:1496-1516.

APPENDIX

TABLE 2. Variabilities in the Method of Adjustment Task*†

Amount of Frequency Modulation
                            2%      5%     10%     20%     30%     50%     70%     90%     95%     98%
100 Hz  Mean (Hz)        82.75     106      74   65.22   46.33   58.88   55.88      45   82.25   70.37
100 Hz  St. Dev. (Hz)     24.3   46.77   31.26   25.01   25.02   32.11   33.69   12.83   55.18   57.73
200 Hz  Mean (Hz)       200.75  182.25  153.88  121.88  115.22  102.33     115  109.88  114.87  117.57
200 Hz  St. Dev. (Hz)     4.02   37.36   56.44   43.81   51.12   45.41   32.45   34.32   38.24   37.38
300 Hz  Mean (Hz)       276.62     254  245.55  205.44  164.77     129  160.33  161.66  161.62  169.12
300 Hz  St. Dev. (Hz)    41.56   58.01   81.74   75.21   52.43   35.97   33.32   28.83   59.91    57.7

Amount of Amplitude Modulation
                            2%      5%     10%     20%     30%     50%     70%     90%     95%     98%
100 Hz  Mean (Hz)         99.6   109.9    85.9   84.33   54.11   75.33   63.22   43.88   56.37   57.12
100 Hz  St. Dev. (Hz)     1.64   40.73   31.78   23.62    8.66   54.25   24.17   11.35   17.96   20.71
200 Hz  Mean (Hz)        201.8     204   153.7  166.44     123  152.22  122.55   98.55  136.25     145
200 Hz  St. Dev. (Hz)    31.97   12.07   64.12   55.24   43.66   58.25   44.36   10.08   49.09   49.66
300 Hz  Mean (Hz)        285.6   282.7   247.8  200.88     191  141.22  133.88  158.88  169.87  148.75
300 Hz  St. Dev. (Hz)    29.41   33.01   65.61   89.08   94.47   32.62   23.71   28.62   55.27   20.67

*Numbers of greatest variance are boldfaced.
†Note overall decrease in variability at 90%.
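One way to read a crossover point off the 300 Hz means in Table 2 is to interpolate where the mean selected frequency first falls below the half-octave point between F0 and F0/2. This is only a sketch: the half-octave criterion and the linear interpolation are assumptions made here, and the paper does not state how its crossover values were located. The names are hypothetical.

```python
import math

# Mean selected frequency (Hz) for the 300 Hz tokens, copied from Table 2.
EXTENT = [2, 5, 10, 20, 30, 50, 70, 90, 95, 98]
MEAN_FM_300 = [276.62, 254, 245.55, 205.44, 164.77, 129, 160.33, 161.66, 161.62, 169.12]
MEAN_AM_300 = [285.6, 282.7, 247.8, 200.88, 191, 141.22, 133.88, 158.88, 169.87, 148.75]

def first_crossover(extents, means, f0):
    """Modulation extent (%) at which the mean first drops below the half-octave point."""
    midpoint = math.sqrt(f0 * (f0 / 2))  # geometric mean of F0 and F0/2
    for i in range(1, len(means)):
        above, below = means[i - 1], means[i]
        if above >= midpoint > below:
            frac = (above - midpoint) / (above - below)  # linear interpolation
            return extents[i - 1] + frac * (extents[i] - extents[i - 1])
    return None  # the means never cross the midpoint

fm_cross = first_crossover(EXTENT, MEAN_FM_300, 300)
am_cross = first_crossover(EXTENT, MEAN_AM_300, 300)
print(round(fm_cross, 1), round(am_cross, 1))
```

Under these assumptions both estimates fall between 10% and 20% modulation, in line with the crossover of about 20% reported for 300 Hz in the text.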

TABLE 3. Variabilities in the Perception of Roughness* (ratings on the scale of 1 to 10)

Amount of Frequency Modulation
                       2%     5%    10%    20%    30%    50%    70%    80%    90%    95%    98%
100 Hz  Mean         5.12      7   8.25   7.37   6.25    7.5   7.32      7   6.62   7.75   7.62
100 Hz  St. Dev.     1.88   1.41   1.49    1.5   2.96    2.5   2.55   2.93   1.59   3.01   2.72
200 Hz  Mean            3    5.5   6.25   4.62    6.5   5.25   4.62   5.37   7.87    7.5   4.37
200 Hz  St. Dev.     1.19   1.77   1.75   1.84   1.19   1.75   2.19   2.06   1.45    1.3   2.87
300 Hz  Mean         2.25   3.37      4   4.62   5.12   4.62   4.87   4.87      6   4.37   5.75
300 Hz  St. Dev.      0.7   1.68    2.5   2.61   2.41   2.77   2.47    2.9   3.07   2.61   2.86

Amount of Amplitude Modulation
                       2%     5%    10%    20%    30%    50%    70%    80%    90%    95%    98%
100 Hz  Mean          2.5   4.33   5.37      8   7.12   7.75   7.75   7.37   7.37   7.62   7.12
100 Hz  St. Dev.     2.13   2.32   3.33    1.6   1.95   2.65   2.54   3.37   2.66   3.02   2.47
200 Hz  Mean         1.87   1.75   2.87   5.12   6.37   4.87   5.75    3.5   4.37      5   4.87
200 Hz  St. Dev.     0.83    0.7   1.45   2.23   2.19    1.8   2.12   1.69   1.06   2.13   1.88
300 Hz  Mean          1.5   2.12      4    4.5   5.62      5    5.5   4.37   4.12   4.87   4.12
300 Hz  St. Dev.     0.75   1.12      2   2.44   2.62      2    1.3   2.26    2.1   2.53   2.59

*Numbers of greatest variance are boldfaced.
Note overall decrease in variability at 90%.
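The statement that the 100 Hz tokens drew the most variable roughness ratings under AM can be checked against the standard deviations in Table 3. The sketch below averages the standard deviations across modulation extents for each F0; that averaging is a crude summary chosen here for illustration, not an analysis reported by the authors, and the variable names are hypothetical.

```python
from statistics import mean

# Standard deviations of roughness ratings for AM tokens, copied from Table 3
# (columns: 2%, 5%, 10%, 20%, 30%, 50%, 70%, 80%, 90%, 95%, 98%).
SD_AM = {
    100: [2.13, 2.32, 3.33, 1.6, 1.95, 2.65, 2.54, 3.37, 2.66, 3.02, 2.47],
    200: [0.83, 0.7, 1.45, 2.23, 2.19, 1.8, 2.12, 1.69, 1.06, 2.13, 1.88],
    300: [0.75, 1.12, 2, 2.44, 2.62, 2, 1.3, 2.26, 2.1, 2.53, 2.59],
}

# Average spread per F0 across all modulation extents.
avg_sd = {f0: mean(sds) for f0, sds in SD_AM.items()}
most_variable_f0 = max(avg_sd, key=avg_sd.get)

for f0, value in avg_sd.items():
    print(f0, round(value, 2))
print("most variable F0:", most_variable_f0)
```

With these table values the 100 Hz row has the largest average spread, which agrees with the description in the Results.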


TABLE 4. Comparisons Between Musicians and Nonmusicians in Intrasubject Agreement*,†

Amplitude Modulated ABX Task (chi-square)
Modulation    100/50 Hz    200/100 Hz    300/150 Hz
10%               6.667         1.111         0.000
20%               0.000         0.000         0.000
25%               0.000         0.400         2.500
30%               0.000         0.111          n/a‡
35%               1.111         0.000         1.667
50%               0.000         0.476         0.476
70%               1.111         0.476         1.667
80%               1.111         4.286         1.667
90%               1.111         0.000         1.667
95%                n/a‡         1.667         6.667

Frequency Modulated ABX Task (chi-square)
Modulation    100/50 Hz    200/100 Hz    300/150 Hz
10%               0.476         1.111         1.111
20%               1.111         0.400         1.111
25%               2.500         0.476         1.111
30%               2.500         0.476         1.667
35%               0.000         2.500         0.000
50%                n/a‡        10.000         6.667
70%               1.111         0.476         3.600
80%               0.000         3.600         6.667
90%               4.286         0.400         6.667
95%               0.476         0.000         6.667

*P < 0.05 = significant difference.
†Boldface indicates a significant difference in intrasubject agreement (P < 0.05) between the two groups, with musicians showing greater consistency in their answers.
‡Statistic could not be computed because all subjects gave the same answer.
