Journal of Voice Vol. 16, No. 1, pp. 44–51 © 2002 The Voice Foundation
Significance of Auditory and Kinesthetic Feedback to Singers’ Pitch Control
*†Dirk Mürbe, †Friedemann Pabst, *†Gert Hofmann, and ‡Johan Sundberg
*Department of Otorhinolaryngology, Technical University of Dresden, Dresden, Germany; †Voice Research Laboratory, University of Music Carl Maria von Weber, Dresden, Germany; ‡Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden
Summary: An accurate control of fundamental frequency (F0) is required from singers. This control relies on auditory and kinesthetic feedback. However, a loud accompaniment may mask the auditory feedback, leaving the singers to rely on kinesthetic feedback. The object of the present study was to estimate the significance of auditory and kinesthetic feedback to pitch control in 28 students beginning a professional solo singing education. The singers sang an ascending and descending triad pattern covering their entire pitch range with and without masking noise in legato and staccato and in a slow and a fast tempo. F0 was measured by means of a computer program. The interval sizes between adjacent tones were determined and their departures from equally tempered tuning were calculated. The deviations from this tuning were used as a measure of the accuracy of intonation. Statistical analysis showed a significant effect of masking that amounted to a mean impairment of pitch accuracy by 14 cent across all subjects. Furthermore, significant effects were found of tempo as well as of the staccato/legato conditions. The results indicate that auditory feedback contributes significantly to singers’ control of pitch. Key Words: Singing—Pitch control—Mechanoreceptors—Auditory feedback—Kinesthetic feedback.
Accepted for publication April 30, 2001. Address correspondence and reprint requests to Dr. Dirk Mürbe, Department of Otorhinolaryngology, Technical University of Dresden, Fetscherstr. 74, D-01307 Dresden, Germany. This investigation was presented at the 29th Annual Symposium: Care of the Professional Voice, Philadelphia, PA, June 2000.

INTRODUCTION
The high demands on intonation in professional singing require precisely acting pitch control systems. A complex activity like singing would rely on different types of feedback mechanisms.1 Apart from prephonatory tuning, two basic neurological control systems have been described that act on the laryngeal musculature during phonation, namely auditory and kinesthetic feedback circuits.2 Auditory cues are commonly regarded as the obvious tool for pitch control in singing under normal circumstances. Their significance has been estimated in numerous investigations, describing the effects of delayed or distorted auditory feedback upon the phonatory process.1,3–6 These observations, as well as the presence of an abnormal voice in congenitally deaf patients, suggest that auditory feedback may play an important role in phonatory control. However, auditory feedback cannot explain the fact that singers are able to continue phonating accurately even when they cannot hear their own voices.
SINGING WITHOUT AUDITORY FEEDBACK This situation is typically experienced by choir singers, whose auditory feedback is sometimes masked by the sound of fellow singers.7 Similar situations are likely to occur also in solo singing when the orchestral accompaniment is loud; sound pressure level (SPL) values as high as 110 dB have been observed on orchestral podia.8 Furthermore, it is well known that adult patients who are suddenly struck by a bilateral deafness carry on talking with only a slightly disturbed voice for a long time. These observations suggest the significance of a second intraphonatory feedback circuit, based on kinesthetic discharges. According to Wyke2,9 the kinesthetic pitch control includes a triad of intrinsic laryngeal reflex systems relevant to the accuracy of intonation, particularly in cases of lacking auditory feedback. These laryngeal reflexogenic systems depend on discharges of three types of laryngeal mechanoreceptors: stretch-sensitive myotatic mechanoreceptors located in each of the intrinsic laryngeal muscles, mucosal mechanoreceptors in the subglottic mucosa, and articular mechanoreceptors located in the fibrous capsules of the intercartilaginous joints of the larynx.10 Stimulated by changed tension and posture of the vocal folds and by alternating subglottic pressure, afferent discharges are conducted into the brain stem and are then polysynaptically relayed to the motoneurons of the laryngeal muscles.2,9 The processing involves the comparison of produced and target pitch generated on the basis of the acquired “muscle memory” of pitch.11 The kinesthetic feedback may be supplemented by discharges from “peripheral” mechanoreceptors in the thorax, the abdominal wall, and the vocal tract.10,12 The significance of the auditory and kinesthetic feedback to pitch control has been studied in some previous investigations. 
Ternström, Sundberg, and Collden13 and Elliot and Niemoeller14 found that the auditory feedback was important to the stabilization of pitch. Ward and Burns11 reported that the auditory feedback was relevant also to the matching of the target pitch. These authors, as well as Schultz-Coulon,15 found that trained subjects were less affected than untrained subjects by a lack of auditory feedback. These findings suggest that the kinesthetic feedback acts more efficiently in singers than in nonsingers. In these investigations, however, either the number of subjects was rather limited or the singers’ degree of training was not explicitly specified, although relevant effects of training can be expected.

The aim of the present study was to estimate the importance of auditory and kinesthetic feedback to pitch control in students beginning their professional solo singing education. The effect on pitch control was investigated in tasks differing in complexity, such as legato or staccato, or slow and fast singing. The present experiment was carried out with a view to a future longitudinal study.

METHOD
Subjects
Seventeen female and 11 male singing students (mean age 20.9 ± 1.6 years) participated in the investigation. They were examined at the beginning of their professional solo singing education at the University of Music Carl Maria von Weber, Dresden, after having successfully passed the entrance examination. All were experienced choir singers or had previously attended private singing lessons. The sample included all students who started their education in 1997 and 1998.

Procedure and equipment
Subjects were asked to sing on the vowel [a:] an ascending and descending triad pattern up to the twelfth and back (Figure 1) at a moderate degree of vocal loudness. The pitches were chosen so as to fit comfortably the pitch range of the individual subject. The starting pitch was given by means of a synthesizer. Each subject sang the sequence twice, first without masking noise, and immediately afterwards with a masking noise presented via headphones. The masker was a white noise, band-pass filtered (24 dB/octave) with cutoff frequencies of 50 Hz and 2000 Hz. The SPL of the noise was 105 dBAs. The masking efficiently eliminated the auditory feedback in all subjects except two sopranos, who reported that they could hear themselves only at the top pitch. None of the subjects complained of any discomfort during or after the measurements. The sequences without and with masking noise were recorded in different conditions.
These conditions were selected in accordance with an investigation by Ward and Burns:11 (1) legato slow, (2) legato fast, (3) staccato slow, (4) staccato fast. The legato and staccato conditions were included since the demands on the pitch control system can be assumed to be higher when adjacent tones are separated by a pause. Likewise the fast condition would require a higher skill in pitch control than the slow condition. The slow and fast tempi corresponded to metronome settings of 40 and 160 beats per minute, respectively. After a few informal trials, the subjects experienced no difficulties in performing the task.

FIGURE 1. Sequence of an ascending and descending triad (up to the twelfth and back) for recordings of legato and staccato performances of different tempi. Staves: legato, staccato.

The output from a portable electroglottograph (EGG) (Laryngograph, London, UK), the electrodes of which were fastened to the subject’s neck by means of an elastic ribbon, and the audio signal picked up by a microphone (distance to mouth 0.3 m) (ECM959DT SONY, Tokyo, Japan) were recorded on a digital audiotape (TCD-D10, SONY, Tokyo, Japan) (Figure 2).

Analysis
Fundamental frequency (F0) was mostly estimated from the EGG signal using the Soundswell workstation program package, which also displayed the resulting F0 contour on the computer screen (Figure 3) (Soundswell, Solna, Sweden).16 In some of the female subjects the EGG signal produced errors in the F0 measurement at high pitches. In such cases F0 was measured from the audio signal. For determining the mean F0 for each pitch, a set of complete vibrato cycles was selected from the quasi-steady state section, thus excluding onset and decay transients (see Figure 3). The frequency distribution of this selection was analyzed, using the histogram module in the
Soundswell package. The program also displays the mean F0. The ratios between the mean F0 of the starting tone and the mean F0 of each tone were calculated and expressed in the logarithmic cent unit, i.e., hundredths of a semitone. This allowed comparisons between subjects in spite of their different pitch ranges and voice classifications. The sizes of the 10 intervals included in each triad sequence were determined by calculation of the pitch difference between adjacent tones, expressed in the cent unit. The absolute values of the deviations of these intervals from their equivalents in the equally tempered tuning, henceforth the interval deviations, were determined and regarded as a measure of the accuracy of intonation. The averaged interval deviation of the 10 intervals contained in a complete triad sequence was defined as the mean interval deviation. Equally tempered tuning was considered an acceptable standard, since its deviations from the Pythagorean and the just tunings are small as compared to the observed interval deviations.

Interval deviation data were submitted to a statistical analysis by means of a repeated-measures analysis of variance (ANOVA), with interval (1–10), masking (without/with masking), technique (legato/staccato), and tempo (slow/fast) as within-subject factors.

The selection of the section of each tone that was used for determining the mean F0 depends to some extent on the investigator’s judgment (see Figure 3). To estimate the magnitude of this source of error, a reliability analysis was run after the first series of recordings had been completed (n = 14, all recordings made in 1997). The analysis included frequency values of all fast performances measured by two investigators (n = 2 × 616 = 1232). The reliability coefficient amounted to 0.999.

FIGURE 2. Block diagram of the experimental setup: The output from an electroglottograph (EGG) with electrodes (E) fastened to the subject’s neck, and the audio signal picked up by a microphone (M) were recorded on a digital audiotape (DAT). After pitch tracking by means of the SWELL analysis program F0 was estimated. Diagram labels: M, amplifier, EGG, E1, E2, DAT, SWELL, F0.

FIGURE 3. F0 contour of a recorded sequence as displayed on the computer screen. For each tone a section like the one marked for the second tone, comprising a set of complete vibrato cycles, was analyzed with regard to the mean of F0. Axes: time [sec], frequency [Hz].

RESULTS
A significant difference between the various intervals was found (p = 0.008). Thus, the interval deviations were smaller for the highest than for the lower pitches, both in the ascending and descending parts of the sequences (Figure 4). A significant difference was found between the unmasked and masked conditions (p < 0.001), mean interval deviations across
all subjects amounting to 33.3 and 47.3 cent, respectively. The effect of masking appeared to be independent of technique and tempo (Table 1). A significant difference between legato and staccato performances was found (p < 0.001), with mean interval deviations across all subjects of 31.4 and 49.1 cent, respectively. Slow and fast performances also produced significant differences (p < 0.001), with mean interval deviations across all subjects of 37.2 and 43.4 cent, respectively. The effects of different tempi and techniques appeared to be independent of masking (see Table 1). Differences are illustrated for the cases of masked and unmasked slow legato in Figure 5, unmasked
FIGURE 4. Interval deviations (cent) averaged across all test conditions and subjects for different intervals. Axes: Intervals (1–10), Mean interval deviation (cent).
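The measure plotted in Figure 4 follows from the cent calculation described under Analysis. The Python sketch below is illustrative only and is not the Soundswell procedure. It assumes that the triad pattern is a major triad (root, third, fifth, octave, tenth, twelfth, and back), which the text does not state, and the names `cents`, `interval_deviations`, `mean_interval_deviation`, `ET_TRIAD_INTERVALS`, and `TUNING_RATIOS` are hypothetical. It also lists how far just and Pythagorean tuning lie from the equally tempered reference for the three interval types in that assumed pattern.

```python
import math

# Equally tempered sizes (cents) of the 10 intervals in the triad pattern.
# ASSUMPTION: a major triad sung root-3rd-5th-octave-10th-12th and back down.
ET_TRIAD_INTERVALS = [400, 300, 500, 400, 300, -300, -400, -500, -300, -400]


def cents(f, f_ref):
    """Signed distance from f_ref to f in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f / f_ref)


def interval_deviations(mean_f0, targets=ET_TRIAD_INTERVALS):
    """Absolute deviation (cents) of each sung interval from its target size.

    mean_f0 holds the mean F0 (Hz) of the tones in sung order.
    """
    if len(mean_f0) != len(targets) + 1:
        raise ValueError("need one more tone than there are intervals")
    # Pitch of every tone relative to the starting tone, as in the text.
    pitches = [cents(f, mean_f0[0]) for f in mean_f0]
    # Interval size = pitch difference between adjacent tones.
    sung = [b - a for a, b in zip(pitches, pitches[1:])]
    return [abs(s - t) for s, t in zip(sung, targets)]


def mean_interval_deviation(mean_f0, targets=ET_TRIAD_INTERVALS):
    """Average of the interval deviations of one triad sequence."""
    devs = interval_deviations(mean_f0, targets)
    return sum(devs) / len(devs)


# name -> (equally tempered size in cents, just ratio, Pythagorean ratio)
TUNING_RATIOS = {
    "major third": (400, 5 / 4, 81 / 64),
    "minor third": (300, 6 / 5, 32 / 27),
    "fourth": (500, 4 / 3, 4 / 3),
}

if __name__ == "__main__":
    # Hypothetical singer: in tune except for the second tone, 20 cents sharp.
    steps = [0, 400, 700, 1200, 1600, 1900, 1600, 1200, 700, 400, 0]
    tones = [220.0 * 2 ** (c / 1200.0) for c in steps]
    tones[1] *= 2 ** (20 / 1200.0)
    print(mean_interval_deviation(tones))
    # Offsets of just and Pythagorean intervals from equal temperament.
    for name, (et, just, pyth) in TUNING_RATIOS.items():
        print(name, round(cents(just, 1.0) - et, 1), round(cents(pyth, 1.0) - et, 1))
```

Because every interval is measured between adjacent sung tones, a mistuned starting tone shifts all pitches equally and leaves the deviations unchanged. For the three interval types in this sketch, just and Pythagorean tuning differ from equal temperament by at most about 16 cents.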
TABLE 1. Mean, Standard Error, and 95% Confidence Intervals from the ANOVA with the Interval Deviation Data

Condition                 Mean (cent)   Standard error (cent)   Confidence interval, lower (cent)   Confidence interval, upper (cent)
Unmasked slow legato      19.5          0.9                     17.8                                21.3
Masked slow legato        35.9          2.5                     30.8                                41.0
Unmasked fast legato      28.8          1.7                     25.3                                32.2
Masked fast legato        41.4          2.3                     36.7                                46.2
Unmasked slow staccato    40.2          3.4                     33.2                                47.2
Masked slow staccato      53.0          3.6                     45.6                                60.6
Unmasked fast staccato    44.6          2.5                     39.4                                49.7
Masked fast staccato      58.7          3.2                     52.1                                65.2
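The condition means quoted in the Results can be cross-checked against Table 1. The sketch below assumes that each marginal mean is the unweighted average of the four matching cells, which is plausible for a balanced design with 28 subjects per cell but is not stated in the text. `TABLE1` and `marginal_mean` are illustrative names, not part of the original analysis.

```python
from statistics import mean

# Cell means (cent) copied from Table 1.
# Key layout (an illustrative choice): (masking, tempo, technique).
TABLE1 = {
    ("unmasked", "slow", "legato"): 19.5,
    ("masked", "slow", "legato"): 35.9,
    ("unmasked", "fast", "legato"): 28.8,
    ("masked", "fast", "legato"): 41.4,
    ("unmasked", "slow", "staccato"): 40.2,
    ("masked", "slow", "staccato"): 53.0,
    ("unmasked", "fast", "staccato"): 44.6,
    ("masked", "fast", "staccato"): 58.7,
}


def marginal_mean(factor_index, level):
    """Unweighted mean of all cells whose factor at factor_index equals level.

    ASSUMPTION: equal weighting of cells (balanced design).
    """
    return mean(v for key, v in TABLE1.items() if key[factor_index] == level)


# Average impairment attributable to masking, in cent.
masking_effect = marginal_mean(0, "masked") - marginal_mean(0, "unmasked")

if __name__ == "__main__":
    for index, levels in enumerate(
        [("unmasked", "masked"), ("slow", "fast"), ("legato", "staccato")]
    ):
        print({level: round(marginal_mean(index, level), 2) for level in levels})
    print("masking effect:", round(masking_effect, 2))
```

Under that assumption the averages come out near 33.3 and 47.3 cent for unmasked and masked singing, a difference of about 14 cent, in line with the values reported in the text. The legato/staccato and slow/fast averages agree with the reported figures in the same way.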
FIGURE 5. Mean interval deviations (cent) of the unmasked and masked test runs of all subjects for the condition legato slow. Axes: Subject number (1–28), Mean interval deviation (cent); series: unmasked, masked.
slow legato and staccato in Figure 6, and for unmasked slow and fast legato in Figure 7. Figure 8 summarizes the results for the different conditions in terms of the distribution of individual mean interval deviations.

DISCUSSION
The present study was carried out to assess the significance of auditory and kinesthetic feedback on pitch control in singing and to determine the influence of vocal tasks differing in complexity, such as legato or staccato, or slow and fast singing. The only means to eliminate auditory cues temporarily is to use an auditory masking signal that contains no relevant information, such as a wide-band white noise signal. Still, it cannot be excluded that the noise signal not only masked the auditory feedback, but also disturbed the performance. The noise used in the present investigation efficiently masked the auditory feedback in most cases. The choice of the vowel [a] would have contributed to this; the strongest partials of this vowel appear near the first formant frequency, i.e., near 600 Hz. A
FIGURE 6. Mean interval deviations (cent) of the legato and staccato test runs of all subjects for the condition unmasked slow. Axes: Subject number (1–28), Mean interval deviation (cent); series: legato, staccato.
FIGURE 7. Mean interval deviations (cent) of the slow and fast test runs of all subjects for the condition unmasked legato. Axes: Subject number (1–28), Mean interval deviation (cent); series: slow, fast.
FIGURE 8. Distributions of the mean interval deviations (cent) for the different test conditions. Axes: test condition (legato slow, legato fast, staccato slow, staccato fast), Mean interval deviation [cent]; series: unmasked, masked; N = 28 for each distribution.
vowel with a lower first formant frequency should be more difficult to mask because of the low-pass nature of bone conduction.17 Further, by using the same vowel throughout, the measurement effects of “intrinsic pitches” of different vowels were avoided.13 Test conditions of the present study were selected in accordance with the investigation by Ward and Burns.11 Also, the choice of the absolute interval deviation as a measure of intonation accuracy followed their study. This measure has the advantage that it compensates for errors in matching the given starting tone of the triad sequence. In a majority of the subjects the accuracy of intonation was affected in an expected way. Thus, it was reduced by masking noise, by staccato as opposed to legato singing, and by fast as opposed to slow performance. In all these experimental conditions a small number of subjects showed results that were contrary to those of the majority of the 28 singing subjects. Some of these deviations may reflect a training effect; the unmasked condition preceded the masked condition, and the legato condition preceded the staccato condition. The effect of slow and fast tempo was comparatively small. The values of the mean interval deviation found in the present investigation are somewhat greater than those reported by Ward and Burns.11 For example, for fast legato without auditory feedback they found
27 cent, while the corresponding average for subjects in this study amounted to 41 cent. This discrepancy may be due to the fact that our subjects sang wider intervals, i.e., a triad pattern, while their subjects sang a simple scale pattern. Schultz-Coulon15 measured the absolute value of the deviations from the reference tones, so comparisons with his results are difficult. The masking noise produced a significant effect, the increase of the interval deviations averaged across intervals and subjects amounting to 14 cent. This increase is similar to that found by Ward and Burns.11 Schultz-Coulon15 reported smaller effects, but again comparisons are complicated by the difference in measures mentioned. The effect of masking on the mean interval deviation was similar across tempo and technique conditions. This result reflects the considerable contribution of kinesthetic control to intonation in singing. The demands placed on the kinesthetic feedback system would be affected by technique and tempo of the vocal sequence. This is suggested by the increased mean interval deviation observed for the case of eliminated auditory feedback for the staccato versus legato conditions as well as for the fast versus the slow conditions. In a staccato performance singers would need to rely on an absolute neuromuscular memory of pitch, while in a legato performance they
could recruit also a relative neuromuscular memory. The difference observed between staccato and legato performances suggests that the former memory is less precise than the latter. This interpretation seems to match the general experience among singers. For example, in vocal warm-up, staccato performances of vocal exercises are typically preceded by legato performance of the same exercise. Likewise, many singers prefer to begin practicing fast passages in a slow tempo. We found that the intonation accuracy was affected by the elimination of the auditory feedback. It also varied between different technique and tempo conditions. In slow legato the effect of auditory masking was similar to that observed between slow unmasked legato and staccato (Figure 8). Thus, in practice, the singers’ intonation might be similarly affected both by the difficulty of the singing task and by the masking effects of a loud accompaniment. Ward and Burns11 found that in the absence of auditory feedback the nonsingers’ intonation errors were about 10 cents wider than those of the singers. Also Schultz-Coulon15 found differences in intonation between his singer and nonsinger subjects. This suggests that singers improve their kinesthetic feedback by training.18 We plan to study this in a future longitudinal investigation.

CONCLUSIONS
Our investigation has shown that singers’ intonation accuracy is reduced in the absence of auditory feedback. Under such conditions, the singers have to rely on kinesthetic feedback circuits. The performance of this feedback is significantly affected by the task that the singer performs. Thus, the mean intonation error was greater in fast than in slow singing. It was also greater in staccato than in legato singing. It seems reasonable to assume that the accuracy of the kinesthetic feedback can be improved by training.
Acknowledgments: This work was realized as a joint project between the Department of Speech, Hearing and Music, KTH, Stockholm and the University of Music Carl Maria von Weber, Dresden, sponsored by both these institutes. The F0 recordings were carried out by co-author Dirk Mürbe. F0 extraction was done by singing student Andreas Bauer; the second data set for the reliability analysis was created by Dirk Mürbe. Eberhard Kuhlisch performed the ANOVA analysis.
REFERENCES
1. Larson CR, Carrell TD, Senner JE, Burnett TA, Nichols LL. A proposal for the study of voice F0 control using the pitch shifting technique. In: Fujimura O, Hirano M, eds. Vocal fold physiology: voice quality control. San Diego, Calif: Singular; 1995:321–331.
2. Wyke BD. Laryngeal neuromuscular control systems in singing. Folia Phoniatr. 1974;26:295–306.
3. Fairbanks G. Systematic research in experimental phonetics: I. A theory of the speech mechanism as a servosystem. J Speech Hear Disord. 1954;19:133–139.
4. Sapir S, McClean MD, Larson CR. Human laryngeal responses to auditory stimulation. J Acoust Soc Am. 1983;73:315–321.
5. Burnett TA, Senner JE, Larson CR. Voice F0 responses to pitch-shifted auditory feedback: a preliminary study. J Voice. 1997;11:202–211.
6. Burnett TA, Freedland MB, Larson CR, Hain TC. Voice F0 responses to manipulations in pitch feedback. J Acoust Soc Am. 1998;103:3153–3161.
7. Ternström S, Sundberg J. Intonation precision of choir singers. J Acoust Soc Am. 1988;84:59–69.
8. Jansson E, Axelsson A, Lindgren F, Karlsson K, Olaussen T. Do musicians of the symphony orchestra become deaf? In: Acoustics of Choir and Orchestra. Stockholm: Royal Swedish Academy of Music; 1986, Publication No 52, 62–74.
9. Wyke BD. Laryngeal myotatic reflexes and phonation. Folia Phoniatr. 1974;26:249–264.
10. Abo-El-Enein MA, Wyke BD. Laryngeal myotatic reflexes. Nature. 1966;209:682–688.
11. Ward WD, Burns EM. Singing without auditory feedback. J Res Sing. 1978;1:24–44.
12. Lindblom B, Sundberg J. Acoustical consequences of lip, tongue, jaw and larynx movement. J Acoust Soc Am. 1971;50:1166–1179.
13. Ternström S, Sundberg J, Collden A. Articulatory F0 perturbations and auditory feedback. J Speech Hear Res. 1988;31:187–192.
14. Elliot L, Niemoeller A. The role of hearing in controlling voice fundamental frequency. Int Audiol. 1970;9:47–52.
15. Schultz-Coulon HJ. The neuromuscular phonatory control system and vocal function. Acta Otolaryngol. 1978;86:142–153.
16. Ternström S. Soundswell manual. Solna, Sweden: Sound Swell; 1991.
17. Tonndorf J. Bone conduction. In: Tobias JV, ed. Foundations of Modern Auditory Theory (Vol. 2). New York and London: Academic; 1972:195–237.
18. Sundberg J. The Science of the Singing Voice. Dekalb, Il: Northern Illinois University Press; 1987.