Articulatory Configuration and Pitch in a Classically Trained Soprano Singer

Johan Sundberg, Stockholm, Sweden

Summary: Previous studies suggest that singers modify their articulation to prevent the pitch frequency F0 from exceeding the normal value of the first formant, F1Normal. Using magnetic resonance imaging at a rate of 5 frames/s, articulation was analyzed in a professional soprano singing an ascending triad pattern from C4 to G5 (262–784 Hz) on the vowels /i, e, u, o, a/. Lip and jaw opening and tongue dorsum height were measured and analyzed as functions of pitch. Four or five semitones below the pitch where F0 = F1Normal, the tongue dorsum height was reduced in /i, e, u, a/, whereas in /o/ the lip opening was widened and in /a/ the jaw opening was also widened. At higher pitches, the jaw opening was widened in all vowels. These articulatory maneuvers are likely to raise F1 in these vowels.

Key Words: Singing–Articulation–Lip opening–Jaw opening–Tongue shape–High-pitched singing–Formant frequencies.

INTRODUCTION

In singing, the fundamental frequency F0 is often higher than the normal value of the first formant F1, particularly for female singers. Under conditions of high F0, it is difficult to measure formant frequencies because of the wide frequency distance between the spectrum partials. An earlier study, using external low-frequency vibration excitation of the vocal tract of a female singer performing silent articulation, found that the singer raised F1 to a frequency slightly above F0.¹ In a subsequent study, Johansson et al² used X-ray imaging to estimate the vocal tract shape and associated formant frequencies of two female opera singers. The results confirmed that singers raise their F1 as soon as F0 is higher than the normal F1. Sundberg³ observed that this may increase the sound pressure level (SPL) of a vowel considerably, and Titze⁴ pointed out that allowing F0 to pass F1 would be acoustically quite disadvantageous.
Nevertheless, Miller and Schutte⁵ found reasons to assume that sopranos may sing with F0 greater than F1 in what they called the “flageolet” register. The X-ray investigation just mentioned showed that the two soprano singers systematically widened their jaw openings with increasing F0. As the widening of the jaw opening tends to raise F1, this finding was compatible with the observation that sopranos avoid the situation that F1 < F0.⁶ The need to raise F1 to avoid F1 < F0 is encountered not only by sopranos but by singers of most other voice classifications. For example, the top note of a baritone singer may be the pitch G4 (F0 ≈ 392 Hz), whereas F1 of vowels like /i/ and /u/ is typically lower than that. Hence, it seemed reasonable to assume that a pitch-dependent jaw opening would be observed not only in sopranos. Using a magnetometer technique, Sundberg and Skoog⁷ measured the jaw opening in a group of male and female singers while they sang an ascending two-octave scale. The results showed that, for the vowels /a/ and /a/, most singers started to widen their jaw openings some semitones before F0 passed the singer’s normal value of F1. For the vowels /i, e, o, u/, however, the widening of the jaw opening occurred a few semitones above that F0. This raises the question of whether singers first use articulatory means other than the jaw opening to raise F1 in the latter vowels. The purpose of the present investigation was to analyze, by means of magnetic resonance imaging, the articulatory strategy used by a professional soprano for different vowels at different pitches.

Accepted for publication February 5, 2008. From the Department of Speech Music & Hearing, School of Computer Science and Communication, KTH, Stockholm, Sweden. Address correspondence and reprint requests to Johan Sundberg, PhD, KTH Voice Research Centre, Department of Speech Music & Hearing, KTH, SE-100 44, Stockholm, Sweden. E-mail: [email protected]
Journal of Voice, Vol. 23, No. 5, pp. 546-551. 0892-1997/$36.00. © 2009 The Voice Foundation. doi:10.1016/j.jvoice.2008.02.003

METHOD

The experiment was carried out in 2002 with the assistance of Alain Soquet and Thierry Metens at Hôpital Erasme, Brussels, and in collaboration with the Laboratoire de Phonologie, Université Libre de Bruxelles. The subject was a professional soprano with considerable experience both as a solo singer and as a singing teacher. While lying on her back in the magnetic resonance (MR) exposure tube with supports preventing movements, she sang the melodic sequence shown in Figure 1 twice on each of the vowels /a, e, i, u, o/ at a comfortable degree of vocal loudness. The triad pattern ranged from C4 (262 Hz) to G5 (784 Hz). The MR system was adjusted to record the midsagittal vocal tract profile at a rate of 5 frames/s for 15 seconds. The images were stored in the MR computer system, which also produced a synchronization pulse train signal with one pulse per frame. This signal was recorded on a separate track of a DAT recorder.

The audio signal was picked up by a specially designed microphone consisting of a membrane that covered one end of a small cylinder. The membrane was exposed to light from a thin glass fiber, and the vibrations of the membrane were captured by a second glass fiber as reflections of this light. The microphone was fastened to the subject’s sternum a few centimeters below her chin.
The light variations were converted at the other end of the fiber into an electrical signal, which was recorded on another track of the same DAT recorder.

The images were displayed by means of the Osiris computer freeware (http://www.sim.hcuge.ch/osiris/01_Osiris_Presentation_EN.htm). Only the ascending part of the triad pattern was analyzed, corresponding to a material of 251 images. The rectangle tool available under the rectangular region of interest menu was used to determine the widths of the jaw and lip opening and the height of the tongue dorsum. The Line caliper tool under the same menu was used to measure these distances in millimeters. One pixel corresponded to 2.3 mm, which thus was the quantization step of the measurements.

FIGURE 1. Note sequence sung by the subject during the recordings. Only the part of the example marked by the dashed rectangle was used for the analysis.

In gathering these measures, the following landmarks were applied (Figure 2). For measuring the jaw opening, the upper side of the rectangle passed through the anterior-most point of a prominence in the subject’s forehead contour, and the lower side was a tangent to the lower contour of a bright spot produced by the frontal part of the mandible. The tongue dorsum was measured by means of a rectangle that had the same lower side as the rectangle constructed for the jaw opening and an upper side that was tangent to the uppermost part of the tongue contour. For the lip opening, the upper and lower rectangle sides were tangents to the lip contours. All rectangles had sides parallel to the horizontal and vertical sides of the image, that is, they were never tilted relative to the image.

As the lip opening is dependent on the jaw opening, the lip opening values were normalized by calculating the excess lip opening LExc. This value was obtained by subtracting from the lip opening L the difference between the measured jaw opening J and the jaw opening observed for clenched jaws, JClinch:

$L_{\mathrm{Exc}} = L - (J - J_{\mathrm{Clinch}})$

The value thus obtained reflected the lip opening in excess of the jaw opening.

To obtain values reflecting how the subject changed the positions of her articulators, the jaw and tongue measures were compared with the mean of the values observed at the lowest pitch, that is, when F0 was well below F1. For example, if five images were taken during the first tone of the exercise, the mean of the jaw opening values observed in these five images was used as the reference for the jaw opening values. In other words, the mean of the values observed at the lowest pitch was subtracted from the distances measured for the other pitches.
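The two normalization steps just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's analysis procedure: the function names, the dictionary layout, and all numerical values are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def excess_lip_opening(lip_mm: float, jaw_mm: float, jaw_clench_mm: float) -> float:
    """L_Exc = L - (J - J_Clinch): lip opening in excess of the jaw opening."""
    return lip_mm - (jaw_mm - jaw_clench_mm)


def relative_to_lowest_pitch(values_by_pitch: dict, lowest_pitch: str) -> dict:
    """Subtract the mean of the values at the lowest pitch from every value.

    values_by_pitch maps a pitch label to the list of measurements (in mm)
    taken from the images recorded while that pitch was sung.
    """
    reference = mean(values_by_pitch[lowest_pitch])
    return {
        pitch: [value - reference for value in values]
        for pitch, values in values_by_pitch.items()
    }


# Hypothetical numbers, for illustration only.
example_excess = excess_lip_opening(lip_mm=12.0, jaw_mm=60.0, jaw_clench_mm=52.0)
example_changes = relative_to_lowest_pitch(
    {"C4": [50.0, 52.0], "G5": [60.0]}, lowest_pitch="C4"
)
```

With these made-up inputs, the jaw has opened 8 mm beyond the clenched position, so a 12 mm lip opening corresponds to an excess lip opening of 4 mm. Because one pixel corresponded to 2.3 mm, real measurements would only take values that are multiples of that step.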
The subject’s normal values of F1 were determined from a separate recording session, in which she sang the vowels concerned at the pitch G3 (F0 ≈ 196 Hz). The analysis was performed by means of the custom-made Decap inverse filtering program (Svante Granqvist, KTH, Stockholm). The program requires manual setting of the inverse filters and shows, in real time, the resulting effects on the voice source waveform and spectrum. Therefore, this method allows estimation of formant frequencies with particularly high accuracy.

F0 was measured from the audio recording using the Corr module of the Soundswell program (Hitech Development AB, Täby, Sweden).⁸ The resulting F0 contour was transferred to a separate track of computer files. In these files, the first track was the audio. The second contained the synchronization pulses, which were manually labeled from 1 to 75 (5 frames/s for 15 seconds). Each number thus corresponded to a frame in the movie. The third track was the F0 curve (Figure 3). In this way, the F0 produced at each image could be determined and displayed together with the articulatory measures.

FIGURE 2. Example of a frame from the movie. The rectangles show the measurements, the lip opening, the jaw opening, and the height of the tongue dorsum. Also, the resulting values are shown.

FIGURE 3. Example of the recordings. The top curve is the audio signal, and the middle curve the synchronization pulses, each of which shows the timing of a frame in the movie. The bottom curve is the F0 curve extracted from the audio recording by means of the SoundSwell Corr module. The numbers at the top show the numbering of the synchronization pulses.

RESULTS

Figures 4–6 illustrate the reproducibility of the data in terms of the lip and jaw opening and the height of the tongue dorsum for the two takes of the same vowel. The dashed vertical lines represent the occurrence of the pitch shifts. In the graphs, the data have been warped so that they are synchronized with respect to the timing of the pitch shifts. The lip and jaw openings show a moderate agreement between the takes, /i, e/ showing particularly large differences. In these vowels, the subject seemed to start with the widening of the lip opening in one take and with the widening of the jaw opening in the other. The tongue dorsum data show a higher degree of reproducibility. In some cases, a tendency toward a stepwise change pattern, synchronized with the pitch changes, can be observed. The jaw opening in the front vowels /i, e/ provides good examples, particularly in the second take.

A quantitative estimate of the agreement of articulator position between takes 1 and 2 was calculated. First, the average lip, jaw, and tongue dorsum values were calculated for each pitch in take 1 and in take 2. Then, the absolute values of the differences between these averages were computed. The results are shown in Figure 7. The poorest agreement between the takes was obtained for the jaw and excess lip openings in the vowel /i/ and for the jaw opening in the vowel /e/, thus reflecting the difference in lip and jaw behavior that could be observed in Figures 4 and 5. The agreement averaged across all data was 2.1 mm, that is, quite close to the 2.3 mm quantization step. By and
large, these data show that the subject’s articulatory behavior was reasonably similar in takes 1 and 2.

FIGURE 4. Excess lip opening observed in the first and second takes (open and filled symbols, respectively) of the indicated vowels. The dashed vertical lines show the pitch transitions.

FIGURE 5. Jaw opening observed in the first and second takes (open and filled symbols, respectively) of the indicated vowels. The dashed vertical lines show the pitch transitions.

FIGURE 6. Height of the uppermost point of the tongue dorsum relative to the jawbone contour (Figure 2) observed in the first and second takes (open and filled symbols, respectively) of the indicated vowels. The dashed vertical lines show the pitch transitions.

FIGURE 7. Absolute values of the mean differences between takes 1 and 2, averaged across pitches, for the excess lip opening, the jaw opening, and the tongue dorsum height (left, middle, and right columns in each group) observed for the indicated vowels.

Figure 8 shows, as a function of pitch, the mean of the lip, jaw, and tongue dorsum data for each pitch, averaged across takes 1 and 2. The dotted lines in the graphs represent the crossover pitch, at which F0 equals the subject’s normal value of F1. The articulator position changes are mostly systematic, starting at a pitch that is several semitones below the crossover pitch. For all vowels except /a/, the jaw opening is widened considerably in the upper part of the triad. For the vowels /i, e, u/, the lowering of the tongue dorsum began already on the second tone of the triad, that is, the major third, which was, respectively, 2, 4, and 3 semitones below the crossover pitch. For the vowels /o, a/, the tongue dorsum and the jaw opening were kept rather constant up to the crossover pitch. In /o/, the excess lip opening started to expand already eight semitones below the crossover, and in /a/, a clear change of the tongue dorsum and the excess lip opening can be noted at a pitch three semitones below the crossover.
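Two calculations in this section lend themselves to a short sketch: the between-take agreement (per-pitch averages, then absolute differences, then a mean) and the distance in semitones between a sung pitch and the crossover pitch. The Python below is an illustration under stated assumptions, not the analysis software used in the study. The function names, the A4 = 440 Hz equal-tempered tuning, the reading of the triad pattern as a C major arpeggio, and the F1 value of 370 Hz are all hypothetical.

```python
import math
from statistics import mean

A4_HZ = 440.0  # assumed tuning reference; not stated in the article


def note_to_hz(midi_note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = 69)."""
    return A4_HZ * 2.0 ** ((midi_note - 69) / 12.0)


def semitones_from_crossover(f0_hz: float, f1_normal_hz: float) -> float:
    """Signed distance in semitones; negative means F0 is below F1Normal."""
    return 12.0 * math.log2(f0_hz / f1_normal_hz)


def take_agreement(take1: dict, take2: dict) -> float:
    """Mean absolute difference between the per-pitch averages of two takes.

    Each take maps a pitch label to a list of measurements in mm.
    """
    shared_pitches = [pitch for pitch in take1 if pitch in take2]
    differences = [
        abs(mean(take1[pitch]) - mean(take2[pitch])) for pitch in shared_pitches
    ]
    return mean(differences)


# Assumed triad tones (C major arpeggio) as MIDI note numbers.
TRIAD = {"C4": 60, "E4": 64, "G4": 67, "C5": 72, "E5": 76, "G5": 79}

if __name__ == "__main__":
    f1_normal_hz = 370.0  # hypothetical F1 value, for illustration only
    for name, midi_note in TRIAD.items():
        f0_hz = note_to_hz(midi_note)
        offset = semitones_from_crossover(f0_hz, f1_normal_hz)
        print(f"{name}: {f0_hz:6.1f} Hz, {offset:+5.1f} semitones re crossover")
```

Under these assumptions, C4 and G5 come out at about 262 and 784 Hz, matching the range given in the Method section, and a hypothetical F1 of 370 Hz places E4, the second tone of the triad, about two semitones below the crossover.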
FIGURE 8. Changes of articulator positions relative to the start position for the indicated articulators and vowels.

DISCUSSION

This investigation is limited in important respects. One limitation is that just one single subject was analyzed. Another limitation is that the recording conditions were clearly unnatural. However, the subject is a professional singer performing in public concerts in different places, and she also has extensive experience as a singing teacher at high levels. Hence, her vocal technique seemed a worthwhile object of study. In addition, it can be assumed that the robustness of singers’ phonatory and resonatory strategies increases with the degree of skill and the amount of experience. Therefore, it seemed reasonable to assume that the special recording conditions would not disturb her professional vocal behavior. This assumption was also supported by the articulatory consistency that she showed when repeating the same task; this consistency is apparently incompatible with random articulatory behavior.

A more important limitation was the 2.3 mm/pixel resolution of the MR images. A much finer resolution would have been needed, particularly for articulators producing narrow vocal tract constrictions, such as the lips in /u/ and /o/ and the tongue dorsum in /i/. The uncertainty of the measurements was somewhat reduced by calculating averages.

The choice of measures deserves some comments. We measured both the lip opening and the height of the tongue dorsum in relation to the jaw opening. This appeared reasonable because the positions of both these articulators are heavily influenced by the jaw opening. For example, unless special compensatory movements are made, an increase of the jaw opening will automatically induce a wider lip opening and a lower position of the tongue dorsum. Our lip and tongue dorsum measurements reflect such compensatory movements.

In most vowels, the singer changed the shape of the tongue dorsum for the highest pitches, where she raised both the posterior and the anterior parts of the tongue, such that it formed a U-shaped contour in the sagittal plane. In these cases, the highest point on the tongue contour was sometimes constituted by the tongue tip and sometimes by the posterior tongue hump, whereas at low pitches it was a point in between these points. One might argue, then, that in these cases the tongue dorsum measure was not quite adequate. On the other hand, our main question was what the articulatory changes were in the lower part rather than in the top part of the singer’s pitch range.

Even though the measures of the three articulators analyzed here were related to the values observed for the low F0 at the beginning of the triad example, they differed greatly in their range of variation. The jaw and the excess lip openings varied between 0 and around 15 mm, whereas the tongue dorsum height varied only within about ±5 mm. As a consequence, the changes of tongue shape appeared much smaller than those of the lip and jaw openings in the graphs. It should be kept in mind, though, that the acoustic effect of a change in the position of an articulator is determined by the resulting relative change of the cross-sectional area of the vocal tract. Therefore, a tiny change of the tongue dorsum height in the front vowels /i, e/ will have a substantial impact on the vocal tract constriction and hence on the formant frequencies. The singer replicated the changes of tongue position quite accurately when repeating the same task, as illustrated in Figure 6.
This suggests that the tongue shape is a highly important tool for tuning the formant frequencies, particularly in vowels where it forms a narrow vocal tract constriction.⁹ In the front vowels, the singer thus reduced the degree of vocal tract constriction by reducing the tongue bulging.

By and large, our results agreed with those of the previous study of singers’ pitch-dependent variation of jaw opening.⁷ There it was found that for the vowels /u, o, e, i/ the singers started to widen the jaw opening when F0 had reached a frequency that was about five semitones above the subject’s normal value of F1. In the present investigation, similar observations could be made for the vowels /e, u, o/. In /i/, the singer initiated the widening of the jaw opening already at two semitones above the crossover pitch. These findings support the assumption that the very special experimental conditions did not cause atypical vocal behavior.

Which articulator position changes were likely aimed at increasing F1? In /a/, both the widening of the jaw opening and the lowering of the tongue dorsum are likely to raise F1. The lowering of the tongue dorsum in /i, e, u/ and the widening of the excess lip opening in /o/ are strong candidates. The articulatory strategy for high-pitched singing thus seems to be first to reduce the vocal tract constriction and then to widen the jaw opening. In /a/, however, this strategy was not used. The reason would be that in this vowel a reduced constriction leads to a large rise of F2, which causes a vowel quality shift from /a/ to /æ/.⁶

The articulatory changes appeared already when F0 was about four or five semitones below the crossover value. The purpose may be to achieve timbral similarity. Some singing teachers recommend that their students prepare for an approaching high note when singing the preceding lower note; the physiological target of this recommendation might be articulatory adjustments of the kind found in the present investigation.

CONCLUSIONS

Considerable and systematic articulatory changes were observed in our professional soprano singer when she sang an ascending triad pattern in the range C4–G5 (262–784 Hz). She started to modify her lip opening, jaw opening, or the height of her tongue dorsum at an F0 that was four to five semitones below the F0 that equaled her normal F1 value in low-pitched singing. The changes of the jaw opening varied between 8 and 19 mm, depending on pitch and vowel. The associated changes of the lip opening were somewhat counteracted or amplified, again depending on pitch and vowel. The height of the tongue dorsum relative to the lower jaw was reduced in the vowels /i, e, u/ and remained more constant in the vowels /a, o/. Presumably, these articulatory changes aimed at increasing F1 so as always to keep it above F0.

Acknowledgments

The author gratefully acknowledges the kind and patient cooperation of the singer subject and the excellent assistance during the recordings of the staff at the Radiology Department at the Hôpital Erasme, Brussels. The results were first presented at the 2003 Annual Symposium Care of the Professional Voice in Philadelphia.

REFERENCES

1. Sundberg J. Formant technique in a professional female singer. Acustica. 1975;32:89-96.
2. Johansson C, Sundberg J, Willbrand H. X-ray study of articulation and formant frequencies in two female singers. In: Askenfelt A, Felicetti S, Jansson E, Sundberg J, eds. SMAC 83, Proceedings of the Stockholm International Music Acoustics Conference, Vol. 1. Stockholm: Roy Sw Acad Music; 1985:203-218. 46.
3. Sundberg J. Vocal tract resonance. In: Sataloff R, ed. Professional Voice: The Science and Art of Clinical Care. New York: Raven Press; 1991:49-68.
4. Titze I. A theoretical study of F0-F1 interaction with application to resonant speaking and singing voice. J Voice. 2004;18:292-298.
5. Miller D, Schutte H. Physical definition of the “flageolet register”. J Voice. 1993;7:206-212.
6. Lindblom B, Sundberg J. Acoustical consequences of lip, tongue, jaw, and larynx movement. J Acoust Soc Am. 1971;50:1166-1179.
7. Sundberg J, Skoog J. Dependence of jaw opening on pitch and vowel in singers. J Voice. 1997;11:301-306.
8. Granqvist S, Hammarberg B. The correlogram: a visual display of periodicity. J Acoust Soc Am. 2003;114:2934-2945.
9. Fant G. Acoustic Theory of Speech Production. The Hague: Mouton; 1970.