Journal of Phonetics (1979) 7, 71-79
Maximum speed of pitch changes in singers and untrained subjects
Johan Sundberg
Department of Speech Communication, Royal Institute of Technology (KTH), S-100 44 Stockholm 70, Sweden
Received 27th August 1977
Abstract:
The maximal speed of voice pitch changes is measured in singers and untrained subjects of both sexes. Typical differences are observed between each of these four groups. On average, singers change pitch more quickly than untrained subjects, and female subjects change pitch more quickly than male subjects. Unlike singers, untrained subjects perform pitch drops considerably faster than pitch elevations. Implications with respect to the properties of pitch changing mechanisms are discussed.
0095-4470/79/020071+09 $02.00/0 © 1979 Academic Press Inc. (London) Ltd.

Introduction
Pitch changes can be regarded as manifestations of the pitch regulating system and can be assumed to mirror properties of this system. The characteristics of pitch changes performed as quickly as possible by untrained speakers have been examined by Ohala (1972) and Ohala & Ewan (1973). Singers can be assumed to use the pitch regulating system with maximal efficiency. Therefore, an investigation of maximally fast pitch changes in singers may complement our knowledge about the system in various respects. The purpose of the present investigation was to collect data on the maximal speed with which pitch can be changed by singers and untrained subjects of both sexes.

Subjects
The untrained subjects (five females, five males) participated in a 1-week voice training course. None of the subjects had abnormal voices, and some of them had had modest experience of singing in a choir. The trained subjects (five females, five males) represented a group of experienced singers. All of them had had several years of voice training, and most of them presently work as opera and concert singers.

Experimental procedure
The subjects were asked to alternate repeatedly between two given pitches and to perform the pitch changes as rhythmically and as quickly as possible. The rhythm was signalled to the subjects either by the experimenter's counting combined with distinct hand movements, or by clicks in earphones combined with flashes of a small light. In this way, the interval singing resembled a legato performance of the sequence of notes schematically indicated in Fig. 1.

Figure 1. Schematic representation of the tone sequence performed by the subjects.

At least eight repetitions of each pitch change were obtained from each subject. The intervals sung were the octave, the fifth, the major third and, in the case of trained females, also the major second. These intervals correspond to frequency ratios of approximately 1:2, 1:1·5, 1:1·25, and 1:1·12, respectively. For each subject, all intervals started from the same lower pitch, which was located in the lower part of the subject's range. The signal was recorded on tape.
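As a quick check of the approximate frequency ratios quoted above, the following minimal sketch converts each ratio to an interval width in semitones. It assumes equal-tempered semitones (a ratio of 2^(1/12) per semitone); the function name `ratio_to_semitones` and the dictionary `INTERVAL_RATIOS` are illustrative and not part of the original study.

```python
import math


def ratio_to_semitones(ratio: float) -> float:
    """Convert a frequency ratio to an interval width in equal-tempered semitones."""
    return 12.0 * math.log2(ratio)


# Approximate frequency ratios quoted in the text for the four intervals sung.
INTERVAL_RATIOS = {
    "octave": 2.0,
    "fifth": 1.5,
    "major third": 1.25,
    "major second": 1.12,
}

for name, ratio in INTERVAL_RATIOS.items():
    print(f"{name:12s} 1:{ratio:<5} {ratio_to_semitones(ratio):5.2f} semitones")
```

Rounded to whole semitones, the four ratios correspond to 12, 7, 4 and 2 semitones.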
Measurements
The fundamental frequency was measured by means of a zero-crossing detector preceded by a low-pass filter. The output was registered on an oscillograph using a paper speed of 100 mm s⁻¹. A typical example of the registrations is given in Fig. 2, which also illustrates how the transient durations were measured. First, the start and end frequencies were determined in each sample. Then those points on the curve that represented 1/8 and 7/8 of the difference between the start and end frequencies were identified. Finally, the time interval separating these two points was measured. In accordance with the procedure chosen by Ohala & Ewan (1973), this measure was defined as the response time. Thus, the response time is the time needed by the subject to produce 6/8 of the pitch change.

In most cases the singers displayed a vibrato, i.e. the quasi-stationary fundamental frequency oscillated regularly. In these cases the start and end frequencies were defined as the average of the quasi-stationary frequency.

Four response time values were selected for measurement for each interval and subject. Exceptionally slow transitions were thereby disregarded, since it is the maximum speed of pitch changes that is of interest. The interval widths were expressed in semitones, a logarithmic measure of frequency ratios (1 semitone corresponds to a fundamental frequency ratio of $1:\sqrt[12]{2}$). This facilitated comparisons between male and female subjects.
Figure 2. Schematized representation of a fundamental frequency-time graph (vertical axis: fundamental frequency; horizontal axis: time). I is the width of the interval sung, i.e. the magnitude of the pitch change. The time needed for executing the central 75% of the pitch change is defined as the response time.
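The response time measure defined above can be sketched in code. This is a minimal illustration, not the original oscillograph procedure: it assumes a sampled fundamental frequency contour and known start and end frequencies, and it takes the first crossing of each level by linear interpolation. The names `crossing_time` and `response_time`, and the synthetic contours, are hypothetical.

```python
def crossing_time(times, values, level):
    """First time at which the sampled curve reaches `level` (linear interpolation)."""
    for i in range(len(values) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 == level:
            return times[i]
        if (v0 - level) * (v1 - level) < 0:
            fraction = (level - v0) / (v1 - v0)
            return times[i] + fraction * (times[i + 1] - times[i])
    if values[-1] == level:
        return times[-1]
    raise ValueError("curve never reaches the requested level")


def response_time(times, f0, f_start, f_end):
    """Time spent between 1/8 and 7/8 of the change from f_start to f_end."""
    change = f_end - f_start
    t_first = crossing_time(times, f0, f_start + change / 8.0)
    t_last = crossing_time(times, f0, f_start + 7.0 * change / 8.0)
    return t_last - t_first


# Synthetic contours (an assumption, not measured data): a 100 Hz plateau,
# a linear 100 ms glide up to 200 Hz, then a 200 Hz plateau. Times in ms.
times_ms = list(range(0, 201))
rise_hz = [100 + min(max(t - 50, 0), 100) for t in times_ms]
fall_hz = [300 - f for f in rise_hz]  # mirror image: 200 Hz down to 100 Hz

print(response_time(times_ms, rise_hz, 100, 200))  # 75.0
print(response_time(times_ms, fall_hz, 200, 100))  # 75.0
```

For a linear glide the central 6/8 of the change takes 6/8 of the glide duration, hence 75 ms here. A contour with vibrato may cross a level more than once; this sketch simply uses the first crossing.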
General observations
Because of the purpose of the investigation, the accuracy with which the subjects reproduced the prescribed intervals was not examined in detail. On average, the subjects matched the ideal frequencies within ± half a semitone. These deviations are disregarded in the plots of the results in Figs 3 and 4.

As previously mentioned, the singers developed a vibrato in most cases. A common trend was that pitch changes in a given direction tended to be synchronized with that phase of the vibrato cycle which changes the frequency in the same direction. Thus, a pitch rise is synchronized with the vibrato phase raising the frequency, and vice versa (cf. Vennard, 1971). Moreover, for a given rhythm all vibrato periods seem to have identical durations. This means that the singers adjust the periodicity of their vibrato so as to match the timing prescribed by the rhythm. By varying the rhythm, vibrato period variations of up to 10% were observed, whereas the response time did not show systematic variations. When the auditory feedback was eliminated by means of masking noise introduced in the ears of one of the singers, the regularity and thus the synchronization was disturbed: the standard deviation of the average duration of four or five consecutive vibrato periods increased from about 15 to 30 ms. This seems to suggest that the auditory feedback provides information of relevance to the synchronization of vibrato speed and pitch changes. However, not all subjects adjusted the vibrato speed in accordance with the rhythm given. In some cases the vibrato simply disappeared or the rhythm was not maintained.

Figure 3. (Panels: rising and falling pitch changes, untrained males. Vertical axis: response time in ms; horizontal axis: interval width in semitones.) Response time values given by Ohala and Ewan (1973) (small points) and the averages obtained for our group of untrained male subjects (heavy points and solid lines). The bars represent the scatter in terms of ± one standard deviation.

Figure 4. (Panels: rising and falling pitch changes. Vertical axis: response time in ms; horizontal axis: interval width in semitones.) Average response time of male (filled circles) and female (open circles) untrained subjects (dashed lines) and singers (solid lines).
Results
As the response time in this experiment was defined in the same way as in the investigation of Ohala & Ewan (1973), our results should be comparable to theirs. In Fig. 3 the averages from our group of untrained male subjects can be compared with the data given by Ohala & Ewan, which pertain to the same type of subjects. Our subjects tend to exhibit somewhat shorter response time values than the subjects in the Ohala & Ewan study. This difference is presumably due to the fact that in our measurements each subject's slowest transitions were disregarded. Apart from this small difference, the agreement between the two sets of data is good. This agreement supports the assumption that our group of untrained males was representative, and that the differences in the experimental procedures between the two investigations did not affect the results appreciably.

The entire material was submitted to an analysis of variance (split-plot design, with sex and training constituting between-blocks factors and the magnitude and direction of the pitch change within-block factors; mixed model, blocks considered as a random variable; cf. Kirk, 1968, p. 311). The following sources of variance were investigated: the magnitude and direction of the pitch change, and the sex and training of the subject. The results are listed in Table I. Only those factors and interactions that showed a significant influence on the results are included in the table. More specifically, in each group in the table all nonsignificant factors (P > 0·05) are excluded.

One member of the group of trained females, who had the least amount of training, showed longer response time values than the remaining subjects in that group. In order to evaluate the effect of this on the results, this subject and one subject in each of the other three groups were excluded, and a new analysis of variance was run on this reduced material.
The result of this test can be compared with those pertaining to the entire material in Table I. It can be seen that the major trends are the same in the original and the reduced material. Still, in the analysis of variance within the group of trained females the subject mentioned was excluded.

As seen in Table I, all four main factors and the interaction between training and direction were significant. The meaning of these results may be seen in Fig. 4. The response time is shorter in pitch drops than in pitch elevations, in small pitch changes than in large ones, in female subjects than in male subjects, and in trained subjects than in untrained subjects. Two of these factors are interdependent: the direction of the pitch change has a significant effect on the response time in untrained subjects only.

The analysis of variance within each of the four groups (randomized block factorial design, mixed, non-additive model; cf. Kirk, 1968) reveals that the individual subject is a factor contributing significantly to the scatter of the data in all groups. In other words, the mean time for pitch change (averaged over the different magnitudes and directions of pitch change) differs between individuals. Also, in the group of untrained males there is a direction-magnitude-subject interaction. The magnitude of the pitch change is significant in all groups except trained males, where it is subject dependent. The direction of pitch change is significant only in the two untrained groups. (In the trained female group the direction factor is just below the limit of being significant.)

Table I

All subjects
                                 20 subjects        16 subjects
Source                           F       d.f.       F       d.f.
Direction                        34·83   1/16       28·04   1/12
Magnitude                        24·90   2/32       20·51   2/24
Sex                              5·37    1/16       4·84    1/12
Training                         11·60   1/16       17·14   1/12
Direction-Training               10·77   1/16       14·68   1/12

Untrained males (5 subjects)
Source                           F       d.f.
Direction                        25·55   1/4
Magnitude                        4·34    2/8
Subject                          5·73    4/90
Direction-Magnitude-Subject      3·76    8/90

Untrained females (5 subjects)
Source                           F       d.f.
Direction                        16·91   1/4
Magnitude                        23·88   2/8
Subject                          3·70    4/90
Direction-Subject                3·01    4/90

Trained males (5 subjects)
Source                           F       d.f.
Subject                          18·47   4/90
Magnitude-Subject                5·65    8/90

Trained females (4 subjects)
Source                           F       d.f.
Magnitude                        6·88    3/9
Subject                          43·24   3/96
Magnitude-Subject                1·94    9/96

p entries, in the order in which they appear in the source (groups separated by semicolons): <0·01, <0·01, <0·01, <0·01, <0·01; <0·01, <0·01; <0·01, <0·01; ~0·01; ~0·05; <0·01, <0·01; <0·01, ~0·05; <0·01, <0·01, <0·05, <0·01, ~0·01; <0·05; <0·01, ~0·05.

Summarizing the results, the response time shows a significant dependence on sex, training, and the width and the direction of the pitch change. Females tend to perform pitch changes faster than males, and trained singers change the pitch faster than untrained subjects. Moreover, the greater a pitch rise is, the longer is the response time. In untrained voices, pitch elevations are performed more slowly than pitch drops. The magnitudes of the average response time differences are illustrated in Fig. 4. The differences between the groups are considerably greater in pitch rises than in pitch drops. In pitch drops the sex of the subject seems to be a more important factor than the training, while in pitch elevations training is more decisive than sex.
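The degrees of freedom listed in Table I can be cross-checked with some simple bookkeeping. The sketch below is an assumption about how the designs cited from Kirk (1968) were applied, not something stated in the text: between-blocks effects are tested against subjects within groups, within-block effects against their interaction with subjects, and, in the within-group analyses, subject effects against a within-cell error based on four response time values per subject, direction and interval. The function names are hypothetical.

```python
def all_subjects_df(n_subjects, n_groups=4, n_directions=2, n_magnitudes=3):
    """(numerator, denominator) d.f. for the split-plot analysis of all subjects."""
    between_error = n_subjects - n_groups  # subjects within sex x training groups
    d = n_directions - 1
    m = n_magnitudes - 1
    return {
        "Direction": (d, d * between_error),
        "Magnitude": (m, m * between_error),
        "Sex": (1, between_error),
        "Training": (1, between_error),
        "Direction-Training": (d, d * between_error),
    }


def within_group_df(n_subjects, n_magnitudes, n_directions=2, n_repetitions=4):
    """(numerator, denominator) d.f. for the analysis within one group of subjects."""
    s = n_subjects - 1
    d = n_directions - 1
    m = n_magnitudes - 1
    # Assumed within-cell error: (repetitions - 1) per subject x direction x magnitude cell.
    within_cell = n_subjects * n_directions * n_magnitudes * (n_repetitions - 1)
    return {
        "Direction": (d, d * s),
        "Magnitude": (m, m * s),
        "Subject": (s, within_cell),
        "Direction-Subject": (d * s, within_cell),
        "Magnitude-Subject": (m * s, within_cell),
        "Direction-Magnitude-Subject": (d * m * s, within_cell),
    }


print(all_subjects_df(20))     # entire material
print(all_subjects_df(16))     # reduced material
print(within_group_df(5, 3))   # groups of five subjects, three intervals
print(within_group_df(4, 4))   # trained females: four subjects, four intervals
```

Under these assumptions the computed values agree with the d.f. entries of Table I, for example 2/32 and 2/24 for magnitude in the full and reduced material, 4/90 for the subject factor in the groups of five, and 9/96 for the magnitude-subject interaction in the trained females.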
Discussion
Ohala (1972) and Ohala & Ewan (1973) have suggested an explanation for the asymmetry in the response time between pitch rises and pitch drops. First, they interpret the rapid pitch drops as an indication that pitch is lowered actively. Second, in order to explain why larynx height and pitch show an interdependence in several subjects, they suggest that a raising of the larynx tenses the vocal folds in a vertical dimension. To quote Ohala & Ewan: "The difference in time taken to raise and lower pitch could be explained if we could show that the anterior-posterior tensing mechanism was fastest for raising pitch and the vertical tensing mechanism was fastest for lowering pitch." Thus, according to these authors, the fast performance of pitch drops may possibly be explained if the pitch dependent vertical movements of the larynx are taken into account.

In singing, pitch dependent shifts in larynx height probably cannot be accepted. This assumption is supported not only by a general agreement on this point among singing teachers, but also by the fact that a high larynx position is likely to destroy the "singing formant", which evidently is a desirable characteristic of professional Western opera and concert singing, at least in male voices (Sundberg, 1972, 1974, 1975, 1977). Also, formant measurements of the dependence of larynx height on pitch in professional singers support the conclusion that this dependence decreases with the singers' skill (Frommhold & Hoppe, 1965; Shipp & Izdebski, 1975). Nevertheless, singers were observed to perform pitch changes faster than untrained subjects. Thus, if larynx height is causally related to the speed of pitch changes, this relationship seems to be rather complex.

Pitch changes are due to movements of structures in the larynx from a given start position to a given target position. Thus, we may regard pitch change curves as manifestations of muscular movements. The fact that the target pitch is not approached asymptotically in these curves supports the assumption that antagonistic muscle functions are involved. Thus, the shape of the pitch change curves seems to indicate that the pitch is lowered actively. Also, in most fast and voluntary movements of human structures between two given positions, two antagonistic muscle groups are involved.
With respect to a specific movement, one group plays the role of an accelerator: its contraction accelerates the structures toward the target position. The other muscle group plays the role of a decelerator: it decelerates the moving tissue so that it stops at the target position and does not pass it.

The maximum speed with which the structure can be moved between two given positions probably depends on several factors. Among these we may assume that the following are important for fast changes of the voice pitch. One is the force that the muscles involved develop per unit of mass to be moved. Probably, the force of the accelerator is more decisive for the maximal speed of movement than the force of the decelerator. Another factor must be the time constants characterizing the feedback system used for controlling the position of the moving structure. A third factor may be the contraction range of the muscles involved as compared with the contraction minimally required for the actual movement. These considerations may help us to find hypothetical explanations for the differences in response time values found above.

Let us start by considering the asymmetry between rising and falling intervals. An important pitch-raising muscle is the cricothyroideus. According to some authors, pitch can be lowered by contracting the thyroarytenoideus lateralis (Van Riper & Irwin, 1958; Zemlin, 1968; Lindqvist, 1972). There are no EMG data supporting this assumption, but this may be due to the difficulty of obtaining an EMG signal which is known to emerge from this muscle and not from the vocalis muscle. Also, if the thyroarytenoideus lateralis is assumed to lower the pitch, the narrowing of the larynx tube opening, which is frequently observed to accompany a pitch drop, becomes explicable. Thus, according to Lindqvist (1972), this muscle not only constricts the larynx tube but also shortens and slackens the vocal folds. As a larynx tube constrictor, it can be said to have the function of protecting the larynx and the lungs. Note also that in neither the trained nor the untrained groups does the time needed to lower the pitch (or possibly to protect the lungs by closing the larynx tube) show a very strong dependence on the initial positions of the pitch regulating tissues in phonation. Protecting muscles can be assumed to be well developed and quick in operation because of their importance to vital functions. If so, the thyroarytenoideus lateralis must be assumed to be well developed and quick in operation in all subjects regardless of training.

The cricothyroideus does not possess a protective function. Hence, it is not unlikely that this muscle may be developed by training. The difference in response time between singers and untrained subjects is much larger in pitch elevations than in pitch drops. If we assume that the strength of the accelerator is more important to the speed of the movement than the strength of the decelerator, this difference between singers and untrained subjects becomes understandable. It would reflect the consequences of an increase in the force, per unit of mass to be moved, developed by the cricothyroid muscle, the accelerator in pitch elevations. The sex differences in response time values may very well be due to the same effect. If so, we would expect to find a pitch regulating system which develops more muscle force per unit of mass to be moved in females than in males.

The differences in response time may very well be due to more than one factor. The feedback system used for controlling the "position" along the pitch scale may differ between singers and untrained subjects. We may assume that untrained subjects rely to a higher degree on a slow auditory feedback system than singers do. Singers may rely more on a sort of "muscle memory" developed during training. They "remember" how much and when the various muscles must contract in order for the pitch to change from one given value to another. This ability to explore and memorize the function of a muscle system seems to be of great relevance in other forms of music playing.
For instance, unlike the learner, the professional pianist does not need to follow the fingers with his eyes in order to hit a distant note on the keyboard. By experience developed during practice he "knows" exactly how much and when the relevant muscles must contract. It is likely that this ability to memorize muscle positions and contractions is developed and used also in singing, i.e. that the singer learns to arrive at the intended pitch without using the slow auditory feedback. This assumption seems to find support in experiments where subjects sang with noise masking the auditory feedback signal (Michel, 1974). A difference in the use of the auditory feedback may account for the differences between singers and untrained subjects to a certain extent only. Thus, it cannot explain why the response time differences are greater in rising than in falling intervals.

The importance of the contraction range of the relevant muscles to the speed of pitch changes may be studied by varying the frequency level of the intervals systematically within the subjects' range. Such experiments were not included in the present investigation. Therefore this question is left open for future research.

The vibrato is frequently assumed to be due to a self-oscillation of the pitch regulating system. The frequency of these oscillations would then depend on the time constants inherent in the pitch regulating system. But these same time constants also manifest themselves in the response time of pitch changes. Therefore, an interrelationship between the speed of the vibrato and the response time in pitch changes might be expected. However, our results did not show any interrelationship of this kind. For instance, in the experiments with varied rhythm, the vibrato periodicity changed, whereas the response time values remained essentially the same. From these data it seems safe to conclude that the vibrato generating system is not entirely identical with the pitch regulating system.
The explanations suggested above for the response time differences between the groups of subjects are certainly speculative. This seems to be an inevitable consequence of the limited knowledge that we possess about the pitch regulating system. It is likely that this knowledge will increase substantially when models have been developed describing the system which controls, adjusts, and generates the voice pitch.

One more aspect should be considered: how could such pitch changes be synthesized? The pitch drops provide the easiest case. Here the dependence on the magnitude of the pitch change is weak. It takes about 50 to 80 ms to lower the pitch, depending on the subject's sex and degree of training rather than on the size of the pitch change. Thus, the speed of the pitch change can be considered as a linear function of the magnitude of the change: the greater the pitch change is, the higher is the speed. In other words, the pitch change mechanism behaves similarly to a simple low-pass filter responding to a step function. This indicates that, in synthesizing singing, such a low-pass filter smoothing the signal which determines the voice pitch would provide a realistic synthesis of pitch drops. The pitch raising mechanism differs from the pitch lowering mechanism in that the time required for a pitch rise depends on the magnitude of the change, among other things. Here, then, there is no linear relationship between the magnitude and the rate of the pitch change. Hence pitch rises cannot be accurately synthesized by the same means as pitch drops, i.e. with a low-pass filter smoothing the pitch regulating signal. On the other hand, the typical differences in the rate of pitch rises and pitch drops do not seem to possess any substantial perceptual relevance (Larsson, 1977).

Conclusions
The data collected in the present investigation indicate that there are typical differences between various groups of subjects regarding the transient time required to complete changes of the voice pitch. Female subjects perform pitch changes faster than male subjects, and singers change the pitch faster than untrained voices. In untrained voices, pitch elevations take longer than pitch drops, whereas this difference is much less pronounced or even absent in singers. The dependence of the response time on the interval width varies considerably between subjects. The general trend is, however, that the response time increases slightly with interval width, particularly in the case of untrained subjects performing pitch elevations. Hypothetically, the differences between singers and untrained subjects may be regarded as the combined effect of a development of the cricothyroideus muscle and a "muscle memory" in the pitch regulating system in singers. The results do not seem to support the view that larynx height alterations increase the speed of pitch changes. Among singers, pitch changes of a given direction tend to be synchronized with that phase of the vibrato cycle that changes the frequency in the same direction. At least in part, this synchronization seems to depend on the auditory feedback system.
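The low-pass filter model proposed in the Discussion for synthesizing pitch drops can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the filter is taken to be a first-order (one-pole) low pass, and the time constant `TAU_MS` is a hypothetical value chosen only so that the response time falls in the 50 to 80 ms range mentioned in the text. For a step of size I the output is I(1 − e^(−t/τ)), so the time between 1/8 and 7/8 of the change is τ ln 8 − τ ln(8/7) = τ ln 7, whatever the step size.

```python
import math


def simulate_pitch_drop(interval_semitones, tau_ms, dt_ms=0.05, duration_ms=500.0):
    """Pitch track (semitones re start) of a one-pole low pass driven by a downward step."""
    target = -interval_semitones
    alpha = 1.0 - math.exp(-dt_ms / tau_ms)  # exact one-pole update for a step input
    n_steps = int(round(duration_ms / dt_ms))
    pitch = 0.0
    times, track = [0.0], [0.0]
    for i in range(1, n_steps + 1):
        pitch += alpha * (target - pitch)
        times.append(i * dt_ms)
        track.append(pitch)
    return times, track


def response_time_ms(times, track, interval_semitones):
    """Time between the first samples at or beyond 1/8 and 7/8 of the pitch drop."""
    first_level = -interval_semitones / 8.0
    last_level = -7.0 * interval_semitones / 8.0
    t_first = next(t for t, p in zip(times, track) if p <= first_level)
    t_last = next(t for t, p in zip(times, track) if p <= last_level)
    return t_last - t_first


TAU_MS = 30.0  # hypothetical time constant, not a value reported in the paper

for interval in (4, 7, 12):  # major third, fifth, octave
    times, track = simulate_pitch_drop(interval, TAU_MS)
    print(interval, round(response_time_ms(times, track, interval), 2))

print("closed form:", round(TAU_MS * math.log(7), 2))
```

The simulated response time is the same for all three intervals and agrees with τ ln 7 to within the sampling step. A filter of this kind therefore reproduces the weak dependence on interval width seen in pitch drops, but not the magnitude dependence observed in pitch rises.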
The author has profited greatly from discussions with colleagues at the Department of Speech Communication, particularly with Dr M. Rothenberg (guest researcher from Syracuse University, New York) and Dr J. Lindqvist Gauffin. Dr A. Gabrielsson of the Department of Psychology, Uppsala University, provided the statistical treatment. The data pertaining to the group of untrained subjects were made available through cooperation with S. Wedin, Boden. The work was supported by the Bank of Sweden Tercentenary Foundation.

References
Frommhold, W. & Hoppe, G. (1965). Tomographische Studien zur Funktion des menschlichen Kehlkopfes. Folia Phoniatrica 17, 83-91.
Kirk, R. E. (1968). Experimental Design: Procedures for the Behavioral Sciences. Belmont, Calif.: Brooks/Cole.
Larsson, B. (1977). Music and singing synthesis equipment (Musse). STL-QPSR 1, 38-40.
Lindqvist, J. (1972). Laryngeal articulation studied on Swedish subjects. STL-QPSR 2-3, 10-27.
Michel, J. (1974). Analytic studies of the larynx. Paper given at the voice symposium at the Juilliard School of Music, New York, U.S.A., June.
Ohala, J. (1972). How is pitch lowered? Journal of the Acoustical Society of America 52, 124(A).
Ohala, J. & Ewan, W. (1973). Speed of pitch change. Journal of the Acoustical Society of America 53, 345(A).
Shipp, T. & Izdebski, K. (1975). Vocal frequency and vertical larynx positioning by singers and nonsingers. Journal of the Acoustical Society of America 58, 1104-1106.
Sundberg, J. (1972). A perceptual function of the 'singing formant'. STL-QPSR 2-3, 61-63.
Sundberg, J. (1974). Articulatory interpretation of the 'singing formant'. Journal of the Acoustical Society of America 55, 838-844.
Sundberg, J. (1975). Formant technique in a professional female singer. Acustica 32, 89-96.
Sundberg, J. (1977). Singing and timbre. In Music Room Acoustics Vol. 17, pp. 57-81. Stockholm: Royal Swedish Academy of Music Publications.
Van Riper, C. & Irwin, J. V. (1958). Voice and Articulation. New Jersey: Englewood Cliffs.
Vennard, W. (1971). The relation between vibrato and vocal ornamentation. Journal of the Acoustical Society of America 49, 137(A).
Zemlin, W. R. (1968). Speech and Hearing Science. New Jersey: Englewood Cliffs.