THE PHYSICS OF THE
SINGING VOICE
G.J. TROUP
Physics Department, Monash University, Clayton 3168, Victoria, Australia; Lecturer in Musical Acoustics, Melba Memorial Conservatorium of Music, 16 Hoddle St., Abbotsford, Victoria, Australia
NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM
PHYSICS REPORTS (Review Section of Physics Letters) 74, No. 5 (1981) 379-401. North-Holland Publishing Company
THE PHYSICS OF THE SINGING VOICE
Measurements on the voices and vocal apparatus of trained singers, and their interpretation
G.J. TROUP
Physics Department, Monash University, Clayton 3168, Victoria, Australia
Lecturer in Musical Acoustics, Melba Memorial Conservatorium of Music, 16 Hoddle St., Abbotsford, Victoria, Australia
(Received December 1981)
Contents:
1. Introduction (mainly historical)
2. The acoustic model for the analysis of the singing voice
3. Spectral characteristics of the trained singing voice
4. Vocal intensity, subglottal pressure, and air flow relationships
5. The source spectrum of the singing voice
6. The origin of the upper singing formant
7. The vibrato
8. Conclusions and discussion
9. Acknowledgements
Appendix
References
© 1981 North-Holland Publishing Company
1. Introduction (mainly historical)

Music, and in particular vocal music, is found in every culture, however advanced or “primitive”. This article, though, is confined to the physics of the singing voice which has been trained in the western (Italian) operatic, or ‘lieder’, style of singing. The term ‘bel canto’ seems a good one to use in this regard. The restriction is not meant in any way as a value judgement, or a criticism of other styles of singing, or other cultures: the fact is that the great majority of the research which has been done on the singing voice has used professional or would-be professional singers trained in the ‘bel canto’ style.

Every culture has its legendary hero or god musician, who could charm and even control the whole of his world with his music. Thus we might begin this review by mentioning Orpheus, who with his lyre and his voice managed to charm his way into the Underworld in an attempt to recover Euridice. His fate after he failed, because he looked back too soon, was either to be torn to pieces by the Bacchantes because he was too melancholy to join in their revels, or to be apotheosized by Apollo (the inventor of the lyre) among the stars (depending upon which version of the legend one takes). Another voice from ‘classical’ times, notorious rather than famous, was that of the Roman Emperor Nero. He believed in purging himself to strengthen his voice (!), and won all the song contests at the various festivals in Greece by bringing his own claque, and by making it very clear to the adjudicators what their fate would be if he did not win.

After Christianity was at last permitted by the Roman Empire, a school of church music, including singing, was founded in Rome about 350 AD. A little later in this same century, strophic hymnody was invented by St. Ambrose of Milan. In 590 AD, Pope Gregory the Great, who encouraged the use of plain chant in church services, founded a Schola Cantorum in Rome*.
At this time, it must be remembered, castrati were used, and women were forbidden to sing in public, because it was regarded as much too lascivious and exciting. From this time on, despite the prevalence of plain chant, polyphonic music developed, not only in the Church, but in the popular ballads and madrigals that were sung.

However, the ‘bel canto’ style of singing can be said to have originated with the Renaissance: in particular, with the founding by Count Giovanni de’ Bardi and Jacopo Corsi of the Camerata [1] in Florence at the end of the 16th century. The composers particularly involved were Caccini, Peri and Del Cavaliere, and they sought to return to what was thought at that time to be the earlier, simpler, Greek classical style of the presentation of vocal music. For them, this meant monody rather than polyphony, one voice rather than many, and a more declamatory style, with the accompaniment often of only a few chords from the harpsichord.

The ‘Chief Theoretician’ of the Camerata was Vincenzo Galilei, father of Galileo Galilei. Vincenzo was, in his time, a noted lutanist, singer, arranger and composer, and his writings considerably influenced the development of music. He also performed quantitative experiments in musical acoustics [2]. Music was first given numbers (the simple ratios of octave, ‘perfect’ fifth and ‘perfect’ fourth) by Pythagoras, of whose musical abilities we know almost nothing. Similarly, we know rather little of Galileo as a musician; however, he is known to have played the lute, for which he wrote passable music [2]; to have written scurrilous verse; and is thought to have timed his famous inclined plane experiments by singing songs [2]. His two great contributions to the development of music and musical acoustics were the discoveries of the pendulum (after all, an inverted metronome!) and of the laws of the acoustics of stretched strings (independently of Mersenne).
* The Conservatorio of Santa Cecilia, in Rome, can trace its history back to this.
But it is in the work of Caccini, “Le Nuove Musiche” [3], that the first mention is made of the necessity or ability to swell and diminish the loudness of the voice (in his words “crescere e scemare la voce”): i.e. of what is now known as the ‘messa di voce’, which is recognized as the basis of all good singing. The declamatory style led to the development of Opera (Monteverdi’s ‘Orfeo’ was produced in 1607), which spread all over Italy, with the aid of the schola Cantorum. The first opera house per se opened in Venice in 1637. The declamatory style, and the use of the ‘messa di voce’ (swelling and diminishing) and the ‘esclamazione’ (starting rather loudly and then diminishing), would have led to control of the breath, even if only to obtain proper musical phrasing. As will be seen later on (sections 4 and 5), this leads inevitably (with the use of the simple Italian vowels a, e, i, o, u) to the more efficient functioning of the vocal cords, and hence to the harmonic enrichment of the voice. So began in Florence the ‘bel canto’ style which was to lead to the development of great singers and great operas down the centuries.

Since Newton was born in the year Galileo died, and his contribution to Mechanics developed from that of Galileo, it is interesting, if a little divergent, to enquire about Newton’s musicality. His approach to music was certainly scientific: “Newton, on hearing Handel play upon the harpsichord, could find nothing worthy to remark but the elasticity of his fingers” [4]. Further, “He said he never was at more than one Opera. The first act he heard with pleasure, the second stretched his patience, at the third he ran away” [5].

Perhaps the next milestone, after some two centuries of ‘bel canto’ singing, was the contribution to science of a great singer and teacher, Manuel Garcia, who invented the laryngoscope, and read a paper [6] at the Royal Society of London in 1855 on his observations of the working of the vocal cords.
It must be noted here that some writers on singing, both then and now, have said that Manuel Garcia’s invention, and his treatise on singing [7], were actually negative contributions to the art and its progress. However, as will be seen later (section 5), progress in the understanding of the physics of the singing voice would have been extremely difficult without the laryngoscope.

Considerable advances in the understanding of the mechanism of speech and hearing were made by Wheatstone [8], who first formulated the theory of vowel formant frequencies, and by Helmholtz [9], whose work “On the sensations of tone” still remains a classic in the fields of musical acoustics and psychophysics. But it was not until the invention of the electronic valve led to the development of high-speed electronic recording in the late 1920’s and early 1930’s that real progress in analysis was made: Bartholomew’s paper [10] in 1934, which (after an analysis of vocal waveforms) enumerated the characteristics of the well-formed ‘bel canto’ voice, set both the pace and the standard for subsequent work. The development of high-speed cinematography led to the first moving pictures of the vocal cords in action [11], taken by the Bell Laboratories in 1940. It is a remarkable fact that some of the findings, confirmed independently by Fletcher [12] in 1950, still do not appear in works on the singing voice, and are even in direct contradiction to some of the statements of earlier authors.

The enormous amount of work done on speech analysis and synthesis (see e.g. Chiba [13], Fant [14] and Flanagan [15]) since the development of the science of electronics has also led to a considerable understanding of the singing voice, using a similar model. This model will be discussed in the next section.

2. The acoustic model for the analysis of the singing voice

Before going on to describe the characteristics of the spectra of the ‘bel canto’ voice, and to attempt to explain them, the model used in the analysis needs to be discussed. While the intensity of the singing voice can be markedly greater than for loud speech, and the (Fourier) harmonic content also greater than in normal speech, all the studies here considered agree that the linear model used in speech analysis is adequate for most purposes. This model has the advantage that the results of the enormous body of work done on speech analysis and synthesis (see, e.g. [13, 14, 15]) are available for application.

Fig. 1 shows the model schematically. Air under pressure from a reservoir (the lungs) passes through a vibrating system (the vocal cords) which ‘chops’ the airstream into ‘puffs’, thus giving rise to a source spectrum which is essentially harmonic, though it can of course be modulated in both frequency and intensity. The resulting waveform is modified by the various resonances of the vocal tract, which we shall take as all that part of the larynx, throat, pharynx and sinuses above the opening between the vocal cords (the glottis). The final waveform emerges from the radiator, of which the main component is clearly the (open!) mouth.

Fig. 1. Schematic diagram of the linear acoustic model for the singing voice. [Blocks, in order: power source (lungs and body muscles); airflow; oscillator (vibrating vocal cords), with idealised source spectrum; resonators (vocal tract), with tract response for the vowel ‘ah’ (approx.); output (mouth, head, etc.), with output spectrum. Spectra are plotted against frequency.]

The resonances of the vocal tract are called ‘formants’; some can be changed at will by changing the shape of the vocal tract, while at least one (that in the region of 3000 Hz) is relatively fixed (though it can be changed at will) for the singing voice. It is the changes in the formant (resonant) frequencies that give rise to the vowel quality differences between various vowel sounds. The vowel formants correspond to the various modes of the vocal tract, considered as a complicated pipe closed at the glottis end, in which all modes having a pressure maximum at the glottis and a pressure minimum at the mouth are allowed (fig. 2). In this article, the only vowels considered will be the open Italian monophthongs a, e, i, o, u;† the approximate shapes of the vocal tract for these in speech are shown in fig. 3 (after Ladefoged [35]) and a table of corresponding formant frequencies (again for speech) is given in fig. 4.

Fig. 2. Locations of antinodes of the volume-velocity in the vocal tract, considered as a tube (after Chiba and Kajiyama). [Note on figure: arrows indicate nodes of volume velocity, and therefore antinodes of the pressure.]

Fig. 3. Approximate vocal tract shapes for the vowels indicated (after Ladefoged). [Panels: ‘ee’, ‘eh’, ‘ah’, ‘oh’, ‘oo’.]

Vowel                        heed   hid    head   had    hod    haw'd  hood   who'd
1st formant frequency (Hz)   228    410    555    728    673    619    337    209
2nd formant frequency (Hz)   2220   2093   1902   1756   1183   1165   837    846

Fig. 4. Table of formant frequencies corresponding to vowels indicated in fig. 3.

† The approximate English equivalents are: ‘ah’, ‘eh’, ‘ee’, ‘oh’, ‘oo’.

In the majority of analyses of the singing voice, the response of the vocal tract is considered to be linear in the same way that the response of a network of fixed resistors, or a length of transmission line, is considered to be linear. The vocal tract is also considered as being passive, in the sense that there are no generators or amplifiers therein. For the responses under analysis, it is also considered as stationary in the time sense: the response does not change with time during a particular vowel under analysis, for example.

The trained singer, and many untrained ones also, feels a vibratory sensation in the chest on notes of low pitch. For the trained singer, this sensation can extend to the stomach and pelvic regions. It is clear that some sound must travel back into the chest cavity, but a suitable way to take this into account has not been agreed on, although van den Berg [16] has measured the acoustic impedance of the trachea and lungs of the human corpse. The acoustic impedance is defined as the (complex) ratio of the pressure to the volume velocity. The vocal cords essentially form a high impedance source [17], so that the transmission (volume velocity at lips/volume velocity at input to vocal tract) is large for the formant frequencies and small for frequencies between the formants, behaving in the same way as the input impedance [16]. Van den Berg found an input impedance to the lungs of dead subjects which had peaks and valleys corresponding to resonance and antiresonance of the trachea and connected bronchi, and a considerable valley at about 50 Hz, corresponding to the resonance of the trachea and lungs as a whole. However, he concludes: “In spite of their typical resonant behaviour, the trachea and lungs are very efficient as a bellows and windpipe”. Hence, to a good approximation, we can ignore the resonant behaviour of the power source below the vibrator, as suggested in the model in fig. 1.

The formant resonances of the vocal tract have been measured and calculated by a number of authors (see e.g. [13, 14, 15] for collected data), and these results agree substantially with the measurements made by van den Berg [16] on a hemilaryngectomized subject (one with half his larynx removed). These results are established, and not in dispute. However, the question of what role the various sinuses may play in the formation of the singing tone has come in for considerable discussion. Flanagan [17] takes account of a ‘nasal shunt’ in some of his analysis, and we shall see later in section 5 that some singers do ‘nasalize’ vowels, at least on low pitches. Vennard [18] quotes experiments in which subjects sang with the maxillary sinuses (those in the cheek-bones) half-filled with water, and the nasal passages filled with gauze.
The singing under these conditions was compared with that under normal conditions by 86 vocal authorities from the United States and 25 from Holland: all subjects were adjudged to have sung as well under the abnormal as under the normal conditions.* On the other hand, a change in even speech quality is apparent when the mucous membranes lining the sinuses are swollen and discharging, as in the case of a cold. It is also interesting to note that one of the effects of adrenalin, which will be released when the singer is in the appropriate state of nervous tension during a performance, is to contract these mucous membranes against the bone, and thus reduce the acoustic losses in the sinus cavities. This also has a perceptible effect on vocal quality: a voice can sound ‘cold’ and only of average quality during a technical rehearsal, and ‘warm’ and of much higher quality during a performance.

There is also the quoted comment in the book ‘Sinus Tones Production’ by White [19] that the reason for the rather ‘flat’ quality of Australian aboriginal (full-blood) voices is the almost total lack of frontal sinuses (those in the forehead above the eyes). The major thesis of White’s book, that the voice is formed by the sinuses and not by the vocal cords, is not tenable today. Finally, Manén [20], who has also published on voice production and singing voice spectra, states “The craft of the singer depends on his ability to make the air inside them (the ethmoidal sinuses, those on either side of the nasal passages) vibrate”.

What seems to emerge from all this is that the sensation of vibration in the nose and sinuses is an indication of a correct ‘bel canto’ voice production, but that the nose and sinuses themselves do not have a large part to play in the overall formation of the tone: the major resonators appear to be in the throat and pharynx above the glottis.
* It is a pity, however, that the voices were not recorded and the spectra analyzed.

Measurements of the radiated intensity around the head by Wolf, Stanley and Sette [21] have confirmed that the open mouth is approximately a hemispherical radiator. As it is ‘small’ compared with the wavelengths of pure tones below, say, 2 kHz, this is not surprising. To quote them: “However, a few tests we made indicated that the sound level existing fifteen inches from the side of the head were in the order of 4 db lower than those existing fifteen inches in front of the mouth. This difference varied with the singing pitch.* With the singer facing away from the microphone and his lips fifteen inches from it, the intensity was down an additional 2 db. For vowels other than ‘ah’ (Italian ‘a’) the differences were of varying magnitudes.”

The simple linear model is unable to explain the following results obtained with trained singers:
(a) the presence of multiples of sub-harmonics of the fundamental frequency in the output spectrum [22];
(b) the appearance of anharmonic components at times in the output spectrum of vocally very-educated sopranos [23].**
This latter phenomenon is a function of the use of vibrato in pitch and intensity, and therefore the stationary (in the time sense) analysis implicit in the above model cannot strictly be used. The model need not, of course, be stationary, but this assumption greatly simplifies the analysis; Bjørklund [23] chose to allow only stationary situations. However, the model has a wide acceptance among the workers in this field, and we shall see that the majority of the properties of the trained ‘bel canto’ voice can be explained by the use of this model.
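The linear model of this section can be sketched numerically. The following is only an illustrative sketch, not a reconstruction of any of the analyses cited: the vocal tract is idealised as a uniform tube closed at the glottis and open at the mouth, and the tube length (0.175 m), the speed of sound (350 m/s), the source slope (-12 dB per octave), the formant bandwidth (80 Hz) and the form of the resonance curve are all assumptions introduced here. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, moist air
TRACT_LENGTH = 0.175    # m, assumed length of an idealised uniform tract


def tube_formants(n_modes=3, length=TRACT_LENGTH, c=SPEED_OF_SOUND):
    """Modes of a uniform tube closed at the glottis and open at the mouth.

    A pressure maximum at the closed end and a pressure minimum at the open
    end allow only odd quarter-wavelength modes: f_n = (2n - 1) c / (4 L).
    """
    return [(2 * n - 1) * c / (4.0 * length) for n in range(1, n_modes + 1)]


def source_level_db(k, slope_db_per_octave=-12.0):
    """Level of the k-th source harmonic relative to the fundamental.

    The constant slope per octave is an assumption made for illustration.
    """
    return slope_db_per_octave * math.log2(k)


def resonance_gain(freq, centre, bandwidth):
    """Magnitude of one second-order resonance, equal to 1 at zero frequency."""
    return centre ** 2 / math.hypot(centre ** 2 - freq ** 2, bandwidth * freq)


def tract_gain_db(freq, formants, bandwidth=80.0):
    """Tract response in dB: the resonances multiply, so their dB gains add."""
    return sum(20.0 * math.log10(resonance_gain(freq, f, bandwidth))
               for f in formants)


def output_spectrum(f0, formants, f_max=4000.0):
    """Output harmonic levels (dB): source level plus tract gain, per harmonic."""
    spectrum = []
    k = 1
    while k * f0 <= f_max:
        freq = k * f0
        spectrum.append((freq, source_level_db(k) + tract_gain_db(freq, formants)))
        k += 1
    return spectrum


if __name__ == "__main__":
    formants = tube_formants()
    print("idealised formants (Hz):", [round(f) for f in formants])
    for freq, level in output_spectrum(125.0, formants):
        print(f"{freq:7.1f} Hz  {level:7.1f} dB")
```

Because the model is linear, passive and stationary, the source spectrum and the tract response simply add on a decibel scale; harmonics that fall near a formant are emphasised and those between formants are suppressed.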
3. Spectral characteristics of the trained singing voice

From such works as those of Bartholomew [10], Wolf et al. [21], Stout [24], Bjørklund [23] and Rzhevkin [25], the following spectral characteristics of the trained voice emerge:
(a) the trained voice is richer in higher harmonics than the untrained voice;
(b) the trained voice has a regular vibrato in pitch and intensity, and therefore in timbre, which occurs at a frequency characteristic of the singer. The usual vibrato frequency lies between 6 and 8 Hz, but 5 Hz and up to 13 Hz have been reported [26];
(c) in both male and female singers, there appears an extra formant, the so-called ‘singing formant’, which has an (ensemble) average frequency of 2800 Hz. It remains relatively fixed in frequency, while the vowel formants change;
(d) in male singers, a further ‘singing formant’ appears, in the region of 500 Hz: this can be more variable than the upper singing formant, since its position depends to some extent on the frequency of the first (lowest) formant of the sung vowels [24]. It is possible that this is true only for a relatively long-time average of the spectra.

Fig. 5 (after Bjørklund) illustrates how the harmonic content of the voice can change with training. Fig. 6 (after Bartholomew) illustrates the spectra of the extremes of the vibrato. Fig. 7 (after Rzhevkin) clearly shows the low and high singing formant regions of a distinguished Russian bass.

Fry and Manén [27] have shown that the vocal spectra depend on laryngeal position and shape. If one exclaims ‘ah’, the laryngeal position in the throat is almost that for relaxed silent respiration, and lies midway between that for exclaiming ‘ee’ (higher) and ‘oo’ (lower). Further, in the exclamation of ‘ee’, the laryngeal diameter is less than for the exclamation of ‘ah’.
(Manén [20] notes that “the singer has to take care that the change for ‘oh’ and ‘oo’ really happens at the level of the larynx, and that these vowels are not produced by the lips”.)† The lower ‘oo’ position of the larynx is associated with dark (sombre) timbres of the voice, while the ‘ee’ (higher) position and natural narrowing of the larynx are associated with bright timbres. The corresponding spectra for some vowels sung with the appropriate laryngeal position are shown in fig. 8 (after Fry and Manén).

* To be expected, from ordinary radiation theory, and the fact that they were using total sound intensity levels.
** Time-averaged harmonic components were initially determined by Bjørklund in this study; eventually, the spectra were actually determined for various times and the anharmonic components became evident [57].
† For the model adopted in this article, these statements mean: “the laryngeal position changes the formants”. As seen later, the larynx of itself cannot produce a vowel.

Fig. 5. Effect of training on the harmonic content of the voice (after Bjørklund). [Panels: no training; 4 months training; 5 years training; famous artist. Vowel ‘A’, forte; horizontal axis: frequency (kHz).]

Fig. 6. Changes in the vocal spectrum during the vibrato of a good baritone (after Bartholomew). [Horizontal axis: frequency (Hz), marked at 262, 523, 1046 and 4186.]

Fig. 7. ‘High’ and ‘low’ singing formant regions in the voice of an excellent Russian bass (after Rzhevkin). [Horizontal axis: frequency, to 4 kHz; legend: 259 Hz, 288 Hz.]

A further characteristic of the trained versus the untrained voice is the ability to produce a greater total intensity. Wolf et al. [21] give data on a student baritone, first in the beginner’s stage, then after 4 months of training: the maximum intensity over the vocal range had increased by 5-8 dB. Wolf et al. obtained a ‘reference curve’ for male and female singers by averaging the maximum intensity of a number of trained singers over the appropriate vocal ranges. The student baritone was still at least 7 dB below this ‘reference curve’ in the middle of his range, and 15 dB below at the extremes of his range.
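To give a feeling for the size of these intensity differences, the decibel figures can be converted to intensity ratios. The sketch below assumes the usual convention for intensity (power) levels, L = 10 log10(I/I_ref); the function names are introduced here purely for illustration.

```python
import math


def db_to_intensity_ratio(level_db):
    """Intensity ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_db / 10.0)


def intensity_ratio_to_db(ratio):
    """Level difference in dB corresponding to an intensity ratio."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    # Level differences quoted in the text for the student baritone.
    for label, level_db in [("gain after 4 months, lower bound", 5.0),
                            ("gain after 4 months, upper bound", 8.0),
                            ("deficit in mid-range", -7.0),
                            ("deficit at extremes of range", -15.0)]:
        ratio = db_to_intensity_ratio(level_db)
        print(f"{label}: {level_db:+.0f} dB is an intensity ratio of {ratio:.3f}")
```

On this convention a gain of 5-8 dB is roughly a three- to six-fold increase in radiated intensity, while a deficit of 15 dB means that the student radiated only about one thirtieth of the reference intensity.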
Fig. 8. The effect of laryngeal position (or ‘mood’) on the vocal spectrum (after Fry and Manén). Fundamental frequency 250 Hz; articulated vowel (i). (a) ‘ee’ laryngeal position (“aggressive mood”); (b) ‘ah’ laryngeal position (“joyful mood”); (c) ‘oo’ laryngeal position (“fearful mood”).

Fig. 9. Maximum intensity curves for ‘ah’ and ‘ee’ (after Wolf, Stanley and Sette). [Horizontal axis: singing pitch.]
The work of Wolf et al. [21] and of Stout [24] shows a smooth increase in the maximum intensity of the voice with pitch on all vowels for the trained singer. The maximum intensity achievable on ‘ee’ and ‘oo’ is less than on ‘ah’; further, for baritones, it was noticed that for A below middle C (C = 261.6 Hz) there was a sudden drop in intensity for ‘ee’. This occurred also for a good contralto, an octave higher. The effect was not consistently present in the same singers, and was accompanied by a noticeable change in vowel quality. Here we have an experimental verification of the dictum of many singing teachers that ‘ee’ is a ‘treacherous vowel’.* Curves for ‘ah’ and ‘ee’ (after Wolf et al.) are shown in fig. 9.

Stout [24] studied the harmonic structure of vowels in relation to pitch and intensity, and was led to a most important concept, that of the ‘functional singing area’ of a singer. This area is described by a graph of total intensity against fundamental frequency, on which the notes and intensities that a singer can produce fall in an enclosed part of the total area. One such graph (after Stout) for a trained singer is shown in fig. 10. While there does not appear to be published data on the functional singing area of untrained singers, it must certainly be less, because of: (a) smaller pitch range; (b) smaller maximum intensity. The functional singing area experimentally verifies the subjective judgement that the intensity range of which a voice is capable is larger in the middle of the pitch range than at the extremes.

Stout found that the effect of increase in intensity on harmonic structure was generally to increase the harmonics above 1800 Hz in intensity more than those below 1800 Hz for all (Italian) vowels. 1800 Hz was chosen as there is a large spectral minimum in this region. The fundamental remained almost unchanged with increase in total energy for ‘ah’: there was some increase for ‘oo’ and more for ‘ee’.
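The drop in intensity for ‘ee’ at A below middle C can be set against the first formant of that vowel with a little arithmetic. The sketch below assumes equal temperament built on the value C = 261.6 Hz quoted above, and uses the speech value of the first formant for ‘heed’ from fig. 4 as a stand-in for the sung ‘ee’; both are assumptions, since sung formants differ from spoken ones.

```python
import math

MIDDLE_C_HZ = 261.6          # value quoted in the text
EE_FIRST_FORMANT_HZ = 228.0  # 'heed' entry of fig. 4 (speech value, used as a stand-in)


def note_frequency(semitones_from_middle_c, reference=MIDDLE_C_HZ):
    """Equal-tempered frequency of a note (equal temperament is assumed)."""
    return reference * 2.0 ** (semitones_from_middle_c / 12.0)


def cents_between(f_low, f_high):
    """Musical interval from f_low up to f_high in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_high / f_low)


if __name__ == "__main__":
    baritone_a = note_frequency(-3)   # A below middle C
    contralto_a = note_frequency(9)   # the same note an octave higher
    print(f"A below middle C: {baritone_a:.1f} Hz")
    print(f"A above middle C: {contralto_a:.1f} Hz")
    print(f"'ee' first formant lies {cents_between(baritone_a, EE_FIRST_FORMANT_HZ):.0f} "
          "cents above the baritone's fundamental")
```

On these assumptions the baritone’s fundamental on A lies within a semitone of the first formant of ‘ee’, so that a small change of articulation is enough to bring the two into coincidence or to separate them.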
* The first formant for ‘ee’ lies in the 200-300 Hz region. The poor performance doubtless comes about when the first formant matches the fundamental frequency of the note to be sung: hence a change in articulation becomes necessary, to move the first formant away from the fundamental.

Fig. 10. ‘Functional singing area’ for a trained singer (after Stout). Note the effect of different vowels. [Intensity plotted against frequency, marked in note names; curves labelled by vowel.]

The effect of pitch on harmonic content, intensity held constant, was such that as the pitch was raised, the energy shifted from the high frequency regions to the lower, and that the fundamental appeared to absorb a considerable part of the shifted energy.

It is opportune here to consider what happens when the first formant frequency falls below the fundamental frequency of the note being sung. A study by Sundberg [28] showed that either the internal shape of the pharynx is changed, so that the first formant frequency is closer to the fundamental frequency; or the jaw opening is increased, again having the effect of bringing the first formant frequency closer to the fundamental frequency. Consequently, the vowel is ‘modified’ (the formants are changed) and there can be some sacrifice of intelligibility for tone quality. That some singers carry this to extremes is well known to all of us; but the advice of singing teachers to ‘modify the vowel’ or to ‘open the mouth more’ on high notes has a good acoustic foundation.

Sundberg [59] has elicited that the male singer must open his mouth to push the first formant frequency away from the fundamental. When the first formant is close to the fundamental frequency, it is impossible for the male to sing the desired fundamental. This is because the damping in the formant peaks for the male vocal tract is less than in the female, and consequently the coupling between the resonant system of the oscillator (larynx) and vocal tract is greater. The well-known phenomena associated with two closely-coupled resonant circuits therefore exist for the male, but not for the female, because the extra damping reduces the coupling in the latter case.

Referring back to the functional singing area, it is not only true that the notable singer will have a greater intensity range at a given pitch than a singer of average ability or an untrained one, but that his control of intensity will be better. Sacerdote [29] shows recordings of the total intensity vs. time of the tenors Tito Schipa and Beniamino Gigli singing the same passage from an aria from Massenet’s ‘Manon’. Schipa shows a logarithmic drop in intensity of about 30 dB which is almost linear with time over 4½ seconds; Gigli shows a similar curve, with the drop being about 20 dB over the same period of time. In contrast, what Sacerdote calls “a common professional tenor” (!) can only manage a somewhat irregular decrease of about 10 dB over 3½ seconds.

Finally, we note the effect of training on the vibrato [23, 30]. In the beginner, it is usually absent; after some months of training, it has developed, but is usually irregular as to frequency, and as to pitch excursion. Eventually it is regular in both parameters, and the pitch excursion can become to some extent under the control of the highly trained singer, sometimes showing a variation over a whole tone, at others over less than a semitone.
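A regular vibrato in pitch and intensity can be sketched as a slowly modulated tone. In the following illustrative sketch the ‘variation over a whole tone’ is read as a peak-to-peak pitch excursion of 200 cents, which is an assumption, as are the sinusoidal shape of the modulation, the in-phase intensity variation, the depth of 0.2 and the rate of 6.5 Hz (chosen inside the usual 6 to 8 Hz range quoted earlier). The function names are hypothetical.

```python
import math


def vibrato_frequency(t, f0, rate_hz=6.5, extent_cents=100.0):
    """Instantaneous frequency of a tone with sinusoidal pitch vibrato.

    extent_cents is the assumed peak-to-peak pitch excursion
    (200 cents for a whole tone, less than 100 for under a semitone).
    """
    half_extent = extent_cents / 2.0
    deviation_cents = half_extent * math.sin(2.0 * math.pi * rate_hz * t)
    return f0 * 2.0 ** (deviation_cents / 1200.0)


def vibrato_tone(f0, duration=1.0, sample_rate=8000,
                 rate_hz=6.5, extent_cents=100.0, depth=0.2):
    """Samples of a sine tone with vibrato in both pitch and amplitude.

    The phase is accumulated sample by sample so that the frequency
    modulation is continuous; depth is the fractional amplitude variation.
    """
    samples = []
    phase = 0.0
    for n in range(int(duration * sample_rate)):
        t = n / sample_rate
        amplitude = 1.0 + depth * math.sin(2.0 * math.pi * rate_hz * t)
        samples.append(amplitude * math.sin(phase))
        freq = vibrato_frequency(t, f0, rate_hz, extent_cents)
        phase += 2.0 * math.pi * freq / sample_rate
    return samples


if __name__ == "__main__":
    for extent in (200.0, 80.0):  # whole tone, and somewhat under a semitone
        top = 440.0 * 2.0 ** (extent / 2400.0)
        bottom = 440.0 * 2.0 ** (-extent / 2400.0)
        print(f"extent {extent:.0f} cents on 440 Hz: {bottom:.1f} Hz to {top:.1f} Hz")
```

For a note at 440 Hz, an excursion of a whole tone on this reading sweeps the fundamental between roughly 415 and 466 Hz some six or seven times a second, so every harmonic moves across the fixed formant structure, which is why the timbre varies with the vibrato.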
4. Vocal intensity, subglottal pressure, and air flow relationships

One of the first reported measurements on the trained vs. the untrained (throaty-voiced) singer was that of air flow vs. intensity at the same pitch, made by Stanley [31] in 1931. His findings are shown in fig. 11. Essentially, the performer had to sing wearing a ‘gas-mask’ or respirator, connected to a spirometer. Subsequently, measurements of subglottal pressure, air flow and vocal intensity were made by Rubin et al. [32]. The subglottal pressure was measured by a hypodermic needle introduced into the trachea; intra-oesophageal pressure was measured (in the initial phases of the work) by a small latex balloon attached to the end of a polythene catheter passed through the nose into the oesophagus; and air flow was measured using a face-mask.

The findings of Rubin et al. [32] are in reasonable agreement with those of Stanley [31]. As was to be expected, as intensity was increased at constant pitch, the subglottal pressure increased moderately or markedly, while the air flow remained constant, increased only slightly, or fell (for the trained subjects). For increase in pitch at constant intensity, the air flow fell but the subglottal pressure rose (this is to be expected, since the glottis becomes smaller, and the vocal cords more tense). When the subjects changed from a correct (efficient) production to a breathy (inefficient) one, the air flow increased markedly: if the sound level remained the same, the subglottal pressure increased, but if the subglottal pressure remained the same, the sound level decreased.
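One way to see why the breathy production is called inefficient is to estimate the aerodynamic power supplied by the breath, taken here as the product of subglottal pressure and volume flow. The sketch below is purely illustrative: the pressure and flow values are hypothetical numbers chosen only to mimic the trend described above (markedly more flow and somewhat more pressure for the same sound level); they are not data from Stanley or Rubin et al.

```python
import math

CM_H2O_TO_PA = 98.0665  # pascals per centimetre of water
ML_TO_M3 = 1.0e-6       # cubic metres per millilitre


def aerodynamic_power_watts(subglottal_pressure_pa, air_flow_m3_per_s):
    """Power delivered by the breath stream: pressure times volume flow."""
    return subglottal_pressure_pa * air_flow_m3_per_s


def power_from_clinical_units(pressure_cm_h2o, flow_ml_per_s):
    """Same quantity, starting from cm of water and millilitres per second."""
    return aerodynamic_power_watts(pressure_cm_h2o * CM_H2O_TO_PA,
                                   flow_ml_per_s * ML_TO_M3)


if __name__ == "__main__":
    # Hypothetical illustrative values, not measurements.
    efficient = power_from_clinical_units(pressure_cm_h2o=10.0, flow_ml_per_s=150.0)
    breathy = power_from_clinical_units(pressure_cm_h2o=12.0, flow_ml_per_s=400.0)
    print(f"efficient production: {efficient:.3f} W of aerodynamic power")
    print(f"breathy production:   {breathy:.3f} W of aerodynamic power")
    # For the same radiated sound level, efficiency scales as 1 / input power.
    change_db = 10.0 * math.log10(efficient / breathy)
    print(f"relative efficiency of the breathy production: {change_db:.1f} dB")
```

With numbers of this kind, the breathy singer spends several times the aerodynamic power for the same sound level, which is the quantitative content of the phrase ‘inefficient production’.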
Fig. 11. Air flow vs. intensity for various subjects: A, ‘throaty’ voice; B, C, early and later training stages; D, E, fully trained subject (after Stanley). [Horizontal axes: intensity, marked mf and f.]
Tenseness of the larynx (as when running out of breath and attempting to maintain tone by contracting the throat) results in maintaining roughly the same air flow and intensity, but there is a marked increase in subglottal pressure. A poor quality is obtained: the intensity level and the vibrato become irregular. These experiments verify the necessity for proper breath support and controlled breath release, and for the proper relaxation of the throat. These results, in relation to the characteristics of the voice enumerated in section 3, will be discussed later in section 5.

While on the subject of breath flow, it is worthwhile mentioning that two independent studies [33, 34] have shown that voice training improves the functioning of the lungs and the ventilatory capacity. In particular, the vital capacity (the volume of air that can be expired after a maximum inspiration, or the volume inspired after a maximum expiration) of trained singers is well above average, and the residual volume (the volume remaining in the lungs at the end of a voluntary expiration) is well below average. The bellows clearly become very efficient!
5. The source spectrum of the singing voice

It must be reiterated here that the formant characteristics of speech vowels are well known (see e.g. Chiba [13], Fant [14] and Ladefoged [35]) and that confirmation of the calculations and models used has been made experimentally in vivo [16]. Individual variations of the formant frequencies give rise to individual voice quality and to regional and national speech accents.

On the basis of the linear acoustic voice model of section 2, it is clear that the richness in higher harmonics of the trained voice in comparison with the untrained must at least be partly due to a difference in source spectra. Sundberg [36] also examined most carefully the source spectra of trained singers to see whether the higher (~2800 Hz) ‘singing formant’ could perhaps be partly due to a peak in the source spectrum in this region. As will be seen below, this is not a particularly likely hypothesis, even though meriting investigation. Further, if the formant characteristics of the vocal tract do not change during a sung vowel, it is clear that the vibrato must have its origin in the source spectrum.

The results of the analyses of the high-speed motion pictures of the vocal cords taken by the Bell Laboratories [11] and by Fletcher [12] are now briefly summarized. Without this material, a thorough understanding of the source spectra is impossible. The vocal cords behave generally with pitch change as follows. At very low pitches, the vocal cords are short and thick, and almost their whole mass vibrates. Many singers claim that they can sing one or two tones lower than their normal pitch range in the early morning; this is because the cords are more charged with fluid, as are all the membranes of the head and neck after a night’s repose in bed, and the ‘dehydration’ to a normal situation takes about 3 hours [37]. As the pitch rises, the vocal cords elongate and become thinner, and the vibration moves closer to the edges of the cords.
Once the maximum elongation of the cords has been reached (or before, if the performer wishes), they commence to vibrate in a new mode: more and more of the cords at the back of the throat remain stationary, and less and less of the cords towards the front of the throat vibrate as the pitch rises. This different mode of operation of the cords we choose in this article to call falsetto: it is the mode of the very high-pitched singing used by counter-tenors, for example, and in the male has a characteristic timbre different from the other vibrational mode. There is also a greater breath flow associated with the falsetto mode. In the ideal situation for the ‘normal’ mode, the cords come together (the glottal area is zero) for a time, then open, letting through a puff of air: the Bernoulli force of this then closes the cords, according
to the myoelastic-aerodynamic theory of van den Berg [16]. Depending on the individual, the cords may not close completely, or may close for a shorter or longer part of a pitch cycle. Fletcher [12] found that the element of vibratory motion most consistently associated with intensity of the voice was the closed phase of the cycle of cord vibration. To quote from his thesis: “the more effective control of intensity is exemplified by a second mechanism. Here, the laryngeal adjustments provide for the increased velocity of the ‘puffs’ for the higher vocal intensities, without augmenting the size of aperture. In the present study, this latter pattern was found to be consistent at all pitch levels for only the one subject of the group who was trained in singing”. The lecture notes accompanying the Bell Laboratories film, ‘High Speed Motion Pictures of the Human Vocal Cords’ [11], make the observation that the closure time per cycle is greater for trained than for untrained voices.* To quote: “at high intensities, two important differences appear (for trained voices). First, the closure time per cycle of cord movement is greater than for untrained voices and second, the displacement or amplitude of cord vibration is smaller than for the untrained voice in the production of a sound of similar intensity. It is this ability to better control the flow of air which enables the trained voice to radiate a greater amount of sound power than the untrained voice or to radiate an equal amount of sound power with a lower air flow”. Since Fletcher [12] found no consistent relationship between amplitude of cord displacement and vocal intensity, these two independent observations on the vocal cord behaviour of trained and untrained voices are in excellent agreement. Flanagan [38] took Fletcher’s glottal area vs. time data [12], and by applying fairly simple aerodynamic considerations, was able to deduce volume-velocity curves.
The leading and trailing edges of the velocity wave are slightly steeper than the area function because of the nonlinear resistance of the glottal orifice.

The other method of obtaining the waveform of the voice source is that of inverse filtering. The technique assumes that the transmission characteristics of the vocal tract are known for the vowels being produced, so that the correct inverse transformation may be made. While it is accepted that these transmission characteristics are extremely well known on average, the result of inverse filtering is also therefore subject to some loss of information. Since a particular subject will have some departures from the average vocal tract characteristics, we will not arrive exactly at his correct source spectrum using this technique. The source waveforms deduced by Miller [39], Mártony [40], Seymour [22] and Sundberg [36] do not differ in gross characteristics from those deduced by Flanagan [38]. A commonly occurring waveform seen by both techniques is roughly triangular in shape, with a sharper trailing edge than leading edge: such waveforms have also been presented by Chiba [13] and Fant [14]. However, the high-frequency ripple apparent on the leading and trailing edges of Flanagan’s waveforms is not present on those deduced using inverse filtering,** presumably because of the aforementioned averaging effect.

Following Flanagan [38], let us idealize the glottal volume-velocity waveform by a symmetrical triangular wave, as shown in fig. 12; and suppose that the mean volume flow and fundamental frequency are kept constant, while the closure time per cycle is varied (the departure from symmetrical triangular form in some cases is not large). The Fourier transform for one cycle is

$$F(\omega) \propto \frac{\sin^2(\omega a/2)}{(\omega a/2)^2}$$
* The closure time per cycle is, from fig. 12, (T − 2a), or the time for which, during a fundamental pitch cycle of the vocal cords, they are closed.
** If we know the network that produces, from a given harmonic generator, a certain waveform, we can design the (inverse) network that will, from the known output waveform, produce the (unknown) generator waveform. Reciprocity and passivity!
[Figure 12: upper panel, the waveform f(t) over one period from −T/2 to T/2, non-zero between −a and a; lower panel, the Fourier transform F(ω) and spectrum of the above waveform.]
Fig. 12. Idealized glottal puff waveform (after Flanagan).
where 2a is the open time for the cycle of period T. It is therefore obvious that as a decreases, the number of higher harmonics in the main lobe of F(ω) will increase, and further that their ratio to the DC component (the constant velocity component) will become greater: the widths of the smaller lobes also increase. In the limiting case of the triangle becoming a Dirac δ-function, the spectrum becomes a ‘Dirac comb’ (that is, an infinite harmonic series of Dirac δ-functions of the same height (strength)). While the above treatment is certainly idealized, presupposing extremely sharp corners of the waveform at initiation and peak, it makes it difficult to see how a constant, comparatively high energy component in the region of 2800 Hz (for the upper singing formant) could arise. In fact, it is the zeros of the glottal waveform which can almost negate the presence of vocal tract formants [38]!

It is therefore clear that the richness in higher harmonics of the trained voice is at least partly brought about by: (a) the fact that the cords are fully closed during the closed part of the cycle; and (b) the fact that this closure time is comparatively large with respect to one period of cord vibration. This is brought about in the proper training of singers by insisting on breath control: trying to use as little of the inhaled breath reservoir as possible in singing a musical phrase, without constricting the throat or larynx in any way. The increase in closure time as the intensity rises also explains the greater rise in energy in the higher harmonics of the voice fundamental, as discussed by Stout [24], section 3. In a private communication, Prof. Sundberg has suggested that it may be the closure rate per cycle (shifting the isosceles triangle model of Flanagan [38] towards a saw-tooth waveform) that is improved by training, rather than the closure time. An increase in higher harmonics still results.
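The dependence of harmonic content on the open time can be checked numerically. The sketch below assumes the idealized periodic, symmetrical triangular pulse of fig. 12, for which the amplitude of harmonic n relative to the DC term is [sin(x)/x]² with x = nπa/T; the peak height of the pulse cancels in this ratio. The function names and the chosen open fractions are illustrative assumptions, not values taken from Flanagan [38].

```python
import math


def harmonic_to_dc_ratio(n, open_fraction):
    """Amplitude of harmonic n relative to the DC term for a periodic,
    symmetrical triangular pulse whose open time 2a is open_fraction * T."""
    x = math.pi * n * open_fraction / 2.0  # equals n * omega_0 * a / 2
    if x == 0.0:
        return 1.0
    return (math.sin(x) / x) ** 2


def harmonics_in_main_lobe(open_fraction):
    """Count the harmonics lying strictly below the first zero at omega = 2*pi/a."""
    first_zero_harmonic = 2.0 / open_fraction  # equals T / a
    return math.ceil(first_zero_harmonic) - 1


if __name__ == "__main__":
    for open_fraction in (1.0, 0.5, 0.25):  # illustrative values only
        ratios = [harmonic_to_dc_ratio(n, open_fraction) for n in (1, 5, 10)]
        print(open_fraction, harmonics_in_main_lobe(open_fraction),
              ["%.4f" % r for r in ratios])
```

Shortening the open time (a longer closed phase at the same period) puts more harmonics inside the main lobe and raises the fifth harmonic relative to the DC term, in line with points (a) and (b) above. Individual harmonics that happen to fall on a zero of the envelope are suppressed, which is the effect of the glottal zeros mentioned in the text.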
In a very thorough investigation, Sundberg [36] used a combination of inverse filtering, and the ‘analysis by synthesis’ technique, to study the source spectra of three trained (bass) singers’ sung vowels,
and of their vowels in normal and in loud speech. In the ‘analysis by synthesis’ procedure, a formant synthesiser (consisting of 5 LCR circuits with buffer amplifiers between them) was used, together with, as source, a pulse train having a spectrum envelope slope of 12 dB per octave (the validity of this will be discussed later). The source signal was modulated by a low-frequency sine-wave with adjustable amplitude and frequency, so as to simulate the vibrato. Because of the problems associated with the ‘analysis by synthesis’ technique (e.g. the very severe demands on the sound reproducing system, and the difficulty of estimating the formant frequencies in the case of high-pitched tones), the results were presented in the form of average source spectra derived from several vowels. An interesting finding was (to quote) “the spectrograms of the sung vowels revealed that both subjects’ low-pitched notes (fundamental <100 Hz) and the ‘a’ of the light voice were nasalized. Extra-spectrum envelope peaks near 0.2 kHz and between 1 and 2 kHz, and abnormally wide bandwidths were taken as evidence for this. Failures to obtain normal glottograms (source waveforms)* from inverse filtering of these vowels supported the same conclusion. Mainly owing to the lack of detailed knowledge about the transfer functions of nasalized vowels, these vowels were excluded from the material”. (Thus the effects of the ‘nasal shunt’ (section 1) and any others of the sinuses cannot yet be considered well known.)

Sundberg found that the (envelope of the) source-spectrum slope above 1 kHz was about 12 dB/octave, almost irrespective of fundamental pitch or loudness variations. This agreed with the findings of Lindquist [41], who studied the source spectra of trained and untrained voices by inverse filtering. He found that the source of an untrained voice is efficient only within a limited range of fundamental frequency and intensity.
This range is expanded for the trained voice, and the source waveforms do not vary as much as those of the untrained voice when pitch or intensity is changed. Sundberg concludes that “the development of the voice timbre in a vocal training would then be a matter of learning a special articulation rather than having the vocal cords to vibrate in a special way”. This can only be partly true, in view of the findings of the Bell Laboratories [11] and of Fletcher [12] and Flanagan [38]. Sundberg also found that the slope of the source spectrum envelope in singing and in the loud speech of the trained singers was very similar. This is to be expected, since the tendency of a trained singer when asked to phonate loudly on an extended vowel is to use ‘singing mode’ rather than ‘normal speech mode’, since the former is less strained than the latter in this case, and is easier for him.

Hicks and Troup [42] have pointed out that if the total source spectra are normalized to the DC component (constant pressure or volume velocity), then the untrained voice must always come out as less efficient than the trained voice. A ‘too breathy’ voice is readily detected, because of the ‘hiss’ of the too high DC component, and because some of the DC component excites modes of the vocal tract, as in whispering, when the glottis is open, and the vocal cords do not vibrate. Hicks and Troup re-analyzed the data of Flanagan [38], who did not give values for the DC component, and their results supported their thesis.

The basic effects of vocal training on the voice source are therefore to increase the efficiency and higher harmonic content by ensuring that the vocal cords are properly closed during their closure phase, and by increasing the closure time per cycle. The slope of the source spectrum envelope becomes relatively constant at about −12 dB/octave above 1 kHz, irrespective of pitch or intensity variation.
Some untrained subjects have a speech source spectrum fall of 18 dB/octave, and an increase in intensity of untrained speech is sometimes accompanied by a shift in the slope of the entire source spectrum envelope. As presented in Fant [14], the low-pitch, low-volume source spectrum derived from the Bell Laboratories’ film has a slope above 1 kHz of −18 dB per octave, while the low-pitch, high-volume source spectrum has moved to −12 dB/octave. In a private communication, Prof. Sundberg has suggested that singers learn to avoid changes of voice source spectrum with articulation and pitch, so that the range of normal phonation is expanded to the extremes of pitch and loudness. Provided the speech is efficient, in terms of breath use and ‘open throat’, then it seems reasonable that the non-singer’s source spectrum should be close to that of the trained singer.

* Explanation added (G.J.T.).
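To give a feeling for what the two quoted envelope slopes mean near the upper singing formant, the sketch below converts a slope in dB per octave into a level at a given frequency. It assumes a straight-line envelope on a logarithmic frequency axis above 1 kHz, and it assumes that both envelopes are normalized to the same level at 1 kHz; neither assumption is made explicitly in the text, and the function name is hypothetical.

```python
import math


def envelope_level_db(freq_hz, slope_db_per_octave, ref_hz=1000.0):
    """Level of a straight-line spectrum envelope at freq_hz, in dB relative
    to its level at ref_hz, for a given slope in dB per octave."""
    octaves_above_ref = math.log2(freq_hz / ref_hz)
    return slope_db_per_octave * octaves_above_ref


if __name__ == "__main__":
    target_hz = 2800.0  # region of the upper singing formant
    for slope in (-12.0, -18.0):
        print(slope, round(envelope_level_db(target_hz, slope), 1))
```

Under these assumptions the source with the −12 dB/octave envelope delivers roughly 9 dB more energy at 2800 Hz than the source with the −18 dB/octave envelope, before any vocal tract resonance is applied.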
6. The origin of the upper singing formant

Bartholomew [10] suggested that the upper singing formant had its origin in the larynx, by acoustic reflection between the slit between the stretched cords (the glottis) at the bottom of the laryngeal chamber, and the rim formed by the top edges of the epiglottis and the aryteno-epiglottic fold. He obtained a damped vibration of the appropriate high frequency by blowing a completely excised calf larynx, and stated that “although the shape and dimensions of the calf-larynx are different from the human one, the differences are not so marked as to invalidate the testimony” (of the results). Sundberg [28, 36, 43] very carefully investigated the singing formant, and also came to the conclusion that it had its origin in the region below the epiglottis; however, his interpretation differs from Bartholomew’s. Sundberg pointed out features of the laryngeal anatomy which could give rise to the formant. Above the vocal cords (at the bottom of the larynx tube, which is about 2 cm long in male subjects) are two folds, often called the ‘false vocal cords’. There is a cavity between these and the true cords, called the laryngeal ventricle, or the sinus Morgagni. The larynx is vertically and eccentrically inserted into the larger pharynx tube, which is closed off at its lower end by the mouth of the oesophagus, at approximately the level of the vocal cords. The low end of the pharynx tube thus surrounds the larynx tube, and is divided into two pockets, left and right of the larynx tube, called the sinus piriformes. Lateral and frontal X-ray tomograms were taken of the laryngeal region of a singer phonating the vowels ‘a’ and ‘i’ with a raised and a lowered larynx. The lowered laryngeal position corresponds to darker timbres of the voice, and is deliberately used in what is known as ‘covered’ singing: the deliberate darkening of an ‘a’ by mixing some ‘o’ with it, for example.
In the lowered laryngeal position, both the sinus piriformes and the sinus Morgagni are expanded. Such a larynx lowering also increases the length of the pharynx tube, and widens the pharynx cavity. By simulating these effects with mechanical models designed from the appropriate calculations based on the larynx anatomy, Sundberg [43] concluded that when the larynx tube opening is less than one-sixth the area of the pharynx cross-section, the larynx tube can act as a separate resonator. It is basically this separate resonator, together with a co-operative behaviour of the sinus Morgagni and the sinus piriformes, which gives rise to the higher singing formant, according to Sundberg. He concludes that a good singer needs a wide pharynx (agreed on by all good authorities on singing) and a large sinus Morgagni: the predominance of large sinus Morgagni in singers had previously been observed by Flach [44].

It is interesting to compare one of Sundberg’s conclusions with the known laryngeal behaviour of two noted singers. Sundberg [43] states: “Particularly at higher pitches where the larynx tube opening is wide, it would be essential to acquire a wide pharynx. It was mentioned that an increase in pitch is associated with an increase in larynx tube opening. This increase has to be compensated by a widening of the sinus Morgagni if the larynx tube
resonance is to remain at 2.8 kHz. A contribution to the widening required is provided by the stretching of the vocal folds constituting the bottom of the sinus Morgagni. But probably additional compensation is required as well. If so, the singer will have to lower his larynx more and more the higher the tones he sings”.* Such a laryngeal lowering with pitch increase has been verified by Ruth [45], who investigated the laryngeal positions of a number of singers by X-ray tomography. The larynx of the distinguished soprano Elizabeth Grümmer dropped 8 mm in going from C4 (~262 Hz) to ‘high’ C (~932 Hz). The following statement is quoted by Foster [46] about the tenor John McCormack: “The late Dr. Herbert Marks told me that when observing John McCormack’s throat with the laryngoscope, the larynx gradually receded as he ascended the scale until the vocal cords disappeared below the line of sight”. While nothing is said in Ruth’s work [45] about the presence of the singing formant, the laryngeal behaviour suggested by Sundberg is certainly consistent with that of distinguished singers. Hence one is inclined to accept Sundberg’s thesis as to the origin of the singing formant as correct.
7. The vibrato

The vibrato of trained singers has been the object of long and considerable study, but conclusions as to its mechanism and origin have not been uniform. Therefore this review is restricted to those works which point the way, or tend to, some measure of agreement. All studies are agreed that untrained voices generally have little or no vibrato, and that the vibrato develops in regularity and extent with training (see e.g. Seashore [47], Wolf et al. [21] and Bjørklund [23]). The (trained) vibrato usually occurs in pitch, intensity and timbre, although Tiffin [48] reported vibratos with no intensity variation. The typical frequency of the vibrato is ~6 Hz, though vibratos ranging from 5 Hz to 13 Hz have been reported [26]. The vibrato frequency is characteristic of the individual singer, though, as we shall see later, synchronisation of vibratos can occur.

On the basis of the simple model of section 2, a vibrato in pitch, total intensity and timbre could occur purely by a change in the fundamental frequency of the vibrator, all other parameters remaining constant. If the source spectrum remains the same, a vibrato in timbre and total intensity (and perhaps in pitch perception [47]) could occur if the formants (resonances) are regularly changed. In fact, the mouth-opening of certain singers (a lower-jaw wobble) is regularly correlated with a perceptible vibrato. Seashore [47], who with co-workers extensively studied the vibrato in the 1930’s, states: “There is no reason for thinking there is a single way of producing the vibrato”. In fact, Zemlin, Mason and Holstead [49] suggested that at low pitch levels, the vibrato originated at the larynx (i.e. that the vocal cords were stretched and contracted to give the pitch variation); that at high pitch levels, the vocal tract itself was modulated (i.e., the formants changed); and that possibly, at medium pitch levels, there was a combination of both mechanisms.
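The first mechanism above, an intensity and timbre vibrato arising purely from a pitch vibrato with fixed formants, can be illustrated with a small numerical sketch. It assumes a single formant modelled as a simple second-order resonance, one harmonic of the source, and a sinusoidal frequency modulation of the fundamental. The resonance formula, the 80 Hz bandwidth, the 3% modulation extent and the formant frequencies are illustrative assumptions and are not taken from the works cited.

```python
import math


def resonance_gain(freq_hz, formant_hz, bandwidth_hz):
    """Magnitude response of a simple second-order resonance (unity gain at 0 Hz)."""
    detuning = formant_hz ** 2 - freq_hz ** 2
    damping = bandwidth_hz * freq_hz
    return formant_hz ** 2 / math.sqrt(detuning ** 2 + damping ** 2)


def harmonic_level_db(t, harmonic, f0_hz, formant_hz,
                      bandwidth_hz=80.0, rate_hz=6.0, extent=0.03):
    """Level in dB of one harmonic after the resonance, at time t (seconds),
    when only the fundamental is frequency modulated."""
    f0_now = f0_hz * (1.0 + extent * math.sin(2.0 * math.pi * rate_hz * t))
    return 20.0 * math.log10(resonance_gain(harmonic * f0_now, formant_hz, bandwidth_hz))


def level_swing_db(harmonic, f0_hz, formant_hz, rate_hz=6.0):
    """Level at the pitch maximum minus level at the pitch minimum."""
    t_max = 1.0 / (4.0 * rate_hz)  # quarter of a vibrato cycle: pitch maximum
    t_min = 3.0 / (4.0 * rate_hz)  # three quarters of a cycle: pitch minimum
    return (harmonic_level_db(t_max, harmonic, f0_hz, formant_hz, rate_hz=rate_hz)
            - harmonic_level_db(t_min, harmonic, f0_hz, formant_hz, rate_hz=rate_hz))


if __name__ == "__main__":
    # Second harmonic of a 220 Hz tone, just below and just above a formant.
    print(round(level_swing_db(2, 220.0, 500.0), 2))
    print(round(level_swing_db(2, 220.0, 400.0), 2))
```

In this sketch a harmonic lying below the formant peak grows stronger as the pitch rises, while a harmonic lying above the peak grows weaker, so the level of a harmonic can move either with or against the pitch cycle without any change in the vocal tract.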
* Emphasis added (G.J.T.).

Mason and Zemlin [50] had found in an earlier study, using electromyographical methods (the electrical study of muscle contraction via the action potential), that the action of the cricothyroid muscle, which lengthens the vocal cords, had an in-phase relationship with the vibrato. This was borne out by the later work [49]. Many studies (see e.g. Seashore [47], Stanley [51], Wolf et al. [21]) have shown that trained singers make changes of pitch and/or intensity “on the vibrato”; that is, on the maximum or minimum of the pitch cycle. In some early studies by Shoen [51] and Stanley [29], the maxima of both the pitch cycle and of the intensity cycle were found to coincide. However, Sacerdote [29] found professional subjects in which the intensity increase was accompanied by a pitch decrease, such that the two cycles were 180° out of phase. No other phase differences were observed. On the basis of deducing the 180° phase difference, Weiss postulated that the vibrato was due to the contrary action of the muscles of inspiration and of expiration during controlled expiration: however, the muscle action potential recorded over a respiratory muscle at the 8th intercostal space* by Mason and Zemlin [50] revealed no activity correlated with the vibrato. Nor was there any such correlation in the action of an expiratory muscle. But E.C. Smith [52] found that “When the subglottic air pressure within the trachea is directly affected by expiratory muscles, controlled pressure applied to four expiratory muscles does result in an alteration of the ratio of the muscle action potential to the rate of vocal vibrato during the act of singing” (the respiratory muscles referred to were those of the stomach). Fournier [53] and Deutsch and Clarkson [54] have proposed that the vibrato is a means of automatic regulation, i.e. part of a ‘control loop’, of which the ear, which almost exclusively controls the voice, is the sensor. Both Sacerdote [29] and M. Smith [55] report experiments in which the subject is fed an
audio signal loud enough to mask the ‘internal’ signal of the voice in the ear. The rate and extent of vibrato are affected by experiments in delayed feedback [29], and other techniques which alter pitch-discrimination, such as “straight tone” (no vibrato) feedback [55], filtered white noise, or a frequency-modulated tone whose rhythm is significantly different from the singer’s natural vibrato [29].
In all cases, the singer’s vibrato is affected, being either reduced, or becoming less regular and broader. If the frequency-modulated signal is too different in rhythm from the natural one, the singer is confused, and attempts to sing independently. If the rhythm is fairly close to the natural rhythm, synchronization occurs. It may well be that the pleasing or less pleasing quality of harmony in a vocal duet, for example, depends upon whether or not the vibratos of the singers synchronise. This does not appear to have been investigated. Sacerdote [29] points out that the “control loop” theory is consistent with what is known about the time of delayed speech feedback, about 0.15 sec, corresponding to 6.6 Hz. Hearing persistence is about 1/7 sec. The average vibrato rate is about 6 Hz, though 5 to 13 Hz have been reported [26]. Slower vibrato
rates are sensed as disagreeable [22,29]. It is also interesting that the singer’s breath flow decreases about 10% if the vibrato is suppressed [58]. In vibrato, the intrinsic laryngeal musculature and respiratory musculature are working and resting. In ‘straight-tone’ production, the musculature is constantly working. Such a condition would promote a greater average glottal resistance than occurs in vibrato and could therefore account for the difference in airflow rate [58]. The development of vibrato is nevertheless linked to the development of breath control, and the ‘freeing’ of the larynx which this promotes. The vibrato rhythm is also a possible α-rhythm frequency for the brain, and studies are in progress [56] to determine whether there is any correlation or synchronism between α-rhythm and vibrato rate.

* Starting from the top (first) rib, the first intercostal space will occur between the first and second ribs, and so on.

8. Conclusions and discussion

The first conclusion that can be drawn from all the preceding is that the schematic linear model which has served so well for the analysis and synthesis of speech also serves for the analysis of the singing voice in the majority of respects, when the appropriate changes are made (e.g., the addition of the upper singing formant).
We may draw the following conclusions about the effects of training on the singing voice.

(1) The lungs and the thoracic and abdominal musculature come under better control; the efficiency of operation increases, and a physical enlargement of the lungs and chest can result, as well as an increase in vital capacity. In terms of our model, the ‘bellows’ become larger, more efficiently operated, and under better control.

(2) The vocal cords become more efficient, and the valvular action either in or below the larynx also changes in such a way as to increase the efficiency of the source. (Longer, complete, closure phase for the cords: ability to radiate a higher sound level for the same air flow.) This more efficient source also becomes frequency modulated; the modulation is to some extent controllable, and can be eliminated at will. Synchronization with external frequency-modulated sound is also possible, and probably occurs unconsciously. More work needs to be done in this area.

(3) The trained singer becomes capable of enlarging and lengthening his lower pharynx, and the sinus Morgagni (laryngeal ventricles) become larger. The soft palate also tends to be lifted.* Thus the whole of the vocal tract volume increases, the walls become somewhat stiffer, and therefore the whole system becomes more resonant (the losses decrease). The vowel formants used in speech serve quite well to describe sung vowels, except: (a) when the fundamental pitch rises above the first formant; here, the jaw is dropped to raise the first formant towards the fundamental frequency; (b) very low notes appear to be ‘nasalized’ (at least for basses), and more work needs to be done on the role of the paranasal, nasal and frontal sinuses in singing voice production. In this regard, it is interesting to note that children with a history of continual head colds are often thought to be ‘tone deaf’: when the appropriate high frequency feedback is externally applied, they can be taught to sing in tune. Thus, if nothing else, the sinuses form a part of the feedback network, and also part of the necessary sensory network to monitor correct singing voice production.

(4) The fact that Sundberg’s [43] identification of the larynx as the source of the singing formant allows the prediction of actual laryngeal behaviour in notable singers tends to confirm this thesis. Perhaps more work needs to be done here also.

(5) It would seem that the vibrato originates as part of a feedback control mechanism; frequency modulation at the laryngeal level is involved, and also perhaps modulation of the internal (as opposed to the obvious external) vocal tract parameters. There is room for further work on the topic.

While this review was written mainly with physicists in mind, it is perhaps well to ask, what conclusions can be drawn from this study for the singer or, more importantly, the aspiring singer? The first is that any singer or teacher of singing who ignores breath control, proper physical support of the breath, and making the larynx more efficient by ensuring a proper closure of the vocal cords, does so at his peril. The second is that the lowering of the larynx, the widening of the lower pharynx, and the dropping of the lower jaw are necessary acoustically for the proper production of high notes. Finally, it is clear that the trained singer has gained (or should have gained!) control over a most complex and marvellous instrument, which this discussion has necessarily dissected and simplified. One can only marvel at the beauty and the control of such singers as (for example) Montserrat Caballé and Tito Gobbi, to name only two of the many illustrious singers whom we can be privileged to hear, and whose technique can be said to derive from Renaissance Florence.
* This action lowers the larynx.
9. Acknowledgements

The author is indebted to Mr. R.G. Turner, Physics Department, Monash University, Clayton, Victoria, Australia, for many helpful discussions. Grateful thanks are due to Prof. R. Pratesi, Director of the Laboratorio di Elettronica Quantistica del Consiglio Nazionale delle Ricerche, Via Panciatichi 56/30, Florence, Italy, where this article was written ‘in santa pace’ during a period of study leave. Without the continual stimulation and encouragement of Miss Joan Arnold, Director of the Melba Memorial Conservatorium of Music, Abbotsford, Victoria, Australia, this study would not have been made. Maestro L. Sifônia, of the Conservatorio Statale “Luigi Cherubini”, kindly looked at part of this manuscript and gave advice. I am greatly indebted to Professor J. Sundberg, of the Department of Speech Communications, Royal Institute of Technology, Stockholm, Sweden, for most generous correspondence, and for reading a draft version of this article, and making comments: I am sure any differences could be cleared up by a speech communication session! The MS was typed by Gabriella Vagnarelli with considerable patience, kindness and understanding.
Fig. A1. Schematic side view of the vocal tract and associated features: 1, Frontal sinus; 2, Turbinates (conchae); 3, Sphenoidal sinus; 4, Naso-pharynx; 5, Hard palate; 6, Soft palate and uvula; 7, Lower jaw bone; 8, Tongue; 9, Hyoid bone; 10, Vertebrae; 11, Epiglottis; 12, Oesophagus; 13, Larynx and upper trachea; 14, Thyroid cartilage (“Adam’s apple”); 15, True vocal cords (‘folds’); 16, False vocal cords.
Fig. A2. Schematic positions of 1, frontal sinuses; 2, maxillary sinuses; 3, ethmoidal sinuses.
Fig. A3. Schematic view of larynx from the front, showing 1, jaw level; 2, thyroid cartilage; 3, false vocal cords; 4, laryngeal ventricles or ‘sinus Morgagni’; 5, true vocal cords; 6, trachea.
Appendix

Three schematic diagrams to aid the understanding of the vocal instrument are given here: they are very much self-explanatory. More detailed diagrams and discussion will be found, for example, in Vennard [18] or Manén [20]. A comment is in order with regard to the frontal, ethmoidal and maxillary sinuses. These communicate directly with the nasal cavities by means of small holes, and also with each other. It is extremely difficult to show their three-dimensional structure by means of line drawings. Though almost symmetrically placed with respect to the median line, the left-hand ones are not necessarily equal in size or shape to the right-hand ones. Three frontal sinuses may even occur in some subjects.
References

[1] See, e.g., D. Galliver, The vocal technique of Caccini, in: Poesia e musica nell’estetica del XVI e XVII secolo, p. 17; or, e.g., Oxford Companion to Music.
[2] Stillman Drake, Scientific American 232 (1975) 98.
[3] G. Caccini, Le nuove musiche, collez.; Le musiche antiche, Raccolta Nazionale (1919), Istituto Edit. Italiano, Milano.
[4] Joseph Warton, Works of Alexander Pope (1797).
[5] Rev. William Stukely, Diary (1720) 18 April.
[6] M. Garcia, Proc. Roy. Soc. (London) VII (1855) 13.
[7] M. Garcia, Traité Complet de l’Art du Chant (Heugel et Cie., Paris, 1911).
[8] C. Wheatstone, London and Westminster Review, October 1837.
[9] H.L.F. Helmholtz, On the Sensations of Tone (Tr. A.J. Ellis) (Dover, New York, 1954) p. 105.
[10] W.T. Bartholomew, Journ. Acous. Soc. Am. 6 (1934) 25.
[11] Lecture notes accompanying Bell Telephone Laboratories’ film “High speed pictures of the human vocal cords”, Bureau of Publication, 463 West St., New York, 1940; D.W. Farnsworth, Bell Labs. Record 18 (1940) 203.
[12] W.W. Fletcher, A study of internal laryngeal activity in relation to vocal intensity, Ph.D. Thesis (1950) Northwestern University, Evanston, Illinois, U.S.A.
[13] T. Chiba and M. Kajiyama, The vowel, its nature and structure (Phonetic Society of Japan, Tokio, 1958).
[14] G. Fant, Acoustic theory of speech production (’s Gravenhage, Mouton; Paris, 1960).
[15] J.L. Flanagan, Speech analysis, synthesis and perception (2nd ed., Berlin, Springer, 1972).
[16] J. van den Berg, Journ. Speech and Hearing Res. 1 (1958) 227.
[17] J.L. Flanagan, in: Auditory Analysis and Perception of Speech, eds. G. Fant and M.A.A. Tatham (London, Academic Press, 1975).
[18] W. Vennard, Singing: the mechanism and the technic (C. Fischer, New York, 1967).
[19] G.E. White, Sinus Tone Production (Tuplinger, New York, 1970).
[20] Lucie Manén, The Art of Singing (Faber Music Ltd., London, England, 1974).
[21] S.K. Wolf, D. Stanley and W.J. Sette, Journ. Acous. Soc. Am. 6 (1953) 255.
[22] J. Seymour, Acustica 27 (1972) 203.
[23] A. Bjørklund, Journ. Acous. Soc. Am. 33 (1961) 575; see also C.S. McGinnis, M. Elmick and M. Kraichman, Journ. Acous. Soc. Am. 23 (1951) 440.
[24] B. Stout, Journ. Acous. Soc. Am. 10 (1938) 137.
[25] S.N. Rzhevkin, Soviet Physics Acoustics 2 (1956) 215.
[26] D.A. Weiss, Wiener Med. Wochenschr. 80 (1930) 35.
[27] D.B. Fry and Lucie Manén, Journ. Acous. Soc. Am. 29 (1957) 690.
[28] J. Sundberg, Acustica 32 (1975) 89.
[29] G.G. Sacerdote, Acustica 7 (1957) 61.
[30] J. Tiffin, University of Iowa Studies of Psychology of Music 1 (1932) 134.
[31] D. Stanley, Journal of the Franklin Institute 211 (1931) 405.
[32] H.J. Rubin, M. Le Cover and W. Vennard, Folia phoniat. 19 (1967) 393.
[33] W.J. Gould and H. Okamura, Folia phoniat. 26 (1974) 275.
[34] J. Large, NATS Bulletin (February/March 1971) 34.
[35] P. Ladefoged, Elements of acoustic phonetics (University of Chicago Press, Chicago, U.S.A., 1962).
[36] J. Sundberg, Folia phoniat. 25 (1973) 71; J. Sundberg, STL-QPSR No. 4 (1970) 21.
[37] I am indebted to Professor Archie McIntyre, Professor of Physiology, Monash University, Melbourne, Australia, for this and other clarifications.
[38] J.L. Flanagan, Journ. Speech and Hearing Res. 1 (1958) 99.
[39] R.L. Miller, Journ. Acous. Soc. Am. 31 (1959) 667.
[40] J. Màrtony, STL-QPSR No. 1 (1965) 4.
[41] J. Lindquist, STL-QPSR No. 1 (1970) 3.
[42] P.R. Hicks and G.J. Troup, Folia phoniat. (in press).
[43] J. Sundberg, Journ. Acous. Soc. Am. 55 (1974) 838.
[44] M. Flach, Folia phoniat. 16 (1964) 67.
[45] W. Ruth, NATS Bulletin (May 1963) 2.
[46] Roland Foster, Vocal Success (W.H. Paling and Co., Sydney, Australia, 1960) p. 61.
[47] C. Seashore, University of Iowa Studies in the Psychology of Music 3 (1936) 70, 157; Psychology of Music (McGraw-Hill, New York, U.S.A., 1938).
[48] J.H. Tiffin, Proc. Iowa Academic Science 35 (1928) 298.
[49] W.R. Zemlin, R.M. Mason and L. Holstead, NATS Bulletin (December 1971) 22.
[50] R.M. Mason and W.R. Zemlin, NATS Bulletin (February 1966) 12.
[51] D. Stanley, The Science of Voice (Carl Fischer, New York, 1929).
[52] E.C. Smith, NATS Bulletin (May/June 1970) 2.
[53] I.E. Fournier, L’Acoustique musicale. La voix (Maloine, Paris, 1953).
[54] J.A. Deutsch and J.K. Clarkson, Nature 183 (1959) 167.
[55] M. Smith, NATS Bulletin (May/June 1972) 28.
[56] K. Ng and G.J. Troup, work in progress.
[57] G. Fant, STL-QPSR No. 2-3 (1980) 17-37.
[58] J. Large and S. Iwata, NATS Bulletin 32 (1976) 42.
[59] J.L. Sundberg, STL-QPSR No. 1 (1979) 65.