ARTICLE IN PRESS Flow Glottogram and Subglottal Pressure Relationship in Singers and Untrained Voices *,†Johan Sundberg, *†Stockholm, Sweden Summary: This article combines results from three earlier investigations of the glottal voice source during phonation at varying degrees of vocal loudness (1) in five classically trained baritone singers (Sundberg et al., 1999), (2) in 15 female and 14 male untrained voices (Sundberg et al., 2005), and (3) in voices rated as hyperfunctional by an expert panel (Millgård et al., 2015). Voice source data were obtained by inverse filtering. Associated subglottal pressures were estimated from oral pressure during the occlusion for the consonant /p/. Five flow glottogram parameters, (1) maximum flow declination rate (MFDR), (2) peak-to-peak pulse amplitude, (3) level difference between the first and the second harmonics of the voice source, (4) closed quotient, and (5) normalized amplitude quotient, were averaged across the singer subjects and related to associated MFDR values. Strong, quantitative relations, expressed as equations, are found between subglottal pressure and MFDR and between MFDR and each of the other flow glottogram parameters. The values for the untrained voices, as well as those for the voices rated as hyperfunctional, deviate systematically from the values derived from the equations. Key Words: Inverse filter–Subglottal pressure–Flow glottogram–F0–Gender.
Accepted for publication March 31, 2017.
From the *Department of Speech Music Hearing, School of Computer Science and Communication, KTH, Stockholm, Sweden; and the †University College of Music Education, Stockholm, Sweden.
Address correspondence and reprint requests to Department of Speech Music Hearing, KTH, SE 10044 Stockholm, Sweden. E-mail: [email protected]
Journal of Voice, Vol. ■■, No. ■■, pp. ■■-■■ 0892-1997 © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jvoice.2017.03.024
INTRODUCTION
The glottal voice source, that is, the pulsating glottal airflow produced when the vocal folds vibrate, represents an essential aspect of both singing and speech. Moreover, by listening to the voice quality, it is generally easy to decide if a voice is hyperfunctional or not. This implies that the acoustic effects of such phonation are generated by certain voice source characteristics. Consequently, voice source analysis should allow identification of these characteristics. Vocal fold vibration can be analyzed by means of various methods. Of these, electroglottography (EGG) and high-speed imaging are commonly used. However, the voice source is only indirectly and nonlinearly related to vocal fold vibration. It can instead be derived from the radiated sound by inverse filtering, which eliminates the effects of the vocal tract sound transfer characteristics. In the past, much research has been devoted to voice source analysis by inverse filtering. Lindqvist-Gauffin1 published a set of flow glottograms representing a male speaker’s entire vocal range. A great improvement of the inverse filtering technique was achieved by analyzing the flow rather than the audio signal, thus eliminating the problematic need for large amplification at infra-frequencies.2 Holmberg and associates published studies where flow glottogram parameters for loud, middle, and soft voice phonation at low, middle, and high fundamental frequencies (henceforth F0) were analyzed in untrained female and male speakers.3,4 Sulter and Wit5 studied voice source differences associated with gender, vocal training, and subglottal pressure (henceforth PSub). They found few significant differences between trained and untrained voices, and several differences between
female and male voices. They also observed several differences associated with vocal loudness. Alku and Vilkman6 studied the effects on the voice source of phonation type in female and male voices. They found that the types were most clearly separated by parameters related to the maximum flow amplitude and the maximum flow declination rate (henceforth MFDR), that is, the negative peak of the derivative of the glottal flow. Airas and Alku7 also studied effects of phonation type and found that the types could best be differentiated by the normalized amplitude quotient (henceforth NAQ), defined as the ratio between the peak-to-peak pulse amplitude (henceforth AmplAC) and the product of period and MFDR. Ratings of phonatory pressedness have been shown to be related to a number of flow glottogram parameters.8,9 Limitations of the inverse filtering method have been studied in several articles.10–13 The results have shown that inverse filtering does not allow conclusions on the underlying glottal area functions, because of the nonlinear relationship between glottal area and glottal airflow.12,14,15 Also, it is well known that inverse filtering becomes unreliable for high F0.13 A complication in attempts to identify voice source properties that are correlated with phonatory hyperfunction or with the emotional color of an utterance is that most of these properties are strongly correlated with each other7,16; hyperfunction and emotional color show a strong correlation with a number of flow glottogram parameters. To improve the understanding of the voice source, it would be necessary to analyze these correlations. What is needed is analysis of flow glottograms produced at different F0 and with different PSub. Also, it would be desirable to examine voice samples produced with the same phonation type at different F0 and with different PSub. This condition may be difficult to meet for nonsingers, because they are likely to change phonation type as they change pitch or vocal loudness.
Classically trained singers, by contrast, should be able to vary both F0 and PSub without changing phonation type; shifting phonation type when singing more loudly or when singing high-pitched tones is generally considered a sign of poor vocal technique. Hence, internationally
recognized singers would be the optimal subject group in investigations aiming at analyzing the interrelations between flow glottogram parameters. Because inverse filtering is unreliable at high F0, baritone and bass singers are preferable as subjects. The present study combines data from three earlier flow glottogram investigations (1) of five internationally touring baritone or bass singers;17 (2) of 14 male and 15 female untrained voices;18 and (3) of male voices rated along a visual analog scale representing degree of hyperfunction.8 The effects of PSub on flow glottogram parameters are analyzed and expressed in terms of equations for the classically trained singers. The flow glottogram parameters of the untrained voices and the voices with varied degree of hyperfunction are compared with those of the singers.
METHOD
Detailed accounts of the recording procedures have been published in the earlier articles. In the investigations of professional baritone singers and of untrained female and male voices,17,18 henceforth referred to as studies 1 and 2, respectively, the subjects sang the syllable /pV/ at different F0. In study 1, five classically trained, internationally touring baritone singers sang diminuendo sequences on the syllables /pa/ and /pae/ at a low, a medium, and a high F0 (140, 200, and 280 Hz, approximately), starting from loudest. These F0 values roughly corresponded to 25%, 50%, and 75% of their professional F0 range. Syllable duration was about 500 ms. The protocol in study 2 was similar to that of study 1. The subjects were asked to pronounce strings of the syllable /pae:/ while gradually decreasing degree of vocal loudness, from loudest possible to softest possible, and keeping F0 as constant as possible. The subjects, 15 females and 14 males, all had untrained voices. At the time of the experiment, all had healthy voices.
In the investigation of the relationship between flow glottogram properties and perceived degree of hyperfunction, henceforth study 3, five female and six male subjects were asked to sing the syllable /pae/, with a hyperfunctional or a neutral type of phonation
and, for some subjects, also in flow phonation. The last mentioned type of phonation is defined as phonation with the weakest degree of glottal adduction that produces full glottal closure.8 The syllables were then rated for hyperfunctionality on a visual analog scale by an expert panel consisting of 16 speech and language pathologists, all with professional experience of voice. In studies 1 and 2, airflow was picked up by means of a Rothenberg flow mask.2 Psub was estimated from the oral pressure during the /p/ occlusions. This pressure was captured by a pressure transducer (Glottal Enterprises, Syracuse, NY) attached to a thin plastic tube inserted into the flow mask, the end of which the subject held in the corner of the mouth. Flow and pressure signals were recorded on separate tracks on a multichannel TEAC PCM recorder (TEAC, Tokyo, Japan) together with the audio signal. Calibration signals of flow, pressure, and sound level were all recorded on the same tape. For the inverse filtering, a mid-vowel section of the vowel was selected for analysis. The filter settings were adjusted manually, using as the criteria a ripple-free closed phase and a source spectrum envelope as free as possible of dips and peaks near the formant frequencies. In studies 1 and 2, a Glottal Enterprises MSIF-2 unit was used, and in study 3, a custom-made DeCap software was used (Svante Granqvist, KTH, available at www.tolvan.com, last inspected March 2017). From the flow glottograms, the following parameters were determined: period, relative duration of the closed phase (QClosed), AmplAC, MFDR, the level difference between partials 1 and 2 of the source spectrum (H1-H2), and the nondimensional NAQ, see Figure 1. 
The intersubject scatter of these different measures has been reported elsewhere.17,18 The relationship between PSub preceding the analyzed flow glottogram and the various parameters of that flow glottogram was analyzed; for each F0 and each singer, 10 approximately equidistant PSub values were identified in the diminuendo sequences on /pae/ and /pa/, and their covariation with the associated flow glottogram parameters was examined.
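The token selection described above can be illustrated with a minimal Python sketch. It assumes that "approximately equidistant" means choosing, for each of 10 equally spaced target pressures between the lowest and the highest PSub of a diminuendo sequence, the measured token closest to that target. The function name `pick_equidistant_tokens` and the nearest-neighbor rule are illustrative assumptions, not the procedure documented in the original studies.

```python
from typing import List


def pick_equidistant_tokens(psub_values: List[float], n_tokens: int = 10) -> List[int]:
    """Return indices of the measured PSub values (cm H2O) closest to
    n_tokens equally spaced targets between the lowest and highest value.

    Hypothetical helper: the nearest-neighbor rule is an assumption.
    """
    if n_tokens < 2 or len(psub_values) < 2:
        raise ValueError("need at least two targets and two measurements")
    lowest, highest = min(psub_values), max(psub_values)
    step = (highest - lowest) / (n_tokens - 1)
    chosen = []
    for k in range(n_tokens):
        target = lowest + k * step
        # Index of the measurement nearest to this target pressure.
        # With sparse data the same index may be chosen more than once.
        nearest = min(range(len(psub_values)),
                      key=lambda i: abs(psub_values[i] - target))
        chosen.append(nearest)
    return chosen


if __name__ == "__main__":
    # A made-up diminuendo: PSub falling from 30 to 3 cm H2O in 1 cm steps.
    diminuendo = [float(p) for p in range(30, 2, -1)]
    print(pick_equidistant_tokens(diminuendo))
```

The returned indices point back into the original sequence, so the flow glottogram following each selected /p/ occlusion can be looked up directly.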
FIGURE 1. Flow glottogram, its derivative, and spectrum (upper and lower right panels and left panels, respectively) illustrating the definition of flow glottogram measures. AmplAC is the peak-to-peak amplitude of the flow pulse, and H1 and H2 are the two lowest partials of the voice source spectrum.
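As a rough illustration of the measures defined in Figure 1, the following Python sketch derives the period, AmplAC, MFDR, and NAQ from one sampled period of an inverse-filtered flow signal. It assumes the input list spans exactly one period and approximates the derivative by first differences. The function name `flow_glottogram_measures` is hypothetical, and QClosed and H1-H2 are omitted because they require a closed-phase criterion and a spectrum estimate that the text does not specify.

```python
from typing import Dict, List


def flow_glottogram_measures(flow: List[float], sample_rate: float) -> Dict[str, float]:
    """Compute period (s), AmplAC (L/s), MFDR (L/s^2), and NAQ from ONE
    period of glottal flow (L/s) sampled at sample_rate (Hz).

    Illustrative sketch only: the period is taken as the list length.
    """
    n = len(flow)
    if n < 3:
        raise ValueError("need at least three samples")
    period = n / sample_rate
    # Peak-to-peak amplitude of the flow pulse.
    ampl_ac = max(flow) - min(flow)
    # First-difference approximation of the flow derivative.
    derivative = [(flow[i + 1] - flow[i]) * sample_rate for i in range(n - 1)]
    # MFDR is the negative peak of the derivative, reported as a positive number.
    mfdr = -min(derivative)
    if mfdr <= 0:
        raise ValueError("flow never decreases; MFDR is undefined")
    # NAQ: AmplAC divided by the product of period and MFDR (dimensionless).
    naq = ampl_ac / (period * mfdr)
    return {"period": period, "ampl_ac": ampl_ac, "mfdr": mfdr, "naq": naq}


if __name__ == "__main__":
    # Made-up triangular pulse at F0 = 100 Hz: slow rise, fast fall, closed phase.
    pulse = ([0.01 * i for i in range(41)]
             + [0.4 - 0.04 * k for k in range(1, 11)]
             + [0.0] * 49)
    print(flow_glottogram_measures(pulse, 10000.0))
```

Because NAQ divides by the period, two pulses with the same AmplAC and MFDR but different F0 yield different NAQ values.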
RESULTS Singer data Figure 2 shows the relationship between SPL (sound pressure level) and PSub for each of the five singers at each of the three F0 analyzed in study 1. The relationship is close to logarithmic. Moreover, the data points pertaining to the three F0 show a similar trend. For 10 cm H2O, the singers produced an SPL in the vicinity of 75 dB at 0.3 m, and a doubling of pressure produced a level increase close to 10 dB. As can be seen in the same figure, the PSub range varied considerably between the singers, singer 1 having the widest range, between 3 and 51 cm H2O, and singer 4 having the narrowest, between 6 and 26 cm H2O. This difference may reflect different singing techniques or different mechanical properties of the phonatory apparatus. SPL is heavily influenced by the first formant. Therefore, MFDR should be a more important voice source characteristic to examine, representing the amplitude of the vocal tract excitation. The relationship between MFDR and PSub for the singers is shown in Figure 3, and the equations for the trendlines for each F0 and each singer are listed in Table 1. The table also lists the ranges of PSub in terms of the ratio between the highest and the lowest pressures for a given F0. Correlation coefficients are quite high, varying between 0.966 and 0.699, the lowest value being due to outlier values obtained for the highest PSub values. Thus, most of the MFDR variation can be explained by the variation of PSub. The slope values vary among the singers, being highest for singer 1 and lowest for singer 5. These singers also produced the highest and the lowest MFDR values, respectively. In the table, the rows marked MV refer to the trendlines for all subjects’ values pooled. For the singers’ low, middle, and high
F0, a 10 cm H2O increase of PSub yielded an MFDR increase of 594, 789, and 456 L/s2, respectively. Next, the relationships between MFDR and the different flow glottogram parameters will be analyzed. This will shed some light on how the waveform of flow glottograms varies with PSub. As the PSub range differed among the singers, the values were normalized with respect to each singer’s personal pressure range at each F0. Thus, for each F0 and each singer, 10 tokens, approximately equidistantly spaced along the PSub continuum, were selected from the diminuendo sequences sung on /pae/ and /pa/. Flow glottogram parameters were then determined for each percentage decade of PSub and averaged across singers. The values thus obtained represent the mean of each flow glottogram parameter at each percentage decade of PSub. Figure 4 shows these averages of flow glottogram parameters as a function of the average of MFDR. For both AmplAC and H1-H2, the dependence on MFDR is almost identical for the three F0 levels. QClosed and NAQ show covariations that are similar, but not identical, for the low, the middle, and the high F0. As can be seen in the figure, the relationships between MFDR and these averaged flow glottogram parameters can be rather closely approximated by means of equations. These equations are listed in Table 2. R2 for NAQ at high F0 was much lower than in the other cases, owing to a few outlier data points. For the other cases, the values of R2 and η2 are quite high, varying between 0.821 and 0.996. This implies a very strong influence of MFDR on the analyzed flow glottogram parameters. It should be kept in mind that these equations refer to professional, classically trained baritone singers. They show how flow glottogram parameters are interrelated in voices that can
FIGURE 2. Mean SPL of the vowels /a/ and /ae/ as a function of subglottal pressure for the low, the middle, and the high F0 (open squares, circles, and triangles, respectively) as produced by the five singers.
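The approximate relationship reported in the text for Figure 2 (about 75 dB at 0.3 m for 10 cm H2O, and close to 10 dB per doubling of pressure) can be written as a small Python function. This is a sketch of that rule of thumb only. The function name and default parameters are illustrative, and individual singers deviate from it.

```python
import math


def spl_from_psub(psub_cm_h2o: float,
                  spl_at_10: float = 75.0,
                  db_per_doubling: float = 10.0) -> float:
    """Approximate SPL (dB at 0.3 m) for a given subglottal pressure.

    Defaults follow the approximate singer values quoted in the text;
    they are not fitted constants.
    """
    if psub_cm_h2o <= 0:
        raise ValueError("pressure must be positive")
    # Each doubling of PSub relative to 10 cm H2O adds db_per_doubling dB.
    return spl_at_10 + db_per_doubling * math.log2(psub_cm_h2o / 10.0)


if __name__ == "__main__":
    for pressure in (5.0, 10.0, 20.0, 40.0):
        print(pressure, "cm H2O ->", spl_from_psub(pressure), "dB")
```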
FIGURE 3. MFDR, multiplied by −1.0, for the vowels /a/ and /ae/ as a function of PSub. Left, middle, and right panels refer to low, middle, and high F0. Top, middle, and bottom rows of panels show data for singers, and for male and female untrained voices, respectively. The equations refer to the trendlines, shown as the dashed lines.
be assumed to retain the same type of phonation in the F0 range and PSub range considered. In other words, deviations from the equations in Table 2 may reflect deviations from constant type of phonation. This possibility will be explored in the following section. Nonsinger data As mentioned, study 2 concerned voice source properties in 15 females and 14 males, none of whom had had any formal voice training. A protocol similar to that analyzed in study 1 was recorded (/pV/ strings on one pitch with diminuendo from loudest possible to softest possible). In Figure 3, MFDR for these voices is plotted as a function of PSub in the same manner as for the singers. For the low and middle F0, the female voices had MFDR ranges that were narrower than for the singers. For the high F0, the pressure ranges for the untrained voices were much narrower than for the singers. Thus, at high F0, the singers could use much higher pressures and produced higher MFDR values than the untrained voices.
The figure also shows the trendlines for these relations for the entire group of subjects. The trendline data for the individual voices can be compared with those of the singers in Table 1. The correlation coefficients were comparable with those of the singers’, although those of the untrained female voices were actually somewhat higher than those of the singers. The slope values for the untrained male voices were similar to those of the singers, but lower for the untrained female voices’ low and middle F0. The PSub range ratios for the singers were between 7 and 8, whereas for the female and male untrained voices, they were only between 4 and 5. Thus, the singers used clearly wider PSub ranges. The flow glottogram data for the untrained voices were processed in the same way as for the singers. Thus, after determining the PSub, normalized with respect to each subject’s individual PSub range at the respective F0 and expressed in percent of this range, the flow glottogram parameters were averaged across subjects for each percent decade. The averages thus obtained were compared with the trendline values at the nonsingers’ respective
TABLE 1. Factors, Intercepts, and Correlations (Slope, Icpt, and r) for the Trendlines for MFDR as a Function of PSub Found for the Singers and for the Untrained Female (F) and Male (M) Voices at Low, Middle, and High F0
Slope in L/s2/cm H2O; Icpt in L/s2; Ratio = PSub range ratio.

            Low F0                         Middle F0                      High F0
Subject     Slope   Icpt   r      Ratio    Slope   Icpt   r      Ratio    Slope   Icpt   r      Ratio
Singer 1    131.0   −260   0.980   8.3     121     −528   0.966   7.1     55.4    −406   0.734   7.5
Singer 2     46.2     −7   0.820   5.8      56     −266   0.918   5.9     47.2    −370   0.886   8.3
Singer 3     55.7     70   0.818   4.6      55      −89   0.963   6.7     41.6     −65   0.893   6.0
Singer 4     51.4    355   0.903   7.3      86     −121   0.942   5.7     58.2     −63   0.909   6.7
Singer 5     31.3    281   0.699  14.1      55      −58   0.851   9.6     32.4      52   0.893   6.8
MV           63.1     88   0.844   8.0      75     −212   0.928   7.0     47.0    −170   0.863   7.1
SD           39.0    244   0.105   3.7      29      194   0.047   1.5     10.5     205   0.073   0.9
F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 MV SD
100,4 125,5 77,2 40,5 67,6 95,5 44,5 64,3 51,7 53,5 58,7 58,5 71,4 86,3 75,4 71,4 23,0
−363 −477 0 −8 −195 −241 −33 −135 −43 −63 −217 −23 −137,9 −446 −166 −170 156
−129 −239 0 28 2 −154 −45 −333 −34 −127 29 −230 −108 −218 72 −99 119 −188 40 0 −537 −139 229 −360 −309 −174 65 −298 −176 −180 −190 −158 195
108,3 87,6 119,6 96,9 58,1 126,6 108,9
−486 −151 −380 −327 113 −465 −364
0,961 0,908 0,973 0,946 0,955 0,931 0,963 0,962 0,976 0,957 0,823 0,958 0,959 0,977 0,768 0,934 0,060 7,4 0,949 0,908 0,967 0,975 0,846 0,970 0,958
3,6 4,0 5,5 4,8 7,4 4,2 6,4 3,7 5,5 3,1 2,4 5,4
79 42 119 145 83 77 138 89 71 55 124 69 78 79 89 31
4,5 4,6 3,5 6,2 5,1 4,1 7,8 7,1 5,5 2,9 3,7 3,4 4,1 6,3 5,0 4,9 1,4 3 6,3 3,3 3,4 3,2 6,2 7,0 4,5 3,5 5,8 8,0 6,5 3,8 5,5 7,1 5,3 1,7
−233 −143 0 −190 −77 −121 −64 −256 −148 −286 79 −58 −140 −266 131 −118 123
117 −3 0 −302 127 168 −55 −125 −46 247 −213 −150 −61 −108 −29 153
0,884 0,923 0,953 0,855 0,908 0,973 0,953 0,970 0,955 0,878 0,768 0,976 0,974 0,983 0,799 0,917 0,067 7,8 0,967 0,644 0,942 0,975 0,935 0,892 0,967 0,951 0,986 0,879 0,954 0,944 0,948 0,966 0,925 0,086
71,5 38,2 49,8 61,4 56,5 55,3 36,1 62,2 49,3 57,6 12,2 42,9 42,9 36,3 30,3 46,8 15,0
37,2 95,4 100,8 137,1 56,9 53,8 85,5 73,4 60,3 10,4 97,6 56,0 64,0 64,4 70,9 31,0
2,7 4,4 3,2 6,3 3,8 3,8 7,3 4,9 3,9 5,6 3,3 7,2 7,2 5,6 3,4 4,8 1,6 3 5,2 3,7 3,3 2,5 6,4 6,1 3,3 2,7 3,7 5,1 3,1 3,5 1,9 4,0 3,9 1,3
56 78 87 37 55 65 34 85 51 52 24 75 51 52 32 56 19
M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 MV SD
0,945 0,944 0,980 0,942 0,993 0,975 0,975 0,978 0,959 0,975 0,983 0,953 0,976 0,958 0,962 0,967 0,016 7,3 0,857 0,883 0,961 0,984 0,944 0,831 0,911 0,872 0,808 0,554 0,961 0,972 0,948 0,975 0,890 0,112
63,0 131,4 94,4 63,9 101,0 46,9 92,8 27,4
−207 −309 −180 −85 −366 −99 −254 172
0,990 0,902 0,968 0,958 0,961 0,937 0,945 0,039
6,2 5,7 4,8 5,7 8,7 2,6 5,2 1,6
Itcpt [L/s2]
3,7 4,3 4,6 1,3 2 3,8 5,7 5,8 3,2 6,5 5,2 3,5
The PSub range ratio is the ratio between the highest and the lowest PSub value that the singer used for the respective F0. MV and SD refer to mean and the standard deviation of the subjects’ data pooled.
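The trendlines in Table 1 have the form MFDR = Slope × PSub + Icpt. The Python sketch below shows how a slope, an intercept, and a correlation coefficient of this kind can be obtained by ordinary least squares. The text does not state which fitting routine was used, so ordinary least squares is an assumption, and the function names are illustrative. The example reuses the Singer 1 low-F0 values (slope 131.0, intercept −260) as read from the table.

```python
import math
from typing import List, Tuple


def mfdr_from_psub(psub: float, slope: float, intercept: float) -> float:
    """Evaluate a linear trendline: MFDR (L/s^2) for PSub (cm H2O)."""
    return slope * psub + intercept


def fit_trendline(psub: List[float], mfdr: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares line through (PSub, MFDR) pairs.

    Returns (slope, intercept, Pearson r). OLS is an assumed method.
    """
    if len(psub) != len(mfdr) or len(psub) < 2:
        raise ValueError("need two equally long lists with at least two points")
    n = len(psub)
    mean_x = sum(psub) / n
    mean_y = sum(mfdr) / n
    sxx = sum((x - mean_x) ** 2 for x in psub)
    syy = sum((y - mean_y) ** 2 for y in mfdr)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(psub, mfdr))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in one of the variables")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Synthetic points lying exactly on the Singer 1 low-F0 trendline.
    pressures = [5.0, 10.0, 20.0, 30.0]
    values = [mfdr_from_psub(p, 131.0, -260.0) for p in pressures]
    print(fit_trendline(pressures, values))
```

The PSub range ratio in the table is simply the highest pressure divided by the lowest pressure used at a given F0, so it needs no fitting.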
MFDR values. Figure 5 shows how much the nonsinger data deviated from the singers’ trendlines. The data are plotted as a function of MFDR. The data for the untrained voices show a covariation with MFDR, which is similar to that found for the singers. The untrained males produced higher AmplAC values than the singers,
whereas the females produced much lower values. The H1-H2 parameter was lower for the untrained voices’ lowest and middle F0, but larger for their high F0, particularly for the females. QClosed for low and middle F0 tended to be somewhat higher than for the singers, but for the high F0 it was considerably lower. With regard to the NAQ parameter, the values for the untrained voices are higher than for the singers, particularly for high F0, whereas for the low F0 the differences are small.
FIGURE 4. The indicated flow glottogram parameters, averaged across the five singers, and plotted as a function of MFDR for low, middle, and high F0 (squares, circles, and triangles). The equations pertain to the trendlines.
Data for hyperfunctional voices
As the stimuli in study 3 were produced with widely and voluntarily varied degree of pressedness but with uncontrolled F0, comparison with the singers’ averages was meaningful only for the two flow glottogram parameters that showed no variation with F0, that is, AmplAC and H1-H2. From the measured MFDR values of each stimulus in study 3, an AmplAC value and an H1-H2 value were computed. Then, the differences between the values measured in study 3 and the values derived from the trendlines were plotted as a function of the median ratings of hyperfunction. Only the AmplAC values showed a somewhat systematic variation with the ratings (correlation: 0.583), see Figure 6. Thus, as compared with the singers’ average curve, voice samples rated as low in hyperfunction had higher AmplAC, whereas the samples rated as more hyperfunctional had lower AmplAC.
DISCUSSION As was shown in studies 1 and 2, PSub heavily influences the voice source, and the purpose of those studies was to analyze this influence. There, PSub was given in terms of the relative excess pressure, which relates PSub values to the phonation threshold pressure.19 In the present investigation, PSub was either given as raw values or normalized with respect to each singer’s pressure range at the F0 concerned. Using raw PSub allows direct comparison with some other investigations. Figure 2 showed that for low, middle, and high F0, the singers produced a mean SPL@ 0.3 m of 76, 76, and 78 dB with 10 cm H2O, and doubling of PSub yielded an average increase of 10, 10, and 9 dB. These values are similar to those reported for untrained voices.20,21 However,
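TABLE 2. Equations and Determination Coefficients (R2 or η2) for the Trendlines Approximating the Relationship Between Mean MFDR and the Mean of the Indicated Flow Glottogram Parameters Found for the Singers’ Low, Middle, and High F0

AmplAC, low, middle, and high F0: $\mathrm{Ampl_{AC}} = -1\times10^{-7}\,\mathrm{MFDR}^{2} + 0.0006\,\mathrm{MFDR} + 0.059$ (R2 = 0.964)
H1-H2, low, middle, and high F0: $\mathrm{H1\text{-}H2} = 551.1\,\mathrm{MFDR}^{-0.62}$ (R2 = 0.904)
NAQ, high F0: $\mathrm{NAQ} = 1.0453\,\mathrm{MFDR}^{-0.28}$ (R2 = 0.563)
NAQ, mid F0: $\mathrm{NAQ} = 1.2283\,\mathrm{MFDR}^{-0.341}$ (R2 = 0.898)
NAQ, low F0: $\mathrm{NAQ} = 1.3132\,\mathrm{MFDR}^{-0.393}$ (R2 = 0.821)
QClosed, high F0: $Q_{\mathrm{Closed}} = 0.47 - e^{-0.0036\,\mathrm{MFDR} + 0.001}$ (η2 = 0.991)
QClosed, mid F0: $Q_{\mathrm{Closed}} = 0.50 - e^{-0.0036\,\mathrm{MFDR} + 0.001}$ (η2 = 0.987)
QClosed, low F0: $Q_{\mathrm{Closed}} = 0.53 - e^{-0.0036\,\mathrm{MFDR} + 0.001}$ (η2 = 0.996)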
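The equations of Table 2 can be collected into a few Python functions, which also makes the comparison with other voices (measured value minus the singers' trendline value at the same MFDR) explicit. This is a sketch under stated assumptions: the H1-H2 and NAQ entries are read as power laws of MFDR, the AmplAC entry as a second-order polynomial with a coefficient of −1·10^−7, and MFDR is taken as a positive number in L/s2. All function and variable names are illustrative.

```python
import math

# Coefficients as read from Table 2 (power-law reading is an assumption).
NAQ_COEFFS = {"low": (1.3132, -0.393), "mid": (1.2283, -0.341), "high": (1.0453, -0.28)}
QCLOSED_OFFSETS = {"low": 0.53, "mid": 0.50, "high": 0.47}


def ampl_ac(mfdr: float) -> float:
    """Singers' trendline for AmplAC (same for all three F0 levels)."""
    return -1e-7 * mfdr ** 2 + 0.0006 * mfdr + 0.059


def h1_h2(mfdr: float) -> float:
    """Singers' trendline for H1-H2 in dB (same for all three F0 levels)."""
    return 551.1 * mfdr ** -0.62


def naq(mfdr: float, f0_level: str) -> float:
    """Singers' trendline for NAQ at 'low', 'mid', or 'high' F0."""
    factor, exponent = NAQ_COEFFS[f0_level]
    return factor * mfdr ** exponent


def q_closed(mfdr: float, f0_level: str) -> float:
    """Singers' trendline for QClosed at 'low', 'mid', or 'high' F0."""
    return QCLOSED_OFFSETS[f0_level] - math.exp(-0.0036 * mfdr + 0.001)


def deviation_from_singers(measured: float, predicted: float) -> float:
    """Positive when the measured value lies above the singers' trendline."""
    return measured - predicted


if __name__ == "__main__":
    mfdr_value = 1000.0  # arbitrary example value in L/s^2
    print(ampl_ac(mfdr_value), h1_h2(mfdr_value),
          naq(mfdr_value, "high"), q_closed(mfdr_value, "mid"))
```

Under this reading, QClosed saturates toward its F0-specific offset as MFDR grows, which matches the saturation described for Figure 4.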
for an unknown reason, they differ from those reported by Titze and Sundberg.22 The normalized pressure values are linearly related with the relative excess pressure values. Therefore, the results presented here do not provide new information about how PSub variation affects the voice source. The main objective of the present study was to analyze how measures of the flow glottograms vary under conditions of changing PSub and at different F0. In this analysis, it seemed best to choose MFDR as the independent parameter, as it is strongly and linearly related to PSub, as was shown in Table 1, and represents the amplitude of the excitation of the vocal tract. Also, according to Alku and Vilkman,6 the glottal closing and particularly its relation to AmplAC is sensitive to phonatory hyperfunction. PSub was estimated from the oral pressure during the /p/ occlusion preceding the analyzed vowel in a diminuendo sequence of /pæ/ syllables. It is possible that the lung pressure decreased gradually during the sequence. If so, the PSub measured was somewhat higher than the pressure during the section of the vowel that was analyzed. The oral pressure decrease between adjacent /p/ occlusions was typically close to 10%, so the overestimation of PSub was probably quite small. Also, the focus of this investigation was the analysis of the relationship between flow glottogram parameters and MFDR.
FIGURE 5. Averages of the indicated flow glottogram parameters as a function of MFDR. Open and filled symbols refer to untrained female and male voices, respectively, and squares, circles, and triangles refer to low, middle, and high F0. In the upper panels, the dashed curves represent the trendlines for the singer data. In the lower panels, trendlines for the singers’ low, middle, and high F0 are represented by solid, dashed, and dotted curves, respectively.
FIGURE 6. Deviations from singers’ relationship between mean AmplAC and mean MFDR, plotted as a function of visual analog scale ratings of hyperfunction in study 3. The dashed line and the equation pertain to the trendline.
The fact that the singers’ MFDR was strongly and linearly correlated with PSub suggests that the singers mostly used PSub only for the purpose of varying vocal loudness. The same is not necessarily true for other types of phonation. If glottal adduction is increased, PSub needs to be increased, other things being equal. As can be seen in Table 1, the correlation for untrained subject M10 was quite low at low F0. The reason was that in loud phonation, his MFDR did not increase with PSub. The quantitative relations found between mean MFDR and the means of AmplAC, H1-H2, QClosed, and NAQ seem to reflect limitations of how the waveform of a flow glottogram can vary. The relation between MFDR and QClosed is an example. As shown in Figures 4 and 5, QClosed increases rapidly for low values of MFDR and saturates at high values; in other words, the closed phase grows quickly for low values of PSub. This increase most likely results from an earlier moment of vocal fold contact, that is, from a more sudden closure, or from a later onset of the open phase. However, the fact that QClosed increased with MFDR implies that the moment of vocal fold contact is happening earlier in time; a delay of the onset of the open phase would not affect MFDR. Another example is that MFDR tended to increase linearly with PSub. This means that MFDR increased also after the closed phase had ceased to expand, that is, in the upper part of the PSub range. The reason was that the AmplAC and also the pulse skewing continued to increase, demonstrating that EGG-based measures of contact quotient are insufficient to predict voice source characteristics such as MFDR and AmplAC. A third example is that the combination of decrease of QClosed and increase of MFDR was not observed. This combination would have implied that decrease of the closed phase would have been associated with an increase of pulse skewing, which apparently did not happen in these classically trained singers.
NAQ basically reflects glottal abduction; it decreases as glottal adduction is increased.23 Its decrease with increasing MFDR is expected, as it is defined as the ratio between AmplAC and MFDR, normalized with respect to period. Interestingly, this normalization seemed to cause NAQ to vary with F0, thus suggesting that the relation between AmplAC and MFDR does not vary so much with F0. This was observed also by Björkner.24 Hence, both MFDR and F0 need to be taken into account when interpreting NAQ values. The same is true also for QClosed, which is a rarely realized fact in voice research. The data for the singers and the untrained voices showed similar but not identical relationships with MFDR. A major difference was that the singers used a much wider range of PSub. This allowed them to produce higher MFDR values and hence louder sound. It should be kept in mind that the highest PSub that human adults can produce are approximately 10 times higher than those typically used for phonation. Thus, there would be no difficulty for any of the untrained subjects to produce PSub values comparable with those used by the singers. Hence, differences in maximum expiratory force cannot be the reason for the differences in PSub range. Probably, singers’ extensive practicing of loud singing may change the mechanical properties of their vocal folds such that they can be driven with higher PSub. Gender differences were found in all parameters studied. The male voices were observed to reach higher MFDR and higher
AmplAC than the female voices, thus corroborating the findings of Holmberg and associates.3,4 The low AmplAC observed for the female voices would be a consequence of their short vocal folds, because, as pointed out by Sulter and Wit,5 a short glottis, compared with a long glottis, will allow less airflow to pass. The greater H1-H2 values in the female subjects may be a register phenomenon; their mean high F0 was about 420 Hz, which typically is within the female falsetto range. Also, the low QClosed at high F0, also observed by Price,25 may be related to register. Some clear differences between the singers and the untrained voices were observed. The untrained male voices had greater AmplAC than the singers. Thus, to reach a given MFDR value, the untrained male voices needed higher AmplAC than the singers. Probably, this difference reflects a slower glottal closing in the untrained male voices. Further, for a given MFDR, QClosed showed higher values for the low F0 than for the high F0. This may be owing to vocal fold thickness26; for the same arytenoid adduction, thick vocal folds, compared with thin folds, should have a greater phase difference between the lower and the upper margins of the folds. If the phase lag is great, the opening of the glottis will be delayed by the late opening of the upper margin, and the open phase will be interrupted by the early closing of the lower margin. Also, for high F0, both male and female untrained voices reached much lower values of QClosed than did the singers. A possible reason would be that the singers produced the high F0 with slightly more adduction and thicker vocal folds than did the nonsingers. An important goal of voice source analysis is to construct a method for deriving a measure of glottal adduction from acoustic data, a key aspect of healthy phonatory behavior. The results presented here can be seen as a step toward this goal. 
However, the analysis was based on the assumption that classically trained singers have learned to vary vocal loudness and F0 without changing glottal adduction. There is no formal evidence supporting this assumption, except for the common view that in this type of vocal art, singers’ voices are not supposed to sound pressed whenever they sing loudly or in the upper part of their pitch range. This means that the reference, with which the untrained and pressed voices were compared, was perceptual rather than physiological. For example, it is possible that singers need to increase glottal adduction at high pitches to retain the same voice timbre as at lower pitches. Still, in the absence of a complete understanding of the glottal adduction mechanism, a perceptual standard seems like a reasonable alternative. Comparing hyperfunctional voices with the singer voices was feasible only with respect to AmplAC and H1-H2, because of the limited material. The results suggested that AmplAC may be revealing, provided that MFDR is taken into account. Analysis of a greater collection of hyperfunctional voices seems indicated, including also flow glottogram parameters other than those considered in the present investigation. However, under any condition, it seems necessary to take into account both F0 and MFDR in analyses of voice source properties.
CONCLUSIONS
Combining results of three independent investigations of flow glottograms suggests that
(1) There are strong interdependences between the flow glottogram parameters and the MFDR and F0 measures.
(2) For both singers and nonsingers, PSub was strongly and linearly related to MFDR, which, in turn, was strongly related to the flow glottogram parameters AmplAC, H1-H2, QClosed, and NAQ.
(3) Classically trained baritone singers can use higher PSub values at high F0 than untrained subjects.
(4) The singers’ MFDR values extend higher than those observed in untrained voices.
(5) For the singers, the relationships between MFDR and the source parameters could be expressed with equations, which for AmplAC and H1-H2 were similar for low, middle, and high F0 and which for QClosed and NAQ differed between low, middle, and high F0.
(6) Compared with the baritone singers, untrained subjects produced different values of mean AmplAC (higher for males, lower for females), mean H1-H2 (smaller for low F0, greater for high F0), mean QClosed (lower for high F0), and mean NAQ (higher for low, middle, and high F0). This suggests that in general, compared with baritones, nonsingers use less glottal adduction for high F0 (lower QClosed, higher AmplAC, greater H1-H2) and higher adduction for low F0 (higher QClosed, higher AmplAC, lower H1-H2).
(7) Compared with the baritone singers, hyperfunctional voices appear to have lower AmplAC.
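The link between MFDR, AmplAC, and NAQ that underlies several of these conclusions can be illustrated numerically. The Python sketch below is not part of the original analysis. It assumes idealized triangular flow pulses, and the function names `triangular_pulse` and `flow_glottogram_parameters`, the sampling rate, and all flow values are hypothetical. NAQ is computed according to the definition of Alku et al.,23 that is, AmplAC divided by the product of MFDR and the period.

```python
import numpy as np


def triangular_pulse(peak, n_open, n_close, n_closed):
    """One period of an idealized (hypothetical) triangular flow pulse in L/s."""
    rise = np.linspace(0.0, peak, n_open, endpoint=False)   # opening phase
    fall = np.linspace(peak, 0.0, n_close, endpoint=False)  # closing phase
    closed = np.zeros(n_closed)                             # closed phase
    return np.concatenate([rise, fall, closed])


def flow_glottogram_parameters(flow, fs, f0):
    """Return (MFDR, AmplAC, NAQ) for one period of glottal flow.

    flow: flow samples in L/s; fs: sampling rate in Hz; f0: fundamental in Hz.
    """
    flow = np.asarray(flow, dtype=float)
    derivative = np.diff(flow) * fs        # flow derivative in L/s^2
    mfdr = -derivative.min()               # magnitude of the steepest decline
    ampl_ac = flow.max() - flow.min()      # peak-to-peak pulse amplitude
    naq = ampl_ac * f0 / mfdr              # AmplAC / (MFDR * period)
    return mfdr, ampl_ac, naq


FS = 10_000   # assumed sampling rate (Hz)
F0 = 100.0    # assumed fundamental frequency (Hz), i.e. 100 samples per period

# Two pulses with the same closing slope but different closing durations.
fast_closing = triangular_pulse(peak=0.4, n_open=40, n_close=20, n_closed=40)
slow_closing = triangular_pulse(peak=0.6, n_open=30, n_close=30, n_closed=40)

for name, pulse in [("fast closing", fast_closing), ("slow closing", slow_closing)]:
    mfdr, ampl_ac, naq = flow_glottogram_parameters(pulse, FS, F0)
    print(f"{name}: MFDR={mfdr:.1f} L/s^2, AmplAC={ampl_ac:.2f} L/s, NAQ={naq:.2f}")
```

In this toy case both pulses have the same MFDR, but the pulse with the longer closing phase has a greater AmplAC and a greater NAQ. This is the pattern that the discussion tentatively associates with a slower glottal closing at a given MFDR.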
REFERENCES
1. Lindqvist-Gauffin J. The voice source studied by means of inverse filtering. STL QPSR. 1970;11:3–9.
2. Rothenberg M. A new inverse-filtering technique for deriving the glottal air flow waveform during voicing. J Acoust Soc Am. 1973;53:1632–1654.
3. Holmberg EB, Hillman RE, Perkell JS. Glottal airflow and transglottal air pressure measurements for male and female speakers in soft, normal, and loud voice. J Acoust Soc Am. 1988;84:511–529.
4. Holmberg EB, Hillman RE, Perkell JS. Glottal airflow and transglottal air pressure measurements for male and female speakers in low, normal, and high pitch. J Voice. 1989;3:294–305.
5. Sulter AM, Wit HP. Glottal volume velocity waveform characteristics in subjects with and without vocal training, related to gender, sound intensity, fundamental frequency, and age. J Acoust Soc Am. 1996;100:3360–3373.
6. Alku P, Vilkman E. A comparison of glottal voice source quantification parameters in breathy, normal and pressed phonation of female and male speakers. Folia Phoniatr Logop. 1996;48:240–254.
7. Airas M, Alku P. Comparison of multiple voice source parameters in different phonation types. In: Proceedings of the 8th Annual Conference of the International Speech Communication Association, Interspeech (2007). Antwerp, Belgium: 2007:1410–1413.
8. Millgård M, Fors T, Sundberg J. Flow glottogram characteristics and perceived degree of phonatory pressedness. J Voice. 2015;30:287–292.
9. Sundberg J, Thalén M, Alku P, et al. Estimating perceived phonatory pressedness in singing from flow glottograms. J Voice. 2004;18:56–62.
10. Rothenberg M, Zahorian S. Nonlinear inverse filtering technique for estimating the glottal-area waveform. J Acoust Soc Am. 1977;61:1063–1071.
11. Fant G. Glottal flow: models and interaction. J Phon. 1986;14:393–399.
12. Fant G. Some problems in voice source analysis. Speech Commun. 1993;13:7–22.
13. Arroabarren I, Carlosena A. Inverse filtering in singing voice: a critical analysis. IEEE Trans Audio Speech Lang Process. 2006;14:1422–1431.
14. Rothenberg M. Acoustic interaction between the glottal source and the vocal tract. In: Stevens KN, Hirano M, eds. Vocal Fold Physiology. Tokyo: University of Tokyo Press; 1980:305–328.
15. Titze IR. Nonlinear source–filter coupling in phonation: theory. J Acoust Soc Am. 2008;123:2733–2749.
16. Sundberg J, Patel S, Björkner E, et al. Interdependencies among voice source parameters in emotional speech. IEEE Trans Affect Comput. 2010;2:162–174.
17. Sundberg J, Andersson M, Hultqvist C. Effects of subglottal pressure variation on professional baritone singers’ voice sources. J Acoust Soc Am. 1999;105:1965–1971.
18. Sundberg J, Fahlstedt JE, Morell A. Effects on the glottal voice source of vocal loudness variation in untrained female and male subjects. J Acoust Soc Am. 2005;117:879–885.
19. Titze IR. Phonation threshold pressure: a missing link in glottal aerodynamics. J Acoust Soc Am. 1992;91:2926–2935.
20. Schutte HK. The Efficiency of Voice Production. Groningen: State University Hospital; 1980.
21. Björklund S, Sundberg J. Relationship between subglottal pressure and sound pressure level in untrained voices. J Voice. 2015;30:15–20.
22. Titze IR, Sundberg J. Vocal intensity in speakers and singers. J Acoust Soc Am. 1992;91:2936–2946.
23. Alku P, Bäckström T, Vilkman E. Normalized amplitude quotient for parametrization of the glottal flow. J Acoust Soc Am. 2002;112:701–710.
24. Björkner E. Musical theater and opera singing—why so different? A study of subglottal pressure, voice source, and formant frequency characteristics. J Voice. 2006;22:533–540.
25. Price PJ. Male and female voice source characteristics: inverse filtering results. Speech Commun. 1989;8:261–277.
26. Sundberg J, Högset C. Voice source differences between falsetto and modal registers in counter tenor, tenor, and baritone singers. Logop Phoniatr Vocol. 2001;26:26–36.