Journal of Phonetics (1978) 6, 69-76
An investigation of speaker height and weight identification by means of direct estimations
Norman J. Lass, Amy S. Beverly, Debra K. Nicosia and Laurel A. Simpson
West Virginia University, Morgantown, West Virginia 26506, U.S.A.
Received 19th January 1977
Abstract:
The purposes of this investigation were: (1) to determine if listeners were capable of making accurate direct estimations of speakers' heights and weights from recorded speech samples, and (2) to determine the importance of the sex of the speaker and listener in the speaker height and weight identification tasks. A standard prose passage was recorded by 30 speakers, 15 females and 15 males. A master tape containing the randomly arranged recorded readings of all speakers was played to a group of 40 judges, 20 females and 20 males, for speaker height and weight identification. All subjects participated in two experimental sessions, one for height judgments and one for weight judgments. The judges were asked to estimate the height and weight of each of the speakers on the master tape. Results indicate that listeners are capable of accurately identifying the approximate heights and weights of speakers. Moreover, the sex of the speaker and listener did not significantly affect speaker height and weight identification judgments. Implications of these findings and suggestions for future research are discussed.
Introduction
On the basis of perceptual cues obtained from recorded speech samples, listeners have been found capable of identifying various kinds of information about speakers, including their age (Ptacek & Sander, 1966; Shipp & Hollien, 1969; Ryan & Burk, 1972; Burk, Hoyer, Fey & Charlip, 1975; Hartman & Danhauer, 1975), sex (Schwartz, 1968; Schwartz & Rine, 1968; Ingemann, 1968; Coleman, 1971, 1973a, b; Lass, Hughes, Bowyer, Waters & Bourne, 1976), race (Bryden, 1968; Dickens & Sawyer, 1962; Stroud, 1956; Larson & Larson, 1966; Abrams, 1975), socioeconomic status (Harms, 1961, 1963), personality (Stagner, 1936; Eisenberg & Zalowitz, 1938; Markel, Eisler & Reese, 1967), specific identity (McGehee, 1937; Pollack, Pickett & Sumby, 1954; Compton, 1963; Voiers, 1964; Clarke, Becker & Nixon, 1966; Bricker & Pruzansky, 1966; Holmgren, 1967; Stevens, Williams, Carbonell & Woods, 1968; Clarke & Becker, 1969; Coleman, 1973c), and facial and bodily features (Lass & Harvey, 1976). The purpose of the present investigation was to determine if listeners were also capable of making accurate direct estimations of speakers' heights and weights from recorded speech samples. Another purpose was to determine if the sex of the speaker and listener is an important variable in height and weight identification tasks.
Method
Speakers
A total of 30 speakers, 15 males and 15 females, participated in the experiment. All were students at West Virginia University and ranged in age from 18 to 25 years, with a mean age of 20·8 years. The speakers had normal speech characteristics and no reported hearing difficulty.
Experimental procedures
Construction of master tape
The speakers' readings of the first paragraph of Fairbanks' (1960) The Rainbow Passage were recorded in a sound-treated room using a Nagra model IV-D tape recorder and an Altec model 681 A condenser microphone. The recorded readings were randomly arranged on a master tape, in which a pause of 8 s was inserted between each reading to allow time for listener judgments. In addition to the 30 readings, 10 were randomly selected and repeated at the end of the tape for estimation of listener reliability. Thus, the master tape contained a total of 40 readings of the same standard prose passage.
Heights and weights of speakers
The heights of all speakers were obtained by using a tape measure attached to a wall; the speakers' weights were obtained from a standard floor scale. The same tape measure and scale and identical measurement procedures were employed for height and weight determinations for all speakers in the study. The range of heights was 67·5 to 74 in for male speakers and 61·5 to 71 in for female speakers. The range of weights was 135 to 225 lb for male speakers and 96 to 165 lb for female speakers.
Experimental sessions
A total of 40 normal-hearing persons, 20 females and 20 males, served as judges in the study. All subjects participated in two experimental sessions. In one session, they were asked to determine the height of the speakers, and in another session weight judgments were made.
The order of presentation of the height and weight tasks was randomized so that one group of 15 subjects made height judgments in the first session and weight judgments in the second session, while the other group of 15 subjects judged weight in the first session and height in the second session. In all experimental sessions, the listeners heard tape recordings of speakers and were asked to make direct estimations of the heights and weights of the speakers. To provide the listeners with a perceptual frame of reference upon which to base their decisions, they were presented with the middle sentence of all readings prior to making specific judgments. The readings were presented binaurally using a Nagra model IV-D tape recorder and matched Sharpe model HA-10A headphones. All listening sessions were held in a sound-treated room at West Virginia University.
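The construction of the master tape described under "Construction of master tape" can be sketched as follows. This is only an illustrative sketch: the function name, the record fields and the seed are hypothetical, and the assumption that the 10 repeated readings came from 10 different speakers is an interpretation of the text rather than something it states.

```python
import random


def build_master_tape(n_speakers=30, n_repeats=10, pause_s=8, seed=None):
    """Return a hypothetical playback order for the master tape.

    Every speaker's reading appears once in random order, followed by
    a randomly selected subset of readings repeated at the end of the
    tape for the reliability estimate.
    """
    rng = random.Random(seed)
    speakers = list(range(1, n_speakers + 1))
    first_pass = rng.sample(speakers, k=n_speakers)  # random arrangement
    repeats = rng.sample(speakers, k=n_repeats)      # assumed distinct
    order = first_pass + repeats
    return [
        {
            "position": i + 1,
            "speaker": s,
            "pause_s": pause_s,            # pause inserted between readings
            "is_repeat": i >= n_speakers,  # True for the last 10 readings
        }
        for i, s in enumerate(order)
    ]


tape = build_master_tape(seed=1978)
print(len(tape))  # 30 readings plus 10 repeats
```

Keeping the repeats in a separate block at the end mirrors the description in the text and makes it easy to pair each repeated reading with its first presentation when reliability is computed.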
Results
Weight identification
Figures 1 and 2 contain the actual weights of the 15 female and 15 male speakers and the mean weights estimated by the 20 female and 20 male listeners. Table I contains a summary of the actual and mean estimated weights for the groups of speakers and listeners. The figures and table indicate that the differences between actual and estimated weights vary for female and male speakers and listeners. Based on average figures for the listener sex groups, male speakers' weights are overestimated by both male (mean difference between actual and
Figure 1
Actual weights of the 15 male speakers and mean estimated weights of the 20 female and 20 male listeners.
[Plot not reproduced. Horizontal axis: Speakers (1 to 15). Legend: Actual; Estimated (male listeners); Estimated (female listeners).]
Figure 2
Actual weights of the 15 female speakers and mean estimated weights of the 20 female and 20 male listeners.
[Plot not reproduced. Horizontal axis: Speakers (1 to 15). Legend: Actual; Estimated (male listeners); Estimated (female listeners).]
Table I Mean and standard deviation values for actual and estimated weights (lb) for the groups of speakers and listeners
                              Mean      S.D.
Actual weights
  Female speakers           128·47     17·39
  Male speakers             165·80     22·36
Estimated weights
  Female speakers
    Male listeners          124·79     15·26
    Female listeners        124·87     13·44
  Male speakers
    Male listeners          168·99     20·69
    Female listeners        169·24     18·15
estimated weights = +3·19 lb) and female (mean difference = +3·44 lb) listeners, while female speakers' weights are, on the average, underestimated by both male (mean difference = -3·68 lb) and female (mean difference = -3·60 lb) listeners. However, the average differences between actual and estimated weights are very small and almost identical in magnitude: the mean difference between actual and estimated weights of male speakers when judged by male and female listeners is only +3·31 lb; the mean difference between actual and estimated weights of female speakers when judged by listeners of both sexes is only -3·64 lb. An analysis of variance (Winer, 1970) was employed to determine if the observed differences between actual and estimated weights for sex of speaker and/or listener are significant or chance occurrences. Results indicated that there were no significant differences in weight judgments for the sex of the listener (F = 0·008, d.f. = 1, 38, P > 0·05) or the sex of the speaker (F = 0·94, d.f. = 1, 28, P > 0·05). To determine the relationship between actual and estimated weights, Spearman rank order correlation coefficients (Siegel, 1956) were computed for each listener across his/her weight judgments for all 30 speakers on the master tape. Table II contains a summary of the correlation coefficients. It indicates that there was a significant correlation between estimated and actual weights for each of the 40 listeners in the study. The mean correlation coefficient for male listeners was 0·62, and the range was 0·39 to 0·75. For the female listeners, the mean was 0·65, and the range was 0·52 to 0·92.
Table II Spearman rank order correlation coefficients for actual and estimated weights of all speakers for each of the 40 listeners in the study
Male listeners   Female listeners   Male listeners   Female listeners
0·64***          0·56**             0·65***          0·58**
0·55**           0·65***            0·52**           0·66***
0·62***          0·66***            0·67***          0·55**
0·66***          0·65***            0·59***          0·92***
0·57**           0·69***            0·65***          0·67***
0·75***          0·66***            0·67***          0·56**
0·65***          0·73***            0·66***          0·52**
0·63***          0·58**             0·58***          0·61***
0·61***          0·70***            0·70***          0·70***
0·39*            0·70***            0·72***          0·57**
*P < 0·05. **P < 0·01. ***P < 0·001.
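The per-listener coefficients in Table II are Spearman rank order correlations between actual and estimated weights. A minimal sketch of that computation is given below. It treats the coefficient as the Pearson correlation of the two rank vectors, with tied values given the mean of their ranks; whether the original analysis applied a correction for ties is not stated, so that choice is an assumption. The five data points are invented for illustration and are not values from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data for one listener and five speakers (not from the study).
actual_lb = [120, 135, 150, 165, 200]
estimated_lb = [125, 130, 160, 155, 190]
rho = spearman_rho(actual_lb, estimated_lb)
print(round(rho, 2))
```

In the study itself each listener would contribute one such coefficient over the 30 speakers, which is why Table II has 40 entries.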
The intra-judge reliability was determined for the 40 listeners by means of the Spearman-Brown reliability coefficient (Winer, 1970) based on the listeners' first and second judgments of the 10 repeated speech samples on the master tape. A Spearman-Brown reliability coefficient of 0·88 was obtained, indicating a satisfactorily high degree of intra-judge reliability.
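The text does not spell out how the Spearman-Brown coefficient was computed. One common reading, shown here purely as an assumption, is to correlate the first and second judgments of the repeated samples and then apply the Spearman-Brown step-up formula r_SB = kr / (1 + (k - 1)r) with k = 2. The judgment values below are hypothetical.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r, k=2):
    """Spearman-Brown step-up: reliability of k combined parallel measures."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical first and second weight estimates (lb) by one listener
# for the 10 repeated readings; these are not data from the study.
first = [120, 150, 135, 180, 165, 140, 200, 125, 155, 170]
second = [125, 145, 140, 175, 170, 150, 190, 120, 160, 165]

r = pearson_r(first, second)
reliability = spearman_brown(r)
print(round(r, 2), round(reliability, 2))
```

Under this reading, the stepped-up coefficient is always at least as large as the raw correlation between the two sets of judgments whenever that correlation lies between 0 and 1.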
Height identification
Figures 3 and 4 contain the actual heights of the 15 female and 15 male speakers and the mean heights estimated by the 20 female and 20 male listeners. Table III contains a summary of the actual and mean estimated heights for the groups of speakers and listeners. The figures and table indicate that the differences between actual and estimated heights vary for female
Figure 3
Actual heights of the 15 male speakers and mean estimated heights of the 20 female and 20 male listeners.
[Plot not reproduced. Horizontal axis: Speakers. Legend: Actual; Estimated (male listeners); Estimated (female listeners).]
Figure 4
Actual heights of the 15 female speakers and mean estimated heights of the 20 female and 20 male listeners.
[Plot not reproduced. Horizontal axis: Speakers.]
Table III Mean and standard deviation values for actual and estimated heights (in) for the groups of speakers and listeners
                              Mean      S.D.
Actual heights
  Female speakers            66·47      3·14
  Male speakers              70·87      2·13
Estimated heights
  Female speakers
    Male listeners           65·21      2·07
    Female listeners         65·03      2·37
  Male speakers
    Male listeners           70·42      3·51
    Female listeners         70·81      2·75
and male speakers and listeners. Based on average figures for the listener sex groups, male speakers' heights are underestimated by both male (-0·45 in) and female (-0·06 in) listeners, and female speakers' heights are also underestimated by male (-1·26 in) and female (-1·44 in) listeners. However, the average differences between actual and estimated heights are very small: the mean difference between actual and estimated heights of male speakers when judged
by male and female listeners is only -0·25 in, and the mean difference for female speakers is only -1·35 in. To determine if the observed differences between actual and estimated heights for sex of speaker and/or listener are significant or chance occurrences, an analysis of variance (Winer, 1970) was employed. Results indicated that there were no significant differences in height judgments for the sex of the listener (F = 0·17, d.f. = 1, 38, P > 0·05) or the sex of the speaker (F = 1·33, d.f. = 1, 28, P > 0·05). To determine the relationship between actual and estimated heights, Spearman rank order correlation coefficients (Siegel, 1956) were computed for each listener across his/her height judgments for all 30 speakers on the master tape. Table IV contains a summary of the
Table IV Spearman rank order correlation coefficients for actual and estimated heights of all speakers for each of the 40 listeners in the study
Male listeners   Female listeners   Male listeners   Female listeners
0·58**           0·59***            0·26             0·60***
0·62***          0·66***            0·55**           0·52**
0·59***          0·56**             0·54**           0·40*
0·69***          0·48**             0·58**           0·64***
0·65***          0·74***            0·59***          0·55**
0·71***          0·55**             0·57**           0·47**
0·56**           0·55**             0·69***          0·50**
0·60***          0·65***            0·47**           0·36*
0·63***          0·64***            0·61***          0·70***
0·69***          0·66***            0·53**           0·62***
*P < 0·05. **P < 0·01. ***P < 0·001.
correlation coefficients. It indicates that there was a significant correlation between actual and estimated heights for all but one of the 40 listeners in the study. The mean correlation coefficient for male listeners was 0·58, and the range was 0·26 to 0·71. For female listeners, the mean was 0·57 and the range was 0·36 to 0·74.
The intra-judge reliability was determined for the 40 listeners by means of the Spearman-Brown reliability coefficient (Winer, 1970) based on the listeners' first and second judgments of the 10 repeated speech samples on the master tape. A Spearman-Brown reliability coefficient of 0·93 was obtained, indicating a satisfactorily high degree of intra-judge reliability.
Discussion
The results of this investigation indicate that listeners are capable of accurately identifying the approximate heights and weights of speakers when presented with only their recorded speech samples. These findings corroborate those obtained by Lass & Davis (1976) in an earlier study in which listeners were given a less demanding multiple-choice response task for their speaker height and weight judgments. Moreover, the results of the judges' responses in the present investigation are considerably better than those obtained in the earlier study. The average difference for all speakers and listeners between actual and estimated heights and weights was only 0·80 in and 3·48 lb, respectively. Thus, apparently there are adequate perceptual cues in the voice which reflect, to some extent, the physical features of height and weight of speakers.
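The summary figures quoted above (0·80 in and 3·48 lb) can be recovered from the group means in Tables I and III. The sketch below shows one way to do so; the averaging scheme (mean signed difference per speaker sex, then the mean of the two magnitudes) is an assumption that happens to reproduce the reported values, not a procedure stated in the text, and the variable and function names are illustrative only.

```python
# Group means taken from Table I (weights, lb) and Table III (heights, in).
actual = {
    "weight_lb": {"male": 165.80, "female": 128.47},
    "height_in": {"male": 70.87, "female": 66.47},
}
# Mean estimates keyed by (speaker sex, listener sex).
estimated = {
    "weight_lb": {
        ("male", "male"): 168.99, ("male", "female"): 169.24,
        ("female", "male"): 124.79, ("female", "female"): 124.87,
    },
    "height_in": {
        ("male", "male"): 70.42, ("male", "female"): 70.81,
        ("female", "male"): 65.21, ("female", "female"): 65.03,
    },
}


def signed_differences(measure):
    """Estimated minus actual mean for each speaker-sex/listener-sex cell."""
    return {
        key: est - actual[measure][key[0]]
        for key, est in estimated[measure].items()
    }


def overall_absolute_difference(measure):
    """Average the listener groups per speaker sex, then average magnitudes."""
    diffs = signed_differences(measure)
    magnitudes = []
    for sex in ("male", "female"):
        cell_values = [d for (speaker, _), d in diffs.items() if speaker == sex]
        magnitudes.append(abs(sum(cell_values) / len(cell_values)))
    return sum(magnitudes) / len(magnitudes)


for measure in ("weight_lb", "height_in"):
    print(measure, round(overall_absolute_difference(measure), 2))
```

The signed cell differences also reproduce the individual values given in the Results, for example +3·19 lb for male speakers judged by male listeners and -1·44 in for female speakers judged by female listeners.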
Furthermore, the present results have shown that the sex of the speaker and listener is not an important variable in speaker height and weight identification judgments. Although a trend was evident indicating some differences in identification related to speaker and listener sex, the differences were not statistically significant.
This investigation has shown that, in general, listeners are more accurate in the identification of speaker heights than speaker weights. One possible explanation for this finding pertains to the ranges of heights and weights of the speakers in the study. The range of weights was 135 to 225 lb (a difference of 90 lb) for male speakers and 96 to 165 lb (a difference of 69 lb) for female speakers. The range of heights was 67·5 to 74 in (a difference of 6·5 in) for male speakers and 61·5 to 71 in (a difference of 9·5 in) for female speakers. Perhaps the considerably narrower range of heights of speakers also reflects the narrower range of choices available to and employed by the listeners in the study and thus may account for more accuracy in their height judgments.
The results of the present study are encouraging enough to warrant further investigation of speaker height and weight identification. Specifically, the next step needs to be an attempt to isolate and define the important acoustic cues in the voice which may reflect speakers' heights and weights. This information, along with additional evidence on other characteristics of speakers, including their age, sex, race, and specific identity, may prove very useful in a number of future theoretical and applied areas of investigation.
Papers based on this study were presented at the 93rd Meeting of the Acoustical Society of America, 7-10 June 1977, University Park, Pennsylvania, U.S.A., and at the 52nd Annual Convention of the American Speech and Hearing Association, 2-5 November 1977, Chicago, Illinois, U.S.A.
References
Abrams, A. S. (1975). Auditory cues and racial identification. Paper presented at the Annual Convention of the American Speech and Hearing Association, 21-24 November, Washington, D.C.
Bricker, P. D. & Pruzansky, S. (1966). Effects of stimulus content and duration on talker identification. Journal of the Acoustical Society of America 40, 1441-9.
Bryden, J. D. (1968). An acoustic and social dialect analysis of perceptual variables in listener identification and rating of Negro speakers. Unpublished Doctoral dissertation, University of Virginia.
Burk, K. W., Hoyer, E. A., Fey, M. & Charlip, W. S. (1975). Perceptual and acoustic correlates of aging in the female voice. Paper presented at the Annual Convention of the American Speech and Hearing Association, 21-24 November, Washington, D.C.
Clarke, F. R. & Becker, R. W. (1969). Comparison of techniques for discriminating among talkers. Journal of Speech and Hearing Research 12, 747-61.
Clarke, F. R., Becker, R. W. & Nixon, J. C. (1966). Characteristics that determine speaker recognition. Report ESD-TR-66-636, Electronics Systems Division, Air Force Systems Command, Hanscom Field, December 1966.
Coleman, R. O. (1971). Male and female voice quality and its relationship to vowel formant frequencies. Journal of Speech and Hearing Research 14, 565-77.
Coleman, R. O. (1973a). A comparison of the contributions of two vocal characteristics to the perception of maleness and femaleness in the voice. Paper presented at the Annual Convention of the American Speech and Hearing Association, Detroit, Michigan, October 1973.
Coleman, R. O. (1973b). A comparison of the contributions of two vocal characteristics to the perception of maleness and femaleness in the voice. Quarterly Progress Speech Research, Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, Sweden.
Coleman, R. O. (1973c). Speaker identification in the absence of inter-subject differences in glottal source characteristics. Journal of the Acoustical Society of America 53, 1741-3.
Compton, A. J. (1963). Effects of filtering and vocal duration upon the identification of speakers, aurally. Journal of the Acoustical Society of America 35, 1748-52.
Dickens, M. & Sawyer, G. M. (1962). An experimental comparison of vocal quality among mixed groups of Whites and Negroes. Southern Speech Journal 18, 178-85.
Eisenberg, P. & Zalowitz, F. (1938). Judging expressive movements. III. Judgments of dominance-feeling from phonograph records of voice. Journal of Applied Psychology 22, 620-31.
Fairbanks, G. (1960). Voice and Articulation Drillbook. New York: Harper and Row.
Harms, L. S. (1961). Listener judgments of status cues in speech. Quarterly Journal of Speech 47, 164-8.
Harms, L. S. (1963). Listener comprehension of speakers of three status groups. Language and Speech 4, 109-12.
Hartman, D. E. & Danhauer, J. L. (1975). Perceptual features of aging male speech. Paper presented at the 90th Meeting of the Acoustical Society of America, 3-7 November 1975, San Francisco, California.
Holmgren, G. L. (1967). Physical and psychological correlates of speaker recognition. Journal of Speech and Hearing Research 10, 57-66.
Ingemann, F. (1968). Identification of the speaker's sex from voiceless fricatives. Journal of the Acoustical Society of America 44, 1142-4.
Larson, V. S. & Larson, C. H. (1966). Reactions to pronunciation. In Communication Barriers to the Culturally Deprived (McDavid, R. I. & Austin, W. M., Eds). Washington, D.C.: U.S. Office of Education, Cooperative Research Project Number 2107.
Lass, N. J. & Davis, M. (1976). An investigation of speaker height and weight identification. Journal of the Acoustical Society of America 60, 700-3.
Lass, N. J. & Harvey, L. A. (1976). An investigation of speaker photograph identification. Journal of the Acoustical Society of America 59, 1232-6.
Lass, N. J., Hughes, K. R., Bowyer, M. D., Waters, L. T. & Bourne, V. T. (1976). Speaker sex identification from voiced, whispered and filtered isolated vowels. Journal of the Acoustical Society of America 59, 675-8.
Markel, N., Eisler, R. M. & Reese, H. W. (1967). Judging personality from dialect. Journal of Verbal Learning and Verbal Behavior 6, 33-5.
McGehee, F. (1937). The reliability of the identification of the human voice. Journal of General Psychology 17, 249-71.
Pollack, I., Pickett, J. M. & Sumby, W. H. (1954). On the identification of speakers by voice. Journal of the Acoustical Society of America 26, 403-6.
Ptacek, P. H. & Sander, E. K. (1966). Age recognition from voice. Journal of Speech and Hearing Research 9, 273-7.
Ryan, W. J. & Burk, K. W. (1972). Predictors of age in the male voice. Paper presented at the 84th Meeting of the Acoustical Society of America, 28 November-1 December, Miami Beach, Florida.
Schwartz, M. F. (1968). Identification of speaker sex from isolated, voiceless fricatives. Journal of the Acoustical Society of America 43, 1178-9.
Schwartz, M. F. & Rine, H. E. (1968). Identification of speaker sex from isolated, whispered vowels. Journal of the Acoustical Society of America 44, 1736-7.
Shipp, F. T. & Hollien, H. (1969). Perception of the aging male voice. Journal of Speech and Hearing Research 12, 703-10.
Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.
Stagner, R. (1936). Judgments of voice and personality. Journal of Educational Psychology 27, 272-7.
Stevens, K. N., Williams, C. E., Carbonell, J. R. & Woods, B. (1968). Speaker authentication and identification: a comparison of spectrographic and auditory presentations of speech material. Journal of the Acoustical Society of America 44, 1596-607.
Stroud, R. V. (1956). A study of the relation between social distance and speech differences of White and Negro high school students of Dayton, Ohio. Unpublished Master's thesis, Bowling Green State University.
Voiers, W. D. (1964). Perceptual bases of speaker identity. Journal of the Acoustical Society of America 36, 1065-73.
Winer, B. J. (1970). Statistical Principles in Experimental Design. New York: McGraw-Hill.