The Effect of Experience on Response Time When Judging Synthesized Voice Quality

*Jessica Sofranko Kisenwether and †Robert A. Prosek, *Dallas and †University Park, Pennsylvania

Summary:

Objectives/hypothesis. The purpose of this study was to determine the effect of level and type of experience on response time and the number of replays needed when judging voice quality.

Study design. This was a within-subjects group design.

Methods. Speech-language pathologists, singing voice teachers, speech-language pathology graduate students with and without experience with a voice client, graduate students who had completed a voice pedagogy course, and inexperienced listeners (n = 60) rated stimuli with systematically altered measurements of jitter, shimmer, and noise-to-harmonics ratio (NHR) on a visual analog scale ranging from mild to severe for overall severity, roughness, breathiness, strain, and pitch. Response time (in seconds) and number of replays were recorded during the experiment.

Results. Experienced listeners took the most time when rating the stimuli. Stimuli with two altered acoustical components also yielded longer response times than stimuli with one altered acoustical component. Finally, level and type of experience had some effect on the number of replays for each stimulus during the rating task.

Conclusions. Experience affects both response time and the number of replays during voice quality rating tasks. Continued research is needed on the reasons for the extended time and replays associated with experience, so as to enhance future training protocols.

Key Words: Voice perception–Experienced listener–Response time–Synthesized stimuli.
Accepted for publication May 29, 2015.
From the *Department of Speech Language Pathology, Misericordia University, Dallas, Pennsylvania; and the †Department of Communication Sciences and Disorders, The Pennsylvania State University, University Park, Pennsylvania 16802.
Address correspondence and reprint requests to Jessica Sofranko Kisenwether, Department of Speech Language Pathology, Misericordia University, 301 Lake St., Dallas, PA 18612. E-mail: [email protected]
Journal of Voice, Vol. -, No. -, pp. 1-4
0892-1997/$36.00
© 2015 The Voice Foundation
http://dx.doi.org/10.1016/j.jvoice.2015.05.017

INTRODUCTION

When controlling for stimulus length and type as well as rating scale, experience has been shown to affect judgments of voice quality.1,2 A large body of research discusses accuracy and agreement among listeners when perceiving voice quality.3–16 Although experience can affect those judgments, agreement remains moderate,3–6,17 which is often attributed to the multidimensional nature of the signal: listeners draw on many underlying variables to make their decisions. Although detailed information exists about the voice signal, its relation to specific acoustical measures, and the factors that may affect those perceptions, no studies to date have examined the time it takes to make such judgments. Because experience has been found to affect perceptions of voice quality, one may assume that listeners with extensive training or exposure are better able to focus on the underlying variables of the signal. Does this result in longer response times when judging voice quality, or does increased experience instead produce faster responses? Finally, does experience affect the number of times listeners need to replay a signal before making a judgment? The answers to these questions are unknown, and obtaining them may assist in developing appropriate training protocols for judgments of voice quality during assessment and treatment. For instance, if experienced listeners take less time than inexperienced listeners (IEs) when judging voice quality, a student clinician who takes a long time during the task may need continued training before independence could be considered. The purpose of this study was to determine the effect of level and type of experience on the response time and the number of replays needed when judging voice quality.

METHODS

Stimuli

The same stimuli used in Sofranko and Prosek (2013) were used for this study. One sample of the sustained vowel /ɑ/ with normal voice quality, obtained from a 23-year-old woman, was synthesized using the University of California, Los Angeles (UCLA) voice synthesizer,18 with a duration of 1 second and constant fundamental frequency and amplitude. The sample was judged to be "normal" on the basis of quality, pitch, and loudness by speech-language pathologists (SLPs) with experience in the area of voice and voice disorders, and it has served as an example of "normal" voice quality in many previous studies.1,2,19–22 The sample was then systematically altered by changing measurements of jitter, shimmer, and NHR, creating two sets of stimuli. The first set varied jitter and shimmer simultaneously in five evenly spaced intervals each, resulting in 25 stimuli: jitter was altered in increments of 0.75 ms (0–3 ms), and shimmer in increments of 0.5 dB (0–2 dB). The second set varied NHR in 10 evenly spaced intervals of 5 dB (50 to 0 dB), resulting in 10 stimuli. Combining the jitter/shimmer stimuli and the NHR stimuli yielded 35 total stimuli. Stimuli combining jitter, shimmer, and NHR were not generated, in an effort to control for fatigue: altering the aperiodicity and additive noise components separately reduced the number of stimuli from 250 samples to 35 samples.
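The stimulus counts above follow directly from the parameter grids. The sketch below (Python, bookkeeping only; the actual samples were produced with the UCLA voice synthesizer, and the NHR endpoints and sign convention are taken as printed in the text) enumerates the conditions:

```python
# Enumerating the stimulus conditions described above. Parameter values follow
# the text; this is bookkeeping only, not synthesis.

def jitter_shimmer_grid():
    """25 stimuli: jitter 0-3 ms in 0.75 ms steps, crossed with
    shimmer 0-2 dB in 0.5 dB steps (five levels each)."""
    jitter_ms = [i * 0.75 for i in range(5)]    # 0.0, 0.75, 1.5, 2.25, 3.0
    shimmer_db = [i * 0.5 for i in range(5)]    # 0.0, 0.5, 1.0, 1.5, 2.0
    return [(j, s) for j in jitter_ms for s in shimmer_db]

def nhr_levels():
    """10 NHR stimuli in evenly spaced 5 dB steps (endpoints and sign
    convention as printed in the text; assumed here for illustration)."""
    return [i * 5 for i in range(1, 11)]

grid = jitter_shimmer_grid()
nhr = nhr_levels()
print(len(grid), len(nhr), len(grid) + len(nhr))   # 25 10 35

# A full jitter x shimmer x NHR factorial would have been 25 * 10 = 250
# samples, which is why the two sets were kept separate to limit fatigue.
print(len(grid) * len(nhr))                        # 250
```

This makes explicit why altering aperiodicity and additive noise separately reduces the listening load by a factor of roughly seven.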

Listeners

The same listeners used in Sofranko and Prosek (2013) were used for this study. There were six groups with 10 listeners in each group (n = 60): SLPs, singing voice teachers (SVTs), speech-language pathology graduate students who had completed a voice disorder course but had not had a voice client (SLPGRADs), speech-language pathology graduate students who had completed a voice disorder course and had treated one or more voice clients (SLPGRADVs), graduate students in the music department who had completed a voice pedagogy course (SVTGRADs), and IEs.

Group 1 consisted of seven women and three men who were American Speech-Language-Hearing Association certified and state-licensed SLPs. Ages ranged from 29 to 67 years (mean [M], 45.7; standard deviation [SD], 12.92). They had 5–35 years of experience in voice disorders (M, 19; SD, 11.01) and spent 10–40 hours/week treating voice disorders (M, 23.4; SD, 12.21). Group 2 consisted of eight women and two men, ages 48–69 years (M, 59.6; SD, 6), who were tenured singing voice faculty and full members of the National Association of Teachers of Singing (NATS). Individuals holding a full membership in NATS hold either a master's degree or a Doctor of Musical Arts degree, teach an average of six or more singing voice students weekly, and have 2 years of experience.23 The criterion of tenure implies at least 6 years of full-time faculty work in which the individual mentors undergraduate and graduate students throughout their academic degrees of study. Group 3 consisted of 10 women, ages 21–24 years (M, 22; SD, 0.943), who were current graduate students in a speech-language pathology program and had completed a voice disorder course. Group 4 consisted of 10 women, ages 21–42 years (M, 26.1; SD, 6.33), who were also current graduate students in a speech-language pathology program with a completed voice disorder course, but who had additionally treated one or more voice clients in clinic, ranging from one to eight clients (M, 2.5; SD, 2.321). Group 5 consisted of six women and four men, ages 22–46 years (M, 27.9; SD, 7.4), who were current graduate students in voice pedagogy or vocal performance and had completed a voice pedagogy course in their graduate work; they taught 1–20 singing voice students weekly (M, 5.6; SD, 5.48). Finally, group 6 consisted of five women and five men, ages 24–56 years (M, 35; SD, 12.18), with no previous training in voice and/or voice disorders, including singing lessons and voice treatment. This group included individuals from various backgrounds: nursing, real estate, chemistry, culinary arts, fashion, architecture, cosmetology, engineering (mechanical and electrical), and law. All participants in all groups reported no history of hearing loss, language disorder, speech impairment, or neurologic disorder.


Procedures

Approval from the institutional review board at The Pennsylvania State University was obtained before running participants. Detailed instructions were provided before the experiment, defining the voice qualities of overall severity, roughness, breathiness, pitch adequacy, and strain according to the Consensus Auditory-Perceptual Evaluation of Voice5 and textbook definitions.24 Participants listened to the synthesized samples via noise-reduction headphones (each presented twice in random order, n = 70) and rated each voice quality on a visual analog scale ranging from mild to severe, covering a range from 1 to 1000. During this task, a time stamp in seconds and the number of times participants replayed each stimulus were recorded using Alvin2.25 Playback level was adjusted to a comfortable level for each participant individually.

RESULTS

Analysis of variance revealed significant effects of group and of stimulus type on response time during the rating task, F(5, 4199) = 5.66, P < 0.05 and F(3, 4199) = 2.76, P < 0.05, respectively. The interaction of group and stimulus type was not significant, F(15, 4199) = 0.94, P > 0.05. The Tukey honestly significant difference (HSD) criterion indicated that although there was some overlap among groups, individuals with a higher level of experience (SVTs and SLPs) took the longest when rating stimuli (Table 1). The Tukey HSD criterion also showed some spread by stimulus type; however, there was a significant difference in response time between stimuli with simultaneous alterations of jitter and shimmer and stimuli with altered NHR. Samples with simultaneously altered jitter and shimmer took the longest to rate, whereas NHR stimuli took the shortest (Table 2).

TABLE 1. Group: Tukey HSD Results for Response Time

Group       Mean (s)   Tukey grouping
SVTs        19.196     A
SLPs        18.323     A B
IEs         17.479     A B
SVTGRADs    17.438     A B
SLPGRADVs   15.169     B C
SLPGRADs    13.307     C

Note: Means that do not share a letter are significantly different.26

TABLE 2. Stimulus Type: Tukey HSD Results for Response Time

Stimulus Type        Mean (s)   Tukey grouping
Jitter and shimmer   17.857     A
Jitter only          17.069     A B
Shimmer only         16.571     A B
NHR                  15.777     B

Note: Means that do not share a letter are significantly different.26

An additional analysis of variance examined the effect of experience on the number of replays during the rating task; each participant was permitted to replay each stimulus as many times as he or she needed during the experiment. Results revealed a significant effect of group on the number of repetitions, F(5, 4199) = 31.57, P < 0.05. The post hoc Tukey HSD criterion indicated a significant difference between the SLPGRAD group and all other groups: this group used the smallest number of repetitions per stimulus to make their judgments. Compared with the IE and SVT groups, the SLPGRADVs, SLPs, and SVTGRADs were significantly different, using on average the largest numbers of repetitions throughout the experiment (Table 3).

TABLE 3. Group: Tukey HSD Results for Number of Stimulus Replays During Rating

Group       Mean     Tukey grouping
SLPGRADVs   1.9686   A
SLPs        1.9671   A
SVTGRADs    1.9586   A
IEs         1.8614   A B
SVTs        1.6400   B
SLPGRADs    0.8500   C

Note: Means that do not share a letter are significantly different.26

Finally, intrarater and interrater agreement were reported in detail in Sofranko and Prosek (2013). Overall, SVTs were the only group to differ from rating 1 to rating 2, and only for judgments of overall severity and roughness. Interrater agreement remained moderate at best for all listener groups.

DISCUSSION

Previous research has indicated that experience can affect judgments of voice quality,1,2,27 but there is currently no literature on how experience may or may not affect the time it takes to make those judgments. The purpose of this study was to determine the effect of level and type of experience on the response time and the number of replays needed when judging voice quality. Individuals with a high level of experience, regardless of type, took the most time when judging synthesized voice samples for overall severity, breathiness, roughness, strain, and pitch. Alternatively, the student groups with a speech-language pathology background and a singing background took the least time when rating. IEs fell between these experienced groups (never consistently grouped with either), which is consistent with previous research.1 In contrast to previous findings, however, response time varied with level of experience (years of exposure) rather than type of experience (a singing background vs a speech-language pathology background).

Aside from level of experience, stimulus type also affected response time. Stimuli with simultaneously altered jitter and shimmer took the most time for participants to rate compared with stimuli with only one altered acoustical component, indicating that listeners take more time when judging voice samples of a multidimensional nature. It is also noteworthy that samples with only altered NHR took the least time to rate; perhaps breathiness is a more agreed-on perceptual quality of voice than other qualities and is therefore easier to judge.

Finally, listeners were able to replay the stimuli as many times as needed to complete the experimental task. SLPs, SLPGRADVs, and SVTGRADs had nearly the same average number of replays. These groups were significantly different from the SLPGRADs, who had the fewest replays per stimulus. Again, IEs fell in the middle of the replay range. It can be hypothesized that students with little experience in the area of voice make more of a "snap" judgment than someone with more experience and more information on which to rely.

In conclusion, experience (level and type) affects not only judgments of voice quality but also the time and number of replays needed per judgment. If individuals with more experience take more time to judge voice quality, future research is needed to investigate why this may be the case; perhaps individuals with more experience listen for additional components within the signal compared with individuals without experience. Future research should focus on capturing these components within the signal so as to improve training protocols for students newly exposed to perceptual judgments of voice quality. Capturing this information could help train student clinicians in voice perception and guide their focus while listening. In addition, longer response times may be one cause of the moderate agreement levels previously found during perceptual rating tasks in the literature.
Given less time to ponder, agreement levels might increase, as each listener would have less time to rely on unstable internal standards when making his or her judgment. Because the same participants from Sofranko and Prosek (2013) were used in this study, the same limitations remain. Participants reported the absence of a hearing impairment, but hearing thresholds were not obtained before the study. Although participants were able to adjust the playback volume to a comfortable level, this is a concern because many participants were older than 30 years and their hearing thresholds may have been diminished. Given the possibility of advanced age when including expert populations, future research should include hearing screenings before listening tasks to ensure hearing thresholds are within normal limits.
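The group comparisons reported in the Results can be illustrated with a minimal one-way analysis-of-variance sketch. The response times below are synthetic illustrations for three hypothetical listener groups, not the study's data (the study used Minitab with Tukey HSD post hoc tests); the sketch only shows how the reported F ratios arise from between- and within-group sums of squares:

```python
# Hedged sketch: one-way ANOVA computed by hand on synthetic response times.
# Illustrates the F(df_between, df_within) statistics reported in the Results;
# the group names and numbers here are invented for demonstration.

def one_way_anova_F(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group sizes times squared mean offsets.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Hypothetical response times in seconds (illustration only):
experienced   = [19.1, 18.7, 19.5, 18.9]
students      = [13.2, 13.9, 13.1, 13.6]
inexperienced = [17.4, 17.8, 17.1, 17.6]

F, df_b, df_w = one_way_anova_F([experienced, students, inexperienced])
print(f"F({df_b}, {df_w}) = {F:.2f}")
```

A significant F would then be followed, as in the study, by a Tukey HSD post hoc test, whose compact letter display ("means that do not share a letter are significantly different") is what Tables 1–3 report.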

REFERENCES




1. Sofranko JL, Prosek RA. The effect of levels and types of experiences on judgment of synthesized voice quality. J Voice. 2013;28:24–35.
2. Sofranko Kisenwether JL, Prosek RA. The effect of experience on perceptual spaces when judging synthesized voice quality: a multidimensional scaling study. J Voice. 2014;28:548–553.
3. De Bodt M, Wuyts F, Van de Heyning P, Croux C. Test-retest study of the GRBAS scale: influence of experience and professional background on perceptual rating of voice quality. J Voice. 1996;11:74–80.
4. Eadie TL, Doyle PC. Direct magnitude estimation and interval scaling of pleasantness and severity in dysphonic and normal speakers. J Acoust Soc Am. 2002;112:3014–3021.
5. Kempster G, Gerratt BR, Verdolini-Abbott K, Barkmeier-Kraemer J, Hillman R. Consensus auditory-perceptual evaluation of voice: development of a standardized clinical protocol. Am J Speech Lang Pathol. 2009;18:124–132.
6. Kreiman J, Gerratt B. The perceptual structure of pathologic voice quality. J Acoust Soc Am. 1996;100:1787–1795.
7. Kreiman J, Gerratt B. Validity of rating scale measures of voice quality. J Acoust Soc Am. 1998;104:1598–1608.
8. Kreiman J, Gerratt BR, Berke GS. The multidimensional nature of pathologic voice quality. J Acoust Soc Am. 1994;96:1291–1302.
9. Kreiman J, Gerratt BR, Ito M. When and why listeners disagree in voice quality assessment tasks. J Acoust Soc Am. 2007;122:2354–2364.
10. Kreiman J, Gerratt B, Kempster G, Erman A, Berke G. Perceptual evaluation of voice quality: review, tutorial, and a framework for future research. J Speech Hear Res. 1993;36:21–40.
11. Kreiman J, Gerratt B, Precoda K, Berke G. Individual differences in voice quality perception. J Speech Hear Res. 1992;35:512–520.
12. Kreiman J, Vanlancker-Sidtis D, Gerratt B. Defining and Measuring Voice Quality. From Sound to Sense. MIT; 2004.
13. Lee M, Drinnan M, Carding PN. The reliability and validity of patient self-rating of their own voice quality. Clin Otolaryngol. 2005;30:357–361.
14. Shrivastav R, Sapienza CM, Nandur V. Application of psychometric theory to the measurement of voice quality using rating scales. J Speech Lang Hear Res. 2005;48:323–335.
15. Yiu EML, Chan KMK, Mok RSM. Reliability and confidence in using paired comparison paradigm in perceptual voice quality evaluation. Clin Linguist Phon. 2007;21:129–145.
16. Yiu EML, Ng C. Equal appearing interval and visual analogue scaling of perceptual roughness and breathiness. Clin Linguist Phon. 2004;18:211–229.
17. Karnell M, Melton S, Childes J, Coleman T, Dailey S, Hoffman H. Reliability of clinician-based (GRBAS and CAPE-V) and patient-based (V-RQOL and IPVI) documentation of voice disorders. J Voice. 2006;21:576–590.
18. Kreiman J, Gerratt BR, Antoñanzas-Barroso N. Analysis and Synthesis of Pathological Voice Quality. Los Angeles, CA: University of California; 2006.
19. Awan SN, Roy N. Toward the development of an objective index of dysphonia severity: a four-factor model. Clin Linguist Phon. 2006;20:35–49.
20. Awan SN, Roy N. Acoustic prediction of voice type in adult females with functional dysphonia. J Voice. 2005;19:268–282.
21. Awan SN, Lawson L. The effect of anchor modality on the reliability of vocal severity ratings. J Voice. 2009;23:341–352.
22. Awan SN, Roy N. Outcomes measurement in voice disorders: an acoustic index of dysphonia severity. J Speech Lang Hear Res. 2009;52:482–499.
23. National Association of Teachers of Singing. (n.d.) Membership qualifications. Available at: http://www.nats.org/index.php?option=com_content&view=article&id=137&Itemid=102. Accessed May 10, 2010.
24. Colton RH, Casper JK. Understanding Voice Problems: A Physiological Perspective for Diagnosis and Treatment. Baltimore, MD: Williams & Wilkins; 1996.
25. Hillenbrand J. Getting started with Alvin2. 2005. Available at: http://homepages.wmich.edu/hillenbr/. Accessed January 30, 2010.
26. Minitab. Minitab Statistical Software, Release 14 [Computer Software]. State College, PA: Minitab Inc; 2005.
27. Sofranko JL, Prosek RA. The effect of experience on classification of voice quality. J Voice. 2012;26:299–303.