ARTICLE IN PRESS
Comparison of Two Versions of the Acoustic Voice Quality Index for Quantification of Dysphonia Severity
*Geun-Hyo Kim, *Yeon-Woo Lee, †In-Ho Bae, ‡Hee-June Park, *Byung-Joo Lee, and §Soon-Bok Kwon
*§Geumjeong-gu, †Yangsan, and ‡South Korea
Summary: Objectives. The acoustic voice quality index (AVQI) is a specific acoustic indicator designed to objectively estimate dysphonia severity and measure the values of acoustic parameters based on the diagnostic category. This study compared the performance of two AVQI versions (2.02 and 3.01; v2 and v3) and PraatCPPS using voice samples from a Korean population. Materials and Methods. Voice samples for sustained vowel and connected speech were elicited from 2257 patients across 14 diagnostic categories. Auditory-perceptual (A-P) assessments of dysphonia severity were compared to acoustic estimates of severity derived from the two versions of the AVQI (v2 and v3) as well as the PraatCPPS. Results. The AVQI-estimated severity (v2 and v3) and PraatCPPS severity for concatenated voice samples strongly correlated with each other and were significantly associated with A-P ratings. The AVQI (v2 and v3) and PraatCPPS showed high reliability in differentiating between normal and pathological voices. Conclusion. The AVQI (v2 and v3) and PraatCPPS were strongly correlated with the A-P ratings and provided valid estimates of dysphonia severity. However, the associations of the A-P ratings with the AVQIv2 were significantly stronger than those with the AVQIv3 and PraatCPPS, suggesting that v2 outperformed v3 and PraatCPPS. Key Words: Acoustic Voice Quality Index−Voice−Dysphonia−GRBAS−CAPE-V.
INTRODUCTION
Acoustic evaluation of vocal disorders involves analyses of various parameters. These analyses can be essentially classified into time-domain−based, spectral, and cepstral analysis, and each analysis method is applied depending on the characteristics of the voice signal, such as the presence of periodicity in the voice. A measurement index called the acoustic voice quality index (AVQI), which can measure the degree of dysphonia severity, was recently introduced by integrating analyses of the sustained vowel (SV) phonation and connected speech (CS) tasks.1−10 AVQI is computed using a specific algorithm that weighs six variables (smoothed cepstral peak prominence, CPPS; harmonics-to-noise ratio, HNR; shimmer local, SL, also known as percent shimmer; shimmer local dB, SLdB, also known as shimmer in dB; as well as the slope and tilt of the regression line through the long-term average spectrum, slope dB and tilt dB) derived from time-domain−based, spectral, and cepstral analysis. These analyses are performed using the Praat script (Institute of Phonetic Sciences, University of Amsterdam, Netherlands).
Accepted for publication November 21, 2018. From the *Department of Otorhinolaryngology-Head and Neck Surgery and Biomedical Research Institute, Pusan National University Hospital, Geumjeong-gu, Busan, South Korea; †Department of Otorhinolaryngology-Head and Neck Surgery, Pusan National University Yangsan Hospital, Yangsan, Gyeongsangnam-do, South Korea; ‡Department of Speech Rehabilitation, Choonhae College of Health Sciences, Ulsan, South Korea; and the §Department of Humanities, Language and Information, Pusan National University, Geumjeong-gu, Busan, South Korea. Address correspondence and reprint requests to Soon-Bok Kwon, Department of Humanities, Language and Information, Pusan National University, Pusan, South Korea. E-mail:
[email protected] Journal of Voice, Vol. &&, No. &&, pp. &&−&& 0892-1997 © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved. https://doi.org/10.1016/j.jvoice.2018.11.013
The analysis script, when integrated into Praat, automatically estimates AVQI scores from zero to more than 10, with higher scores reflecting an abnormal voice (the pathological voice has a larger value than the normal voice). The most important feature of this process is the integrated analysis of SV and CS (concatenated voice samples), and the voice quality is quantified by automatically removing pauses and unvoiced intervals. Praat scripts can automatically analyze a large number of speech samples using various measurement variables. The regression equation of AVQI is constantly being supplemented to improve its reliability.11−13 This combination is believed to represent an ecologically valid estimate of the severity of the speaker's overall dysphonia.2 Recently, the early version of AVQI (v.2.02) was significantly revised to create v.3.01, which is designed to improve the external validity and overall performance of AVQI. The analysis algorithm of v.3.01 includes significant adjustments for the weights of each parameter. In addition to AVQI assessments, a method called Cepstral Spectral Index of Dysphonia (CSID) was introduced to quantify voice and track the recovery of voice impairment.14−16 The CSID analysis derives two CSID values by analyzing SV and CS, and CSID regression equations for SV and CS are applied differently. The CS regression equation is based on a weighted three-factor model that includes the CPPS, L/H spectral ratio (the ratio of spectral energy below 4 kHz versus that above 4 kHz), and its standard deviation, while the vowel regression equation is a weighted five-factor model that additionally includes gender and CPPS standard deviation. In previous studies, it was confirmed that these acoustic parameters were independent predictors of the speaker's estimated
dysphonia severity. Nevertheless, in this study, we did not perform comparisons with CSID because we focused on the variables measured by Praat. AVQI generates a single estimate of the severity of speech impairment based on analysis of concatenated samples that combine the speaker's sustained vowel and connected speech utterances. The AVQI algorithm incorporates time-domain−based (SL, SL dB, and HNR), cepstral (CPPS), and spectral (slope dB and tilt dB) measurements. To extract the voiced segments, a script presented in a previous study is applied.2,10 Despite the differences between the AVQI versions, automated voice measures can provide valuable objective data on the effectiveness of voice therapy, surgical intervention, and medication.8,11,12 Many previous studies have used version 2.02, but few follow-up studies have used version 3.01.11−13 Although the AVQI algorithm has been improved, it is not known whether version 3.01 is valid in Korean speakers, since recent Korean studies have used version 2.02.10,17 We examined this question by comparing the two versions of the AVQI in a dysphonic population in Korea across fourteen diagnostic categories. The diagnostic categories were chosen to reflect the types of dysphonia commonly experienced in clinical practice, and we inferred that each category would likely present individual voice qualities that could influence the performance of the acoustic parameters. Thus, the inclusion of various diagnostic categories can allow a fair comparison of performance across AVQI versions. The present study aimed to address the following questions: First, is there a significant association among AVQI (v2 and v3), PraatCPPS, and auditory-perceptual assessment (grade and overall severity)? Second, how much do sex, age, and diagnostic category influence the findings obtained with the two AVQI versions and PraatCPPS? Third, what are the cutoff values and areas under the curve (AUC) of the two AVQI versions and PraatCPPS for distinguishing between normal and pathological voice? The purpose of this study was to compare the performance of these two AVQI versions (2.02 and 3.01; v2 and v3) and PraatCPPS using voice samples from a Korean population, and to investigate the correlation between acoustic analyses and auditory-perceptual assessment.
METHODS
Study population
The study population included 2,257 participants in 15 groups (14 diagnostic categories and a normal group) who underwent recordings before intervention: (1) vocal nodules (Nodules, n = 340), (2) papillary thyroid cancer (PTC, n = 127), (3) spasmodic dysphonia (SD, n = 66), (4) vocal cyst (Cyst, n = 56), (5) presbylaryngis (Presby, n = 261), (6) glottic cancer (Cancer, n = 143), (7) functional dysphonia (FD, n = 31), (8) laryngeal leukoplakia (Leuko, n = 264), (9) normal (n = 145), (10) vocal fold palsy (Palsy, n = 204), (11) papilloma (n = 85), (12) vocal polyp (Polyp, n = 207), (13) sulcus vocalis (Sulcus, n = 175), (14) edema (n = 73), and (15) laryngopharyngeal reflux (LPR, n = 80). The patients' voices were recorded during routine voice examinations at the Voice clinic of Pusan National University Hospital. The Institutional Review Board of Pusan National University Hospital approved this study.
Voice recordings
Each recording was obtained using a voice recording system during (1) sustained vowel /a/ phonation at a comfortable and habitual pitch and loudness and (2) reading of the Korean “Walk” passage, which contains 25 syllables in 10 words. The central 2 seconds of the sustained /a/ phonation and the “Walk” passage were concatenated by Praat for AVQI analysis. Thus, the voice sample database included concatenated files of sustained phonation and “Walk” passage reading from each patient.
Acoustic analyses
All voice samples were analyzed using Praat (v. 6.0.21) to estimate the AVQI severity and CPPS. These samples were also used for auditory-perceptual assessments. Dysphonia severity estimates were measured using the AVQI versions 2.02 and 3.01 based on Praat scripts reported by Maryn and colleagues to evaluate the performance of the two versions of the AVQI and CPPS.6,13 Each AVQI script concatenates a 2-second SV sample along with voiced segments extracted from a CS sample and generates a single AVQI score. Although the acoustic components used to generate the AVQI are the same in versions 2.02 and 3.01, their relative weights have changed within the regression formulas. The differences between the two versions of the AVQI are demonstrated in the following two equations, which were used to measure the AVQI-estimated severities.
$$\mathrm{AVQI}_{v2.02} = 9.072 - (0.245 \times \mathrm{CPPS}) - (0.161 \times \mathrm{HNR}) - (0.470 \times \mathrm{SL}) + (6.158 \times \mathrm{SL\,dB}) - (0.071 \times \mathrm{Slope}) + (0.170 \times \mathrm{Tilt})$$
$$\mathrm{AVQI}_{v3.01} = [4.152 - (0.177 \times \mathrm{CPPS}) - (0.006 \times \mathrm{HNR}) - (0.037 \times \mathrm{SL}) + (0.941 \times \mathrm{SL\,dB}) + (0.01 \times \mathrm{Slope}) + (0.093 \times \mathrm{Tilt})] \times 2.8902$$
Auditory-perceptual ratings
Auditory-perceptual assessment of dysphonia is commonly regarded as the gold standard to establish the severity of a pathological voice. Therefore, averaged ratings of dysphonia severity by three listeners served as the reference standard for comparison with AVQI- and CPPS-estimated dysphonia severities. Listener ratings were taken from the study by Kim et al,17 which describes the listening assessment methods in detail. The raters were three speech language pathologists with more than 7 years of working experience in voice evaluation (two were speech language pathologists working in the voice clinic, and one was a professor specializing in dysphonia).
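The two AVQI regression formulas given under Acoustic analyses can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Praat scripts used in the study: the class name, field names, and the example input values are hypothetical, and the six parameters are assumed to have been measured already on the concatenated sample.

```python
from dataclasses import dataclass


@dataclass
class AcousticMeasures:
    """Six acoustic parameters entering the AVQI (field names are illustrative)."""
    cpps: float   # smoothed cepstral peak prominence
    hnr: float    # harmonics-to-noise ratio
    sl: float     # shimmer local (percent shimmer)
    sl_db: float  # shimmer local dB
    slope: float  # slope of the long-term average spectrum
    tilt: float   # tilt of the regression line through the spectrum


def avqi_v2(m: AcousticMeasures) -> float:
    """AVQI version 2.02 regression formula as stated in the text."""
    return (9.072 - 0.245 * m.cpps - 0.161 * m.hnr - 0.470 * m.sl
            + 6.158 * m.sl_db - 0.071 * m.slope + 0.170 * m.tilt)


def avqi_v3(m: AcousticMeasures) -> float:
    """AVQI version 3.01 regression formula as stated in the text."""
    return (4.152 - 0.177 * m.cpps - 0.006 * m.hnr - 0.037 * m.sl
            + 0.941 * m.sl_db + 0.01 * m.slope + 0.093 * m.tilt) * 2.8902


# Made-up input values, for illustration only (not data from this study).
example = AcousticMeasures(cpps=14.0, hnr=20.0, sl=3.0,
                           sl_db=0.3, slope=-25.0, tilt=-10.0)
print(round(avqi_v2(example), 2), round(avqi_v3(example), 2))
```

Because both versions use the same six inputs, the same measurement record can be scored with either formula, which makes the effect of the changed weights easy to inspect.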
Statistical analyses
All statistical analyses were completed using R version 3.5.0 (The R Foundation for Statistical Computing, Vienna, Austria) and R Studio 1.1.456 (R Studio Inc., Boston, Massachusetts). The inter-rater variability of G (Likert scale) and OS (visual analogue scale) was measured using intraclass correlation coefficient (ICC) values. The intra-rater reliability of the three raters was computed for 200 voice samples. The ratings of the three raters were averaged, resulting in mean G and OS scores for 2,257 subjects. The correlation coefficient (r) and coefficient of determination (r2) were used to investigate the correlation between acoustic variables and A-P ratings. The coefficient of determination is the proportion of variance in the dependent variable that is predictable from the independent variable; it can be read as a percentage, and a higher value indicates a better fit to the observations. Multiple linear regression was used to explain the relationship between one continuous dependent variable (AVQIv2, AVQIv3, or PraatCPPS) and two or more independent variables (sex, age, and laryngeal status). The independent variables can be continuous or categorical (dummy coded as appropriate). For multiple regression analysis, laryngeal status was converted into dummy variables and analyzed (glottic cancer, vocal cyst, functional dysphonia, laryngeal leukoplakia, vocal fold nodules, normal, vocal fold palsy, papilloma, vocal polyp, presbylaryngis, PTC, spasmodic dysphonia, and sulcus vocalis).
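As a rough illustration of two ideas in this paragraph, the sketch below computes a Pearson correlation coefficient with its coefficient of determination and dummy codes a categorical variable. The study itself used R; the function names here are hypothetical, and the choice of a reference category in `dummy_code` is an assumption made for the example, since the text does not state which level served as the reference.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_determination(x, y):
    """r squared: share of variance in y predictable from x (simple regression)."""
    return pearson_r(x, y) ** 2


def dummy_code(labels, reference):
    """Turn category labels into 0/1 indicator columns, omitting the reference level.

    The reference level is an assumption of this sketch, not taken from the study.
    """
    levels = sorted(set(labels) - {reference})
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return rows, levels
```

For instance, the correlation of 0.95 reported between the two AVQI versions corresponds to a coefficient of determination of about 0.90, i.e. roughly 90% shared variance.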
To compute the cutoff values and predictive power of the AUC of the AVQI (v2 and v3) and PraatCPPS between normal (G0) and pathological voice, a receiver operating characteristic (ROC) curve analysis was performed.
RESULTS
Inter-rater and intra-rater variability
Inter-rater variability
Inter-rater variability statistics for A-P ratings among the three raters are summarized. The inter-rater variability ranged from moderate (ICC = 0.726 between rater 1 and 3 on OS) to high (ICC = 0.861 between rater 2 and 3 on G), with a mean inter-rater ICC of 0.801.
Intra-rater variability
Intra-rater variability statistics for A-P ratings among the three raters were presented. The intra-rater variability for G varied from moderate (ICC = 0.832, rater 2) to high (ICC = 0.880, rater 1), and the results for OS ranged from 0.797 (rater 3) to 0.880 (rater 1), with a mean ICC of 0.851 for G and 0.807 for OS.
Results of AVQI (v2 and v3) and PraatCPPS measurements and A-P ratings (G and OS) according to laryngeal status
The results of AVQI v2, v3 measurements and A-P ratings according to laryngeal status are presented in Table 1. AVQI v2 ranged from 2.4 to 7.3, v3 ranged from 1.4 to 7.3, CPPS ranged from 7.2 to 13.9, G ranged from 0.0 to 2.5, and OS ranged from 12.5 to 72.4.
TABLE 1. Description of AVQI (v2 and v3), PraatCPPS, and A-P ratings (G and OS) Across Diagnostic Categories

Group       AVQIv2      AVQIv3      PraatCPPS    Grade       Overall Severity
Normal      2.4 ± 0.8   1.4 ± 1.0   13.9 ± 1.5   0.0 ± 0.0   12.5 ± 6.1
Leuko       4.5 ± 1.4   3.0 ± 1.8   12.8 ± 2.6   1.1 ± 0.6   35.1 ± 16.9
SD          4.7 ± 1.4   3.2 ± 1.8   12.2 ± 2.7   1.4 ± 0.6   42.7 ± 17.2
PTC         4.8 ± 1.1   3.4 ± 1.4   12.2 ± 2.0   1.1 ± 0.7   33.7 ± 15.0
Nodules     4.8 ± 1.2   3.7 ± 1.7   11.2 ± 2.4   1.3 ± 0.6   39.6 ± 14.1
Papilloma   4.8 ± 1.5   3.3 ± 2.2   11.7 ± 3.2   1.2 ± 0.7   45.2 ± 15.6
Sulcus      4.9 ± 1.2   3.8 ± 1.5   11.0 ± 2.1   1.3 ± 0.6   41.9 ± 15.9
FD          5.0 ± 1.3   3.7 ± 1.5   11.1 ± 2.0   1.4 ± 0.7   42.7 ± 16.8
Presby      5.1 ± 1.3   3.8 ± 1.9   10.9 ± 2.7   1.4 ± 0.6   44.8 ± 16.3
LPR         5.2 ± 1.3   4.0 ± 1.7   11.1 ± 2.5   1.3 ± 0.6   45.7 ± 15.4
Polyp       5.2 ± 1.3   4.0 ± 1.6   11.1 ± 2.3   1.7 ± 0.8   46.4 ± 16.9
Cyst        5.3 ± 1.6   4.3 ± 2.1   10.8 ± 2.8   1.5 ± 0.8   47.9 ± 19.0
Edema       5.4 ± 1.5   4.2 ± 1.9   10.9 ± 2.7   1.5 ± 0.6   48.3 ± 15.2
Palsy       6.0 ± 1.8   5.4 ± 2.5   9.2 ± 3.4    1.8 ± 0.8   52.5 ± 21.0
Cancer      7.3 ± 1.4   7.3 ± 2.3   7.2 ± 3.2    2.5 ± 0.5   72.4 ± 13.7

AVQIv2, acoustic voice quality index version 2.02; AVQIv3, acoustic voice quality index version 3.01; PraatCPPS, smoothed cepstral peak prominence in Praat; Leuko, leukoplakia; SD, spasmodic dysphonia; PTC, papillary thyroid cancer; Nodules, vocal nodules; Sulcus, sulcus vocalis; FD, functional dysphonia; Presby, presbylaryngis; LPR, laryngopharyngeal reflux; Polyp, vocal polyp; Cyst, vocal cyst; Palsy, vocal fold palsy; Cancer, glottic cancer.
Correlations between acoustic measurements (AVQIv2, v3, and PraatCPPS) and A-P ratings (G and OS)
The correlations between acoustic analyses (AVQIv2, v3, and PraatCPPS) and A-P assessments (G and OS) are shown in Figure 1. AVQIv2 showed positive correlations with v3 (0.95, P < 0.001), G (0.86, P < 0.001), and OS (0.87, P < 0.001), and a negative correlation with PraatCPPS (−0.91, P < 0.001). AVQIv3 also showed significant correlations with PraatCPPS (−0.96, P < 0.001), G (0.83, P < 0.001), and OS (0.84, P < 0.001). PraatCPPS showed significant correlations of −0.78 (P < 0.001) and −0.81 (P < 0.001) with G and OS, respectively.
FIGURE 1. Correlations among the measured variables and the p-value significance levels. The correlations between acoustic parameters and auditory-perceptual assessments ranged from −0.96 to 0.95. In summary, the overall correlation was high.
Multiple regression analysis of gender, age, and laryngeal state for acoustic analyses
AVQI version 2.02 (v2)
The regression model was determined as model 18 by the stepwise method (Table 2). The regression equation of the final model 18 explained 36.6% of the variance in AVQIv2 (r = 0.605, r2 = 0.366, adj. R2 = 0.361, P < 0.001). The final model 18 was significant in the analysis of variance (F = 78.171, P < 0.001). There was no multicollinearity because none of the included variables had a variance inflation factor (VIF) of 10 or more. In model 18, the regression coefficients of age and sex were 0.013 and −0.395, respectively, and the regression coefficients of laryngeal status ranged from 1.296 to 4.105. The standardized beta scores show that glottic cancer, vocal fold palsy, and vocal fold nodules had the dominant effects on AVQIv2. The results of the statistical analysis for AVQIv2 were significant (P < 0.001).
AVQI version 3.01 (v3)
The regression model for AVQIv3 was also determined as model 18 by the stepwise method (Table 3). The regression equation of the final model 18 explained 32.9% of the variance in AVQIv3 (r = 0.573, r2 = 0.329, adj. R2 = 0.324, P < 0.001). The final model 18 was significant in the analysis of variance (F = 66.272, P < 0.001). There was no multicollinearity because none of the included variables had a VIF of 10 or more. In model 18, the regression coefficients of age and sex were 0.016 and −0.454, respectively, and the regression coefficients of laryngeal status ranged from 0.706 to 4.959. The standardized beta scores indicate that AVQIv3 is also significantly affected by glottic cancer, vocal fold palsy, and vocal fold nodules. The statistical analyses for AVQIv3 were found to be significant (P < 0.001).
PraatCPPS
The regression model for PraatCPPS was determined as model 19 by the stepwise method (Table 4). The regression equation of the final model 19 explained 26.6% of the variance in PraatCPPS (r = 0.515, r2 = 0.266, adj. R2 = 0.260, P < 0.001). The final model 19 was significant in the analysis of variance (F = 52.198, P < 0.001). There was no multicollinearity because none of the included variables had a VIF of 10 or more. In model 19, the regression coefficients of age and sex were −0.022 and 0.408, respectively, and the regression coefficients of laryngeal status ranged from −5.490 to −1.212. The laryngeal leukoplakia group did not show statistical significance (P = 0.718) and was excluded from model 19. CPPS was significantly affected by laryngeal conditions such as glottic cancer, vocal fold palsy, and vocal fold nodules, as confirmed by the standardized beta values. The results of this statistical analysis were significant (P < 0.001).
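The adjusted R2 values reported for the three models can be related to the unadjusted values with the standard adjustment formula. The sketch below is a plausibility check under two assumptions that the text does not state explicitly: that all 2,257 subjects entered each regression, and that the number of predictors equals the number of non-constant rows in Tables 2 to 4 (16, 16, and 15). The function name is hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted coefficient of determination for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Assumed inputs: n = 2257 subjects; predictor counts read off Tables 2-4.
for name, r2, p in [("AVQIv2", 0.366, 16), ("AVQIv3", 0.329, 16), ("PraatCPPS", 0.266, 15)]:
    print(name, round(adjusted_r2(r2, 2257, p), 3))
```

Under these assumptions the formula reproduces the reported 0.361 and 0.324 for the two AVQI models and comes within about 0.001 of the reported 0.260 for PraatCPPS, a gap that is compatible with rounding of the published R2.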
ROC curve analysis
The cutoff values and the discrimination accuracy of the measured variables for distinguishing between normal and pathological voice were verified (Figure 2). In this study, the normal voice group comprised speakers rated grade 0 on the A-P assessment who had no laryngeal disease or vocal discomfort. The results for AVQIv2 were as follows: cutoff value, 3.33; sensitivity, 0.901; specificity, 0.938; and AUC, 0.963 (P < 0.001). The corresponding results for AVQIv3 were 2.48, 0.751, 0.910, and 0.890, respectively, and those for PraatCPPS were 12.33, 0.656, 0.917, and 0.818 (P < 0.001).
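To make the quantities in this subsection concrete, the following sketch derives a cutoff, its sensitivity and specificity, and the AUC from a list of scores and binary labels. It is not the R procedure used in the study: the cutoff is chosen here by maximizing Youden's J (sensitivity + specificity − 1), which is an assumption because the text does not name the selection criterion, and the toy data are made up. Higher scores are treated as pathological, so a measure such as CPPS, where lower values indicate pathology, would need to be negated first.

```python
def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for every observed score.

    labels: 1 = pathological, 0 = normal. A sample is called
    pathological when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s >= t)
        tn = sum(1 for s, lab in zip(scores, labels) if lab == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Threshold maximizing Youden's J (assumed criterion, see lead-in)."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1.0)


def auc(scores, labels):
    """AUC as the probability that a pathological sample outscores a normal one."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data, not values from this study.
toy_scores = [2.0, 2.5, 3.0, 3.6, 4.2, 5.1, 6.0]
toy_labels = [0, 0, 1, 0, 1, 1, 1]
print(youden_cutoff(toy_scores, toy_labels), auc(toy_scores, toy_labels))
```

The pairwise AUC definition is quadratic in the sample size, which is acceptable for a sketch but would usually be replaced by a rank-based computation for thousands of recordings.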
TABLE 2. Multiple Regression Analysis for the Influence of Variables on AVQI Version 2.02
Coefficients* (Model 18; R = 0.605, R2 = 0.366)

             Unstandardized Coefficients   Standardized Coefficients
Variable     Beta       Std. Error         Beta                        t         Sig.
(Constant)   2.737      .167                                           16.423    .000
Cancer       4.105      .186               .615                        22.063    .000
Age          .013       .002               .131                        5.774     .000
Palsy        3.077      .163               .543                        18.930    .000
Leuko        1.296      .167               .256                        7.777     .000
Polyp        2.420      .154               .429                        15.700    .000
Sex          −.395      .069               −.118                       −5.708    .000
Nodules      2.138      .141               .470                        15.180    .000
Edema        2.601      .201               .283                        12.959    .000
Cyst         2.500      .218               .239                        11.472    .000
PTC          2.158      .173               .306                        12.506    .000
SD           2.126      .198               .221                        10.721    .000
LPR          2.332      .198               .266                        11.751    .000
Sulcus       2.006      .182               .254                        11.005    .000
Presby       1.972      .169               .388                        11.662    .000
Papilloma    1.805      .194               .212                        9.326     .000
FD           2.234      .266               .160                        8.385     .000

* Dependent Variable: AVQIv2.
t, t-value; R, correlation coefficient; R2, coefficient of determination; Sig., significance; Leuko, leukoplakia; SD, spasmodic dysphonia; PTC, papillary thyroid cancer; Nodules, vocal nodules; Sulcus, sulcus vocalis; FD, functional dysphonia; Presby, presbylaryngis; LPR, laryngopharyngeal reflux; Polyp, vocal polyp; Cyst, vocal cyst; Palsy, vocal fold palsy; Cancer, glottic cancer.
TABLE 3. Multiple Regression Analysis for the Influence of Variables on AVQI Version 3.01
Coefficients* (Model 18; R = 0.573, R2 = 0.329)

             Unstandardized Coefficients   Standardized Coefficients
Variable     Beta       Std. Error         Beta                        t         Sig.
(Constant)   1.700      .230                                           7.384     .000
Cancer       4.959      .257               .554                        19.286    .000
Palsy        3.427      .225               .450                        15.257    .000
Age          .016       .003               .120                        5.138     .000
Leuko        .706       .230               .104                        3.066     .002
Polyp        2.152      .213               .284                        10.103    .000
Nodules      2.016      .195               .330                        10.358    .000
Sex          −.454      .096               −.101                       −4.751    .000
Edema        2.385      .277               .193                        8.598     .000
Cyst         2.493      .301               .178                        8.278     .000
LPR          2.023      .274               .171                        7.376     .000
PTC          1.771      .238               .187                        7.428     .000
Sulcus       1.825      .252               .172                        7.245     .000
SD           1.664      .274               .129                        6.073     .000
Presby       1.559      .234               .228                        6.670     .000
FD           1.933      .368               .103                        5.248     .000
Papilloma    1.261      .267               .110                        4.713     .000

* Dependent Variable: AVQIv3.
t, t-value; R, correlation coefficient; R2, coefficient of determination; Sig., significance; Leuko, leukoplakia; SD, spasmodic dysphonia; PTC, papillary thyroid cancer; Nodules, vocal nodules; Sulcus, sulcus vocalis; FD, functional dysphonia; Presby, presbylaryngis; LPR, laryngopharyngeal reflux; Polyp, vocal polyp; Cyst, vocal cyst; Palsy, vocal fold palsy; Cancer, glottic cancer.
TABLE 4. Multiple Regression Analysis for the Influence of Variables on PraatCPPS
Coefficients* (Model 19; R = 0.515, R2 = 0.266)

             Unstandardized Coefficients   Standardized Coefficients
Variable     Beta       Std. Error         Beta                        t          Sig.
(Constant)   13.750     .296                                           46.531     .000
Cancer       −5.490     .259               −.453                       −21.190    .000
Palsy        −3.845     .228               −.373                       −16.841    .000
Age          −.022      .004               −.121                       −5.796     .000
PTC          −1.212     .277               −.094                       −4.381     .000
Nodules      −2.234     .200               −.270                       −11.146    .000
Sex          .408       .131               .067                        3.108      .002
Polyp        −2.227     .224               −.217                       −9.931     .000
Edema        −2.396     .335               −.144                       −7.150     .000
Cyst         −2.415     .373               −.127                       −6.469     .000
Sulcus       −2.146     .289               −.149                       −7.434     .000
Presby       −1.906     .215               −.206                       −8.874     .000
LPR          −2.010     .319               −.126                       −6.302     .000
FD           −2.219     .482               −.087                       −4.606     .000
SD           −1.414     .350               −.081                       −4.043     .000
Papilloma    −1.391     .308               −.090                       −4.512     .000

* Dependent Variable: PraatCPPS.
t, t-value; R, correlation coefficient; R2, coefficient of determination; Sig., significance; SD, spasmodic dysphonia; PTC, papillary thyroid cancer; Nodules, vocal nodules; Sulcus, sulcus vocalis; FD, functional dysphonia; Presby, presbylaryngis; LPR, laryngopharyngeal reflux; Polyp, vocal polyp; Cyst, vocal cyst; Palsy, vocal fold palsy; Cancer, glottic cancer.
FIGURE 2. Receiver operating characteristic (ROC) curves obtained using acoustic quantification (AVQIv2, v3, and PraatCPPS) of all voice samples combining the ‘Walk’ passage (containing 25 syllables in 10 words) and 2-second portions of the vowel /a/.
DISCUSSION
This study aimed to compare the performance of two versions of the AVQI and the PraatCPPS using normal and pathological voice samples across a variety of diagnostic categories for laryngeal status. This study was meaningful in that it quantified pathological voice using AVQI, CPPS, and A-P evaluations and confirmed high correlations among the measured variables. Analysis of the variables affecting AVQI and PraatCPPS showed that the effects of gender were greater than those of age. Additionally, glottic cancer and vocal fold palsy in laryngeal disease were found to show the greatest influence. ROC analysis was performed to identify the cutoff value that could distinguish between normal and pathological voice. All of the analyses showed excellent diagnostic predictive power, with AUC values of 0.82 or more. The following findings were obtained from the results of the study. First, a comparison of the results of the two groups with respect to acoustic variables showed significant differences between the normal and pathological groups in both AVQIs (v2 and v3) and PraatCPPS. The two regression equations (v2 and v3) of AVQIs applied in this study have been used in previous studies.8,10,11,13,17−19 These two equations include the same variables but use different variable weights to quantify the severity of dysphonia. Pathological voice yields a larger AVQI value. In AVQI v2 and v3, the pathological voice group showed higher values than the normal voice group, and the differences were statistically significant. These characteristics have been confirmed in the results of
previous studies.20−22 Mean AVQIs for all normal and pathologic voice groups were 3.83 ± 1.63 and 3.97 ± 1.60 for the initial alpha and beta versions, respectively. The correlation between the two versions was 0.980 (P < 0.0001), and the beta version was later developed into AVQIv2.22 In order to improve the internal consistency of AVQI, the optimal regression equation was verified by analyzing and comparing voice samples of various lengths. In comparisons of the voice durations (VDs) VD-1 (17 syllables + vowel for 3 seconds), VD-2 (a sentence length corresponding to 3 seconds after extracting the unvoiced intervals + vowel for 3 seconds), and VD-3 (a whole sentence including 93 syllables and the 3-second vowel), VD-2 showed the best accuracy in discriminating between normal and pathological voice, and VD-2 was then adopted in the AVQIv3.6 PraatCPPS is a variable representing the degree of harmonics in the voice signal, and pathological voice shows a low CPPS value. In a previous study, CPPS values of all subjects measured by SpeechTool (ST) and Praat were 6.61 ± 1.79 and 11.66 ± 2.68, respectively, and the correlation between the two methods was 0.961 (P < 0.0001).22 Consistent with previous studies, the pathological voice group in this study showed a lower CPPS than the normal voice group.23,24 Second, the results of the present study identified a high correlation among the measured variables. Despite the differences in the weights of the regression equation, there was a high correlation (r = 0.95, r2 = 0.90, P < 0.001) between AVQIv2 and AVQIv3. AVQIv2 and v3 were validated with different sentence lengths, and their regression equations were derived with different weights.
The AVQIv2 was based on and validated with a relatively short sentence length (17 syllables + 3-second vowel), whereas AVQIv3 was validated with a longer sentence length (34 syllables + 3-second vowel). In this study, we analyzed the AVQI using a concatenation of a 26-syllable sentence and a 2-second vowel. It is important to emphasize that the clinician should be aware of the version of AVQI used and know that the length of the sentence affects the AVQI value. A follow-up study needs to identify the duration of the voiced segments extracted from the 26 syllables and determine the effect of the vowel and sentence on the AVQI. PraatCPPS also plays an important role in the two regression equations of AVQI. AVQIv2 and v3 showed correlations of −0.91 and −0.96 with CPPS, similar to the results (r = −0.71) of previous studies.2 The correlation between AVQI and PraatCPPS was 0.87 in vocal polyp patients who underwent laryngeal microsurgery.20 The correlations between AVQI (v2, v3) and Grade were 0.86 and 0.83, respectively, consistent with previous studies in other languages reporting correlations between 0.78 and 0.91.2,3,5,10 OS also showed correlations of 0.87 and 0.86 with AVQI (v2, v3), consistent with previous studies showing values of 0.88–0.92.1−3,5,8,9,25,26 In this study, the correlations between PraatCPPS and A-P ratings (G, OS) were −0.78 and −0.81, respectively. Correlations between
CPPS (vowel and connected speech) and A-P rating were −0.80 and −0.86 in the study by Heman-Ackah et al27 and −0.56 and −0.73 in that by Delgado-Hernandez et al.28 Bae et al29 reported correlations of −0.85, −0.72, and −0.71 between G and CPPST, CPPSST, and CPPADSV, respectively, and Awan et al30 reported that the OS showed correlations of −0.68 to 0.79 with CPP and CPPS. Additionally, the high correlation between G and OS (0.87; P < 0.01) in the present study was consistent with previous studies (0.836–0.969).10,17,26,31,32 Third, multiple linear regression was used to explain the relationship between acoustic variables (AVQIv2, AVQIv3, and PraatCPPS) and independent variables (sex, age, and laryngeal status). The regression equations of the final models explained 36.6%, 32.9%, and 26.6% of the variance in AVQIv2, AVQIv3, and PraatCPPS, respectively. Common findings were that age has a greater impact on acoustic variables than sex, and that glottic cancer, vocal fold palsy, and vocal fold nodules had the greatest impact on the acoustic index among the diagnostic categories. In a previous study, many patients with glottal insufficiency produced a breathy voice, and the acoustic variables reflected this characteristic.33 Fourth, it is possible to distinguish pathological voice from normal voice by using the predictive power and the cutoff value of variables. The results of this study confirmed that the predictive power of AVQI (v2 and v3) was more than 0.89 and that of PraatCPPS was more than 0.82. The cutoff values for distinguishing between normal and abnormal voices were 3.33 for AVQIv2, 2.48 for AVQIv3, and 12.33 for PraatCPPS, similar to the findings of previous studies. In previous studies, the cutoff values for AVQIv2 were 2.30–3.48, and its diagnostic predictive power was 0.906–0.970.
In an earlier version of AVQI, the cutoff values were 2.70–3.66 and its predictive power was 0.880–0.956.1,5,8−10,19 The cutoff values of AVQIv3 were 2.06–2.43, and its diagnostic predictive power was 0.906–0.923.12,13 Heman-Ackah et al reported a cutoff CPPS value of 5.0 for discriminating dysphonia.34 In a follow-up study, Houde's algorithm was used to report that a CPPS cutoff value of 4.0 showed sensitivity of 0.924, specificity of 0.790, and diagnostic predictive power of 0.937.35,36 The tools used for cepstral analysis are Computerized Speech Lab, Analysis of Dysphonia in Speech and Voice (ADSV), SpeechTool, Praat, etc., and the CPPS value varies depending on the analysis algorithm used.34,37−39 In an analysis of voice measurements of 65 patients with glottic cancer using various CPPS tools, ADSV (5.87 ± 3.78), SpeechTool (4.44 ± 3.14), and Praat (9.41 ± 4.02) were used to obtain measurements in the vowel task, and ADSV (3.87 ± 2.14), SpeechTool (3.04 ± 1.46), and Praat (6.75 ± 2.18) were used in the connected speech task.37 In other studies, the CPPS values in the normal speech group were 5.89 ± 1.00 in the ADSV and 20.11 ± 1.27 in the Praat; the corresponding values in the disordered voice group were 4.15 ± 1.73 in the ADSV and 17.49 ± 1.52 in the Praat; and the mean CPPS was 4.87 ± 1.70 in the ADSV and 18.58 ± 1.91 in the Praat, respectively.39
In this study, we quantified and compared voices corresponding to various diagnostic categories by using AVQIv2, AVQIv3, and PraatCPPS, and also confirmed the correlations among these variables. The findings are significant in that they present acoustic criteria for distinguishing between normal and pathological voice groups.
CONCLUSIONS
The results of this study confirm that the AVQI (v2 and v3), PraatCPPS, and A-P ratings (G and OS) are strongly correlated, and that the acoustic measurements provide acceptable quantification of dysphonia severity. However, the associations observed between the AVQIv2 and A-P ratings were stronger than those shown by the AVQIv3 and PraatCPPS. Therefore, AVQIv2 was superior to AVQIv3 and PraatCPPS in assessing the 26-syllable reading content. Clinicians should consider that these results were derived within the context of the 26-syllable reading content.
DISCLOSURE
The authors have no funding or financial relationship to disclose.
SUPPLEMENTARY MATERIALS
Supplementary material associated with this article can be found, in the online version, at https://doi:10.1016/j.jvoice.2018.11.013.
REFERENCES
1. Maryn Y, De Bodt M, Roy N. The Acoustic Voice Quality Index: toward improved treatment outcomes assessment in voice disorders. J Commun Disord. 2010;43:161–174.
2. Maryn Y, Corthals P, Van Cauwenberge P, et al. Toward improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels. J Voice. 2010;24:540–555.
3. Reynolds V, Buckland A, Bailey J, et al. Objective assessment of pediatric voice disorders with the acoustic voice quality index. J Voice. 2012;26:672.e1–7.
4. Barsties B, Maryn Y. Test-retest variability and internal consistency of the Acoustic Voice Quality Index. HNO. 2013;61:399–403.
5. Maryn Y, De Bodt M, Barsties B, et al. The value of the Acoustic Voice Quality Index as a measure of dysphonia severity in subjects speaking different languages. Eur Arch Otorhinolaryngol. 2014;271:1609–1619.
6. Barsties B, Maryn Y. The improvement of internal consistency of the Acoustic Voice Quality Index. Am J Otolaryngol. 2015;36:647–656.
7. Núñez-Batalla F, Díaz-Fresno E, Alvarez-Fernández A, et al. Application of the Acoustic Voice Quality Index for Objective Measurement of Dysphonia Severity. Acta Otorrinolaringologica. 2017;68:204–211.
8. Hosokawa K, Barsties B, Iwahashi T, et al. Validation of the Acoustic Voice Quality Index in the Japanese Language. J Voice. 2017;31:260.e1–260.e9.
9. Uloza V, Petrauskas T, Padervinskis E, et al. Validation of the Acoustic Voice Quality Index in the Lithuanian Language. J Voice. 2017;31:257.e1–257.e11.
10. Maryn Y, Kim HT, Kim J. Auditory-perceptual and acoustic methods in measuring dysphonia severity of Korean Speech. J Voice. 2016;30:587–594.
11. Lee JM, Roy N, Peterson E, et al. Comparison of two multiparameter acoustic indices of dysphonia severity: the Acoustic Voice Quality Index and Cepstral Spectral Index of Dysphonia. J Voice. 2017;32:515.e1–515.e13.
12. Hosokawa K, Barsties VLB, Iwahashi T, et al. The Acoustic Voice Quality Index Version 03.01 for the Japanese-speaking Population. J Voice. 2017.
13. Barsties B, Maryn Y. External Validation of the Acoustic Voice Quality Index Version 03.01 with extended representativity. Ann Otol Rhinol Laryngol. 2016;125:571–583.
14. Awan SN, Roy N, Zhang D, et al. Validation of the Cepstral Spectral Index of Dysphonia (CSID) as a screening tool for voice disorders: development of clinical cutoff scores. J Voice. 2016;30:130–144.
15. Watts CR, Awan SN. An examination of variations in the Cepstral Spectral Index of Dysphonia across a single breath group in connected speech. J Voice. 2015;29:26–34.
16. Peterson EA, Roy N, Awan SN, Merrill RM, Banks R, Tanner K. Toward validation of the Cepstral Spectral Index Of Dysphonia (CSID) as an objective treatment outcomes measure. J Voice. 2013;27:401–410.
17. Kim GH, Lee YW, Bae IH, et al. Validation of the Acoustic Voice Quality Index in the Korean Language. J Voice. 2018. In Press.
18. Uloza V, van Latoszek B, Ulozaite-Staniene N, et al. A comparison of Dysphonia Severity Index and Acoustic Voice Quality Index measures in differentiating normal and dysphonic voices. Eur Arch Otorhinolaryngol. 2018;275:949–958.
19. Barsties VLB, Ulozaite-Staniene N, Maryn Y, et al. The influence of gender and age on the Acoustic Voice Quality Index and Dysphonia Severity Index: a normative study. J Voice. 2017.
20. Kim G-H, Lee Y-Y, Lee B-J, et al. Acoustic and auditory-perceptual evaluation as predictor of voice recovery after laryngeal microsurgery in patients with vocal polyp. Korean J Otolaryngol Head Neck Surg. 2018;61:361–369.
21. Hernandez JD, Gomez NML, Jimenez A, et al. Validation of the Acoustic Voice Quality Index Version 03.01 and the Acoustic Breathiness Index in the Spanish language. Ann Otol Rhinol Laryngol. 2018;127:317–326.
22. Maryn Y, Weenink D. Objective dysphonia measures in the program Praat: smoothed Cepstral Peak Prominence and Acoustic Voice Quality Index. J Voice. 2015;29:35–43.
23. Brinca LF, Batista APF, Tavares AI, et al. Use of cepstral analyses for differentiating normal from dysphonic voices: a comparative study of connected speech versus sustained vowel in European Portuguese female speakers. J Voice. 2014;28.
24. Watts CR, Awan SN. Use of spectral/cepstral analyses for differentiating normal from hypofunctional voices in sustained vowel and continuous speech contexts. J Speech Lang Hear Res. 2011;54:1525–1537.
25. Barsties B, Maryn Y. The Acoustic Voice Quality Index: toward expanded measurement of dysphonia severity in German subjects. HNO. 2012;60:715–720.
26. Nemr K, Simoes-Zenari M, Cordeiro GF, et al. GRBAS and Cape-V Scales: high reliability and consensus when applied at different times. J Voice. 2012;26:812.e17–22.
27. Heman-Ackah YD, Michael DD, Goding GS. The relationship between cepstral peak prominence and selected parameters of dysphonia. J Voice. 2002;16:20–27.
28. Delgado-Hernandez J, Leon-Gomez NM, et al. Cepstral analysis of normal and pathological voice in Spanish adults. Smoothed cepstral peak prominence in sustained vowels versus connected speech. Acta Otorrinolaringol Esp. 2018;69:134–140.
29. Bae IH, Kim GH, Lee YW, et al. Clinical application of cepstral peak prominence for treatment outcomes assessment in voice disorders: comparing ADSV, Speechtool, and PNU_CPP. J Speech-Lang, Hear, Disord. 2016;25:93–102.
30. Awan SN, Helou LB, Stojadinovic A, et al. Tracking voice change after thyroidectomy: application of spectral/cepstral analyses. Clin Linguist Phon. 2011;25:302–320.
31. Karnell MP, Melton SD, Childes JM, et al. Reliability of clinician-based (GRBAS and CAPE-V) and patient-based (V-RQOL and IPVI) documentation of voice disorders. J Voice. 2007;21:576–590.
32. Wuyts FL, De Bodt MS, Van de Heyning PH. Is the reliability of a visual analog scale higher than an ordinal scale? An experiment with the GRBAS scale for the perceptual evaluation of dysphonia. J Voice. 1999;13:508–517.
33. Hsiung MW, Woo P, Minasian A, et al. Fat augmentation for glottic insufficiency. Laryngoscope. 2000;110:1026–1033.
34. Heman-Ackah YD, Michael DD, Baroody MM, et al. Cepstral peak prominence: a more reliable measure of dysphonia. Ann Otol Rhinol Laryngol. 2003;112:324–333.
35. Hillenbrand J, Houde RA. Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech. J Speech Lang Hear Res. 1996;39:311–321.
36. Heman-Ackah YD, Sataloff RT, Laureyns G, et al. Quantifying the cepstral peak prominence, a measure of dysphonia. J Voice. 2014;28:783–788.
37. Kim GH, Lee YW, Bae IH, et al. A study of cepstral peak prominence characteristics in ADSV, SpeechTool and Praat. J Speech-Lang, Hear, Disord. 2017;26:99–111.
38. Lowell SY, Colton RH, Kelley RT, et al. Predictive value and discriminant capacity of cepstral- and spectral-based measures during continuous speech. J Voice. 2013;27:393–400.
39. Sauder C, Bretl M, Eadie T. Predicting voice disorder status from smoothed measures of cepstral peak prominence using Praat and Analysis of Dysphonia in Speech and Voice (ADSV). J Voice. 2017;31:557–566.