Comparison of Two Multiparameter Acoustic Indices of Dysphonia Severity: The Acoustic Voice Quality Index and Cepstral Spectral Index of Dysphonia

ARTICLE IN PRESS
*Jeong Min Lee, *Nelson Roy, *Elizabeth Peterson, and †Ray M. Merrill, *Salt Lake City and †Provo, Utah

Summary: Objectives. The Acoustic Voice Quality Index (AVQI) and the Cepstral Spectral Index of Dysphonia (CSID) are two multiparameter acoustic indices designed to objectively estimate dysphonia severity and track treatment outcomes. This study compared the performance of these two indices using a common corpus of dysphonic speakers.
Method. Pre- and posttreatment samples of sustained vowel and connected speech were elicited from 112 patients across six diagnostic categories: unilateral vocal fold paralysis (n = 20), adductor spasmodic dysphonia (n = 20), primary muscle tension dysphonia (n = 20), benign vocal fold lesions (n = 20), presbylaryngis (n = 20), and mutational falsetto (n = 12). Listener ratings of dysphonia severity were compared to acoustic estimates of severity derived from two iterations of the AVQI (versions 2.02 and 3.01) as well as the CSID.
Results. The AVQI- and CSID-estimated severity for sustained vowels, connected speech, and a combined context were strongly correlated and significantly associated with listener ratings pretreatment, posttreatment, and change observed pre- to posttreatment. However, multiple regression analysis (adjusted for age, sex, and diagnostic category) revealed that the CSID generally accounted for more variance in listener-perceived severity ratings, and the contribution of the AVQI was small and statistically nonsignificant when the CSID was already in a combined model.
Conclusions. The AVQI and the CSID were strongly correlated and both provided valid estimates of dysphonia severity. However, associations observed between the CSID- and listener-estimated dysphonia were almost uniformly stronger than those observed for either version of the AVQI, suggesting that the CSID outperformed the AVQI.
Key Words: Voice disorders–Cepstral analysis–Dysphonia severity–AVQI–CSID.

INTRODUCTION

Acoustic assessment of voice using cepstral analysis is a valuable tool for quantifying dysphonia severity and tracking treatment outcomes in research and clinical settings.1–4 The cepstrum is a Fourier transform of the log power spectrum and may be used to determine the extent to which the dominant rahmonic (an anagram of “harmonic” often associated with the vocal fundamental frequency) is individualized and emerges out of the background noise.5 This has also been referred to as the cepstral peak prominence (CPP), and numerous studies have demonstrated that increased dysphonia severity is often associated with a decrease in the amplitude of the cepstral peak (ie, lower harmonic energy) and an increase in high-frequency spectral energy. Furthermore, Hillenbrand and Houde6 described a method of computing the normalized CPP by comparing the amplitude of the cepstral peak with the expected amplitude as determined via linear regression. A smoothed version of the cepstral peak prominence (CPPS) has been shown recently to be strongly associated with listener-estimated dysphonia severity.1,2,5–20
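The definition above can be illustrated with a minimal sketch in Python (NumPy only). It is not the ADSV or Praat implementation: the window, the 1 ms lower bound for the regression line, the default pitch search range, and the function name `cepstral_peak_prominence` are all assumptions made for illustration, and no smoothing is applied, so the value corresponds to a single-frame CPP rather than CPPS.

```python
import numpy as np

def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=330.0):
    """Single-frame CPP sketch: returns (cpp_db, f0_estimate_hz).

    Assumptions (not taken from the article): Hann window, regression
    line fitted over quefrencies >= 1 ms, no temporal/quefrency smoothing.
    """
    frame = np.asarray(frame, dtype=float)
    n = frame.size
    # Log power spectrum of the windowed frame (small floor avoids log(0)).
    spectrum = np.fft.rfft(frame * np.hanning(n))
    log_power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
    # Cepstrum: Fourier transform of the log power spectrum.
    cepstrum = np.fft.irfft(log_power_db, n=n)
    cep_db = 20.0 * np.log10(np.abs(cepstrum[: n // 2]) + 1e-12)
    quefrency = np.arange(n // 2) / fs  # seconds
    # Search for the dominant rahmonic inside the expected pitch range.
    lo = int(np.ceil(fs / f0_max))
    hi = min(int(np.floor(fs / f0_min)), n // 2 - 1)
    peak = lo + int(np.argmax(cep_db[lo : hi + 1]))
    # Regression line through the cepstrum gives the "expected" amplitude.
    fit = quefrency >= 0.001
    slope, intercept = np.polyfit(quefrency[fit], cep_db[fit], 1)
    cpp_db = cep_db[peak] - (slope * quefrency[peak] + intercept)
    return float(cpp_db), fs / peak

# Synthetic comparison: a strongly periodic signal versus white noise.
fs = 16000
t = np.arange(2048) / fs
voiced = sum(np.sin(2 * np.pi * 160 * k * t) / k for k in range(1, 50))
noise = np.random.default_rng(0).standard_normal(2048)
cpp_voiced, f0_voiced = cepstral_peak_prominence(voiced, fs)
cpp_noise, _ = cepstral_peak_prominence(noise, fs)
```

In this toy comparison the periodic signal yields the more prominent cepstral peak, which mirrors the relationship described above between periodicity and CPP.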

Accepted for publication June 20, 2017. Conflict of interest: The authors have no conflicts of interest to disclose. Disclosure: The authors have no financial relationships relevant to this article to disclose. From the *Department of Communication Sciences and Disorders, The University of Utah, Salt Lake City, Utah 84112; and the †Department of Health Science, Brigham Young University, Provo, Utah 84602. Address correspondence and reprint requests to Jeong Min Lee, Department of Communication Sciences and Disorders, The University of Utah, 390 South 1530 East, Suite 1201, BEH SCI, Salt Lake City, UT 84112. E-mail: [email protected] Journal of Voice, Vol. ■■, No. ■■, pp. ■■-■■ 0892-1997 © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jvoice.2017.06.012

Unlike time-based measures of aperiodicity such as jitter and shimmer, the CPP does not require a quasi-periodic signal to be valid and can be derived from both sustained vowel and connected speech samples.4 The limitations of traditional time-based analysis combined with the strong performance of cepstral (as well as spectral-based) acoustic measures have led researchers to develop multiparameter algorithms incorporating measures of the CPP along with other spectral- and time-based acoustic parameters to optimize the quantification of dysphonia severity. One such example is the Cepstral Spectral Index of Dysphonia (CSID), a commercially available index within the Analysis of Dysphonia in Speech and Voice program (ADSV model 5109; KayPENTAX, Montvale, NJ).9–11,16,18,19,21,22 Another such index is the Acoustic Voice Quality Index (AVQI),15,23 which is an application that operates within Praat, a free-software program.24

The AVQI is a single estimate of dysphonia severity based on a weighted algorithm incorporating six acoustic parameters derived from an analysis of a concatenated sample combining both sustained vowel and connected speech samples from the same speaker. Each of the six acoustic measures was identified previously as uniquely accounting for variance explained in listener-perceived ratings of dysphonia severity.15 The AVQI algorithm includes the CPPS, harmonics-to-noise ratio (HNR), shimmer local (SL, also known as percent shimmer), shimmer local dB (SLdB, also known as shimmer in dB), as well as the slope and tilt of the regression line through the long-term average spectrum (SLOPE dB and TILT dB). The analysis script, when incorporated into Praat, automatically estimates an AVQI score, which ranges from 0 to 10, with increasing values reflecting a continuum of severity from normal to profoundly abnormal voice. Unlike other indices, the concatenation of both voice contexts (ie, sustained vowel and connected speech) is at the core of the AVQI development, and this combination has been argued to represent a more ecologically valid estimate of a speaker’s overall dysphonia severity.15 Recently, an early version of the AVQI (v. 2.02) underwent a major modification, producing version 3.01, which was designed to improve the AVQI’s external validity and overall performance.23 Although version 3.01 retained the original six acoustic parameters in the analysis algorithm, significant adjustments to the weighting of each parameter (among other important changes) were made.

Like the AVQI, the CSID was developed to objectively quantify dysphonic voices and track treatment outcomes. However, the CSID analyzes voices obtained from sustained vowel and connected speech samples separately and automatically generates dysphonia severity estimates for each individual context. In this regard, two separate regression formulas for the sustained vowel and the connected speech samples were developed. For an analysis of connected speech obtained from the “Rainbow Passage,” the algorithm was based on a weighted three-factor model including spectral and cepstral measures: the CPPS, the L/H spectral ratio (the ratio of spectral energy below 4 kHz versus above 4 kHz), and its standard deviation.9 The equation for the sustained vowel, however, additionally included gender and the standard deviation of the CPPS for a five-factor analysis.21 These acoustic variables were observed previously to be strong independent predictors of listener-estimated dysphonia severity. The CSID estimate is generally a value between 0 and 100, but in some cases it can generate a score below 0 or above 100. Like the AVQI, the validity of the CSID has been examined extensively, and it appears to represent a potentially robust tool for assessment of heterogeneous voice qualities and severities. 
For instance, a validation study of the CSID comprising patients from six diagnostic categories found that the CSID displayed strong associations with listener-perceived ratings and was sensitive to treatment-related changes across time.3 In addition, another recent study demonstrated that the CSID may have clinical value as a potential screening tool capable of distinguishing vocally normal individuals from those with voice disorders.4

Although the AVQI (versions 2.02 and 3.01) and CSID both include CPPS as a prominent variable in their respective algorithms, there are a number of significant differences between the two indices. First, the analysis protocol (whether separating each voice context or concatenating the contexts) differs between the two models. The AVQI generates a single estimate of dysphonia severity based on the analysis of a concatenated sample, which combines the sustained vowel and connected speech productions from the same speaker. In contrast, the CSID provides separate estimates of dysphonia severity for sustained vowel and connected speech contexts. Second, differences exist between the acoustic parameters that are included in each analysis algorithm. The AVQI algorithm incorporates time-based as well as cepstral and spectral measures, whereas the CSID algorithm contains principally cepstral- and spectral-based parameters. In addition, the Praat and ADSV programs, within which the AVQI and the CSID function, respectively, approach extracting voiced segments from connected speech samples differently. That is, the two programs differ in their method of identifying articulatory or nonphonatory noise in estimating the CPPS value. For instance, Praat does not automatically distinguish voiced and unvoiced regions when analyzing dysphonic voice. Instead, the developers of the AVQI designed a script to extract voiced signals based on three criteria: (a) the sound energy should exceed 30% of overall sound energy; (b) the zero-crossing rate of the signal should be lower than 1500 Hz; and (c) the normalized autocorrelation peak should exceed 0.3.15 In contrast, the ADSV program associated with the CSID removes highly aperiodic sound signals (ie, CPP values < 0) along with low-amplitude breathy sounds such as “s” or unvoiced consonants. Thus, the AVQI and the CSID differ in substantive ways.

Despite their differences, it is clear that automated voice measures that quantify dysphonia severity, like the AVQI and the CSID, can provide valuable objective data regarding the effects of behavioral, medical, or surgical interventions. Yet clinicians and researchers are faced with decisions regarding which measure they should employ. Although both of these indices have grown in popularity, to date, no study has compared the performance of the AVQI and the CSID using an identical corpus of voice samples (and associated listener judgments of severity). Thus, the relative superiority of one index over the other is not known. We address this deficiency by comparing the AVQI and the CSID using the same set of dysphonic speakers from six diagnostic categories. The diagnostic categories were chosen to reflect disorder types commonly encountered in clinical practice, and we reasoned that each category would likely possess idiosyncratic voice qualities that could influence the performance of each acoustic index. Thus, the inclusion of diverse diagnostic categories would permit a fairer comparison of their performance. Therefore, the current research was designed to answer four questions. First, is there a significant association between the two acoustic indices? 
Second, what is the strength of the association between each index and listener-perceived severity ratings based on diagnostic category? Third, are the indices equally sensitive to change from pre- to posttreatment across each diagnostic category? Fourth, is there evidence to indicate superiority in the performance of one acoustic index over another?

METHODS

Speech samples

This study used a previously collected corpus of dysphonic speakers from Peterson et al.3 The corpus included 112 patients across six diverse diagnostic categories who were recorded before and after treatment: (1) unilateral vocal fold paralysis (UVFP, n = 20), (2) adductor spasmodic dysphonia (ADSD, n = 20), (3) primary muscle tension dysphonia (PMTD, n = 20), (4) benign vocal fold lesion (BVFL, n = 20), (5) mutational falsetto (MF, n = 12), and (6) presbylaryngis (PRES, n = 20). The patients’ voices were recorded as part of routine care at the University of Utah Voice Disorders Center in Salt Lake City, Utah. As described in Peterson et al,3

. . .the samples were selected based upon their primary voice disorder diagnosis, as determined by an otolaryngologist and a speech-language pathologist who specialize in voice disorders. Second, patients underwent some form of

intervention and a follow-up voice sample was available. . .To be included in the sample set, a second, follow-up posttreatment sample was required, as the intent of this study was to assess sensitivity to pre- and post-treatment changes in dysphonia severity. The specific intervention technique was of no particular consequence, so long as some change, whether positive or negative (small or large), was apparent in the posttreatment sample collected. Therefore, for each participant there were two sets of voice/speech recording samples, one pretreatment and the other post-treatment. (p. 403)

Before and after treatment, each patient was recorded using research quality instrumentation while (1) sustaining the vowel [a:] for approximately 5 seconds at a comfortable pitch and loudness and (2) reading the “Rainbow Passage.” Using MultiSpeech (model 3700; KayPENTAX, Montvale, NJ), voice samples were digitized at a sampling rate of 25 kHz. The central 3 seconds of the sustained vowel and two middle sentences from the “Rainbow Passage” were selected for acoustic analysis. As a consequence, the corpus included pre- and posttreatment voice recordings of sustained vowel and connected speech samples from each participant. Acoustic analyses All voice samples were analyzed using the Computerized Speech Lab (CSL) software module ADSV (model 5109, v. 3.4.2, KayPENTAX, Montvale, NJ) to estimate the CSID severity and then within Praat (v. 6.0.21) to provide the AVQI severity estimates. These were the identical samples used in the listener perceptual rating experiment (to be described later). For the CSID, the data from the Peterson et al3 study were used. In their research, the middle 3 seconds of the sustained vowel [a:] and the two middle sentences of the “Rainbow Passage,” (ie, 50 syllables) were analyzed separately to generate CSID estimates of dysphonia severity for each voice context. From the Peterson et al study, the voice samples from each patient (before and after treatment) were uploaded separately and the ADSV generated the CSID measurement (for the vowel [a:]), as well as the CPPS and L/H ratio data used in a manual computation of a CSID for the “Rainbow Passage.” The CSID-estimated severity for both voice sample contexts was generated based on the following multiple regression formulas:9,21

\[
\mathrm{CSID}_{\mathrm{SV}} = 84.20 - (4.40 \times \mathrm{CPPS}) + (10.62 \times \sigma_{\mathrm{CPPS}}) - (1.05 \times \text{L/H spectral ratio}) + (7.61 \times \sigma_{\text{L/H ratio}}) - (10.68 \times \mathrm{Gender})
\]

\[
\mathrm{CSID}_{\mathrm{RB}} = 154.59 - (10.39 \times \mathrm{CPPS}) - (1.08 \times \text{L/H spectral ratio}) - (3.71 \times \sigma_{\text{L/H ratio}})
\]

To evaluate the performance of the two iterations of the AVQI, severity estimates were generated using the AVQI versions 2.02 and 3.01 based on scripts reported by Maryn and colleagues.14,23 Each AVQI script automatically concatenates a 3-second sustained vowel sample along with voiced segments extracted from a connected speech sample. It generates a single AVQI score along with a visual display of the concatenated sound signal, narrowband spectrogram, and power-cepstrogram. Although the acoustic

parameters used to generate the AVQI remained the same from v. 2.02 to v. 3.01, their relative weighting (along with other important attributes) changed within the regression formulas. The differences across the two versions of the AVQI are illustrated in the following two equations, which were used to compute the AVQI-estimated severities:14,23,25

\[
\mathrm{AVQI}_{\mathrm{v.\,2.02}} = 9.072 - (0.245 \times \mathrm{CPPS}) - (0.161 \times \mathrm{HNR}) - (0.470 \times \mathrm{SL}) + (6.158 \times \mathrm{SLdB}) - (0.071 \times \mathrm{Slope}) + (0.170 \times \mathrm{Tilt})
\]

\[
\mathrm{AVQI}_{\mathrm{v.\,3.01}} = \{4.152 - (0.177 \times \mathrm{CPPS}) - (0.006 \times \mathrm{HNR}) - (0.037 \times \mathrm{SL}) + (0.941 \times \mathrm{SLdB}) + (0.01 \times \mathrm{Slope}) + (0.093 \times \mathrm{Tilt})\} \times 2.8902
\]

Adapting the AVQI to permit a segregated analysis of sustained vowel and connected speech

Although both versions of the AVQI require concatenation of the sustained vowel and the connected speech samples, the analysis algorithms are designed to be applied to the entire sample regardless of the voice context. That is to say, unlike the CSID, the analysis algorithm in the AVQI does not distinguish whether it is analyzing the sustained vowel or connected speech portions of the concatenated sample. Therefore, for the purpose of this study and to permit a more direct comparison between the CSID and the two versions of the AVQI, it was necessary to create a separate AVQI estimate for each voice context (ie, sustained vowel or connected speech). Thus, a customized segregation procedure was undertaken to override the default AVQI script, which requires loading .wav files for both sustained vowel and connected speech samples to analyze the concatenated sample. Instead, to permit a separate AVQI estimate of severity for the sustained vowel and connected speech sample, several steps were required within Praat. These steps and associated procedural details are described in Appendix A. Thus, separate acoustic estimates of dysphonia severity for both sustained vowel and connected speech were generated in both the CSID and the two versions of the AVQI. In the AVQI, a third severity estimate was generated using the standard (default) analysis procedure (ie, analysis of the concatenated sample that combined the sustained vowel sample with the connected speech sample). 
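As an illustration only, the four regression formulas printed in this section (two for the CSID, two for the AVQI) can be transcribed into plain Python functions. The coefficients and signs are copied as they appear in the text; the function and argument names are invented here, and the numeric coding of `gender` is not stated in this text, so it is simply passed through as a number.

```python
def csid_sustained_vowel(cpps, sd_cpps, lh_ratio, sd_lh_ratio, gender):
    """CSID for the sustained vowel, transcribed from the printed formula.

    `gender` is a numeric code; its coding is not given in the text.
    """
    return (84.20 - 4.40 * cpps + 10.62 * sd_cpps
            - 1.05 * lh_ratio + 7.61 * sd_lh_ratio - 10.68 * gender)

def csid_rainbow(cpps, lh_ratio, sd_lh_ratio):
    """CSID for the connected speech ("Rainbow Passage") sample."""
    return 154.59 - 10.39 * cpps - 1.08 * lh_ratio - 3.71 * sd_lh_ratio

def avqi_v2_02(cpps, hnr, sl, sl_db, slope, tilt):
    """AVQI v. 2.02 weighting of the six acoustic parameters."""
    return (9.072 - 0.245 * cpps - 0.161 * hnr - 0.470 * sl
            + 6.158 * sl_db - 0.071 * slope + 0.170 * tilt)

def avqi_v3_01(cpps, hnr, sl, sl_db, slope, tilt):
    """AVQI v. 3.01: same six parameters, reweighted and rescaled."""
    return (4.152 - 0.177 * cpps - 0.006 * hnr - 0.037 * sl
            + 0.941 * sl_db + 0.01 * slope + 0.093 * tilt) * 2.8902

# Hypothetical input values, chosen only to exercise the functions.
example_csid = csid_rainbow(cpps=10.0, lh_ratio=30.0, sd_lh_ratio=5.0)
example_avqi = avqi_v3_01(cpps=12.0, hnr=18.0, sl=4.0, sl_db=0.4,
                          slope=-25.0, tilt=-10.0)
```

Writing the two AVQI versions side by side makes the point of the passage concrete: the inputs are identical, and only the weights and the final scaling differ.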
Finally, for comparison with the AVQI severity estimate based on the concatenated sample, the CSID values for the sustained vowel and connected speech were averaged to produce a mean CSID value for an individual patient. Table 1 summarizes the independent and dependent variables and contexts used in this experiment and describes how each estimate was derived.

Auditory-perceptual ratings

Auditory-perceptual evaluation of voice is often regarded as the “gold standard” to establish the presence and degree of dysphonia. Therefore, mean listener ratings of dysphonia severity served as the reference standard for comparison with the AVQI- and CSID-estimated dysphonia severity. For the listener ratings, data from the Peterson et al study3 were used for analysis, and the interested reader is referred to their detailed description of

TABLE 1. Description of Voice Contexts Employed and Associated Analysis Methods Used to Estimate Dysphonia Severity Including AVQI v. 2.02, AVQI v. 3.01, CSID, and Listener Ratings

Variable | Voice context: Sustained Vowel (SV) | Voice context: Connected Speech (CS) | Voice context: Combined (SV and CS)
1 | AVQI v. 2.02 (SV) | AVQI v. 2.02 (CS) | AVQI v. 2.02, concatenated sample (SV and CS)
2 | AVQI v. 3.01 (SV) | AVQI v. 3.01 (CS) | AVQI v. 3.01, concatenated sample (SV and CS)
3 | CSID (SV) | CSID (CS) | [CSID (SV) + CSID (CS)] / 2 = mean overall CSID-estimated severity
4 | Listener ratings (LR) (SV) | Listener ratings (LR) (CS) | [LR (SV) + LR (CS)] / 2 = mean overall listener-estimated severity

Notes: Variables 1–3 are independent (predictor) variables, whereas variable 4 is the dependent variable. The “Combined” context describes the method used to estimate overall severity based on a combination of the sustained vowel and connected speech contexts. Change scores from pre- to posttreatment were calculated as pre minus post for all variables.
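The two derived quantities described in Table 1 and its notes (the combined-context mean and the pre-minus-post change score) amount to the following arithmetic. This is a minimal sketch; the function names and the numeric values are hypothetical and are not data from the study.

```python
def combined_severity(sustained_vowel, connected_speech):
    """Mean of the SV and CS estimates (used for the CSID and listener ratings)."""
    return (sustained_vowel + connected_speech) / 2.0

def change_score(pre, post):
    """Change from pre- to posttreatment, calculated as pre minus post."""
    return pre - post

# Hypothetical CSID values for one patient (not taken from the corpus).
pre_combined = combined_severity(62.0, 48.0)
post_combined = combined_severity(30.0, 26.0)
improvement = change_score(pre_combined, post_combined)
```

With this sign convention, a positive change score corresponds to a lower (less severe) estimate after treatment.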

the methods surrounding the listening experiments as well as associated listener reliability estimates for each diagnostic category. In brief, the judges were eight graduate students in the Department of Communication Sciences and Disorders at the University of Utah who were native speakers of English and had completed coursework on voice disorders, which included practice in making judgments of dysphonia severity. All pre- and posttreatment voice samples were presented to each listener in a quiet environment at a comfortable level of loudness. Voice samples within each diagnostic category were presented as separate listening experiments. Listeners heard paired, randomized recordings of pre- and posttreatment samples produced by each patient. A custom computer program with a 100-mm visual analog scale (VAS) similar to Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) ratings was used to estimate dysphonia severity. Listeners used a cursor to place a vertical marker on the scale with “Normal Voice” represented on the far left (rating zero) and “Profoundly Abnormal Voice” on the far right side (rating 100 mm). In each listening experiment, which was separated by diagnostic category and voice context, the order of sound files was randomized.

Statistical analyses

Statistical analyses were performed using dysphonia severity estimates computed from the two versions of the AVQI and the CSID for each of the sustained vowel, connected speech, and concatenated or combined contexts. To permit a comparison with the AVQI-estimated severity of the concatenated sample (which combined both voice contexts into a single numerical value), we averaged the CSID-estimated dysphonia severity from the sustained vowel and connected speech samples for each individual subject to provide an overall CSID-estimated severity for the combined contexts. Likewise, to permit a comparison between the acoustic estimates based on the combined or concatenated samples and the listener ratings, we averaged the listener ratings for each individual speaker from the two sample types (ie, sustained vowel and connected speech), to provide an overall listener-perceived severity rating for the combined context (see Table 1). In all analyses, the AVQI- and CSID-estimated severities were the independent variables and listener-judged severity was the dependent variable. The change from pre- to posttreatment in the AVQI-, CSID-, and listener-estimated severity rating was also analyzed. The change in the AVQI- and CSID-estimated severity was considered as the independent variable and the change in the listener-perceived rating was the dependent variable.

Pearson product-moment correlation was used to evaluate the association between the CSID and the two versions of the AVQI dysphonia severity estimates. To evaluate which index accounted for significantly more variance in the listener ratings of severity, multiple regression analysis was performed. Partial R2 values (adjusted for age, sex, and diagnostic category) were estimated to identify how much variance in listener-perceived severity was explained by each single index measure (ie, AVQI v. 2.02, AVQI v. 3.01, and CSID). Finally, simultaneous multiple regression analysis was used to assess the combined influence of the two versions of the AVQI and the CSID on listener-perceived dysphonia ratings to determine the overall performance of each acoustic index (after adjusting for age, sex, and diagnostic category). All statistical analyses were performed using SPSS 22.0 (IBM Corp., Armonk, NY). Statistical significance was based on the two-sided test of hypothesis at the 0.05 level.
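The analyses themselves were run in SPSS. Purely as an illustration of the two statistics named above, the sketch below computes a Pearson correlation and a partial R2 with NumPy. It assumes one common definition of partial R2 (the proportional reduction in residual sum of squares when the index is added to a model that already contains the covariates); the function names are invented, and a categorical covariate such as diagnostic category would need to be dummy coded by the caller.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation between two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def _sse(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def partial_r2(y, covariates, predictor):
    """Share of the variance left unexplained by the covariates that the
    predictor accounts for: (SSE_reduced - SSE_full) / SSE_reduced."""
    y = np.asarray(y, dtype=float)
    ones = np.ones((y.size, 1))
    cov = np.asarray(covariates, dtype=float).reshape(y.size, -1)
    pred = np.asarray(predictor, dtype=float).reshape(y.size, -1)
    reduced = np.hstack([ones, cov])   # intercept + covariates
    full = np.hstack([reduced, pred])  # ... + acoustic index
    sse_reduced = _sse(reduced, y)
    return (sse_reduced - _sse(full, y)) / sse_reduced

# Toy data (hypothetical): the rating is an exact linear function of one
# covariate and one index, so the index explains all residual variance.
age = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
index = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0])
rating = 1.0 + 2.0 * age + 3.0 * index
toy_partial = partial_r2(rating, age, index)
```

Passing several covariate columns (for example age, sex, and dummy-coded diagnostic category) as a two-dimensional array works the same way, because the design matrices are simply stacked column-wise.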

RESULTS

Acoustic remeasurement reliability

The remeasurement reliability of the AVQI was assessed by randomly selecting 20% of the samples (n = 22 samples) and having the original examiner reanalyze or recompute each severity estimate. The mean Pearson correlation coefficients (r) confirmed excellent remeasurement reliability for the sustained vowel (r = 1.00, P < 0.01), connected speech (r = 0.99, P < 0.01), as well as the concatenated context (r = 1.00, P < 0.01). Likewise, the remeasurement reliability of the CSID, as reported by Peterson et al,3 was similarly excellent for the sustained vowel (r = 0.99, P < 0.0001) and connected speech (r = 0.99, P < 0.0001).

Listener reliability

To assess intrarater reliability, 20% of the voice samples were randomly chosen and presented again to each listener. As reported by Peterson et al,3 correlation coefficients of r = 0.87 (P < 0.001) for sustained vowel and r = 0.91 (P < 0.001) for

connected speech confirmed excellent intrarater reliability. Likewise, interrater reliability, as assessed by the intraclass correlation coefficient, was above 0.95 for both speech contexts, across each diagnostic category, indicating excellent reliability.3

Correlations between the CSID and the two versions of the AVQI

Correlations between the CSID- and the two versions of the AVQI-estimated dysphonia severity (v. 2.02 and v. 3.01) for the sustained vowel, connected speech, and combined contexts were calculated separately for pre- and posttreatment samples across all diagnostic categories (Table 2). t-Tests were performed to assess statistical significance. Overall, the results indicated a moderately strong, significant association between the CSID and the two versions of the AVQI at both pre- and posttreatment (across all contexts). However, there were statistically significant differences observed between the two versions of the AVQI (v. 2.02 and v. 3.01), as indicated by the shaded correlation coefficients

TABLE 2. Correlation Coefficients (r) Between the CSID and the Two Versions of the AVQI for the Sustained Vowel, Connected Speech, and Combined Context Within Each Diagnostic Category for Pre- and Posttreatment

Notes: (1) All correlations between the CSID and AVQI were significant unless bolded (P < 0.05). Bolded text indicates those correlations between the CSID and AVQI that were nonsignificant. For instance, in the case of the PMTD group, in the sustained vowel context (posttreatment condition), the correlations reported between the CSID and the AVQI v. 2.02 (r = 0.28) and v. 3.01 (r = 0.19) were nonsignificant at the 0.05 level.
(2) Shaded correlation coefficients (and associated t-statistics) indicate statistically significant differences between the two versions of the AVQI in the strength of the correlation observed with the CSID (P < 0.05). The t-statistic was used to compare correlation coefficients of two dependent samples. For instance, for the UVFP group, in the sustained vowel context (pretreatment condition), the correlation between the CSID and the AVQI v. 2.02 (ie, r = 0.92) was significantly higher than with the AVQI v. 3.01 (r = 0.85) (ie, 0.92 > 0.85, P < 0.05).
Abbreviations: UVFP, unilateral vocal fold paralysis; ADSD, adductor spasmodic dysphonia; PMTD, primary muscle tension dysphonia; BVFL, benign vocal fold lesion; MF, mutational falsetto; PRES, presbylaryngis.
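The notes do not name the specific t-statistic used to compare the two dependent correlations. One common choice for two correlations that share a variable (here, the CSID) is Williams's test, sketched below as an assumption rather than as the authors' method. It requires the correlation between the two AVQI versions (`r23`), which is not reported in this text, so the value used in the example is hypothetical.

```python
import math

def williams_t(r12, r13, r23, n):
    """Williams's t for H0: rho12 == rho13 when both share variable 1.

    r12: correlation of variable 1 with variable 2 (eg, CSID vs AVQI v. 2.02)
    r13: correlation of variable 1 with variable 3 (eg, CSID vs AVQI v. 3.01)
    r23: correlation between variables 2 and 3
    Returns (t, degrees_of_freedom) with df = n - 3.
    """
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3

# r12 and r13 follow the UVFP example in the notes (n = 20);
# r23 = 0.90 is a hypothetical value chosen for illustration.
t_stat, df = williams_t(0.92, 0.85, 0.90, 20)
```

The resulting statistic would be referred to a t distribution with n - 3 degrees of freedom; the outcome depends strongly on `r23`, which is why the example should not be read as reproducing the published comparison.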

(and associated t-statistics) in Table 2. Inspection of the significance patterns revealed that the AVQI v. 2.02 tended to correlate more strongly with the CSID-estimated severity for the sustained vowel context, whereas the AVQI v. 3.01 tended to correlate more strongly with the CSID estimate for connected speech, at both pre- and posttreatment. Although there were moderately strong correlations observed between both versions of the AVQI and the CSID, there were exceptions noted. For instance, at posttreatment for sustained vowel and connected speech, the PMTD group showed no significant correlations between the CSID-estimated severity and either version of the AVQI, a pattern also observed for the MF group for the connected speech context only (see bolded correlation coefficients in Table 2, which indicate no significant association between the AVQI and the CSID). Finally, to further examine the influence of diagnostic category on the association between each of the index measures, we used a regression model with interaction effects. For the sustained vowel at pretreatment, the strength of the relationship between both versions of the AVQI and the CSID was not significantly associated with diagnostic category, based on nonsignificant tests of the interaction in the regression models. This was also true for the connected speech context. However, for the sustained vowel context at posttreatment, the relationship between the two iterations of the AVQI and the CSID was dependent on diagnostic category (interaction effect P = 0.0231 for v. 2.02 and P = 0.0840 for v. 3.01). The PMTD group was observed to have a much weaker level of association than the other groups. For the connected speech context, the interaction terms at posttreatment were significant for both versions of the AVQI (P = 0.0231 for v. 2.02 and P = 0.0293 for v. 3.01). The PMTD and MF diagnostic categories showed much weaker levels of association as compared with the other groups. 
Comparing the performance of the CSID and the two versions of the AVQI

Multiple regression models were estimated to determine which acoustic index could best predict the auditory-perceptual ratings

provided by listeners. Recall that dysphonia severity estimates were generated using each acoustic index for the sustained vowel, connected speech, and combined speech contexts across six diagnostic categories for pre- and posttreatment conditions. To provide summary CSID and listener ratings for the combined context, each CSID- and listener-estimated severity for an individual’s sustained vowel and connected speech sample was averaged. The concatenated sample, however, was used to generate the AVQI-estimated severity for the combined context (Table 1).

Table 3 reports the partial R2 values associated with each individual acoustic index. The partial R2 value represents the proportion of residual variance in listener-estimated severity explained by the specific acoustic index, independent of age, sex, and diagnostic category. Inspection of Table 3 reveals that for the sustained vowel, connected speech, and combined contexts the CSID accounted for greater residual variance in the listener-perceived severity ratings (after adjusting for age, sex, and diagnostic category) as compared with both versions of the AVQI. This is evidenced by noticeably larger partial R2 values for the CSID, especially in the pretreatment condition and as compared with AVQI v. 2.02. Furthermore, at pretreatment, the AVQI v. 3.01 performed better than v. 2.02 for the connected speech and combined contexts, whereas the two versions performed similarly for the sustained vowel (Table 3). However, at posttreatment, the AVQI v. 2.02 performed better than v. 3.01 for the sustained vowel. For the connected speech and combined context, the two versions of the AVQI performed similarly.

In addition to the above regression analyses, multiple regression was used to assess the combined influence of the two versions of the AVQI and the CSID on listener-perceived dysphonia ratings, adjusted for age, sex, and diagnostic category (Table 4). In these models, which simultaneously estimated the association between listener ratings and the three acoustic indices, the CSID consistently explained most of the variation in listener ratings (Table 4). With the CSID in the model, the two versions of the AVQI either became nonsignificant or had a very minor influence on explaining the variability in listener ratings, with partial R2 values of no more than 0.0717 in any given model. Despite the AVQI measures being

TABLE 3. Associations Between Listener Ratings and the AVQI v. 2.02, AVQI v. 3.01, and the CSID Estimates at Pre- and Posttreatment for the Sustained Vowel, Connected Speech, and Combined Context

                     AVQI v. 2.02            AVQI v. 3.01            CSID
                     Partial R2   Pr > F     Partial R2   Pr > F     Partial R2   Pr > F
Pretreatment
  Sustained vowel    0.3924       <0.0001    0.3888       <0.0001    0.6565       <0.0001
  Connected speech   0.2121       <0.0001    0.3603       <0.0001    0.4526       <0.0001
  Combined context   0.4060       <0.0001    0.5278       <0.0001    0.6519       <0.0001
Posttreatment
  Sustained vowel    0.5548       <0.0001    0.4318       <0.0001    0.6921       <0.0001
  Connected speech   0.3727       <0.0001    0.4222       <0.0001    0.4353       <0.0001
  Combined context   0.6082       <0.0001    0.5979       <0.0001    0.7083       <0.0001

Notes: The regression model was adjusted for age, sex, and diagnostic category. For each voice context examined, the partial R2 values reported for the CSID were uniformly stronger as compared with the two versions of the AVQI (for both pre- and posttreatment).


TABLE 4. Associations Using Multiple Regression Models Between Listener Ratings and the AVQI v. 2.02, AVQI v. 3.01, and the CSID Estimates at Pre- and Posttreatment for the Sustained Vowel, Connected Speech, and Combined Context

                     AVQI v. 2.02            AVQI v. 3.01            CSID
                     Partial R2   Pr > F     Partial R2   Pr > F     Partial R2   Pr > F
Pretreatment
  Sustained vowel    0.0000       0.9005     0.0025       0.3667     0.6500       <0.0001
  Connected speech   0.0717       <0.0001    0.0077       0.1766     0.4357       <0.0001
  Combined context   0.0111       0.0461     0.0087       0.0816     0.6379       <0.0001
Posttreatment
  Sustained vowel    0.0159       0.0133     0.0075       0.0795     0.6921       <0.0001
  Connected speech   0.0094       0.1321     0.0236       0.0194     0.4350       <0.0001
  Combined context   0.0272       0.0008     0.0033       0.2046     0.7083       <0.0001

Notes: The regression models were adjusted for age, sex, and diagnostic category. Partial R2 values indicate the amount of unique variance accounted for in listener-perceived severity by each acoustic index.

included in this model, the partial R2 values for the CSID remained relatively unchanged as compared with the findings reported in Table 3.

Table 5 shows more precisely the potential influence of diagnostic category on the performance of each acoustic index. The level of association between listener dysphonia severity ratings and each index measure varied by diagnostic category, after adjusting for age and sex (Table 5). Although the CSID tended to have the greatest level of agreement with the listener ratings, for some diagnostic categories one of the AVQI versions had a higher level of agreement. For example, for the sustained vowel at pretreatment, listener-perceived dysphonia rating was more strongly related to the AVQI v. 2.02 for the UVFP group, and listener ratings had a higher level of association with the AVQI v. 3.01 for patients with MF. In contrast, some diagnostic categories had a poor level of agreement for all of the index measures. For instance, at posttreatment the listener-perceived severity rating was poorly associated with each of the index measures in the PMTD group for the sustained vowel, connected speech, and combined context. In addition, the MF group displayed poor association with all of the acoustic indices for the connected speech context.

Responsiveness to change

To compare how accurately the AVQI and the CSID tracked treatment-related changes, the change in the listener-perceived severity rating from pre- to posttreatment was regressed onto the acoustically estimated change for both versions of the AVQI and the CSID (Table 6). Significant associations were observed, but the strength of the relationships was greatest between change in listener-perceived dysphonia rating and change in the CSID for each of the diagnostic categories. That is, after adjusting for age, sex, and diagnostic category, the partial R2 values for the CSID were uniformly higher as compared with the two versions of the AVQI.
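The change analysis described above can be sketched as follows. This is a simplified, unadjusted illustration rather than the study's model: it computes pre-minus-post change scores and the squared Pearson correlation between listener-rated and acoustically estimated change, without the adjustment for age, sex, and diagnostic category. All values and function names are hypothetical.

```python
from math import sqrt


def change_scores(pre, post):
    """Pre-minus-post change for each speaker (positive = improvement)."""
    return [before - after for before, after in zip(pre, post)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical severity values for five speakers (0-100 scale).
listener_pre, listener_post = [70, 55, 80, 40, 62], [20, 30, 35, 25, 18]
index_pre, index_post = [65, 50, 72, 45, 58], [25, 33, 30, 28, 22]

listener_change = change_scores(listener_pre, listener_post)
index_change = change_scores(index_pre, index_post)

# Unadjusted R2: share of variance in listener-rated change explained by index change.
print(pearson_r(listener_change, index_change) ** 2)
```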
Multiple regression analysis was also used to assess the combined influence of changes from pre- to posttreatment in the two versions of the AVQI (v. 2.02 and v. 3.01) and the CSID on change in listener ratings, adjusted for age, sex, and diagnostic category.

Inspection of Table 7 reveals that in the models simultaneously estimating the association between pre- to posttreatment changes observed on each acoustic index and changes observed in listener ratings, the CSID consistently explained most of the variation in listener-perceived change following treatment. Again, with the CSID in the model, the two versions of the AVQI either became nonsignificant or had only a very minor influence on explaining the variability in listener ratings, with partial R2 values no greater than 0.032 in any given model. In spite of the inclusion of the two AVQI measures in this model, the partial R2 values for the CSID remained very similar to results shown in Table 6, and were uniformly higher as compared with the two versions of the AVQI.

To provide improved precision, Table 8 was constructed to identify the specific effect of diagnostic category on the association between changes in listener ratings and the acoustic indices (adjusted for age and sex). Inspection of Table 8 reveals that for the sustained vowel context, the CSID was more sensitive to changes in listener-perceived ratings after treatment than the two versions of the AVQI across each diagnostic category, except in the PRES group, wherein the AVQI v. 2.02 was more sensitive. For connected speech, the CSID was more sensitive to listener-estimated change than both versions of the AVQI, with the exception of ADSD and PRES groups, wherein the AVQI v. 3.01 was more sensitive. Finally, for the combined context, the CSID was more sensitive to change in listener-perceived severity ratings across all diagnostic categories as compared with both versions of the AVQI, with the exception of the PRES group, wherein the AVQI v. 3.01 was more sensitive.

DISCUSSION

The purpose of this study was to compare the performance of two popular, multiparameter acoustic indices of dysphonia severity using a common corpus of voice samples from a variety of diagnostic categories.
Although both the CSID and the AVQI include cepstral and spectral parameters, the algorithms that generate the dysphonia severity estimates differ in important ways. Despite these differences, the results of this study confirmed a

TABLE 5. Associations Between Listener Ratings and Two Versions of the AVQI in the Sustained Vowel, Connected Speech, and Combined Context Within Each Diagnostic Category Compared With the CSID Estimates at Pre- and Posttreatment

Journal of Voice, Vol. ■■, No. ■■, 2017

Note: The regression models are adjusted for age and sex. Shaded partial R2 values indicate instances wherein the CSID outperformed the two versions of AVQI. Bolded partial R2 values indicate instances wherein at least one version of the AVQI outperformed the CSID. Abbreviations: UVFP, unilateral vocal fold paralysis; ADSD, adductor spasmodic dysphonia; PMTD, primary muscle tension dysphonia; BVFL, benign vocal fold lesion; MF, mutational falsetto; PRES, presbylaryngis.


TABLE 6. Associations Between Listener Rating Change (Pre Minus Post) and the AVQI v. 2.02, AVQI v. 3.01, and the CSID Change (Pre Minus Post) for the Sustained Vowel, Connected Speech, and Combined Context

                     AVQI v. 2.02            AVQI v. 3.01            CSID
                     Partial R2   Pr > F     Partial R2   Pr > F     Partial R2   Pr > F
  Sustained vowel    0.4888       <0.0001    0.4629       <0.0001    0.6631       <0.0001
  Connected speech   0.3204       <0.0001    0.4703       <0.0001    0.5898       <0.0001
  Combined context   0.5617       <0.0001    0.6532       <0.0001    0.7387       <0.0001

Note: The regression model was adjusted for age, sex, and diagnostic category.

TABLE 7. Associations Using Multiple Regression Models Between Listener Rating Change (Pre Minus Post) and the AVQI v. 2.02, AVQI v. 3.01, and the CSID Change (Pre Minus Post) for the Sustained Vowel, Connected Speech, and Combined Context

                     AVQI v. 2.02            AVQI v. 3.01            CSID
                     Partial R2   Pr > F     Partial R2   Pr > F     Partial R2   Pr > F
  Sustained vowel    0.0165       0.0207     0.0000       0.9355     0.6625       <0.0001
  Connected speech   0.0320       0.0035     0.0115       0.0870     0.5772       <0.0001
  Combined context   0.0032       0.2341     0.0207       0.0038     0.7257       <0.0001

Note: The regression models were adjusted for age, sex, and diagnostic category.

strong association between the CSID and the AVQI (both versions). However, inspection of partial R2 values revealed that the CSID more accurately predicted listener-estimated dysphonia as compared with the two versions of the AVQI. That is, the CSID appeared to outperform both versions of the AVQI across all speech contexts (ie, sustained vowel, connected speech, and combined or concatenated contexts). Although some variation existed among diagnostic categories, the more recent version of the AVQI (v. 3.01) as compared with its predecessor appeared to provide improved performance, particularly in the connected speech context. In the following section, the performance of each index will be discussed in more detail.

Correlations between the two versions of the AVQI (v. 2.02 and v. 3.01) and the CSID

The results of the Pearson product-moment correlations revealed a strong association between the two iterations of the AVQI and the CSID across all voice contexts. This association confirms that despite differences between the algorithms and analysis protocols (eg, concatenated vs. segregated analysis of sustained vowel and connected speech samples), a strong relationship exists and is most likely related to the contribution of the CPPS as a common denominator.4,15 In previous research, the CPPS has been shown to contribute disproportionately to the CSID estimate for sustained vowel and connected speech samples.21 The CPPS also plays a principal role in the AVQI algorithms.15 Furthermore, the CPPS values obtained from both the ADSV and Praat have been highly correlated (r = 0.92, P < .001 in sustained vowel and r = 0.96, P < .001 in connected speech contexts using English), so the results of our study are not entirely unexpected.4,26

The current study revealed that the early version of the AVQI (v. 2.02) showed a stronger relationship with the CSID for the sustained vowel context as compared with the newest version of the AVQI (v. 3.01). As the AVQI has evolved, the algorithm was modified ostensibly to optimize its performance when the duration of the connected speech portion (within the concatenated sample) was extended from 17 to 34 syllables. This increase in the number of syllables analyzed was intended to ensure that the duration or contribution of the connected speech portion (after voiced segment extraction) would more closely approximate the 3-second duration of the sustained vowel portion. The AVQI v. 2.02 was validated and based on a shorter sample (ie, 17 syllables) of connected speech, and after extraction of voiced segments, the sustained vowel portion was overrepresented in the final concatenated sample, thus contributing disproportionately to the AVQI v. 2.02 estimate for an individual speaker. Although the newest version (ie, AVQI v. 3.01) is associated with improved performance in the context of connected speech, it appears, based on our analysis of the segregated voice contexts, that this improvement may have been accomplished at the expense of accuracy in estimating severity for sustained vowels. Our results emphasize two important points pertaining to the AVQI: (1) clinicians need to be aware of and stipulate which version of the AVQI severity measure they are using (v. 2.02 versus v. 3.01), and (2) clinicians need to take into account the possible influence of the length (ie, number of syllables) of the connected speech sample they are appending to the sustained vowel sample. Clinicians should ensure that they have a sufficiently long sample of connected speech if the AVQI (v. 3.01) is used as the dysphonia severity assessment tool. This issue is further discussed in the section on the limitations of the study.


TABLE 8. Associations between Listener-Rated Change (Pre Minus Post) and the AVQI v. 2.02-, AVQI v. 3.01-, and CSID-Estimated Change (Pre Minus Post) for the Sustained Vowel, Connected Speech, and Combined Context According to Diagnostic Category

Notes: The regression model was adjusted for age and sex. Shaded partial R2 values highlight instances wherein the CSID accounted for more variance in listener-perceived change in dysphonia severity following treatment as compared with both versions of the AVQI. Bolded values indicate instances wherein at least one version of the AVQI accounted for more variance than the CSID in listener-perceived change in severity following treatment. Abbreviations: UVFP, unilateral vocal fold paralysis; ADSD, adductor spasmodic dysphonia; PMTD, primary muscle tension dysphonia; BVFL, benign vocal fold lesion; MF, mutational falsetto; PRES, presbylaryngis.

Performance of the AVQI (v. 2.02 and v. 3.01) and the CSID across diagnostic category and sensitivity to change from pre- to posttreatment

Although both the AVQI and the CSID share CPPS as a prime constituent in their respective algorithms, the current results indicate that CSID-estimated dysphonia severity was more closely related to listener-based estimates, regardless of the voice context. A moderately strong association between the CSID-estimated severity and listener-perceived severity across all speech contexts was observed. Comparison of the partial R2 values revealed that despite improvements to the AVQI, the proportion of total variability of the listener-perceived ratings accounted for by the CSID was appreciably higher (often >20%), as compared with the two iterations of the AVQI. In light of the present findings, clinicians and researchers should weigh a number of factors, including the cost and availability of each tool (ie, the AVQI is free) versus how much measurement imprecision is acceptable.

In general, although the CSID outperformed the AVQI, there were a few exceptions based on diagnostic category. Of the six diagnostic categories, the PRES group showed the lowest partial R2 for the CSID. The superior performance of the AVQI in PRES is likely due to the acoustic parameters included in its algorithms. Recently, shimmer local (SL) and shimmer local dB (SLdB) were observed to be significantly elevated in hypofunctional voices associated with presbyphonia.27 Thus, the inclusion of time-based measures such as shimmer local and shimmer local dB in the AVQI may have contributed to its improved predictive performance in the PRES group only.

Differences in the patterns of association in the PMTD and MF groups were also observed. For both versions of the AVQI and the CSID, weak associations with listener-perceived severity (in posttreatment samples) were seen in connected speech and sustained vowel for PMTD. This is likely because many of the PMTD posttreatment samples for connected speech and sustained vowel were consistently rated close to normal by the listeners. The absence of variability in posttreatment ratings likely attenuated the ability to detect associations and contributed to the nonsignificant partial R2 values for this group (regardless of the acoustic index employed). Both indices also showed weak associations with severity ratings for the MF category, which might be related to the relatively small sample size and the absence of gender and F0 as variables to be considered in the connected speech algorithms.
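To make the size of the CSID advantage concrete, the short sketch below subtracts each AVQI partial R2 in Table 3 from the corresponding CSID value. The numbers are those reported in Table 3; the dictionary layout and function name are illustrative choices made for this example only.

```python
# Partial R2 values from Table 3, keyed by (condition, voice context).
TABLE_3 = {
    ("pre", "sustained vowel"): {"AVQI v. 2.02": 0.3924, "AVQI v. 3.01": 0.3888, "CSID": 0.6565},
    ("pre", "connected speech"): {"AVQI v. 2.02": 0.2121, "AVQI v. 3.01": 0.3603, "CSID": 0.4526},
    ("pre", "combined context"): {"AVQI v. 2.02": 0.4060, "AVQI v. 3.01": 0.5278, "CSID": 0.6519},
    ("post", "sustained vowel"): {"AVQI v. 2.02": 0.5548, "AVQI v. 3.01": 0.4318, "CSID": 0.6921},
    ("post", "connected speech"): {"AVQI v. 2.02": 0.3727, "AVQI v. 3.01": 0.4222, "CSID": 0.4353},
    ("post", "combined context"): {"AVQI v. 2.02": 0.6082, "AVQI v. 3.01": 0.5979, "CSID": 0.7083},
}


def csid_advantage(table):
    """CSID partial R2 minus each AVQI partial R2, per condition and context."""
    gaps = {}
    for (condition, context), row in table.items():
        for name, value in row.items():
            if name != "CSID":
                gaps[(condition, context, name)] = round(row["CSID"] - value, 4)
    return gaps


for key, gap in sorted(csid_advantage(TABLE_3).items(), key=lambda item: -item[1]):
    print(key, gap)
```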


LIMITATIONS AND SUGGESTIONS FOR FUTURE RESEARCH

We acknowledge that a “direct” comparison of the AVQI and the CSID is impossible given the fundamental differences in the way each measure generates its respective severity estimates (ie, separate analysis of sustained vowel and connected speech samples in the case of the CSID versus a concatenated sample used to generate a single severity estimate in the AVQI). Because the AVQI applies a single analysis algorithm, regardless of the voice context, we reasoned that for comparison purposes it was appropriate to modify the AVQI script in Praat to generate separate estimates for sustained vowel and connected speech. To allow for a comparison between the concatenated estimates in the AVQI, we simply averaged the CSID and listener ratings from both speech samples (ie, sustained vowel and connected speech). This produced an overall mean for CSID- and listener-estimated severity ratings for the combined context. However, listeners never rated a truly concatenated (combined) voice sample, and the possibility exists that the performance of the AVQI may have been better within this type of combined context.

The CSID was originally validated using a VAS rating from 0 to 100, whereas the AVQI was validated using the “Grade” from the GRBAS scale,2,8,14,15,23,25,28,29 which is an ordinal scale with increasing severity from 0 to 3. In this study, we used a VAS as in the original CSID validation studies. Thus, it is possible that our results may have differed if we had used “Grade”-based listener ratings as our dependent measure.

Given the relatively strong correlations observed between the CSID and the AVQI, we acknowledge the potential influence of multicollinearity when both of these variables were included at the same time in our multiple regression model. However, the strikingly similar partial R2 values obtained when the CSID was entered independently or in a combined or simultaneous fashion with the AVQI suggest that the superiority of the CSID is not simply an artifact of collinearity.

We used the two middle sentences from the “Rainbow Passage” (ie, 50 syllables) as the connected speech sample to generate the AVQI and the CSID, instead of the 17 syllables for AVQI (v. 2.02) and 34 syllables for AVQI (v. 3.01), as suggested by Barsties and Maryn.23,25 This decision was made to match the identical sample on which listeners rendered their judgment of severity (ie, the full 50 syllables of the “Rainbow Passage”). Because we segregated the AVQI into separate analyses of sustained vowel and connected speech, this difference in length of the connected speech sample should have no significant influence on the results for our segregated analysis. However, the length of the connected speech sample may have influenced our AVQI results for the concatenated sample. To assess the potential influence of a longer sample length on the analysis of the concatenated sample, we computed correlations between AVQI v. 3.01 estimates using 34 syllables from the “Rainbow Passage” (recommended) versus 50 syllables (used here) and found them to be strongly correlated (r = 0.992, P < .0001). Therefore, it appears that our result of superiority of the CSID over the AVQI, even when combining the two contexts, is not simply an artifact of the number of syllables analyzed.
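The collinearity argument made earlier in this section rests on how little the CSID partial R2 moves when the AVQI measures are added to the model. As a small check on that reading, the sketch below compares the CSID values reported in Table 3 (CSID entered alone) with those in Table 4 (CSID entered together with both AVQI versions). The variable and function names are illustrative only.

```python
CONTEXTS = ["pre SV", "pre CS", "pre combined", "post SV", "post CS", "post combined"]

# CSID partial R2 as reported in Table 3 (alone) and Table 4 (with both AVQI versions).
CSID_ALONE = [0.6565, 0.4526, 0.6519, 0.6921, 0.4353, 0.7083]
CSID_COMBINED = [0.6500, 0.4357, 0.6379, 0.6921, 0.4350, 0.7083]


def partial_r2_shifts(alone, combined):
    """Absolute change in partial R2 when the other indices enter the model."""
    return [round(abs(a - c), 4) for a, c in zip(alone, combined)]


shifts = partial_r2_shifts(CSID_ALONE, CSID_COMBINED)
for context, shift in zip(CONTEXTS, shifts):
    print(f"{context}: {shift:.4f}")
print("largest shift:", max(shifts))
```

Small shifts of this kind are consistent with the interpretation given above, although they do not by themselves rule out collinearity effects on the AVQI estimates.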


CONCLUSIONS

The results from this study confirm that the CSID and AVQI are strongly correlated, and both appear to provide reasonable estimates of dysphonia severity. However, associations observed between the CSID- and listener-estimated dysphonia were almost uniformly stronger as compared with both versions of the AVQI, irrespective of voice context and diagnostic category. Thus, the CSID outperformed the AVQI (v. 2.02 and v. 3.01), and clinicians should weigh these findings within the context of differences in cost and availability when deciding which acoustic index to use.

APPENDIX A

A customized segregation procedure was applied within Praat to permit a comparison between the two versions of the AVQI with the CSID. Because the CSID provides separate severity estimates for sustained vowel and connected speech contexts, the following procedures were developed to override the default AVQI script (v. 2.02 and v. 3.01), which concatenates a 3-second sustained vowel sample with a connected speech sample for the purpose of generating a composite AVQI severity rating. To accomplish this segregation procedure in Praat v. 6.0.21, the AVQI v. 2.02 and v. 3.01 scripts were used.14,23 To permit a segregated analysis, a silenced sound file was established to serve as a placeholder for either the sustained vowel or the connected speech sample. The procedure for sustained vowel was as follows:

(1) Praat v. 6.0.21 was opened and in the “Praat Objects” window, “Praat” was selected from the menu bar, and the “Open Praat script...” option was selected from the drop-down menu. The .txt file for the specific AVQI script (v. 2.02 or v. 3.01) was uploaded.
(2) Once the script was uploaded, “Run” was selected from the menu bar and “Run” was chosen from the drop-down menu.
(3) When the “Run script: Acoustic Voice Quality Index (v. 2.02 or v. 3.01)” window appeared, the “illustrated” version of the AVQI was selected and the window was left open before uploading a voice sample file.
(4) In the “Praat Objects” window, “Open” was selected from the menu bar, and “Read from file...” was chosen from the drop-down menu.
(5) The sustained vowel sample to be analyzed was located in the appropriate folder and this sample was selected/uploaded by double-clicking the file.
(6) In the “Praat Objects” window, the “Rename...” tab (on the lower left part of the window) was selected. When the “Rename object” window appeared, the file name was changed to “sv” as prescribed by the AVQI script, and the “OK” button was clicked.
(7) In the “Praat Objects” window, the highlighted file labeled “Sound sv” was copied by selecting the “Copy...” tab on the lower left of the “Praat Objects” window.
(8) In the “Copy object” window, the “sv” object was renamed “cs” and the “OK” button was selected.
(9) In the “Praat Objects” window, the highlighted “Sound cs” file was chosen and the “View & Edit” tab (on the right side of the window) was selected, and the waveform of the file was automatically displayed.
(10) The entire waveform (upper window) was selected by clicking and dragging from the far left to the far right. Then, from the menu bar, “Edit” was selected and “set selection to zero” was chosen from the drop-down menu. This resulted in silencing of the “Sound cs” file. The “Sound cs” window was then closed.
(11) The researcher returned to the “Run script: Acoustic Voice Quality Index (v. 2.02 or v. 3.01)” window, the “Apply” tab on the bottom of the window was selected, and the warning notification was acknowledged by selecting “OK.”
(12) The segregated sustained vowel sample and associated acoustic measures, including the AVQI estimate for this individual sustained vowel production, were then automatically displayed in the “Praat Pictures” window.

Likewise, to create a separate AVQI estimate for the connected speech sample, a customized segregation procedure was also developed. A silenced sound file was established to serve as a placeholder for the sustained vowel sample, thereby permitting analysis of the connected speech sample only. Thus, to estimate the AVQI for the connected speech sample only, the following steps were applied:

(1) Praat v. 6.0.21 was opened and in the “Praat Objects” window, “Praat” was selected from the menu bar, and the “Open Praat script...” option was selected from the drop-down menu. The .txt file for the specific AVQI script (v. 2.02 or v. 3.01) was uploaded.
(2) Once the script was uploaded, “Run” was selected from the menu bar and “Run” was chosen from the drop-down menu.
(3) When the “Run script: Acoustic Voice Quality Index (v. 2.02 or v. 3.01)” window appeared, the “illustrated” version of the AVQI was selected and the window was left open before uploading a voice sample file.
(4) In the “Praat Objects” window, the “Open” option was selected from the menu bar, and “Read from file...” was chosen from the drop-down menu.
(5) The connected speech sample to be analyzed was located in the appropriate folder and this sample was selected/uploaded by double-clicking the file.
(6) In the “Praat Objects” window, the “Rename...” tab (on the lower left part of the window) was selected. When the “Rename object” window was open, the file name was changed to “cs” as prescribed by the AVQI script, and the “OK” button was chosen.
(7) In the “Praat Objects” window, the highlighted file labeled “Sound cs” was copied by selecting the “Copy...” tab on the lower left of the “Praat Objects” window.
(8) When the “Copy object” window appeared, the “cs” object was renamed “sv” as prescribed by the AVQI script, and the “OK” button was selected.
(9) In the “Praat Objects” window, the highlighted “Sound sv” file was chosen and the “View & Edit” tab (on the right side of the window) was selected, and the waveform of the file was automatically displayed.
(10) In the “Sound sv” window, the entire waveform (upper window) was selected by clicking and dragging from the far left to the far right. Then, from the menu bar, “Edit” was selected and “set selection to zero” was chosen from the drop-down menu. This resulted in silencing of the “Sound sv” file.
(11) Continuing in the “Sound sv” window, “Select” was chosen from the menu bar and “Select...” was chosen from the drop-down menu to designate the appropriate starting and ending trimming points of the sound file. In the “Select” window, “0.0010” was entered as the start of the selection, the end of the selection was set to the total duration (in seconds) shown at the bottom of the “Sound sv” window, and the “Apply” button was then clicked. The waveform was thereby selected except for its first 0.001 seconds.
(12) In the “Sound sv” window, “Edit” was selected from the menu bar and “Cut” was chosen from the drop-down menu. This resulted in trimming the length of the “Sound sv” file, and the “Sound sv” window was then closed.
(13) In the “Run script: Acoustic Voice Quality Index (v. 2.02 or v. 3.01)” window, the “Apply” button on the bottom of the window was selected.
(14) The segregated connected speech sample, the spectrogram of extracted voiced segments of connected speech, and associated acoustic measures, including the AVQI estimate for this individual’s connected speech production, were automatically displayed.
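The silenced placeholder files used in both procedures above could also be prepared outside the Praat interface. The following minimal sketch, written with Python's standard wave module, is an assumption-laden alternative rather than part of the published procedure: it writes a zero-amplitude copy of an uncompressed PCM WAV file and can optionally keep only a short stretch of silence (eg, 0.001 seconds, as in the connected speech procedure). The function name and file names are hypothetical, and the resulting files would still need to be loaded into Praat and named “sv” or “cs” as the AVQI script expects.

```python
import wave


def write_silent_placeholder(source_path, target_path, keep_seconds=None):
    """Write a silent copy of an uncompressed PCM WAV file.

    The copy keeps the source's channel count, sample width, and sampling
    rate. If keep_seconds is given, only that much silence is written
    (at least one frame); otherwise the full source duration is kept.
    Returns the number of frames written.
    """
    with wave.open(source_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        n_frames = src.getnframes()

    if keep_seconds is not None:
        n_frames = min(n_frames, max(1, round(keep_seconds * frame_rate)))

    # 8-bit PCM WAV is unsigned, so its silence value is 128; wider samples use 0.
    silence_byte = b"\x80" if sample_width == 1 else b"\x00"

    with wave.open(target_path, "wb") as dst:
        dst.setnchannels(n_channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(silence_byte * (sample_width * n_channels * n_frames))
    return n_frames


# Hypothetical usage (file names are placeholders):
#   write_silent_placeholder("vowel.wav", "cs_placeholder.wav")
#   write_silent_placeholder("speech.wav", "sv_placeholder.wav", keep_seconds=0.001)
```

Keeping the sampling rate and sample width identical to the source avoids mismatches when the placeholder and the real sample are concatenated by the script.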
To confirm that the duration of the extracted voiced segments portion of connected speech from the concatenated file matched the length of extracted voiced segments from the segregated connected speech sample, the researcher compared the previously recorded duration of voiced extracted segments (from the connected speech portion of the concatenated sound file) and the duration from the segregated connected speech sample analysis.

REFERENCES

1. Maryn Y, Roy N, De Bodt M, et al. Acoustic measurement of overall voice quality: a meta-analysis. J Acoust Soc Am. 2009;5:2619–2634.
2. Awan SN, Roy N. Outcomes measurement in voice disorders: application of an acoustic index of dysphonia severity. J Speech Hear Res. 2009;52:482–499.
3. Peterson EA, Roy N, Awan SN, et al. Toward validation of the cepstral index of dysphonia (CSID) as an objective treatment outcomes measure. J Voice. 2013;27:401–410.
4. Watts CR, Awan SN, Maryn Y. A comparison of cepstral peak prominence measures from two acoustic analysis programs. J Voice. 2016;31:387.e1–387.e10.
5. Hillenbrand J, Cleveland RA, Erickson RL. Acoustic correlates of breathy vocal quality. J Speech Hear Res. 1994;37:769–778.
6. Hillenbrand J, Houde RA. Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech. J Speech Hear Res. 1996;39:311–321.
7. Awan SN, Roy N. Acoustic prediction of voice type in women with functional dysphonia. J Voice. 2005;19:268–282.
8. Awan SN, Roy N. Toward the development of an objective index of dysphonia severity: a four-factor acoustic model. Clin Linguist Phon. 2006;20:35–49.
9. Awan SN, Roy N, Dromey C. Estimating dysphonia severity in continuous speech: application of a multi-parameter spectral/cepstral model. Clin Linguist Phon. 2009;23:825–841.
10. Awan SN, Helou LB, Stojadinovic A, et al. Tracking voice change after thyroidectomy: application of spectral/cepstral analyses. Clin Linguist Phon. 2011;25:302–320.
11. Awan SN, Solomon NP, Helou LB, et al. Spectral-cepstral estimation of dysphonia severity: external validation. Ann Otol Rhinol Laryngol. 2013;122:40–48.
12. Heman-Ackah YD, Michael DD, Goding G. The relationship between cepstral peak prominence and selected parameters of dysphonia. J Voice. 2002;16:20–27.
13. Heman-Ackah YD, Heuer RJ, Michael DD, et al. Cepstral peak prominence: a more reliable measure of dysphonia. Ann Otol Rhinol Laryngol. 2003;112:324–333.
14. Maryn Y, De Bodt M, Roy N. The Acoustic Voice Quality Index: toward improved treatment outcomes assessment in voice disorders. J Comm Disord. 2010;43:161–174.
15. Maryn Y, Corthals P, Van Cauwenberge P, et al. Toward improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels. J Voice. 2010;24:540–555.
16. Maryn Y, Weenink D. Objective dysphonia measures in the program Praat: smoothed cepstral peak prominence and acoustic voice quality index. J Voice. 2015;29:35–43.
17. Watts CR, Awan SN, Marler JA. An investigation of voice quality in individuals with inherited elastin gene abnormalities. Clin Linguist Phon. 2008;22:199–213.
18. Watts CR, Awan SN. Use of spectral/cepstral analyses for differentiating normal from hypofunctional voices in sustained vowel and continuous speech contexts. J Speech Hear Res. 2011;54:1525–1537.
19. Watts CR. The effect of CAPE-V sentences on cepstral/spectral acoustic measures in dysphonic speakers. Folia Phoniatr Logop. 2015;67:15–20.
20. Watts CR, Awan SN. An examination of variations in the cepstral spectral index of dysphonia across a single breath group in connected speech. J Voice. 2015;29:26–34.
21. Awan SN, Roy N, Jetté ME, et al. Quantifying dysphonia severity using a spectral/cepstral-based acoustic index: comparisons with auditory-perceptual judgements from the CAPE-V. Clin Linguist Phon. 2010;24:742–758.
22. Awan SN. Analysis of Dysphonia in Speech and Voice (ADSV): An Application Guide. Montvale, NJ: Pentax Medical; 2011:4.
23. Barsties B, Maryn Y. External validation of the Acoustic Voice Quality Index version 03.01 with extended representativity. Ann Otol Rhinol Laryngol. 2016;125:571–583.
24. Boersma P, Weenink D. Praat: doing phonetics by computer Version 6.0.21 [Software]. Available at: http://www.fon.hum.uva.nl/praat/. Accessed September 25, 2016.
25. Barsties B, Maryn Y. The improvement of internal consistency of the Acoustic Voice Quality Index. Am J Otolaryngol. 2015;36:647–656.
26. Sauder C, Bretl M, Eadie T. Predicting voice disorder status from smoothed measures of cepstral peak prominence using Praat and analysis of dysphonia in speech and voice (ADSV). J Voice. 2017;doi:10.1016/j.jvoice.2017.01.006. In press.
27. Mezzedimi C, Francesco MD, Livi W, et al. Objective evaluation of presbyphonia: spectroacoustic study on 142 patients with Praat. J Voice. 2017;31:257.e25–257.e32.
28. Barsties B, Maryn Y. The influence of voice sample length in the auditory-perceptual judgment of overall voice quality. J Voice. 2017;31:202–210.
29. Hosokawa K, Barsties B, Iwahashi T, et al. Validation of the Acoustic Voice Quality Index in the Japanese language. J Voice. 2017;31:260.e1–260.e9.