Analysis of Polish Vowels of Tracheoesophageal Speakers


ARTICLE IN PRESS

Marzena Mięsikowska, Kielce, Poland

Summary: Objectives/Hypothesis. The aim of this study was to determine the acoustical differences between normal and tracheoesophageal Polish speakers during Polish vowel production.

Methods. The first (F1) and second (F2) formant frequencies of 6 Polish vowels produced by 11 normal and 11 tracheoesophageal speakers were analyzed using analysis of variance and discriminant analysis.

Results. Spectral analysis showed that the F1 and F2 values of Polish vowels produced by tracheoesophageal speakers were significantly higher than those produced by normal speakers, with the exception of the F2 value of /i/. Analysis of variance showed significant differences between the two speech types in the F1 and F2 formant frequencies. Discriminant analysis based on F1 and F2 yielded a mean classification score of 73.33% for tracheoesophageal speakers and 96.36% for normal speakers.

Conclusions. Tracheoesophageal speakers exhibit higher F1 and F2 formant frequencies than normal speakers, with the exception of the F2 value for the vowel /i/. Discriminant analysis showed that the classification process for TE speech exhibits lower accuracy due to the poorer classification of the vowels /i/, /u/, and /y/.

Key Words: Tracheoesophageal speech–Formant frequency–Discriminant analysis–Vowels–Classification.

Accepted for publication April 8, 2016. From the Kielce University of Technology, Aleja Tysiąclecia Państwa Polskiego 7, 25-314 Kielce, Poland. Address correspondence and reprint requests to Marzena Mięsikowska, Kielce University of Technology, Aleja Tysiąclecia Państwa Polskiego 7, 25-314 Kielce, Poland. E-mail: [email protected] Journal of Voice, Vol. ■■, No. ■■, pp. ■■-■■ 0892-1997 © 2016 The Voice Foundation. http://dx.doi.org/10.1016/j.jvoice.2016.04.007

INTRODUCTION

Analysis of vowels based on formant characteristics has been of interest to many researchers, and various methods have been used to study vowels. Some vowels are better understood than others, possibly because they represent “limit” positions of the articulatory mechanism.1 Authors studying English vowels produced by laryngeal speakers have shown that children exhibit the highest values, women exhibit mid-level values, and men exhibit the lowest values of the first, second, and third formant frequencies (F1, F2, and F3).1 The differences in formant frequencies among these three groups of normal, laryngeal (NL) speakers have been attributed to vocal tract length.1,2

Laryngectomy has been reported to shorten the vocal tract.3,4 F1 and F2 were found to be significantly higher in alaryngeal speech than in laryngeal speech. The explanation provided in some studies4–9 is that a reduction in the effective length of the vocal tract may account for these increases in formant frequencies. Finnish vowels produced by alaryngeal speakers were characterized by higher F1 and F2 than those produced by NL speakers in all vowels, with the exception of F1 for /u, o, e/.3 Among English speakers, F1 and F2 were found to be significantly higher in esophageal (ES) speakers than in NL speakers.4 For Dutch, higher F1 and F2 were reported in tracheoesophageal (TE) speakers than in NL speakers.5 An additional explanation for the changes in formant frequencies was that the back of the tongue might be slightly lowered due to the removal of the larynx.5 It was also reported that the variation among TE speakers may be larger than among NL speakers because the anatomy of both the voice source and the vocal tract depends on the type and extent of the surgical intervention.5,8 Regarding Spanish vowels, TE speakers achieved higher F1 and F2 values than NL speakers, with the exception of F2 for the vowel /o/.6 It was also suggested that TE and ES speakers articulate vowels with fronted and higher tongue positions relative to the tongue position in NL speakers.6 Higher F1 and F2 values were also reported in studies of Cantonese vowels.7 F3 values were also found to be significantly higher in English TE speakers8 and Mandarin ES speakers9 than in laryngeal speakers.

Other explanations have also been proposed. The changes in formant frequency values in the laryngectomy population were attributed to chemoradiotherapy and postoperative complications.8 Another possible explanation, provided in a study of Russian vowels, lies in paralinguistic conditions and the strong preoperative psychological stress that may induce abnormally high formant frequencies.10 Three laryngectomized patients produced vowels whose formant frequency values moved closer to normal values 2 weeks after the operation, and two of them 2 years after the operation.10 Unfortunately, the findings of that study10 are based on very few subjects.

Discriminant analysis has been used in studies of vowels of laryngeal11 and alaryngeal12 speakers. Rosique et al12 analyzed the energy, bandwidth, and frequency of the first four formants (F1, F2, F3, and F4) of the five Castilian vowels using an established phrase produced by TE, ES, and NL speakers.
Discriminant analysis indicated that TE speech is less similar to NL speech than ES speech is.12 The aim of the present study was to compare the formant frequencies F1 and F2 during the production of Polish vowels by Polish-speaking TE speakers versus NL speakers using analysis of variance (ANOVA) and discriminant analysis. The analysis provided in this study makes it possible to compare the Polish


language with other languages with respect to formant frequencies and to consider the classification accuracy of TE vowels using discriminant analysis and formant frequencies F1 and F2. Alaryngeal speech is of interest both because of the ongoing need to improve speech rehabilitation approaches for laryngectomy patients and because the investigation of alaryngeal speech modes offers unique opportunities for examining the impact of altered voicing source parameters on speech production and acoustics.

METHODS

Participants
Eleven male TE speakers from Holy Cross Cancer Center, Department of Head and Neck Surgery in Kielce, Poland, participated in the study. TE speakers ranged in age from 50 to 73 years, with a mean age of 63 years. Postoperation time ranged from 6 months to 4 years. All TE speakers were using the Provox2 (Atos Medical AB, Kraftgatan 8, 242 35 Hörby, Box 183, SE24222, Hörby, Sweden) prosthesis. Eleven male NL speakers participated in the study. NL speakers ranged in age from 47 to 64 years, with a mean age of 58 years. All speakers were native speakers of Polish.

Speech materials and recordings
Speech materials consisted of six isolated vowels in Polish, which are presented in Table 1 in International Phonetic Alphabet (IPA) notation. The vowels presented in Table 1 were uttered an average of 10 times by each speaker in the order /a/, /a/, . . ., /a/, /e/, /e/, . . ., /e/, /i/, /i/, . . ., /i/, /o/, /o/, . . ., /o/, /u/, /u/, . . ., /u/, and /y/, /y/, . . ., /y/. Isolated vowels were investigated to determine whether isolated Polish vowels produced by TE speakers show higher formant frequencies than those produced by NL speakers, and how such changes affect vowel classification in TE and NL speakers. Speech recordings were made in an audiometric room in regular conditions with a digital recorder. Speakers were in a sitting position; the mouth-to-microphone distance ranged from 0.35 to 0.40 m. The speech sound was transmitted via an electret condenser microphone (Olympus Corporation, Head office: Shinjuku Monolith, 3-1 Nishi-Shinjuku 2-chome, Shinjuku-ku, Tokyo 163-0914, Japan) with a 22-kHz sampling rate and a 16-bit signal resolution.

Formant frequencies extraction method
First (F1) and second (F2) formant frequencies were extracted automatically at the midpoint of each vowel instantiation using a script written in Praat (Praat Software, the authors: Paul Boersma and David Weenink, Phonetic Sciences, University of Amsterdam, Spuistraat 210, 1012VT Amsterdam, The Netherlands). The Praat formant extraction algorithm works by resampling the speech signal to a frequency of twice the maximum formant (a user-defined parameter in the algorithm). After this, preemphasis is applied, the signal is windowed with a Gaussian-like window, and the linear predictive coding (LPC) coefficients are computed with the algorithm by Burg. In this study, formant frequencies were also visually inspected using Praat. Formant frequencies F1 and F2 were analyzed in the present study because they were clearly visible and strongly present in the spectrograms of vowels produced by TE speakers.

Procedures
To numerically evaluate the difference between TE and NL speakers in vowel production, the Euclidean distance (ED) between formant frequencies was calculated for each vowel with the following equation:

$$ED(v) = \sqrt{\left(F1_{NL}(v) - F1_{TE}(v)\right)^{2} + \left(F2_{NL}(v) - F2_{TE}(v)\right)^{2}} \qquad (1)$$

where v indicates the vowels /a/, /e/, . . ., /y/.

The obtained values for formant frequencies were analyzed using STATISTICA software (StatSoft, Inc., 2300 East 14th Street, Tulsa, OK 74104, USA), including descriptive statistical analysis, one-way ANOVA, and discriminant analysis. ANOVA was performed with F1 and F2 as dependent variables and the speaker group (NL or TE) as the factor. Tukey post hoc analysis was conducted to test differences among group factor levels across the dependent variables.

Discriminant analysis was performed with F1 and F2 as the independent (or entry) variables and the vowels produced by TE and NL speakers as the grouping variables. Discriminant analysis was applied to investigate the classification of vowels, especially to observe misclassifications. Two discriminant functions (namely, Root1 and Root2) were created because two entry variables were used in the model. For the classification step of discriminant analysis, classification functions, which are linear combinations of the entry variables, were used. For every group, a separate linear combination function expressed by Equation (2) was introduced:

$$K_{i} = c_{i0} + c_{i1} F1 + c_{i2} F2 \qquad (2)$$

where c_{ij} are the coefficients of the entry variables. A sample was assigned to the group for which it obtained the highest K_i value.
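As an illustration of Equation (1), the following minimal Python sketch computes the ED for each vowel from the group means reported in Table 2. It is not the script used in the study; the dictionary and function names are hypothetical, and because the inputs are the rounded table means, the last digit may differ slightly from the values in Table 3.

```python
import math

# Mean (F1, F2) values in Hz, copied from Table 2 of this article.
NL_MEANS = {
    "a": (699.19, 1282.77), "e": (598.56, 1565.14), "i": (344.99, 2173.96),
    "o": (562.96, 994.55), "u": (390.92, 767.93), "y": (422.25, 1727.50),
}
TE_MEANS = {
    "a": (743.06, 1370.04), "e": (618.79, 1770.47), "i": (403.58, 2000.87),
    "o": (592.14, 1136.30), "u": (478.73, 1405.45), "y": (480.92, 1999.55),
}


def euclidean_distance(nl, te):
    """Equation (1): distance in Hz between two (F1, F2) points."""
    return math.hypot(nl[0] - te[0], nl[1] - te[1])


# One ED value per vowel, NL mean versus TE mean.
ed = {v: euclidean_distance(NL_MEANS[v], TE_MEANS[v]) for v in NL_MEANS}

for vowel, dist in ed.items():
    print(f"/{vowel}/: {dist:.2f} Hz")
```

Sorting the resulting dictionary by value gives the ranking of vowels by how far the TE mean lies from the NL mean in the F1–F2 plane.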

TABLE 1. Six Polish Vowels: IPA Notation

Vowel   IPA
/a/     /a/
/e/     /ɛ/
/i/     /i/
/o/     /ɔ/
/u/     /u/
/y/     /ɨ/

RESULTS

Formant frequencies F1 and F2
The mean and standard deviation values of F1 and F2 of Polish vowels produced by TE and NL speakers are presented in Table 2. To provide a visual comparison between NL and TE vowels, a plot of the mean values of F1 and F2 for vowels produced by TE and NL speakers is presented in Figure 1. For F1 and F2, higher values were obtained for vowels produced by TE speakers than for those produced by NL speakers, with the exception of F2 for /i/. The ED values calculated using F1 and F2 for corresponding vowels produced by NL and TE speakers are shown in Table 3. As presented in Table 3, the highest value of ED was obtained for the vowel /u/ (643.54 Hz), followed by the vowel /y/ (278.30 Hz). The smallest value was obtained for the vowel /a/ (97.67 Hz). On the basis of the mean F1 and F2 values of the corner vowels /a/, /i/, and /u/, the vowel spaces for NL and TE Polish speakers are presented in Figure 2. According to Figure 2, the vowel /u/ produced by TE speakers showed the largest distance from the vowel /u/ produced by NL speakers.

TABLE 2. Mean and Standard Deviation Values of F1 and F2 Formant Frequencies of Polish Vowels Produced by 11 TE and 11 NL Speakers

                   NL                               TE
Vowel   Formant    Mean [Hz]   Standard Deviation   Mean [Hz]   Standard Deviation
/a/     F1         699.19      42.67                743.06      87.52
/a/     F2         1282.77     294.22               1370.04     235.88
/e/     F1         598.56      40.26                618.79      53.27
/e/     F2         1565.14     100.29               1770.47     199.42
/i/     F1         344.99      25.47                403.58      69.88
/i/     F2         2173.96     235.65               2000.87     518.39
/o/     F1         562.96      24.08                592.14      46.75
/o/     F2         994.55      146.14               1136.30     259.93
/u/     F1         390.92      22.24                478.73      69.08
/u/     F2         767.93      46.12                1405.45     740.47
/y/     F1         422.25      35.85                480.92      50.84
/y/     F2         1727.50     147.89               1999.55     251.21

FIGURE 1. Mean values of F1 and F2 of vowels produced by TE and NL Polish speakers.

TABLE 3. The Calculated Euclidean Distance between Formants of Corresponding Vowels Produced by NL and TE Speakers

Euclidean Distance [Hz] between NL and TE vowels
Vowel     /a/      /e/      /i/      /o/      /u/      /y/
ED [Hz]   97.67    206.33   182.74   144.72   643.54   278.30

FIGURE 2. Vowel spaces for NL and TE Polish speakers.

ANOVA
One-way ANOVA was performed to assess whether F1 and F2 differed significantly between the speech groups. The results revealed significant main effects of speaker group (TE versus NL) for F1 and F2 (F1: F[1, 1318] = 48.54, p < 0.000001; F2: F[1, 1318] = 47.16, p < 0.000001). Tukey post hoc analysis was conducted to test differences among group factor levels in both dependent variables F1 and F2, and it showed significant differences between TE and NL speakers (p < 0.00001).

Discriminant analysis of NL vowels
Discriminant analysis performed for NL speech, which was based on F1 and F2, was significant (Wilks’ Lambda = 0.0088, approx. F[10, 1306] = 1263.28, p < 0.0001). Both independent variables, F1 and F2, contributed significantly (p < 0.01). The chi-squared test with successive roots removed was performed at the canonical stage, and it was significant for both discriminant functions Root1 and Root2 used in the model (Root1: Eigenvalue = 16.83, canonical R = 0.97, Wilks’ Lambda = 0.0088, χ² = 3101.70, p < 0.01; Root2: Eigenvalue = 5.39, canonical R = 0.92, Wilks’ Lambda = 0.1566, χ² = 1214.61, p < 0.01). The first discriminant function was responsible for 75% of the variance explained, and the second discriminant function was responsible for 25% of the variance explained. The scatter plot of canonical scores for NL speech is presented in Figure 3. Classification functions for NL vowels are presented in Table 4. The classification scores of NL vowels are presented in Table 5.
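The canonical statistics reported for NL speech are mutually consistent under the textbook identities of canonical discriminant analysis: Wilks' Lambda is the product of 1/(1 + eigenvalue) over the roots not yet removed, the canonical correlation is sqrt(eigenvalue/(1 + eigenvalue)), and each root's share of explained variance is its eigenvalue divided by the sum of eigenvalues. The sketch below checks this from the two reported eigenvalues. It assumes these standard definitions; the text does not state how STATISTICA computes the quantities internally, and the function names are hypothetical.

```python
import math


def wilks_lambda(eigenvalues, roots_removed=0):
    """Wilks' Lambda for the roots remaining after removing the first ones."""
    remaining = eigenvalues[roots_removed:]
    return math.prod(1.0 / (1.0 + ev) for ev in remaining)


def canonical_r(eigenvalue):
    """Canonical correlation associated with one discriminant root."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def variance_shares(eigenvalues):
    """Fraction of the explained variance carried by each root."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Eigenvalues of Root1 and Root2 as reported for NL speech.
NL_EIGENVALUES = [16.83, 5.39]

print("Wilks' Lambda, no roots removed:", round(wilks_lambda(NL_EIGENVALUES), 4))
print("Wilks' Lambda, one root removed:", round(wilks_lambda(NL_EIGENVALUES, 1), 4))
print("Canonical R:", [round(canonical_r(ev), 2) for ev in NL_EIGENVALUES])
print("Variance shares:", [round(s, 3) for s in variance_shares(NL_EIGENVALUES)])
```

Because the eigenvalues are given to two decimals only, the recomputed values agree with the reported ones up to rounding; the variance shares come out at roughly three quarters and one quarter.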

FIGURE 3. The scatter plot of canonical scores for NL speech.


TABLE 4. The Classification Functions of NL and TE Vowels

Speech   c_i     K1(a)      K2(e)      K3(i)      K4(o)      K5(u)     K6(y)
NL       c_i0    −243.778   −194.418   −119.725   −157.770   −78.322   −121.121
NL       c_i1    0.637      0.539      0.294      0.514      0.356     0.372
NL       c_i2    0.030      0.040      0.062      0.023      0.018     0.047
TE       c_i0    −68.582    −50.347    −27.616    −44.282    −31.008   −34.936
TE       c_i1    0.176      0.141      0.085      0.140      0.109     0.105
TE       c_i2    0.002      0.005      0.009      0.002      0.004     0.008
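To make the classification rule of Equation (2) concrete, the sketch below evaluates the six NL classification functions from Table 4 and assigns an (F1, F2) pair to the vowel with the highest score. It is an illustration only: the names are hypothetical, the coefficients are the rounded values printed in Table 4, and the inputs are the NL group means from Table 2 rather than individual tokens.

```python
# NL classification-function coefficients (c_i0, c_i1, c_i2) from Table 4.
NL_COEFFS = {
    "a": (-243.778, 0.637, 0.030),
    "e": (-194.418, 0.539, 0.040),
    "i": (-119.725, 0.294, 0.062),
    "o": (-157.770, 0.514, 0.023),
    "u": (-78.322, 0.356, 0.018),
    "y": (-121.121, 0.372, 0.047),
}

# NL mean (F1, F2) values in Hz from Table 2, used here as example inputs.
NL_MEANS = {
    "a": (699.19, 1282.77), "e": (598.56, 1565.14), "i": (344.99, 2173.96),
    "o": (562.96, 994.55), "u": (390.92, 767.93), "y": (422.25, 1727.50),
}


def classification_scores(f1, f2, coeffs):
    """Equation (2): K_i = c_i0 + c_i1 * F1 + c_i2 * F2 for every group i."""
    return {v: c0 + c1 * f1 + c2 * f2 for v, (c0, c1, c2) in coeffs.items()}


def classify(f1, f2, coeffs):
    """Assign the sample to the group with the highest K_i value."""
    scores = classification_scores(f1, f2, coeffs)
    return max(scores, key=scores.get)


predicted = {v: classify(f1, f2, NL_COEFFS) for v, (f1, f2) in NL_MEANS.items()}
for vowel, label in predicted.items():
    print(f"mean of /{vowel}/ -> classified as /{label}/")
```

Each NL group mean is assigned to its own vowel by these functions. Individual tokens scatter around the means, which is where misclassifications arise.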

Discriminant analysis of TE vowels
For TE speech, discriminant analysis was significant for both entry variables (Wilks’ Lambda = 0.1519, approx. F[10, 1306] = 204.46, p < 0.0001). Both independent variables, F1 and F2, contributed significantly (p < 0.01). The chi-squared test with successive roots removed was performed at the canonical stage, and it was significant for both discriminant functions Root1 and Root2 used in the model (Root1: Eigenvalue = 3.85, canonical R = 0.89, Wilks’ Lambda = 0.1519, χ² = 1234.24, p < 0.01; Root2: Eigenvalue = 0.36, canonical R = 0.51, Wilks’ Lambda = 0.7367, χ² = 200.17, p < 0.01). The first discriminant function was responsible for 92% of the variance explained, and the second discriminant function was responsible for 8% of the variance explained. The scatter plot of canonical scores for TE speech is presented in Figure 4. Classification functions for TE vowels are presented in Table 4. The classification scores of TE vowels are presented in Table 6.

TABLE 5. The Results of Classification of NL Vowels Based on NL Classification Functions

Vowel   Speech   /a/ (NL)   /e/ (NL)   /i/ (NL)   /o/ (NL)   /u/ (NL)   /y/ (NL)
/a/     NL       91.82%     5.45%      0.00%      2.73%      0.00%      0.00%
/e/     NL       8.18%      91.82%     0.00%      0.00%      0.00%      0.00%
/i/     NL       0.00%      0.00%      100.00%    0.00%      0.00%      0.00%
/o/     NL       1.82%      0.91%      0.00%      97.27%     0.00%      0.00%
/u/     NL       0.00%      0.00%      0.00%      0.00%      100.00%    0.00%
/y/     NL       0.00%      0.00%      2.73%      0.00%      0.00%      97.27%
Mean:   NL       96.36%

Rows give the produced vowel and columns the vowel group to which it was assigned. Diagonal entries are the percentages of vowels /v/ correctly classified to the corresponding vowel group /v/ (shown in bold in the original). The last row gives the mean classification score for the considered speech.

FIGURE 4. The scatter plot of canonical scores for TE speech.

DISCUSSION

Formants
ANOVA showed significant differences between NL and TE speech in formant frequencies F1 and F2. Formant frequencies F1 and F2 were higher for TE speakers than for NL speakers, with the exception of F2 for the vowel /i/. The highest value of ED was obtained for the vowel /u/ (643.54 Hz), and the smallest for the vowel /a/ (97.67 Hz). Figure 2 confirms the large distance between the two speech types for the vowel /u/. The acoustic findings suggest that the increased formant frequencies in TE speakers could be explained by vocal tract length, which is an important factor in determining the average formant positions, with shorter vocal tracts linked to higher formant frequencies; in addition, paralinguistic conditions may affect the formant frequencies.

Discriminant analysis of NL vowels
For NL speech, the mean classification score was 96.36%. The highest classification scores were for vowels /i/ and /u/ (100%), and the lowest scores were for vowels /a/ and /e/ (91.82%). Classification accuracy was very high, with a few minor exceptions. Vowel /a/ was misclassified as vowels /e/ (5.45%) and /o/ (2.73%). Vowel /e/ was misclassified as vowel /a/ (8.18%). Vowel /o/ was misclassified as vowels /a/ (1.82%) and /e/ (0.91%). Vowel /y/ was misclassified as vowel /i/ (2.73%).

Discriminant analysis of TE vowels
For TE speech, the mean classification score was 73.33%. The highest classification score was for the vowel /e/ (90.00%), and the lowest was for the vowel /y/ (50.00%). Vowels /a/ (87.27%), /e/ (90.00%), and /o/ (85.45%) exhibited higher classification scores than vowels /i/ (65.45%), /u/ (61.82%), and /y/ (50.00%). Vowel /a/ was misclassified as vowels /e/ (0.91%) and /o/ (11.82%). Vowel /e/ was misclassified as vowels /a/ (3.64%), /u/ (2.73%), and /y/ (3.64%). Vowel /i/ was misclassified as vowels /a/ (0.91%), /u/ (20.00%), and /y/ (13.64%). Vowel /o/ was misclassified as vowels /a/ (9.09%), /e/ (0.91%), /u/ (3.64%), and /y/ (0.91%). Vowel /u/ was misclassified as vowels /e/ (10.00%), /i/ (8.18%), /o/ (10.00%), and /y/ (10.00%). Vowel /y/ was misclassified as vowels /e/ (6.36%), /i/ (30.00%), and /u/ (13.64%).

NL vowels were classified more accurately than TE vowels. As presented in Figures 3 and 4, the NL vowel groups are better discriminated than the TE vowel groups. For TE speech, the vowel classification process was inferior for vowels /i/, /u/, and /y/. The vowel /i/ was identified as the vowel /u/ in 20.00% of all instances of /i/. The vowel /u/ was classified as the vowel /y/ in 10.00% of all instances of /u/. The vowel /y/ was classified as the vowel /i/ in 30.00% of all instances of /y/. These confusions form a vicious circle that indicates a problem with the classification of the vowels /i/, /y/, and /u/ in TE speech. The low classification accuracy of vowels /i/, /y/, and /u/ may suggest that after laryngectomy, patients might have problems in backing the tongue in its highest position. All three vowels are produced with a high tongue position but differ in tongue advancement: back for /u/, middle for /y/, and front for /i/. This classification result, as well as the ED values, can be compared with the vowel spaces presented in Figure 2, which shows the largest deviation between the formants of TE speech and NL speech for the vowel /u/. During the production of the vowel /u/, TE speakers might have difficulty in backing the tongue. However, this is only speculation that requires substantiation in future research.

CONCLUSIONS
The present study examined the formant characteristics associated with Polish vowels produced by TE and NL Polish speakers. Formant frequencies F1 and F2 were obtained from vowels produced by TE and NL Polish speakers. TE speech was associated with significantly higher F1 and F2 values than NL speech, with the exception of F2 for the vowel /i/. Discriminant analysis showed that the classification process for TE speech exhibits lower accuracy due to the poorer classification of the vowels /i/, /u/, and /y/.

TABLE 6. The Results of Classification of TE Vowels Based on TE Classification Functions

Vowel   Speech   /a/ (TE)   /e/ (TE)   /i/ (TE)   /o/ (TE)   /u/ (TE)   /y/ (TE)
/a/     TE       87.27%     0.91%      0.00%      11.82%     0.00%      0.00%
/e/     TE       3.64%      90.00%     0.00%      0.00%      2.73%      3.64%
/i/     TE       0.91%      0.00%      65.45%     0.00%      20.00%     13.64%
/o/     TE       9.09%      0.91%      0.00%      85.45%     3.64%      0.91%
/u/     TE       0.00%      10.00%     8.18%      10.00%     61.82%     10.00%
/y/     TE       0.00%      6.36%      30.00%     0.00%      13.64%     50.00%
Mean:   TE       73.33%

Rows give the produced vowel and columns the vowel group to which it was assigned. Diagonal entries are the percentages of vowels /v/ correctly classified to the corresponding vowel group /v/ (shown in bold in the original). The last row gives the mean classification score for the considered speech.
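The summary figures quoted for TE speech can be recovered directly from the classification matrix in Table 6. The sketch below is a minimal illustration with hypothetical names: it takes the mean classification score as the unweighted mean of the diagonal, which assumes the same number of tokens per vowel, and lists the most frequent confusion for each vowel.

```python
VOWELS = ["a", "e", "i", "o", "u", "y"]

# Table 6 as a matrix in percent: rows are produced vowels,
# columns are the vowel groups assigned by the TE classification functions.
TE_CONFUSION = [
    [87.27, 0.91, 0.00, 11.82, 0.00, 0.00],
    [3.64, 90.00, 0.00, 0.00, 2.73, 3.64],
    [0.91, 0.00, 65.45, 0.00, 20.00, 13.64],
    [9.09, 0.91, 0.00, 85.45, 3.64, 0.91],
    [0.00, 10.00, 8.18, 10.00, 61.82, 10.00],
    [0.00, 6.36, 30.00, 0.00, 13.64, 50.00],
]


def mean_classification_score(matrix):
    """Unweighted mean of the diagonal (assumes equal token counts per vowel)."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


def top_confusions(matrix, labels):
    """For each produced vowel, the wrong group it was most often assigned to."""
    result = {}
    for i, row in enumerate(matrix):
        others = [k for k in range(len(row)) if k != i]
        j = max(others, key=lambda k: row[k])  # ties resolve to the first group
        result[labels[i]] = (labels[j], row[j])
    return result


print(f"Mean classification score: {mean_classification_score(TE_CONFUSION):.2f}%")
for vowel, (other, pct) in top_confusions(TE_CONFUSION, VOWELS).items():
    print(f"/{vowel}/ most often misclassified as /{other}/ ({pct:.2f}%)")
```

Where several wrong groups share the same percentage, as for /u/, the function reports only the first of them, so the full row of Table 6 should be consulted in such cases.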


REFERENCES
1. Peterson GE, Barney HL. Control methods used in a study of the vowels. J Acoust Soc Am. 1952;24:175–184.
2. Fant G. Acoustic Theory of Speech Production. The Hague: Mouton; 1960.
3. Kyttae J. Finnish oesophageal speech after laryngectomy: sound spectrographic and cineradiographic studies. Acta Otolaryngol Suppl. 1964;195:1–94.
4. Sisty NL, Weinberg B. Formant frequency characteristics of esophageal speech. J Speech Hear Res. 1972;15:439–448.
5. van As CJ, van Ravensteijn AMA, Koopmans-van Beinum FJ, et al. Formant frequencies of Dutch vowels in tracheoesophageal speech. Proceedings. 1997;21:143–153.
6. Cervera T, Miralles JL, González-Álvarez J. Acoustical analysis of Spanish vowels produced by laryngectomized subjects. J Speech Lang Hear Res. 2001;44:988–996.
7. Manwa LN, Rhoda C. An acoustical and perceptual study of vowels produced by alaryngeal speakers of Cantonese. Folia Phoniatr Logop. 2009;61:97–104.
8. Kazi RA, Prasad VM, Kanagalingam J, et al. Assessment of the formant frequencies in normal and laryngectomized individuals using linear predictive coding. J Voice. 2007;21:661–668.
9. Hanjun L, Manwa LN. Formant characteristics of vowels produced by Mandarin esophageal speakers. J Voice. 2009;23:255–260.
10. Sorokin V, Olshansky V, Kozhanov L. Internal model in articulatory control: evidence from speaking without larynx. Speech Commun. 1998;25:249–268.
11. Hillenbrand J, Getty LA, Clark MJ, et al. Acoustic characteristics of American English vowels. J Acoust Soc Am. 1995;97:3099–3111.
12. Rosique M, Ramón JL, Canteras M, et al. Discriminant analysis applied to formants of vowels in Castellano dialect during the phonation with prosthesis and esophageal voice after total laryngectomy. Acta Otorrinolaringol Esp. 2003;54:361–366.