Psychological impression and listening score while listening to audio signal under meaningless steady noise


Applied Acoustics 64 (2003) 443–457 www.elsevier.com/locate/apacoust

Takahiro Tamesue (a,*), Shizuma Yamaguchi (a), Tetsuro Saeki (a), Yuichi Kato (b)

(a) Faculty of Engineering, Yamaguchi University, 2-6-11 Tokiwadai, Ube 755-8611, Japan
(b) Interdisciplinary Faculty of Science and Engineering, Shimane University, 1060 Nishikawatsu-cho, Matsue 690-8504, Japan

Received 20 February 2002; received in revised form 23 August 2002; accepted 24 September 2002

Abstract

The indices for evaluating psychological impression and listening score are first introduced in the case of listening to a Japanese monosyllabic audio signal while subjected to meaningless steady noise. Hereupon, the mutual relationship between the power spectrum level of audio signal and that of external noise is reflected in the above evaluation indices. Next, estimation and/or prediction problems of the psychological impression and listening score are discussed. The predicted values of the psychological impression and listening score are compared with the experimental observed data. A useful index is discussed, considering consistency between predicted and observed values.

© 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Psychological impression; Listening score; Audio signal; Steady noise; Search method

* Corresponding author.
0003-682X/03/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0003-682X(02)00106-8

1. Introduction

One of the most fundamental means of information transmission is the direct transfer of speech. For speech to be effective, it is important to have a comfortable sound environment in which the listener can concentrate on the speech without being distracted by external noise. Hence, to design a comfortable sound environment (such as when planning and selecting the sound insulation material for reduction of external noise and the volume adjustment of the audio signal), it is very important to understand quantitatively the relationship between the audio signal and external noise, the psychological impression to noise and audio signal, and the listening score (which indicates how well speech can be understood).

We have previously considered the psychological impression of external noise on subjects [1–5]. However, two aspects, the psychological impression to the audio signal and the listening score, have not been taken into consideration. Regarding the listening score, research on the relationship between syllable articulation (or word/sentence intelligibility) and the characteristics of noise has been carried out by a number of researchers and the results have been reported [6–9]. However, most of this research has placed emphasis on the audio signal from the point of view of perception and understanding. Attention has therefore not been directed towards the psychological impression of the noise and audio signal, which plays an important role in the creation of a comfortable sound environment.

On the other hand, the psychological impressions of listeners to external noise and audio signal are strictly based on subjective judgment. It is also fundamentally important to consider the ambiguity of language expression itself for sensory response, and the unavoidable ambiguity accompanying subjective judgment. With these points in mind, we used fuzzy set theory to consider psychological impressions to external noise [1–5].

This paper focuses on the following three aspects while listening to a Japanese monosyllabic audio signal under meaningless steady noise: annoyance caused by noise, speech audibility of the audio signal, and the listening score. First, we introduce indices reflecting the mutual relationship between the power spectrum level of the audio signal and that of external noise. Then, with respect to the psychological impression, not only of noise but also of the audio signal, we use membership functions based on each index to consider the listener's psychological impression in terms of annoyance caused by noise and speech audibility of the audio signal. Next, the results of estimation and/or prediction problems of the two psychological impressions and the listening score are compared with the data observed in a psychological experiment. Finally, a practical index that can evaluate all three aspects in common is selected.

2. Outline of listening psychological experiments

The outline of the indoor listening psychological experiments is as follows.

2.1. Experiment I

In order to establish the initial membership function of the psychological impressions towards the noise and audio signal, and regression function of the listening score, Experiment I was conducted.

2.1.1. Subjects

A total of 384 subjects, 337 male and 47 female students, all with normal hearing, participated in the psychological experiment.


2.1.2. Presented sound

2.1.2.1. Audio signal. The audio signal comprised monosyllabic lists (eight lists in total, each containing 50 monosyllables) taken from a CD for the evaluation and fitting of hearing aids (TY-89) [10]. The peak sound pressure level was about 62 dB. The power spectra of each list were measured. No significant difference was recognized between the power spectrum forms of the eight lists. Therefore, the power spectrum of the audio signal was taken as the power-mean value over the lists in each octave band. The power spectral form of the audio signal in reference to the overall value of 62 dB is shown in Fig. 1.

Fig. 1. Power spectrum level form of audio signal.

2.1.2.2. External noise. White noise was passed through an octave band-pass filter with center frequencies 63, 125, ..., 8000 Hz. The sound pressure level at each subject's ears was adjusted to 52, 55, 62, 65, 72, 75 dB.

2.1.3. Measurement for psychological impression and listening score

Both the audio signal and the noise were presented from a loudspeaker to eight subjects to allow assessment of the psychological impressions of the subjects to the noise and audio signal, and the listening score while listening to the audio signal. To quantify the psychological evaluation of the noise, various psychological evaluation scales for external noise were conceivable. Here, we adopted the seven categorized psychological impressions Fi (i=1, 2, ..., 7) proposed by Furihata and Yanagisawa [11]:

F1: Not at all annoying, F2: Not annoying, F3: Not too annoying, F4: Slightly annoying, F5: Annoying, F6: Very annoying, F7: Extremely annoying.

On the other hand, the scale for the psychological evaluation of the audio signal adopted the following five categorized psychological impressions Ai (i=1, 2, ..., 5) of speech audibility [12]:

A1: Bad, A2: Poor, A3: Fair, A4: Good, A5: Excellent.

Eight subjects participated in the psychological experiment simultaneously. They listened to the audio signal and filled in a form recording the monosyllables exactly as heard. In addition, they completed the two psychological evaluations described above: Fi (i=1, 2, ..., 7) of the noise and Ai (i=1, 2, ..., 5) of the audio signal. This procedure was carried out with the same eight subjects for one external noise condition. The subjects were given sufficient rest to avoid fatigue.

2.2. Experiment II

In order to compare the observed values of the two psychological impressions for the noise and audio signal, and the listening score, with the predicted values, Experiment II was conducted.

2.2.1. Subjects

A total of 56 people, 49 male and seven female students, all with normal hearing, participated in the psychological experiment.

2.2.2. Presented sound

2.2.2.1. Audio signal. The same audio signal as in Experiment I.

2.2.2.2. External noise

(a) Synthesized noise: Two types of octave band-limited white noise with different center frequencies (500, 1000 Hz) were synthesized with a power ratio of 1:1. The sound pressure level was adjusted to 55, 65, 75 dB.
(b) Pseudo voice noise: Pseudo voice noise from a CD for the evaluation and fitting of hearing aids (TY-89). The sound pressure level was 55 dB.
(c) Meaningless voice noise: Multi-talker noise from a CD for the evaluation and fitting of hearing aids (TY-89). The sound pressure level was 52, 62 dB.
(d) No external noise.

The power spectrum forms of the external noises (a), (b), and (c) are shown in Fig. 2.

The specific method of the psychological experiment is the same as that used in Experiment I.
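The level arithmetic used in Section 2 (the power mean over lists in each octave band, and the mixing of two noises with a 1:1 power ratio) can be sketched as follows. This is a minimal illustration and not the authors' procedure; the function names and the example levels are hypothetical.

```python
import math

def power_mean_level(levels_db):
    """Power (energy) mean of several levels given in dB."""
    powers = [10 ** (level / 10) for level in levels_db]
    return 10 * math.log10(sum(powers) / len(powers))

def power_sum_level(levels_db):
    """Overall level of incoherent components given in dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

# Hypothetical levels (dB) of one octave band measured on three lists.
band_levels = [50.0, 52.0, 51.0]
mean_level = power_mean_level(band_levels)

# Two band-limited noises mixed with a 1:1 power ratio: the overall
# level is about 3 dB above the level of each component.
overall = power_sum_level([62.0, 62.0])
print(round(mean_level, 2), round(overall, 2))
```

The power mean averages the linear powers rather than the decibel values, so it is never lower than the arithmetic mean of the levels.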

Fig. 2. Power spectrum level form of external noise.

3. Indices for evaluating the psychological impression and listening score

3.1. Introduction of indices

The differences between the audio signal and the noise (amplitude and frequency characteristics) were considered, taking into account the psychological impressions to the noise and audio signal, and the listening score. Following this, seven indices relevant to the evaluation of the three aspects were introduced. This paper used not only traditional indices such as the well-known signal-to-noise ratio (SN, SNA), articulation index (AI) [13] (which is a measure of the speech-communication system's potential intelligibility) and speech interference level (SIL) [14] (which indicates the intensity of the speech signal required at the listener's ear for it to be heard reliably under a given noise condition), but also three newly set up indices.

A. Signal-to-noise ratio (SN)
B. Signal-to-noise ratio with A-weighting (SNA)
C. Articulation index (AI) [13]
D. Speech interference level (SIL) [14]
E. Signal to interference noise ratio (SI): Focusing on the four octave bands with center frequencies $f_i$ ($f_4 = 500$, $f_5 = 1000$, ..., $f_7 = 4000$ Hz) that are required to compute the SIL, SI was calculated as

$$\mathrm{SI} = \frac{1}{4}\sum_{i=4}^{7}\left[L_{SA}(f_i) - L_{NA}(f_i)\right] \tag{1}$$

where $L_{SA}(f_i)$ and $L_{NA}(f_i)$ denote the A-weighted sound pressure level in the octave band with center frequency $f_i$ ($i = 4, 5, \ldots, 7$) of the audio signal and of the noise, respectively.
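As a rough sketch of Eq. (1), SI can be computed as the mean signal-minus-noise level difference over the four octave bands from 500 to 4000 Hz. The sign convention (signal minus noise) is assumed here from the name of the index, and the function name and band levels below are hypothetical.

```python
SI_BANDS_HZ = (500, 1000, 2000, 4000)  # octave bands f_4 ... f_7

def signal_to_interference_ratio(signal_a_db, noise_a_db):
    """Mean A-weighted level difference (signal minus noise) over the SIL bands.

    Both arguments map octave-band center frequency (Hz) to A-weighted level (dB).
    """
    diffs = [signal_a_db[f] - noise_a_db[f] for f in SI_BANDS_HZ]
    return sum(diffs) / len(diffs)

# Hypothetical A-weighted band levels in dB, for illustration only.
signal = {500: 60.0, 1000: 58.0, 2000: 55.0, 4000: 50.0}
noise = {500: 50.0, 1000: 50.0, 2000: 50.0, 4000: 50.0}
print(signal_to_interference_ratio(signal, noise))
```

For these example levels the band differences are 10, 8, 5 and 0 dB, so the mean is 5.75 dB.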


F. Weighted-mean spectral distance (WSPD): In the study of AI [13], experiments show that only the frequency band (200, 6100) Hz contributes to speech intelligibility. It has also been possible to determine 20 frequency bands that seem to contribute equally to intelligibility. Therefore, based on these 20 frequency bands, eight weights $a_i$ for the octave bands were calculated. WSPD was then calculated as follows:

$$\mathrm{WSPD} = \sum_{i=1}^{8} a_i\left[L_S(f_i) - L_N(f_i)\right] \tag{2}$$

where $L_S(f_i)$ and $L_N(f_i)$ denote the sound pressure level in the octave band with center frequency $f_i$ ($i = 1, 2, \ldots, 8$) of the audio signal and of the noise, respectively. The values of $a_i$ are as follows:

$$\begin{aligned}
a_1 &= 0.000000, & a_2 &= 0.000000,\\
a_3 &= 0.063794, & a_4 &= 0.140096,\\
a_5 &= 0.226255, & a_6 &= 0.319855,\\
a_7 &= 0.227360, & a_8 &= 0.022640
\end{aligned} \tag{3}$$

G. Arithmetic-mean spectral distance (ASPD):

$$\mathrm{ASPD} = \frac{1}{8}\sum_{i=1}^{8}\left[L_S(f_i) - L_N(f_i)\right] \tag{4}$$
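The two spectral distances of Eqs. (2) and (4) differ only in their band weights, which the following minimal sketch makes explicit. The weights are those listed in Eq. (3); the signal-minus-noise sign, the function names and the example band levels are assumptions made for illustration.

```python
# Octave-band weights a_1 ... a_8 from Eq. (3), for 63 Hz ... 8000 Hz.
WSPD_WEIGHTS = (0.000000, 0.000000, 0.063794, 0.140096,
                0.226255, 0.319855, 0.227360, 0.022640)

def wspd(signal_db, noise_db):
    """Weighted-mean spectral distance over eight octave bands."""
    if len(signal_db) != 8 or len(noise_db) != 8:
        raise ValueError("expected eight octave-band levels")
    return sum(a * (s - n) for a, s, n in zip(WSPD_WEIGHTS, signal_db, noise_db))

def aspd(signal_db, noise_db):
    """Arithmetic-mean spectral distance over eight octave bands."""
    if len(signal_db) != 8 or len(noise_db) != 8:
        raise ValueError("expected eight octave-band levels")
    return sum(s - n for s, n in zip(signal_db, noise_db)) / 8

# Hypothetical band levels in dB (63 Hz ... 8000 Hz), for illustration only.
signal = [40.0, 45.0, 55.0, 60.0, 58.0, 52.0, 45.0, 38.0]
noise = [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]
print(round(wspd(signal, noise), 2), round(aspd(signal, noise), 2))
```

Because the weights in Eq. (3) sum to 1, a level difference that is the same in every band gives identical WSPD and ASPD values; the indices diverge only when the difference varies with frequency.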

3.2. Membership functions of the psychological impression

As mentioned previously, the subjects judged their psychological impressions of long-term noise (lasting about 3 min) in the psychological experiment. In this section, using the recorded data of Experiment I, membership functions $\mu_{F_i}(x)$ of the psychological impressions $F_i$ ($i = 1, 2, \ldots, 7$) based on the index $x$ are set up. The results for $\mu_{F_i}(\mathrm{WSPD})$ are shown in Fig. 3. This figure reveals the following: when the value of WSPD is increased (or decreased), the psychological impression to the noise approaches F1: Not at all annoying (or F7: Extremely annoying). In the same way as with the psychological impression to noise, membership functions $\mu_{A_i}(x)$ of the psychological impressions $A_i$ ($i = 1, 2, \ldots, 5$) based on the index $x$ are set up. The result for $\mu_{A_i}(\mathrm{WSPD})$ is shown in Fig. 4. (Other cases are omitted.)

Fig. 3. Membership function for psychological impression Fi (i=1, 2, ..., 7).

3.3. Regression functions of the listening score

As mentioned in Section 2.1.3, subjects noted the monosyllables exactly as heard. The number of correct monosyllables was then counted. The listening score is defined as the percentage of correct monosyllables out of the total (50). Since the relationships between each index and the listening score had to be understood, the following types of model describing the regression function between them were adopted.

Linear function:

$$y = ax + b \tag{5}$$

Logistic function:

$$y = \frac{k}{1 + a e^{-bx}} \tag{6}$$

Modified exponential function:

$$y = a\left(1 - e^{-\frac{x-b}{k}}\right) \tag{7}$$

The observed data of Experiment I established the relationship between each index and the listening score. One of these results, for the case where WSPD was adopted as the index, is shown in Fig. 5. (Cases using other indices are omitted.) In Fig. 5, the solid line indicates the regression curve selected by AIC [15]. In this case, the selected model is the logistic function of Eq. (6). (The same results were obtained for the other indices.)
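A model comparison of this kind can be sketched with the least-squares form of AIC, n ln(RSS/n) + 2k, where k is the number of model parameters. The sketch below assumes that form and assumes the sign conventions y = k/(1 + a e^(-bx)) and y = a(1 - e^(-(x-b)/k)) for the logistic and modified exponential models. The data are synthetic, not the experimental data, and the logistic parameters are taken as given rather than fitted.

```python
import math

def linear(x, a, b):
    return a * x + b

def logistic(x, k, a, b):
    return k / (1.0 + a * math.exp(-b * x))

def modified_exponential(x, a, b, k):
    return a * (1.0 - math.exp(-(x - b) / k))

def fit_linear(xs, ys):
    """Closed-form least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def aic_least_squares(observed, predicted, n_params):
    """AIC for a least-squares fit, up to an additive constant."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params

# Synthetic scores: a logistic curve plus a small alternating perturbation.
xs = [-30.0, -20.0, -10.0, 0.0, 10.0, 20.0, 30.0]
perturbation = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5]
ys = [logistic(x, 100.0, 0.3, 0.06) + e for x, e in zip(xs, perturbation)]

slope, intercept = fit_linear(xs, ys)
aic_linear = aic_least_squares(ys, [linear(x, slope, intercept) for x in xs], 2)
aic_logistic = aic_least_squares(ys, [logistic(x, 100.0, 0.3, 0.06) for x in xs], 3)
best = "logistic" if aic_logistic < aic_linear else "linear"
print(best)
```

The model with the smaller AIC is preferred; the 2k term penalizes the extra parameter of the logistic curve, so it is selected only when the reduction in residual error outweighs that penalty.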


Fig. 4. Membership function for psychological impression Ai (i=1, 2,. . ., 5).

4. Prediction of psychological impression and listening score

As an evaluation method of the psychological impressions to the noise and audio signal, we adopted the fuzzy probabilities

$$P(F_i) = \int \mu_{F_i}(x)\, p(x)\, dx \tag{8}$$

$$P(A_i) = \int \mu_{A_i}(x)\, p(x)\, dx \tag{9}$$

of each psychological impression $F_i$ ($i = 1, 2, \ldots, 7$) or $A_i$ ($i = 1, 2, \ldots, 5$) regarded as a fuzzy event [16] (the time ratio of the occurrence of $F_i$ or $A_i$), together with the following average psychological impressions:

$$\langle F \rangle = \sum_{i=1}^{7} i\,P(i) \qquad (P(i) = P(F_i)) \tag{10}$$

$$\langle A \rangle = \sum_{i=1}^{5} i\,P(i) \qquad (P(i) = P(A_i)) \tag{11}$$
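To make Eqs. (8) to (11) concrete, the sketch below evaluates the fuzzy probability of each category and the resulting average impression. The triangular membership functions are purely hypothetical stand-ins for those derived from Experiment I, and the integral is replaced by a sum over a discrete distribution of the index, which is an assumption of this illustration.

```python
def triangular(center, half_width):
    """Hypothetical triangular membership function of the index."""
    def mu(x):
        return max(0.0, 1.0 - abs(x - center) / half_width)
    return mu

def fuzzy_probabilities(memberships, distribution):
    """P(F_i) = sum over x of mu_i(x) * p(x), for a discrete distribution."""
    return [sum(mu(x) * p for x, p in distribution.items()) for mu in memberships]

def average_impression(probabilities):
    """Average category number: sum over i of i * P(i), with i starting at 1."""
    return sum(i * p for i, p in enumerate(probabilities, start=1))

# Seven hypothetical categories F1 ... F7 centered at index values 30 ... -30 dB,
# so that a larger index corresponds to a less annoying impression.
memberships = [triangular(30.0 - 10.0 * i, 10.0) for i in range(7)]

# Steady noise: the index takes a single value with probability 1.
distribution = {5.0: 1.0}

probabilities = fuzzy_probabilities(memberships, distribution)
print(probabilities, average_impression(probabilities))
```

With the index fixed at 5 dB, the memberships of the third and fourth categories are both 0.5 and all others are zero, giving an average impression of 3.5.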


Fig. 5. Relationships between WSPD and the listening score.

For each presented sound of Experiment II, the probability distribution $p(x)$ of the index $x$ is given by a characteristic function. The fuzzy probabilities $P(F_i)$ ($i = 1, 2, \ldots, 7$) of each psychological impression $F_i$, obtained from Eq. (8) using the membership functions set up based on WSPD, are shown as predicted values in Fig. 6 for the case of noise (a) at 65 dB. (Results for the other cases are omitted.) An example of $P(A_i)$ ($i = 1, 2, \ldots, 5$), obtained from Eq. (9) using the membership functions set up based on WSPD, is shown in Fig. 7 for the same case of noise (a) at 65 dB. (The other cases are omitted.) The predicted values are consistent with the observed values.

In addition, the average psychological impressions of the noise and of the audio signal predicted from Eqs. (10) and (11) for each noise condition, when WSPD is used, are compared with the values obtained directly from the recorded data in Figs. 8 and 9, respectively. (Cases using other indices are omitted.)

The following finding is revealed by Figs. 6–9: although the psychological impressions to the noise and audio signal were predicted using membership functions obtained from a psychological experiment in which other subjects participated, a high level of consistency is seen between the predicted and observed values in the case of meaningless random noise with various power spectral level forms.

Fig. 6. Comparison between predicted and observed fuzzy probability of Fi (i=1, 2, ..., 7) [External noise: (a) 65 dB; Index: WSPD].

Fig. 7. Comparison between predicted and observed fuzzy probability of Ai (i=1, 2, ..., 5) [External noise: (a) 65 dB; Index: WSPD].

Fig. 8. Comparison between predicted and observed value of $\langle F \rangle$ (Index: WSPD).

In terms of the listening score, the regression model based on WSPD using the recorded data of Experiment I is as follows:

$$y = \frac{101.13}{1 + 0.28\,e^{-0.06x}} \qquad (y: \hat{t},\ x: \mathrm{WSPD}) \tag{12}$$
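Eq. (12) can be evaluated directly to predict a listening score from a WSPD value. The sketch below assumes the logistic form with a negative exponent, 101.13 / (1 + 0.28 e^(-0.06x)), which is the reading under which the score rises with WSPD; the function name and the sample WSPD values are hypothetical.

```python
import math

def predicted_listening_score(wspd_db):
    """Listening score (%) predicted from WSPD (dB) with the coefficients of Eq. (12)."""
    return 101.13 / (1.0 + 0.28 * math.exp(-0.06 * wspd_db))

# Hypothetical WSPD values in dB, for illustration only.
for x in (-20.0, 0.0, 20.0):
    print(x, round(predicted_listening_score(x), 1))
```

Under this reading the predicted score at WSPD = 0 dB is 101.13 / 1.28, about 79%, and it approaches the fitted upper asymptote of 101.13 as WSPD grows.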

The predicted values of the listening score from Eq. (12) for each noise condition of Experiment II are compared with the values obtained directly from the recorded data in Fig. 10. It can be seen from this figure that the predicted results are consistent with the observed values.

We also investigated how the predicted values changed when the other indices were used. (Because the membership functions of the noise and audio signal based on SN could not be set up, no results of psychological impressions using SN are shown.) Table 1 shows the correlation coefficient r between predicted and observed values for the three aspects. We also adopted the expression y = x + a, where y is the observed value and x is the predicted value, and obtained the bias a by the least-squares method. Table 2 shows the resulting values of a. As can be seen in Tables 1 and 2, WSPD and SI are preferable across the three aspects in common, because their estimation errors a are smaller overall than those of the other indices, and they can systematically predict the psychological impressions of the noise and audio signal, and the listening score. From the above results, the usefulness of these two indices can be recognized for the purpose of predicting/estimating the psychological impressions and the listening score in common.
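The two agreement measures used here can be sketched as follows. For the model y = x + a, minimizing the sum of squared residuals gives a as the mean of (observed minus predicted); r is the ordinary Pearson correlation coefficient. The function names and sample values are hypothetical.

```python
import math

def correlation_coefficient(predicted, observed):
    """Pearson correlation coefficient r between predicted and observed values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

def least_squares_bias(predicted, observed):
    """Bias a in y = x + a; the least-squares solution is mean(y - x)."""
    return sum(o - p for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical predicted and observed listening scores (%), for illustration only.
predicted = [40.0, 55.0, 70.0, 85.0]
observed = [42.0, 57.0, 72.0, 87.0]
print(correlation_coefficient(predicted, observed), least_squares_bias(predicted, observed))
```

The two measures are complementary: a constant offset between prediction and observation leaves r at 1 but shows up fully in a, which is why both are reported.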


Fig. 9. Comparison between predicted and observed value of $\langle A \rangle$ (Index: WSPD).

Table 1
Correlation coefficient r

Index   Annoyance   Speech audibility   Listening score
SN      –           –                   0.84
SNA     0.80        0.96                0.90
AI      0.85        0.95                0.74
SIL     0.57        0.95                0.84
SI      0.95        0.97                0.90
WSPD    0.96        0.97                0.90
ASPD    0.88        0.97                0.75

Table 2
Estimation error a

Index   Annoyance   Speech audibility   Listening score
SN      –           –                   8.84
SNA     0.56        0.34                8.43
AI      0.52        0.28                5.98
SIL     0.36        0.30                1.23
SI      0.24        0.17                0.58
WSPD    0.18        0.12                1.04
ASPD    0.12        0.13                3.82


Fig. 10. Comparison between predicted and observed value of the listening score (Index: WSPD).

5. Conclusions

In the previous paper, the validity of the evaluation method for the psychological impression to noise was considered. In this paper, we have focused not only on the psychological impression to noise but also on the psychological impression to the audio signal and the listening score. We considered the estimation and/or prediction problems of the psychological impressions to noise and audio signal, and of the listening score for monosyllables, by using indices reflecting the mutual relationship between the power spectrum level of the audio signal and that of noise. The practical choice of a useful index has been discussed and, as a result, WSPD and SI can be selected. The validity and the applicability were confirmed experimentally, and reasonable results were obtained.

The primary subjects that should be examined in future studies are listed below:

1. The audio signal used in this paper employs monosyllables. However, it is necessary to study cases with more realistic audio signals such as those containing 2 or 3 syllables, words, and sentences.
2. The discussion in this paper is limited by the use of recorded data obtained from psychological experiments with subjects in their twenties. It is necessary to consider how differences in age can affect results.
3. The applicability of the same method to situations where the external noise is meaningful, such as music and conversation, should be confirmed.

Acknowledgements

The authors would like to express their cordial thanks to Ryuichi Hirata, Toshiya Nakashima, Miki Asai, Ippei Urushizaki and Hiroyuki Taguchi for their assistance, and to their colleagues who provided valuable comments at the 44th Japan Joint Automatic Control Conference [17]. This study was partially supported by Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (C), No. 11832016, 1999.

References

[1] Yamaguchi S, Kato Y, Saeki T. Psychological evaluation of external noise in the case of listening to audio signal. The Transactions of the Institute of Electronics, Information and Communication Engineers A 1994;J77-A(11):1433–42 (in Japanese).
[2] Yamaguchi S, Saeki T, Oimatsu K. A method for psychological evaluation of external non-white noise in the case of listening to audio signal based on fuzzy sets theory. The Transactions of the Institute of Electronics, Information and Communication Engineers A 1996;J79-A(4):845–57 (in Japanese).
[3] Yamaguchi S, Saeki T, Oimatsu K. A method for predicting psychological response to meaningless random noise in the case of listening to audio signal based on bi-variate membership function. The Transactions of the Institute of Electronics, Information and Communication Engineers A 1999;J82-A(9):1421–7 (in Japanese).
[4] Yamaguchi S, Saeki T, Oimatsu K, Kato Y. The effect of interest in speech on psychological response to external noise. Proceedings of The Fourth Asian Fuzzy Systems Symposium 2000:126–31.
[5] Yamaguchi S, Saeki T, Tamesue T, Kato Y. Psychological evaluation of external noise in the case of listening to an audio signal, taking account of the difference between the power spectral characteristics of the audio signal and that of noise. Journal of Sound and Vibration 2001;245(2):205–15.
[6] Williams CE, Pearsons KS, Hecker MHL. Speech intelligibility in the presence of time-varying aircraft noise. The Journal of the Acoustical Society of America 1971;50(2):426–34.
[7] Powers GL, Speaks C. Intelligibility of temporally interrupted speech. The Journal of the Acoustical Society of America 1973;54(3):661–7.
[8] Steeneken HJM, Houtgast T. A physical method for measuring speech-transmission quality. The Journal of the Acoustical Society of America 1980;67(1):318–25.
[9] Goodman DJ, Nash RD. Subjective quality of the same speech transmission conditions in seven different countries. IEEE Transactions on Communications 1982;COM-30(4):642–54.
[10] Yonemoto K. Characteristics of CD for the evaluation of fitting condition with hearing aids (TY-89). JHONS 1995;11(9):1395–401 (in Japanese).
[11] Furihata K, Yanagisawa T. Investigation on composition of a rating scale possible common to evaluate psychological effects on various kinds of noise sources. The Journal of the Acoustical Society of Japan 1989;45(8):577–82 (in Japanese).
[12] Houtgast T, Steeneken HJM. A multi-language evaluation of the RASTI-method for estimating speech intelligibility in auditoria. Acustica 1984;54:185–99.
[13] ANSI S3.5-1969, Methods for the calculation of the articulation index. American National Standards Institute; 1969.
[14] ANSI S3.14-1977, Rating noise with respect to speech interference. American National Standards Institute; 1977.
[15] Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control 1974;AC-19(6):716–23.
[16] Zadeh LA. Probability measures of fuzzy events. Journal of Mathematical Analysis and Applications 1968;23:421–7.
[17] Tamesue T, Yamaguchi S, Saeki T, Kato Y. A consideration on the psychological impressions and the listening score in the case of listening to audio signal under the existing of meaningless steady noise. Preprints of Japan Joint Automatic Control Conference 2001:607–8 (in Japanese).