Binaural speech intelligibility through personal and non-personal HRTF via headphones, with added artificial noise and reverberation

Speech Communication 105 (2018) 53–61

Contents lists available at ScienceDirect

Speech Communication journal homepage: www.elsevier.com/locate/specom

Felipe Orduña-Bustamante a,∗, A.L. Padilla-Ortiz b, Edgar A. Torres-Gallegos c

a Instituto de Ciencias Aplicadas y Tecnología, Universidad Nacional Autónoma de México, Circuito Exterior S/N, Ciudad Universitaria, A.P. 70-186, Coyoacán, Ciudad de México C.P. 04510, México
b CONACyT-CICESE, Unidad Monterrey, Alianza Centro 504, PIIT Apodaca, Nuevo León C.P. 66629, México
c Tecnologico de Monterrey, Escuela de Humanidades y Educación, Carretera Lago de Guadalupe, km 3.5, Col. Margarita Maza de Juárez, Atizapán de Zaragoza, Estado de México C.P. 52926, México

ARTICLE INFO

Keywords: Speech intelligibility; Binaural sound; Personal HRTF; Noise; Reverberation

ABSTRACT

Subjective intelligibility tests were carried out by processing speech through personal and non-personal Head-Related Transfer Functions (HRTF) for azimuth angle θ = +30° (sound source to the right), presented through headphones, under simulated adverse listening conditions. Tests with noise disturbance were also conducted at azimuth angles of θ = 0°, 15° and 45°. Phonetically balanced bi-syllable words in Spanish, uttered by a Mexican female speaker, were used as speech material. Stimuli were convolved with personal or non-personal HRTF, and artificially contaminated with noise or reverberation, interaurally correlated or uncorrelated at the left and right ears. Results at θ = 30° show that binaural speech intelligibility scores reduce slightly to moderately with non-personal HRTF, compared to personal HRTF, for these types of acoustic disturbance. Average intelligibility score reductions of ΔI = −7.6% and −12.1% were found to be statistically significant, p(ΔI ≠ 0) > 0.99, for interaurally correlated and uncorrelated noise, respectively. Reductions of intelligibility with reverberation were smaller: −1.9%, p = 0.63, and −4.1%, p > 0.99, for interaurally correlated and uncorrelated reverberation, respectively; the reduction is smaller, and statistically less significant, for interaurally correlated reverberation. A tentative simple model of speech intelligibility is also presented, based on the modulation index theory, under the better-ear binaural assumption, and the spectral distortion metric to quantify HRTF differences, in an attempt to incorporate the effects of non-personal HRTF in speech intelligibility. The model correctly explains the qualitative trend of the results, but it overpredicts the observed reductions of the intelligibility scores, suggesting that spectral distortion of HRTF is probably too simplified a metric, and insufficient to provide an accurate or suitable explanation in this context. Results with noise disturbance at other angles are as follows. At 0°, intelligibility scores decrease for non-personal HRTF with interaurally correlated noise (L = R), but increase with uncorrelated noise (L ≠ R). At 15°, intelligibility scores decrease in both conditions. At 45°, no change is observed.

1. Introduction

Binaural technology aims to present acoustic stimuli at both ears in order to recreate a virtual auditory space that listeners can perceive as a real acoustic environment. The applications of binaural technology have attracted interest in many fields such as communications systems, hearing aids, speech technology, music entertainment and virtual reality, among others (Møller, 1992; Blauert, 1983; Gardner, 1998; Takeuchi et al., 2001). Binaural hearing through headphones requires the control of sound at each of the two ears independently. In this case, the main disadvantage is a potential loss of perceived spatialization, in which the sound appears as if it were located inside the head of the listener (Rumsey, 2012). In this case, it is possible to recover the spatial sound sensation by processing the audio signals through the subject’s own Head-Related Transfer Functions (HRTF) (Møller et al., 1995; Cheng and Wakefield, 1999), as in normal binaural hearing. Due to anatomical differences, HRTF vary significantly between different individuals (Møller et al., 1995); for that reason, personalization of HRTF is necessary for some applications (Bondu et al., 2006). Several studies have demonstrated that the use of personal or personalized HRTF improves the localization of sound sources (Kistler and Wightman, 1992; Wenzel et al., 1993; Møller et al., 1995; Xie, 2002). On the other hand, some studies have also been conducted dealing with the effect of HRTF on speech intelligibility. Some of these focus on the assessment of speech intelligibility in the presence of noise sources or multiple talkers at different positions, but only very few compare personal and non-personal HRTF.

∗ Corresponding author. E-mail addresses: [email protected] (F. Orduña-Bustamante), [email protected] (A.L. Padilla-Ortiz), [email protected], [email protected] (E.A. Torres-Gallegos).

https://doi.org/10.1016/j.specom.2018.10.009 Received 23 April 2018; Received in revised form 4 September 2018; Accepted 30 October 2018; Available online 1 November 2018. 0167-6393/© 2018 Elsevier B.V. All rights reserved.

1.1. Speech intelligibility and HRTF

A study by Drullman and Bronkhorst (2000) using speech material (words and sentences) bandlimited at 4 kHz, with up to four competing talkers at different positions, found no difference in performance of speech intelligibility, talker recognition, or talker localization tasks between the use of personal and generic HRTF in 3D auditory displays. Kondo et al. (2010) studied speech intelligibility in the presence of a competing noise source considering different acoustic scenarios: one with real sound sources, and a virtual acoustic environment. In order to spatialize the sound sources in the virtual acoustic space, generic HRTF measured on a KEMAR dummy head were included in the signal processing for virtual sound reproduction. Personal HRTF were also measured, and included alternatively. Recordings of the Japanese diagnostic rhyme test (DRT) were used as speech material. The target speech was placed in front of the listener, and a competing noise source was placed at various other azimuths on the horizontal plane. Results show that intelligibility with KEMAR HRTF reduces slightly, by around 5% relative to personal HRTF, when SNR is 0 and −6 dB, but that KEMAR HRTF slightly outperform personal HRTF by a similar margin at SNR −12 dB. Additionally, in the real acoustic space, speech intelligibility was always better than with either type of HRTF in the virtual space.

In a study conducted by Bronkhorst and Plomp (1992), speech and noise were recorded with a KEMAR dummy head and presented to the subjects through headphones. Speech was recorded at an azimuth of 0° (frontal position), whereas noise was recorded at seven azimuth angles from 0° to 180° in steps of 30°. Gains in the speech-reception thresholds (SRT) from 1.5 to 8 dB were observed for normal-hearing listeners when noise maskers were moved from the frontal position to other azimuths. Peissig and Kollmeier (1997) measured SRT as a function of the azimuth angle. A virtual sound field was presented by processing signals through generic HRTF. Different spatial configurations were studied, showing that speech recognition improves when speech and noise come from different directions. Ayllón et al. (2013) proposed a method to reduce the loss of speech intelligibility due to non-personal HRTF in hearing aids. Evaluation mixed different speakers with two types of noise: spatially uncorrelated white noise and noise babble coming from different spatial directions. Results showed that a similar intelligibility was obtained in comparison to the case where personal HRTF are available.

1.2. Models for binaural speech intelligibility

Over the past few years, binaural models have been developed to predict binaural speech intelligibility. The EC/SII model proposed by Beutelmann and Brand (2006) uses the binaural equalization-cancellation (EC) principle (Durlach, 1963) combined with the speech intelligibility index (SII) method (ANS, 1997). The binaural SRT was measured in the presence of a stationary noise masker at different azimuths and a speech target always in front. Three different acoustic environments were considered: anechoic, an office room, and a cafeteria hall, for both normal-hearing and hearing-impaired listeners. The results of the listening tests correlate strongly with the predicted SRT, with a correlation coefficient of 0.95. A revision and extension of the EC/SII model was made by Beutelmann et al. (2010). The extended model, identified as the short-time binaural speech intelligibility model (stBSIM), incorporates short time frames of speech and noise signals, averaging the predictions over time for non-stationary interferers. Prediction quality of the original EC/SII model is maintained in the extended version.

Wan et al. (2010) developed an extended version of the EC principle to predict speech intelligibility in complex acoustic environments, as in the presence of multiple maskers at different spatial locations, and different types of maskers: speech-shaped noise (SSN), modulated SSN, normal speech, and reversed speech. Tests under anechoic conditions show that this model predicts speech intelligibility very well for SSN and modulated SSN maskers, but not so well for normal or reversed speech maskers. Another binaural model, proposed by Lavandier and Culling (2010), predicts the loss of intelligibility considering the effects of reverberation on the interferer. In order to investigate this effect specifically, the speech target was always anechoic. This method is based on predicted Binaural Masking Level Differences (BMLDs). According to Durlach (1963), the BMLD is the difference in detection threshold between the case in which target and interferer have no Interaural Time Differences (ITD) or Interaural Level Differences (ILD) between them, and the case in which there is a given difference in ITD and ILD. Results from this model show a high correlation (0.95–0.97) between the observed SRT and the predicted value, in the presence of several discrete noise sources at various azimuth angles, distances, and virtual rooms. A computationally more efficient implementation of this model was presented later by Lavandier et al. (2012). The Binaural Speech Intelligibility Model (BiSIM) proposed by Cosentino et al. (2014) predicts the binaural advantage to speech intelligibility from mixed target and interferer signals, requiring only limited a priori information, such as the number of interferers present in the sound mixture, and the location of the target. However, in this study, the number of interferers was restricted to one, and the location of the target was only directly in front. The performance was evaluated under anechoic and reverberant conditions.

The model proposed by van Wijngaarden and Drullman (2008) is based on a binaural version of the Speech Transmission Index (STI) (Steeneken and Houtgast, 1980), and in this sense is of more standard practical use in applications. Instead of using the EC principle, this model calculates the Modulation Transfer Function (MTF) separately for both ears. It then applies the better-ear assumption (Edmonds and Culling, 2006) to determine the better MTF, left or right, at the octave bands of 125, 250, 4000, and 8000 Hz. For each of the octave bands at 500, 1000 and 2000 Hz (assumed as the frequency range in which the most significant binaural interactions occur for speech intelligibility), a search is conducted for the maximum interaural cross-correlation with interaural delays from −2 to +2 ms, and the MTF of the combined left and right ear signals, bandlimited at these frequencies, is calculated applying these maximizing interaural delays. Finally, the MTF values from 125 to 8000 Hz are combined for the final STI calculation. The binaural advantage is computed in this way, based on the interaural cross-correlation function. Results show an improved correspondence with subjective intelligibility under dichotic listening conditions.

1.3. Motivation and outline

Personal or personalized HRTF are not always available, and are generally difficult to obtain. For that reason, the aim of the present study is to evaluate the effects on speech intelligibility of non-personal HRTF, which could be incorporated as part of a binaural sound reproduction system. A tentative simple model of speech intelligibility with non-personal HRTF is presented, similar in very basic terms to the binaural STI model of van Wijngaarden and Drullman (2008). The model is compared with results from subjective speech intelligibility tests carried out with personal and non-personal HRTF, under simulated strong acoustic disturbances with noise and reverberation.
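As an illustration of the interaural-delay search described in Section 1.2 for the binaural STI model, the following minimal sketch looks for the delay within ±2 ms that maximizes the normalized cross-correlation between two band-limited ear signals. It is not the implementation of van Wijngaarden and Drullman (2008); the function name `best_interaural_delay`, the sign convention (positive delay means the left signal lags the right), and the synthetic test signal are assumptions made here for illustration only.

```python
import numpy as np


def best_interaural_delay(left, right, fs, max_delay_ms=2.0):
    """Return (delay in seconds, peak normalized cross-correlation).

    Hypothetical helper: searches integer-sample lags within +/- max_delay_ms.
    A positive delay means that `left` lags behind `right`.
    Inputs are assumed to be already band-limited to one octave band.
    """
    n = min(len(left), len(right))
    left = np.asarray(left[:n], dtype=float)
    right = np.asarray(right[:n], dtype=float)
    max_lag = int(round(max_delay_ms * 1e-3 * fs))

    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Overlapping segments of the two signals for this lag.
        if lag >= 0:
            a, b = left[lag:], right[:n - lag]
        else:
            a, b = left[:n + lag], right[-lag:]
        norm = np.sqrt(np.dot(a, a) * np.dot(b, b))
        corr = np.dot(a, b) / norm if norm > 0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag / fs, best_corr


# Synthetic check: the left signal is the right signal delayed by 10 samples.
fs = 44100
rng = np.random.default_rng(0)
x = rng.standard_normal(fs // 10)
right_sig = x
left_sig = np.concatenate([np.zeros(10), x[:-10]])
delay_s, peak = best_interaural_delay(left_sig, right_sig, fs)
print(f"delay = {delay_s * 1e3:.3f} ms, peak correlation = {peak:.3f}")
```

In a full binaural STI calculation this search would be repeated per octave band (500, 1000 and 2000 Hz in the description above) before combining the ear signals; that step is outside the scope of this sketch.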


2. Theory

Speech intelligibility is normally reduced under disturbing noise and reverberation. The model based on the modulation transfer function MTF, the modulation index m, and its elaboration into the Speech Transmission Index STI, is a well-established and useful practical framework (Houtgast and Steeneken, 1985a; 1985b). The modulation index can be expressed as follows:

$$ m = \frac{1}{\sqrt{1 + [2\pi F T / 13.8]^2}} \times \frac{1}{1 + 10^{(-\mathrm{SNR}/10)}}; \tag{1} $$

where F is the modulation frequency of speech in hertz (related to the number of syllables per second), T the reverberation time in seconds, and SNR the signal-to-noise ratio in decibels. The combined effects of noise and reverberation on speech intelligibility can be taken into account through an implied equivalent signal-to-noise ratio SNR′ consistent with Eq. (1), as follows:

\begin{align}
\mathrm{SNR}' &= 10 \log_{10}\left(\frac{m}{1 - m}\right), \tag{2} \\
m &= \frac{10^{\mathrm{SNR}'/10}}{10^{\mathrm{SNR}'/10} + 1}; \tag{3}
\end{align}

where the equivalent signal-to-noise ratio SNR′ combines the effects of the actual signal-to-noise ratio SNR, reverberation time T, and modulation frequency F.

2.1. A tentative simple model of speech intelligibility with non-personal HRTF

The effect of non-personal HRTF on binaural speech intelligibility can be modeled in basic terms, as follows. Two very similar metrics have been proposed to quantify differences between two comparable HRTF: the spectral distortion SD (Nishino et al., 2007), and the complex spectral distortion CSD (Torres-Gallegos et al., 2015), expressed in decibels, typically found between 0 and 15 dB:

$$ \mathrm{CSD} = \sqrt{\frac{1}{N} \sum_f A_f^2 \left[ 20 \log_{10}\left( 1 + \frac{|H_f - H'_f|}{|H_f|} \right) \right]^2}; \tag{4} $$

where $H_f$, $H'_f$ are complex amplitudes of personal and non-personal HRTF as a function of frequency f, N is the number of frequency components, and $A_f$ are frequency weighting factors, such as the standard A-weighting curve. (For brevity in what follows, the term spectral distortion, and the symbol SD, will refer also to the complex spectral distortion CSD.)

A tentative proposition is that spectral distortion can be incorporated into the modulation index model of speech intelligibility as if it were additional noise. In this way, spectral distortion can be considered as a reduction of the equivalent signal-to-noise ratio SNR′ to SNR′ − SD, so that Eq. (3) for the modulation index transforms to:

\begin{align}
m_{\mathrm{SD}} &= \frac{10^{\mathrm{SNR}'/10}}{10^{\mathrm{SNR}'/10} + 10^{\mathrm{SD}/10}}, \tag{5} \\
&= m \times \frac{10^{\mathrm{SNR}'/10} + 1}{10^{\mathrm{SNR}'/10} + 10^{\mathrm{SD}/10}}, \tag{6} \\
&= \frac{m}{m + (1 - m)\, 10^{\mathrm{SD}/10}}; \tag{7}
\end{align}

which depends on the equivalent signal-to-noise ratio SNR′, as well as on the spectral distortion SD (HRTF mismatch). It is important to note that this is a monophonic model which has been extended to include HRTF mismatch, a binaural feature, only approximately. However, this simplification can be justified along the lines of the better-ear assumption (Edmonds and Culling, 2006), and in this sense, this model can be related in very basic terms to the binaural STI model proposed by van Wijngaarden and Drullman (2008).

Estimation of speech intelligibility scores obtained from subjective tests based on phonetically balanced (PB) words of the same type also requires a mapping from the modulation index m (or more conventionally from the Speech Transmission Index STI, which for simplicity will still be denoted in what follows with the same symbol m) to the speech intelligibility score I(m), specific to the type of PB words used for testing. With this information lacking, or limited, in the context of the present study, this mapping is here assumed to be expressed very approximately in the following form:

$$ I(m) = 100 \tanh\left[ \tanh^{-1}(I_0/100)\, m/m_0 \right]; \tag{8} $$

where $I_0$ is a reference speech intelligibility score obtained from subjective tests for a disturbance with a single specific modulation index (STI value) $m_0$, preferably corresponding to a medium loss of intelligibility $I_0 \sim 50\%$. This mapping has the following desirable properties, which can also be observed in other mappings of the Speech Transmission Index STI to intelligibility scores (Houtgast and Steeneken, 1985a; 1985b): $I(m) \to 0$ for $m \to 0$, $m \ll 1$; $I(m_0) = I_0$; and $I(m) \to 100$ for $m \to 1$, $m \gg m_0 / \tanh^{-1}(I_0/100)$. Substituting Eq. (7), evaluated at $m = m_0$, into Eq. (8), the following mapping is obtained for the speech intelligibility score with non-personal HRTF, as a function of the spectral distortion SD (HRTF mismatch):

$$ I(\mathrm{SD}) = 100 \tanh\left[ \frac{\tanh^{-1}(I_0/100)}{m_0 + (1 - m_0)\, 10^{\mathrm{SD}/10}} \right]; \tag{9} $$

where $I_0$ is to be obtained from preliminary subjective tests as a measured reference speech intelligibility score with personal HRTF, for a given reference value of the modulation index (STI value) $m_0$.

3. Speech material and signal processing

3.1. Speech material

A small corpus of 191 bi-syllable phonetically balanced (PB) words with meaning in Spanish was used in this study (Castañeda and Pérez, 1991). See Appendix A. In phonological terms, most of these words have stress in the penultimate syllable (paroxytone type), representing the most common type of bi-syllable word in Spanish (Zubick et al., 1983). Here, it is important to point out that compilation of intelligibility test material in Spanish has so far been limited (Zubick et al., 1983; Tato, 1949; Ferrer, 1960; Cancel, 1965; Berruecos T. and Luis Rodriguez, 1967; Benitez and Speaks, 1968). A particular issue in this respect is the small number of monosyllable words with meaning in Spanish (Tato, 1949; Ferrer, 1960) available for intelligibility tests. Speech material for each test consisted of a list of 50 words, randomly selected without repetitions, from the corpus of 191 bi-syllable Spanish words. Speech was produced in Spanish by a female speaker born in Mexico City, monophonically recorded with an Audio-Technica AT3525 cardioid microphone, under low-noise dry studio conditions. In the audio recordings, words were introduced by one of six different carrier sentences in Spanish, variably and randomly presented during the tests, which in translation are similar to: “Next, you will hear the word...” In this regard, different presentation strategies are sometimes used for research in psycho-linguistics and other fields, with carrier phrases containing test words out of semantic context, and others, as in Cervera and González-Alvarez (2011), who adapted into Spanish a test developed initially for English native speakers (Kalikow et al., 1977).
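The word-list construction described in Section 3.1 (50 words drawn without repetition from the corpus of 191, each introduced by one of six carrier sentences chosen at random) can be sketched as follows. This is only an illustrative sketch: the function `build_test_list`, the placeholder corpus entries and the placeholder carrier sentences are assumptions, since the actual word list is given in Appendix A and the carrier sentences are not reproduced here.

```python
import random


def build_test_list(corpus, carriers, n_words=50, seed=None):
    """Draw n_words distinct words and pair each with a random carrier sentence.

    Hypothetical helper, not the authors' code.
    """
    rng = random.Random(seed)
    words = rng.sample(corpus, n_words)  # sampling without repetition
    return [(rng.choice(carriers), word) for word in words]


# Placeholder data standing in for the 191-word corpus and the 6 carrier sentences.
corpus = [f"word{i:03d}" for i in range(191)]
carriers = [f"carrier sentence {k}" for k in range(1, 7)]

test_list = build_test_list(corpus, carriers, seed=1)
for carrier, word in test_list[:3]:
    print(carrier, "->", word)
```

Using `random.sample` guarantees that no word is repeated within a list, while `random.choice` allows the same carrier sentence to appear several times, which matches the description of carriers being "variably and randomly presented".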


3.2. Signal processing of speech material

3.2.1. Speech convolved with HRTF

The monophonic speech recordings s(t) were convolved with personal or non-personal HRTF, as follows:

\begin{align}
s_L(t) &= s(t) * h_L(t), \tag{10} \\
s_R(t) &= s(t) * h_R(t); \tag{11}
\end{align}

where $h_L(t)$, $h_R(t)$ represent personal or non-personal Head-Related Impulse Responses (HRIR) at the left and right ears, obtained as described below, in Section 4.

3.2.2. Acoustic disturbance

In order to obtain a similar degradation of the speech signals when disturbed with noise or reverberation, a modulation index of $m_0 = 0.1$ was selected, which preliminary intelligibility tests with the same speech material had shown to produce a noticeable reduction of intelligibility (Houtgast and Steeneken, 1985a; 1985b). A modulation frequency of F = 3.7 Hz was assumed, which corresponds to the measured average pace of continuously running speech production, in syllables per second, in the audio recordings of the test material used in the present study. The proposed modulation index of $m_0 = 0.1$ was then obtained approximately, according to Eq. (1), either with a reverberation time of T = 5 s, or with a signal-to-noise ratio of SNR = −10 dB. Using these parameters, the artificial acoustic disturbances in this study are intended to exhibit the same modulation index $m_0 = 0.1$ in all perceptually relevant octave frequency bands, typically considered from 125 to 8000 Hz, and at both ears, independently of the interaural delay. It is then assumed that with these artificial disturbances we have, to a good approximation, STI ≈ $m_0$ = 0.1.
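As a quick numerical check of the parameter choice above, the two factors of Eq. (1) can be evaluated separately for the reverberation-only and noise-only disturbances, together with the equivalent signal-to-noise ratio of Eq. (2). The function names below are hypothetical and chosen only for this sketch.

```python
import math


def reverberation_factor(F, T):
    """First factor of Eq. (1): modulation reduction due to reverberation."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * F * T / 13.8) ** 2)


def noise_factor(snr_db):
    """Second factor of Eq. (1): modulation reduction due to noise."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


def equivalent_snr_db(m):
    """Eq. (2): equivalent signal-to-noise ratio implied by a modulation index."""
    return 10.0 * math.log10(m / (1.0 - m))


F = 3.7                                  # modulation frequency in Hz
m_reverb = reverberation_factor(F, 5.0)  # reverberation only, T = 5 s
m_noise = noise_factor(-10.0)            # noise only, SNR = -10 dB

print(f"m (T = 5 s)       = {m_reverb:.3f}")
print(f"m (SNR = -10 dB)  = {m_noise:.3f}")
print(f"SNR' for reverb   = {equivalent_snr_db(m_reverb):.1f} dB")
```

Both values come out close to, but not exactly at, 0.1 (about 0.12 for reverberation and about 0.09 for noise), which is consistent with the statement that the target modulation index is obtained approximately.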

3.2.3. Speech disturbed with noise

Random noise with A-weighted spectral amplitude was added to the clean binaural HRTF-processed speech signals, as follows:

\begin{align}
s_{LN}(t) &= s_L(t) + n_L(t), \tag{12} \\
s_{RN}(t) &= s_R(t) + n_R(t); \tag{13}
\end{align}

where $s_L(t)$, $s_R(t)$ are the clean binaural signals at the left and right ears, processed through the corresponding HRTF; $n_L(t)$, $n_R(t)$ are A-weighted random noise signals with zero mean, scaled to the specified signal-to-noise ratio, which was set to SNR = −10 dB. The same or different (interaurally correlated or uncorrelated) noise disturbance signals were added at the left and right ears.

3.2.4. Speech disturbed with reverberation

Recorded speech was convolved with an artificial reverberant impulse response, generated as follows:

$$ r(t) = r_0\, n(t) \exp(-6.9\, t/T); \tag{14} $$

where T is the reverberation time, n(t) is one instance of a random noise signal with zero mean, unit variance, and a brown spectrum frequency weighting (−6 dB/oct spectral slope), and $r_0$ is a scale factor. The reverberation time was set to T = 5 s. The reverberated speech signals were obtained by convolution, as follows:

\begin{align}
s_{LT}(t) &= s_L(t) * r_L(t), \tag{15} \\
s_{RT}(t) &= s_R(t) * r_R(t); \tag{16}
\end{align}

where $s_L(t)$, $s_R(t)$ are the clean binaural HRTF-processed signals; $r_L(t)$, $r_R(t)$ are the reverberant impulse responses, which were taken to be the same or different (interaurally correlated or uncorrelated) at the left and right ears.

4. Personal and non-personal HRTF

4.1. Personal HRTF

A group of 14 subjects, whose HRTF had previously been measured (Torres-Gallegos et al., 2015), participated in the intelligibility tests. The HRTF were measured in an anechoic chamber with the sound source at different azimuth angles θ and elevation angle φ = 0°. For speech intelligibility tests with added noise and reverberation, the clean speech signal was first processed through HRTF at θ = +30° (to the right). Additional tests with noise were also conducted with speech processed through HRTF at θ = 0°, 15°, 45°. We assume an approximate left/right symmetry of the results, so this should also cover the corresponding angles to the left. HRTF were measured with a sinusoidal linear sweep from 0 Hz to 20 kHz played back through an Event Electronics ALP-5 5-inch monitor loudspeaker, and the sound reaching the blocked ear canal of the subjects was recorded by means of an Etymotic ER-7c microphone at a sampling rate of 44.1 kHz. The distance from the source to the center of the subject’s head was 1 m. Signal generation and data acquisition were done by means of an external audio interface M-Audio ProFire 610 and custom Matlab code. Subjects sat on a chair in the center of the anechoic chamber with their backs straight. A protractor, a laser pointer, and a string attached to a headset were used to measure the angle, and the distance from the center of the subject’s head to the sound source. Fig. 1 shows Head-Related Impulse Responses (HRIR) for azimuth angle +30° at the left and right ears of the 14 test subjects, together with HRIR of an external subject, and Fig. 2 shows frequency responses of measurements at the right ear.

Fig. 1. Head-Related Impulse Responses (HRIR) on a linear [Lin] vertical scale, measured at the left and right ears (top and lower graphs respectively) for source azimuth angle at θ = +30° (to the right) of 14 test subjects (thin dotted lines), and CIPIC subject 40 (thick solid line). The left ear graphs have a vertical offset for clarity.

4.2. Non-personal HRTF

In order to carry out subjective intelligibility tests using non-personal HRTF, the CIPIC HRTF database was used, prepared at the Center for Image Processing and Integrated Computing (CIPIC) Interface Laboratory of the University of California at Davis (Algazi et al., 2001). The CIPIC HRTF database contains HRTF of 45 subjects, at 25 sound source azimuth angles, and 50 elevation angles. The CIPIC database also includes a set of anthropometric data for a subset of 35 subjects out of the 45 subjects with HRTF. The measurement setup and procedure used for the CIPIC HRTF database was followed in the present study as closely as possible for compatibility, in the measurement of personal HRTF of the subjects participating in the speech intelligibility tests, as described in the previous section, Section 4.1. Subject number 40, out of the 45 subjects in the CIPIC HRTF database, was selected to represent HRTF different from the personal HRTF of the subjects participating in the speech intelligibility tests. Complete anthropometric data is also available in the CIPIC database for this subject. The HRIR and HRTF of CIPIC subject 40 are also shown in Figs. 1 and 2. Other subjects in the CIPIC database showed HRTF differing similarly from the personal HRTF of the intelligibility test subjects, so that no clear or very specific guideline was followed in the selection of CIPIC subject 40, other than being a subject different from the subjects participating in the intelligibility tests.

Some quantitative comparisons are quoted next for subject 40, other subjects in the CIPIC database, and the 14 subjects participating in the present study. The average complex spectral distortion SD in decibels for the right ear (the better ear under the conditions of this study) for the azimuth angle at θ = +30° was calculated, using Eq. (4), between the 14 participants in this study, and each of the 35 CIPIC subjects with anthropometric data. The 14-subject average complex spectral distortion varies in a range from $\mathrm{SD}_{\min}$ = 5.0 dB to $\mathrm{SD}_{\max}$ = 6.9 dB, with mean value (and standard deviation in parenthesis), over the subset of 35 CIPIC subjects, of $\mathrm{SD}_{\mathrm{mean}}$ = 5.6 (0.5) dB. Within this range, CIPIC subject 40 has an average complex spectral distortion (standard deviation in parenthesis), over the 14 subjects in this study, of $\mathrm{SD}_{40}$ = 5.2 (0.6) dB. It must be pointed out here that we did not opt for CIPIC subjects maximizing the average SD, as this strategy resulted in CIPIC HRTF too far removed from the personal HRTF of the 14 participants, at one or both ears.

Additionally, a normalized average measure of anthropometric distance was calculated (Torres-Gallegos et al., 2015) between the 14 participants in this study, and each of the 35 CIPIC subjects with anthropometric data. The 14-subject average anthropometric distance, in arbitrary normalized linear non-dimensional metric units, varies in a range from $d_{\min}$ = 1.31 to $d_{\max}$ = 2.26, with mean value (and standard deviation in parenthesis), over the subset of 35 CIPIC subjects, of $d_{\mathrm{mean}}$ = 1.76 (0.32). Within this range, CIPIC subject 40 has an average anthropometric distance (standard deviation in parenthesis), over the 14 subjects in this study, of $d_{40}$ = 2.07 (0.42).

Fig. 2. HRTF frequency responses measured at the right ear for source azimuth angle at θ = +30° (to the right) of 14 test subjects (thin dotted lines), and CIPIC subject 40 (thick solid line).

5. Speech intelligibility tests

Intelligibility tests were carried out in a small sound-damped room, fitted with absorbing foam at the walls, and a thick textile carpet on the floor. Subjects listened to the speech signals, processed as described in Section 3.2, convolved with personal or non-personal HRTF at θ = +30° with noise and reverberation, and at additional angles with noise, as explained in Section 4, through supra-aural headphones Sony MDR-SA1000, and an external audio interface M-Audio ProFire 610. The subject’s task was to type the words into a text file on a laptop computer; possible spelling mistakes were ignored. In order to give the subjects enough time to type the words, a silent pause of 3 s was included between each word and the following carrier sentence in the recordings. Tests were split into several sessions of about 30 min each, in order to avoid undue fatigue, and a possible word memorization effect. Two initial dummy tests were done in order to train the subjects with speech material and acoustic disturbances similar to those in the tests including personal and non-personal HRTF.

Fourteen subjects took part in the listening tests, 12 male and 2 female. Their age was from 28 to 59 years, with an average of 41.0 years. Subjects were Mexican Spanish native speakers from the central region of Mexico, except for a Colombian student, who was near the end of a one-year stay in Mexico City. Subjects were screened with a standard tonal audiometric test, ensuring normal hearing levels within at most 20 dB HL up to 4 kHz, with only a few age-related higher hearing losses at 6–8 kHz being observed in a minority of the subjects. Subjects were allowed at least 20–30 min in a quiet environment before starting any test session, and they had not been exposed to high sound levels prior to the tests. Three of the participants were previously familiar with the lists of words, and with the type of intelligibility tests carried out in this study; the other 11 participants had no prior experience in this, or in any other kind of subjective tests or psycho-acoustic experiments.
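To make the tentative model of Section 2.1 concrete, the sketch below evaluates Eqs. (7) to (9) for the spectral distortion values quoted in Section 4.2. The reference modulation index $m_0 = 0.1$ is taken from Section 3.2.2; the reference score $I_0 = 50\%$ is a purely hypothetical value assumed here for illustration (the text only states that a medium loss of intelligibility near 50% is preferable), and the function names are likewise assumptions of this sketch.

```python
import math


def modulation_index_with_sd(m, sd_db):
    """Eq. (7): modulation index reduced by a spectral distortion of sd_db decibels."""
    return m / (m + (1.0 - m) * 10.0 ** (sd_db / 10.0))


def intelligibility(m, m0, i0):
    """Eq. (8): assumed mapping from modulation index to intelligibility score (%)."""
    return 100.0 * math.tanh(math.atanh(i0 / 100.0) * m / m0)


def intelligibility_with_sd(sd_db, m0, i0):
    """Eq. (9): Eq. (8) evaluated at the modulation index of Eq. (7) with m = m0."""
    return intelligibility(modulation_index_with_sd(m0, sd_db), m0, i0)


M0 = 0.1    # reference modulation index (Section 3.2.2)
I0 = 50.0   # hypothetical reference score in percent, assumed for illustration

# SD = 0 dB (personal HRTF) and the values quoted for the CIPIC comparison.
for sd in (0.0, 5.0, 5.2, 5.6, 6.9):
    score = intelligibility_with_sd(sd, M0, I0)
    print(f"SD = {sd:3.1f} dB -> predicted I = {score:5.1f} %")
```

With this hypothetical reference score, a spectral distortion of about 5 dB already lowers the predicted score from 50% to well under 20%, which illustrates how strongly this simple model penalizes HRTF mismatch.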

6. Results

Fig. 3 shows speech intelligibility scores obtained under disturbing noise, interaurally correlated or uncorrelated at the left and right ears, as a function of complex spectral distortion of HRTF, for azimuth angle θ = 30°. It can be observed that average intelligibility scores reduce for non-personal HRTF relative to personal HRTF. Fig. 4 shows intelligibility scores under disturbing reverberation, interaurally correlated or uncorrelated at the left and right ears, as a function of complex spectral distortion of HRTF, for azimuth angle θ = 30°. Again, average intelligibility scores are seen to reduce for non-personal HRTF. These results show that average intelligibility scores reduce slightly to moderately for non-personal relative to personal HRTF in all conditions studied, with mean reductions from −1.9% to −12.1%. The figures also show a linear fit to the data, which always results in a small descending slope, and the simple prediction model based on the modulation index theory and the better-ear binaural assumption. The measured reduction of speech intelligibility is small, slightly less, or even much less, than the accepted meaningful perceptual change of 15% (Houtgast and Steeneken, 1984), and noticeably smaller than the reduction predicted by the tentative simple model, even though the model correctly predicts the observed trend of the results. The squared correlation coefficients $R^2$ of the linear data fit are significantly less than 1 for noise, and very close to zero for reverberation, showing that there is a very poor, or no, linear relationship between intelligibility scores I and spectral distortion SD. In fact, no other type of functional relationship can be clearly observed. Spectral distortion results in a narrow distribution that prevents a clear differentiation between the subjects, as has also been reported in the context of HRTF personalization (Torres-Gallegos et al., 2015).

Fig. 3. Binaural speech intelligibility scores vs. complex spectral distortion of HRTF for azimuth angle θ = 30°, with (a) interaurally correlated noise (L = R), and (b) interaurally uncorrelated noise (L ≠ R): personal HRTF (circles), non-personal HRTF (× marks). Linear fit with squared correlation coefficients (a) $R^2$ = 0.35, and (b) $R^2$ = 0.23 (solid lines), and simple prediction model (dashed lines). N = 14 data points shown, some of which overlap graphically.

Fig. 4. Binaural speech intelligibility scores vs. complex spectral distortion of HRTF for azimuth angle θ = 30°, with (a) interaurally correlated reverberation (L = R), and (b) interaurally uncorrelated reverberation (L ≠ R): personal HRTF (circles), non-personal HRTF (× marks). Linear fit with squared correlation coefficients (a) $R^2$ = 0.01, and (b) $R^2$ = 0.05 (solid lines), and simple prediction model (dashed lines). N = 14 data points shown, some of which overlap graphically.

Table 1 shows the interaural condition of the acoustic disturbance, correlated (L = R) or uncorrelated (L ≠ R) at the left and right ears, the change of subject-mean binaural speech intelligibility scores from personal to non-personal HRTF when disturbed with noise or reverberation: $\Delta I = I_{np} - I_p$, standard deviation shown in parenthesis (std.), and the statistical t-test probability of assuming the change being non-null: p(ΔI ≠ 0), with a sample size N = 14. The average intelligibility scores reduce slightly to moderately, −1.9% to −12.1%, but the change is statistically significant (p > 0.99), except for interaurally correlated reverberation (p = 0.63). These results partly agree with Kondo et al. (2010), and are in contrast with Drullman and Bronkhorst (2000), who found no significant difference in speech intelligibility with speech bandlim-

58

F. Orduña-Bustamante, A.L. Padilla-Ortiz and E.A. Torres-Gallegos

Speech Communication 105 (2018) 53–61

Fig. 5. Approximate predicted mappings of modulation index m to speech intelligibility score with (a) noise, and (b) reverberation, from Eq. (8), assuming reference values (circles) with modulation index 𝑚0 = 0.1, and mean intelligibility scores with personal HRTF, I0 as given in Table 1, interaurally correlated (𝐿 = 𝑅, solid lines), interaurally uncorrelated (L ≠ R, dashed lines).

Table 1 Interaural condition of the acoustic disturbance with noise or reverberation, correlated (𝐿 = 𝑅) or uncorrelated (L ≠ R) at the left and right ears, subjectmean binaural speech intelligibility scores I0 with personal HRTF for azimuth angle 𝜃 = 30◦ , change of mean speech intelligibility scores from personal to non-personal HRTF Δ𝐼 = 𝐼𝑛𝑝 − 𝐼𝑝 , with standard deviation shown in parenthesis (std.), and statistical t-test probability of change being non-null: p(ΔI ≠ 0), with a sample size 𝑁 = 14. Disturbance

I0

ΔI (std.)

p(ΔI ≠ 0)

Noise 𝐿 = 𝑅 Noise L ≠ R Reverb 𝐿 = 𝑅 Reverb L ≠ R

52.3 45.3 47.7 60.7

−12.1 (8.1) −7.6 (6.4) −1.9 (7.4) −4.1 (4.4)

> 0.99 > 0.99 0.63 > 0.99

that interaural differences emphasized by lateralization of the sound source tend to improve binaural speech intelligibility in the presence of interaurally correlated noise (𝐿 = 𝑅). With interaurally uncorrelated noise (L ≠ R), intelligibility scores are larger for personal than for non-personal HRTF for slightly lateral sound source positions, azimuth angles at 15° and 30°, and again, no significant difference is found at 45°. However, very notably at the frontal position, azimuth angle 0°, a peculiar result is observed, where intelligibility scores are greater for non-personal than for personal HRTF. For personal HRTF, intelligibility scores are smaller at 45°, where the perceptual advantage of having interaurally uncorrelated noise (L ≠ R) is also smaller. For non-personal HRTF, intelligibility scores remain relatively low and with little variation for azimuth angles from 15° to 45°, probably indicating the negative effect of unfamiliar and confusing interaural differences produced by the interaurally uncorrelated noise and non-personal HRTF. At the frontal position 0°, where HRTF tend to exhibit minimal or no interaural differences, the unfamiliar characteristics of non-personal HRTF seem to have a surprisingly beneficial effect, with greater intelligibility scores than with personal HRTF, partially overcoming the interaurally uncorrelated noise disturbance (L ≠ R). Table 2 shows changes of subject-mean binaural speech intelligibility scores from personal to non-personal HRTF when disturbed with noise: Δ𝐼 = 𝐼𝑛𝑝 − 𝐼𝑝 , standard deviation shown in parenthesis (std.), and statistical t-test probability of change being non-null: p(ΔI ≠ 0), with sample size 𝑁 = 14, for different azimuth angles 𝜃 in degrees, interaurally correlated (𝐿 = 𝑅) or uncorrelated (L ≠ R) at the left and right ears. 
Except for interaurally uncorrelated noise at the front 0°, binaural intelligibility score differences are negative or near zero, indicating the perceptual advantage of personal over non-personal HRTF. For interaurally correlated noise (𝐿 = 𝑅), intelligibility scores are negative for azimuth angles at 30° and below, with the difference decreasing to non significant levels at the larger azimuth angle of 45°. For interaurally uncorrelated noise (L ≠ R), intelligibility scores are negative for azimuth angles of 15° and 30°, with the difference being again non significant at 45°. At the frontal position 0°, binaural intelligibility score differences are positive (intel-

ited to 4 kHz, which might have reduced the influence of personal and non-personal HRTF. Fig. 5 shows the approximate predicted mapping of modulation index m to speech intelligibility score, used as part of the simple intelligibility model, particular to the present study with bi-syllable words in Spanish under disturbing reverberation and noise. Reference values for these mappings were obtained from intelligibility scores I0 obtained in this study with personal HRTF, and the assumed modulation index 𝑚0 = 0.1. 6.1. Effect of azimuth angle with noise disturbance Additional tests with noise were also conducted with HRTF at 𝜃 = 0◦ , 15°, 45°. Fig. 6 shows binaural speech intelligibility scores vs. azimuth angle, with interaurally correlated noise (𝐿 = 𝑅) and interaurally uncorrelated noise (L ≠ R), for personal and non-personal HRTF. With interaurally correlated noise (𝐿 = 𝑅), intelligibility scores are larger for personal than for non-personal HRTF for azimuth angles from 0° to 30°, the difference being slightly less clear at 15°, and with no significant difference being found at 45°. In both cases, intelligibility scores tend to increase very slightly as the azimuth angle increases, indicating 59

F. Orduña-Bustamante, A.L. Padilla-Ortiz and E.A. Torres-Gallegos

Speech Communication 105 (2018) 53–61

Fig. 6. Binaural speech intelligibility scores vs. azimuth angle, with (a) interaurally correlated noise (𝐿 = 𝑅), (b) interaurally uncorrelated noise (L ≠ R): personal HRTF (solid lines), non-personal HRTF (dashed lines), shown as mean values and error bars ( ± 1 standard deviation). Data points are slightly offset along the horizontal axis for clarity.

Table 2 Changes of subject-mean binaural speech intelligibility scores from personal to non-personal HRTF when disturbed with noise: Δ𝐼 = 𝐼𝑛𝑝 − 𝐼𝑝 , standard deviation shown in parenthesis (std.), statistical t-test probability of change being non-null: p(ΔI ≠ 0), sample size 𝑁 = 14, for different azimuth angles 𝜃 in degrees, interaurally correlated (𝐿 = 𝑅) or uncorrelated (L ≠ R) at the left and right ears. Noise

𝜃 (deg)

ΔI (std.)

p(ΔI ≠ 0)

𝐿=𝑅

0 15 30 45 0 15 30 45

−10.0 (12.9) −3.4 (8.4) −12.1 (8.1) −0.6 (10.7) +10.1 (9.0) −9.7 (9.0) −7.6 (6.4) −0.6 (9.1)

0.99 0.85 > 0.99 0.16 > 0.99 > 0.99 > 0.99 0.18

L≠R

bility scores reduce noticeably less than predicted by the model, indicating the need for different or more detailed explanations of the observed results. In particular, this appears to imply that spectral distortion SD is an oversimplified metric of HRTF differences, and that HRTF differences cannot be properly incorporated as an equivalent noise disturbance into binaural speech intelligibility models based on the modulation index, and the Speech Transmission Index. Furthermore, the squared correlation coefficients R2 are very small, showing that there is a very poor linear relationship between intelligibility scores I, and spectral distortion SD. In fact, no other functional relationship is evident in the results, stressing the conclusion that broadband spectral distortion SD is an inadequate measure of HRTF differences. For speech processed through personal and non-personal HRTF at other angles, and with interaurally correlated noise (𝐿 = 𝑅), intelligibility scores are larger for personal than for non-personal HRTF for azimuth angles from 0° to 30°, the difference being slightly less clear at 15°, and with no significant difference being found at 45°. With interaurally uncorrelated noise (L ≠ R), intelligibility scores are larger for personal than for non-personal HRTF for azimuth angles at 15° and 30°, with no significant difference being found at 45°. At the frontal position, azimuth angle 0°, intelligibility scores are greater for non-personal than for personal HRTF. This peculiar result warrants further investigation. The artificial noise and reverberation used in this study allow precise adjustment of the modulation index, but have an extremely smooth and regular character, lacking fluctuation, intermittence, and other acoustic information-garbling characteristics present in real acoustic disturbances, which are known to have strong effects on speech intelligibility. 
Additionally, noise source location is also known to be a very important factor, whereas in this work noise is assumed to be spatially uniform. Therefore, further study will be necessary to investigate the effects of non-personal HRTF on binaural speech intelligibility under more realistic noise and reverberation scenarios.

ligibility improves for non-personal HRTF), according to the peculiar result already described. 7. Conclusions For speech processed through personal and non-personal HRTF at 𝜃 = +30◦ , with added noise or reverberation, binaural intelligibility scores reduce slightly for non-personal HRTF compared with personal HRTF, and this reduction is statistically significant, except for interaurally correlated reverberation. Changes of the intelligibility scores are perceptually, from barely noticeable, to moderate, below the 15% change that is conventionally considered as perceptually significant. A simple model based on the modulation index theory of speech intelligibility, under the better-ear assumption, and on the spectral distortion measure of HRTF mismatch, provides a tentative explanation, predicting the correct qualitative trend of the results. However, measured intelligi60
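The t-test probabilities p(ΔI ≠ 0) listed in Tables 1 and 2 can be approximately recomputed from the tabulated mean changes and standard deviations. The Python sketch below assumes that p(ΔI ≠ 0) is one minus the two-sided p-value of a one-sample t-test on the N = 14 per-subject differences, and that the tabulated standard deviations are sample standard deviations; both are assumptions, since the exact procedure is not stated in this section. The helper names are hypothetical, and the t distribution is integrated numerically to avoid external dependencies.

```python
import math


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def prob_nonnull(mean_change, std_change, n, steps=2000):
    """Estimate p(ΔI ≠ 0) as one minus the two-sided p-value (assumed definition).

    This equals the probability mass of the t distribution inside [-|t|, +|t|],
    integrated here with the composite Simpson rule (steps must be even).
    """
    t = abs(mean_change) / (std_change / math.sqrt(n))
    df = n - 1
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    # Simpson integral over [0, |t|], doubled by symmetry of the density.
    return 2 * total * h / 3


# Mean change and standard deviation as tabulated (Tables 1 and 2), N = 14.
rows = [
    ("noise L=R, 30 deg", -12.1, 8.1),
    ("noise L=R, 15 deg", -3.4, 8.4),
    ("noise L=R, 45 deg", -0.6, 10.7),
]
for label, change, std in rows:
    print(f"{label}: p = {prob_nonnull(change, std, 14):.2f}")
```

Under these assumptions the computed values should fall close to the tabulated ones; small discrepancies are to be expected, because the tabulated means and standard deviations are rounded.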

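The squared correlation coefficients R² quoted for the linear fits in Figs. 3 and 4 relate intelligibility scores I to spectral distortion SD. As a minimal sketch, an ordinary least-squares fit and its R² can be computed as follows; the function name and the sample values are hypothetical and are not data from this study.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with squared correlation R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical spectral distortion values (dB) and intelligibility scores (%).
spectral_distortion = [4.1, 4.6, 5.0, 5.3, 5.9]
intelligibility = [55.0, 48.0, 52.0, 41.0, 44.0]

slope, intercept, r2 = linear_fit(spectral_distortion, intelligibility)
print(f"slope = {slope:.2f} %/dB, intercept = {intercept:.1f} %, R^2 = {r2:.2f}")
```

A value of R² near zero, as reported for reverberation, means that the fitted line explains almost none of the variance in the scores, even when its slope is negative.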

Acknowledgments

The participation of authors Padilla-Ortiz and Torres-Gallegos was supported by scholarships from CEP-UNAM (Coordinación de Estudios de Posgrado, Universidad Nacional Autónoma de México). Additional funds were also granted by Intel Corporation. Significant improvements were made possible thanks to discussions and suggestions by Rafael de la Guardia and Héctor Cordourier.

Appendix A. Corpus of 191 Spanish words

algún allá asno beca botes brazo buque busto cabe calle calor cama canción cano cardo carmen caro cebo cebra cedros celda celo celtas cena cera cerca cero choca ciega cielo cierta cifra cita clame clavo codo conde control corea corta crean críos cuales cumbres cura curas damas dante dardo dejo deme diego dieta dije dime diosa dique disco dones dota duda duelo dulce duna duque ellos esos feria fierro fino flaca flanes freno gestor goce grasa hacia hambre himno hombro irma jalan jaque laca lacre lenta libre lila lina lince liso lista listo llenos lloro luces manto medios meta mide miden mimo mini miope mismo monte nada nadie nave necio nena neta nilo nina niña níquel noche nombre norte nube nuca nulo onda pacto padre pajes pardo pera perla perros persa piano pica pili pisen pista plato pleno pluma premios prensa prima pura puse quepa queso radio regla reno reto saco seda sella sello selva senda seso seta siete siglo sigo simple sones sudo suela surco taches talco tape tapia tarde tecleo tiendas tigre timbre tina tira tiro torno toro trance trenza turco une unos veinte viena vienen viernes vino yeso.

References

Algazi, V.R., Duda, R.O., Thompson, D.M., Avendano, C., 2001. The CIPIC HRTF database. In: Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575). IEEE, pp. 99–102. doi:10.1109/ASPAA.2001.969552.
ANSI S3.5-1997, 1997. Methods for the Calculation of the Speech Intelligibility Index. American National Standards Institute, New York.
Ayllón, D., Gil-Pita, R., Rosa-Zurera, M., 2013. Design of microphone arrays for hearing aids optimized to unknown subjects. Signal Process. 93 (11), 3239–3250.
Benitez, L., Speaks, C., 1968. A test of speech intelligibility in the Spanish language. Int. Audiol. 7 (1), 16–22. doi:10.3109/05384916809074301.
Berruecos T., P., Luis Rodriguez, I.J., 1967. Determination of the phonetic percent in the Spanish language spoken in Mexico City, and formation of P. B. lists of trochaic words. Int. Audiol. 6 (2), 211–216. doi:10.3109/05384916709074256.
Beutelmann, R., Brand, T., 2006. Prediction of speech intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am. 120 (1), 331–342. doi:10.1121/1.2202888.
Beutelmann, R., Brand, T., Kollmeier, B., 2010. Revision, extension, and evaluation of a binaural speech intelligibility model. J. Acoust. Soc. Am. 127 (4), 2479–2497. doi:10.1121/1.3295575.
Blauert, J., 1983. Spatial hearing: The psychophysics of human sound localization. The MIT Press.
Bondu, A., Busson, S., Lemaire, V., Nicol, R., 2006. Looking for a relevant similarity criterion for HRTF clustering: a comparative study. Audio Engineering Society Convention 120, Paper Number: 6653.
Bronkhorst, A.W., Plomp, R., 1992. Effect of multiple speechlike maskers on binaural speech recognition in normal and impaired hearing. J. Acoust. Soc. Am. 92 (6), 3132–3139. doi:10.1121/1.404209.
Cancel, C.A., 1965. Multiple-choice intelligibility lists for Spanish speech audiometry. Int. Audiol. 4 (2), 91–93. doi:10.3109/05384916509074111.
Castañeda, G.R., Pérez, R.S., 1991. Análisis fonético de las listas de palabras de uso más extendido en logoaudiometría. Anales de la Sociedad Mexicana de Otorrinolaringología 1, 23–30.
Cervera, T., González-Alvarez, J., 2011. Test of Spanish sentences to measure speech intelligibility in noise conditions. Behav. Res. Methods 43 (2), 459–467. doi:10.3758/s13428-011-0063-2.
Cheng, C.I., Wakefield, G.H., 1999. Introduction to Head-Related Transfer Functions (HRTFs): representations of HRTFs in time, frequency, and space. In: Audio Engineering Society Convention 107. Audio Engineering Society, pp. 231–249.
Cosentino, S., Marquardt, T., McAlpine, D., Culling, F., Falk, T., 2014. A model that predicts the binaural advantage to speech intelligibility from the mixed target and interferer signals. J. Acoust. Soc. Am. 135 (2), 796–807. doi:10.1121/1.4861239.
Drullman, R., Bronkhorst, A.W., 2000. Multichannel speech intelligibility and talker recognition using monaural, binaural, and three-dimensional auditory presentation. J. Acoust. Soc. Am. 107 (4), 2224–2235. doi:10.1121/1.428503.
Durlach, N., 1963. Equalization and cancellation theory of binaural masking-level differences. J. Acoust. Soc. Am. 35 (8), 1206–1218. doi:10.1121/1.1918675.
Edmonds, B.A., Culling, J.F., 2006. The spatial unmasking of speech: evidence for better-ear listening. J. Acoust. Soc. Am. 120 (3), 1539–1545. doi:10.1121/1.2228573.
Ferrer, O., 1960. Speech audiometry. Laryngoscope 70 (11), 1541–1551. doi:10.1288/00005537-196011000-00004.
Gardner, W.G., 1998. 3-D audio using loudspeakers. Kluwer Academic.
Houtgast, T., Steeneken, H., 1984. A multi-language evaluation of the RASTI-method for estimating speech intelligibility in auditoria. Acta Acustica united with Acustica 54 (4), 185–199.
Houtgast, T., Steeneken, H.J.M., 1985a. A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. J. Acoust. Soc. Am. 77 (3), 1069–1077. doi:10.1121/1.392224.
Houtgast, T., Steeneken, H.J.M., 1985b. The modulation transfer function in room acoustics. B&K Tech. Rev. 3, 3–12.
Kalikow, D., Stevens, K., Elliott, L., 1977. Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. J. Acoust. Soc. Am. 61 (5), 1337–1351. doi:10.1121/1.381436.
Kistler, D.J., Wightman, F.L., 1992. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. J. Acoust. Soc. Am. 91 (3), 1637–1647. doi:10.1121/1.402444.
Kondo, K., Chiba, T., Kitashima, Y., Yano, N., 2010. Intelligibility comparison of Japanese speech with competing noise spatialized in real and virtual acoustic environments. Acoust. Sci. Technol. 31 (3), 231–238. doi:10.1250/ast.31.231.
Lavandier, M., Culling, J., 2010. Prediction of binaural speech intelligibility against noise in rooms. J. Acoust. Soc. Am. 127 (1), 387–399. doi:10.1121/1.3268612.
Lavandier, M., Jelfs, S., Culling, J., Watkins, A., Raimond, A., Makin, S., 2012. Binaural prediction of speech intelligibility in reverberant rooms with multiple noise sources. J. Acoust. Soc. Am. 131 (1), 218–231. doi:10.1121/1.3662075.
Møller, H., 1992. Fundamentals of binaural technology. Appl. Acoust. 36 (3–4), 171–218. doi:10.1016/0003-682X(92)90046-U.
Møller, H., Sørensen, M.F., Hammershøi, D., Jensen, C.B., 1995. Head-related transfer functions of human subjects. J. Audio Eng. Soc. 43 (5), 300–321.
Nishino, T., Inoue, N., Takeda, K., Itakura, F., 2007. Estimation of HRTFs on the horizontal plane using physical features. Appl. Acoust. 68 (8), 897–908. doi:10.1016/j.apacoust.2006.12.010.
Peissig, J., Kollmeier, B., 1997. Directivity of binaural noise reduction in spatial multiple noise-source arrangements for normal and impaired listeners. J. Acoust. Soc. Am. 101 (3), 1660–1670. doi:10.1121/1.418150.
Rumsey, F., 2012. Spatial Audio. CRC Press.
Steeneken, H., Houtgast, T., 1980. A physical method for measuring speech transmission quality. J. Acoust. Soc. Am. 67 (1), 318–326. doi:10.1121/1.384464.
Takeuchi, T., Nelson, P.A., Hamada, H., 2001. Robustness to head misalignment of virtual sound imaging systems. J. Acoust. Soc. Am. 109 (3), 958–971. doi:10.1121/1.1349539.
Tato, J.M., 1949. Lecciones de audiometría. Ateneo.
Torres-Gallegos, E.A., Orduña-Bustamante, F., Arámbula-Cosío, F., 2015. Personalization of head-related transfer functions (HRTF) based on automatic photo-anthropometry and inference from a database. Appl. Acoust. 97, 84–95. doi:10.1016/j.apacoust.2015.04.009.
Wan, R., Durlach, N., Colburn, H., 2010. Application of an extended equalization-cancellation model to speech intelligibility with spatially distributed maskers. J. Acoust. Soc. Am. 128 (6), 3678–3690. doi:10.1121/1.3502458.
Wenzel, E.M., Arruda, M., Kistler, D.J., Wightman, F.L., 1993. Localization using nonindividualized head-related transfer functions. J. Acoust. Soc. Am. 94 (1), 111–123. doi:10.1121/1.407089.
van Wijngaarden, S.J., Drullman, R., 2008. Binaural intelligibility prediction based on the speech transmission index. J. Acoust. Soc. Am. 123 (6), 4514–4523. doi:10.1121/1.2905245.
Xie, B.S., 2002. Effect of head size on virtual sound image localization. Appl. Acoust. 21 (5), 1–7.
Zubick, H.H., Irizarry, L.M., Rosen, L., Feudo, P., Kelly, J.H., Strome, M., 1983. Development of speech-audiometric materials for native Spanish-speaking adults. Int. J. Audiol. 22 (1), 88–102. doi:10.3109/00206098309072772.
