Fast auralization using radial basis functions type of artificial neural network techniques



Applied Acoustics 157 (2020) 106993

Contents lists available at ScienceDirect

Applied Acoustics journal homepage: www.elsevier.com/locate/apacoust

Roberto A. Tenenbaum a,*, Filipe O. Taminato b, Viviane S.G. Melo a

a Federal University of Santa Maria, Brazil
b State University of Rio de Janeiro, Brazil

Article info

Article history: Received 2 July 2019 Received in revised form 29 July 2019 Accepted 31 July 2019

Keywords: Fast auralization Room acoustics simulation Artificial neural networks Computational cost reduction

Abstract

This work presents a new technique to produce fast and reliable auralizations with a computer code for room acoustics simulation. It discusses the classical method for generating binaural room impulse responses and presents a new technique that uses artificial neural networks of the radial basis function type. This type of network is briefly presented, and its training and testing procedures are discussed. The artificial neural networks model the filtered head-related impulse responses for 64,442 directions uniformly distributed around the head, with a reduction of around 90% in the computational cost of generating binaural impulse responses. It is shown that the filtered head-related impulse responses calculated with the classical convolution method and with the artificial neural network technique are almost indistinguishable. It is concluded that the new technique produces fast and reliable binaural room impulse responses for auralization purposes.

© 2019 Elsevier Ltd. All rights reserved.

* Corresponding author: R.A. Tenenbaum. https://doi.org/10.1016/j.apacoust.2019.07.041

1. Introduction

This work deals with the computer simulation of room acoustics and with the techniques used to generate auralizations at selected seats in the room. In general, room acoustics simulation follows the assumptions of geometrical acoustics [1]. This means that sound waves can be treated as acoustic rays that leave the sound source and propagate in the room, reflecting and refracting on its internal surfaces [2]. There are two main ways of modeling acoustic rays: the ray-tracing method [3] and the image source method [4]. There are also hybrid algorithms that use the image source method to calculate the first specular reflections and the ray-tracing method to calculate the remaining ones [5].

However, as already pointed out by several authors [6,7], diffuse reflection plays an important role in room acoustics, providing greater uniformity in the sound field. For the auralization of a room, diffuse reflections are also fundamental, leading to greater authenticity. A good model for diffuse reflections is therefore essential, since the ray-tracing technique cannot handle them properly. One way to approximately model diffuse reflections is the radiosity technique [8,9].

Computer models for room acoustics simulation go back to the pioneering work of Schroeder [10], and the main interest then was to simulate room impulse responses (RIRs). In the early 1990s, methods to produce binaural room impulse responses (BRIRs) and to generate auralizations in computer simulation software appeared in the literature [11–18]. In the 2000s and later, new developments in the techniques to produce auralizations were achieved [19–24]. Since then, auralization and acoustic virtual reality have attracted a great deal of interest, becoming the subject of numerous academic research works and doctoral theses [25–33]. More recently, a special issue on room acoustics simulation and auralization was published by the Journal of the Acoustical Society of America [34] and, in 2016, the fourth Round Robin on room acoustics simulation, the first one with auralization, was launched [35].

Reliable auralization, usually heard through properly equalized headphones, can provide the listener with an authentic sense of sound immersion in the simulated environment. This supplies the acoustical designer, for instance, with information about the room's sound quality that could not be evaluated from the acoustic quality parameters computed by the simulator alone. One could say that the difference between looking at the room's quality parameters and hearing its auralization from a given anechoic signal is similar to the difference between reading the menu of a good restaurant and tasting its dishes. It is worth noting, however, that the evaluation of an auralization, being subjective, i.e., dependent on human interpretation, is still a subject of ongoing research. Perhaps the most


complete work in this area is the Spatial Audio Quality Inventory [36]. Although 48 different terms in English are considered in [36] to qualify the different human sensations related to audio content, there was no pretense of finding objective parameters, i.e., measurable metrics expressed by a number. Some alternatives have been published in this direction, using articulation scores to evaluate speech intelligibility and comparing actual rooms with their corresponding computational auralizations [37–39].

In room acoustic simulation with auralization there are two bottlenecks to be considered, i.e., calculation segments that result in high computational cost. The first one is the numerical simulation of the room, aimed at generating all acoustic rays that reach the receivers. The second one is the calculation of the binaural room impulse responses for all source-receiver pairs, which must take into account the head-related impulse responses (HRIRs). In the following, a new technique for generating auralizations is presented, replacing the classical convolution method with a methodology that models the filtered HRIRs through artificial neural networks [40] of the radial basis functions type [41], with a significant computational gain.

In Section 2, the radial basis function type of artificial neural network (RBF-ANN) is briefly presented, together with how it operates in the present application and its main advantages in terms of computational effort. In Section 3, the training and testing procedures used to obtain the set of RBF-ANNs that replaces the HRIR databank are discussed. In Section 4, the general methodology to produce auralizations with the RBF-ANN technique is examined. In Section 5, the computational cost of the ANN technique is discussed and compared with that of the classical convolution method. In Section 6, the filtered HRIRs computed through the convolution method and through the artificial neural network technique are compared, showing that the two methods yield results that are virtually indistinguishable from each other. Finally, in Section 7, the main conclusions of the work are summarized.

2. Radial basis functions type of artificial neural network

An artificial neural network (ANN) is an information processing system based on simplified mathematical models of biological neurons, whose learning process results from experience. The knowledge gained by the network through the examples is stored in the form of synaptic connection weights, which are adjusted so that the network makes the correct decisions when presented with new inputs. In other words, the network has the ability to generalize the learned information. The process of adjusting the synaptic weights is performed by the learning algorithm.

Artificial neural networks are useful tools for solving many types of problems such as classification, grouping, optimization, approximation and forecasting. One of the main applications of ANNs is pattern recognition [42], and this is the application under consideration here: the ANNs are trained to learn the HRIR patterns.

The general model of an artificial neuron used in the different ANN architectures comprises the elements shown in Fig. 1, where: $x_j$ are the inputs of the neuron; $w_j$ are the synaptic weights that balance the inputs; $\Sigma$ is a linear combiner; $b$ is a bias or activation threshold, whose effect is to increase (or decrease) the input of the activation function; $v$ is the activation potential, which determines whether the neuron will produce an excitatory or inhibitory potential; $\varphi$ is the activation function, whose purpose is to restrict the output amplitude of the artificial neuron; and, finally, $u$ is the neuron output. The relationship between the inputs $x_j$ and the output $u$ of a neuron is given by

Fig. 1. Elements that make up an artificial neuron.

$$u = \varphi\left(\sum_{j=1}^{N} w_j x_j + b\right), \tag{1}$$

where $N$ is the number of inputs.

The architecture of an artificial neural network describes how the neurons are connected to each other. There are two main types of architectures: non-recurrent neural networks and recurrent neural networks, the second one with feedback [40]. Among the several possible architectures of artificial neural networks, one of the most efficient and computationally economical is the radial basis functions (RBF) type of ANN. RBF networks have only one intermediate layer, whose neurons use radial basis functions, while the output layer adopts a linear function [43].

Radial basis functions belong to a class of functions whose response grows or decreases monotonically with the distance from a central point. The center and the growth or decay rate are parameters to be adjusted in the training phase. The most commonly used monotonically decreasing radial basis function is the Gaussian function, and this option is adopted in this work.

Fig. 2 depicts the RBF architecture. Note that there are $N_0$ nodes at the input layer, $N_1$ neurons at the intermediate layer and $N_2$ neurons at the output layer. Usually, the number of input nodes and the number of output neurons are established in advance for the considered application, so that what remains to be studied is the number of neurons in the intermediate layer and all other ANN parameters. It is possible to set the number of neurons of the intermediate layer equal to the number of input patterns. In that case, the network maps all the input vectors exactly to the desired outputs. However, this exact mapping is undesirable because it usually leads to overfitting, and this method was not adopted. On the other hand, a very small number of neurons is also undesirable, because the network would spend a lot of time trying to find the best representation of the physical problem.
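As a rough illustration of this architecture, the forward pass of a Gaussian RBF network can be sketched in Python as follows. The Gaussian form exp(-d²/(2σ²)), the function name `rbf_forward` and the toy parameter values are assumptions made for this sketch; the text does not spell out the exact parameterization used by the authors.

```python
import math


def rbf_forward(x, centers, widths, out_weights, out_bias):
    """Forward pass of an RBF network: Gaussian hidden layer, linear output.

    x           -- input vector (N0 values)
    centers     -- N1 center vectors, each with N0 values
    widths      -- N1 Gaussian widths (assumed parameterization)
    out_weights -- N2 rows, each with N1 synaptic weights
    out_bias    -- N2 biases
    """
    hidden = []
    for mu, sigma in zip(centers, widths):
        # Squared Euclidean distance between the input and this center.
        dist2 = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        hidden.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
    # Each output neuron is a weighted sum of hidden activations plus a bias.
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(out_weights, out_bias)]


# Toy network: N0 = 2 inputs, N1 = 2 hidden neurons, N2 = 1 output.
y = rbf_forward([0.0, 0.0],
                centers=[[0.0, 0.0], [1.0, 1.0]],
                widths=[1.0, 1.0],
                out_weights=[[2.0, 0.0]],
                out_bias=[0.5])
```

In the toy call the input coincides with the first center, so that hidden neuron outputs 1 and the network output is the first weight plus the bias.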

Fig. 2. General architecture of an RBF type of artificial neural network.


Different training algorithms can be used to adjust the parameters (weights, centers, widths) of RBF networks. Most of these methods have two phases. In the first phase, the number of radial functions and their parameters are determined by non-supervised algorithms [44], while in the second phase the weights of the output neurons are adjusted. Training an RBF network is generally faster than training other types of networks, since the learning methods for adjusting the radial functions and the synaptic weights can be chained sequentially [45], enabling an optimized training.

In this work, artificial neural networks of the RBF type are used to generate, given a spectrum in nine octave bands, a filtered HRIR with $N_2$ samples, directly in the time domain. For this, each RBF network is trained for a specific direction, constituting a set of 64,442 networks that model the HRIRs all around the human head. The training and testing procedures of the RBF-ANNs are discussed in the next section.

3. Training and testing the artificial neural network set

The RBF parameters were calculated as follows [46]: first, the centers are obtained using the non-supervised K-means algorithm [44]. Once the centers have been calculated, the widths are determined. Finally, after the parameters of the radial functions have been defined, the free parameters of the output layer are computed using the same procedures that are used for the output layer of other types of neural networks [40].

In the first phase of the RBF training, the K-means algorithm is used. It is a clustering technique that partitions the set of input patterns into $N_1$ disjoint sets $S_j$ containing $K_j$ vectors each. After this step, the centers $\mu_j$ are recalculated as the mean of the vectors belonging to each set, in the form

$$\mu_j = \frac{1}{K_j} \sum_{x(p) \in S_j} x(p). \tag{2}$$

The assignment of the vectors to the nearest center and the update of the centers are repeated until the algorithm converges, that is, until the centers no longer change. With the centers already calculated, the width $\sigma_j$ of each radial function is determined as the average of the Euclidean distances between the center of the set $S_j$ and the centers of all sets $S_i$, that is,

$$\sigma_j = \frac{1}{N_1} \sum_{i=1}^{N_1} \lVert \mu_j - \mu_i \rVert. \tag{3}$$

Fig. 3. Spatial distribution of directions around the head with 64,442 directions.
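The first training phase, Eqs. (2) and (3), can be sketched as below. This is a minimal illustration under stated assumptions: the random initialization of the centers, the `seed` and `max_iter` parameters, and the function names `kmeans_centers` and `rbf_widths` are choices made here, not details given in the text.

```python
import math
import random


def kmeans_centers(patterns, n_centers, seed=0, max_iter=100):
    """K-means: assign patterns to the nearest center, then apply Eq. (2)."""
    rng = random.Random(seed)
    # Assumed initialization: pick n_centers distinct patterns at random.
    centers = [list(p) for p in rng.sample(patterns, n_centers)]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in patterns:
            nearest = min(range(n_centers),
                          key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Eq. (2): each center becomes the mean of the vectors in its set.
        # An empty set keeps its previous center.
        new_centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: centers no longer change
            break
        centers = new_centers
    return centers


def rbf_widths(centers):
    """Eq. (3): mean Euclidean distance from each center to all centers."""
    n = len(centers)
    return [sum(math.dist(cj, ci) for ci in centers) / n for cj in centers]


centers = kmeans_centers([[0.0], [1.0], [10.0], [11.0]], n_centers=2)
widths = rbf_widths(centers)
```

Note that the sum in Eq. (3) includes the term i = j, which is zero, while the division is still by the full number of centers; the sketch follows the equation as written.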

In the second phase of the RBF training, the task is to compute the synaptic weights and biases of the neurons of the output layer. In contrast to the first phase, the second phase uses a supervised learning procedure [44]. In this phase, the main steps are:

1. Initialize the free parameters of the neurons of the output layer;
2. Specify the required precision and the maximum number of iterations;
3. Present to the network the vector x(p) belonging to the training set;
4. Compute the network output;
5. When all the vectors of the training set have been presented, determine the mean squared error;
6. If the mean squared error is less than the required precision or the number of iterations is greater than the maximum value, store the free parameters of the network. Otherwise, return to Step 3.

For the training, testing and validation of the RBF networks, the HRIR database with 64,442 directions, available in [47] for the "Fabian" dummy head, was used as the set of reference patterns. Each HRIR in this database has 512 samples, and the directions are distributed around the head at every degree (except at the poles), both in azimuth and elevation. Fig. 3 illustrates the spatial distribution used in the whole procedure of training, testing and validation.
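The six steps of the second phase can be sketched as the loop below. The text does not state which update rule the authors used, so the delta (least-mean-squares) rule, the learning rate `lr`, and the zero initialization are assumptions of this sketch, as is the function name `train_output_layer`. The inputs `hidden_acts` stand for the hidden-layer activations already computed with the centers and widths from the first phase.

```python
def train_output_layer(hidden_acts, targets, lr=0.1, tol=1e-6, max_iter=10_000):
    """Supervised training of a linear output layer (assumed delta rule)."""
    n_hidden, n_out = len(hidden_acts[0]), len(targets[0])
    # Step 1: initialize the free parameters (zeros are an assumption).
    weights = [[0.0] * n_hidden for _ in range(n_out)]
    bias = [0.0] * n_out
    mse = float("inf")
    # Step 2: tol is the required precision, max_iter the iteration limit.
    for _ in range(max_iter):
        squared_error = 0.0
        # Step 3: present each training pattern to the network.
        for h, t in zip(hidden_acts, targets):
            # Step 4: compute the network output for this pattern.
            out = [sum(w * a for w, a in zip(row, h)) + b
                   for row, b in zip(weights, bias)]
            for k in range(n_out):
                err = t[k] - out[k]
                squared_error += err * err
                for j in range(n_hidden):
                    weights[k][j] += lr * err * h[j]
                bias[k] += lr * err
        # Step 5: mean squared error over the whole training set.
        mse = squared_error / (len(targets) * n_out)
        # Step 6: stop when the required precision is reached.
        if mse < tol:
            break
    return weights, bias, mse


weights, bias, mse = train_output_layer(
    hidden_acts=[[1.0, 0.0], [0.0, 1.0]],
    targets=[[2.0], [-1.0]],
)
```

Because the output layer is linear, a direct least-squares solve would be an alternative to this iterative loop; the iterative form is shown only because it mirrors the listed steps.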

Fig. 4. One of the 1790 groups for training the ANNs with $\Delta\varphi = \Delta\theta = 6^\circ$.

The HRIRs were divided into 1790 groups of nearby directions, each covering an angular range of six degrees in both azimuth and elevation, that is, with 6 × 6 = 36 distinct HRIRs in each group, named 'bud'. It is worth noting that within each of these buds the HRIR patterns are naturally similar. Fig. 4 illustrates one of these groups. Within each group of 36 directions, 12 were chosen for training, another 12 for testing and the remaining 12 for the final validation. Therefore, 1790 × 12 = 21,480 training patterns were used in the whole RBF training procedure.

In all cases, the network input data is the spectrum, in nine bands, of the acoustic ray that reaches the receiver. In the training phase, to avoid loss of generality, a large variety of spectra was used. The spectra were generated from simulations in 25 actual rooms of different natures and at distinct seat positions, to obtain a wide range of realistic spectra. Fig. 5 illustrates four such spectra.
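The partition of one bud into training, testing and validation subsets can be sketched as follows. The text does not say how the 12 directions of each subset were selected, so the random shuffle used here, the `seed` parameter and the function name `split_bud` are assumptions for illustration only.

```python
import random


def split_bud(directions, seed=0):
    """Split the 36 directions of one bud into 12 / 12 / 12 subsets."""
    if len(directions) != 36:
        raise ValueError("a bud is expected to hold 6 x 6 = 36 directions")
    rng = random.Random(seed)
    shuffled = list(directions)
    rng.shuffle(shuffled)  # assumed selection rule: random assignment
    return shuffled[:12], shuffled[12:24], shuffled[24:]


# One hypothetical bud: 6 azimuths x 6 elevations on a 1-degree grid.
bud = [(az, el) for az in range(30, 36) for el in range(10, 16)]
train, test, validation = split_bud(bud)

# Total number of training patterns over all 1790 buds.
total_training_patterns = 1790 * len(train)
```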

4. Fast auralization with the ANN technique

Once the simulation of the acoustic field in the room is completed, the next goal is to determine the monaural (RIRs) and binaural (BRIRs) room impulse responses at selected points. As regards the calculation of RIRs, it consists of converting the energy


Fig. 5. (Color online) Four examples of acoustic ray spectra used as inputs in the RBF-ANN training procedure. The horizontal axis is frequency, in nine octave bands, from 63 Hz (1) to 16 kHz (9), and the vertical axis is the normalized amplitude.

arrival, via the Hilbert transform [48] and filtering in octave bands, into filtered impulse responses, at a relatively small computational cost. In order to compute the BRIRs, however, it is necessary to take into account the head-related impulse responses (HRIRs) or their frequency-domain counterparts, the so-called head-related transfer functions (HRTFs). In the computational codes that generate auralization, this is usually done via the convolution procedure.

Each acoustic ray that reaches a given receiver carries three main pieces of information: (a) its spectrum, in octave bands; (b) its direction of arrival, in azimuth and elevation; and (c) its arrival time. The acoustic rays, when leaving the sound source, have a flat spectrum, since the goal is to obtain RIRs and BRIRs. Nevertheless, due to the several reflections with absorption and scattering at the room's boundary surfaces, as well as the high-frequency attenuation in the air along their trajectory, they reach the receiver with a filtered spectrum. This spectrum must then be multiplied by the HRTF of the direction closest to the arrival direction. The closest direction is chosen by rounding, as shown in Fig. 6. The computational cost of this approximation is negligible and the maximum error is below 0.5°, under the minimum audible angle [49]. Next, an inverse fast Fourier transform must be applied to the resulting spectrum in order to obtain the filtered HRIR for that acoustic ray. Then, every filtered HRIR must be delayed by its arrival time, and the sum of all the filtered and delayed HRIRs that reach the receiver constitutes the BRIR for that source-receiver pair. The procedure is equivalent to the convolution (in the time domain) of the acoustic ray with the HRIR of the corresponding direction and henceforth it will be called the convolution method (CM), or classical method. Fig. 7 illustrates the procedure.

The described process has the drawback of its computational cost. In fact, the complex product between the HRTF of the considered direction and the spectrum of the wavefront that reaches the receiver, followed by the inverse fast Fourier transform in two channels to obtain the filtered HRIRs, has a high computational cost, especially considering that the number of acoustic rays that reach each receiver can be of the order of $10^5$ in a good simulation. However, the delay and addition procedures of the filtered HRIRs for the BRIR calculation demand very low computational cost.

The idea of the proposed technique is to replace the HRIR/HRTF database with a set of 64,442 trained and tested artificial neural networks that have previously "learned" the HRIR patterns. In this way, the procedure is carried out entirely in the time domain. The radial basis function networks have as input the nine spectral components, in octave bands (between 63 Hz and 16 kHz), of the acoustic ray that reaches the receiver. So, $N_0$ is taken as 9. $N_1$ neurons are used in the hidden layer of the network and, at the output, $N_2$ neurons are adopted, which correspond to the time-domain amplitude samples of the resulting filtered HRIR. In the next section the values for $N_1$ and $N_2$ will be examined. Then, the delay of each filtered HRIR and the addition of all these functions to obtain the BRIR are identical to those of the convolution method. Fig. 8 illustrates the procedure. At the bottom of Fig. 8 two previously trained RBF-ANNs are displayed, one for each ear, for the given acoustic ray direction.
In the next section the computational cost of the two techniques is compared.
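Two steps are shared by the convolution method and the ANN technique: selecting the nearest available direction by rounding, and delaying and summing the filtered HRIRs. A minimal sketch of both follows. The 1° grid layout (azimuth 0 to 359, elevation -90 to 90, one point per pole), the sampling rate `fs = 44100` and the function names are assumptions made for this illustration; the grid layout is at least consistent with the stated count, since 360 × 179 + 2 = 64,442.

```python
def nearest_direction(azimuth_deg, elevation_deg):
    """Round an arrival direction to the assumed 1-degree grid."""
    el = max(-90, min(90, round(elevation_deg)))
    if abs(el) == 90:
        return 0, el  # at a pole the azimuth is irrelevant
    return round(azimuth_deg) % 360, el


def assemble_brir(filtered_hrirs, arrival_times, fs=44100, length=None):
    """Delay each filtered HRIR by its arrival time and sum them (one ear)."""
    # Arrival times in seconds are converted to whole-sample delays.
    delays = [round(t * fs) for t in arrival_times]
    if length is None:
        length = max(d + len(h) for d, h in zip(delays, filtered_hrirs))
    brir = [0.0] * length
    for h, d in zip(filtered_hrirs, delays):
        for n, value in enumerate(h):
            if d + n < length:  # samples beyond the requested length are dropped
                brir[d + n] += value
    return brir


# Two toy "filtered HRIRs" arriving one sample apart (fs = 1 for clarity).
brir = assemble_brir([[1.0, 0.5], [2.0]], arrival_times=[0, 1], fs=1)
```

In a full implementation the same delay-and-sum would be run once per ear, with the filtered HRIRs coming either from the inverse Fourier transform (CM) or from the trained networks (ANN technique).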

5. Computational cost

Fig. 6. (Color online) Direction of acoustic ray arriving (grey) and direction assumed with available data (red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

To compare the numerical efficiency of the convolution method (CM) with that of the artificial neural network method (ANNM), the number of arithmetic operations that each technique requires is counted. The number of operations in the convolution method to compute the filtered HRIRs equals the sum of two parts. The first one corresponds to the number of multiplications between the ray spectrum (in octave bands) and the HRTF of the considered direction. Note, however, that due to the HRTF symmetry, only $l/2$ products are necessary, with $l$ being the number of samples. The second


Fig. 7. (Color online) General procedure to produce a BRIR by the convolution method.

one corresponds to the number of operations for calculating the inverse Fourier transform. The total number of mathematical operations in the convolution method is then given by [50]

$$O_{CM} = \frac{l}{2} + 5\,l \log_2 l. \tag{4}$$

It is worth noting that, since the convolution method deals with Fourier transforms, l = 512 samples are strictly necessary to preserve information in all nine octave bands. So, for each filtered HRIR, the computational cost in number of arithmetic operations is

$$O_{CM} = 23{,}296. \tag{5}$$
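For reference, the value in Eq. (5) follows from substituting $l = 512$ into Eq. (4):

```latex
\begin{aligned}
O_{CM} &= \frac{l}{2} + 5\,l \log_2 l \\
       &= \frac{512}{2} + 5 \cdot 512 \cdot \log_2 512 \\
       &= 256 + 5 \cdot 512 \cdot 9 \\
       &= 256 + 23{,}040 = 23{,}296 .
\end{aligned}
```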

The training and testing procedures of artificial neural networks are known to be computationally costly and were discussed in Section 3. However, the execution phase, particularly that of radial basis function networks, is quite efficient. The number of operations of the RBF network during the execution phase is the sum of three parts. The first one is the number of operations performed to compute the output of the activation function of each neuron of the intermediate layer. It can be shown [51] that, with $N_0$ being the number of inputs of the RBF network, $N_1$ the number of neurons in the intermediate layer and $N$ the number of terms retained when the Taylor series of the activation function is truncated [52], the number of operations of this part is

$$O_1 = N_1 (3N_0 + 2N). \tag{6}$$

The second one is the number of operations needed to compute each neuron of the output layer. With $N_2$ being the number of neurons of this layer, the number of sums and products of this second part is

$$O_2 = 2 N_1 N_2. \tag{7}$$

The third part accounts for the cost of normalizing the inputs and of subsequently multiplying the network output by the inverse of the factor used to normalize the input. These procedures require $N_0 + N_2$ multiplications. One can then conclude that the total number of operations to generate the output of the RBF network in the ANN method is

$$O_{ANNM} = N_1 (3N_0 + 2N_2 + 2N) + N_0 + N_2. \tag{8}$$

Taking now $N_0 = 9$ (nine octave bands) and $N = 35$ [33], one would have

$$O_{ANNM} = (2N_2 + 97) N_1 + N_2 + 9. \tag{9}$$

Let us now examine the computational cost of the ANN method as a function of N1 and N2. It is worth noting that there is no way other than trial and error to check the minimum number of neurons in the intermediate layer that still provides good binaural room impulse responses. Several tests were performed with N1 ranging from 2 to 20. The best cost-benefit ratio was found for


Fig. 8. (Color online) General procedure to produce a BRIR by the ANN-RBF technique.

N1 = 5. This means that the difference between the filtered HRIRs calculated by the two methods (CM and ANNM) becomes, for that value of the number of neurons in the intermediate layer of the network, imperceptible. The comparative results presented in Section 6 are computed with N1 = 5. The computational cost in terms of the number of arithmetic operations of the artificial neural network’s method is, then,

$$O_{ANNM} = 11 N_2 + 494. \tag{10}$$
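The operation counts of Eqs. (4) and (8) are easy to check numerically. The short sketch below evaluates both and the resulting percentage ratio for the three output sizes considered in the text; the function names `ops_convolution` and `ops_rbf` are hypothetical and introduced only for this illustration.

```python
def ops_convolution(l):
    """Eq. (4): l/2 products plus 5*l*log2(l) operations for the inverse FFT."""
    if l < 2 or l & (l - 1):
        raise ValueError("l must be a power of two")
    log2_l = l.bit_length() - 1  # exact integer log2 for powers of two
    return l // 2 + 5 * l * log2_l


def ops_rbf(n0, n1, n2, n_taylor):
    """Eq. (8): hidden layer, output layer and normalization costs."""
    return n1 * (3 * n0 + 2 * n2 + 2 * n_taylor) + n0 + n2


o_cm = ops_convolution(512)  # 23296, matching Eq. (5)
rows = []
for n2 in (128, 256, 512):
    o_annm = ops_rbf(n0=9, n1=5, n2=n2, n_taylor=35)
    rows.append((n2, o_annm, 100.0 * o_annm / o_cm))

for n2, o_annm, percent in rows:
    print(f"N2 = {n2:3d}   O_ANNM = {o_annm:4d}   ratio = {percent:.2f}%")
```

With $N_0 = 9$, $N_1 = 5$ and $N = 35$, `ops_rbf` reduces to $11 N_2 + 494$, in agreement with Eq. (10).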

Considering, alternatively, $N_2 = 128$, $N_2 = 256$ and $N_2 = 512$, the number of arithmetic operations of the ANN method and its percentage ratio to the number of arithmetic operations of the convolution method, Eq. (5), are displayed in Table 1. As can be seen in Table 1, even with $N_2 = 512$ samples, the ANN method presents a computational cost reduction of 73.7%. However, since the ANN method operates directly in the time domain, it is not necessary to retain 512 samples to generate the BRIRs, and 128 samples are enough to describe the filtered HRIRs. In this case, the computational cost reduction is 91.8%. In other words, with the RBF-ANN method, the BRIRs are computed roughly ten times faster (since the delay and sum procedures present very low computational cost). In the next section the filtered HRIRs obtained by the two methods are compared, both in the time and frequency domains, showing that, for all practical purposes, they are indistinguishable.

Table 1. Number of arithmetic operations for the ANN method and its percentage ratio to the convolution method, as a function of $N_2$.

| $N_2$ | $O_{ANNM}$ | $O_{ANNM}/O_{CM} \times 100\%$ |
|-------|------------|--------------------------------|
| 128   | 1902       | 8.16                           |
| 256   | 3310       | 14.2                           |
| 512   | 6126       | 26.3                           |

6. Comparative results for filtered HRIRs

As mentioned, the convolution technique is the classical BRIR generation method and, to our knowledge, it is present in almost all acoustic field simulation software with auralization. Therefore, in order to verify the reliability of generating BRIRs with artificial neural networks of the radial basis function type, a comparison between the two methods is presented next. Since the procedure is identical once the filtered HRIRs are generated, involving only the delays and the sum to generate the BRIRs, the comparison between the two methods is made on the filtered HRIRs. In other words, since the procedures of delay and sum of the filtered HRIRs are exactly the same in the two techniques, if the


filtered HRIRs computed by both techniques are almost identical, the resulting BRIRs will also be the same.

Figs. 9 and 10 show the filtered HRIRs computed via the convolution method (CM) compared with the same functions calculated by the artificial neural network method (ANNM), for two randomly chosen directions, one in front of the frontal plane and the other behind it. In addition to the comparison of the filtered HRIRs in the time domain, their frequency-domain counterparts, in both magnitude and phase, are shown.

Fig. 9. (Color online) Filtered HRIRs obtained with CM and ANNM, for $\varphi = 37^\circ$ and $\theta = 31^\circ$.

Fig. 10. (Color online) Filtered HRIRs obtained with CM and ANNM, for $\varphi = 156^\circ$ and $\theta = 22^\circ$.

Since the algorithm for adjusting the free parameters of the output layer uses the squared error to evaluate the performance of the network, errors can be masked by the low absolute values involved. The relative error is not a good choice either, since relative errors may be high without implying an important difference between the two functions. To evaluate the similarity between the functions, then, the normalized cross-correlation coefficient $r$, given by [53]

$$r = \frac{\sum_{j=1}^{N_2} u_j y_j}{\sqrt{\sum_{j=1}^{N_2} u_j^2 \, \sum_{j=1}^{N_2} y_j^2}} \tag{11}$$

is used. In Eq. (11), $u_j$ are the RBF-ANN outputs and $y_j$ are the corresponding outputs computed by the convolution method. The closer $r$ is to 1, the greater the similarity between the two functions. In Figs. 9 and 10, $\varphi$ denotes azimuth and $\theta$ elevation. Figs. 9 and 10 show a significant similarity between the graphs, both in the time and frequency domains, including the phase plot, with values of the cross-correlation coefficient very close to one. Indeed, this comparison was performed for all 64,442 considered directions around the head and the corresponding cross-correlation coefficient remained in the interval (0.9985, 1.0000), regardless of the considered direction ($0^\circ < \varphi < 360^\circ$; $-90^\circ < \theta < 90^\circ$). These results show that the technique of generating filtered HRIRs, and thus BRIRs, is not only fast but also accurate.
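The similarity measure of Eq. (11) can be computed in a few lines. The sketch below is a direct transcription of the formula; the function name `normalized_cross_correlation` and the toy signals are assumptions for illustration and are not data from the paper.

```python
import math


def normalized_cross_correlation(u, y):
    """Eq. (11): zero-lag normalized cross-correlation of two sequences."""
    if len(u) != len(y):
        raise ValueError("both sequences must have the same length")
    numerator = sum(a * b for a, b in zip(u, y))
    denominator = math.sqrt(sum(a * a for a in u) * sum(b * b for b in y))
    return numerator / denominator


# Toy check: a scaled copy of a signal is perfectly correlated with it,
# while two orthogonal signals have zero correlation.
r_scaled = normalized_cross_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_orthogonal = normalized_cross_correlation([1.0, 0.0], [0.0, 1.0])
```

Because the coefficient is insensitive to a common scale factor, it measures similarity of shape rather than of absolute level, which is why it avoids the masking problem described for the squared error.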

7. Concluding remarks

A new technique was proposed to model the filtered head-related impulse responses, which are the core of the generation of binaural room impulse responses for auralization. The model is based on artificial neural networks of the radial basis functions type, resulting in a computational time saving of around 90% in the BRIR computation.

As in any procedure of room acoustics simulation, there are two blocks of higher computational cost, namely the calculation of all acoustic rays arriving at each receiver and the computation of the binaural room impulse responses for each source-receiver pair. This second block benefited from the modeling through artificial neural networks, with a significant computational time saving, especially if many receivers are under consideration, opening the possibility of future real-time calculations.

The comparative results between the filtered HRIRs obtained by the classical convolution method and by the proposed model using artificial neural networks showed that the functions are, for all practical purposes, indistinguishable, presenting a normalized cross-correlation coefficient virtually equal to unity, both in the time and frequency domains, for all considered directions.

The training and testing procedures of the radial basis functions type of artificial neural networks are faster than those of other network architectures. However, training, testing and validating the whole set of 64,442 networks requires a large commitment of computational resources. Nevertheless, once trained, the RBF-ANN databank is available for use in any room auralization.

References

[1] Kuttruff H. Room acoustics. 5th ed. London: Spon Press; 2009.
[2] Savioja L, Svensson UP. Overview of geometrical room acoustic modeling techniques. J Acoust Soc Am 2015;138(2):708–30.
[3] Ondet M, Barbry JL. Modeling of sound propagation in fitted workshops using ray tracing. J Acoust Soc Am 1989;85(2):787–96.


[4] Borish J. Extension of the image model to arbitrary polyhedra. J Acoust Soc Am 1984;75:1827.
[5] Vorländer M. Simulation of the transient and steady-state sound propagation in rooms using a new combined ray-tracing/image-source algorithm. J Acoust Soc Am 1989;86(1):172–8.
[6] D’Antonio P, Cox TJ. Diffusor application in rooms. Appl Acoust 2000;60:113–42.
[7] Cox TJ, Dalenbäck BI, D’Antonio P, Embrechts JJ, Jeon JY, Mommertz E, et al. A tutorial on scattering and diffusion coefficients for room acoustic surfaces. Acta Acustica United Acustica 2006;92(1):1–15.
[8] Alarcão D, Bento Coelho JL, Tenenbaum RA. On modeling of room acoustics by a sound energy transition approach. In: Proceedings of EEA Symposium on Architectural Acoustics, 2000.
[9] Tenenbaum RA, Camilo TS, Torres JCB, Gerges SNY. Hybrid method for numerical simulation of room acoustics: part 1 – theoretical and numerical aspects. J Braz Soc Mech Sci Eng 2007;29(2):211–21.
[10] Schröeder M. Digital computers in room acoustics. In: Proc. 4th ICA, Copenhagen, 1962.
[11] Blauert J, Lehnert H, Pompetzki W, Xiang N. Binaural room simulation. Acustica 1990;72:295–6.
[12] Ahnert W, Feistel R. Binaural auralization from a sound system simulation programme. In: Proc. 91th AES Convention, New York, 1991.
[13] Lehnert H, Blauert J. Principles of binaural room simulation. Appl Acoust 1992;36:259.
[14] Møller H. Fundamentals of binaural technology. Appl Acoust 1992;36:171.
[15] Vian J-P, Martin J. Binaural room acoustics simulation: practical uses and applications. Appl Acoust 1992;36:293.
[16] Kleiner M, Dalenbäck B-I, Svensson P. Auralization – an overview. J Audio Eng Soc 1993;41:861.
[17] Begault D. 3-D sound for virtual reality and multimedia. Cambridge: Academic Press Professional; 1994.
[18] Dalenbäck B-I, McGrath D. Narrowing the gap between virtual reality and auralization. In: Proc. 15th ICA, Trondheim, 1995.
[19] Sottek R. Virtual binaural auralization of product sound quality: importance and application in practice. Proc. EURONOISE, Naples, 2003.
[20] Rindel JH, Otondo F, Christensen CL. Sound source representation for auralization. In: Proc. Int. Symp. on Room Acoust., Awaji, 2004.
[21] Torres JCB, Petraglia MR, Tenenbaum RA. An efficient wavelet-based HRTF for auralization. Acta Acustica United Acustica 2004;90:108.
[22] Otondo F, Rindel JH. A new method for the radiation representation of musical instruments in auralizations. Acta Acustica United Acustica 2005;91:902.
[23] Dalenbäck B-I, Strömberg M. Real time walkthrough auralization – the first year. In: Proc. IOA Spring Conference, Copenhagen, 2006.
[24] Summers JE. What exactly is meant by the term ‘auralization?’. J Acoust Soc Am 2008;124(2):697.
[25] Dalenbäck B-I. A new model for room acoustics prediction and auralization. Doctoral thesis. Gothenburg: Chalmers University; 1995.
[26] Hammershøi D. Binaural technique – a method of true 3-D sound reproduction. Doctoral thesis. Denmark: Aalborg University; 1995.
[27] Savioja L. Modelling techniques for virtual acoustics. Doctoral thesis. Finland: Helsinki University of Technology; 1999.
[28] Lokki T. Physically-based auralization – design implementation and evaluation. Doctoral thesis. Finland: Helsinki University of Technology; 2002.
[29] Torres JCB. Efficient auralization system using wavelet transforms. Doctoral thesis. Brazil: Federal University of Rio de Janeiro; 2004.

[30] Schröder D. Integration of real-time room acoustical simulations in VR environments. Doctoral thesis. Germany: RWTH Aachen University; 2004.
[31] Thaden R. Auralization in building acoustics. Doctoral thesis. Germany: RWTH Aachen University; 2005.
[32] Naranjo JFL. Machine learning applied in the generation of binaural room impulse responses and in auralization in rooms. Doctoral thesis. State University of Rio de Janeiro; 2014.
[33] Taminato FO. Artificial neural networks applied to model head-related impulse responses to generate auralization. Ph.D. thesis. State University of Rio de Janeiro; 2018.
[34] Savioja L, Xiang N. Introduction to the special issue on room acoustic modeling and auralization. J Acoust Soc Am 2019;145(4):2597–600.
[35] Brinkmann F, Aspöck L, Ackermann D, Lepa S, Vorländer M, Weinzierl S. A round robin on room acoustical simulation and auralization. J Acoust Soc Am 2019;145(4):2746–60.
[36] Lindau A, Erbes V, Lepa S, Maempel HJ, Brinkmann F, Weinzierl S. A spatial audio quality inventory for virtual acoustic environments (SAQI). Acta Acustica United Acustica 2014;100(5):984–94.
[37] Peng J. Feasibility of subjective speech intelligibility assessment based on auralization. Appl Acoust 2005;66:591–601.
[38] Hodgson M, York N, Yang W, Bliss M. Comparison of predicted, measured and auralized sound fields with respect to speech intelligibility in classrooms using CATT-Acoustic and ODEON. Acta Acustica United Acustica 2008;94(6):883–90.
[39] Tenenbaum RA, Taminato FO, Melo VSG, Torres JCB. Auralization generated by modeling HRIRs with artificial neural networks and its validation using articulation tests. Appl Acoust 2018;130:260–9.
[40] Haykin S. Neural networks and learning machines. 3rd ed. New Jersey: Prentice Hall; 2009.
[41] Mulgrew B. Applying radial basis functions. IEEE Signal Process Mag 1996:50–65.
[42] Bishop C. Neural networks for pattern recognition. Oxford: Oxford University Press; 2005.
[43] Broomhead DS, Lowe D. Multivariable functional interpolation and adaptive networks. Complex Syst 1988;2:321–55.
[44] Moody J, Darken CJ. Fast learning in networks of locally-tuned processing units. Neural Comput 1989;1:281–94.
[45] Mulgrew B. Applying radial basis functions. IEEE Signal Process Mag 1996:50–65.
[46] Chen S, Mulgrew B, McLaughlin S. Adaptive Bayesian feedback equalizer based on a radial basis function network. In: IEEE International Conference on Communications, Chicago, vol. 3, pp. 1267–1271, 1992.
[47] Brinkmann F, Lindau A, Weinzierl S, Geissler G, van de Par S, Müller-Trapet M, Opdam R, Vorländer M. The FABIAN head-related transfer function data base. 2017. Available at http://dx.doi.org/10.14279/depositonce-5718.2 (last viewed September 2018).
[48] Kak S. Number theoretic Hilbert transform. Circ Syst Signal Process 2014;33:2539–48.
[49] Blauert J. Spatial hearing. Cambridge: The MIT Press; 1997.
[50] Wilcox RR. Introduction to robust estimation and hypothesis testing. New York: Academic Press; 2005.
[51] Proakis JG, Manolakis DG. Digital signal processing: principles, algorithms and applications. 3rd ed. New Jersey: Prentice-Hall; 1996.
[52] Conte SD, de Boor C. Elementary numerical analysis: an algorithmic approach. 3rd ed. New York: McGraw-Hill; 1980.
[53] Xie B, Zhong X, He N. Typical data and cluster analysis on head-related transfer functions. Appl Acoust 2015;94:1–13.