Applied Acoustics 71 (2010) 1177–1184
Down-mixing of multi-channel audio for sound field reproduction based on spatial covariance

Yoshinori Takahashi a,b,*, Akio Ando a

a NHK Science and Technical Research Laboratories, Tokyo, Japan
b Kogakuin University, Tokyo, Japan
Article history:
Received 26 December 2008
Received in revised form 29 July 2010
Accepted 3 August 2010
Available online 30 August 2010

Keywords:
Sound field reproduction
Acoustic signal processing
Virtual reality
Surface acoustic waves
Down-mixing
Abstract

This article describes a method for automatically down-mixing multi-channel audio content on the basis of spatial covariance. Such a down-mixing method should convert the signals of the original multi-channel audio system into signals for an alternative system with fewer channels while maintaining the spatial impression of the sound, and it should take the listener's position and transfer function into account. Wave-surface control and convolution of the head related transfer function are the techniques commonly used in sound field control or reproduction. We consider that the spatial impressions of a sound field, which we perceive through our ears, can be reproduced by preserving the relative relationships between observation points even if the wave surface is not completely controlled. Takahashi et al. proposed a sound field reproduction method named "SOund field Reproduction based on sPAtial Covariance" (SORPAC), which controls the point-to-point covariance in a sound field. We expect that this reproduction method based on spatial covariance can be applied to down-mixing of multi-channel content because SORPAC requires neither the listener's position nor the transfer function. This article describes SORPAC and its characteristics, and applies it to down-mixing audio content. We confirmed that SORPAC-based down-mixing accurately reproduces the interaural cross correlation (IACC) in relation to the listener's position.

© 2010 Elsevier Ltd. All rights reserved.
1. Introduction

An ultra high definition television (UHDTV) system with 4000 scanning lines and 22.2-channel audio has been developed by the Japan Broadcasting Corporation (NHK) [1]. The system requires a down-mixing method to reproduce the spatial impression of multi-channel audio content when that content is played in a conventional home audio-visual environment, such as a 5.1-channel surround audio system. Ando proposed a method of adapting multi-channel sound reproduction to a restricted speaker arrangement [2]. This method automatically makes the best up- or down-mix by matching the sound pressure vector at the listener's position in the reproduced field to that in the original field, taking into account the geometrical relationship between the speaker positions and the listener's position. However, this method requires the loudspeakers' coordinates. Ambisonics provides full upward compatibility to any number of loudspeakers in any reasonable configuration for sound field reproduction [11,12], but the sound field must be encoded under Ambisonic conditions for it to be reproduced. Recently, Pulkki et al. proposed the directional audio coding (DirAC) method [13]. Applications of DirAC to stereo up-mixing have been discussed: DirAC-based up-mixing is possible by simulating an anechoic B-format re-recording [14,15]. Although those papers do not consider down-mixing, DirAC can also be expected to serve for down-mixing. However, down-mixing based on Ambisonics or DirAC likewise requires the precise loudspeaker coordinates relative to the listening position.

[* Corresponding author. Present address: Tokyo Metropolitan College of Industrial Technology, Tokyo, Japan. E-mail address: [email protected] (Y. Takahashi). doi:10.1016/j.apacoust.2010.08.002]

On the other hand, sound field control and reproduction techniques have been widely studied over the past few decades. A group at the University of Göttingen, including Meyer, Burgtorf, and Damaske, tried to simulate concert-hall performances by the most obvious approach, in which the listener is surrounded by 65 extra loudspeakers providing appropriately delayed sound from the proper directions [3]. Damaske [4] and Shaw [5,6] defined a head related transfer function (HRTF) in order to evaluate the relationship between sound image perception and sound field conditions. Subjective diffuseness is also important for conveying spatial impression [7]. Morimoto and Ando realized a sound image using the HRTF method [8]. Ise devised an active wave-surface control based on the Kirchhoff–Helmholtz integral equation [9,10]. These studies involve HRTF convolution and wave-surface control methods grounded in mathematical and physical theory; these methods are very effective when all the conditions of the theories are
satisfied. The conditions are, however, hard to satisfy in practice, because the HRTF technique is not robust to movements of the listening position and the wave-surface control technique requires many devices. Tohyama and Suzuki analysed the frequency characteristics of the interaural cross correlation (IACC) in stereophonic reproduction [16] and discussed how to reproduce the IACC in a reverberant field using 4-channel stereo reproduction. Muraoka proposed a multi-channel recomposition method based on the frequency characteristics of the IACC [17]. The IACC is a statistical parameter related to sound field perception that predicts subjective diffuseness [18–20]; it is also a significant factor in determining the perceived horizontal direction of a sound [20]. Takahashi, an author of this paper, and Tohyama extended IACC control to spatial covariance control and proposed a new method for sound field reproduction [21] that renders the spatial covariance of the reproduced sound field. This reproduction method requires neither the listener's position in the reproduced field nor the transfer function, and it reproduces a sound field as close as possible to the original when a limited number of loudspeakers is used in three-dimensional space. Hence, we expect that the spatial covariance method can be used for down-mixing. We named the method sound field reproduction based on spatial covariance (SORPAC), and have discussed it and general sound field control in the frequency domain [22]. In this paper, we apply SORPAC to a new down-mixing method that can deal with changes in the listener's position, the loudspeaker coordinates, and the transfer function. We confirmed that SORPAC-based down-mixing accurately reproduces the interaural cross correlation in relation to the listener's position.

This paper is organized as follows. Section 2 describes SORPAC. Section 3 describes a numerical simulation of SORPAC. Section 4 discusses the down-mixing experiment and its results. Section 5 summarizes the paper.
2. Sound field reproduction based on spatial covariance (SORPAC) [21]

2.1. Overview of SORPAC

SORPAC is a reproduction method that extends IACC control [16,17] to spatial covariance control. The spatial covariance coefficient (point-to-point covariance coefficient) between 2-channel signals $s_1(n)$ and $s_2(n)$ of length $N$ points is given by

$$C_{1,2} = \frac{1}{N}\sum_{n=0}^{N-1} s_1(n)\,s_2(n) = E[s_1 s_2] = \rho_{1,2}\sqrt{E[s_1^2]\,E[s_2^2]}, \quad (1)$$

where $n$ indexes the signal samples at corresponding times, $\rho_{1,2}$ is the correlation coefficient between $s_1$ and $s_2$, $E[\cdot]$ denotes the time average, and the means of $s_1$ and $s_2$ are assumed to be zero. We call the matrix

$$C = \begin{bmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,J} \\ C_{2,1} & C_{2,2} & \cdots & C_{2,J} \\ \vdots & \vdots & \ddots & \vdots \\ C_{J,1} & C_{J,2} & \cdots & C_{J,J} \end{bmatrix} \quad (2)$$

the spatial covariance matrix for $J$-channel signals. The spatial covariance is a parameter that represents the time-invariant statistical characteristics of the sound field, including the spatial variance of phase and the mutual magnitude characteristics.

SORPAC estimates the optimum signal-mixing coefficients (amplitude weights and time delays for several loudspeakers) that minimize the difference in the point-to-point covariance between the original and reproduced sound fields. Fig. 1 gives an overview of SORPAC. First, we measure a set of spatial covariance matrices and record the multi-channel sound sources in the original sound field. Then, we minimize the difference in spatial covariance between the original and reproduced sound fields and obtain the optimum coefficients for mixing the multi-channel recordings. In this algorithm, the difference between the covariance matrix $C$ of the original sound field and the covariance matrix $\hat{C}$ of the reproduced sound field is

$$D = \frac{\operatorname{tr}[C^{-1}\hat{C}]\,\operatorname{tr}[\hat{C}^{-1}C]}{J^2}. \quad (3)$$

Here, $\operatorname{tr}$ denotes the trace of a matrix, and $J$ is the dimension of the square matrix $C$. $D$ is 1 if $C$ and $\hat{C}$ are exactly the same. SORPAC estimates the optimum signal-mixing parameters that reduce the error in $D$ by the method of steepest descent [23] in the time domain; Fig. 2 shows the flowchart of the optimization algorithm. In this way, SORPAC reduces the difference between the spatial covariance matrices of the original and reproduced fields. In this paper, we use Eq. (3) as the error function because it indicates the similarity of the matrices mathematically; the choice of error function, however, remains an open question.

Fig. 1. Outline of sound field reproduction based on spatial covariance (SORPAC).

Fig. 2. Flowchart of the optimization algorithm in SORPAC.
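The covariance computation and the error measure can be sketched numerically. The following is an illustrative reconstruction, not the authors' code; it assumes the error measure of Eq. (3) has the form $D = \operatorname{tr}[C^{-1}\hat{C}]\,\operatorname{tr}[\hat{C}^{-1}C]/J^2$, which equals 1 exactly when the two covariance matrices coincide and exceeds 1 otherwise:

```python
import numpy as np

def spatial_cov(s):
    """Spatial covariance matrix (Eq. (2)) of a J x N array of zero-mean channel signals."""
    return (s @ s.T) / s.shape[1]

def error_D(C, C_hat):
    """Assumed error measure of Eq. (3): D = tr[C^-1 Ch] tr[Ch^-1 C] / J^2.
    D equals 1 when C_hat = C; by the Cauchy-Schwarz inequality D >= 1 otherwise."""
    J = C.shape[0]
    return (np.trace(np.linalg.solve(C, C_hat)) *
            np.trace(np.linalg.solve(C_hat, C))) / J ** 2

rng = np.random.default_rng(0)
s = rng.standard_normal((4, 2000))
s -= s.mean(axis=1, keepdims=True)          # zero-mean, as Eq. (1) assumes
C = spatial_cov(s)

t = rng.standard_normal((4, 2000))
t -= t.mean(axis=1, keepdims=True)          # a mismatched "reproduced" field
C_hat = spatial_cov(t)

print(round(error_D(C, C), 6))              # 1.0 for identical fields
print(error_D(C, C_hat) >= 1.0)             # True: any mismatch raises D
```

A steepest-descent loop as in Fig. 2 would adjust the signal-mixing weights so as to drive `error_D` toward 1.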
2.2. SORPAC and general sound field control

We now discuss the relationship between SORPAC and conventional sound field control using transfer functions. SORPAC works in the time domain, so the spatial covariance is calculated from time-domain signals. However, consider the inverse Fourier transform of the correlation function, $\phi_{12}(\tau) = \frac{1}{M}\int \Phi_{12}(\omega)\,e^{j\omega\tau}\,d\omega$, where $\phi_{12}(\tau) \propto \int s_1(t)\,s_2(t+\tau)\,dt$ and $\Phi_{12}(\omega) = S_1^{*}(\omega)\,S_2(\omega)$. For $\tau = 0$ this gives $\phi_{12}(0) = \frac{1}{M}\int S_1^{*}(\omega)\,S_2(\omega)\,d\omega$. The time-domain covariance $C_{1,2}$ can then be rewritten using the cross-energy spectrum of the signals $s_1(n)$ and $s_2(n)$ as follows:

$$C_{1,2} = \frac{1}{N}\sum_{n=0}^{N-1} s_1(n)\,s_2(n) = \frac{1}{NM}\sum_{k=0}^{M-1} S_1^{*}(k)\,S_2(k). \quad (4)$$

Here, $S(k)$ is the spectrum of $s(n)$ given by an $M\,(\geq N)$-point DFT, and $S^{*}(k)$ is the complex conjugate of $S(k)$. The spatial covariance matrix can be rewritten as

$$C = \frac{1}{NM}\begin{bmatrix} \sum_k S_1^{*}S_1 & \sum_k S_1^{*}S_2 & \cdots & \sum_k S_1^{*}S_J \\ \sum_k S_2^{*}S_1 & \sum_k S_2^{*}S_2 & \cdots & \sum_k S_2^{*}S_J \\ \vdots & \vdots & \ddots & \vdots \\ \sum_k S_J^{*}S_1 & \sum_k S_J^{*}S_2 & \cdots & \sum_k S_J^{*}S_J \end{bmatrix} = \frac{1}{NM}\,S^{*}S^{T}, \quad (5)$$

where $S = [S_1\ S_2\ \cdots\ S_J]^{T}$, as shown in Fig. 1.
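The equality in Eq. (4) between the time-domain covariance and the scaled cross-spectrum sum can be checked with a DFT. This is a sketch under the assumption of an $M$-point zero-padded FFT with $M \geq N$; the $1/(NM)$ scaling follows from Parseval's theorem:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 256, 512                     # signal length N, DFT length M >= N (zero-padded)
s1 = rng.standard_normal(N); s1 -= s1.mean()
s2 = rng.standard_normal(N); s2 -= s2.mean()

c_time = np.dot(s1, s2) / N                               # left-hand side of Eq. (4)
S1 = np.fft.fft(s1, M)
S2 = np.fft.fft(s2, M)
c_freq = np.sum(np.conj(S1) * S2).real / (N * M)          # right-hand side of Eq. (4)

print(np.isclose(c_time, c_freq))   # True: both sides agree
```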
We assume sound field reproduction in the frequency domain using $L$ channels of the recorded spectrum $P_l\ (1 \leq l \leq L)$ and $J$ channels of the observed spectrum $S_j\ (1 \leq j \leq J)$ for the spatial covariance estimation (Fig. 1). We set $O$ channels of the reproduced spectrum $Q_o\ (1 \leq o \leq O)$ and observe the reproduced spectrum $\hat{S}_j\ (1 \leq j \leq J)$. The spatial covariance matrix in the reproduced sound field is

$$\hat{C} = \frac{1}{NM}\,\hat{S}^{*}\hat{S}^{T} = \frac{1}{NM}\,(HQ)^{*}(HQ)^{T} = \frac{1}{NM}\,(HXP)^{*}(HXP)^{T}, \quad (6)$$

where $P = [P_1\ P_2\ \cdots\ P_L]^{T}$, $\hat{S} = [\hat{S}_1\ \hat{S}_2\ \cdots\ \hat{S}_J]^{T}$, $X$ ($O$ rows and $L$ columns) is the signal-mixing matrix, $H$ ($J$ rows and $O$ columns) is the matrix of transfer functions between the loudspeakers and the observation points in the reproduced sound field, and $Q = XP$. SORPAC finds a signal-mixing matrix $X$ that makes

$$S^{*}S^{T} \approx (HXP)^{*}(HXP)^{T}. \quad (7)$$

In practice, we use SORPAC in several frequency bands in the time domain and obtain the solution $X$ as the signal-mixing weights in each band. If we used SORPAC in every frequency bin, the solution $X$ would in theory converge to $H^{-1}$, so the mixing weights $X$ represent an approximation of $H^{-1}$. Therefore, SORPAC reproduces a sound field that is as close as possible to the original when it is reproduced by a limited number of loudspeakers. When $H^{-1}$ is unknown, the solution $X$ includes an uncertainty of phase $\phi$ such that

$$X_{j1}^{*}X_{j2} = \left(X_{j1}e^{i\phi(j1,j2)}\right)^{*}\left(X_{j2}e^{i\phi(j1,j2)}\right) \quad (8)$$

for any $\phi$. However, if we use SORPAC in the time domain in each frequency band and the signal-mixing coefficients are optimized as positive real numbers, we can make $S \approx HXP$.

3. Numerical simulation of SORPAC

The simulation used a sound source signal $s(t) = \sin(2\pi\,155t) + \sin(2\pi\,200t) + \sin(2\pi\,245t)$, as shown in Fig. 3a. We arranged 36 microphones to record the sound field and 25 microphones to observe the spatial covariance. We mixed the recordings with SORPAC and obtained the sound field shown in Fig. 3b; we also reproduced the recordings directly, without mixing, and obtained the sound field shown in Fig. 3c. The loudspeakers were placed at the same positions as the recording microphones in Fig. 3a. We then observed the spatial covariance in the reproduced sound field. Fig. 3d and e shows the spatial covariance matrices ($25 \times 25$) of the original and reproduced sound fields. We used the method of steepest descent to solve for a signal-mixing matrix with $D \approx 1$ as the error criterion. Fig. 3f shows the spatial covariance matrix of the directly reproduced sound field; it is not needed for direct reproduction but was calculated for comparison. Fig. 3e is more similar to Fig. 3d than Fig. 3f is.

We estimated the sound source direction from the correlation function between the signal waves (Fig. 3g–i) recorded at the points r1 and r2 in Fig. 3a–c. Fig. 3j–l illustrates the results of the direction estimation. As shown in Fig. 3d and e, the spatial covariance matrices of the original and reproduced sound fields are similar, and SORPAC nearly reproduces the sound source direction, in contrast to direct reproduction without mixing (Fig. 3j–l). The wave surfaces are also accurately reproduced over the observation area (Fig. 3a–c): the wave surface produced by SORPAC (Fig. 3b) is more similar to the original wave surface (Fig. 3a) than that of direct reproduction (Fig. 3c). The simulation thus shows that SORPAC can reproduce the sound source direction. To develop more practical reproduction systems, SORPAC will require frame-by-frame processing to follow a sound field whose spatial covariance changes over time. The performance of several reproduction methods in terms of reproducibility and device cost is currently under discussion [24].

In this paper, we used two separate microphone sets, one for measuring the spatial covariance and one for recording the sound field; this setting was chosen to compare SORPAC with direct reproduction. In practical use, the microphone set for measuring the spatial covariance should also be used for recording the sound field, so that no additional microphone set is needed.

Fig. 3. Result of numerical simulation of SORPAC: (a) original wave surface, (b) wave surface reproduced by SORPAC, (c) wave surface by direct reproduction, (d) original spatial covariance matrix, (e) reproduced spatial covariance matrix, (f) spatial covariance matrix of direct reproduction, (g) wave forms recorded in the original sound field, (h) wave forms recorded in the sound field reproduced by SORPAC, (i) wave forms recorded under direct reproduction, (j) source direction estimation in the original sound field from the correlation function of the waves in panel (g), (k) source direction estimation using the waves in panel (h), and (l) source direction estimation using the waves in panel (i).

Fig. 4. Loudspeaker and microphone arrangement in the down-mixing experiment: (a) loudspeaker arrangement for the original 10-channel content, (b) loudspeaker arrangement for the down-mixed 5-channel content, and (c) microphone arrangement for the spatial covariance.
4. Down-mixing experiment using SORPAC

4.1. About the experiment

In this experiment, we used 10-channel signals as the original audio content and down-mixed them to 5-channel signals. The original 10-channel signals were the middle-layer signals of the 22.2-channel audio used for the UHDTV system [1]. The original 10-channel content was reproduced by loudspeakers arranged as shown in Fig. 4a. We tested two pieces of audio content: a 7-s orchestral phrase, and 2 s of natural environmental sound including bird songs. These contents give different impressions. The 10-channel contents were each down-mixed to 5-channel contents and reproduced by loudspeakers arranged as
Fig. 5. Arrangement of the dummy-head-microphones and loudspeakers.
shown in Fig. 4b. The spatial covariance matrix was measured with microphones arranged as shown in Fig. 4c. The signal-mixing coefficients were optimized in several frequency bands, with center frequencies of 100, 200, 400, 800, 1600, 3200, 6400, and 12,800 Hz (each with a bandwidth of 100 Hz). The spatial covariance should be observed in the actual reproduction environment; however, the reproduction environment is usually unknown in practical circumstances. In this experiment, we therefore used the calculated spatial covariance that can be presumed when the 10-channel signals are reproduced in an anechoic room. In addition, we used the dry room shown in Fig. 5, whose reverberation time was 0.18 s.

There is a relationship between the interaural cross correlation (IACC) and subjective diffuseness [18]. The interaural time delay (IATD) and interaural level difference (IALD) are also well-known binaural parameters besides the IACC; these parameters relate to the precedence effect [25] in reverberant spaces [26]. We evaluated the robustness to the listener's movement in terms of the IACC, IATD, and IALD when the contents were down-mixed, comparing the values measured in the original 10-channel reproduction with those in the down-mixed 5-channel reproduction. We reproduced the original and down-mixed content and recorded it with dummy-head microphones (DHM). The DHM positions were every 0.2 m from −0.4 to 0.4 m, from back to front and from left to right, relative to the center position shown in Fig. 5. The facing direction of the dummy head was varied from −45° to 45° in 15° increments at the center point of the reproduced field.
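The IACC used in this evaluation is conventionally defined as the peak of the normalized cross-correlation of the two ear signals within ±1 ms of lag. The sketch below is illustrative only, not the authors' measurement code:

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Peak of the normalized interaural cross-correlation within +/- max_lag_ms."""
    left = left - left.mean()
    right = right - right.mean()
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    full = np.correlate(left, right, mode="full")   # lags -(N-1) .. (N-1)
    mid = len(left) - 1                             # index of the zero-lag term
    k = int(fs * max_lag_ms / 1000)                 # lag window in samples
    return np.abs(full[mid - k: mid + k + 1]).max() / norm

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 500 * t)

print(round(iacc(tone, tone, fs), 3))            # 1.0: identical ear signals
rng = np.random.default_rng(2)
noise_iacc = iacc(rng.standard_normal(fs), rng.standard_normal(fs), fs)
print(noise_iacc < 0.5)                          # True: independent noise is decorrelated
```

Identical ear signals give IACC = 1, and independent noise at the two ears gives a value near 0, matching the link between low IACC and high subjective diffuseness discussed below.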
4.2. Results of the experiment

Fig. 6 shows the results of the experiment for the orchestral phrase: Fig. 6a–c shows the IACC measurements for the DHM moving from back to front (a), moving from left to right (b), and for dummy-head rotations (c); one marker type denotes the down-mixed signal and the other the original (see the legend). Fig. 7 shows the results for the natural environmental sound. Overall, the IACCs of the natural environmental sound are greater than those of the orchestral phrase. In Figs. 6 and 7, the standard deviations of the IACC errors between the reproduced and original fields in each octave band are shown in panel (d), where "ALL" denotes the standard deviation for the broadband signal.

Ando and Kurihara relate the IACC to the subjective diffuseness $S$ as

$$S = -2.9\,(\mathrm{IACC})^{3/2} \quad (9)$$

from their experimental results [18]. The subjective diffuseness estimated from our results is shown in panels (e)–(g). The standard deviation of the IACC errors was less than 0.10 for the orchestral phrase and less than 0.11 for the environmental sound. Evaluated in octave bands with center frequencies from 100 Hz to 12.8 kHz, the standard deviations of the IACC errors were less than 0.37, with the maximum in the 400 Hz band, for the orchestral phrase, and less than 0.20, with the maximum in the 800 Hz band, for the environmental sound. The down-mixed IACCs are slightly higher than the original ones because of the reduced number of loudspeakers. However, the results show that the general trends of the IACCs for the down-mixed signal correspond to those of the original IACCs with respect to the listener's position and facing direction. Panels (h)–(j) of Figs. 6 and 7 show the IATD results, and panels (k)–(m) the IALD results. The IATDs and IALDs are reproduced well enough that the differences between Figs. 6 and 7 caused by the kinds of sound sources can be confirmed.

5. Conclusions
We proposed using sound field reproduction based on spatial covariance (SORPAC) for down-mixing multi-channel audio content. SORPAC [21] requires neither the listener's position in the reproduced field nor the transfer function. We outlined the SORPAC method and showed by numerical simulation that SORPAC reproduces the spatial characteristics of the original sound field as closely as possible with a limited number of loudspeakers. The simulation confirmed that SORPAC can accurately reproduce the wave surfaces and the sound source direction over the observation area.
Fig. 6. IACC results for the orchestral phrase. They show the results of the IACC measurement for the DHM moving backward to forward (a), moving from left to right (b), and for dummy head rotations (c). (d) The standard deviation of the IACC errors between reproduced and original for each octave-band. (e)–(g) The subjective diffuseness calculated by Eq. (9). (h)–(j) and (k)–(m) The IATD and IALD.
We performed an experiment on down-mixing audio content from 10 channels to 5 channels using SORPAC, with an orchestral phrase and a natural environmental sound as test content. Overall, the interaural cross correlations (IACCs) of the natural environmental sound were greater than those of the orchestral phrase. The effect of down-mixing was evaluated by the IACC for various dummy-head-microphone (DHM) positions and facing directions. The results indicated that the general trends of the IACCs for the down-mixed signal corresponded to the original IACCs with respect to the listener's position and facing direction. The IACC is known to be a parameter related to subjective diffuseness [18] and to source direction. In this work, we confirmed that our method approximately reproduces the original IACC, and that the IATD and IALD are also reproduced. Down-mixing is usually done with well-tested manual approaches that depend on the experience of the sound engineer or creator, so it is difficult to compare the proposed method directly with such down-mixes. Nevertheless, SORPAC is promising as an automatic down-mixing method.
Fig. 7. IACC results for natural environmental sound including bird songs. Panels (a)–(m) correspond to those in Fig. 6, for the alternative content.
References

[1] Hamasaki K, Hiyama K, Okumura R. The 22.2 multi-channel sound system and its application. In: 118th AES convention; 2005. Paper 6406.
[2] Ando A. Adaptation of multi-channel sound reproduction to restricted speaker arrangement. IEICE technical report (in Japanese with English abstract); 2006. p. 19–24 [EA2006-84].
[3] Meyer E, Burgtorf W, Damaske P. Eine Apparatur zur elektroakustischen Nachbildung von Schallfeldern. Acustica 1965;15:334–9.
[4] Damaske P. Head-related two-channel stereophony with loudspeaker reproduction. J Acoust Soc Am 1971;50:1109–15.
[5] Shaw EAG. Earcanal pressure generated by a free sound field. J Acoust Soc Am 1966;39:465–70.
[6] Shaw EAG. Transformation of sound pressure level from the free field to the eardrum in the horizontal plane. J Acoust Soc Am 1974;56:1848–61.
[7] Blauert J. Spatial hearing. MIT Press; 1983.
[8] Morimoto M, Ando Y. On the simulation of sound localization. J Acoust Soc Jpn (E) 1980;1:167–74.
[9] Ise S. A principle of sound field control based on the Kirchhoff–Helmholtz integral equation and the theory of inverse systems. Acustica 1999;85:78–87.
[10] Ise S. The development of the sound field sharing system based on the boundary surface control principle. In: 19th International congress on acoustics; 2007 [ELE-04-003].
[11] Gerzon MA. The design of precisely coincident microphone arrays for stereo and surround sound. In: 50th Convention of the Audio Engineering Society, vol. 23; 1975. p. 402.
[12] Gerzon MA. Ambisonics in multichannel broadcasting and video. J Audio Eng Soc 1985;33(11):859–71.
[13] Merimaa J, Pulkki V. Spatial impulse response rendering I: analysis and synthesis. J Audio Eng Soc 2005;53(12):1115–27.
[14] Avendano C, Jot J-M. A frequency-domain approach to multichannel upmix. J Audio Eng Soc 2004;52(7/8):740–9.
[15] Pulkki V. Directional audio coding in spatial sound reproduction and stereo upmixing. In: Proceedings of the AES 28th international conference; 2006. p. 251–8.
[16] Tohyama M, Suzuki A. Interaural cross-correlation coefficients in stereo reproduced sound field. J Acoust Soc Am 1989;85:780–6.
[17] Muraoka T, Nakazato T. Examination of multichannel sound-field recomposition utilizing frequency-dependent interaural cross correlation (FIACC). J Audio Eng Soc 2007;55:236–56.
[18] Ando Y, Kurihara Y. Nonlinear response in evaluating the subjective diffuseness of sound fields. J Acoust Soc Am 1986;80:833–6.
[19] Hiyama K, Komiyama S, Hamasaki K. The minimum number of loudspeakers and its arrangement for reproducing the spatial impression of diffuse sound field. AES convention paper 5674; 2007.
[20] Damaske P, Ando Y. Interaural crosscorrelation for multichannel loudspeaker reproduction. Acustica 1972;27:232–8.
[21] Takahashi Y, Tohyama M. Multi-channel recording and reproduction for minimizing the difference in the spatial covariances between the original and reproduced sound fields. In: 19th International congress on acoustics; 2007 [RBA-15-012].
[22] Takahashi Y, Ando A. A new sound field reproduction method and the application based on the spatial covariance matrix. IEICE technical report (in Japanese with English abstract); 2008. p. 55–60 [EA2007-121].
[23] Haykin S. Unsupervised adaptive filtering, vol. 1: blind source separation. Berlin: Springer; 2000. p. 19–20.
[24] Hagiwara H, Takahashi Y, Tohyama M, Miyoshi K. The sound field reproduction method based on the spatial covariances and its reproduction efficiency. J Acoust Soc Am 2008;124(4):2455.
[25] Haas H. Über den Einfluss eines Einfachechos auf die Hörsamkeit von Sprache. Acustica 1951;1:49–58. English translation: The influence of a single echo on the audibility of speech. J Audio Eng Soc 1972;20:146–59.
[26] Aoki S, Houtgast T. A precedence effect in the perception of inter-aural cross correlation. Hearing Res 1992;59:25–30.