Journal Pre-proof, to appear in Signal Processing. doi: https://doi.org/10.1016/j.sigpro.2019.107383
Received 28 March 2019; Revised 6 November 2019; Accepted 18 November 2019

Highlights

• This is the first work to decompose and reconstruct a sound field based on the reciprocity gap functional (RGF) in the spherical harmonic domain.

• Unlike sparse-representation algorithms, the proposed method does not require discretization of the target region into grid points.

• The proposed method avoids the decomposition errors caused by off-grid sources and the high computational cost of sparse representation.

• The RGF is applied to sound field decomposition, which enables the sound field to be decomposed in closed form with a flexible arrangement of microphones.


Reciprocity gap functional in spherical harmonic domain for gridless sound field decomposition

Yuhta Takida†,∗, Shoichi Koyama†,‡, Natsuki Ueno†, Hiroshi Saruwatari†

† The University of Tokyo, Graduate School of Information Science and Technology, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
‡ JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan

∗ Corresponding author. Email addresses: [email protected] (Yuhta Takida), [email protected] (Shoichi Koyama), natsuki [email protected] (Natsuki Ueno), hiroshi [email protected] (Hiroshi Saruwatari)

Abstract

A sound field decomposition method based on the reciprocity gap functional (RGF) in the spherical harmonic domain is proposed. To estimate and reconstruct a continuous sound field including sources by using multiple microphones, an intuitive and powerful strategy is to decompose the sound field into Green's functions. Sparse-representation algorithms have been applied to this decomposition problem; however, they require the discretization of the target region into grid points to construct a dictionary matrix. Discretization-based methods lead to decomposition errors for off-grid sources and a high computational cost of sparse representation. We apply the RGF to sparse sound field decomposition, which makes it possible to decompose the sound field as a closed-form solution without discretization. In addition, the formulation in the spherical harmonic domain enables a flexible arrangement of microphones under the assumption of a spherical target region. Numerical simulation results indicated that high decomposition and reconstruction accuracies can be achieved by the proposed method, especially at low frequencies, with a low computational cost.

Keywords: Sound field decomposition, reciprocity gap functional, source identification, spherical harmonics

1. Introduction

Sound field reconstruction aims to estimate a continuous sound field inside a target region by using a discrete set of multiple microphones. Such a reconstruction has various applications in acoustic signal processing tasks, such as the visualization of an acoustic field, the identification of sound sources, and the recording of the sound field for reproduction with headphones or multiple loudspeakers. In general, this reconstruction is achieved by representing the captured sound field as a linear combination of element solutions of the Helmholtz equation, such as the fundamental solutions (Green's functions) [1, 2, 3], multipoles [4], plane waves [5, 6, 7], and harmonic functions [8, 9, 10]. We refer to this type of representation as sound field decomposition.

The sound field reconstruction problem can be classified into two types according to whether the target region includes sound sources or not. The sound field inside a source-free region, i.e., a homogeneous sound field, can be reconstructed by decomposing it into plane-wave or harmonic functions [5]. It is also possible to apply the equivalent source method [1, 2], which is based on a representation using Green's functions on a fictitious boundary enclosing the target region. In contrast, the sound field reconstruction inside a region including sources, i.e., an inhomogeneous sound field, is an ill-posed problem. It is necessary to impose some assumptions on the source distribution to achieve the reconstruction of such a sound field.

In this paper, we consider the reconstruction problem of an inhomogeneous field.

The spatial sparsity of the source distribution is one reasonable assumption for achieving the reconstruction of an inhomogeneous field. The sparse-representation algorithms that have recently been developed in the context of compressed sensing can be applied to the sparse decomposition of the sound field into Green's functions [4, 11]. However, these methods require the discretization of the target region into grid points to construct the dictionary matrix of Green's functions. This discretization leads to several practical issues. When the sources are off the grid points, the estimated source locations cannot be closer to the true locations than the nearest grid points. In addition, the reconstruction accuracy can also deteriorate because false activations of the grid points affect the reconstructed field. Although this off-grid effect can be reduced by making the intervals between the grid points smaller, a larger number of grid points leads to a higher computational cost for sparse representation.

The off-grid problem appears in various contexts of sparse representation, and several attempts have been made to avoid it in the literature. For the direction-of-arrival estimation problem, Xenaki and Gerstoft proposed a one-dimensional gridless beamforming method with a linear microphone array [12]. The atomic norm is used to overcome the basis mismatch due to discretization, and the optimization problem, formulated through its dual problem, is solved by semidefinite programming. This method was extended to arbitrary microphone arrays [13]. Two-dimensional gridless beamforming based on the atomic norm was proposed in Refs. [14, 15, 16]. However, it is difficult to directly apply these methods to the reconstruction of the sound field generated by near-field sources because it is difficult to formulate a dual problem owing to the singular points of the transfer functions.

We propose a sound field decomposition method based on the reciprocity gap functional (RGF). The RGF was proposed for inverse problems of source localization and inverse scattering [17, 18, 19, 20, 21]. The RGF makes it possible to formulate a relation between source locations inside the target region and boundary values on the surface of the target region through test functions. By applying the RGF, we can estimate the point-source locations inside the target region in a gridless manner as a closed-form solution. However, the direct application of the RGF requires the sound pressure and its gradient on the boundary of the target region as measurements. Since the surface integral of these values on the boundary is necessary, microphones must be densely and uniformly arranged. To decompose the inhomogeneous sound field with a flexible arrangement of microphones, we formulate the RGF-based sound field decomposition in the spherical harmonic domain. By using spherical microphone arrays or acoustic vector sensors with a feasible arrangement on the boundary, we can accurately calculate the surface integral by implicitly interpolating the boundary values, and the gridless sound field decomposition can be achieved as a closed-form solution. Preliminary results of this work were presented in Ref. [22].

This paper is organized as follows: The sound field decomposition problem is defined in Sect. 2. In Sect. 3, the sound field decomposition based on the direct application of the RGF is introduced. The proposed method based on the RGF in the spherical harmonic domain is described in Sect. 4. Experimental results are reported in Sect. 5. Finally, Sect. 6 concludes this paper.


Figure 1: Problem setting. Spherical microphone arrays or acoustic vector sensors are arranged on the surface of the spherical target region to identify the sources.

2. Sound field decomposition inside region including sources

2.1. Problem statement

Consider a spherical target region Ω including sound sources in a three-dimensional (3D) free field as shown in Fig. 1. The coordinate origin is set at the center of Ω. The boundary of Ω is denoted as ∂Ω. The sound pressure of the angular frequency ω at the position r ∈ R^3, u(r, ω), satisfies the following inhomogeneous Helmholtz equation under the Sommerfeld radiation condition [5]:

(\nabla^2 + k^2) u(r, \omega) = -Q(r, \omega),  (1)

where k = ω/c is the wave number, c is the sound speed, and Q(r, ω) is the source distribution inside Ω. We hereafter omit ω for notational simplicity.

The sound pressure u(r) is represented by the convolution of Q(r) and the 3D free-field Green's function G(r|r') inside Ω as

u(r) = \int_{\Omega} Q(r') G(r|r') \, dr',  (2)

where G(·) is defined as

G(r|r') = \frac{e^{ik \|r - r'\|_2}}{4\pi \|r - r'\|_2}.  (3)

Our objective is to estimate and reconstruct u(r) for continuous r ∈ Ω from the discrete measurements on ∂Ω. Equation (2) indicates that u(r) (r ∈ Ω), i.e., the sound field inside Ω, can be reconstructed by estimating Q(r) from the measurements. However, the estimation of Q(r) without any constraint is ill-posed. To make the problem solvable, we assume that all the sources inside Ω can be approximated as point sources. Then, Q(r) is approximated as a linear combination of delta functions δ(·) as

Q(r) \approx \sum_{j=1}^{J} c_j \delta(r - r_j),  (4)

where J is the number of sources and c_j ∈ C and r_j ∈ Ω are the amplitude and location of the jth source, respectively. By substituting Eq. (4) into Eq. (2), we approximate u(r) as

u(r) \approx \sum_{j=1}^{J} c_j G(r|r_j).  (5)
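As a minimal illustration of the point-source model in Eqs. (3)-(5), the following Python sketch synthesizes the pressure observed at a set of microphone positions generated by a few point sources. The microphone coordinates and frequency are arbitrary example values; the two source locations are those used later in the numerical experiments (Sect. 5).

```python
import numpy as np

def green_3d(r, r0, k):
    """Free-field Green's function G(r|r0) of Eq. (3)."""
    d = np.linalg.norm(np.asarray(r) - np.asarray(r0))
    return np.exp(1j * k * d) / (4.0 * np.pi * d)

def synthesize_field(mic_pos, src_pos, src_amp, k):
    """Pressure at each microphone as a sum of point sources, Eq. (5)."""
    return np.array([
        sum(c * green_3d(rm, rj, k) for c, rj in zip(src_amp, src_pos))
        for rm in mic_pos
    ])

# Example values (hypothetical microphone positions, for illustration only)
c_sound = 343.0                    # speed of sound [m/s] (assumed)
f = 500.0                          # frequency [Hz]
k = 2.0 * np.pi * f / c_sound      # wave number
src_pos = [(0.33, -0.46, 0.13), (-0.15, -0.46, -0.14)]
src_amp = [1.0, 1.0]
mic_pos = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

y = synthesize_field(mic_pos, src_pos, src_amp, k)
print(y)  # complex pressures at the three example microphones
```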

Our approach to reconstructing u(r) is to decompose the sound field as in Eq. (5) by estimating the source parameters c_j and r_j from the measurements. Note that the representation in Eq. (2) is valid only when no sources exist outside Ω. When some sources exist outside Ω, including reverberation, u(r) will include a homogeneous solution in addition to the particular solution in Eq. (2), and it is necessary to apply an exterior and interior sound field separation technique [23, 24]; however, to keep the problem simple, such a case is beyond the scope of this paper.

2.2. Sparse-representation-based method for sound field decomposition

The sparse-representation algorithms can be applied to the sound field decomposition problem in Sect. 2.1. In the sparse-representation-based method [3], the region Ω is discretized into grid points densely arranged inside Ω, and Eq. (2) is approximated as

u(r) \approx \sum_{n=1}^{N} d_n G(r|r_n),  (6)

where N is the number of grid points and r_n is the location of the nth grid point. Under the assumption that Ω includes a small number of sources, most of the expansion coefficients d_n will be zero. Therefore, by constructing a matrix-form linear equation with the dictionary matrix of G(r|r_n) using Eq. (6), we can apply sparse-representation algorithms to estimate d_n [25]. Such a representation has proved effective in reducing spatial aliasing artifacts in sound field reconstruction [3, 4]. However, when the sources are off the grid points, the decomposition accuracy deteriorates owing to discretization errors. By densely setting the grid points, we can reduce such errors; however, the more grid points are set, the higher the computational load required by the sparse-representation algorithms. Therefore, we consider how to decompose the sound field without discretization, i.e., in a gridless manner.
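To make the grid-based approach concrete, the sketch below builds the dictionary of Green's functions on a coarse grid and runs a minimal complex-valued orthogonal matching pursuit. It is a simplified stand-in for the sparse-representation methods cited above ([3, 25, 32, 33]), not their exact implementation; the grid spacing, stopping rule, and geometry are illustrative assumptions.

```python
import numpy as np

def green_3d(r, r0, k):
    # Free-field Green's function, Eq. (3)
    d = np.linalg.norm(r - r0)
    return np.exp(1j * k * d) / (4.0 * np.pi * d)

def build_dictionary(mic_pos, grid_pos, k):
    # Dictionary of Green's functions between grid points and microphones, Eq. (6)
    return np.array([[green_3d(rm, rn, k) for rn in grid_pos] for rm in mic_pos])

def omp_complex(D, y, n_atoms):
    # Minimal complex-valued OMP: greedily select atoms, re-fit by least squares
    residual = y.copy()
    support, coeffs = [], None
    for _ in range(n_atoms):
        correlations = np.abs(D.conj().T @ residual)
        correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    d = np.zeros(D.shape[1], dtype=complex)
    d[support] = coeffs
    return d

# Illustrative setup (not the paper's experimental configuration)
rng = np.random.default_rng(0)
k = 2 * np.pi * 500.0 / 343.0
mic_pos = rng.normal(size=(30, 3))
mic_pos *= 1.0 / np.linalg.norm(mic_pos, axis=1, keepdims=True)   # mics on the unit sphere
grid = np.array([[x, y, z] for x in np.arange(-0.6, 0.61, 0.3)
                            for y in np.arange(-0.6, 0.61, 0.3)
                            for z in np.arange(-0.6, 0.61, 0.3)])
true_src = np.array([[0.3, -0.45, 0.15]])                         # off-grid source
y = np.array([green_3d(rm, true_src[0], k) for rm in mic_pos])

D = build_dictionary(mic_pos, grid, k)
d_hat = omp_complex(D, y, n_atoms=2)
print(grid[np.abs(d_hat) > 0])   # activated grid points (limited by the grid resolution)
```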

3. Source identification based on RGF

We introduce the source identification method based on the RGF, which is the foundation of our proposed method described in Sect. 4. The RGF has been applied to the point-source localization problem [17, 18, 19, 20, 21]. The locations of the point sources are estimated in a gridless manner from the surface integral of the sound pressure and its gradient, u(r) and ∂u(r)/∂n, respectively, on ∂Ω. Here, n denotes the unit normal vector on ∂Ω.


In the RGF, the solution space of the homogeneous Helmholtz equation in Ω is defined as

\mathcal{H}_k := \{ v \in H^1(\Omega) \mid (\nabla^2 + k^2) v(r) = 0 \},  (7)

where H^1(Ω) is the Sobolev space on Ω and the function v(·) ∈ \mathcal{H}_k is called the test function. We also define the functional R(·) for v(·) ∈ \mathcal{H}_k as the weighted integral of Q(r) inside Ω as

R(v) := \int_{\Omega} v(r) Q(r) \, dr.  (8)

When we apply the point-source assumption Eq. (4) and Green's theorem [26] to Eq. (8), the following equation holds for all test functions:

R(v) = \sum_{j=1}^{J} c_j v(r_j) = \int_{\partial\Omega} \left( u(r) \frac{\partial v(r)}{\partial n} - v(r) \frac{\partial u(r)}{\partial n} \right) dS.  (9)

Eq. (9) represents the relationship between the source parameters to be estimated and the boundary values on ∂Ω. By choosing an appropriate test function v(·), we can estimate the parameters J, c_j, and r_j by calculating the surface integral in Eq. (9) with the boundary measurements.

Various test functions v(·) have been proposed [17, 18, 19]. El Badia and Nara [17] proposed the use of the test function v_n(·) with the positive integer n defined in Cartesian coordinates as

v_n(r) := v_n(x, y, z) = (x + iy)^n e^{-ikz},  (10)

where the v_n(r) (n ∈ N) satisfy the homogeneous Helmholtz equation and are elements of \mathcal{H}_k. The parameters J, c_j, and r_j are estimated as follows. First, the Hankel matrix consisting of s_n := R(v_n) is defined as

H := \begin{bmatrix} s_1 & s_2 & \cdots & s_{J'} \\ s_2 & s_3 & \cdots & s_{J'+1} \\ \vdots & \vdots & \ddots & \vdots \\ s_{J'} & s_{J'+1} & \cdots & s_{2J'-1} \end{bmatrix},  (11)

where J' is a constant value larger than the number of sources inside Ω, i.e., J' ≥ J. The number of sources, J, can be estimated by counting the singular values of H larger than a threshold. Next, the Hankel matrix H_l is defined using s_n as

H_l := \begin{bmatrix} s_{l+1} & s_{l+2} & \cdots & s_{l+J} \\ s_{l+2} & s_{l+3} & \cdots & s_{l+J+1} \\ \vdots & \vdots & \ddots & \vdots \\ s_{l+J} & s_{l+J+1} & \cdots & s_{l+2J-1} \end{bmatrix}.  (12)

The estimates of the source locations on the x-y plane, i.e., \hat{x}_j and \hat{y}_j of the jth estimated source location \hat{r}_j = (\hat{x}_j, \hat{y}_j, \hat{z}_j), correspond to the real and imaginary parts of the eigenvalues of H_1 H_0^{-1}. In short, the jth eigenvalue of H_1 H_0^{-1} becomes \hat{x}_j + i\hat{y}_j (j = 1, ..., J) [27].

The RGF-based source localization procedure is summarized as follows (a numerical sketch of steps 2 and 3 is given after the list):

1. Calculate the surface integral in Eq. (9) using the boundary measurements.
2. Compose the Hankel matrix H and evaluate its singular values to determine J.
3. Compose the Hankel matrices H_0 and H_1, and evaluate the eigenvalues of H_1 H_0^{-1} to estimate the source locations on the x-y plane.

The above algorithm can be directly applied to the sound field decomposition by estimating the 3D source locations and their amplitudes in a later stage.
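The following Python sketch illustrates steps 2 and 3 above: given the sequence of moments s_n (here synthesized directly from the left-hand side of Eq. (9) for known test sources), it estimates the number of sources from the singular values of H and recovers x_j + iy_j from the eigenvalues of H_1 H_0^{-1}. The source positions, amplitudes, and the singular-value threshold are illustrative assumptions.

```python
import numpy as np

def moments_from_sources(src_xy, amps, k, z, n_max):
    # s_n = sum_j c_j v_n(r_j) with v_n(x, y, z) = (x + iy)^n e^{-ikz}, Eq. (10)
    return np.array([
        sum(c * (x + 1j * y) ** n * np.exp(-1j * k * zz)
            for c, (x, y), zz in zip(amps, src_xy, z))
        for n in range(1, 2 * n_max)
    ])

def hankel(s, l, size):
    # H_l with entries s_{l+p+q+1}, Eqs. (11)-(12); s[i] stores s_{i+1}
    return np.array([[s[l + p + q] for q in range(size)] for p in range(size)])

# Illustrative example: two sources, J' = 4
k = 2 * np.pi * 500.0 / 343.0
src_xy = [(0.33, -0.46), (-0.15, -0.46)]
z_coords = [0.13, -0.14]
amps = [1.0, 1.0]
J_prime = 4
s = moments_from_sources(src_xy, amps, k, z_coords, J_prime)

# Step 2: estimate J from the singular values of H (threshold is an assumption)
H = hankel(s, 0, J_prime)
J_hat = int(np.sum(np.linalg.svd(H, compute_uv=False) > 1e-8))

# Step 3: eigenvalues of H1 H0^{-1} give x_j + i y_j
H0 = hankel(s, 0, J_hat)
H1 = hankel(s, 1, J_hat)
xy_hat = np.linalg.eigvals(H1 @ np.linalg.inv(H0))
print(J_hat, xy_hat)   # expect 2 sources, near 0.33-0.46j and -0.15-0.46j
```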

However, it requires the continuous sound pressure and its gradient on ∂Ω to calculate the surface integral in Eq. (9). Such an integration can be approximated by discretizing ∂Ω with uniform sampling of the spherical surface, such as a t-design [28]. When pressure microphones are used for the measurement, multiple spherical layers must be sampled to obtain the pressure gradient. Thus, at the least, microphone pairs should be densely and uniformly arranged on ∂Ω to obtain the sound pressure and its gradient at each sampling point.

4. Proposed method

To avoid the dense sampling on ∂Ω, we propose the RGF-based sound field decomposition method in the spherical harmonic domain. By using multiple spherical microphone arrays or acoustic vector sensors on ∂Ω as in Fig. 1, we can approximate the surface integral on ∂Ω with a more flexible arrangement of the sampling points. First, in Sect. 4.1, we formulate the RGF in the spherical harmonic domain. Next, the sound field decomposition based on the spherical-harmonic-domain RGF is described in Sect. 4.2. Finally, in Sect. 4.3, the computational cost of the proposed method is discussed in comparison with the sparse-representation-based method.

4.1. RGF in spherical harmonic domain

We analytically formulate the RGF in the spherical harmonic domain, which leads to an efficient algorithm for sound field decomposition. Since u(r) at r ∈ ∂Ω is a radiated field, it can be represented in the spherical harmonic domain using the spherical Hankel functions of the first kind h_ν(·) as [5]

u(r) = \sum_{\nu,\mu} u_{\nu,\mu} h_{\nu}(kr) Y_{\nu,\mu}(\theta, \phi),  (13)

where h_ν(·) is the νth-order spherical Hankel function of the first kind and \sum_{\nu,\mu} := \sum_{\nu=0}^{\infty} \sum_{\mu=-\nu}^{\nu}. Here, the spherical harmonic function Y_{ν,µ}(·) is defined as

Y_{\nu,\mu}(\theta, \phi) := \sqrt{\frac{(2\nu+1)}{4\pi} \frac{(\nu-\mu)!}{(\nu+\mu)!}} P_{\nu,\mu}(\cos\theta) \, e^{i\mu\phi},  (14)

where P_{ν,µ}(·) is the associated Legendre function. Since the sound field on ∂Ω can be well approximated by truncating the series of the spherical harmonic expansion Eq. (13) at an appropriate order, we consider the use of the coefficients u_{ν,µ} up to the truncation order N_tr, where the lower limit of N_tr is generally determined by ⌈kR⌉ [29] or ⌈ekR/2⌉ [30], with R the radius of Ω. The coefficients u_{ν,µ} can be estimated from the measurements of the multiple spherical microphone arrays or acoustic vector sensors arranged around ∂Ω by using the transform matrix of the spherical harmonics [31, 9]. Denoting the spherical harmonic coefficients up to N_tr as u := [u_{0,0}, ..., u_{N_tr,N_tr}]^T and the observed signals of M microphones as y ∈ C^M, u and y can be related as y = Tu, where T ∈ C^{M×(N_tr+1)^2} is the matrix whose column vectors correspond to the transform from u_{ν,µ} into y. Therefore, u can be estimated by calculating the Moore–Penrose pseudoinverse of T. Specific forms of T are presented in Appendix A.
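A minimal sketch of this estimation step for the pressure-microphone case, the special case of T given in Appendix A with (T)_{m,(ν,µ)} = h_ν(k r_m) Y_{ν,µ}(θ_m, φ_m), is shown below. The microphone positions, truncation order, and ordering of the (ν, µ) columns are illustrative assumptions, and no regularization is applied to the pseudoinverse.

```python
import numpy as np
from scipy.special import sph_harm, spherical_jn, spherical_yn

def sph_hankel1(nu, z):
    # Spherical Hankel function of the first kind, h_nu(z) = j_nu(z) + i y_nu(z)
    return spherical_jn(nu, z) + 1j * spherical_yn(nu, z)

def transform_matrix_pressure(mic_sph, k, n_tr):
    # (T)_{m,(nu,mu)} = h_nu(k r_m) Y_{nu,mu}(theta_m, phi_m)  (pressure-mic case, Appendix A)
    # mic_sph: list of (r, theta, phi) with theta = polar angle, phi = azimuth
    cols = []
    for nu in range(n_tr + 1):
        for mu in range(-nu, nu + 1):
            # NOTE: scipy's sph_harm takes (order mu, degree nu, azimuth, polar)
            cols.append([sph_hankel1(nu, k * r) * sph_harm(mu, nu, phi, theta)
                         for (r, theta, phi) in mic_sph])
    return np.array(cols).T   # shape (M, (n_tr + 1)^2)

# Illustrative example: random microphones on a sphere of radius R = 1.0 m
rng = np.random.default_rng(1)
R, n_tr = 1.0, 3
k = 2 * np.pi * 300.0 / 343.0
M = 2 * (n_tr + 1) ** 2                     # more microphones than unknowns
mic_sph = [(R, np.arccos(rng.uniform(-1, 1)), rng.uniform(0, 2 * np.pi))
           for _ in range(M)]

T = transform_matrix_pressure(mic_sph, k, n_tr)
u_true = rng.normal(size=(n_tr + 1) ** 2) + 1j * rng.normal(size=(n_tr + 1) ** 2)
y = T @ u_true                              # simulated pressure observations
u_hat = np.linalg.pinv(T) @ y               # estimate via Moore-Penrose pseudoinverse
print(np.max(np.abs(u_hat - u_true)))       # should be near machine precision
```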

The test function v_n(r) is represented in spherical coordinates as

v_n(r) = r^n \sin^n\theta \, e^{in\phi} e^{-ikr\cos\theta}.  (15)

Equation (15) is the representation of Eq. (10) obtained by setting x = r sin θ cos φ, y = r sin θ sin φ, and z = r cos θ. Since the test function v_n(r) is an element of \mathcal{H}_k, which means that v_n(r) satisfies the homogeneous Helmholtz equation, Eq. (15) can be expanded by the spherical harmonic functions with the spherical Bessel function j_ν(·) as

v_n(r) = \sum_{\nu,\mu} v^{(n)}_{\nu,\mu} j_{\nu}(kr) Y_{\nu,\mu}(\theta, \phi).  (16)

The coefficients v^{(n)}_{ν,µ} can be analytically obtained as

v^{(n)}_{\nu,\mu} = \frac{i^{n-\nu}}{k^n} \sqrt{\frac{4\pi (2\nu+1) (\nu+n)!}{(\nu-n)!}} \, \delta_{\mu,n},  (17)

where δ_{µ,n} is the Kronecker delta (and v^{(n)}_{ν,µ} = 0 for ν < n). The proof is given in Appendix B.2. In addition, the analytical form of s_n can also be obtained in the spherical harmonic domain as

s_n = \frac{i}{kR} \sum_{\nu,\mu} (-1)^{\mu+1} u_{\nu,\mu} v^{(n)}_{\nu,-\mu}.  (18)

The proof is given in Appendix B.3. The element s_n is well approximated by truncating the order ν at N_tr, and the coefficients v^{(n)}_{ν,µ} can be calculated using Eq. (17). Therefore, s_n can be directly obtained from the estimate of u_{ν,µ}, denoted \hat{u}_{ν,µ}, as

s_n = \frac{i}{kR} \sum_{\nu=0}^{N_{tr}} \sum_{\mu=-\nu}^{\nu} (-1)^{\mu+1} \hat{u}_{\nu,\mu} v^{(n)}_{\nu,-\mu}.  (19)
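The sketch below evaluates Eq. (17) and the truncated sum of Eq. (19) directly. The estimated coefficients û_{ν,µ} are generated randomly here as a stand-in for the output of the transform-matrix step above, and the dictionary keyed by (ν, µ) is an implementation choice.

```python
import numpy as np
from math import factorial

def v_coeff(n, nu, mu, k):
    # Eq. (17): nonzero only for mu == n and nu >= n
    if mu != n or nu < n:
        return 0.0
    return (1j ** (n - nu) / k ** n) * np.sqrt(
        4 * np.pi * (2 * nu + 1) * factorial(nu + n) / factorial(nu - n))

def moment_sn(n, u_hat, k, R, n_tr):
    # Eq. (19): s_n = (i / kR) * sum_{nu,mu} (-1)^(mu+1) u_hat[nu,mu] v^(n)_{nu,-mu}
    s = 0.0
    for nu in range(n_tr + 1):
        for mu in range(-nu, nu + 1):
            s += (-1) ** (mu + 1) * u_hat[(nu, mu)] * v_coeff(n, nu, -mu, k)
    return 1j / (k * R) * s

# Illustrative values; u_hat would come from the pseudoinverse step in practice
rng = np.random.default_rng(2)
R, n_tr = 1.0, 6
k = 2 * np.pi * 300.0 / 343.0
u_hat = {(nu, mu): rng.normal() + 1j * rng.normal()
         for nu in range(n_tr + 1) for mu in range(-nu, nu + 1)}

s = [moment_sn(n, u_hat, k, R, n_tr) for n in range(1, 8)]
print(s)
```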

4.2. Sound field decomposition based on spherical-harmonic-domain RGF

To determine the source locations in 3D space, we apply the RGF again after rotating the coordinates, which is achieved by using the rotated test function described with the Euler angles (α, β, γ) as

w_n(r) := \Lambda(\alpha, \beta, \gamma) v_n(r),  (20)

where Λ(α, β, γ) denotes the rotational operator for (α, β, γ). The expansion coefficients of w_n(r) are given by

w^{(n)}_{\nu,\mu} = \sum_{\mu'=-\nu}^{\nu} v^{(n)}_{\nu,\mu'} D^{\nu}_{\mu,\mu'}(\alpha, \beta, \gamma),  (21)

where D^{ν}_{µ,µ'}(·) is the (µ, µ')th element of the Wigner-D matrix D^{ν}(·). The Wigner-D matrix D^{ν}(·) is the rotational operator for the νth-order spherical harmonic coefficients with Euler angles, i.e.,

\Lambda(\alpha, \beta, \gamma) Y_{\nu,\mu}(\theta, \phi) = \sum_{\mu'=-\nu}^{\nu} D^{\nu}_{\mu',\mu}(\alpha, \beta, \gamma) Y_{\nu,\mu'}(\theta, \phi).  (22)

Specifically, we obtain the new test functions v'_n(r) := v_n(y, z, x) and v''_n(r) := v_n(z, x, y) by setting the Euler angles (α, β, γ) to (−π/2, −π/2, 0) and (0, π/2, π/2) in Eq. (21), which make it possible to estimate the source locations on the y-z and z-x planes, respectively. After s_n is obtained, the source locations are estimated by composing the Hankel matrices as described in Sect. 3. The eigenvalues of H_1 H_0^{-1} for the three test functions do not always correspond to the same sound sources. We determine the combination of the eigenvalues so that the sum of the distances between the estimates of each coordinate is minimized. The final estimate of the source locations is obtained by averaging the two estimates of each coordinate.

Finally, the amplitude c_j is inferred by using the estimated 3D source locations r_j. As in Eq. (5), the vector of the observed signals y ∈ C^M and the source amplitudes c = [c_1, ..., c_J]^T are related by the matrix of the free-field Green's functions at r_j, G ∈ C^{M×J}, as y = Gc. Therefore, the amplitudes c are estimated by using the Moore–Penrose pseudoinverse of G.

1. Calculate v(n) ν,µ in Eq. (17). 2. Calculate sn by substituting v(n) ˆ ν,µ obtained from y into Eq. (19). ν,µ and u 3. Determine J by computing the eigenvalues of H with sn . 4. Obtain xˆ j + iˆy j by evaluating the eigenvalues of H1 H−1 0 . 0

00

(n) 5. Calculate v(n) ν,µ and vν,µ by Eq. (21), respectively.

6. Obtain yˆ j + iˆz j and zˆ j + i xˆ j by repeating procedure 2 and 4 for v0n (r) and v00n (r). 7. Determine 3D source location by averaging the estimates of each coordinate. 8. Obtain c j from y and G. Although the microphones distributed around ∂Ω can be used in the proposed method owing to the spherical-harmonic-domain formulation, the microphone arrangement effect on the decomposition accuracy as well as the estimation accuracy of u. In general, uniform distribution on ∂Ω is preferable to estimate u; however, more feasible arrangement is investigated in the experiment. 4.3. Computational cost In the sparse-representation-based method, such as that in Ref. [3], an optimization problem with the sparse regularization term is solved. Although several approximated algorithms can be applied to solve it, they require a high computational cost. For example, in the orthogonal matching pursuit (OMP) [32, 33], the greedy algorithm is applied and its computational cost is O(N MJ), where N and M are the numbers of grid points and microphones, respectively. The accelerated proximal gradient (APG) [34] solves an approximated optimization problem with the `1 -norm regularization, which is solved by an iterative algorithm, and its computational cost for each iteration is O(N M). On the other hand, in the proposed method, the source parameters can be estimated as a closed-form solution. The 15

computational cost of the proposed method are O(J 03 ) for estimating J and O(J 3 ) for the parameter estimation, which is significantly smaller than those of OMP and APG because J 0 and J are generally much smaller than M and N. 5. Numerical experiments 5.1. Experimental conditions Numerical experiments were performed in the 3D free-field to evaluate the decomposition and reconstruction performance characteristics of the proposed method. The proposed method based on the spherical-harmonic-domain RGF (SHD-RGF) was compared with the sparse-representation-based method (Sparse) [3] and the method based on the direct application of the RGF as described in Sect. 3 (RGF) [17]. The spherical target region Ω with a radius R = 1.0 m was set with its center at the origin. The following two types of microphone geometry on ∂Ω were investigated: uniform and four-ring geometries shown in Fig. 2. The uniform geometry was used to evaluate the case that the surface integral on ∂Ω can be approximately calculated. Therefore, 196 sensors that can measure pressure and its gradient, which can be regarded as ideal microphone pairs, were arranged on the sampling points on ∂Ω. We determined their locations by the t-design method [28, 35], which generates nearly-uniform sampling points on a spherical surface such that a function of a polynomial order t or lower satisfies the quadrature relation [36]. The four-ring geometry is a more feasible case. Sensors that can measure the spherical harmonic coefficients up to the second order, which can be regarded as ideal spherical microphone arrays or acoustic vector sensors, were arranged on the four horizontal rings on ∂Ω. Nine sensors were set along each circular ring 16

(a) Uniform geometry

(b) Four-ring geometry

Figure 2: Two types of microphone geometry: (a) uniform geometry (each circle represents microphone pair) and (b) four-ring geometry (each circle represents spherical microphone array).

and the rings were set at zenith angles of 40◦ , 75◦ , 105◦ , and 140◦ . In this scenario, since RGF cannot be applied, only SHD-RGF and Sparse were evaluated. Gaussian noise was also added to the observed signals so that the signal-to-noise ratio was 40 dB for both microphone geometries. We set two point sources at (0.33, −0.46, 0.13) m and (−0.15, −0.46, −0.14) m. The source signals were single frequency sine waves with amplitudes 1.0, and the frequencies of two sources were the same, which are assumed to be obtained by time-frequency representation of broadband signals. In SHD-RGF, the truncation order Ntr is determined for each frequency f as Ntr = dekR/2e + 2 for f ≤ 400 Hz, Ntr = 12 for 400 Hz < f ≤ 640 Hz, and

Ntr = dkRe for f > 640 Hz. In Sparse, the region Ω was discretized into grid

points, and three patterns with grid point intervals d = 0.10, 0.15, and 0.20 m were investigated. These intervals correspond to 4165, 1196, and 511 grid points inside Ω, respectively, that form the dictionary matrix of the Green’s function. The OMP algorithm [32, 33] was applied to the sparse decomposition. We evaluated both decomposition and reconstruction accuracies. The de17


Figure 3: RMSE and SDR with respect to frequency for uniform geometry.

The decomposition accuracy was evaluated from the estimated source locations \hat{r}_j (j = 1, ..., J). We define the root-mean-square error (RMSE) for the source localization as

\mathrm{RMSE} = \sqrt{\frac{1}{J} \sum_{j=1}^{J} \| r_{j,\mathrm{true}} - \hat{r}_j \|_2^2},  (23)

where r_{j,true} is the true location of the jth source. To evaluate the reconstruction accuracy, the signal-to-distortion ratio (SDR) is defined as

\mathrm{SDR} = 10 \log_{10} \frac{\int_{\Omega} |u_{\mathrm{true}}(r)|^2 \, dr}{\int_{\Omega} |u_{\mathrm{true}}(r) - \hat{u}(r)|^2 \, dr},  (24)

where u_true(r) is the true pressure distribution. The integrals in Eq. (24) were calculated by discretizing Ω at intervals of 0.1 m.
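A direct numerical transcription of Eqs. (23) and (24) (with the volume integrals replaced by sums over a discretization grid, as described above) might look as follows; the example arrays are placeholders.

```python
import numpy as np

def rmse(r_true, r_hat):
    # Eq. (23): root-mean-square source localization error
    r_true, r_hat = np.asarray(r_true), np.asarray(r_hat)
    return np.sqrt(np.mean(np.sum(np.abs(r_true - r_hat) ** 2, axis=1)))

def sdr_db(u_true, u_hat):
    # Eq. (24): signal-to-distortion ratio over the discretized region, in dB
    u_true, u_hat = np.asarray(u_true), np.asarray(u_hat)
    return 10.0 * np.log10(np.sum(np.abs(u_true) ** 2)
                           / np.sum(np.abs(u_true - u_hat) ** 2))

# Placeholder example values
r_true = [(0.33, -0.46, 0.13), (-0.15, -0.46, -0.14)]
r_hat = [(0.32, -0.45, 0.14), (-0.16, -0.47, -0.13)]
u_true = np.exp(1j * np.linspace(0, 4, 100))
u_hat = u_true + 0.05 * np.random.default_rng(4).normal(size=100)
print(rmse(r_true, r_hat), sdr_db(u_true, u_hat))
```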

5.2. Experimental results

Figure 3 shows the RMSE and SDR with respect to frequency for the uniform geometry. These values at each frequency were averaged over 10 different noise patterns. In Sparse, the RMSEs were almost constant at all frequencies.

This is because the estimated source locations were not closer to the true locations than the nearest grid points. The RMSEs became slightly smaller when the grid points were set at smaller intervals, as expected. On the other hand, in SHD-RGF, the smallest RMSE was achieved at frequencies below 720 Hz because the estimated source locations do not depend on grid points. However, the RMSEs of SHD-RGF became larger than those of Sparse at frequencies above 720 Hz because the estimation accuracy of the coefficients u_{ν,µ} on ∂Ω markedly deteriorated; the number of coefficients to be estimated, (N_tr + 1)^2, became larger than the number of measurements, 196, at 740 Hz. Although RGF does not depend on grid points either, the RMSEs of RGF became larger than those of SHD-RGF at frequencies above 280 Hz. This means that the surface integral on the right-hand side of Eq. (9) was evaluated more accurately by SHD-RGF than by RGF. A similar tendency can also be seen in the SDR reconstruction results.

Figures 4 and 5 respectively show the reconstructed pressure and the normalized squared error distributions on the x-y plane for the uniform geometry at 500 Hz. For Sparse, only the result for d = 0.10 m is shown. The green crosses with circles and the blue crosses are the true and estimated source locations, respectively, and their size represents the source amplitude in dB. The highest reconstruction accuracy was achieved by SHD-RGF. The SDRs of SHD-RGF, Sparse, and RGF were 23.0, 8.1, and −5.1 dB, respectively.

Figure 6 shows the RMSE and SDR with respect to frequency for the four-ring geometry. Although RGF cannot be applied to this geometry, the results were similar to those for the uniform geometry. These results indicate that both SHD-RGF and Sparse work even when irregular microphone arrays are used.


Figure 4: Reconstructed pressure distribution at 500 Hz for uniform geometry: (a) SHD-RGF, (b) Sparse (d = 0.10 m), and (c) RGF. The green crosses with circles and the blue crosses represent the true and estimated source locations, respectively.

Figure 5: Normalized squared error distribution at 500 Hz for uniform geometry: (a) SHD-RGF, (b) Sparse (d = 0.10 m), and (c) RGF. The SDRs of SHD-RGF, Sparse (d = 0.10 m), and RGF were 23.0, 8.1, and −5.1 dB, respectively.

The smallest RMSE and highest SDR were achieved by SHD-RGF at frequencies below 680 Hz by avoiding the off-grid effect. Figures 7 and 8 respectively show the reconstructed pressure and normalized squared error distributions on the x-y plane at 500 Hz. The SDRs of SHD-RGF and Sparse were 22.5 and 8.1 dB, respectively.


Figure 6: RMSE and SDR with respect to frequency for four-ring geometry.

6. Conclusion

We proposed a gridless sound field decomposition method for estimating an inhomogeneous field in a spherical target region. In the sparse-representation-based method, the target region must be discretized into grid points to apply sparse-representation algorithms. The decomposition and reconstruction accuracies are limited by this discretization, and a dense arrangement of grid points leads to an extremely high computational load. Our proposed method is based on the RGF, which enables us to estimate the point-source locations as a closed-form solution in a gridless manner. We formulated the RGF-based decomposition in the spherical harmonic domain to allow flexible microphone geometries on the surface of the spherical target region. Numerical simulation results indicated that high localization and reconstruction accuracies are achieved by the proposed method, especially at low frequencies.

Figure 7: Reconstructed pressure distribution for four-ring geometry: (a) SHD-RGF and (b) Sparse (d = 0.10 m).

Figure 8: Normalized squared error distribution for four-ring geometry: (a) SHD-RGF and (b) Sparse (d = 0.10 m). The SDRs in (a) and (b) were 22.5 and 8.1 dB, respectively.

Appendix A. Estimation of spherical harmonic coefficients

When M pressure microphones are distributed around ∂Ω, the observed signal of the mth microphone corresponds to the pressure at the mth microphone position r_m, i.e., u(r_m). From Eq. (13), the matrix T is obtained as (T)_{m,(ν,µ)} = h_ν(k r_m) Y_{ν,µ}(θ_m, φ_m).

Next, we assume that spherical microphone arrays (or acoustic vector sensors) are arranged around ∂Ω. Q microphone arrays are set, and their index is denoted as q. The qth microphone array at r_q is used to obtain the spherical harmonic coefficients of the sound field α^{(q)} := [α^{(q)}_{0,0}, ..., α^{(q)}_{N_m,N_m}]^T, where N_m is the maximum order, at the expansion center r_q. The global spherical harmonic coefficients, i.e., the coefficients at the global origin, can be related to the coefficients at the qth local origin α^{(q)}_{ν,µ} (the local spherical harmonic coefficients) as [37]

\alpha^{(q)}_{\nu,\mu} = \sum_{\nu',\mu'} S^{\nu',\mu'}_{\nu,\mu}(r_q) \, u_{\nu',\mu'},  (A.1)

where S^{ν',µ'}_{ν,µ}(r) ∈ C is defined by using the Gaunt coefficient G [37] as

S^{\nu',\mu'}_{\nu,\mu}(r) = \sum_{l=0}^{\nu+\nu'} 4\pi (-1)^{\mu} i^{\nu-\nu'+l} h_l(kr) Y_{l,\mu-\mu'}(\theta, \phi)^{*} \, G(\nu', \mu'; \nu, -\mu; l, \mu-\mu').  (A.2)

The matrix T can then be derived as (T)_{(q,\nu,\mu),(\nu',\mu')} = S^{\nu',\mu'}_{\nu,\mu}(r_q). The case of pressure microphones is a special case of this representation with N_m = 0.

Appendix B. Derivation of RGF in spherical harmonic domain

Appendix B.1. Preliminaries

First, we introduce several characteristics of the special functions: the spherical harmonic function and the spherical Bessel and Hankel functions of the first kind. The plane wave e^{ik·r} is represented in the spherical harmonic domain [36] as

e^{i k \cdot r} = \sum_{\nu,\mu} 4\pi i^{\nu} Y_{\nu,\mu}(\theta_k, \phi_k)^{*} j_{\nu}(kr) Y_{\nu,\mu}(\theta, \phi),  (B.1)

where k denotes the wave vector, represented by k = (k, θ_k, φ_k) in polar coordinates. Next, the Taylor expansion of the spherical Bessel function j_ν(t) is as follows:

j_{\nu}(t) = \sum_{l=0}^{\infty} \frac{2^{\nu} (-1)^{l} (\nu+l)! \, t^{\nu+2l}}{l! \, (2\nu+2l+1)!}.  (B.2)

The Wronskian of the spherical Bessel and Hankel functions of the first kind [5] is given by

j'_{\nu}(t) h_{\nu}(t) - j_{\nu}(t) h'_{\nu}(t) = -\frac{i}{t^2}.  (B.3)

=

Z πZ 0

2π 0

∗ rn sinn θeinφ e−ikr cos θ Yν,µ (θ, φ) sin θdθdφ.

(B.4)

From Eqs. (14) and (B.1), e−ikr cos θ can be expanded by the Legendre polynomials as −ikr cos θ

e

=

∞ X

0

(−i)ν (2ν0 + 1) jν0 (kr)Pν0 (cos θ).

(B.5)

ν0 =0

By substituting Eq. (B.5) into Eq. (B.4) and using the orthogonality of the trigonometric functions, we obtain the following from Eq. (B.4) Z πZ 2π (n) vν,µ jν (kr) = rn sinn+1 θeinφ 0

·

∞ X

0

0

∗ (−i)ν (2ν0 + 1) jν0 (kr)Pν0 (cos θ)Yν,n (θ, φ)dθdφ

ν0 =0

= 2πrn ·

Z

π 0

s

∞ (2ν + 1) (ν − n)! X 0 (−i)ν (2ν0 + 1) jν0 (kr) 4π (ν + n)! ν0 =0

sinn θPν0 (cos θ)Pν,n (cos θ) sin θdθ.

(B.6)

The term sinn θPν0 (cos θ) can be expanded by the associated Legendre functions as n

sin θPν0 (cos θ) =

n+ν0 X

ν00 =n

24

0

00 cn,ν ν00 Pν ,n (cos θ).

(B.7)

By substituting Eq. (B.7) into the integral term of Eq. (B.6) and using the orthogonality of the associated Legendre function, we obtain  0 Z π Z π  X   n+ν n,ν0 n  sin θPν0 (cos θ)Pν,n (cos θ) sin θdθ = cν00 Pν00 ,n (cos θ) Pν,n (cos θ) sin θdθ 0 0 ν00 =n   2 (ν + n)! n,ν0    (n ≤ ν ≤ n + ν0 )   2ν + 1 (ν − n)! cν . =     0 (otherwise) (B.8)

From Eq. (B.8), Eq. (B.6) can be simplified as q   n,ν0 4π (ν+n)! n P∞  ν0 0  r jν0 (kr) (ν ≥ n) 0 =ν−n (−i) (2ν + 1)cν  ν  (2ν+1) (ν−n)! v(n) j (kr) =  ν ν,µ     0 (ν < n).

(B.9)

By substituting Eq. (B.2) into Eq. (B.9), we obtain

v^{(n)}_{\nu,n} \sum_{l=0}^{\infty} \frac{2^{\nu} (-1)^{l} (\nu+l)! \, k^{\nu+2l}}{l! \, (2\nu+2l+1)!} r^{\nu+2l} = \sqrt{\frac{4\pi}{2\nu+1} \frac{(\nu+n)!}{(\nu-n)!}} \, r^{n} \sum_{\nu'=\nu-n}^{\infty} (-i)^{\nu'} (2\nu'+1) \, c^{n,\nu'}_{\nu} \sum_{l=0}^{\infty} \frac{2^{\nu'} (-1)^{l} (\nu'+l)! \, k^{\nu'+2l}}{l! \, (2\nu'+2l+1)!} r^{\nu'+2l}.  (B.10)

This equation holds for arbitrary r ≥ 0; therefore, the coefficients of each power of r on both sides of Eq. (B.10) should be equal. By comparing the coefficients of r^{ν+2L} on both sides of Eq. (B.10), we can represent v^{(n)}_{ν,n} as

v^{(n)}_{\nu,n} = \frac{L! \, (2\nu+2L+1)!}{2^{\nu} (-1)^{L} (\nu+L)! \, k^{\nu+2L}} \sqrt{\frac{4\pi}{2\nu+1} \frac{(\nu+n)!}{(\nu-n)!}} \sum_{l=0}^{L} (-i)^{\nu'} (2\nu'+1) \, c^{n,\nu'}_{\nu} \, \frac{2^{\nu'} (-1)^{l} (\nu'+l)! \, k^{\nu'+2l}}{l! \, (2\nu'+2l+1)!},  (B.11)

where L is an arbitrary non-negative integer and ν' = ν − n + 2(L − l). By substituting L = 0 into Eq. (B.11), v^{(n)}_{ν,n} is represented by using c^{n,ν−n}_{ν} as

v^{(n)}_{\nu,n} = \frac{(-i)^{\nu-n} (\nu-n)! \, (2\nu)!}{(2k)^{n} \, \nu! \, (2\nu-2n)!} \sqrt{\frac{4\pi (2\nu+1) (\nu+n)!}{(\nu-n)!}} \, c^{n,\nu-n}_{\nu}.  (B.12)

Here, c^{n,ν−n}_{ν} satisfies Eq. (B.7) and can be obtained as

c^{n,\nu-n}_{\nu} = \frac{(\nu-n)! \, p_{\nu-n,\nu-n}}{\nu! \, p_{\nu,\nu}} = \frac{2^{n} (2\nu-2n)! \, \nu!}{(2\nu)! \, (\nu-n)!},  (B.13)

where p_{n,m} denotes the mth expansion coefficient of the nth Legendre polynomial and is represented as

p_{n,m} = \begin{cases} 0 & (n-m:\ \text{odd}) \\ \dfrac{(-1)^{(n-m)/2} (n+m)!}{2^{n} \left(\frac{n-m}{2}\right)! \left(\frac{n+m}{2}\right)! \, m!} & (n-m:\ \text{even}) \end{cases}.  (B.14)

The analytical representation of v^{(n)}_{ν,µ} (Eq. (17)) can be obtained by substituting Eq. (B.13) into Eq. (B.12).

Appendix B.3. Derivation of s_n

The element s_n can be obtained as

s_n = \int_{\partial\Omega} \left( u(r) \frac{\partial v_n(r)}{\partial n} - v_n(r) \frac{\partial u(r)}{\partial n} \right) dS.  (B.15)

Since the region Ω is assumed to be a sphere, Eq. (B.15) is equivalent to

s_n = \int_{0}^{\pi} \int_{0}^{2\pi} \left( u(r) \left.\frac{\partial v_n(r)}{\partial r}\right|_{r=R} - v_n(r) \left.\frac{\partial u(r)}{\partial r}\right|_{r=R} \right) R \sin\theta \, d\theta \, d\phi,  (B.16)

where R is the radius of Ω. By substituting the spherical-harmonic expansions of u(r) and v_n(r) into Eq. (B.16), we obtain

s_n = \sum_{\nu,\mu} \sum_{\nu',\mu'} kR \, u_{\nu,\mu} v^{(n)}_{\nu',-\mu'} \left( h_{\nu}(kR) j'_{\nu'}(kR) - h'_{\nu}(kR) j_{\nu'}(kR) \right) \int_{0}^{\pi} \int_{0}^{2\pi} (-1)^{\mu} Y_{\nu,\mu}(\theta, \phi) Y_{\nu',\mu'}(\theta, \phi)^{*} \sin\theta \, d\theta \, d\phi.  (B.17)

Finally, the analytical representation of s_n in Eq. (18) is obtained by applying Eq. (B.3) and the orthogonality of the spherical harmonic functions to Eq. (B.17).

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP15H05312, JST PRESTO Grant Number JPMJPR18J4, and the SECOM Science and Technology Foundation.

References

[1] G. H. Koopmann, L. Song, J. B. Fahnline, A method for computing acoustic fields based on the principle of wave superposition, J. Acoust. Soc. Am. 86 (5) (1989) 2433–2438.

[2] M. E. Johnson, S. J. Elliott, K. H. Baek, J. Garcia-Bonito, An equivalent source technique for calculating the sound field inside an enclosure containing scattering objects, J. Acoust. Soc. Am. 104 (3) (1998) 1221–1231.

[3] S. Koyama, S. Shimauchi, H. Ohmuro, Sparse sound field representation in recording and reproduction for reducing spatial aliasing artifacts, in: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Florence, 2014, pp. 4443–4447.

[4] S. Koyama, N. Murata, H. Saruwatari, Sparse sound field decomposition for super-resolution in recording and reproduction, J. Acoust. Soc. Am. 143 (6) (2018) 3780–3795.

[5] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography, Academic Press, London, 1999.

[6] G. Chardon, L. Daudet, A. Peillot, F. Ollivier, N. Bertin, R. Gribonval, Nearfield acoustic holography using sparsity and compressive sampling principles, J. Acoust. Soc. Am. 132 (3) (2012) 1521–1534.

[7] S. Koyama, K. Furuya, Y. Hiwasaki, Y. Haneda, Analytical approach to wave field reconstruction filtering in spatio-temporal frequency domain, IEEE Trans. Audio, Speech, Lang. Process. 21 (4) (2013) 685–696.

[8] M. A. Poletti, Three-dimensional surround sound systems based on spherical harmonics, J. Audio Eng. Soc. 53 (11) (2005) 1004–1025.

[9] P. N. Samarasinghe, T. D. Abhayapala, M. A. Poletti, Wavefield analysis over large areas using distributed higher order microphones, IEEE Trans. Audio, Speech, Lang. Process. 22 (3) (2014) 647–658.

[10] N. Ueno, S. Koyama, H. Saruwatari, Sound field recording using distributed microphones based on harmonic analysis of infinite order, IEEE Signal Process. Lett. 25 (1) (2018) 135–139.

[11] S. Koyama, L. Daudet, Sparse representation of a spatial sound field in a reverberant environment, IEEE J. Sel. Topics Signal Process. 13 (1) (2019) 172–184.

[12] A. Xenaki, P. Gerstoft, Grid-free compressive beamforming, J. Acoust. Soc. Am. 137 (4) (2015) 1923–1935.

[13] S. Semper, F. Roemer, T. Hotz, G. Del Galdo, Grid-free direction-of-arrival estimation with compressed sensing and arbitrary antenna arrays, in: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Calgary, 2018, pp. 3251–3255.

[14] Y. Yang, Z. Chu, Z. Xu, G. Ping, Two-dimensional grid-free compressive beamforming, J. Acoust. Soc. Am. 142 (2) (2017) 618–629.

[15] X. Wu, W.-P. Zhu, J. Yan, Gridless two-dimensional DOA estimation with L-shaped array based on the cross-covariance matrix, in: Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Calgary, 2018, pp. 3251–3255.

[16] Y. Yang, Z. Chu, G. Ping, Z. Xu, Resolution enhancement of two-dimensional grid-free compressive beamforming, J. Acoust. Soc. Am. 143 (6) (2018) 3860–3872.

[17] A. El Badia, T. Nara, An inverse source problem for Helmholtz's equation from the Cauchy data with a single wave number, Inverse Prob. 27 (2011) 105001.

[18] Z. Dogan, V. Tsiminaki, I. Jovanovic, T. Blu, D. Van De Ville, Localization of point sources for systems governed by the wave equation, in: Proc. SPIE, Wavelets and Sparsity XIV, Vol. 8138, San Diego, 2011.

[19] Z. Dogan, I. Jovanovic, T. Blu, D. Van De Ville, 3D reconstruction of wave-propagated point sources from boundary measurements using joint sparsity and finite rate of innovation, in: Proc. IEEE Int. Symp. Biomed. Imaging (ISBI), Barcelona, 2012, pp. 1575–1578.

[20] C. Alves, R. Kress, P. Serranho, Iterative and range test methods for an inverse source problem for acoustic waves, Inverse Prob. 25 (2009) 055005.

[21] D. Colton, H. Haddar, An application of the reciprocity gap functional to inverse scattering theory, Inverse Prob. 21 (1) (2005) 383–398.

[22] Y. Takida, S. Koyama, N. Ueno, H. Saruwatari, Gridless sound field decomposition based on reciprocity gap functional in spherical harmonic domain, in: Proc. IEEE Int. Sensor Array Multichannel Signal Process. Workshop (SAM), Sheffield, 2018, pp. 627–631.

[23] C. Bi, X. Chen, J. Chen, Sound field separation technique based on equivalent source method and its application in nearfield acoustic holography, J. Acoust. Soc. Am. 123 (3) (2008) 1472–1478.

[24] Y. Takida, S. Koyama, H. Saruwatari, Exterior and interior sound field separation using convex optimization: comparison of signal models, in: Proc. European Signal Process. Conf. (EUSIPCO), Rome, 2018, pp. 2567–2571.

[25] N. Murata, S. Koyama, N. Takamune, H. Saruwatari, Sparse representation using multidimensional mixed-norm penalty with application to sound field decomposition, IEEE Trans. Signal Process. 66 (12) (2018) 3327–3338.

[26] D. Colton, R. Kress, Inverse Acoustic and Electromagnetic Scattering Theory, Springer, 1998.

[27] A. El Badia, T. Ha-Duong, An inverse source problem in potential analysis, Inverse Prob. 16 (3) (2000) 651–663.


[28] A. Bondarenko, D. Radchenko, M. Viazovska, Optimal asymptotic bounds for spherical designs, Annals of Mathematics 178 (2) (2013) 443–452.

[29] D. B. Ward, T. D. Abhayapala, Reproduction of a plane-wave sound field using an array of loudspeakers, IEEE Trans. Speech Audio Process. 9 (6) (2001) 697–707.

[30] R. A. Kennedy, P. Sadeghi, T. D. Abhayapala, H. M. Jones, Intrinsic limits of dimensionality and richness in random multipath fields, IEEE Trans. Signal Process. 55 (6) (2007) 2542–2556.

[31] P. N. Samarasinghe, T. D. Abhayapala, M. A. Poletti, 3D spatial soundfield recording over large regions, in: Proc. Int. Workshop Acoust. Signal Enhancement (IWAENC), Aachen, 2012, pp. 1–4.

[32] Y. Pati, R. Rezaiifar, P. S. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with application to wavelet decomposition, in: Proc. 27th Annual Asilomar Conf. Signals, Systems, Computers, Pacific Grove, 1993, pp. 40–44.

[33] J. Tropp, A. C. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit, IEEE Trans. Inf. Theory 53 (12) (2007) 4655–4666.

[34] P. L. Combettes, V. R. Wajs, Signal recovery by proximal forward-backward splitting, SIAM Multiscale Model. Simul. 4 (4) (2005) 1168–1200.

[35] X. Chen, R. Womersley, Spherical t-designs with d = (t + 1)^2 points, http://www.polyu.edu.hk/ama/staff/xjchen/sphdesigns.html (accessed 15 August 2018).

[36] B. Rafaely, Fundamentals of Spherical Array Processing, 1st Edition, Springer, New York, 2015.

[37] P. A. Martin, Multiple Scattering: Interaction of Time-Harmonic Waves with N Obstacles, Cambridge University Press, New York, 2006.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
