Applied Acoustics 135 (2018) 101–110

A dual fast NLMS adaptive filtering algorithm for blind speech quality enhancement




Akila Sayoud, Mohamed Djendi, Soumia Medahi, Abderrezak Guessoum. University of Blida 1, Signal Processing and Image Laboratory (LATSI), Route de Soumaa, B.P. 270, Blida 09000, Algeria

Keywords: FBSS structure; Noise reduction; SNR factor; Speech enhancement; FNLMS

Abstract

This paper addresses the problem of acoustic noise reduction and speech enhancement in new telecommunication systems by adaptive filtering algorithms. Recently, particular attention has been paid to the blind source separation (BSS) approach applied to the separation of speech and noise components. The BSS application inherits the good properties of adaptive filtering algorithms and yields a more intelligible enhanced speech signal in terms of quality. In this paper, we propose a new dual forward BSS algorithm that is based on signal prediction and gives an automatic algorithm with a very fine behavior at the output. This algorithm is called the dual fast normalized least mean square (DFNLMS) algorithm. It has been tested in various noisy conditions and has shown its superiority in terms of the following objective criteria: cepstral distance (CD), segmental signal-to-noise ratio (SegSNR), segmental mean square error (SegMSE), and system mismatch (SM). A comparison with other competitive and state-of-the-art algorithms is also presented in this paper.

1. Introduction

The integration of new services and applications in wireless telecommunication systems has led to the development of several digital signal processing techniques for applications such as hands-free cell phones, hearing aids, and teleconferencing systems. In these applications, the desired signal is often corrupted by noise and useless signals. The goal is to improve the quality and intelligibility of speech signals corrupted by acoustic noise components. Due to the importance of this field of research, it has been actively studied and several techniques and algorithms have been proposed over the past several decades [1–3]. For example, in [4], the additive acoustic noise components are eliminated by subtracting an estimate of the noise spectrum from the noisy speech spectrum. The Wiener-filtering-based approach was proposed in the same year to improve the a posteriori signal-to-noise ratio (SNR) [5]. Another technique, the minimum mean square error (MMSE) technique, which achieves a non-linear estimation of the short-time spectral amplitude of the speech signal, was proposed to improve speech enhancement applications. An efficient version of the MMSE algorithm, referred to as Log-MMSE, which minimizes the mean square error (MSE) in the log-spectral domain, was proposed in [6,7]. Further well-known classic algorithms, called signal-subspace based methods, were proposed in [8–10]. Another adaptive approach was adopted in the last decade to improve the behavior of speech enhancement techniques when a speech signal is corrupted by non-stationary noise components.

In the same vein, several adaptive algorithms have been proposed to reduce noise and enhance the speech signal [11–13]. The most well-known algorithms are the recursive least squares (RLS) [14–16], the least mean square (LMS) and its normalized version (NLMS) [17–19], and the affine projection algorithm family [20–22]. The NLMS algorithm is widely used in practice for its stability, ease of implementation, and lower computational complexity in comparison with the RLS algorithm, which involves more complex mathematical operations and requires more computational resources; however, the RLS converges faster than the LMS and NLMS. Another approach that is widely used in the literature to resolve the problem of corrupted speech signals is the blind source separation (BSS) technique. In the literature, we find two widely used BSS structures: the forward BSS (FBSS) [23] and the backward BSS (BBSS) [24]. The FBSS and BBSS structures are often combined with different adaptive algorithms and then used for numerous applications such as speech enhancement and acoustic noise reduction [25–28]. In this paper, we focus on the fast NLMS (FNLMS) algorithm combined with the FBSS structure. The proposed algorithm shows better performance than the classical double forward NLMS (DNLMS) algorithm. This paper is organized as follows: in Section 2 we present the mixing model, and in Section 3 we describe the FBSS structure. In Section 4, we give the full mathematical formulation of the proposed dual fast NLMS (DFNLMS) algorithm, and in Section 5 we give the experimental results of the proposed algorithm in comparison with the classical DNLMS one. Finally, the conclusion is given in Section 6.



Corresponding author. E-mail addresses: [email protected] (A. Sayoud), [email protected] (M. Djendi), [email protected] (A. Guessoum).

https://doi.org/10.1016/j.apacoust.2018.02.002 Received 21 April 2017; Received in revised form 4 January 2018; Accepted 3 February 2018 0003-682X/ © 2018 Elsevier Ltd. All rights reserved.


2. Mixture model

The convolutive mixing model that we consider in this paper is shown in Fig. 1. We consider two independent sources: a speech signal s(n) and a punctual noise source b(n). At the output, we observe two convolutive mixtures of these two sources through the two impulse responses h1(n) and h2(n). In this model, we assume that the direct acoustic paths are equal to unit impulse responses [3]. The observed signals at the output of the model of Fig. 1 are given by the following relations:

m1(n) = s(n) + h1(n) ∗ b(n)   (1)

m2(n) = b(n) + h2(n) ∗ s(n)   (2)

where '∗' symbolizes the convolution operation and h1(n) and h2(n) represent the cross-coupling effects between the channels.

Fig. 1. Convolutive mixture model.

3. Forward BSS structure

The forward and backward BSS structures have been largely investigated over the last ten years [3,17,27,28] for speech enhancement and acoustic noise reduction applications. The combination of these two structures with several types of adaptive filtering algorithms has given new insight in telecommunication systems. In this paper, we focus on the forward BSS (FBSS) structure, which is presented in Fig. 2. The objective of the FBSS approach is to estimate the two source signals s(n) and b(n) by using only the two noisy observations m1(n) and m2(n). The separation of speech and noise by the FBSS structure is based on the assumption of statistical independence of the source signals. Furthermore, the FBSS needs an adaptive algorithm to recover the original signals. However, the FBSS structure presents the disadvantage of distorting the output signals when the microphones are closely spaced. It was shown theoretically that the correction of these distortions is possible thanks to the equalization of the output signals by post-filtering [12]; therefore, two post-filters can be used at the output of the FBSS structure to compensate for this distortion. In this paper, we are not interested in the post-filter estimation, which is not needed in the loosely spaced microphone situation considered here. The two output signals of the FBSS structure are given by the following relations:

u1(n) = m1(n) − m2(n) ∗ w1(n)   (3)

u2(n) = m2(n) − m1(n) ∗ w2(n)   (4)

The outputs of the FBSS structure can also be expressed by inserting relations (1) and (2) into (3) and (4), respectively, which gives:

u1(n) = b(n) ∗ [h1(n) − w1(n)] + s(n) ∗ [δ(n) − h2(n) ∗ w1(n)]   (5)

u2(n) = s(n) ∗ [h2(n) − w2(n)] + b(n) ∗ [δ(n) − h1(n) ∗ w2(n)]   (6)

where '∗' represents the convolution operation. The optimal solution of the adaptive filters is obtained when h2(n) = w2(n) and h1(n) = w1(n) [12], so the outputs of this structure take the following forms:

u1(n) = s(n) ∗ [δ(n) − h2(n) ∗ h1(n)]   (7)

u2(n) = b(n) ∗ [δ(n) − h1(n) ∗ h2(n)]   (8)

Based on relations (7) and (8), the outputs u1(n) and u2(n) of the FBSS structure are distorted by the term [δ(n) − h2(n) ∗ h1(n)]; post-filters of the form Pf(n) = 1/[δ(n) − h2(n) ∗ h1(n)] are needed to correct this distortion. An efficient way to compute these post-filters is to use the adaptive techniques described in [12]. This distortion takes place when the microphones are closely spaced. The problem of post-filters is beyond the scope of this paper, and the loosely spaced configuration considered here allows us to avoid it.

4. Proposed algorithm

In this section, we present the mathematical formulation of the new dual forward blind source separation algorithm, based on the use of the fast normalized least mean square (FNLMS) algorithm to update the two cross-filters of the forward structure, as shown in Fig. 3. The fast NLMS algorithm was first proposed in [29] for the acoustic echo cancellation application. In this paper, we propose a new dual forward blind source separation structure based on a two-channel FNLMS algorithm. It is the result of a simplification of the fast transversal filter (FTF) algorithm: the adaptation gains of this dual algorithm are obtained by discarding the backward and forward predictors of the FTF algorithm, using only the calculation structure of the dual Kalman variables and a simple decorrelating technique for the input signals. The outputs u1(n) and u2(n) of the proposed FBSS algorithm of Fig. 3 are given as:

u1(n) = m1(n) − W1^T(n) M2(n)   (9)

u2(n) = m2(n) − W2^T(n) M1(n)   (10)

where M1(n) = [m1(n), m1(n−1), …, m1(n−L+1)]^T and M2(n) = [m2(n), m2(n−1), …, m2(n−L+1)]^T. The update relations of the adaptive filters W1(n) and W2(n) are given as follows:




W1(n+1) = W1(n) − μ1 [u1(n) C1(n)]   (11)

W2(n+1) = W2(n) − μ2 [u2(n) C2(n)]   (12)


Fig. 2. Forward blind source separation (FBSS) structure.

Fig. 3. Proposed dual forward FNLMS algorithm.


where 0 < μ1, μ2 < 2 are two step-sizes that control the convergence behavior of the cross-adaptive filters W1(n) and W2(n), respectively. The vectors C1(n) and C2(n) are the adaptation gains, which are defined as follows:

C1(n) = γ1(n) K1(n)   (13)

C2(n) = γ2(n) K2(n)   (14)

where γ1(n), γ2(n) and K1(n), K2(n) are the likelihood variables and the dual Kalman gains, respectively. The likelihood variables γ1(n) and γ2(n) can be calculated as follows:

γ1(n) = 1 / [1 − K1^T(n) M2(n)]   (15)

γ2(n) = 1 / [1 − K2^T(n) M1(n)]   (16)

The proposed dual Kalman gains that are used in relations (13) and (14) can be calculated as follows (column vectors are written here with a semicolon separating their blocks):

[K1(n); ∗] = [ −e1(n) / (λ α1(n−1) + c0); K1(n−1) ]   (17)

[K2(n); ∗] = [ −e2(n) / (λ α2(n−1) + c0); K2(n−1) ]   (18)

where λ (0 < λ < 1) is an exponential forgetting factor and c0 is a small positive constant used to avoid division by very small values in the absence of the input signal (silence periods). The asterisk in (17) and (18) represents the last, unused element of the extended dual Kalman gain vectors K1(n) and K2(n), respectively. The parameters α1 and α2 are the forward prediction error variances; they are defined as follows:

α1(n) = λa α1(n−1) + e1²(n)   (19)

α2(n) = λa α2(n−1) + e2²(n)   (20)

The prediction errors e1(n) and e2(n) can be calculated using a first-order prediction model as follows:

e1(n) = m2(n) − a1 m2(n−1)   (21)

e2(n) = m1(n) − a2 m1(n−1)   (22)

where a1 and a2 are prediction parameters obtained by minimizing the cost functions E[e1²(n)] and E[e2²(n)], i.e. by setting their derivatives with respect to a1(n) and a2(n), respectively, to zero. This leads to the following relations:

a1(n) = E[m2(n) m2(n−1)] / E[m2²(n−1)] = r1(n) / r2(n)   (23)

a2(n) = E[m1(n) m1(n−1)] / E[m1²(n−1)] = r3(n) / r4(n)   (24)

where r1(n) and r2(n) represent, respectively, the first coefficient of the autocorrelation function of the mixture m2(n) and the power of the mixture m2(n), and r3(n) and r4(n) represent, respectively, the first coefficient of the autocorrelation function of the mixture m1(n) and the power of the mixture m1(n). Relations (23) and (24) can be calculated in a recursive manner as follows:

a1(n) = r1(n) / [r2(n) + ca]   (25)

a2(n) = r3(n) / [r4(n) + ca]   (26)

where r1(n), r2(n), r3(n), and r4(n) are estimated recursively by the following relations:

r1(n) = λa r1(n−1) + m2(n) m2(n−1)   (27)

r2(n) = λa r2(n−1) + m2²(n)   (28)

r3(n) = λa r3(n−1) + m1(n) m1(n−1)   (29)

r4(n) = λa r4(n−1) + m1²(n)   (30)

where λa is a forgetting factor and ca is a small positive constant. We recall here that the parameters r1(n), r2(n), r3(n), and r4(n) can also be computed from the correlation functions of m1(n) and m2(n) as follows: r1(n) = rm2(1), r2(n) = rm2(0), r3(n) = rm1(1), and r4(n) = rm1(0), where rm1(n) and rm2(n) are the correlation functions of m1(n) and m2(n), respectively. A listing of the proposed DFNLMS algorithm is given in Table 1.

Table 1
The proposed DFNLMS algorithm [in this paper].

Initialization part: α1(0) = α2(0) = E0; r2(0) = r4(0) = E0; r1(0) = r3(0) = 0; W1(0) = W2(0) = 0; C1(0) = C2(0) = 0.
Dual forward predictors update: r1(n) = λa r1(n−1) + m2(n) m2(n−1); r2(n) = λa r2(n−1) + m2²(n); r3(n) = λa r3(n−1) + m1(n) m1(n−1); r4(n) = λa r4(n−1) + m1²(n); a1(n) = r1(n)/[r2(n) + ca]; a2(n) = r3(n)/[r4(n) + ca].
Dual forward prediction errors: e1(n) = m2(n) − a1 m2(n−1); e2(n) = m1(n) − a2 m1(n−1).
Dual forward prediction variances: α1(n) = λa α1(n−1) + e1²(n); α2(n) = λa α2(n−1) + e2²(n).
Dual adaptation gains: C1(n) = γ1(n) K1(n); C2(n) = γ2(n) K2(n).
Dual forward filtering errors: u1(n) = m1(n) − W1^T(n) M2(n); u2(n) = m2(n) − W2^T(n) M1(n).
Dual forward filters updates: W1(n+1) = W1(n) − μ1 [u1(n) C1(n)]; W2(n+1) = W2(n) − μ2 [u2(n) C2(n)].
where E0 is an initialization constant.
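To make the listing of Table 1 easier to follow, the sketch below gives one possible Python implementation of the DFNLMS recursions for the two cross-filters. It is a minimal illustration rather than the authors' reference code: the function name dfnlms, the default parameter values (taken from Table 2), and the negative sign placed on the first element of the dual Kalman gains (our reading of the gain convention implied by relations (15)-(18) and (11)-(12)) are assumptions.

```python
import numpy as np

def dfnlms(m1, m2, L=128, mu1=1.8, mu2=1.8, lam=0.9925, lam_a=0.9985,
           c0=1.0, ca=0.001, E0=5.0):
    """Minimal sketch of the DFNLMS recursions listed in Table 1.

    m1, m2 : the two noisy observations of Eqs. (1)-(2).
    Returns u1 (enhanced speech estimate) and u2 (noise estimate)."""
    m1 = np.asarray(m1, float); m2 = np.asarray(m2, float)
    N = len(m1)
    W1 = np.zeros(L); W2 = np.zeros(L)      # cross adaptive filters
    K1 = np.zeros(L); K2 = np.zeros(L)      # dual Kalman gains
    r1 = r3 = 0.0; r2 = r4 = E0             # correlation estimates (Eqs. (27)-(30))
    alpha1 = alpha2 = E0                    # forward prediction-error variances
    u1 = np.zeros(N); u2 = np.zeros(N)

    for n in range(1, N):
        # tapped-delay lines M(n) = [m(n), m(n-1), ..., m(n-L+1)]
        M2v = np.zeros(L); M1v = np.zeros(L)
        k = min(L, n + 1)
        M2v[:k] = m2[n::-1][:k]; M1v[:k] = m1[n::-1][:k]

        # first-order forward predictors (Eqs. (25)-(30))
        r1 = lam_a * r1 + m2[n] * m2[n - 1]; r2 = lam_a * r2 + m2[n] ** 2
        r3 = lam_a * r3 + m1[n] * m1[n - 1]; r4 = lam_a * r4 + m1[n] ** 2
        a1 = r1 / (r2 + ca); a2 = r3 / (r4 + ca)

        # forward prediction errors (Eqs. (21)-(22))
        e1 = m2[n] - a1 * m2[n - 1]; e2 = m1[n] - a2 * m1[n - 1]

        # dual Kalman gains (Eqs. (17)-(18)); the last element is discarded,
        # sign convention assumed from Eqs. (15)-(16) and (11)-(12)
        K1 = np.concatenate(([-e1 / (lam * alpha1 + c0)], K1[:-1]))
        K2 = np.concatenate(([-e2 / (lam * alpha2 + c0)], K2[:-1]))

        # prediction-error variances (Eqs. (19)-(20))
        alpha1 = lam_a * alpha1 + e1 ** 2; alpha2 = lam_a * alpha2 + e2 ** 2

        # likelihood variables and adaptation gains (Eqs. (13)-(16))
        C1 = K1 / (1.0 - K1 @ M2v); C2 = K2 / (1.0 - K2 @ M1v)

        # structure outputs and cross-filter updates (Eqs. (9)-(12))
        u1[n] = m1[n] - W1 @ M2v; u2[n] = m2[n] - W2 @ M1v
        W1 = W1 - mu1 * u1[n] * C1
        W2 = W2 - mu2 * u2[n] * C2
    return u1, u2
```

Under these assumptions, a call such as u1, u2 = dfnlms(m1, m2, L=128) would return the enhanced speech estimate u1(n) and the noise estimate u2(n) for the two observations of Eqs. (1)-(2).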

5. Analysis of simulation results


In this section, we present the simulation results of the proposed algorithm in noise reduction and speech enhancement applications. We consider the mixture model of Fig. 1 to generate the mixture signals m1(n) and m2(n). The source signal is a speech signal of about 4 s, the sampling frequency is fs = 8 kHz, and the point-source noise signal is USASI (United States of America Standards Institute, now ANSI) noise. The mixing signals m1(n) and m2(n) are generated with two cross-coupling impulse responses h1(n) and h2(n), generated according to the model proposed in [3]. In our work, the impulse responses h1(n) and h2(n) represent the cross-paths between the speech signal and the punctual noise source, according to the model given in Section 2. In order to be physically consistent, we have used an established model to generate these impulse responses: according to [3], the cross-coupling impulse responses h1(n) and h2(n) are generated as random sequences shaped by exponentially decaying functions, i.e. h(n) = A e^(−Bn), where A is a scale factor linked to the variance of the random sequence (in our case A = 1) and B is a damping factor that models the absorption of the sound waves by the walls of the car, and is therefore linked to the reverberation time tr, i.e. B = 3 log(10)/tr [3]. In Fig. 4, we show an example of the simulated impulse responses h1(n) and h2(n) with L = 128; the original, mixed, and enhanced signals are shown in Figs. 5–7. To evaluate the quality of the speech signal at the output of the proposed algorithm, the next subsections present the comparative results obtained by the proposed algorithm (DFNLMS) and the classical double NLMS algorithm in terms of the following objective measures.
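As an illustration of this simulation setup, the following Python sketch generates exponentially damped random cross-coupling impulse responses h(n) = A e^(−Bn) with B = 3 log(10)/tr, and builds the two mixtures of Eqs. (1)-(2). The function names, the reverberation time tr = 50 ms, and the white-noise stand-ins for the speech and USASI sources are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def simulated_cross_impulse_response(L=128, A=1.0, tr=0.05, fs=8000, seed=0):
    """Exponentially smoothed random impulse response h(n) = A*exp(-B*t)*g(n),
    with B = 3*log(10)/tr as in the model of [3]; g(n) is a random sequence.
    The reverberation time tr (in seconds) is an assumed value for illustration."""
    rng = np.random.default_rng(seed)
    t = np.arange(L) / fs                      # time axis in seconds
    B = 3.0 * np.log(10.0) / tr                # damping factor linked to tr
    return A * np.exp(-B * t) * rng.standard_normal(L)

def convolutive_mixtures(s, b, h1, h2):
    """Two-microphone convolutive mixing model of Eqs. (1)-(2):
    m1(n) = s(n) + (h1*b)(n), m2(n) = b(n) + (h2*s)(n)."""
    N = len(s)
    m1 = s + np.convolve(b, h1)[:N]
    m2 = b + np.convolve(s, h2)[:N]
    return m1, m2

# toy usage with white-noise stand-ins for the speech and USASI noise sources
fs = 8000
s = np.random.default_rng(1).standard_normal(4 * fs)   # placeholder "speech"
b = np.random.default_rng(2).standard_normal(4 * fs)   # placeholder point noise
h1 = simulated_cross_impulse_response(seed=3)
h2 = simulated_cross_impulse_response(seed=4)
m1, m2 = convolutive_mixtures(s, b, h1, h2)
```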



Fig. 4. Example of simulated impulse responses: h1(n) (left) and h2(n) (right).

These criteria are: (i) the segmental signal-to-noise ratio (SegSNR); (ii) the segmental mean square error (SegMSE); (iii) the system mismatch (SM); and (iv) the cepstral distance (CD). It should be noted that, as we are interested in the enhanced speech signal quality, we focus only on the output u1(n) and the adaptive filter w1(n). The parameters of each algorithm are summarized in Table 2.

Fig. 5. Original signals (left) and their corresponding spectrograms (right): speech signal (above) and noise signal (below).

5.1. Segmental SNR (SegSNR) evaluation

We have evaluated the output segmental SNR of the proposed algorithm and the classical DNLMS one under different test conditions. The SegSNR estimation is based on the following relation:

SegSNR_dB = (10/M) Σ_{m=0}^{M−1} log10 [ Σ_{n=Nm}^{Nm+N−1} |s(n)|² / Σ_{n=Nm}^{Nm+N−1} |s(n) − u1(n)|² ]   (31)

where s(n) and u1(n) are the original and the enhanced speech signals, respectively. The parameters M and N are the number of segments and the segment length, respectively. At the output we get M values of the SegSNR criterion, each one averaged over N samples. The symbol |·| represents the absolute value and log10 is the base-10 logarithm. We recall here that all M segments correspond only to speech-presence periods. The simulation parameters are the same as given in Table 2. The obtained results are shown in Figs. 8, 9 and 10. In these experiments, the input SNR is taken the same at the two noisy observations and equal to −3 dB, 0 dB, and 3 dB, respectively. From these figures, we can clearly observe the good behavior of the proposed algorithm in comparison with the classical DNLMS one. We can also say that the proposed algorithm converges to the optimal solution faster than the DNLMS one. This is observed in all the tests with L = 64, 128, and 256, and the same remark holds when the input signals are highly corrupted by the noise components.

5.2. Segmental MSE (SegMSE) evaluation

In order to quantify the convergence of the adaptive filters of the proposed algorithm and the classical DNLMS, we use a segmental mean square error (SegMSE) criterion given by the following relation:

SegMSE_dB = (10/M) Σ_{m=0}^{M−1} log10 [ (1/N) Σ_{n=Nm}^{Nm+N−1} |s(n) − u1(n)|² ]   (32)

where N is the time-averaging frame length (segment length) of the original and output signals s(n) and u1(n), and M is the number of segments in which the speech signal is absent. Relation (32) shows that the SegMSE criterion is evaluated only in speech-absence periods [3,24]. The simulation parameters are the same as given in Table 2. The obtained results are given in Figs. 11–13. From these figures, we see the faster convergence of the proposed algorithm in comparison with the DNLMS. This good property of fast convergence is due to the fact that the prediction step of the proposed algorithm whitens the input, so the proposed algorithm works under close-to-ideal conditions; this is not the case for the DNLMS algorithm, which is highly penalized by the input signal properties. We also note the robustness of the proposed algorithm when different input SNRs are used, i.e. −3 dB, 0 dB, and 3 dB. Moreover, the proposed algorithm keeps a fast convergence even when the adaptive filter length is large. These two good characteristics are not available in the DNLMS algorithm behavior.
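The two segmental criteria of Eqs. (31) and (32) can be computed with a few lines of code. The sketch below is one possible implementation; the segment length N = 512, the small constant added inside the logarithms, and the fact that frame selection (speech-presence frames for the SegSNR, speech-absence frames for the SegMSE) is left to the caller are assumptions made for illustration.

```python
import numpy as np

def seg_snr_db(s, u1, N=512):
    """Segmental SNR of Eq. (31): (10/M) * sum_m log10( sum|s|^2 / sum|s-u1|^2 ),
    computed over M consecutive frames of N samples."""
    M = len(s) // N
    vals = []
    for m in range(M):
        seg_s = s[m * N:(m + 1) * N]
        seg_e = seg_s - u1[m * N:(m + 1) * N]
        vals.append(np.log10(np.sum(seg_s ** 2) / (np.sum(seg_e ** 2) + 1e-12)))
    return 10.0 * np.mean(vals)

def seg_mse_db(s, u1, N=512):
    """Segmental MSE of Eq. (32): (10/M) * sum_m log10( (1/N) * sum|s-u1|^2 )."""
    M = len(s) // N
    vals = []
    for m in range(M):
        seg_e = s[m * N:(m + 1) * N] - u1[m * N:(m + 1) * N]
        vals.append(np.log10(np.mean(seg_e ** 2) + 1e-12))
    return 10.0 * np.mean(vals)
```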


Fig. 6. A sample of the mixing signals (left) and their corresponding spectrograms (right): mixing signal m1(n) (above) and mixing signal m2(n) (below).

Fig. 7. Estimated speech signal and its corresponding spectrogram obtained by the proposed algorithm.

Fig. 8. SegSNR evaluation of the proposed algorithm and the classical DNLMS one for the adaptive filter length L: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR is −3 dB at the two observations.

Table 2
Simulation parameters of the proposed and the classical DNLMS algorithms.

Proposed algorithm parameters: step-sizes of the adaptive filters w1, w2: μ1 = μ2 = 1.8; exponential forgetting factor: λ = 0.9925; forgetting factor: λa = 0.9985; positive constants: c0 = 1, ca = 0.001; initialization constant: E0 = 5.
Classical double NLMS algorithm parameters: step-sizes of the adaptive filters w1, w2: μ1 = μ2 = 1.8.

5.3. System mismatch (SM) evaluation

In this section, we have evaluated the system mismatch (SM) criterion values obtained by the proposed algorithm and the classical DNLMS one. As we are only interested in the output speech u1(n), we have evaluated this objective SM criterion only on the adaptive filter w1(n). The SM of w1(n) is evaluated according to the following expression:

SM_dB = 20 log10 ( ‖h1 − w1‖ / ‖h1‖ )   (33)

where ‖·‖ represents the Euclidean norm. The simulation parameters are the same as given in Table 2. The obtained results are given in Figs. 14–16. From these figures, we can clearly see that the proposed algorithm behaves more efficiently than the classical DNLMS, and the same conclusions can be drawn as in the previous sections. The fast convergence toward the true coefficients of the cross adaptive filters gives good SegMSE, SegSNR, and SM criteria values. These three criteria confirm the performance of the proposed algorithm in all situations, even when highly noisy observations and long impulse responses are involved.

5.4. Cepstral distance (CD) criterion evaluation

We measure the amount of distortion of the output signals of the two algorithms, i.e. the proposed algorithm and the classical DNLMS, in terms of the cepstral distance criterion, which is estimated as follows:

CD_dB = (10/M) Σ_{m=0}^{M−1} log10 [ Σ_{n=Tm}^{Tm+T−1} (cs(n) − cu1(n))² ]   (34)

where cs(n) = (1/2π) ∫_{−π}^{π} log|S(ω)| e^{jωn} dω and cu1(n) = (1/2π) ∫_{−π}^{π} log|U1(ω)| e^{jωn} dω are the nth real cepstral coefficients of the signals s(n) and u1(n), respectively. We recall here that S(ω) and U1(ω) are the short-time Fourier transforms of the original speech signal s(n) and the enhanced one u1(n), respectively. T is the segment length used to average the CD criterion and M represents the number of segments in which only speech is present.
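The SM and CD criteria of Eqs. (33) and (34) can be sketched as follows. The system mismatch is a direct transcription of Eq. (33); for the cepstral distance, an FFT-based real cepstrum computed on frames of T samples stands in for the inverse Fourier integral definition of cs(n) and cu1(n), and the number of retained cepstral coefficients (16) is an assumption made for illustration.

```python
import numpy as np

def system_mismatch_db(h1, w1):
    """System mismatch of Eq. (33): 20*log10( ||h1 - w1|| / ||h1|| )."""
    return 20.0 * np.log10(np.linalg.norm(h1 - w1) / np.linalg.norm(h1))

def real_cepstrum(frame, n_coef=16):
    """First n_coef real cepstral coefficients c(k) = IFFT(log|FFT(frame)|)."""
    spectrum = np.abs(np.fft.fft(frame)) + 1e-12     # guard against log(0)
    return np.real(np.fft.ifft(np.log(spectrum)))[:n_coef]

def cepstral_distance_db(s, u1, T=512):
    """Segmental cepstral distance in the spirit of Eq. (34):
    (10/M) * sum_m log10( sum_k (cs(k) - cu1(k))^2 ) over frames of T samples."""
    M = len(s) // T
    vals = []
    for m in range(M):
        cs = real_cepstrum(s[m * T:(m + 1) * T])
        cu = real_cepstrum(u1[m * T:(m + 1) * T])
        vals.append(np.log10(np.sum((cs - cu) ** 2) + 1e-12))
    return 10.0 * np.mean(vals)
```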

Fig. 9. SegSNR evaluation of the proposed algorithm and the classical DNLMS one for the adaptive filter length L: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR is 0 dB at the two observations.

Fig. 10. SegSNR evaluation of the proposed algorithm and the classical DNLMS one for the adaptive filter length L: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR is 3 dB at the two observations.

Fig. 11. SegMSE evaluation of the proposed algorithm and the classical DNLMS one. The adaptive filter lengths are: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR at the two observations is −3 dB.

Fig. 12. SegMSE evaluation of the proposed algorithm and the classical DNLMS one. The adaptive filter lengths are: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR at the two observations is 0 dB.

The simulation parameters of this experiment are the same as given in Table 2. The obtained results of the CD criterion are reported in Figs. 17–19. From these three figures, we can confirm that the proposed algorithm distorts the output speech signal less in all situations, i.e. for the different input SNRs (−3 dB, 0 dB, and 3 dB) and the different impulse response lengths (64, 128 and 256 coefficients). A poor behavior of the DNLMS algorithm is noted when L is selected high and the input SNR is selected low.

Fig. 13. SegMSE evaluation of the proposed algorithm and the classical DNLMS one. The adaptive filter lengths are: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR at the two observations is 3 dB.

10

DNLMS [28] Proposed DFNLMS

0 -10 -20 -30 -40 -50 0

1

2

3

4

5

samples

6 4 x 10

DNLMS [28] Proposed DFNLMS

0 -10 -20 -30 -40 -50 -60 0

1

2

3

samples

4

5

6 4 x 10

10

DNLMS [28] Proposed DFNLMS

0 -10 -20 -30 -40 -50 0

1

2

3 4 samples

5

6 4 x10

System Mismatch (SM) in (dB)

10

System Mismatch (SM) in (dB)

System Mismatch (SM) in (dB)

Fig. 14. SM evaluation of the proposed algorithm and the classical DNLMS for the adaptive filter length L: (in left) L = 64, (in middle) L = 128, (in right) L = 256. The input SNR at the two observations is −3 dB. 10

Fig. 15. SM evaluation of the proposed algorithm and the classical DNLMS for the adaptive filter length L: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR at the two observations is 0 dB.

Fig. 16. SM evaluation of the proposed algorithm and the classical DNLMS for the adaptive filter length L: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR at the two observations is 3 dB.

5.5. A comparative study between the proposed DFNLMS and state-of-the-art algorithms with different types of noise

In this section, we compare the performance of the proposed DFNLMS algorithm with the following state-of-the-art algorithms. The first algorithm is the robust forward blind source separation (RFBSS) algorithm [24]; this algorithm was proposed to automatically control the DNLMS and improve its convergence speed in the situation when speech and noise are present together (speech-plus-noise periods). The second algorithm is the wavelet-transform forward blind source separation (WFBSS) algorithm [26], which is a direct implementation of the DNLMS algorithm in the wavelet domain; this algorithm was proposed to improve the convergence speed of the DNLMS one.


Fig. 17. Evaluation of the CD by the proposed algorithm and the classical DNLMS one. The adaptive filter length L: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR at the two observations is −3 dB.

Fig. 18. Evaluation of the CD by the proposed algorithm and the classical DNLMS one. The adaptive filter length L: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR at the two observations is 0 dB.

Fig. 19. Evaluation of the CD by the proposed algorithm and the classical DNLMS one. The adaptive filter length L: (left) L = 64, (middle) L = 128, (right) L = 256. The input SNR at the two observations is 3 dB.

In this comparative study, we have taken the DNLMS algorithm as the reference algorithm. We recall here that, based on the simulations of the previous sections, we take the real and adaptive filter lengths equal to L = 512. The impulse response is that of a room and is composed of about 512 points, which means that the adaptive filters of length 512 model the room impulse response exactly; there is therefore no under-modelling problem in these simulations. All the parameters of each algorithm are summarized in Table 3, and they are selected to give the best performance of each algorithm. In the case of the RFBSS algorithm [24], we have selected the same fixed step-size parameters as for the other algorithms (μ1 = μ2 = 1.95) and a smoothing parameter α = 0.899; these values give the best performance of the RFBSS algorithm in terms of convergence speed and steady-state values. In the WFBSS algorithm [26], we have also selected the same fixed step-size parameters as in the other algorithms in order to make a fair comparison between them, and the discrete wavelet transform (DWT) scale is set to 8, which is a good compromise between fast convergence and low steady-state values. Finally, the best simulation parameters of the proposed DFNLMS algorithm for L = 512 coefficients are given in Table 3.

Table 3
Simulation parameters of the comparative dual noise reduction algorithms, i.e. DNLMS [28], RFBSS [24], WFBSS [26], and the proposed DFNLMS [in this paper].

Input signals: speech signal: a male speaking French; noises: white, car, babble, street; sampling frequency: fs = 8 kHz.
Mixing signals: real filter length: 512; input SNR1 = 0 dB; input SNR2 = 0 dB.
DNLMS [28]: adaptive filter length of w1, w2: 512; fixed step-sizes: μ1 = μ2 = 1.95.
RFBSS [24]: adaptive filter length of w1, w2: 512; fixed step-sizes: μ1 = μ2 = 1.95; α = 0.899; γ = 1 − α.
WFBSS [26]: adaptive filter length of w1, w2: 512; fixed step-sizes: μ1 = μ2 = 1.95; ρ1 = 0.90; ρ2 = 0.75; φ = 0.0002; ξ = γ = 10−6; DWT scale: 8.
Proposed DFNLMS [in this paper]: adaptive filter length of w1, w2: 512; fixed step-sizes: μ1 = μ2 = 1.95; exponential forgetting factor: λ = 0.9982; forgetting factor: λa = 0.9988; positive constants: c0 = 1, ca = 0.0035; initialization constant: E0 = 3.

We have used four kinds of noise taken from the Aurora 3 [30] database; these noise types are white, car, babble and street, and we have used three values of the input SNR, i.e. −6 dB, 0 dB, and 6 dB. We have evaluated three conventional criteria: the SegSNR, the SegMSE and the CD. All the obtained results are presented below. In Fig. 20, we present the simulation results of the four algorithms compared in terms of the segmental SNR (SegSNR). From this figure, we note the clear superiority of the proposed DFNLMS algorithm in comparison with the other ones for the different noise types. We have also noted a behavior of the WFBSS algorithm close to that of the proposed DFNLMS algorithm, especially with babble and street noises, and a close performance between the DNLMS and RFBSS algorithms. From this experiment, we confirm the good performance of the proposed DFNLMS algorithm in cancelling the different noise types at the output. In the second experiment, shown in Fig. 21, we have evaluated the CD criterion for the four algorithms. Here again we have noted the superiority of the proposed algorithm in comparison with the other ones (down to −9.55 dB), as well as a close CD performance between the WFBSS and the proposed DFNLMS algorithm, especially with car noise. The DNLMS algorithm behaves poorly in comparison with the other algorithms.


Fig. 20. SegSNR evaluation of the proposed DFNLMS and the state-of-the-art algorithms, i.e. DNLMS [28], RFBSS [24] and WFBSS [26], with four types of noise (white, car, babble, and street). Filter length L = 512 and input SNRs of −6 dB, 0 dB, and 6 dB.

One thing that must be noted is that all four algorithms enhance the quality of the speech signal at the output, and all of them give CD values below −5 dB, which is a good performance even at low input SNRs (i.e. −6 dB, see Fig. 21). In the third experiment, we have evaluated the final values of the segmental MSE (SegMSE). The obtained results of the four algorithms, i.e. DNLMS, RFBSS, WFBSS and the proposed DFNLMS, are reported in Fig. 22. We note that the SegMSE criterion is evaluated only in speech presence periods at the processing output. The results of Fig. 22 clearly show the superior performance of the proposed DFNLMS algorithm in terms of final MSE values. The closest algorithm to this performance is the RFBSS, especially when the input SNR is low (i.e. −6 dB in the case of white, babble and street noises). The worst performance in this experiment is that of the WFBSS algorithm. Again, this comparative study shows the good behavior of the proposed DFNLMS algorithm in enhancing the speech signal at the output. We can also say that the DFNLMS algorithm distorts the speech signal the least, as its SegMSE is the lowest in comparison with the state-of-the-art algorithms, i.e. DNLMS, RFBSS, and WFBSS. All these experiments prove that the proposed DFNLMS can be a good alternative to the algorithms used in the literature in the field of noise cancellation and speech enhancement, even in very noisy conditions and in diverse situations with different noise types.

Fig. 21. CD evaluation of the proposed DFNLMS and the state-of-the-art algorithms, i.e. DNLMS [28], RFBSS [24] and WFBSS [26], with four types of noise (white, car, babble, and street). Filter length L = 512 and input SNRs of −6 dB, 0 dB, and 6 dB.

Fig. 22. SegMSE evaluation of the proposed DFNLMS and the state-of-the-art algorithms, i.e. DNLMS [28], RFBSS [24] and WFBSS [26], with four types of noise (white, car, babble, and street). Filter length L = 512 and input SNRs of −6 dB, 0 dB, and 6 dB.

6. Conclusion

In this work, we have proposed a new dual algorithm for blind speech quality enhancement and acoustic noise reduction. This new algorithm can be seen as a combination of the forward blind source separation structure and the fast normalized least mean square (FNLMS) algorithm initially proposed for acoustic echo cancellation. The new dual algorithm has shown good convergence behavior under different test conditions and noisy observation levels (highly noisy conditions were investigated throughout this paper). In the first stage of simulation, we evaluated the convergence speed performance in comparison with the original version (i.e. the DNLMS) by means of the system mismatch (SM) criterion. The obtained results have shown that the proposed DFNLMS algorithm is numerically stable and does not need any particular initial condition to converge towards the optimal solution and enhance the corrupted speech signal. The proposed algorithm also reduces the acoustic noise in blind conditions, with no need for a priori information about the source signals to enhance the speech signal at the output. In the second stage of simulation, we compared the performance of the proposed DFNLMS algorithm with state-of-the-art algorithms (i.e. DNLMS, RFBSS, and WFBSS). From the obtained results, we can say that the proposed DFNLMS algorithm shows the best performance in terms of the three objective criteria that support this conclusion, namely the segmental signal-to-noise ratio (SegSNR), the segmental mean square error (SegMSE), and the cepstral distance (CD). Finally, we can confirm that the proposed DFNLMS algorithm can be a good alternative and a good candidate for speech enhancement and noise reduction applications.

References

[1] Loizou PC. Speech enhancement: theory and practice. CRC Press; 2013.
[2] Cohen I, Gannot S. Spectral enhancement methods. Springer handbook of speech processing. Springer; 2008. p. 873–902.
[3] Djendi M, Gilloire A, Scalart P. Noise cancellation using two closely spaced microphones: experimental study with a specific model and two adaptive algorithms. In: Proc IEEE ICASSP, Toulouse, France, vol. 3; 14–19 May 2006. p. 744–8.
[4] Boll SF. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust Speech Signal Process 1979;27(2):113–20.
[5] Lim JS, Oppenheim AV. Enhancement and bandwidth compression of noisy speech. Proc IEEE 1979;67(12):1586–604.
[6] Ephraim Y, Malah D. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Trans Acoust Speech Signal Process 1984;32(6):1109–21.
[7] Ephraim Y, Malah D. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans Acoust Speech Signal Process 1985;33(2):443–5.
[8] Ephraim Y, Van Trees HL. A signal subspace approach for speech enhancement. IEEE Trans Speech Audio Process 1995;3(4):251–66.
[9] Hu Y, Loizou PC. A generalized subspace approach for enhancing speech corrupted by colored noise. IEEE Trans Speech Audio Process 2003;11(4):334–41.
[10] Williamson DS, Wang Y, Wang D. A two-stage approach for improving the perceptual quality of separated speech. In: Proc ICASSP; 2014. p. 7034–8.
[11] Widrow B, Stearns SD. Adaptive signal processing. Englewood Cliffs, NJ: Prentice-Hall; 1985.
[12] Djendi M, Gilloire A, Scalart P. New frequency domain post-filters for noise cancellation using two closely spaced microphones. In: Proc EUSIPCO, Poznan, vol. 1; 3–8 Sep. 2007. p. 218–21.
[13] Rakesh P, Kumar TK. A novel RLS adaptive filtering method for speech enhancement. Electr Comput Energetic 2015;9(2).
[14] Cioffi J, Kailath T. Fast recursive least squares transversal filters for adaptive filtering. IEEE Trans Acoust Speech Signal Process 1984;ASSP-32:304–37.
[15] Djendi M, Henni R, Sayoud A. A new dual forward BSS based RLS algorithm for speech enhancement. In: International conference on engineering and MIS, ICEMIS 2016, Agadir, Morocco; 2016.
[16] Sayed AH. Fundamentals of adaptive filtering. Wiley; 2003.
[17] Al-Kindi MJ, Dunlop J. Improved adaptive noise cancellation in the presence of signal leakage on the noise reference channel. Signal Process 1989;17(3):241–50.
[18] Vanus J, Styskala V. Application of optimal settings of the LMS adaptive filter for speech signal processing. In: Proc 2010 international multi-conference on computer science and information technology; October 2010. p. 767–74.
[19] Djendi M, Bendoumia R. Improved subband-forward algorithm for acoustic noise reduction and speech quality enhancement. Appl Soft Comput 2016;42:132–43.
[20] Madhavan G, Bruin HD. Crosstalk resistant adaptive noise cancellation. Ann Biomed Eng 1990;18:57–67.
[21] Gabrea M. Double affine projection algorithm-based speech enhancement algorithm. In: Proc IEEE ICASSP, Montréal, Canada, vol. 2; April 2003. p. 904–7.
[22] Djendi M, Scalart P. Double pseudo affine projection algorithm for speech enhancement and acoustic noise reduction. In: Proc IEEE EUSIPCO, Bucharest, Romania, vol. 1; 27–31 Aug. 2012. p. 2080–4.
[23] Kuo SM, Peng WM. Asymmetric crosstalk-resistant adaptive noise canceler. In: Proc IEEE workshop on signal processing systems; October 1999. p. 605–14.
[24] Zoulikha M, Djendi M. A new regularized forward blind source separation algorithm for automatic speech quality enhancement. Appl Acoust 2016;112:192–200.
[25] Mirchandani G, Zinser RL, Evans JB. A new adaptive noise cancellation scheme in the presence of crosstalk. IEEE Trans Circuits Syst 1992;39(10):681–94.
[26] Ghribi K, Djendi M, Berkani D. A new wavelet-based forward BSS algorithm for acoustic noise reduction and speech quality enhancement. Appl Acoust 2016;105:55–66.
[27] Zinser RL, Mirchandani G, Evans JB. Some experimental and theoretical results using a new adaptive filter structure for noise cancellation in the presence of crosstalk. In: Proc ICASSP, Tampa, vol. 3; 1985. p. 1253–6.
[28] Van Gerven S, Van Compernolle D. Feedforward and feedback in a symmetric adaptive noise canceller: stability analysis in a simplified case. In: Proc EUSIPCO, Brussels, Belgium, vol. 1; 24–27 Aug. 1992. p. 1081–4.
[29] Benallal A, Arezki M. A fast convergence normalized least-mean-square type algorithm for adaptive filtering. Int J Adapt Control Signal Process 2014;28(10):1073–108.
[30] Hu Y, Loizou PC. Evaluation of objective quality measures for speech enhancement. IEEE Trans Audio Speech Lang Process 2008;16(1).