Blind speech intelligibility enhancement by a new dual modified predator-prey particle swarm optimization algorithm

Applied Acoustics 141 (2018) 125–135 Contents lists available at ScienceDirect Applied Acoustics journal homepage: www.elsevier.com/locate/apacoust ...


Sofiane Fisli (a,b), Mohamed Djendi (a,⁎)

(a) University of Blida 1, Signal Processing and Image Laboratory (LATSI), Blida, Algeria
(b) Université 8 Mai 1945 Guelma, Laboratoire d’Automatique et Informatique de Guelma (LAIG), Guelma, Algeria

Keywords: Acoustic noise cancellation; Particle swarm optimization; Predator-prey PSO; Output signal-to-noise ratio (SNR); System mismatch (SM)

Abstract

This paper addresses the problem of acoustic noise cancellation by adaptive filtering algorithms. To deal with acoustic noise reduction and speech enhancement problems, we propose to use the modified predator-prey particle swarm optimization (MPPPSO) to design a new dual adaptive noise canceller based on swarm intelligence heuristic search. The proposed dual MPPPSO algorithm improves the convergence speed of the single-channel PPPSO algorithm when a large filter length is used. The proposed algorithm also leads to a lower steady-state error than the single-channel PPPSO algorithm, which fails with large filter lengths and non-stationary inputs. The proposed dual MPPPSO algorithm shows a significant improvement in the system mismatch (SM) and output signal-to-noise ratio (SNR) values. We present simulation results of the proposed dual MPPPSO algorithm that confirm its good performance in comparison with the single-channel PPPSO and the two-channel normalized least mean square (2C-FNLMS) algorithms.

1. Introduction

Acoustic noise cancelling refers to improving the quality of a speech signal that has been degraded by different types of noise. Several methods have been proposed to solve the problem of adaptive noise cancellation (ANC) [1] by means of adaptive filtering algorithms. The most widely used adaptive filtering algorithm is the least mean square (LMS) [2]; other algorithms are based on stochastic and meta-heuristic optimization techniques such as the artificial bee colony algorithm (ABC) [3], the genetic algorithm (GA) [4,5], and particle swarm optimization (PSO) [5]. The LMS-based algorithms, which are widely used owing to their simplicity of implementation and computation, suffer from the local minima problem, and the global minimum is seldom reached. Stochastic and meta-heuristic optimization algorithms are able to avoid this local minima problem, and various meta-heuristic approaches have been adopted to solve the ANC problem. In [7,8], the authors suggested using the adaptive genetic algorithm, the standard particle swarm optimization and its derived versions, the gravitational search algorithm (GSA) and the bat algorithm (BA) in speech enhancement and acoustic noise reduction applications. In this paper, we propose a new dual modified predator-prey particle swarm optimization (MPPPSO) algorithm that can be used as a blind speech signal enhancer (we only assume knowledge of the noisy observations, which in this paper are generated by a convolutive mixing model [9]).

The proposed MPPPSO algorithm is based on the combination of the single-channel predator-prey particle swarm optimization (PPPSO) [10,11] and the forward blind source separation (FBSS) structure.

This paper is organized as follows. In Section 2, we present the convolutive mixing model that generates the noisy observations, and we focus on the forward blind source separation (FBSS) structure. Section 3 describes the two-channel normalized least mean square (2C-FNLMS) and the two-channel variable step size forward (2C-VSSF) algorithms. Our proposed algorithm is presented in Section 4, and Section 5 is devoted to the simulation results and discussion. Finally, we conclude our work in Section 6.

2. Problem statement

2.1. Simplified convolutive mixing model

In this paper, we consider the simplified convolutive mixing model proposed in [9]. In Fig. 1, the input signals $s(n)$ and $b(n)$ are the speech signal and the punctual noise, respectively. According to Fig. 1, two microphones placed at the output of the mixing system provide two noisy observations $p_1(n)$ and $p_2(n)$. The parameters $h_{12}(n)$ and $h_{21}(n)$ represent the cross-FIR filters between the two channels. We suppose that the speech and the noise signals are statistically independent. The two noisy observations are given by:

Corresponding author. E-mail address: [email protected] (M. Djendi).

https://doi.org/10.1016/j.apacoust.2018.07.006 Received 21 February 2018; Received in revised form 10 June 2018; Accepted 5 July 2018 0003-682X/ © 2018 Elsevier Ltd. All rights reserved.

$$p_1(n) = s(n) + b(n) \ast h_{21}(n) \qquad (1)$$

$$p_2(n) = b(n) + s(n) \ast h_{12}(n) \qquad (2)$$

where $(\ast)$ represents the convolution operator.

Fig. 1. Two-microphone mixing model with two cross-FIR filters.

2.2. Acoustic noise cancelling by forward blind source separation (FBSS) structure

In this paper, we consider the forward blind source separation (FBSS) structure shown in Fig. 2 [9,13,14]. The observed signals $p_1(n)$ and $p_2(n)$ are the inputs of the adaptive FIR filters $w_{12}(n)$ and $w_{21}(n)$, respectively. The output signals $e_1(n)$ and $e_2(n)$ of the FBSS structure are given by:

$$e_1(n) = p_1(n) - p_2(n) \ast w_{21}(n) \qquad (3)$$

$$e_2(n) = p_2(n) - p_1(n) \ast w_{12}(n) \qquad (4)$$

The estimates of the speech and noise signals are obtained when the optimal solutions are reached, i.e. $w_{21}(n) = h_{21}(n)$ and $w_{12}(n) = h_{12}(n)$. In that case we can write $e_1(n) = \hat{s}(n)$ and $e_2(n) = \hat{b}(n)$, where $\hat{s}(n)$ and $\hat{b}(n)$ are given as follows:

$$\hat{s}(n) = s(n) \ast [\delta(n) - h_{12}(n) \ast h_{21}(n)] \qquad (5)$$

$$\hat{b}(n) = b(n) \ast [\delta(n) - h_{12}(n) \ast h_{21}(n)] \qquad (6)$$
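As a quick numerical illustration of (1)–(6), the following minimal sketch builds the two noisy observations from short hypothetical cross-coupling filters, applies the FBSS structure with the optimal solution $w_{21} = h_{21}$, $w_{12} = h_{12}$, and checks that $e_1(n)$ matches (5). The signals are random stand-ins and the filter values are illustrative assumptions, not the ones used in this paper.

```python
import random


def convolve(x, h, n_out):
    """First n_out samples of the linear convolution x * h (causal FIR)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
        for n in range(n_out)
    ]


def mix(s, b, h12, h21):
    """Noisy observations p1(n), p2(n) of Eqs. (1) and (2)."""
    n = len(s)
    p1 = [si + ci for si, ci in zip(s, convolve(b, h21, n))]
    p2 = [bi + ci for bi, ci in zip(b, convolve(s, h12, n))]
    return p1, p2


def fbss(p1, p2, w12, w21):
    """FBSS outputs e1(n), e2(n) of Eqs. (3) and (4) for fixed filters."""
    n = len(p1)
    e1 = [a - c for a, c in zip(p1, convolve(p2, w21, n))]
    e2 = [a - c for a, c in zip(p2, convolve(p1, w12, n))]
    return e1, e2


rng = random.Random(0)
N = 200
s = [rng.gauss(0.0, 1.0) for _ in range(N)]  # random stand-in for speech
b = [rng.gauss(0.0, 1.0) for _ in range(N)]  # random stand-in for the noise
h12 = [0.5, -0.2, 0.1]                       # hypothetical cross-coupling filters
h21 = [0.4, 0.3, -0.1]

p1, p2 = mix(s, b, h12, h21)
e1, e2 = fbss(p1, p2, w12=h12, w21=h21)      # optimal solution w = h

# Eq. (5): e1(n) should equal s(n) * [delta(n) - h12(n) * h21(n)]
cascade = convolve(h12, h21, len(h12) + len(h21) - 1)
post_filter = [1.0 - cascade[0]] + [-c for c in cascade[1:]]
s_hat = convolve(s, post_filter, N)
max_err = max(abs(a - c) for a, c in zip(e1, s_hat))
```

The residual filter $\delta(n) - h_{12}(n) \ast h_{21}(n)$ is why the output of the forward structure is a slightly distorted version of the speech rather than the speech itself.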

In addition, the filter coefficients are updated by the adaptation algorithm using the error signals $e_1(n)$ and $e_2(n)$: a selected algorithm updates the coefficients of the adaptive filter whose output gives the estimated noise. Furthermore, the selected algorithm keeps minimizing the mean square error until the adaptive system reaches the best estimate of the real filter. The convergence speed performance of the selected algorithms is based on the minimum mean square error (MMSE) criterion. The cost function of the ANC problem with the BSS structure [15] is defined in (7) as follows:

$$\mathrm{MSE} = \frac{1}{L} \sum_{n=0}^{L} e_i(n)^2 \qquad (7)$$

where $L$ is the length of the input sequence and $i = 1, 2$ is the channel index.

Fig. 2. Two-channel forward blind source separation (FBSS) structure.

3. State-of-the-art two-channel adaptive algorithms

3.1. Two-channel normalized least mean square with fixed step-size (2C-FNLMS)

Several adaptive filtering algorithms have been suggested to adapt the cross-adaptive FIR filters. The most popular one is the normalized least mean square (NLMS), which is frequently used due to its simplicity of implementation [16]. The NLMS algorithm is used in ANC to identify the impulse responses $h_{21}(n)$ and $h_{12}(n)$. The framework of the two-channel forward NLMS algorithm with fixed step-size (2C-FNLMS) is presented in Table 1 [13].

Table 1. Two-channel normalized least mean square (2C-FNLMS) [13].

Computation details:
1. Initialize $w_{12}(-1) = 0$, $w_{21}(-1) = 0$
2. for $n = 1 : k$ do
   Estimation of the output signals:
3.   $e_1(n) = p_1(n) - w_{21}^T(n-1)\, p_2(n)$
4.   $e_2(n) = p_2(n) - w_{12}^T(n-1)\, p_1(n)$
   Filter coefficient adaptation:
5.   $w_{12}(n) = w_{12}(n-1) + \mu_{12}\, e_2(n)\, \dfrac{p_1(n)}{\xi + \|p_1(n)\|^2}$
6.   $w_{21}(n) = w_{21}(n-1) + \mu_{21}\, e_1(n)\, \dfrac{p_2(n)}{\xi + \|p_2(n)\|^2}$
7.   $n = n + 1$
8. end for

Variables:
$L$: adaptive filter length.
$k$: number of iterations.
$w_{12}$: adaptive FIR filter, $w_{12}(n) = [w_{12,0}(n), w_{12,1}(n), \ldots, w_{12,L-1}(n)]^T$.
$w_{21}$: adaptive FIR filter, $w_{21}(n) = [w_{21,0}(n), w_{21,1}(n), \ldots, w_{21,L-1}(n)]^T$.
$p_1(n) = [p_1(n), p_1(n-1), \ldots, p_1(n-L+1)]^T$.
$p_2(n) = [p_2(n), p_2(n-1), \ldots, p_2(n-L+1)]^T$.
$\mu_{12}$: first fixed step-size, $0 < \mu_{12} < 2$.
$\mu_{21}$: second fixed step-size, $0 < \mu_{21} < 2$.
$\xi$: small positive constant that avoids division by zero.

3.2. Two-channel variable step size forward algorithm (2C-VSSF)

Although the NLMS algorithm has been applied successfully in many speech processing applications such as ANC and AEC, it still has difficulty resolving the conflict between fast convergence and low misadjustment. In order to improve the performance of the NLMS algorithm, many NLMS variants based on variable step size techniques have been developed. In this subsection, we present in Table 2 the framework of the two-channel variable step size forward algorithm (2C-VSSF) [14], which we use in the comparison with our proposed algorithm.

Table 2. Two-channel variable step size forward algorithm (2C-VSSF) [14].

Computation details:
1. Initialize $w_{12}(-1) = 0$, $w_{21}(-1) = 0$, $g_{12}(-1) = 0$, $g_{21}(-1) = 0$
2. for $n = 1 : k$ do
   Estimation of the output signals:
3.   $e_1(n) = p_1(n) - w_{21}^T(n-1)\, p_2(n)$
4.   $e_2(n) = p_2(n) - w_{12}^T(n-1)\, p_1(n)$
   Filter adaptation:
5.   $w_{12}(n) = w_{12}(n-1) + \mu_{12}(n)\, e_2(n)\, \dfrac{p_1(n)}{\xi + \|p_1(n)\|^2}$
6.   $w_{21}(n) = w_{21}(n-1) + \mu_{21}(n)\, e_1(n)\, \dfrac{p_2(n)}{\xi + \|p_2(n)\|^2}$
   Step-size adaptation:
7.   $\mu_{12}(n) = \mu_{12,\max}\, \dfrac{\|g_{12}(n)\|^2}{\|g_{12}(n)\|^2 + \delta}$
8.   $\mu_{21}(n) = \mu_{21,\max}\, \dfrac{\|g_{21}(n)\|^2}{\|g_{21}(n)\|^2 + \delta}$
9.   $g_{12}(n) = \alpha_1\, g_{12}(n-1) + (1-\alpha_1)\, \dfrac{e_2(n)\, e_1(n-m)}{\|p_1(n)\|^2 + \gamma}$
10.  $g_{21}(n) = \alpha_2\, g_{21}(n-1) + (1-\alpha_2)\, \dfrac{e_1(n)\, e_2(n-m)}{\|p_2(n)\|^2 + \gamma}$
11.  $n = n + 1$
12. end for

Variables:
$L$: adaptive filter length.
$k$: number of iterations.
$w_{12}$: adaptive FIR filter, $w_{12}(n) = [w_{12,0}(n), w_{12,1}(n), \ldots, w_{12,L-1}(n)]^T$.
$w_{21}$: adaptive FIR filter, $w_{21}(n) = [w_{21,0}(n), w_{21,1}(n), \ldots, w_{21,L-1}(n)]^T$.
$p_1(n) = [p_1(n), p_1(n-1), \ldots, p_1(n-L+1)]^T$.
$p_2(n) = [p_2(n), p_2(n-1), \ldots, p_2(n-L+1)]^T$.
$\mu_{12}(n)$: first variable step-size, $0 < \mu_{12}(n) < \mu_{12,\max} < 2$.
$\mu_{21}(n)$: second variable step-size, $0 < \mu_{21}(n) < \mu_{21,\max} < 2$.
$\alpha_1, \alpha_2$: small positive constants defined between 0 and 1.
$\xi, \gamma$: small positive constants.
$\delta$: positive constant that controls the variation of $\mu_{12}(n)$ and $\mu_{21}(n)$.
$m$: delay index, $m = 0, 1, 2, \ldots, L-1$.
$g_{12}(n) = [g_{12,0}(n), g_{12,1}(n), \ldots, g_{12,L-1}(n)]^T$.
$g_{21}(n) = [g_{21,0}(n), g_{21,1}(n), \ldots, g_{21,L-1}(n)]^T$.
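The 2C-FNLMS recursion of Table 1 can be sketched in a few lines. The following is a minimal, unoptimized illustration that assumes zero-valued samples before the start of the signals and adapts both filters at every sample (no voice activity detector); the function names and the toy coupling filter are hypothetical.

```python
import random


def regressor(x, n, L):
    """[x(n), x(n-1), ..., x(n-L+1)], with zeros before the first sample."""
    return [x[n - k] if n - k >= 0 else 0.0 for k in range(L)]


def two_channel_fnlms(p1, p2, L, mu12=1.0, mu21=1.0, xi=1e-6):
    """Forward two-channel NLMS following steps 1-8 of Table 1."""
    w12 = [0.0] * L
    w21 = [0.0] * L
    e1, e2 = [], []
    for n in range(len(p1)):
        v1 = regressor(p1, n, L)
        v2 = regressor(p2, n, L)
        # Steps 3-4: output signals
        e1_n = p1[n] - sum(w * v for w, v in zip(w21, v2))
        e2_n = p2[n] - sum(w * v for w, v in zip(w12, v1))
        # Steps 5-6: normalized coefficient updates
        gain12 = mu12 * e2_n / (xi + sum(v * v for v in v1))
        gain21 = mu21 * e1_n / (xi + sum(v * v for v in v2))
        w12 = [w + gain12 * v for w, v in zip(w12, v1)]
        w21 = [w + gain21 * v for w, v in zip(w21, v2)]
        e1.append(e1_n)
        e2.append(e2_n)
    return w12, w21, e1, e2


# Toy check in a noise-only period (s(n) = 0): w21 should identify h21.
rng = random.Random(1)
h21 = [0.6, -0.3, 0.2, 0.1]                      # hypothetical coupling path
b = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # white noise source
p2 = b
p1 = [sum(h21[k] * b[n - k] for k in range(len(h21)) if n - k >= 0)
      for n in range(len(b))]
w12, w21, e1, e2 = two_channel_fnlms(p1, p2, L=4)
```

In this noise-only toy case $e_1(n)$ depends only on $w_{21}$, so the second filter converges to the coupling path and $e_1(n)$ decays towards zero.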

Table 3. Framework of the proposed dual MPPPSO for FBSS structure [in this paper].

Computation details:
1. Initialize $w_{21}(-1) = 0$, $L$, $L_f$, $N_d$, $N_y$, MaxIt.
2. for $f = 1 : L_f : L$ do
3.   $g = w_{21}(f-1)$, $c_1$, $c_2$, $w_d$, $w_p$, $P$.
4.   Initialize $x_{di}$ by random positions, and $v_{di}$ by random velocities;
5.   for $i = 1 : N_d$ do
6.     if $f(x_{di}) \le f(b_{di})$
7.       set $b_{di} = x_{di}$
8.     endif
9.     if $f(x_{di}) \le f(g)$
10.      set $b_{di} = g$
11.    endif
12.  Initialize $x_{yi}$ by random positions, and $v_{yi}$ by random velocities;
13.  for $i = 1 : N_y$ do
14.    if $f(x_{yi}) \le f(b_{yi})$
15.      set $b_{yi} = x_{yi}$
16.    endif
17.    if $f(x_{yi}) \le f(g)$
18.      set $b_{yi} = g$
19.    endif
20.  while iter $\le$ MaxIt do
21.    for $i = 1 : N_d$ do
22.      By (18), update velocity of $i$th predator particle
23.      By (11), update position of $i$th predator particle
24.      if $f(x_{di}) \le f(b_{di})$
25.        set $b_{di} = x_{di}$
26.      endif
27.      if $f(x_{di}) \le f(g)$
28.        set $b_{di} = g$
29.      endif
30.      for $j = 1 : N_y$ do
31.        By (19), update velocity of $j$th prey particle
32.        By (11), update position of $j$th prey particle
33.        if $f(x_{yj}) \le f(b_{yj})$
34.          set $b_{yj} = x_{yj}$
35.        endif
36.        if $f(x_{yj}) \le f(g)$
37.          set $b_{yj} = g$
38.        endif
39.      endfor
40.    endfor
41.    $w_y = w_{\max} - \dfrac{w_{\max} - w_{\min}}{\mathrm{MaxIt}}\, \mathrm{iter}$, $\; w_d = 0.2 \exp\!\left(-10\, \dfrac{\mathrm{iter}}{\mathrm{MaxIt}}\right) + 0.4$
42.    $c_1 = c_1 - \dfrac{2}{\mathrm{MaxIt}}$, $\; c_2 = c_2 + \dfrac{2}{\mathrm{MaxIt}}$
43.    if $|f(g(\mathrm{iter})) - f(g(\mathrm{iter}-1))| \le k_{th}$
44.      re-initialize the prey's position
45.    endif
46.    iter = iter + 1
47.  endwhile
48.  set $w_{21}(f) = g$
49.  $f = f + 1$
50.  $e_1(f) = p_1(f) - p_2^T(f)\, w_{21}(f-1)$; endfor $f$.

Variables:
$w_{21}$: the adaptive FIR filter.
$L$: adaptive filter length.
$L_f$: the frame length.
$N_d$: number of predator particles.
$N_y$: number of prey particles.
MaxIt: maximum number of MPPPSO iterations.
$g$: global best position.
$c_1$: cognitive parameter.
$c_2$: social parameter.
$w_d$: predator inertia weight.
$w_y$: prey inertia weight.
$w_{\max}$, $w_{\min}$: maximum and minimum values of the prey inertia weight, chosen to be 0.9 and 0.2.
$P$: binary variable.
$x_{di}$: the $i$th predator position.
$v_{di}$: the $i$th predator velocity.
$b_{di}$: the best position of the $i$th predator.
$x_{yi}$: the $i$th prey position.
$v_{yi}$: the $i$th prey velocity.
$b_{yi}$: the best position of the $i$th prey.
$f(\cdot)$: objective function, where $f(x) = \dfrac{1}{L_f} \sum_{n=0}^{L_f} [p_1(f) - p_2^T(f)\, x]^2$.
$p_1(f)$, $p_2(f)$: vectors representing the $f$th frame of the mixing signals.
$e_1(f)$: vector representing the $f$th frame of the denoised signal.
Initialization 1: initialize a population of predator particles with random positions $x_{di}$ and random velocities $v_{di}$ in a given search space ($x_{di}$ representing a possible solution).
Initialization 2: initialize a population of prey particles with random positions $x_{yi}$ and random velocities $v_{yi}$ in a given search space ($x_{yi}$ representing a possible solution).
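The per-frame objective function $f(x)$ of Table 3 and the stagnation test of its step 43 can be sketched as follows. This is one possible reading, under the assumption that $p_2^T(f)\,x$ denotes the output of the candidate FIR filter $x$ driven by the frame of $p_2$, with samples before the frame taken as zero; the helper names and the threshold value are hypothetical.

```python
def frame_cost(x, p1_frame, p2_frame):
    """Mean squared value of e1 = p1 - (p2 filtered by x) over one frame."""
    total = 0.0
    for n, p1_n in enumerate(p1_frame):
        # Output of the candidate FIR filter x at time n (zero initial state)
        y = sum(x[k] * p2_frame[n - k] for k in range(len(x)) if n - k >= 0)
        total += (p1_n - y) ** 2
    return total / len(p1_frame)


def stagnated(cost_now, cost_prev, k_th=1e-12):
    """Step 43 of Table 3: the best cost has stopped changing."""
    return abs(cost_now - cost_prev) <= k_th


# Toy frame: p1 is exactly p2 filtered by a hypothetical 2-tap filter h.
h = [0.5, -0.25]
p2_frame = [1.0, 2.0, -1.0, 0.5]
p1_frame = [sum(h[k] * p2_frame[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(p2_frame))]

cost_at_h = frame_cost(h, p1_frame, p2_frame)           # true filter
cost_at_zero = frame_cost([0.0, 0.0], p1_frame, p2_frame)  # all-zero filter
```

Each particle position is a candidate coefficient vector, so one call to `frame_cost` corresponds to one evaluation of $f(x_{di})$ or $f(x_{yi})$ in Table 3.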


Fig. 3. Example of simulated impulse response. (In left): h12 (n) , and (in right): h21 (n) .

Fig. 4. Source and noisy signals obtained with input SNR1 = SNR2 = −3 dB, filter length L = 100, and b(n) a white Gaussian noise: (a) the original speech s(n), (b) noise signal b(n), (c) noisy speech p1(n), (d) noisy speech p2(n).

Table 4. Simulation parameters of the proposed dual MPPPSO, PPPSO, 2C-FNLMS and 2C-VSSF algorithms.

Algorithm: Parameters
Single-channel PPPSO [10,11]: $N_y = 18$; $N_p = 8$; $c_1 = c_2 = 2$; $w_{\min} = 0.2$; $w_{\max} = 0.9$; $P = 0.00055$; maxit = 500; search space $[-10, 10]^n$.
2C-FNLMS [13]: $\mu_{12} = \mu_{21} = 1$.
2C-VSSF [14]: $\mu_{12}(0) = 0.2$, $\mu_{12,\max} = 2$, $\mu_{21}(0) = 0$, $\mu_{21,\max} = 1$, $\alpha_1 = 0.98$, $\alpha_2 = 0.60$, $\delta = 10^{-4}$, $\gamma = \xi = 10^{-6}$.
Proposed dual MPPPSO [in this paper]: $N_d = 18$; $N_y = 8$; $c_1 = c_2 = 2$; $w_{\min} = 0.2$; $w_{\max} = 0.9$; $P = 0.00055$; maxit = 500; search space $[-10, 10]^n$.

4. The concept of the proposed dual modified predator-prey PSO algorithm for FBSS structure

In this section, we give the exact mathematical formulation of the proposed dual modified predator-prey PSO algorithm. First, we introduce the concept of the single-channel predator-prey PSO algorithm.

4.1. Predator-prey PSO algorithm for FBSS structure

Particle swarm optimization (PSO) [6,17] is a population-based search algorithm inspired by the social behavior of birds, bees or schools of fish. Each agent of the swarm is characterized by a position vector in the solution space and a velocity vector that determines the next movement of this agent. Each particle updates its velocity using its current velocity, its own best position and the global best position. The PSO process is then repeated until a stopping criterion is reached. It has been shown that PSO can efficiently solve difficult optimization problems. The velocity and position update equations are given by:

$$v_i^{k+1} = w\, v_i^k + c_1 r_1 (b_i^k - x_i^k) + c_2 r_2 (g^k - x_i^k) \qquad (8)$$

$$x_i^{k+1} = x_i^k + v_i^{k+1} \qquad (9)$$

where $v_i^k$ and $x_i^k$ are the velocity and position of the $i$th particle at iteration $k$; $b_i^k$ is the best position visited by the $i$th particle; $g^k$ is the best position that the particles have ever found; $r_1$ and $r_2$ are random numbers in the range [0, 1]; finally, $w$, $c_1$ and $c_2$ are the inertia weight, cognitive and social parameters, respectively. According to the acoustic noise cancelling (ANC) algorithm proposed in [7,8], we consider the following steps, which can be applied to solve the ANC problem:

• Step 1: Randomly generate the particles, each particle's position representing a possible solution for the adaptive FIR filter h(n).
• Step 2: The input signal s(n) is segmented into frames using a 25 ms time window and processed frame by frame.
• Step 3: Evaluate the objective function for each particle using (7). The cost function is defined as the average error between the desired signal and the output of the adaptive filter in each frame.
• Step 4: Find the best global particle for iteration k.
• Step 5: The position and velocity of each particle are updated using (10) and (11):

$$v_i^{k+1}(f) = w\, v_i^k(f) + c_1 r_1 (b_i^k(f) - x_i^k(f)) + c_2 r_2 (g^k(f) - x_i^k(f)) \qquad (10)$$

$$x_i^{k+1}(f) = x_i^k(f) + v_i^{k+1}(f) \qquad (11)$$

where $f$ is the frame index.
• Step 6: Repeat the above steps 2–5 until the stop criteria are met.

However, PSO suffers from local optimum problems, especially for complex, higher-dimensional objective functions. To solve this problem, the authors of [10,11] introduced the predator-prey PSO algorithm (PPPSO). The PPPSO algorithm is inspired by the behavior of schools of sardines and pods of killer whales. In this technique, the particles are divided into two groups, predators and preys. Predators chase the center of the preys, and preys try to escape from the predators by moving in the search space. This helps the particles avoid local optimal solutions and find the global optimal solution. The velocities of the predators and the preys in the PPPSO are expressed as:

$$v_{di}^{k+1}(f) = w_d\, v_{di}^k(f) + c_1 r_1 (b_{di}^k(f) - x_{di}^k(f)) + c_2 r_2 (g^k(f) - x_{di}^k(f)) \qquad (12)$$

$$v_{yi}^{k+1}(f) = w_y\, v_{yi}^k(f) + c_1 r_1 (b_{yi}^k(f) - x_{yi}^k(f)) + c_2 r_2 (g^k(f) - x_{yi}^k(f)) - P\, a\, \mathrm{sign}(x_{dI}^k(f) - x_{yi}^k(f)) \exp[-b\, |x_{dI}^k(f) - x_{yi}^k(f)|] \qquad (13)$$

where $d$ and $y$ denote the predator and the prey, respectively; $b_{di}$ is the best predator position, $b_{yi}$ is the best prey position, and $g$ is the global best position that both predators and preys have ever found. The subscript $I$ is the index of the predator nearest to each prey particle at iteration $k$, which is selected by the following relation:

$$I = \{k \mid \min_k (|x_{dk} - x_{yi}|)\} \qquad (14)$$

and $w_d$ and $w_y$ are the inertia weights of the predators and the preys, respectively:

$$w_d = 0.2 \exp\!\left(-10\, \frac{\mathrm{iter}}{\mathrm{maxit}}\right) + 0.4 \qquad (15)$$

$$w_y = w_{\max} - \frac{w_{\max} - w_{\min}}{\mathrm{maxit}}\, \mathrm{iter} \qquad (16)$$

where $w_{\max}$ and $w_{\min}$ denote the maximum and minimum values of the prey inertia weight, and maxit is the maximum number of iterations. The constants 0.2 and 0.4 are selected experimentally in accordance with the initial conditions of the inertia weights of the predators and the preys, respectively. Finally, $P$ in (13) represents the probability that the preys escape from the predators, and $a$ and $b$ are the parameters that determine the manner in which the preys escape from the predators. The parameters $a$ and $b$ are given by:

$$a = x_{\mathrm{span}}, \qquad b = \frac{100}{x_{\mathrm{span}}} \qquad (17)$$

where $x_{\mathrm{span}}$ is a span variable.

Fig. 5. Temporal evolution and spectrogram of: (a) original speech signal, (b) estimated speech signal by the proposed dual MPPPSO algorithm, (c) estimated speech signal by the 2C-VSSF algorithm, (d) estimated speech signal by the 2C-FNLMS algorithm, (e) estimated speech signal by the PPPSO algorithm. Input SNR1 = SNR2 = −3 dB. The punctual acoustic noise is white. Filter length is L = 100; the same results are obtained with L = 256.

Fig. 6. Comparison of the SM criterion of the adaptive filter w21(n) obtained by the proposed MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF. The punctual acoustic noise is white. The filter length is L = 100 [top] and L = 256 [bottom]. [Left: input SNR1 = SNR2 = −3 dB], [right: input SNR1 = SNR2 = 0 dB].

Fig. 7. Comparison of SegSNR values obtained by the proposed MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF. The punctual acoustic noise is white. The filter length is L = 100 [top] and L = 256 [bottom]. [Left: input SNR1 = SNR2 = −3 dB], [right: input SNR1 = SNR2 = 0 dB].

Table 5. Simulation parameters of the proposed dual MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF algorithms.

Algorithm: Parameters
Single-channel PPPSO [10–11]: $N_y = 18$; $N_p = 8$; $c_1 = c_2 = 2$; $w_p = 0.7$; $w_y = 0.7$; $P = 0.00055$; MaxIt = 500; search space $[-10, 10]^n$.
2C-FNLMS [13]: $\mu_{12} = \mu_{21} = 1$.
2C-VSSF [14]: $\mu_{12}(0) = 0.2$, $\mu_{12,\mathrm{Max}} = 2$, $\mu_{21}(0) = 0$, $\mu_{21,\mathrm{Max}} = 1$, $\alpha_1 = 0.38$, $\alpha_2 = 0.88$, $\delta = 10^{-5}$, $\gamma = \xi = 10^{-6}$.
Proposed dual MPPPSO [in this paper]: $N_d = 18$; $N_y = 8$; $c_1 = c_2 = 2$; $w_p = 0.7$; $w_y = 0.7$; $P = 0.00055$; MaxIt = 500; search space $[-10, 10]^n$.

The concept of the proposed dual modified predator-prey PSO (MPPPSO) algorithm for FBSS structure

In the classical single-channel PPPSO algorithm, each prey particle moves in the search space following the escape movement from the nearest predator, its best performance $b_{di}$ and the global best position $g$. After some iterations, the escape process may stop progressing when the predator particles get trapped in a local optimal solution. To solve this problem, a new strategy called the dual modified MPPPSO algorithm is proposed in this paper. It consists in keeping the prey particles moving even if the predators are stuck in a local minimum. The details of the proposed dual algorithm are as follows.

First, we use relations (18) and (19) to update the predator and prey velocities:

$$v_{di}^{k+1}(f) = w_d\, v_{di}^k(f) + c_1 r_1 (b_{di}^k(f) - x_{di}^k(f)) + c_2 r_2 (g^k(f) - x_{di}^k(f)) \qquad (18)$$

$$v_{yi}^{k+1}(f) = w_y\, v_{yi}^k(f) + c_1 r_1 (b_{yi}^k(f) - x_{yi}^k(f)) + c_2 r_2 (g^k(f) - x_{yi}^k(f)) - P\, a\, \mathrm{sign}(x_{dj}^k(f) - x_{yi}^k(f)) \exp[-b\, |x_{dj}^k(f) - x_{yi}^k(f)|] \qquad (19)$$

The subscript $j$ represents the index of the $j$th prey, and $w_d$ and $w_y$ are the inertia weights given by (15) and (16), respectively. Analysing relations (18) and (19), it can be seen that the preys escape from all predators, not only from the nearest one; we then evaluate the cost function at every escape movement. If the group of predators is made larger than the group of preys, the number of escape movements increases, which improves the quality of the solution. Finally, in order to reach the global optimum, we re-initialize the preys' positions if the value of the best cost remains almost constant during a number of iterations:

$$|g(i) - g(i-1)| \le k_{th} \qquad (20)$$

where $|\cdot|$ stands for the absolute value, $k_{th}$ is a very small quantity close to zero, and the subscript $i$ represents the index of the $i$th global best cost. Based on the above explanation, the pseudo-code of the proposed modified predator-prey PSO algorithm for FBSS structure is given in Table 3. Note that only the first channel tap-weight adaptation $w_{21}(n)$ is presented, since $w_{12}(n)$ is obtained similarly.
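To make the velocity updates concrete, the following minimal sketch implements the inertia weights of (15) and (16), the escape parameters of (17), and the predator and prey velocity updates of (18) and (19) for a single particle. It assumes one random draw of $r_1$ and $r_2$ per update and treats $P$ as a plain scalar factor; the function names, argument order and numerical values are illustrative assumptions.

```python
import math
import random


def inertia_weights(it, max_it, w_max=0.9, w_min=0.2):
    """Predator and prey inertia weights, Eqs. (15) and (16)."""
    w_d = 0.2 * math.exp(-10.0 * it / max_it) + 0.4
    w_y = w_max - (w_max - w_min) * it / max_it
    return w_d, w_y


def escape_params(x_span):
    """Escape parameters a and b, Eq. (17)."""
    return x_span, 100.0 / x_span


def sign(z):
    return (z > 0) - (z < 0)


def predator_velocity(v, x, best, g, w_d, c1, c2, rng):
    """Eq. (18), applied coordinate by coordinate."""
    r1, r2 = rng.random(), rng.random()
    return [w_d * vi + c1 * r1 * (bi - xi) + c2 * r2 * (gi - xi)
            for vi, xi, bi, gi in zip(v, x, best, g)]


def prey_velocity(v, x, best, g, x_pred, w_y, c1, c2, P, a, b, rng):
    """Eq. (19): PSO terms minus an escape term driven by one predator."""
    r1, r2 = rng.random(), rng.random()
    out = []
    for vi, xi, bi, gi, di in zip(v, x, best, g, x_pred):
        escape = P * a * sign(di - xi) * math.exp(-b * abs(di - xi))
        out.append(w_y * vi + c1 * r1 * (bi - xi) + c2 * r2 * (gi - xi) - escape)
    return out


rng = random.Random(0)
w_d0, w_y0 = inertia_weights(0, 500)
w_d_end, w_y_end = inertia_weights(500, 500)
a, b = escape_params(20.0)  # hypothetical span of the search space [-10, 10]

# A resting prey sitting at its own best and at the global best: only the
# escape term acts, pushing it away from the predator in each coordinate.
x_prey = [1.0, -1.0]
v_new = prey_velocity([0.0, 0.0], x_prey, x_prey, x_prey, [1.5, -3.0],
                      w_y0, 2.0, 2.0, 1.0, a, b, rng)
```

Because the escape term decays exponentially with the predator distance, a nearby predator (first coordinate) produces a strong push while a distant one (second coordinate) has almost no effect.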

5. Simulation results

In this section, we test and compare the performance of the proposed dual MPPPSO algorithm with its standard single-channel PPPSO version [10–12], the 2C-FNLMS [13] and the 2C-VSSF [14] algorithms under various noisy observation conditions. We consider the simplified convolutive mixing model of Fig. 1. The input signal s(n) is a speech signal of a French male speaker. We consider two types of punctual noise b(n): white Gaussian noise and USASI noise (United States of America Standards Institute, now ANSI). The two cross-coupling impulse responses h12(n) and h21(n) are generated by random sequences with exponentially decreasing envelopes [18,19]; these two impulse responses are sampled at 8 kHz and both have L = 100 coefficients. In Fig. 3, we present an example of the simulated impulse responses that we use to generate the noisy signals p1(n) and p2(n), where the source signals are (i) a speech signal and (ii) a Gaussian noise, and the input signal-to-noise ratios (SNRs) at the two microphones are SNR1 = SNR2 = −3 dB (see Fig. 4). In order to evaluate the performance of the proposed dual MPPPSO algorithm, we have used the following criteria:

(i) Time evolution and spectrogram of the output speech signals.

(ii) System mismatch (SM), defined as follows:

$$SM_{dB} = 20 \log\!\left(\frac{\|h_{21} - w_{21}\|}{\|h_{21}\|}\right) \qquad (21)$$

where $\|h_{21} - w_{21}\|$ is the Euclidean distance between the adaptive coefficient vector $w_{21}(n)$ and the real filter vector $h_{21}(n)$, and $\|h_{21}\|$ is the Euclidean norm of $h_{21}$.

(iii) Segmental signal-to-noise ratio (SegSNR) between the enhanced speech signal at the output and the original signal, expressed by:

$$SegSNR_{dB} = 10 \log\!\left(\frac{\sum_{i=0}^{U-1} |s(i)|^2}{\sum_{i=0}^{U-1} |s(i) - e_1(i)|^2}\right) \qquad (22)$$

where s(n) and e1(n) are the original speech and the enhanced output, respectively. The parameter U represents the number of samples needed to obtain the average values of the output SNR. We note that we have used a manual voice activity detector (MVAD) mechanism in all algorithms. The MVAD allows the filter w12(n) to be adapted only in speech-only periods, and the second filter w21(n) in silence-only periods [20]. We recall that the noisy observations p1(n) and p2(n) are processed frame by frame in the proposed dual MPPPSO algorithm; each frame has 256 samples, which corresponds to 25 ms of speech [8].

5.1. Test with white noise

In the first simulation, we consider the case of a white noise signal with input SNRs equal to SNR1 = SNR2 = −3 dB. The parameter values used for the adjustment of the proposed dual MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF algorithms are summarized in Table 4. The time and frequency domain analysis of the output speech signal, and the SM and SegSNR criteria, are evaluated in this section.

5.1.1. Temporal and frequency domain description of the output speech signals

In Fig. 5, we compare the temporal evolution and spectrogram of the input signal s(n) and the denoised signals e1(n) obtained by each algorithm (proposed MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF). According to Fig. 5, we can easily see that the output signals obtained by the proposed dual MPPPSO and 2C-VSSF algorithms are visually well denoised in comparison with those obtained by the 2C-FNLMS and single-channel PPPSO algorithms. Also, when we analyse the spectrograms of the output speech signals, we can see that the one obtained by the proposed dual MPPPSO is very close to the original speech signal. However, in order to highlight the intelligibility performance and the output speech quality of the proposed dual MPPPSO algorithm, we have used other objective criteria to quantify the performance and then identify the problems of each algorithm. This is carried out in the next sections.

5.1.2. System mismatch (SM) criterion evaluation

According to Fig. 2, the output speech signal is retrieved at the output e1(n) and is controlled by the adaptive filter coefficients w21(n). In this section, we evaluate the convergence speed performance and the steady-state misalignment on these two elements, i.e. e1(n) and w21(n). We use the SM criterion to evaluate the convergence speed performance of the proposed dual MPPPSO algorithm in comparison with the single-channel PPPSO, 2C-FNLMS and 2C-VSSF algorithms. We show in Fig. 6 the SM evolution of the adaptive filter coefficients w21(n) of each algorithm under different input SNR conditions, i.e. SNR1 = SNR2 = −3 dB and SNR1 = SNR2 = 0 dB. The filter length is L = 100 [top of Fig. 6] and L = 256 [bottom of Fig. 6]. The obtained results confirm the good behavior of the proposed dual MPPPSO in terms of steady-state misalignment and convergence speed in both simulations (different SNR and L). However, we observe a degradation of the steady-state misalignment in the tests with low input SNRs.

5.1.3. Segmental SNR (SegSNR) criterion evaluation

In this section, we evaluate the SegSNR criterion of (22) for the four algorithms, i.e. the proposed dual-channel MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF. The obtained SegSNR results are reported in Fig. 7. The filter length is L = 100 [top of Fig. 7] and L = 256 [bottom of Fig. 7]. From this result, we can see an important improvement in terms of SegSNR obtained by the proposed dual MPPPSO in comparison with the other algorithms, especially in the transient regime, in both simulations (different SNR and L).

5.2. Test with USASI noise

In the second simulation, we consider the case of a USASI (United States of America Standards Institute, now ANSI) noise signal used in the mixing model with input SNRs equal to SNR1 = SNR2 = −3 dB. The parameter values used for the adjustment of the proposed dual MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF algorithms are summarized in Table 5. The time and frequency domain analysis of the output speech signal, and the SM and SegSNR criteria, are evaluated in this section.

5.2.1. Temporal and frequency domain description of the output speech signals

In Fig. 8, we show the temporal evolution of the output speech signals obtained by the proposed dual MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF. The simulation parameters of each algorithm are summarized in Table 5. The respective spectrogram of each output speech signal is given in the same figure. One can clearly see that the denoised signals obtained by the four algorithms are close to the original signal.

Fig. 8. Temporal evolution and spectrogram of: (a) original speech signal, (b) estimated speech signal by the proposed dual MPPPSO algorithm, (c) estimated speech signal by the 2C-VSSF algorithm, (d) estimated speech signal by the 2C-FNLMS algorithm, (e) estimated speech signal by the PPPSO algorithm. Input SNR1 = SNR2 = −3 dB. Input noise is USASI. Filter length is L = 100; the same results are obtained with L = 256.

5.2.2. System mismatch (SM) criterion evaluation

In order to test the convergence speed performance of each algorithm, we have evaluated the SM criterion of each algorithm with a real USASI noise as the punctual noise and a real speech signal sampled at 8 kHz. The filter length is L = 100 [top of Fig. 9] and L = 256 [bottom of Fig. 9]. The obtained results are reported in Fig. 9. From Fig. 9, we can easily observe the fast convergence speed and the low misalignment of the proposed dual MPPPSO algorithm in comparison with the other state-of-the-art and competitive algorithms in both simulations (different SNR and L).

Fig. 9. Comparison of the SM criterion of the adaptive filter w21(n) obtained by the proposed MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF. The punctual acoustic noise is USASI. The filter length is L = 100 [top] and L = 256 [bottom]. [Left: input SNR1 = SNR2 = −3 dB], [right: input SNR1 = SNR2 = 0 dB].

5.2.3. Segmental SNR (SegSNR) criterion evaluation

In order to quantify the noise reduction characteristics of the proposed algorithm, we give in Fig. 10 the SegSNR values obtained by each algorithm, i.e. the proposed MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF. The filter length is L = 100 [top of Fig. 10] and L = 256 [bottom of Fig. 10]. Fig. 10 clearly shows the superiority of the proposed algorithm in terms of SegSNR and also in terms of convergence speed in both simulations (different SNR and L). These results confirm again the efficiency of the proposed algorithm in speech enhancement applications.

Fig. 10. Comparison of SegSNR values obtained by the proposed MPPPSO, single-channel PPPSO, 2C-FNLMS and 2C-VSSF. The punctual acoustic noise is USASI. The filter length is L = 100 [top] and L = 256 [bottom]. [Left: input SNR1 = SNR2 = −3 dB], [right: input SNR1 = SNR2 = 0 dB].
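For reference, the two objective criteria used above, the system mismatch of (21) and the segmental SNR of (22), can be computed as in the following minimal sketch. It assumes base-10 logarithms (decibel values), evaluates (22) on consecutive non-overlapping blocks of U samples, and does not guard against silent or error-free blocks; the function names and the toy values are hypothetical.

```python
import math


def system_mismatch_db(h, w):
    """SM of Eq. (21): 20*log10(||h - w|| / ||h||)."""
    num = math.sqrt(sum((hi - wi) ** 2 for hi, wi in zip(h, w)))
    den = math.sqrt(sum(hi ** 2 for hi in h))
    return 20.0 * math.log10(num / den)


def seg_snr_db(s, e1, U):
    """SegSNR of Eq. (22), one value per block of U samples."""
    values = []
    for start in range(0, len(s) - U + 1, U):
        s_blk = s[start:start + U]
        e_blk = e1[start:start + U]
        signal_energy = sum(x * x for x in s_blk)
        error_energy = sum((x - y) ** 2 for x, y in zip(s_blk, e_blk))
        values.append(10.0 * math.log10(signal_energy / error_energy))
    return values


# Toy values: a filter estimate with 10 % relative error, and an output
# that deviates from the clean signal by 10 % of its amplitude.
h21 = [1.0, 0.5, 0.25]
w21 = [0.9 * c for c in h21]
sm = system_mismatch_db(h21, w21)

s = [1.0, -2.0, 3.0, -4.0]
e1 = [1.1 * x for x in s]
seg = seg_snr_db(s, e1, U=2)
```

A 10 % relative coefficient error corresponds to an SM of about −20 dB, and a 10 % amplitude error corresponds to a SegSNR of about 20 dB, which gives a feel for the scales of the SM and SegSNR curves.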

6. Conclusion

This paper addresses the speech enhancement problem with adaptive filtering algorithms based on swarm intelligence heuristic search. We have proposed a new dual modified predator-prey particle swarm optimization algorithm for two-channel blind acoustic noise cancellation. A comparison of four algorithms (single-channel PPPSO, 2C-FNLMS, 2C-VSSF and the proposed dual MPPPSO) has shown that the proposed dual MPPPSO performs well in extracting the speech signal from very noisy observations. We have tested the proposed dual MPPPSO under various noisy observations with two types of real noise to analyze and confirm its good behavior. The system mismatch and segmental signal-to-noise ratio criteria have shown very close behaviors between the proposed dual MPPPSO and the 2C-VSSF, with a slight superiority for the proposed dual MPPPSO when white noise is used in the mixing process. However, when USASI noise is used in the mixing signals (i.e. when the input noise is not white), the proposed dual MPPPSO outperforms the other algorithms in terms of both criteria, i.e. system mismatch and segmental signal-to-noise ratio. According to these results, the proposed dual MPPPSO algorithm can be a good and efficient alternative for speech enhancement and noise reduction applications.

References

[1] Loizou PC. Speech Enhancement: Theory and Practice. 2nd edn. Boca Raton: CRC Press and Taylor and Francis Group; 2013.
[2] Widrow B, Goodlin RC, et al. Adaptive noise cancelling: principles and applications. Proc IEEE 1975;63:1692–716.
[3] Dorigo M, Maniezzo V, Colorni A. Ant system: optimization by a colony of cooperating agents. IEEE Trans Syst Man Cybernet B 1996;26(1):29–41.
[4] Chen B-Sen, Lee Bo-Kuen, Peng S-Chueh. Maximum likelihood parameter estimation of F-ARIMA processes using the genetic algorithm in the frequency domain. Signal Process IEEE Trans 2002;50(2002):2208–20.
[5] Chang CY, Chen DR. Active noise cancellation without secondary path identification by using an adaptive genetic algorithm. IEEE Trans Instrum Meas Sept 2010;59(9):2315–27.
[6] Eberhart R, Kennedy J. A new optimizer using particle swarm theory. In: Proc. Sixth International Symposium on Micro Machine and Human Science (Nagoya, Japan). Piscataway, NJ: IEEE Service Center; 1995.
[7] Kunche P, Reddy KVVS. Metaheuristic Applications to Speech Enhancement. Springer Briefs in Speech: Springer International AG; April 2016.
[8] Prajna K, Reddy KS, Sasi Bhushan Rao G, Uma Maheswari R. A comparative study of BA, APSO, GSA, hybrid PSOGSA and SPSO in dual channel speech enhancement. 2015;18:663.
[9] Djendi M, Scalart P, Gilloire A. Analysis of two-sensors forward BSS structure with post-filters in the presence of coherent and incoherent noise. Speech Commun 2013;55(10):975–87.
[10] Silva A, Neves A, Costa E. An empirical comparison of particle swarm and predator prey optimisation. In: O'Neill M, Sutcliffe RFE, Ryan C, Eaton M, Griffith NJL, editors. Artificial Intelligence and Cognitive Science. AICS 2002. Lecture Notes in Computer Science, vol 2464. Berlin, Heidelberg: Springer; 2015.
[11] Higashitani M, Ishigame A, Yasuda K. Particle swarm optimization considering the concept of predator-prey behavior. In: Proceedings of the 2006 IEEE Congress on Evolutionary Computation. Vancouver, BC.
[12] Djendi M. Advanced techniques for two-microphone noise reduction in mobile communications, Ph.D. dissertation (in French), vol. 1901. France: University of Rennes 1; 2010.
[13] Van Gerven S, Van Compernolle D. Feedforward and feedback in symmetric adaptive noise canceller: stability analysis in a simplified case. In: Proceedings of the European Signal Processing Conference. Brussels, Belgium; August 1992. p. 1081–4.
[14] Bendoumia R, Djendi M. Two-channel variable-step-size forward and backward adaptive algorithms for acoustic noise reduction and speech enhancement. Signal Process 2015;108:226–44.
[15] Elliot SJ, Nelson PA. Active noise control. IEEE Sig Process Mag 1993:12–35.
[16] Haykin S. Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall; 2001.
[17] Chan FTS, Tiawari MK. Swarm Intelligence Focus on Ant and Particle Swarm Optimization. 1st ed. I-Tech Education and Publishing; 2007.
[18] Djendi M, Scalart P, Gilloire A. Noise cancellation using two closely spaced microphones: experimental study with a specific model and two adaptive algorithms. In: Proc IEEE. ICASSP, vol. 3; May 2006. p. 744–7.
[19] Djendi M. Advanced techniques for two-microphone noise reduction in mobile communications, Ph.D. dissertation (in French), vol. 1901. France: University of Rennes 1; 2010.
[20] Hu Yi, Loizou PC. Evaluation of objective quality measures for speech enhancement. 2008;16:229–38.
