Computers and Electrical Engineering 38 (2012) 938–952
Contents lists available at SciVerse ScienceDirect
Computers and Electrical Engineering journal homepage: www.elsevier.com/locate/compeleceng
An efficient stabilized fast Newton adaptive filtering algorithm for stereophonic acoustic echo cancellation SAEC
Mohamed Djendi
University Saad Dahleb of Blida, Signal Processing and Image Laboratory (LATSI), Route de Soumaa, B.P. 270, Blida 09000, Algeria
Article info
Article history: Received 22 October 2011; received in revised form 13 February 2012; accepted 13 February 2012; available online 15 March 2012
Abstract
This paper addresses stereophonic acoustic echo cancellation (SAEC) by adaptive filtering algorithms. We recently proposed a new version of the fast Newton transversal filters (FNTF) algorithm for SAEC applications. In this paper, we propose an efficient modification of this algorithm for the same applications. The new algorithm uses a new, simplified numerical stabilization technique and takes into account the cross-correlation between the inputs of the channels. The basic idea is to introduce a small nonlinearity into each channel, which reduces the inter-channel coherence while remaining unnoticeable for speech because of self-masking. The proposed algorithm keeps the complexity of the original version, which is less than half the complexity of the fastest two-channel FTF version. Simulation results and comparisons with the extended two-channel normalized least mean square (NLMS) and FTF algorithms are presented. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction
Acoustic echo cancellers are indispensable in communication systems such as teleconferencing, where echoes impair the quality of communication. In theory, stereophonic acoustic echo cancellation (SAEC) can be viewed as a simple generalization of the usual single-channel acoustic echo cancellation principle to the two-channel case [1–3]. SAEC aims at far better sound quality and sound localization than earlier systems provided. Quality can be improved by increasing the signal bandwidth and by adding more audio channels to the system; the latter spurred the need for multi-channel acoustic echo cancellers. The two-channel case is the most interesting one, since the more general multi-channel case differs only in complexity.
A basic scheme for SAEC is sketched in Fig. 1, with a transmission room on the left and a receiving room on the right. The transmission room is sometimes referred to as the far end and the receiving room as the near end. When a source (a male or female speaker) emits a signal in the transmission room, the two microphones mic1 and mic2 each receive two components. The first is the direct source signal modified by the paths $G^{\nu_1}$ (captured by mic1) and $G^{\nu_2}$ (captured by mic2). The second is the diffuse noise present in the transmission room, $B^{\nu_1}$ and $B^{\nu_2}$, captured by mic1 and mic2, respectively. These diffuse noises are beyond the scope of this paper and are not taken into account. In the receiving room, as depicted in Fig. 1, the echo is due to the acoustic coupling between the loudspeakers and the microphones in this room. In the scheme of Fig. 1, the acoustic echo paths Ch1 and Ch2 of the local room are modeled by adaptive FIR filters $h^{\nu_1}$ and $h^{\nu_2}$, whose summed outputs produce an estimate $\hat{y}$ of the true echo $y$. The physical impulse responses Ch1 and Ch2 are of infinite length; nevertheless it is assumed that the
Reviews processed and approved for publication by Editor-in-Chief Dr. Manu Malek. E-mail addresses:
[email protected],
[email protected]
0045-7906/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.compeleceng.2012.02.010
M. Djendi / Computers and Electrical Engineering 38 (2012) 938–952
Fig. 1. Schematic diagram of a stereophonic echo canceller. Two adaptive filtering algorithms are used between Room 1 and Room 2 where: Room1 is the transmitting room and Room2 is the receiving room.
filters $h^{\nu_1}$ and $h^{\nu_2}$ are "sufficiently long", in the sense that the tails of Ch1 and Ch2 not modeled by $h^{\nu_1}$ and $h^{\nu_2}$ have low energy and can be neglected. In the sequel, the "true" impulse responses refer only to the first parts of Ch1 and Ch2, which contain most of the energy and are assumed to have the same length L as the model filters $h^{\nu_1}$ and $h^{\nu_2}$.
In SAEC for teleconferencing, a fundamental problem is whether the true impulse responses of the acoustic echo paths can be identified at all. This problem arises from the correlation between the two signals picked up in the remote room. In this respect, SAEC is fundamentally different from traditional mono echo cancellation. A straightforwardly implemented stereophonic canceller would have to track changing echo paths not only in the receiving room but also in the transmission room. For example, the canceller has to reconverge if one talker stops talking and another starts talking at a different location in the transmission room. No adaptive algorithm can track such a change sufficiently fast, so this scheme results in poor echo suppression. Thus, a direct generalization of the mono AEC to the stereo case does not give satisfactory performance.
The problems of SAEC were first described in an early paper [4], and later in [3]. The fundamental problem is that the two channels may carry linearly related signals, which may make the normal equations to be solved by the adaptive algorithm singular. In that case there is no unique solution but an infinite number of solutions, and it can be shown that all solutions except the physically true one depend on the transmission room. As a result, intensive studies have been made of how to handle this properly. Generalization of the solution to the normal equations in a more practical sense was addressed in Refs. [5] and [6].
It was explained that, in practice, the problem is not actually singular but extremely ill-conditioned, because the length of the adaptive filter is shorter than the echo paths of the transmission room. Furthermore, in practice, the transmission room is not completely stationary, i.e. smooth continuous changes exist, which slightly improves the situation by making the problem somewhat less ill-conditioned [7,8]. A complete theory of non-uniqueness and a characterization of the SAEC solution were presented in Refs. [9] and [10]. It is shown there that the only solution to the non-uniqueness problem is to reduce the correlation between the stereo signals; an efficient low-complexity method for this purpose was also given in [8] and [9]. Ref. [11] presents a combination of mono and stereo echo cancellation which has a lower complexity than a pure stereo solution. Currently, attention is focused on other methods that decrease the cross-correlation between the channels in order to obtain well-behaved estimates of the echo paths [12]. The main problem is how to reduce the correlation sufficiently without affecting stereo perception and sound quality. Early examples of SAEC implementations can be found in [13–15]. These solutions were presented before the theory and limitations of SAEC were fully understood, and were mainly based on the use of a single adaptive filter for each return channel.
The performance of SAEC depends on the choice of algorithm more strongly than in the monophonic case. This is easily recognized, since the performance of most adaptive algorithms depends on the condition number of the input signal covariance matrix.
We recall that several other efficient techniques address these problems differently, such as partial updating of the filter coefficients [16,17] and two-error filtering to avoid the channel-coherence problem [18]. In the SAEC application, the condition number is very high, and algorithms such as the LMS or the NLMS, which do not take the coherence between the input signals into account, converge very slowly to the theoretical solution. It is therefore of great interest to study multi-channel adaptive filtering algorithms. A framework for multi-channel adaptive filtering can be found in Refs. [2–5], [16] and [19].
In [1], we proposed a new version of the fast Newton transversal filters (FNTF) algorithm for SAEC applications. Here, we propose an efficient modification of this algorithm for the same applications. The new algorithm performs well in the SAEC case and takes into consideration the correlation effect of the impulse responses. We also
propose, in this paper, a new numerical stabilization technique that preserves the good properties of the prediction part of the proposed algorithm even with a speech signal as input. We describe the basic FNTF algorithm and its modified version, and show simulation results that demonstrate the good performance of the proposed algorithm in SAEC applications in which the acoustic channels are highly correlated.
This paper is organized as follows. Section 2 explains the SAEC problem and describes the fundamental differences between mono and stereo acoustic echo cancellation. Section 3 presents two-channel adaptive filtering algorithms, with a detailed presentation of the two-channel FNTF (2CFNTF) algorithm. Section 4 first gives the existing decorrelating versions of the algorithms used, and then describes the proposed modified two-channel fast Newton transversal (M2CFNTF) algorithm, which takes into account the correlation effects of the channels. Section 5 presents a new numerical technique that stabilizes the proposed M2CFNTF algorithm. Section 6 compares the complexity of the proposed algorithm with that of other algorithms. Finally, simulation results are presented in Section 7.
The notation used in this paper is fairly standard. Boldface symbols are used for vectors and matrices. We also use the following notation: L: length of the adaptive filter; N: length of the predictors; t: discrete time index; $(\cdot)^T$: transpose.
2. The stereophonic acoustic echo cancellation SAEC problem
In our study, we suppose that the distant room system is stationary, linear and time invariant; we have the following relation:
$$(X^{\nu_1})^T G^{\nu_2} = (X^{\nu_2})^T G^{\nu_1} \tag{1}$$
where $G^{\nu_1}$ and $G^{\nu_2}$ stand for the impulse responses of the source-to-microphone acoustic paths in the remote room, as indicated in Fig. 1, and $X^{\nu_1}$ and $X^{\nu_2}$ stand for the vectors of signal samples at the microphone outputs in the same room. Now, consider the following recursive least squares (RLS) cost function (see Fig. 1 for notation):
$$J_{L,t} = \sum_{p=1}^{t} w^{t-p} \left[ y_t - (h_{L,t}^{\nu_1})^T X_{L,t}^{\nu_1} - (h_{L,t}^{\nu_2})^T X_{L,t}^{\nu_2} \right]^2 \tag{2}$$
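The minimization of this cost function leads to the following solution [6]:
$$R_{L,t} \begin{bmatrix} h_{L,t}^{\nu_1} \\ h_{L,t}^{\nu_2} \end{bmatrix} = r_{L,t} \tag{3}$$
where $h_{L,t}^{\nu_1}$ and $h_{L,t}^{\nu_2}$ are the two adaptive filters of the two channels, $w$ ($0 < w \le 1$) is the exponential forgetting factor and $R_{L,t}$ is the correlation matrix, which is given by the following expression:
$$R_{L,t} = \sum_{p=1}^{t} w^{t-p} \begin{bmatrix} X_{L,t}^{\nu_1} \\ X_{L,t}^{\nu_2} \end{bmatrix} \begin{bmatrix} (X_{L,t}^{\nu_1})^T & (X_{L,t}^{\nu_2})^T \end{bmatrix} = \begin{bmatrix} R_{X^{\nu_1} X^{\nu_1}} & R_{X^{\nu_1} X^{\nu_2}} \\ R_{X^{\nu_2} X^{\nu_1}} & R_{X^{\nu_2} X^{\nu_2}} \end{bmatrix} \tag{4}$$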
We note that $r_{L,t}$ represents the correlation vector between the input signals and the output signal in the local room. It is given by:
$$r_{L,t} = \sum_{p=1}^{t} w^{t-p}\, y_t \begin{bmatrix} X_{L,t}^{\nu_1} \\ X_{L,t}^{\nu_2} \end{bmatrix} \tag{5}$$
Our aim is to obtain the optimum filters from Eq. (3). Now, consider the vector:
$$U = \begin{bmatrix} G^{\nu_2} \\ -G^{\nu_1} \end{bmatrix} \tag{6}$$
It can be readily verified using Eq. (1) that $R_{L,t} U = 0$, which means that the matrix $R_{L,t}$ is not invertible. Therefore, there is no unique solution to the problem of minimizing Eq. (2), and the adaptive algorithm may converge to any of the possible solutions, which can be very different from the "true" expected solution $h_{L,t}^{\nu_1} = Ch1$ and $h_{L,t}^{\nu_2} = Ch2$. However, in practical situations there are at least two reasons that make this matrix invertible: (1) the signals $X_{L,t}^{\nu_1}$ and $X_{L,t}^{\nu_2}$ at the outputs of the distant room contain noise components that are uncorrelated, and (2) the filters $h_{L,t}^{\nu_1}$ and $h_{L,t}^{\nu_2}$ that model the impulse responses of the local room are of finite length, so the size of $X_{L,t}^{\nu_1}$ and $X_{L,t}^{\nu_2}$ is much smaller than the length of $G^{\nu_1}$ and $G^{\nu_2}$, and relation (1) is not satisfied. For this reason, the matrix $R_{L,t}$ becomes invertible (but is ill-conditioned because the two input signals are strongly correlated) and the true solution $h_{L,t}^{\nu_1} = Ch1$ and $h_{L,t}^{\nu_2} = Ch2$ can be found accordingly.
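The singularity argument can be checked numerically. The following is a minimal sketch, assuming two hypothetical far-end paths `g1` and `g2` of equal length M and regressors of that same length; these names, the random data and the helper functions are illustrative and not taken from the paper. It verifies that the stacked regressor is orthogonal to the vector U of Eq. (6) at every time index, which is what makes the correlation matrix built from such regressors singular.

```python
import random

def fir_filter(signal, taps):
    """Causal FIR filtering: y[n] = sum_k taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, g in enumerate(taps):
            if n - k >= 0:
                acc += g * signal[n - k]
        out.append(acc)
    return out

def regressor(x, t, length):
    """Vector [x[t], x[t-1], ..., x[t-length+1]]."""
    return [x[t - k] for k in range(length)]

random.seed(0)
M = 8  # length of the hypothetical far-end paths (assumption)
source = [random.gauss(0.0, 1.0) for _ in range(200)]
g1 = [random.uniform(-1.0, 1.0) for _ in range(M)]
g2 = [random.uniform(-1.0, 1.0) for _ in range(M)]
x1 = fir_filter(source, g1)  # signal picked up by the first microphone
x2 = fir_filter(source, g2)  # signal picked up by the second microphone

# U = [g2; -g1], as in Eq. (6)
u = g2 + [-g for g in g1]

# Largest value of |X^T U| over time; Eq. (1) predicts zero.
worst = 0.0
for t in range(2 * M, len(source)):
    stacked = regressor(x1, t, M) + regressor(x2, t, M)
    worst = max(worst, abs(sum(a * b for a, b in zip(stacked, u))))
print("max |X^T U| =", worst)
```

Because every stacked regressor is orthogonal to U, any weighted sum of their outer products maps U to zero. Shortening the regressors below M, or adding independent noise to `x1` and `x2`, breaks the exact orthogonality, in line with the two practical reasons given above.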
3. The adaptive filtering algorithm
In SAEC applications, we use a two-channel adaptive filter. However, performance differs considerably depending on the chosen algorithm. In the following, we present known adaptive algorithms, namely the NLMS, the FTF or fast recursive least squares (FRLS), and the 2CFNTF [1] algorithms. These algorithms are selected for comparison with the proposed stabilized M2CFNTF algorithm.
3.1. Stereophonic version of the NLMS algorithm
The stereophonic version of the NLMS algorithm, which we call 2CNLMS, is proposed in [5] and [20]. It is an improved version of the two-channel LMS algorithm. The stereophonic adaptive filtering formulation of this algorithm (according to Fig. 1) is given by:
$$\begin{bmatrix} h_{L,t}^{\nu_1} \\ h_{L,t}^{\nu_2} \end{bmatrix} = \begin{bmatrix} h_{L,t-1}^{\nu_1} \\ h_{L,t-1}^{\nu_2} \end{bmatrix} + \frac{1}{E_{L,t}^{\nu_1} + E_{L,t}^{\nu_2}} \begin{bmatrix} \mu_1\, e_{L,t}^{\nu_1} X_{L,t}^{\nu_1} \\ \mu_2\, e_{L,t}^{\nu_2} X_{L,t}^{\nu_2} \end{bmatrix} \tag{7}$$
where $\mu_1$, $\mu_2$ are the two step sizes of the two filtering algorithms, and $e_{L,t}^{\nu_1}$ and $e_{L,t}^{\nu_2}$ are the filtering errors of the two channels. In stereophonic applications, and according to Fig. 1, these two errors are equal and given by the following expression:
$$e_{L,t}^{\nu_1} = e_{L,t}^{\nu_2} = e_{L,t} = y_t - (h_{L,t-1}^{\nu_1})^T X_{L,t}^{\nu_1} - (h_{L,t-1}^{\nu_2})^T X_{L,t}^{\nu_2} \tag{8}$$
We note that the normalization terms $E_{L,t}^{\nu_1}$ and $E_{L,t}^{\nu_2}$ of the step size are exponentially smoothed. They are given by:
$$E_{L,t}^{\nu_i} = C\, E_{L,t-1}^{\nu_i} + (1 - C)\,(x_t^{\nu_i})^2, \quad i \in \{1, 2\} \tag{9}$$
where C is the power smoothing factor. In stereophonic applications, the described version of the NLMS algorithm is heavily penalised by the correlation within and between the inputs, and the algorithm behaves badly in this case. To overcome the problem of the correlation between the channels, another version of the NLMS algorithm has been proposed in [5]. This version is presented in Section 4.1.
3.2. Stereophonic version of the FTF algorithm
A precise description of the stereophonic version of the FTF algorithm is beyond the scope of this paper. However, a general analysis of the FTF algorithm can be found in [7] and [21], and stabilized two-channel versions are described in [5,6]. The prediction part of the basic FTF algorithm is duplicated for its two-channel version. The filtering errors are given by Eq. (8) and the filter update equations are given by:
$$\begin{bmatrix} h_{L,t}^{\nu_1} \\ h_{L,t}^{\nu_2} \end{bmatrix} = \begin{bmatrix} h_{L,t-1}^{\nu_1} \\ h_{L,t-1}^{\nu_2} \end{bmatrix} + e_{L,t} \begin{bmatrix} \gamma_{L,t}^{\nu_1}\, \tilde{C}_{L,t}^{\nu_1} \\ \gamma_{L,t}^{\nu_2}\, \tilde{C}_{L,t}^{\nu_2} \end{bmatrix} \tag{10}$$
where $\tilde{C}_{L,t}^{\nu_1}$, $\tilde{C}_{L,t}^{\nu_2}$, $\gamma_{L,t}^{\nu_1}$ and $\gamma_{L,t}^{\nu_2}$ are the two dual Kalman gains and the two likelihood variables for the two channels (see Fig. 1). All these parameters are provided by the prediction part of the algorithm [7,21].
3.3. Stereophonic version of the FNTF algorithm
The two-channel FNTF algorithm that we recently proposed in [1] is a generalization of the mono-channel FNTF algorithm [22–24] to the stereophonic case, in which two microphones are used (Table 1 summarizes this algorithm). It is based on the minimization of the following cost function:
$$J_{L,t} = \sum_{i=0}^{t} \lambda^{t-i} \left[ y_t - (h_{L,t-1}^{\nu_1})^T X_{L,t}^{\nu_1} - (h_{L,t-1}^{\nu_2})^T X_{L,t}^{\nu_2} \right]^2 \tag{11}$$
It is well established that time-recursive minimization of Eq. (11) leads to the following RLS set of equations:
$$h_{L,t}^{\nu_i} = h_{L,t-1}^{\nu_i} - C_{L,t}^{\nu_i}\, e_{L,t}, \quad i \in \{1, 2\} \tag{12}$$
$$e_{L,t} = y_t - \hat{y}_t \tag{13}$$
$$\hat{y}_t = (h_{L,t-1}^{\nu_1})^T X_{L,t}^{\nu_1} + (h_{L,t-1}^{\nu_2})^T X_{L,t}^{\nu_2} \tag{14}$$
$$C_{L,t-1}^{\nu_i} = (R_{L,t}^{\nu_i})^{-1} X_{L,t-1}^{\nu_i}, \quad i \in \{1, 2\} \tag{15}$$
where $R_{L,t}^{\nu_i}$, $i \in \{1, 2\}$, are the $L \times L$ covariance matrices of the input signals.
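For orientation, the exponentially weighted RLS recursion can be sketched as follows. This is the generic textbook form, written on the stacked regressor [X1; X2] with the inverse correlation matrix propagated by the matrix inversion lemma; it uses a positive gain vector, so its sign and time-index conventions may differ from those of Eqs. (12) and (15). The filter values, the initialization constant and the function name `rls_step` are assumptions made for the example.

```python
import random

def rls_step(h, P, x, y, lam=0.99):
    """One exponentially weighted RLS step on the stacked regressor x.

    h: stacked filter [h1; h2], updated in place
    P: inverse correlation matrix (list of lists), updated in place
    x: stacked input [X1; X2], y: microphone sample
    Returns the a priori error, cf. Eq. (13).
    """
    n = len(h)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    err = y - sum(h[i] * x[i] for i in range(n))
    for i in range(n):
        h[i] += gain[i] * err
    # P <- (P - gain * x^T P) / lam, using the symmetry of P
    for i in range(n):
        for j in range(n):
            P[i][j] = (P[i][j] - gain[i] * Px[j]) / lam
    return err

random.seed(2)
L = 3
true_h = [0.8, -0.4, 0.2, 0.5, 0.3, -0.1]  # hypothetical [Ch1; Ch2]
x1 = [random.gauss(0.0, 1.0) for _ in range(400)]
x2 = [random.gauss(0.0, 1.0) for _ in range(400)]  # independent channels
h = [0.0] * (2 * L)
P = [[100.0 if i == j else 0.0 for j in range(2 * L)] for i in range(2 * L)]
for t in range(L - 1, 400):
    x = [x1[t - k] for k in range(L)] + [x2[t - k] for k in range(L)]
    y = sum(a * b for a, b in zip(true_h, x))  # noise-free echo
    rls_step(h, P, x, y)
print([round(v, 4) for v in h])
```

The example uses independent channels on purpose, so that the correlation matrix is well conditioned. The cost per iteration grows with the square of the filter length, which is the motivation for the fast versions discussed in this section.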
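Table 1
The two-channel fast Newton transversal filters (2CFNTF) algorithm proposed in [1].

Forward and backward predictors, $i \in \{1, 2\}$:
$$a_{N,t}^{\nu_i} = a_{N,t-1}^{\nu_i} - e_{N,t}^{\nu_i}\, \tilde{C}_{N,t-1}^{\nu_i} \tag{16}$$
$$b_{N,t}^{\nu_i} = b_{N,t-1}^{\nu_i} - r_{N,t}^{\nu_i}\, \tilde{C}_{N,t}^{\nu_i} \tag{17}$$

Likelihood variables of length (N), $i \in \{1, 2\}$:
$$\gamma_{N+1,t}^{\nu_i} = \frac{\lambda\, \alpha_{N,t-1}^{\nu_i}\, \gamma_{N,t-1}^{\nu_i}}{\alpha_{N,t}^{\nu_i}} \tag{18}$$
$$\tilde{S}_{N+1,t}^{\nu_i} = \frac{e_{N,t}^{\nu_i}}{\lambda\, \alpha_{N,t-1}^{\nu_i}} \begin{bmatrix} 1 \\ -a_{N,t-1}^{\nu_i} \end{bmatrix} \tag{19}$$

Dual Kalman gain of length (N + 1), $i \in \{1, 2\}$:
$$\tilde{C}_{N+1,t}^{\nu_i} = \begin{bmatrix} 0 \\ \tilde{C}_{N,t-1}^{\nu_i} \end{bmatrix} - \tilde{S}_{N+1,t}^{\nu_i} \tag{20}$$
$$\gamma_{N,t}^{\nu_i} = \frac{\gamma_{N+1,t}^{\nu_i}}{1 + \gamma_{N+1,t}^{\nu_i}\, \tilde{C}_{N+1,t}^{N+1,\nu_i}\, r_{N,t}^{\nu_i}} \tag{21}$$
$$\tilde{U}_{N+1,t}^{\nu_i} = \tilde{C}_{N+1,t}^{N+1,\nu_i} \begin{bmatrix} -b_{N,t-1}^{\nu_i} \\ 1 \end{bmatrix} \tag{22}$$

Dual Kalman gain of length (N), $i \in \{1, 2\}$:
$$\begin{bmatrix} \tilde{C}_{N,t}^{\nu_i} \\ 0 \end{bmatrix} = \tilde{C}_{N+1,t}^{\nu_i} - \tilde{U}_{N+1,t}^{\nu_i} \tag{23}$$

Extrapolation part (from N to L), $i \in \{1, 2\}$:
$$\tilde{C}_{L+1,t}^{\nu_i} = \begin{bmatrix} 0 \\ \tilde{C}_{L,t-1}^{\nu_i} \end{bmatrix} - \begin{bmatrix} \tilde{S}_{N+1,t}^{\nu_i} \\ 0_{L-N} \end{bmatrix} \tag{24}$$
$$\begin{bmatrix} \tilde{C}_{L,t}^{\nu_i} \\ 0 \end{bmatrix} = \tilde{C}_{L+1,t}^{\nu_i} - \begin{bmatrix} 0_{L-N} \\ \tilde{U}_{N+1,t}^{\nu_i} \end{bmatrix} \tag{25}$$
$$d_{L,t}^{\nu_i} = d_{L,t-1}^{\nu_i} + \tilde{S}_{N+1,t}^{\nu_i}\, e_{N,t}^{\nu_i} + \tilde{U}_{N+1,t}^{N+1,\nu_i}\, r_{N,t}^{\nu_i} \tag{26}$$
$$\gamma_{L,t}^{\nu_i} = \frac{1}{1 - d_{L,t}^{\nu_i}} \tag{27}$$

Error filtering:
$$\tilde{y}_t = (h_{L,t-1}^{\nu_1})^T X_{L,t}^{\nu_1} + (h_{L,t-1}^{\nu_2})^T X_{L,t}^{\nu_2} \tag{28}$$
$$e_{L,t} = y_t - \tilde{y}_t \tag{29}$$

Filters update, $i \in \{1, 2\}$:
$$h_{L,t}^{\nu_i} = h_{L,t-1}^{\nu_i} - e_{L,t}\, \gamma_{L,t}^{\nu_i}\, \tilde{C}_{L,t}^{\nu_i} \tag{30}$$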
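The extrapolation part of Table 1 is what keeps the cost of the gain update proportional to N rather than L. The sketch below illustrates only this shift-and-correct structure, assuming the form suggested by Eqs. (24) and (25): the previous length-L gain is shifted by one sample, a length-(N + 1) correction is subtracted at the top and another at the bottom, and the last entry is dropped. The function name and the numerical values are hypothetical, and the correction vectors are taken as given rather than computed from predictors.

```python
def extrapolate_gain(c_prev, s_tilde, u_tilde):
    """Shift-and-correct update of a length-L dual gain, after Eqs. (24)-(25).

    c_prev : length-L gain at time t-1
    s_tilde: length-(N+1) forward correction, applied at the top
    u_tilde: length-(N+1) backward correction, applied at the bottom
    """
    L, n1 = len(c_prev), len(s_tilde)
    assert len(u_tilde) == n1 and n1 <= L
    # Eq. (24): shift down by one sample, subtract S~ padded with zeros.
    extended = [0.0] + list(c_prev)
    for k in range(n1):
        extended[k] -= s_tilde[k]
    # Eq. (25): subtract U~ placed at the bottom, then drop the last entry.
    offset = L + 1 - n1
    for k in range(n1):
        extended[offset + k] -= u_tilde[k]
    return extended[:L]

c_prev = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # L = 8 (illustrative)
s_tilde = [0.1, 0.2, 0.3]                           # N + 1 = 3
u_tilde = [0.01, 0.02, 0.03]
c_new = extrapolate_gain(c_prev, s_tilde, u_tilde)
print(c_new)
```

Only 2(N + 1) entries are modified at each step; every other entry of the new gain is a shifted copy of the previous one, which is why the prediction order N, and not L, drives the cost of this part.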
In the above RLS equations, the update of the gain vector $C_{L,t}^{\nu_i}$ requires the update of the inverse covariance matrix. In the FNTF algorithm, this is obtained by the use of two prediction parts of order $N \ll L$. Moreover, instead of updating $C_{L,t}^{\nu_i}$, the FNTF version that we use in this paper updates the so-called dual Kalman gain:
$$\tilde{C}_{L,t}^{\nu_i} = \lambda^{-1} (\gamma_{L,t}^{\nu_i})^{-1}\, C_{L,t}^{\nu_i}, \quad i \in \{1, 2\} \tag{31}$$
where $\gamma_{L,t}^{\nu_i}$, $i \in \{1, 2\}$, are the likelihood variables defined as:
$$\gamma_{L,t}^{\nu_i} = 1 + (\tilde{C}_{L,t}^{\nu_i})^T X_{L,t}^{\nu_i}, \quad i \in \{1, 2\} \tag{32}$$
The extension of the FNTF algorithm to the two-channel case which we propose in this paper is similar to that of the two-channel FTF algorithm. The difference between the two-channel algorithms lies in the calculation of the two dual Kalman gains $\tilde{C}_{L,t}^{\nu_1}$, $\tilde{C}_{L,t}^{\nu_2}$ and the two likelihood variables $\gamma_{L,t}^{\nu_1}$ and $\gamma_{L,t}^{\nu_2}$ that are used to update the adaptive filters $h_{L,t}^{\nu_1}$ and $h_{L,t}^{\nu_2}$.
In the two-channel FNTF algorithm, we use two prediction parts of length N for each channel (or path). The first prediction part works on the first N samples of the input vector, while the second one works on the last N samples of the same vector. To obtain the dual Kalman gains of length L, we use an extrapolation part. The overall equations of the 2CFNTF algorithm are given in Table 1 [1]. In Table 1, $e_{N,t}^{\nu_i}$, $r_{N,t}^{\nu_i}$ and $\gamma_{N+1,t}^{\nu_i}$ (with $i \in \{1, 2\}$) are, respectively, the forward prediction errors, the backward prediction errors and the likelihood variables of order (N + 1) of the 2CFNTF algorithm. All of these quantities are provided by the two prediction parts of the 2CFNTF algorithm.
4. The correlation effect on the algorithms
As explained before, SAEC can be viewed as a straightforward generalization of the single-channel acoustic echo cancellation principle [2]. Fig. 1 shows this technique for one microphone in the receiving room, which is represented by the two echo paths $h^{\nu_1}$ and $h^{\nu_2}$ between the two loudspeakers and the microphone. The two reference signals $X^{\nu_1}$ and $X^{\nu_2}$ from the transmission room are obtained by two microphones in the case of teleconferencing. These signals are derived by filtering from a common source, which gives rise to a non-uniqueness problem that does not arise for single-channel AEC [2,3]. As a result, conventional two-channel least-mean-square (LMS) type adaptive algorithms converge very slowly to the solution, and the two-channel FRLS algorithm must be used. This requirement implies a high computational complexity, which makes a real-time implementation of this algorithm difficult. One solution to this problem with the LMS algorithm has been proposed in [3–5] and [8]. This solution is detailed in [5] and is called the extended two-channel LMS (E2CLMS) algorithm.
4.1. The extended two-channel least mean square (E2CLMS) algorithm
From [5], we recall here the E2CLMS algorithm formulation, which is a direct approximation of the RLS algorithm. The error signal at time t is given by:
$$e_{L,t} = y_t - y_{1,t} - y_{2,t} \tag{33}$$
where
$$y_{i,t} = (h_{L,t-1}^{\nu_i})^T X_{L,t}^{\nu_i}, \quad i \in \{1, 2\} \tag{34}$$
The two filters are updated as follows:
$$h_{L,t}^{\nu_1} = h_{L,t-1}^{\nu_1} + \mu_1\, r_{1,t}^{-1} \left[ X_{L,t}^{\nu_1} - \rho\, r_{12,t}\, r_{22,t}^{-1}\, X_{L,t}^{\nu_2} \right] e_{L,t} \tag{35}$$
$$h_{L,t}^{\nu_2} = h_{L,t-1}^{\nu_2} + \mu_2\, r_{2,t}^{-1} \left[ X_{L,t}^{\nu_2} - \rho\, r_{21,t}\, r_{11,t}^{-1}\, X_{L,t}^{\nu_1} \right] e_{L,t} \tag{36}$$
with
$$r_{ij,t} = (X_{L,t}^{\nu_i})^T X_{L,t}^{\nu_j}, \quad i, j \in \{1, 2\} \tag{37}$$
and
$$r_{i,t} = r_{ii,t} \left[ 1 - \rho^2 k_t^2 \right], \quad i \in \{1, 2\} \tag{38}$$
where
$$k_t = \frac{r_{12,t}}{\sqrt{r_{11,t}\, r_{22,t}}} \tag{39}$$
is the cross-correlation coefficient. We suppose in the following that $0 < \mu_1 < 1$, $0 < \mu_2 < 1$ and $0 < \rho < 1$. This algorithm is interesting because it introduces the cross-correlation coefficient between the two input signals.
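The E2CLMS update can be sketched in a few lines. The code below is a minimal illustration assuming that the quantities of Eqs. (37) to (39) are computed from the current regressors; the small constant `eps`, the filter values and the function names are assumptions added for the example, not part of the formulation in [5].

```python
import random

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))

def e2clms_step(h1, h2, x1, x2, y, mu1=0.5, mu2=0.5, rho=0.5, eps=1e-8):
    """One E2CLMS update after Eqs. (33)-(39); returns the a priori error.

    eps is a small regularization constant (an assumption of this sketch).
    """
    err = y - dot(h1, x1) - dot(h2, x2)      # Eqs. (33)-(34)
    r11 = dot(x1, x1) + eps                  # Eq. (37)
    r22 = dot(x2, x2) + eps
    r12 = dot(x1, x2)                        # equal to r21
    k2 = r12 * r12 / (r11 * r22)             # square of Eq. (39)
    r1 = r11 * (1.0 - rho * rho * k2)        # Eq. (38)
    r2 = r22 * (1.0 - rho * rho * k2)
    for i in range(len(h1)):
        d1 = x1[i] - rho * r12 / r22 * x2[i]  # bracket of Eq. (35)
        d2 = x2[i] - rho * r12 / r11 * x1[i]  # bracket of Eq. (36)
        h1[i] += mu1 * d1 * err / r1
        h2[i] += mu2 * d2 * err / r2
    return err

random.seed(3)
L = 8
true1 = [0.6, -0.3, 0.2, 0.1, 0.05, -0.05, 0.02, 0.01]   # hypothetical Ch1
true2 = [-0.5, 0.4, 0.1, -0.2, 0.03, 0.02, -0.01, 0.01]  # hypothetical Ch2
s1 = [random.gauss(0.0, 1.0) for _ in range(4000)]
s2 = [random.gauss(0.0, 1.0) for _ in range(4000)]
h1, h2 = [0.0] * L, [0.0] * L
for t in range(L - 1, 4000):
    v1 = [s1[t - k] for k in range(L)]
    v2 = [s2[t - k] for k in range(L)]
    y = dot(true1, v1) + dot(true2, v2)      # noise-free echo
    e2clms_step(h1, h2, v1, v2, y)
misalignment = max(abs(a - b) for a, b in zip(h1 + h2, true1 + true2))
print("largest coefficient error:", misalignment)
```

Setting `rho = 0` removes the cross terms and leaves a per-channel normalized LMS update, which makes the role of the cross-correlation coefficient in Eqs. (35) and (36) easy to isolate.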
4.2. The new modified two-channel M2CFNTF algorithm: First modification
The fast Newton family of algorithms is known for its prediction part, which allows fast convergence. This desirable characteristic is degraded in stereophonic applications because of the strong correlation between the inputs. To improve the behaviour of the 2CFNTF algorithm, we have modified it on the basis of the following analysis. The correlation magnitude between two processes is equal to 1 if and only if they are linearly related, and this is what happens in the stereophonic case. To weaken this relation, some nonlinear or time-varying transformation of the stereo channels has to be applied [8,9]. Such a transformation reduces the correlation and hence the condition number of the covariance matrix, thereby improving the output mean-square error (MSE). The transformation has to be applied cautiously so that it is inaudible and has no effect on stereo perception. A simple nonlinear method that gives good performance uses a half-wave rectifier [9,16], which is given by the following equation:
$$X_{L,t}^{\prime\,\nu_i} = X_{L,t}^{\nu_i} + \beta\, \frac{X_{L,t}^{\nu_i} + |X_{L,t}^{\nu_i}|}{2}, \quad i \in \{1, 2\} \tag{40}$$
With this method, a linear relation may still exist between the nonlinearly transformed channels, for example when $X_{L,t}^{\nu_1} \ge 0$ and $X_{L,t}^{\nu_2} \ge 0$, or when $b\, X_{L,t-\tau_1}^{\nu_1} = X_{L,t-\tau_2}^{\nu_2} \ge 0$ with $b > 0$. In practice, however, these cases never occur because the signals are always zero-mean and $G^{\nu_1}$, $G^{\nu_2}$ are never related by just a simple delay. In this paper, we propose an improved version of this technique based on the use of positive and negative half-wave rectifiers on each channel, with two additional parameters $\alpha_1$ and $\alpha_2$. These parameters control the degree of nonlinearity that decorrelates the inputs. The new inputs are given, respectively, by:
$$X_{L,t}^{\prime\,\nu_1} = X_{L,t}^{\nu_1} + \beta\, \frac{X_{L,t}^{\nu_1} + \alpha_1 |X_{L,t}^{\nu_1}|}{2} \tag{41}$$
$$X_{L,t}^{\prime\,\nu_2} = X_{L,t}^{\nu_2} + \beta\, \frac{X_{L,t}^{\nu_2} + \alpha_2 |X_{L,t}^{\nu_2}|}{2} \tag{42}$$
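The effect of the two parameters can be illustrated on the special case mentioned above, in which the two channels differ only by a positive gain. The following toy sketch applies the transformation sample by sample and compares sample correlation coefficients; the Gaussian test signal, the gain 0.5 and the function names are assumptions made for the example, and the check only concerns the direction of the effect, not its perceptual impact.

```python
import math
import random

def rectify(x, beta, alpha):
    """Eqs. (41)-(42): x' = x + beta * (x + alpha * |x|) / 2, sample by sample."""
    return [v + beta * (v + alpha * abs(v)) / 2.0 for v in x]

def correlation(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

random.seed(1)
x1 = [random.gauss(0.0, 1.0) for _ in range(5000)]
x2 = [0.5 * v for v in x1]  # channels related by a positive gain only

corr_raw = correlation(x1, x2)
# Same rectifier on both channels, i.e. Eq. (40) with alpha = 1
corr_same = correlation(rectify(x1, 0.5, 1.0), rectify(x2, 0.5, 1.0))
# Different alpha per channel, Eqs. (41)-(42) with the values quoted in the text
corr_new = correlation(rectify(x1, 0.5, 0.75), rectify(x2, 0.5, 0.85))
print(corr_raw, corr_same, corr_new)
```

With identical rectifiers the transformed channels remain proportional to each other, so their correlation stays at 1; choosing different values of $\alpha_1$ and $\alpha_2$ makes the two outputs different mixtures of the signal and its magnitude, which breaks the exact proportionality.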
This principle removes the linear relation in the special signal cases given above and does not alter the stereo perception even with $\beta$ as large as 0.5 and $\alpha_1 = 0.75$, $\alpha_2 = 0.85$.
5. New numerical stabilization technique of the new M2CFNTF algorithm: Second modification
In [1], we adapted and generalized to the two-channel 2CFNTF algorithm a numerical stabilization method recently proposed in [25]. We recall that this technique is inspired by the work in [26–28]. In this paper, we propose a new version of this technique for use with the proposed algorithm. The most important difference between the two stabilization techniques lies in the calculation of the a priori backward prediction error in the two prediction parts of the 2CFNTF algorithm [1]. The forward prediction error variance $\alpha_{N,t}$ is used alone in the computation of the a priori backward prediction error (the backward variance $\beta_{N,t}$ is discarded), according to:
$$r_{N,t}^{p} = \lambda^{N+1}\, \alpha_{N,t}\, \tilde{C}_{N+1,t}^{N+1} \tag{43}$$
which avoids the computation of the backward variance $\beta_{N,t}$, which is numerically unstable. To stabilize the new algorithm, we have generalized and applied this method to the two-channel case for the M2CFNTF algorithm as follows (i.e. two a priori backward prediction errors are used to compute the likelihood variable and the backward predictor). Hence, we define two control variables $\xi_{N,t}^{\nu_1}$ (for the first channel) and $\xi_{N,t}^{\nu_2}$ (for the second channel). These two variables are theoretically null and are given by:
$$\xi_{N,t}^{\nu_1} = r_{N,t}^{(C),\nu_1} - \mu^{\nu_1}\, r_{N,t}^{(f_1),\nu_1} \tag{44}$$
$$\xi_{N,t}^{\nu_2} = r_{N,t}^{(C),\nu_2} - \mu^{\nu_2}\, r_{N,t}^{(f_1),\nu_2} \tag{45}$$
with:
$$r_{N,t}^{(C),\nu_1} = x_{t-N}^{\nu_1} - (b_{N,t-1}^{\nu_1})^T X_{N,t}^{\nu_1} \tag{46}$$
$$r_{N,t}^{(C),\nu_2} = x_{t-N}^{\nu_2} - (b_{N,t-1}^{\nu_2})^T X_{N,t}^{\nu_2} \tag{47}$$
$$r_{N,t}^{(f_1),\nu_1} = \lambda^{N}\, \gamma_{N+1,t}^{\nu_1}\, \alpha_{N,t}^{\nu_1}\, \tilde{C}_{N+1,t}^{N+1,\nu_1} \tag{48}$$
and:
$$r_{N,t}^{(f_1),\nu_2} = \lambda^{N}\, \gamma_{N+1,t}^{\nu_2}\, \alpha_{N,t}^{\nu_2}\, \tilde{C}_{N+1,t}^{N+1,\nu_2} \tag{49}$$
We note that the scale parameters $\mu^{\nu_1}$ and $\mu^{\nu_2}$ control the propagation of numerical errors in the new algorithm. Regarding the choice of the forgetting factor $\lambda$, it can easily be shown that the backward predictor is numerically stable (i.e. $\{\xi_{N,t}^{\nu_1}, \xi_{N,t}^{\nu_2}\} = \{0, 0\}$) under the following condition.
N + 1.25
(50)
In practice, the numerical stabilization parameters $\mu^{\nu_1}$ and $\mu^{\nu_2}$ are chosen by simulation. Their values must be selected between 0 and 1 ($0 \le \mu^{\nu_1} \le 1$ and $0 \le \mu^{\nu_2} \le 1$).
6. Computational complexity
In the computational complexity study of the proposed algorithms, we only take multiplications into account. The computational complexity of the fast version of the 2CFNTF algorithm, listed in Section 3.3 and proposed in [1], is 4L + 24N multiplications (see Table 2): 4L multiplications for the filtering parts and 24N multiplications for the prediction parts. Here, the complexity is given for the two-channel case (for more details, see Fig. 1). The complexity of the 2CNLMS is 4L and that of the 2CFTF is 14L. The 2CFNTF algorithm becomes very attractive when a small length N is chosen for its prediction part. For example, in SAEC applications with speech input, the 2CFNTF [1] and the proposed M2CFNTF algorithms are good candidates because their prediction parts can be used with low values of N (between 12 and 20 coefficients). This leads to a complexity for the 2CFNTF and M2CFNTF algorithms close to that of the 2CNLMS (4L), which is very small in comparison with that of the 2CFTF algorithm (14L). Table 2 shows that the proposed M2CFNTF algorithm keeps the low complexity of the 2CFNTF while benefiting from the two introduced techniques, which improve its convergence speed and tracking in SAEC applications. This low complexity allows the proposed M2CFNTF algorithm to be used in real-time applications; it can easily be implemented on field-programmable gate arrays (FPGAs) or on digital signal processors (DSPs).
7. Experimental results
7.1. Description of the signals used in simulations
In these simulations, we have conducted two kinds of experiments according to the signals used.
The first is done with stationary USASI noise (a real signal with a speech-like spectrum, used in simulations to test the convergence speed of algorithms) sampled at 16 kHz. The second experiment is carried out with non-stationary signals (real speech signals) sampled at 16 kHz. We have used two speech samples as signal sources, shown in Fig. 2 (male) and Fig. 3 (female). These signals are convolved with two real acoustic impulse responses (of the same room) measured in a real stationary regime (see Figs. 4–7). The two impulse responses of Figs. 4 and 5 correspond to a small room, whereas the impulse responses of Figs. 6 and 7 were measured in a large audio-conference room. All these signals are sampled at 16 kHz. To evaluate the performance of each algorithm, the following mean square error (MSE) criterion, expressed in dB, was used:
$$\mathrm{MSE}(t) = 10 \log_{10} \langle e_{L,t}^{2} \rangle \tag{51}$$
where $\langle \cdot \rangle$ denotes short-time averaging over 256 samples.
7.2. Convergence speed test
To evaluate the convergence speed of the 2CFNTF, M2CFNTF, 2CFTF and E2CLMS algorithms in a stationary mode, we have used the real impulse responses of the acoustic channels of Figs. 4 and 5, truncated to 256 coefficients. The input signal is a stationary USASI (United States of America Standards Institute) noise, which is of interest because it has a stationary speech-like spectrum. We have used the criterion given by Eq. (51) to objectively evaluate the
Table 2
Computational cost of the 2CNLMS, 2CFTF, 2CFNTF, E2CLMS and the proposed M2CFNTF algorithms (L: adaptive filter order, N: forward and backward predictors order).

Algorithm    Computational cost
2CNLMS       4L multiplications
2CFTF        14L multiplications
2CFNTF       4L + 24N multiplications
E2CLMS       5L multiplications
M2CFNTF      4L + 24N multiplications
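To make the entries of Table 2 concrete, the short sketch below evaluates them for one setting. The values L = 256 and N = 20 are chosen for illustration and correspond to the filter and predictor lengths quoted in the caption of Fig. 9; the function name is hypothetical.

```python
def multiplications(L, N):
    """Per-iteration multiplication counts, as listed in Table 2."""
    return {
        "2CNLMS": 4 * L,
        "2CFTF": 14 * L,
        "2CFNTF": 4 * L + 24 * N,
        "E2CLMS": 5 * L,
        "M2CFNTF": 4 * L + 24 * N,
    }

costs = multiplications(L=256, N=20)
for name, count in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {count:5d}")
```

For this setting the M2CFNTF count is below half of the 2CFTF count, and the gap widens as L grows because the 24N term does not depend on L.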
Fig. 2. Speech signal 1 (male). [Plot: amplitude (×10^4) versus samples (×10^4).]
Fig. 3. Speech signal 2 (female). [Plot: amplitude (×10^4) versus samples (×10^4).]
Fig. 4. Acoustic impulse response 1. [Plot: amplitude versus samples.]
convergence speed performance of the algorithms listed above. In this experiment, the signal-to-noise ratio is SNR = 90 dB. The MSE results of this experiment, for the transient regime, are shown in Fig. 8.
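The MSE criterion of Eq. (51) can be sketched as follows, taking the averaged quantity to be the squared error, as the name of the criterion suggests, and using non-overlapping blocks of 256 samples. The `floor` argument, which guards against taking the logarithm of zero, and the function name are assumptions of this sketch.

```python
import math

def mse_db(errors, block=256, floor=1e-20):
    """Eq. (51): 10*log10 of the squared error averaged over each block."""
    curve = []
    for start in range(0, len(errors) - block + 1, block):
        power = sum(e * e for e in errors[start:start + block]) / block
        curve.append(10.0 * math.log10(max(power, floor)))
    return curve

# Constant error of amplitude 0.1 over two blocks
curve = mse_db([0.1] * 512)
print(curve)
```

Each point of the resulting curve corresponds to one block, which is also the unit used for the horizontal axis when a jump "at block 150" is mentioned in the experiments.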
Fig. 5. Acoustic impulse response 2. [Plot: amplitude versus samples.]
Fig. 6. Acoustic impulse response 3 (Audio-conference room). [Plot: amplitude (×10^4) versus samples.]
Fig. 8 shows that the proposed M2CFNTF algorithm has the best convergence speed in the transient phase: it cancels the stereophonic acoustic echo (SAE) after only 4000 iterations. The 2CFTF and 2CFNTF algorithms need 7780 iterations to converge and cancel the SAE, and the E2CLMS algorithm needs many more (50,000 iterations). The good transient performance of the proposed M2CFNTF algorithm is due to the introduced modification, which decorrelates the inputs of each SAEC filter. We also note that the proposed M2CFNTF algorithm converges faster than the E2CLMS algorithm, even though the latter takes the coherence of the inputs into account. In this experiment, the proposed M2CFNTF algorithm is therefore the best candidate among the compared algorithms for SAEC applications.
To test the performance of the proposed M2CFNTF algorithm in the permanent regime, we have performed a simulation with the same algorithms and parameters as in Fig. 8. The differences between the two experiments are: (i) the output signal-to-noise ratio (SNR) is 60 dB, and (ii) a jump is introduced at block 150 (each block contains 256 samples) to test the convergence of each algorithm in the steady-state (permanent) regime. The results of this experiment are reported in Fig. 9. Fig. 9 shows that the proposed M2CFNTF algorithm keeps the best performance in both the transient and the steady-state regimes, even with noisy inputs (60 dB). In this experiment, the M2CFNTF algorithm needs only 2560 iterations to converge, whereas the FTF and FNTF algorithms need about 7000 and 7600 iterations, respectively.
The same number of iterations is needed by these three algorithms in the permanent regime, whereas the E2CNLMS algorithm needs about 30,000 and 13,000 iterations to converge in the transient and permanent regimes, respectively. In this experiment, we can also see that the steady-state MSE tends toward 60 dB, which is the same value as
Fig. 7. Acoustic impulse response 4 (Audio-conference room). [Plot: amplitude (×10^4) versus samples, 0 to 7000.]
Fig. 8. Convergence performance of the E2CLMS, 2CFTF, 2CFNTF (M2CFNTF with b = 0) and proposed M2CFNTF algorithms, with b = 0.5, a1 = 0.75 and a2 = 0.85. Input signal: USASI noise; L1 = L2 = 256; output SNR = 90 dB.
the output SNR added to the echo signal at the output. From this result, we conclude that the proposed M2CFNTF algorithm confirms its good convergence characteristics in all phases of convergence and keeps its superiority in terms of convergence properties (in the transient and steady-state regimes) over the other algorithms, even in noisier conditions.
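The iteration counts quoted above appear to be read off MSE curves evaluated over blocks of 256 samples. The following minimal Python sketch shows one way such a block-wise MSE curve (in dB) and a convergence count could be computed from an error signal. The function names, the numerical floor and the threshold rule are illustrative assumptions, not the measurement procedure used in the paper.

```python
import math


def block_mse_db(error, block_len=256, floor=1e-20):
    """Return the mean squared error of each full block of `error`, in dB.

    `floor` (an assumption of this sketch) avoids log10(0) on silent blocks.
    """
    n_blocks = len(error) // block_len
    mse_db = []
    for b in range(n_blocks):
        block = error[b * block_len:(b + 1) * block_len]
        mse = sum(e * e for e in block) / block_len
        mse_db.append(10.0 * math.log10(max(mse, floor)))
    return mse_db


def iterations_to_converge(mse_db, threshold_db, block_len=256):
    """Return the number of samples elapsed at the end of the first block
    whose MSE is at or below `threshold_db`, or None if that never happens.

    The threshold rule is a hypothetical convergence criterion.
    """
    for b, value in enumerate(mse_db):
        if value <= threshold_db:
            return (b + 1) * block_len
    return None
```

With a noise floor near -60 dB, for example, one could call `iterations_to_converge(curve, -55.0)` to count the samples needed to come within a few dB of that floor. The margin chosen changes the count, so figures obtained this way are only comparable when the same rule is applied to every algorithm.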
7.3. Stereophonic acoustic echo cancellation with the proposed algorithm
7.3.1. First test with small values of the adaptive filters
In this section, two experiments are carried out with the E2CLMS, 2CFTF, 2CFNTF and M2CFNTF algorithms. In the first experiment, we have used the speech signal of Fig. 2 convolved with the two real impulse responses of Figs. 4 and 5. The adaptive filter length is 1024 and the output SNR is set to 60 dB. The result of this experiment is shown in Fig. 10. In all figures of this section, we show the echo to be cancelled and the temporal evolution of the MSE criterion for each algorithm. We note that the proposed 2CFNTF and its improved version M2CFNTF show good SAEC performance.
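The test signal described above (far-end signals convolved with two impulse responses, plus noise at a prescribed output SNR) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the additive noise is taken to be white and Gaussian, and the convolution is truncated to the input length. None of these details is specified in this section.

```python
import math
import random


def convolve(x, h):
    """Direct-form FIR convolution of x with h, truncated to len(x) samples."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * x[n - k]
        y[n] = acc
    return y


def stereo_echo(x1, x2, h1, h2):
    """Echo picked up by one microphone: sum of both loudspeaker paths."""
    y1 = convolve(x1, h1)
    y2 = convolve(x2, h2)
    return [a + b for a, b in zip(y1, y2)]


def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise (an assumption) scaled to the requested SNR.

    Returns the noisy signal and the noise sequence that was added.
    """
    rng = rng or random.Random(0)
    power = sum(s * s for s in signal) / len(signal)
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    noise = [rng.gauss(0.0, noise_std) for _ in signal]
    return [s + v for s, v in zip(signal, noise)], noise
```

For the experiment of Fig. 10, `h1` and `h2` would be the measured responses of Figs. 4 and 5 and `snr_db` would be 60. The SNR here is set from the average power of the whole echo signal, which is one possible convention among several for a non-stationary input such as speech.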
Fig. 9. Convergence performance of the E2CLMS, 2CFTF, 2CFNTF (M2CFNTF with b = 0) and proposed M2CFNTF algorithms, with b = 0.5, a1 = 0.75 and a2 = 0.85. Input signal: USASI noise; L1 = L2 = 256; N1 = N2 = 20; output SNR = 60 dB. There is a jump at block 150 (each block contains 256 samples).
Fig. 10. Temporal evolution of the MSE for the E2CLMS, 2CFTF, 2CFNTF (M2CFNTF with b = 0) and the proposed M2CFNTF algorithms with b = 0.5, a1 = 0.75 and a2 = 0.85. Input signal: first speech signal of Fig. 2; L1 = L2 = 1024; l1 = l2 = 0.5; k = 0.99985; lv1 = lv2 = 0.85; N1 = N2 = 32; output SNR = 60 dB.
We also note that the convergence speed of the proposed M2CFNTF algorithm is much higher than that of the other algorithms. We also observed that the convergence speeds of the 2CFNTF and 2CFTF algorithms are close to each other and that both keep the same good behaviour, even in the SAEC domain. We conclude that the modification introduced in the M2CFNTF algorithm has improved its convergence speed in this type of application. This good characteristic of the modified algorithm is due to: (1) the prediction parts of the algorithm, which decorrelate the inputs, and (2) the efficient technique of Eqs. (41) and (42), which improves the behaviour of the prediction part and avoids cross-correlation of the inputs. In the second experiment, we use the speech signal of Fig. 3 convolved with the two channels of Figs. 6 and 7. The filter length is 1024 and the output SNR is set to 60 dB. The result of this experiment is given in Fig. 11. From this figure, we note the same behavior of each algorithm in the SAEC application as in the first experiment. This result confirms the good performance of the M2CFNTF algorithm in the SAEC field. In the two experiments presented above, we have shown
Fig. 11. Temporal evolution of the MSE for the E2CLMS, 2CFTF, 2CFNTF (M2CFNTF with b = 0) and the proposed M2CFNTF algorithms with b = 0.5, a1 = 0.75 and a2 = 0.85. Input signal: second speech signal of Fig. 3; L1 = L2 = 1024; l1 = l2 = 0.5; k = 0.99985; lv1 = lv2 = 0.85; N1 = N2 = 32; output SNR = 60 dB.
Fig. 12. Temporal evolution of the MSE for the E2CNLMS, 2CFTF, 2CFNTF (M2CFNTF with b = 0) and the proposed M2CFNTF algorithms with b = 0.5, a1 = 0.75 and a2 = 0.85. Input signal: first speech signal of Fig. 2; L1 = L2 = 2200; l1 = l2 = 0.5; k = 0.99998; lv1 = lv2 = 0.85; N1 = N2 = 64; output SNR = 60 dB.
experimentally that in real situations, i.e. when the adaptive filters h^{m1} and h^{m2} are of finite length and when there are nonzero, uncorrelated noisy components Bm1 and Bm2 in the input signals Xm1 and Xm2, the true responses Ch1 and Ch2 (see Fig. 1) can be identified and the adaptive filters do converge towards this true solution in the MSE sense. We have also noted that the final MSE values of each algorithm tend toward the SNR value (60 dB).
7.3.2. Second test with large values of the adaptive filters
In this experiment, we use the same parameters as in Fig. 10 (i.e. the speech signal of Fig. 2 convolved with the two real impulse responses of Figs. 4 and 5, with the output SNR set to 60 dB). The new lengths of the adaptive filters are L1 = L2 = 2200 and that of the prediction parts is N1 = N2 = 64. Simulation results are shown in Fig. 12.
Fig. 13. Temporal evolution of the MSE for the E2CNLMS, 2CFTF, 2CFNTF (M2CFNTF with b = 0) and the proposed M2CFNTF algorithms with b = 0.5, a1 = 0.75 and a2 = 0.85. Input signal: second speech signal of Fig. 3; L1 = L2 = 2200; l1 = l2 = 0.5; k = 0.99998; lv1 = lv2 = 0.85; N1 = N2 = 64; output SNR = 60 dB.
From Fig. 12, we note the same behavior of each algorithm in the SAEC application as in the first experiment of Fig. 10. The results of this experiment confirm the good performance of the proposed algorithm in SAEC applications even with large adaptive filter lengths. In the second experiment, whose simulation results are shown in Fig. 13, we have used the same simulation parameters as in Fig. 11 (i.e. the speech signal of Fig. 3 is convolved with the two channels of Figs. 6 and 7, and the output SNR is set to 60 dB). Only two parameter values have changed: the adaptive filter lengths, L1 = L2 = 2200, and the prediction part lengths, N1 = N2 = 64; this last parameter is used in the 2CFTF, 2CFNTF and proposed M2CFNTF algorithms. From the result of this experiment, reported in Fig. 13, we can see that the proposed M2CFNTF algorithm confirms its good performance properties over the other algorithms with non-stationary signals such as speech. We have also noted the poor convergence speed of the E2CLMS algorithm, even though it takes the correlation of the inputs into consideration. This is due to the adaptive filter lengths, which decrease the convergence speed of this algorithm: it requires a long time to converge to the true filters (solutions) in the MSE sense. Finally, from all the tests carried out in this Section 7, we have shown that the proposed M2CFNTF algorithm performs best among all the algorithms used, in the transient phase and in the steady state, with stationary and non-stationary signals, and with large and small adaptive filter lengths. All these tests and experiments qualify the proposed algorithm as a good candidate for real-time applications such as SAEC.
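To illustrate why an NLMS-type canceller slows down as the filters get longer, the sketch below implements a plain two-channel NLMS update in Python. It is not the E2CNLMS algorithm used in the comparisons, whose cross-correlation handling is not described in this section; the function name, the default step size and the regularization constant are assumptions made for illustration only.

```python
def two_channel_nlms(x1, x2, d, length, mu=0.5, eps=1e-8):
    """Plain two-channel NLMS (illustrative baseline, not the paper's E2CNLMS).

    x1, x2 : loudspeaker (input) signals
    d      : microphone signal containing the echo
    length : number of coefficients per channel
    mu     : step size (0 < mu < 2); eps regularizes the normalization
    Returns the a priori error sequence and the two coefficient vectors.
    """
    w1 = [0.0] * length
    w2 = [0.0] * length
    buf1 = [0.0] * length  # most recent sample first
    buf2 = [0.0] * length
    errors = []
    for n in range(len(d)):
        buf1 = [x1[n]] + buf1[:-1]
        buf2 = [x2[n]] + buf2[:-1]
        # Echo estimate is the sum of both channel filter outputs.
        y = sum(w * u for w, u in zip(w1, buf1))
        y += sum(w * u for w, u in zip(w2, buf2))
        e = d[n] - y
        # Normalize by the joint energy of both input buffers.
        norm = sum(u * u for u in buf1) + sum(u * u for u in buf2) + eps
        step = mu * e / norm
        w1 = [w + step * u for w, u in zip(w1, buf1)]
        w2 = [w + step * u for w, u in zip(w2, buf2)]
        errors.append(e)
    return errors, w1, w2
```

Because the update is normalized by the joint energy of all 2 × `length` buffered samples, the correction applied to each coefficient per iteration shrinks as the filters grow. For white inputs this makes the coefficient error decay roughly in inverse proportion to the total number of coefficients, and strongly correlated inputs such as stereo speech slow it further.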
8. Conclusion
In this paper, we have derived a modified two-channel version of the fast Newton transversal filter (FNTF) algorithm. We have compared the performance of the M2CFNTF and 2CFNTF algorithms with that of two two-channel adaptive filtering algorithms (the E2CLMS and the 2CFTF algorithms). Simulation results have shown that the 2CFNTF algorithm proposed in [1] performs similarly to the 2CFTF algorithm in SAEC applications in terms of convergence speed and tracking ability. We have also noted that the proposed M2CFNTF algorithm converges faster than the other algorithms. This is due to the introduced modification, which increases the performance of the algorithm even with highly correlated input signals. We have observed that the computational complexities of the proposed M2CFNTF and 2CFNTF algorithms are very close to that of the two-channel 2CNLMS algorithm when the length of the backward and forward predictors is chosen small in comparison with the adaptive filter length. This choice can be valid when the input of the M2CFNTF and 2CFNTF algorithms is a speech signal. It has also been observed that the M2CFNTF algorithm combines the advantages of the E2CLMS algorithm (complexity) and the 2CFTF algorithm (convergence speed and tracking ability). It should also be noted that no numerical divergence problems were experienced in the simulations. This is due to the robustness of the proposed numerical stabilization technique. We conclude that the new M2CFNTF algorithm has shown high performance with very low complexity. This version of the M2CFNTF algorithm is a very interesting candidate for SAEC applications.
Acknowledgements
The author, Dr Mohamed Djendi, would like to thank the anonymous reviewers for their useful comments and their overall objective recommendations, which have largely improved the paper.

References
[1] Djendi M, Guessoum A. A new fast Newton-type adaptive filtering algorithm for stereophonic acoustic echo cancellation (SAEC). Int J Adaptive Control Signal Process 2010;35(June):435–44.
[2] Sondhi MM. An adaptive echo canceller. Bell Sys Tech J 1967;XLVI:497–510.
[3] Sondhi MM, Morgan DR, Hall JL. Stereophonic acoustic echo cancellation, an overview of the fundamental problem. IEEE Signal Process Lett 1995;2:148–51.
[4] Sondhi MM, Morgan DR. Acoustic echo cancellation for stereophonic teleconferencing. Proceedings of IEEE ASSP workshop Appls. Signal Processing Audio Acoustics, 1991.
[5] Benesty J, Amand F, Gilloire A, Grenier Y. Adaptive filtering algorithms for stereophonic acoustic echo cancellation. Proceedings of IEEE ICASSP. 1995; pp. 3099–3102.
[6] Amand F. Annulation d’Echo Multi-voies et Application à la Téléconférence à Haute Qualité. Ph.D. thesis, Lannion, 1996.
[7] Shimauchi S, Makino S. Stereo projection echo canceller with true echo path estimation. Proceedings of IEEE ICASSP. 1996; pp. 3059–3062.
[8] Makino S, Strauss K, Shimauchi S, Haneda Y, Nakagawa A. Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path. Proceedings of IEEE ICASSP. 1997; pp. 299–302.
[9] Benesty J, Morgan DR, Sondhi MM. A better understanding and an improved solution to the problems of stereophonic acoustic echo cancellation. IEEE Trans Speech Audio Process 1998;6:156–65.
[10] Benesty J, Morgan DR, Sondhi MM. A better understanding and an improved solution to the problems of stereophonic acoustic echo cancellation. Proceedings of IEEE ICASSP. 1997; pp. 303–306.
[11] Benesty J, Morgan DR, Sondhi MM. A hybrid mono/stereo acoustic echo canceler. IEEE Trans Speech Audio Process 1998;6:468–75.
[12] Gansler T, Eneroth P. Influence of audio coding on stereophonic acoustic echo cancellation. Proceedings of IEEE ICASSP. 1998; pp. 3649–3652.
[13] Hirano A, Sugiyama A. A new multi-channel echo canceller with a single adaptive filter per channel. Proceedings of IEEE ICASSP. 1992; pp. 1922–1925.
[14] Minami S. A stereophonic echo canceler using single adaptive filter. Proceedings of IEEE ICASSP. 1995; pp. 3027–3030.
[15] Gilloire A, Turbin V. Using auditory properties to improve the behavior of stereophonic acoustic echo cancellers. Proceedings of IEEE ICASSP. 1998; pp. 3681–3684.
[16] Emura S, Haneda Y, Kataoka A, Makino S. Stereo echo cancellation algorithm using adaptive update on the basis of enhanced input-signal vector. Signal Process 2006;86(7):1157–67.
[17] Mayyas K. Low complexity LMS-type adaptive algorithm with selective coefficient update for stereophonic acoustic echo cancellation. Computers Electric Engineer May 2009;35(3):450–8.
[18] Nguyen-Ky T, Leis J, Xiang W. An improved new error estimation algorithm for optimal filter lengths for stereophonic acoustic echo cancellation. Computers Electric Engineer 2010;36(4):664–75.
[19] Benesty J, Duhamel P, Grenier Y. Multi-channel adaptive filtering applied to multi-channel acoustic echo cancellation. Proceedings of EUSIPCO 1996; pp. 1405–1408.
[20] Haykin S. Adaptive Filter Theory. 3rd ed. Prentice Hall International; 1996 [chapter 9].
[21] Cioffi J, Kailath T. Fast recursive least squares transversal filters for adaptive filtering. IEEE Trans Acoust Speech Signal Process ASSP-32 1984:304–37.
[22] Djendi M, Bouchard M, Guessoum A, Benallal A, Berkani D. Improvement of the convergence speed and the tracking ability of the fast Newton type adaptive filtering (FNTF) algorithm. Signal Process 2006;86(7):1704–19.
[23] Moustakides GV, Theodoridis S. Fast Newton transversal filters, a new class of adaptive estimation algorithms. IEEE Trans Signal Process October 1991;39(10):2184–93.
[24] Petillon T, Gilloire A, Theodoridis S. The fast Newton transversal filters: An efficient scheme for acoustic echo cancellation in mobile radio. IEEE Trans Signal Process 1994;42(3):509–18.
[25] Ykhlef F, Arezki M, Guessoum A, Berkani D. Adaptive noise reduction using numerically stable fast recursive least squares algorithm. Int J Adaptive Control Signal Process 2007;21:354–74.
[26] Benallal A, Gilloire A. A new method to stabilize fast RLS algorithms based on a first-order model of the propagation of numerical errors. In Proceedings of the IEEE ICASSP 1988 Conference, New York, April 1988, pp. 1373–1376.
[27] Regalia PA. Numerical stability issues in fast least-squares adaptation algorithms. Opt Eng 1992;31:1144–52.
[28] Slock DTM, Kailath T. Numerically stable fast transversal filters for recursive least squares adaptive filtering. IEEE Trans Signal Process 1991;39(1):92–114.

Mohamed Djendi received the DEUA, Eng. state, and M.Sc. degrees from Blida University of Science and Technology, Algeria, in 1994, 1997 and 2000, respectively, all in Electrical Engineering, communications and control. He received his first Ph.D. degree in Electronics-signal and communications from ENP School of Algiers, Algeria, in 2006. In 2010, he received a second Ph.D. degree in signal processing and telecommunications from the University of Science and Technology of Rennes, France. Currently, he is a full Professor at Blida University. In 2011-12, he holds a Postdoctoral position at University of Rennes (IRISA/ENSSAT). His fields of interest are speech and signal enhancement, adaptive filtering, SAEC, BSS and DSP for communications.