Speech enhancement using MMSE estimation of amplitude and complex speech spectral coefficients under phase-uncertainty


Accepted Manuscript

Speech Enhancement using MMSE estimation of Amplitude and Complex speech spectral coefficients under phase-uncertainty
Ravi Kumar Kandagatla, P.V. Subbaiah

PII: S0167-6393(16)30357-0
DOI: 10.1016/j.specom.2017.11.001
Reference: SPECOM 2497

To appear in: Speech Communication

Received date: 9 December 2016
Revised date: 2 November 2017
Accepted date: 2 November 2017

Please cite this article as: Ravi Kumar Kandagatla, P.V. Subbaiah, Speech Enhancement using MMSE estimation of Amplitude and Complex speech spectral coefficients under phase-uncertainty, Speech Communication (2017), doi: 10.1016/j.specom.2017.11.001

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Title/Author/Abstract page

Names of the authors: 1) RAVI KUMAR KANDAGATLA 2) P. V. SUBBAIAH

Title: Speech Enhancement using MMSE Estimation of Amplitude and Complex Speech Spectral Coefficients under Phase-Uncertainty

Corresponding / First Author: RAVI KUMAR KANDAGATLA

Affiliation and address: Assistant Professor, Laki reddy Bali reddy Engineering College, Mylavaram, Krishna District, Andhra Pradesh, India, Pin code: 521230

Email Id: [email protected]

Contact No: 9052544504

Co-Author: Dr. P.V. SUBBAIAH, Professor of ECE, Velagapudi Siddhartha Engineering College, Kanuru, Vijayawada, Krishna District, Andhra Pradesh

E-mail address of the corresponding author: [email protected]

Abstract: Traditional speech enhancement algorithms are based on amplitude-only processing: the speech amplitudes are processed and the phase is left unprocessed. Recently, single-channel speech enhancement algorithms based on the Short-Time Fourier Transform (STFT) have been developed that use prior knowledge of the phase and its uncertainty. This uncertain phase knowledge is obtained from phase reconstruction algorithms.

The goal of this paper is twofold. First, joint Minimum Mean Square Error (MMSE) estimators of Complex speech coefficients given Uncertain Phase (CUP) are derived. They assume Nakagami or Gamma distributions for the speech coefficients and a Generalized Gamma distribution (GGD) for the noise. Estimators of Amplitudes given Uncertain Phase (AUP) are also derived; these use the uncertain phase only for amplitude estimation, not for phase improvement. Novel phase-blind estimators are developed as well, using the Nakagami or Gamma PDF as the speech prior and the Generalized Gamma PDF as the noise prior. All estimators that use uncertain prior phase information are then compared, and the effect of the initial phase information on the enhancement process is analyzed with the novel estimators. The proposed CUP estimator outperforms existing algorithms on four objective measures: Segmental Signal to Noise Ratio (SSNR), Phase Signal to Noise Ratio (PSNR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI).

Second, the paper discusses combining the statistical approach with Non-negative Matrix Factorization (NMF) based speech enhancement, in which the bases are updated online. The proposed estimator gain is used within NMF, and its performance is analyzed with the PESQ measure.


Index Terms / Keywords—Speech Enhancement, von Mises distribution, Generalized Gamma distribution, Phase Uncertainty, Non-negative Matrix Factorization.

I. INTRODUCTION

In mobile communication, background noises (e.g., train, car and cockpit noise) degrade the transmitted clean speech signal. The goal of speech enhancement is to improve the quality and intelligibility of speech degraded by such noise. To combat background noise, speech enhancement techniques based on filtering, subspace and statistical approaches have been proposed. Among these, statistical algorithms based on Bayesian estimators play an important role.

Statistical estimators recover the clean speech signal by assuming priors for the Discrete Fourier Transform (DFT) coefficients of the speech and noise components. Bayesian estimators target either the complex speech spectral coefficients, denoted S, or the real-valued clean speech amplitude, denoted A. Estimators have been proposed that assume Gaussian and non-Gaussian PDFs as speech priors [1-4]. Estimators that incorporate a compressed amplitude [5,6], for perceptually better enhanced speech, have also been proposed.

Most traditional speech enhancement methods process the spectral amplitudes [7] and keep the noisy phase unprocessed. However, recent research [8-11] shows that enhancement performance may improve if the phase of the clean speech is known. The clean speech phase can be estimated from clean speech amplitudes, as in [11].

Gerkmann [12] obtained better performance by incorporating the clean speech phase in the estimator; in [12], doing so improves the Perceptual Evaluation of Speech Quality (PESQ) score by 0.35 relative to the Wiener filter. In [13], the complex speech coefficients were modelled with a circularly symmetric probability density function, an assumption that implies a uniformly distributed phase. Using the estimated clean speech phase as uncertain prior phase knowledge, [14] derived CUP estimators given the STFT phase.


In [15], the DFT coefficients of the clean speech signal are assumed to be Gaussian, an assumption that holds well for long analysis frames. However, speech DFT coefficients are analyzed with short windows of around 20-30 ms, and for such short frames super-Gaussian distributions fit the speech priors better. The Gamma prior for speech DFT coefficients yields a smaller Kullback-Leibler (KL) divergence, and improved results have been obtained with Laplacian and Gamma speech priors [16, 17].

Motivated by this, the present paper proposes the CUP-GG estimator (GG: Gamma and Generalized Gamma), which uses a Gamma speech prior and a Generalized Gamma noise prior [18-20]. Because a Nakagami speech prior is able to preserve speech spectral components, a second estimator is also proposed: CUP-NG (NG: Nakagami and Generalized Gamma), which uses a Nakagami speech prior and a Generalized Gamma noise prior. In [12], CUP estimators are extended to different non-Gaussian noise priors.

Bayesian estimators may use the Maximum Likelihood (ML), Maximum A Posteriori (MAP) or Minimum Mean Square Error (MMSE) criterion. Among these, MMSE estimators preserve the first harmonics well, so the proposed work uses the MMSE criterion. Bayesian estimation of clean speech magnitude coefficients with the MMSE criterion and a compression function (for perceptually better results) is proposed in [5], and [12] extends it by considering the estimated clean speech phase.

The proposed CUP estimators use a non-Gaussian noise prior because [19] notes that environmental noise is non-Gaussian and that such a prior yields less annoying residual noise. At low frequencies the KL divergence error remains smooth, which suggests that the Gamma model is the best noise prior there. At high frequencies, by contrast, the KL divergence error of the Rayleigh model is smaller and remains smooth. To benefit in all frequency regions, the Gamma distribution is generalized to the Generalized Gamma Distribution (GGD), and different estimators are analyzed with the GGD as the noise prior.

This work discusses both estimators of the complex-valued clean speech coefficients S and estimators of the clean speech spectral amplitudes A, where A is the magnitude of S. Optimal estimators modify only the spectral amplitudes and keep the phase unchanged; examples are the Wiener filter (optimal in the Gaussian sense) and the Short-Time Spectral Amplitude (STSA) estimator (the optimal estimator of A). Several advanced super-Gaussian estimators incorporate a compression parameter β for perceptual benefit.

Considering phase information has improved performance in the spectral subtraction method [21] (reduced residual noise) and in modulation-frequency-domain processing methods [22]. Motivated by this, the present paper discusses different estimators and the effect of the initial phase on the quality of the enhanced speech.

- Assuming the Generalized Gamma PDF as the noise prior, a novel AUP estimator is derived as in [23].
- Two CUP estimators are developed, using the Nakagami and Gamma PDFs as speech priors and the Generalized Gamma PDF as the noise prior. They are named CUP-NG (Nakagami and Generalized Gamma) and CUP-GG (Gamma and Generalized Gamma).
- A novel phase-Blind Estimator of Complex Coefficients (BECOCO) is derived for the case where the initial phase is completely uncertain or unavailable.

Speech enhancement algorithms have recently been developed for highly non-stationary noises [24], and several algorithms based on Non-negative Matrix Factorization (NMF) have been proposed [25, 26]. To combine the advantages of statistical and NMF approaches, online methods for updating the bases have been proposed [25]. This work analyzes the performance of the derived estimator combined with NMF using online bases update.

This paper is organized as follows:

- Section II presents the basic concept of phase-aware clean speech estimation. It derives two CUP estimators: one with a Nakagami speech prior and a Generalized Gamma Distribution (GGD) noise prior, and one with a Gamma speech prior and a GGD noise prior.
- Section III derives the novel amplitude estimator, AUP.
- Section IV derives BECOCO.
- Section V analyzes the proposed estimators.
- Section VI combines the derived estimator with NMF.
- Section VII discusses the results, and Section VIII concludes.

All the proposed estimators are listed in Table 1.

II. PRINCIPLES OF PHASE-AWARE CLEAN SPEECH ESTIMATION

The noisy speech in the STFT domain is assumed to be the superposition of clean speech and noise,

$$Y(k,i) = S(k,i) + N(k,i), \tag{1}$$

where Y(k,i) denotes the noisy speech, S(k,i) the clean speech and N(k,i) the noise, and k and i are the frequency and time indices, respectively. The complex spectral coefficients can be represented in terms of amplitude and phase as

$$Y = R\,e^{j\phi_Y}, \qquad S = A\,e^{j\phi_S}, \qquad W = N\,e^{j\phi_N}. \tag{2}$$

Here Y, S and W are the spectral components of the noisy speech, clean speech and noise, respectively, with amplitudes R, A, N and phases $\phi_Y$, $\phi_S$, $\phi_N$.

Assume that prior knowledge of an estimate $\tilde\phi_S$ of the clean speech phase is available; it is obtained with the phase reconstruction algorithm of [10]. Following [23], the phase-aware estimator can be written as

$$E\{f(S)\mid y,\tilde\phi_S\} = \frac{\int_0^{2\pi}\int_0^{\infty} f(S)\,p(y\mid a,\phi_S)\,p(a)\,p(\phi_S\mid\tilde\phi_S)\,da\,d\phi_S}{\int_0^{2\pi}\int_0^{\infty} p(y\mid a,\phi_S)\,p(a)\,p(\phi_S\mid\tilde\phi_S)\,da\,d\phi_S}. \tag{3}$$

To solve (3), priors for speech and noise have to be assumed. In this paper, the speech spectral amplitude A is assumed to follow a Nakagami PDF,

$$p_A(a) = \frac{2}{\Gamma(k)}\left(\frac{k}{\sigma_S^2}\right)^{k} a^{2k-1}\exp\left(-\frac{k}{\sigma_S^2}\,a^2\right), \tag{4}$$

where $\Gamma(\cdot)$ is the gamma function and $k$ is the shape parameter. A value $k<1$ gives a super-Gaussian speech prior, and $k=1$ gives a Rayleigh prior. The noise is assumed to follow a Generalized Gamma PDF,

$$p_{Y\mid S}(y\mid s) = \frac{2}{\Gamma(v)\,\sigma_N^{2}}\left(\frac{a}{\sigma_N}\right)^{2v-1}\exp\left(-\frac{|y-s|^{2}}{\sigma_N^{2}}\right), \qquad s = a\,e^{j\phi_S}, \tag{5}$$

with shape parameter $v$ and noise variance $\sigma_N^2$.
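The Nakagami prior (4) can be checked numerically. The following minimal sketch uses only the Python standard library and is illustrative; the function names are not from the paper. It evaluates the density in the log domain, checks that it integrates to one with second moment $\sigma_S^2$, and shows that $k=1$ reproduces the Rayleigh prior.

```python
import math


def nakagami_pdf(a, k, sigma_s2):
    """Nakagami speech-amplitude prior of Eq. (4): shape k, second moment E{A^2} = sigma_s2."""
    if a <= 0.0:
        return 0.0  # evaluated for a > 0 only (the density can diverge at a = 0 when k < 1/2)
    log_p = (math.log(2.0) - math.lgamma(k) + k * math.log(k / sigma_s2)
             + (2.0 * k - 1.0) * math.log(a) - k * a * a / sigma_s2)
    return math.exp(log_p)


def rayleigh_pdf(a, sigma_s2):
    """Rayleigh amplitude density, the k = 1 special case of the Nakagami prior."""
    return 2.0 * a / sigma_s2 * math.exp(-a * a / sigma_s2) if a > 0.0 else 0.0


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule; n is rounded up to an even number of sub-intervals."""
    n += n % 2
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return s * h / 3.0


if __name__ == "__main__":
    for k in (1.0, 2.0, 3.0):
        mass = simpson(lambda a: nakagami_pdf(a, k, 2.0), 0.0, 15.0)
        second_moment = simpson(lambda a: a * a * nakagami_pdf(a, k, 2.0), 0.0, 15.0)
        print(k, mass, second_moment)
```

Working in the log domain keeps the density finite for large shape values, where $\Gamma(k)$ and $(k/\sigma_S^2)^k$ would overflow separately.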

To solve (3), the only remaining quantity is the PDF of the clean speech phase given its estimate, $p(\phi_S\mid\tilde\phi_S)$. As proposed in [14], it is modelled with the von Mises distribution [28],

$$p(\phi_S\mid\tilde\phi_S) = \frac{\exp\left(\kappa\cos(\phi_S-\tilde\phi_S)\right)}{2\pi\,I_0(\kappa)}, \tag{6}$$

where $\kappa$ is the concentration parameter and $I_n(\cdot)$ is the modified Bessel function of the first kind of order $n$. For large values of $\kappa$, the true clean speech phase is likely to be close to the initial phase estimate. For small values of $\kappa$, the initial estimate yields little information about the true clean speech phase. All terms required to solve (3) are now available, and the proposed estimators can be derived.
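As a small illustration of the von Mises phase prior (6), the sketch below evaluates $I_n$ from its power series. The series and the function names are choices of this sketch. The code is intended for moderate concentrations, roughly $\kappa<700$; larger values overflow in double precision.

```python
import math


def bessel_i(n, x, tol=1e-17, max_terms=500):
    """Modified Bessel function of the first kind I_n(x), integer n >= 0, from its power series."""
    term = (x / 2.0) ** n / math.factorial(n)
    total = term
    for m in range(1, max_terms):
        term *= (x / 2.0) ** 2 / (m * (m + n))
        total += term
        if term <= tol * total:  # '<=' also terminates when x == 0
            break
    return total


def von_mises_pdf(phi_s, phi_tilde, kappa):
    """Phase prior of Eq. (6): p(phi_S | phi_tilde_S) for a concentration kappa >= 0."""
    return math.exp(kappa * math.cos(phi_s - phi_tilde)) / (2.0 * math.pi * bessel_i(0, kappa))


if __name__ == "__main__":
    for kappa in (0.0, 1.0, 10.0):
        # density at the initial estimate and at the opposite phase
        print(kappa, von_mises_pdf(0.0, 0.0, kappa), von_mises_pdf(math.pi, 0.0, kappa))
```

For $\kappa=0$ the density is the uniform value $1/(2\pi)$ everywhere. As $\kappa$ grows, the mass concentrates around $\tilde\phi_S$, and the mean resultant length $E\{\cos(\phi_S-\tilde\phi_S)\}$ equals $I_1(\kappa)/I_0(\kappa)$.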

2.1. Derivations for Proposed CUP Estimators CUP-NG

The CUP estimator that assumes a Nakagami speech prior and GGD noise is derived as in [14]. The speech prior is the Nakagami PDF

$$p_A(a) = \frac{2}{\Gamma(k)}\left(\frac{k}{\sigma_S^2}\right)^{k} a^{2k-1}\exp\left(-\frac{k}{\sigma_S^2}\,a^2\right), \tag{7}$$

and the noise prior is the GGD

$$p_{Y\mid S}(y\mid s) = \frac{2}{\Gamma(v)\,\sigma_N^{2}}\left(\frac{a}{\sigma_N}\right)^{2v-1}\exp\left(-\frac{|y-s|^{2}}{\sigma_N^{2}}\right). \tag{8}$$

The CUP estimator is derived as in [23]. Substituting (6)-(8) into (3) for the compressed complex coefficient $A^\beta e^{j\phi_S}$ results in

$$\widehat{S^\beta} = E\{A^\beta e^{j\phi_S}\mid y,\tilde\phi_S\} = \frac{\displaystyle\int_0^{2\pi}\!\!\int_0^{\infty} a^{\beta}e^{j\phi_S}\,\frac{2}{\Gamma(k)}\Big(\frac{k}{\sigma_S^2}\Big)^{k}a^{2k-1}e^{-\frac{k}{\sigma_S^2}a^2}\,\frac{2}{\Gamma(v)\,\sigma_N^2}\Big(\frac{a}{\sigma_N}\Big)^{2v-1}e^{-\frac{|y-a e^{j\phi_S}|^2}{\sigma_N^2}}\,p(\phi_S\mid\tilde\phi_S)\,da\,d\phi_S}{\displaystyle\int_0^{2\pi}\!\!\int_0^{\infty}\frac{2}{\Gamma(k)}\Big(\frac{k}{\sigma_S^2}\Big)^{k}a^{2k-1}e^{-\frac{k}{\sigma_S^2}a^2}\,\frac{2}{\Gamma(v)\,\sigma_N^2}\Big(\frac{a}{\sigma_N}\Big)^{2v-1}e^{-\frac{|y-a e^{j\phi_S}|^2}{\sigma_N^2}}\,p(\phi_S\mid\tilde\phi_S)\,da\,d\phi_S}. \tag{9}$$

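After simplification (see Appendix A), the final equation is

$$\widehat{S^\beta} = E\{A^\beta e^{j\phi_S}\mid y,\tilde\phi_S\} = \frac{\Gamma(2k+2v+\beta-1)}{\Gamma(2k+2v-1)}\left[2\left(\frac{k}{\sigma_S^2}+\frac{1}{\sigma_N^2}\right)\right]^{-\beta/2}\frac{\int_0^{2\pi}e^{j\phi_S}\exp\left(-\frac{r^2}{\sigma_N^2}\right)\exp\left(\frac{\nu^2}{4}\right)D_{-(2v+2k+\beta-1)}(-\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}{\int_0^{2\pi}\exp\left(-\frac{r^2}{\sigma_N^2}\right)\exp\left(\frac{\nu^2}{4}\right)D_{-(2v+2k-1)}(-\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}, \tag{10}$$

where $r=|y|$ is the noisy amplitude and $D_{(\cdot)}(\cdot)$ is the parabolic cylinder function. The argument is $\nu = \frac{2r\cos(\phi_S-\phi_Y)}{\sigma_N^2\sqrt{2\left(k/\sigma_S^2+1/\sigma_N^2\right)}}$.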
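Estimator (10) can be evaluated numerically without a special-function library by using the integral representation $\exp(\nu^2/4)\,D_{-m}(-\nu)=\frac{1}{\Gamma(m)}\int_0^\infty t^{m-1}e^{\nu t-t^2/2}\,dt$, valid for $m>0$. Under this representation the gamma-function prefactor of (10) cancels. The phase-independent factor $\exp(-r^2/\sigma_N^2)$ and the von Mises normalization also cancel between numerator and denominator.

The sketch below illustrates this under stated assumptions, not the authors' implementation. It uses orders $m\ge1$ only, Simpson quadrature in $t$, trapezoidal quadrature over $\phi_S$, and hypothetical function names.

```python
import cmath
import math


def moment_integral(m, nu, shift=0.0, n=2000):
    """J_m(nu) = int_0^inf t**(m-1) * exp(nu*t - t**2/2 - shift) dt (Simpson rule, n even).

    With shift = 0, exp(nu**2/4) * D_{-m}(-nu) equals J_m(nu) / Gamma(m).
    Assumes m >= 1 so that the integrand is bounded at t = 0.
    """
    if m < 1.0:
        raise ValueError("this sketch assumes m >= 1")
    hi = max(nu, 0.0) + 12.0  # the Gaussian factor is negligible beyond this point
    h = hi / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * t ** (m - 1.0) * math.exp(nu * t - 0.5 * t * t - shift)
    return total * h / 3.0


def cup_ng(y, phi_tilde, kappa, sigma_n2, sigma_s2, k, v, beta, n_phi=128):
    """Compressed complex estimate E{A^beta e^{j phi_S} | y, phi_tilde} of Eq. (10)."""
    r, phi_y = abs(y), cmath.phase(y)
    p = k / sigma_s2 + 1.0 / sigma_n2          # coefficient of a^2 after combining the priors
    m_den = 2.0 * v + 2.0 * k - 1.0            # order of D in the denominator
    m_num = m_den + beta                       # order of D in the numerator
    phis = [2.0 * math.pi * i / n_phi for i in range(n_phi)]
    nus = [2.0 * r * math.cos(ph - phi_y) / (sigma_n2 * math.sqrt(2.0 * p)) for ph in phis]
    shift = 0.5 * max(max(nus), 0.0) ** 2      # common scaling against overflow; cancels in the ratio
    num, den = 0j, 0.0
    for ph, nu in zip(phis, nus):
        w = math.exp(kappa * (math.cos(ph - phi_tilde) - 1.0))  # unnormalized von Mises weight
        num += cmath.exp(1j * ph) * w * moment_integral(m_num, nu, shift)
        den += w * moment_integral(m_den, nu, shift)
    return (2.0 * p) ** (-beta / 2.0) * num / den


if __name__ == "__main__":
    y = cmath.rect(2.0, 0.3)
    for kappa in (0.0, 4.0, 50.0):
        s_beta = cup_ng(y, phi_tilde=0.6, kappa=kappa, sigma_n2=1.0, sigma_s2=1.0,
                        k=1.0, v=0.5, beta=0.5)
        print(kappa, abs(s_beta), cmath.phase(s_beta))
```

With $\kappa=0$ the phase of the estimate stays at the noisy phase, by symmetry of the integrand. For large $\kappa$ it moves toward the initial estimate $\tilde\phi_S$, which is the behaviour the text describes for the concentration parameter.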

2.2. Derivations for Proposed CUP Estimators CUP-GG

Similarly, the CUP estimator that assumes a Gamma speech prior and GGD noise is derived as in [14]. The speech prior is the Gamma PDF

$$p_A(a) = \frac{1}{\Gamma(k)\,\theta^{k}}\,a^{k-1}\exp\left(-\frac{a}{\theta}\right), \tag{11}$$

with shape parameter $k$ and scale parameter $\theta$.

The noise prior is again the GGD

$$p_{Y\mid S}(y\mid s) = \frac{2}{\Gamma(v)\,\sigma_N^{2}}\left(\frac{a}{\sigma_N}\right)^{2v-1}\exp\left(-\frac{|y-s|^{2}}{\sigma_N^{2}}\right). \tag{12}$$

The final equation for the estimated clean speech signal is (see Appendix B)

$$\widehat{S^\beta} = E\{A^\beta e^{j\phi_S}\mid y,\tilde\phi_S\} = \frac{\Gamma(2v+k+\beta-1)}{\Gamma(2v+k-1)}\left(\frac{\sigma_N^2}{2}\right)^{\beta/2}\frac{\int_0^{2\pi}e^{j\phi_S}\exp\left(\frac{\nu^2}{4}\right)D_{-(2v+k+\beta-1)}(-\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}{\int_0^{2\pi}\exp\left(\frac{\nu^2}{4}\right)D_{-(2v+k-1)}(-\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}. \tag{13}$$



Solving the integral over the phase $\phi_S$ analytically is difficult because it involves integrating the parabolic cylinder function $D_{(\cdot)}(\cdot)$. Since the phase ranges from 0 to $2\pi$, (13) can instead be solved numerically. The gains are pre-computed and tabulated for given shape parameters $v$, $k$ and compression parameter $\beta$. The table dimensions are the a priori SNR, the a posteriori SNR, the concentration parameter $\kappa$ and the phase difference.
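The pre-computed table can be sketched as follows. This is a hypothetical illustration, not the authors' table, and it rests on several assumptions:

- It normalizes $\sigma_N^2=1$ and $\phi_Y=0$.
- It ties the Gamma scale to the a priori SNR through $E\{A^2\}=k(k+1)\theta^2=\xi$.
- It evaluates the phase integrals of (13) and (15) with the integral representation $\exp(\nu^2/4)D_{-m}(-\nu)=\frac{1}{\Gamma(m)}\int_0^\infty t^{m-1}e^{\nu t-t^2/2}dt$, restricted to orders $m\ge1$, under which the gamma prefactors cancel.
- It uses nearest-neighbour lookup.

Because the AUP and CUP numerators share the same positive weights, $|\widehat{S^\beta}|\le\widehat{A^\beta}$ always holds. This is one way to see that CUP attenuates at least as strongly as AUP.

```python
import cmath
import math


def moment_integral(m, nu, shift=0.0, n=800):
    """int_0^inf t**(m-1) exp(nu*t - t**2/2 - shift) dt by Simpson's rule (m >= 1, n even)."""
    hi = max(nu, 0.0) + 12.0
    h = hi / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * t ** (m - 1.0) * math.exp(nu * t - 0.5 * t * t - shift)
    return total * h / 3.0


def gg_gains(gamma, xi, kappa, dphi, k=1.0, v=0.5, beta=0.5, n_phi=64):
    """(AUP, CUP) compressed estimates of Eqs. (15) and (13) with sigma_N^2 = 1 and phi_Y = 0.

    dphi is the initial phase estimate measured relative to the noisy phase.
    theta = sqrt(xi / (k (k + 1))) is an assumption of this sketch.
    """
    m_den = 2.0 * v + k - 1.0
    if m_den < 1.0:
        raise ValueError("this sketch assumes 2v + k - 1 >= 1")
    m_num = m_den + beta
    r = math.sqrt(gamma)
    theta = math.sqrt(xi / (k * (k + 1.0)))
    phis = [2.0 * math.pi * i / n_phi for i in range(n_phi)]
    # nu = (2 r cos(phi_S - phi_Y) / sigma_N^2 - 1 / theta) * sigma_N / sqrt(2)
    nus = [(2.0 * r * math.cos(ph) - 1.0 / theta) / math.sqrt(2.0) for ph in phis]
    shift = 0.5 * max(max(nus), 0.0) ** 2
    aup_num, cup_num, den = 0.0, 0j, 0.0
    for ph, nu in zip(phis, nus):
        w = math.exp(kappa * (math.cos(ph - dphi) - 1.0))
        j_num = w * moment_integral(m_num, nu, shift)
        aup_num += j_num
        cup_num += cmath.exp(1j * ph) * j_num
        den += w * moment_integral(m_den, nu, shift)
    scale = 0.5 ** (beta / 2.0)  # (sigma_N^2 / 2)^(beta / 2) with sigma_N^2 = 1
    return scale * aup_num / den, scale * cup_num / den


def build_table(gammas, dphis, xi, kappa):
    """Offline stage: one table slice for a fixed a priori SNR and concentration."""
    return {(g, d): gg_gains(g, xi, kappa, d) for g in gammas for d in dphis}


def lookup(table, gammas, dphis, gamma, dphi):
    """Online stage: nearest-neighbour lookup (no interpolation, no phase wrap-around)."""
    g = min(gammas, key=lambda x: abs(x - gamma))
    d = min(dphis, key=lambda x: abs(x - dphi))
    return table[(g, d)]


if __name__ == "__main__":
    gammas, dphis = [0.5, 2.0, 8.0], [0.0, 0.8, 1.6]
    table = build_table(gammas, dphis, xi=1.0, kappa=4.0)
    print(lookup(table, gammas, dphis, 7.3, 0.1))
```

A practical table would interpolate between grid points and cover the a priori SNR and $\kappa$ axes as well; nearest-neighbour lookup keeps the sketch short.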

III. PHASE – AWARE AMPLITUDE ESTIMATION

AUP estimates the compressed speech amplitudes. To derive the novel AUP estimator, define

$$f(S) = |S|^{\beta} = A^{\beta}. \tag{14}$$

The parameter $\beta$ is perceptually beneficial for $0<\beta<1$, as shown in [5], and using it in phase-blind amplitude estimators may yield better results. Following [23] and (13), the AUP estimator is

$$\widehat{A^\beta} = \frac{\Gamma(2v+k+\beta-1)}{\Gamma(2v+k-1)}\left(\frac{\sigma_N^2}{2}\right)^{\beta/2}\frac{\int_0^{2\pi}\exp\left(\frac{\nu^2}{4}\right)D_{-(2v+k+\beta-1)}(-\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}{\int_0^{2\pi}\exp\left(\frac{\nu^2}{4}\right)D_{-(2v+k-1)}(-\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}, \tag{15}$$

where $D_{(\cdot)}(\cdot)$ is the parabolic cylinder function. The argument is $\nu = \frac{\sqrt{2}\,|y|\cos(\phi_S-\phi_Y)}{\sigma_N} - \frac{\sigma_N}{\sqrt{2}\,\theta}$, with $\theta$ the scale parameter of the Gamma prior (11). Here, $\phi_S-\phi_Y$ is the difference between the true clean speech phase and the observed (noisy) phase.

AUP is obtained by substituting the von Mises phase prior (6) into (15). The integral is solved numerically by means of a lookup table computed offline. The enhanced speech amplitudes are then $\hat A = \big(\widehat{A^\beta}\big)^{1/\beta}$, and the final clean speech estimate combines these amplitudes with the noisy phase:

$$\hat S_{\mathrm{AUP}} = \hat A\,\exp(j\phi_Y). \tag{16}$$

From (16), AUP uses the noisy phase when reconstructing the estimated signal, so it modifies (enhances) only the spectral amplitudes. The motivation is to avoid artifacts that phase processing can introduce.
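Equation (16) only rescales the magnitude, so the output phase is always the noisy phase. The minimal sketch below takes a compressed estimate $\widehat{A^\beta}$, for example from a precomputed lookup table, undoes the compression and attaches the noisy phase. The function name is hypothetical.

```python
import cmath


def aup_reconstruct(a_beta_hat, y, beta):
    """Eq. (16): A_hat = (A^beta_hat)^(1/beta), combined with the noisy phase of y."""
    if a_beta_hat < 0.0 or beta <= 0.0:
        raise ValueError("expects a non-negative compressed amplitude and beta > 0")
    a_hat = a_beta_hat ** (1.0 / beta)
    return a_hat * cmath.exp(1j * cmath.phase(y))


if __name__ == "__main__":
    y = 1.0 + 1.0j
    s_hat = aup_reconstruct(4.0, y, beta=0.5)
    print(abs(s_hat), cmath.phase(s_hat), cmath.phase(y))
```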

Consider how AUP behaves in two limits: $\kappa\to\infty$ (known phase) and $\kappa=0$ (phase-blind). When $\kappa\to\infty$, the initial phase estimate is treated as the true phase and the uncertainty is neglected. The von Mises distribution then approaches a delta function, $p(\phi_S\mid\tilde\phi_S)\to\delta(\phi_S-\tilde\phi_S)$, and is non-zero only for $\phi_S=\tilde\phi_S$. This yields the Amplitude estimator given Deterministic Phase information (ADP) [23],

$$\widehat{A^\beta}_{D} = \frac{\Gamma(2v+k+\beta-1)}{\Gamma(2v+k-1)}\left(\frac{\sigma_N^2}{2}\right)^{\beta/2}\frac{D_{-(2v+k+\beta-1)}(-\nu)}{D_{-(2v+k-1)}(-\nu)}\bigg|_{\phi_S=\tilde\phi_S}. \tag{17}$$
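Because (17) involves no phase integral, it can be evaluated directly. The sketch below uses the integral representation $\exp(\nu^2/4)D_{-m}(-\nu)=\frac{1}{\Gamma(m)}\int_0^\infty t^{m-1}e^{\nu t-t^2/2}dt$, under which the gamma-function prefactor of (17) cancels. It assumes orders $m\ge1$, and the function names are hypothetical.

As a sanity check, take $\nu=0$, $v=1/2$, $k=1$, $\beta=1$ and $\sigma_N^2=1$. The estimate is then $\sqrt{1/2}\cdot\sqrt{2/\pi}=1/\sqrt{\pi}$.

```python
import math


def moment_integral(m, nu, shift=0.0, n=2000):
    """int_0^inf t**(m-1) exp(nu*t - t**2/2 - shift) dt by Simpson's rule (m >= 1, n even)."""
    if m < 1.0:
        raise ValueError("this sketch assumes m >= 1")
    hi = max(nu, 0.0) + 12.0
    h = hi / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * t ** (m - 1.0) * math.exp(nu * t - 0.5 * t * t - shift)
    return total * h / 3.0


def adp(r, phi_y, phi_tilde, sigma_n2, theta, k=1.0, v=0.5, beta=0.5):
    """Compressed amplitude estimate of Eq. (17): kappa -> infinity, so phi_S = phi_tilde."""
    nu = (2.0 * r * math.cos(phi_tilde - phi_y) / sigma_n2 - 1.0 / theta) * math.sqrt(sigma_n2 / 2.0)
    m = 2.0 * v + k - 1.0
    shift = 0.5 * max(nu, 0.0) ** 2  # identical scaling in numerator and denominator
    ratio = moment_integral(m + beta, nu, shift) / moment_integral(m, nu, shift)
    return (sigma_n2 / 2.0) ** (beta / 2.0) * ratio


if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0, 4.0):
        print(r, adp(r, phi_y=0.2, phi_tilde=0.2, sigma_n2=1.0, theta=2.0))
```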

Choosing $\kappa\to\infty$ neglects the uncertainty in the phase estimate. For reconstruction, the noisy phase is combined with the AUP amplitude estimate, since AUP estimates only spectral amplitudes.

For $\kappa=0$, the von Mises distribution leads to a phase-blind estimator. The initial phase provides no useful information, and the von Mises distribution reduces to a uniform distribution $1/(2\pi)$ between $-\pi$ and $\pi$. Solving (3) for $f(S)=|S|^\beta=A^\beta$ then gives the parametric amplitude estimator of [6],

$$\widehat{A^\beta} = E\{A^\beta\mid y\}. \tag{18}$$

This phase-blind estimator is called MMSE estimation with Optimizable Speech model and Inhomogeneous Error criterion (MOSIE). Its error function is $e(A_k,\hat A_k) = A_k^\beta - \hat A_k^\beta$, where the amplitude compression parameter $\beta$ makes the distortion measure non-linear, or inhomogeneous.

IV. PHASE-AWARE ESTIMATION OF COMPLEX COEFFICIENTS AND RELATIONS TO PHASE-AWARE AMPLITUDE ESTIMATION

The phase-aware CUP estimator estimates the compressed complex speech spectral coefficients, obtained with

$$f(S) = |S|^{\beta}e^{j\phi_S} = A^{\beta}e^{j\phi_S}. \tag{19}$$

The estimated signal is then obtained in the same way as (13):

$$\widehat{S^\beta} = E\{A^\beta e^{j\phi_S}\mid y,\tilde\phi_S\} = \frac{\Gamma(2v+k+\beta-1)}{\Gamma(2v+k-1)}\left(\frac{\sigma_N^2}{2}\right)^{\beta/2}\frac{\int_0^{2\pi}e^{j\phi_S}\exp\left(\frac{\nu^2}{4}\right)D_{-(2v+k+\beta-1)}(-\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}{\int_0^{2\pi}\exp\left(\frac{\nu^2}{4}\right)D_{-(2v+k-1)}(-\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}. \tag{20}$$

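This equation is similar to AUP (15) but contains the factor $e^{j\phi_S}$ inside the phase integral. The final estimated speech is obtained by undoing the compression while keeping the phase of the estimate:

$$\hat S_{\mathrm{CUP}} = \big|\widehat{S^\beta}\big|^{1/\beta}\,\frac{\widehat{S^\beta}}{\big|\widehat{S^\beta}\big|}. \tag{21}$$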
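Unlike (16), reconstruction (21) keeps the phase of the compressed complex estimate itself, which is why CUP can modify the phase. A minimal sketch, with a hypothetical function name:

```python
import cmath


def cup_reconstruct(s_beta_hat, beta):
    """Eq. (21): |S^beta_hat|^(1/beta) * S^beta_hat / |S^beta_hat|."""
    mag = abs(s_beta_hat)
    if mag == 0.0:
        return 0j  # the phase is undefined for a zero estimate; return zero
    return mag ** (1.0 / beta) * s_beta_hat / mag


if __name__ == "__main__":
    s_beta_hat = cmath.rect(2.0, 0.7)  # compressed estimate whose phase is 0.7 rad
    s_hat = cup_reconstruct(s_beta_hat, beta=0.5)
    print(abs(s_hat), cmath.phase(s_hat))
```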



Note that in the CUP estimator the output phase is not the initial phase estimate. As for ADP, CUP is examined for a known phase ($\kappa\to\infty$) and for the phase-blind case ($\kappa=0$). When $\kappa\to\infty$, CUP reduces to

$$\widehat{S^\beta}_D = E\{A^\beta e^{j\phi_S}\mid y,\tilde\phi_S\} = E\{A^\beta\mid y,\tilde\phi_S\}\,e^{j\tilde\phi_S} = \widehat{A^\beta}_D\,e^{j\tilde\phi_S}, \tag{22}$$

which is called the estimator of Complex spectral speech coefficients given Deterministic Phase information (CDP). With full certainty in the initial phase ($\kappa\to\infty$), AUP and CUP give the same estimated clean speech amplitudes. The difference lies in the phase. CUP estimates the complex coefficients of clean speech, so with a perfectly known phase its output phase is the clean speech phase estimate. AUP estimates only the amplitudes and reconstructs with the noisy phase.

For $\kappa=0$, a closed-form solution follows as in [23]. With a Nakagami speech prior and a GGD noise prior, it yields the phase-Blind Estimator of Complex Coefficients (BECOCO). From ([23], Eq. 17 and Eq. 18) we write

$$\int_0^{2\pi} e^{j\phi_S}\exp\left(\frac{2ra}{\sigma_N^2}\cos(\phi_S-\phi_Y)\right)d\phi_S = e^{j\phi_Y}\int_0^{2\pi}\left[\cos(\phi_S-\phi_Y)+j\sin(\phi_S-\phi_Y)\right]\exp\left(\frac{2ra}{\sigma_N^2}\cos(\phi_S-\phi_Y)\right)d\phi_S, \tag{23}$$

and from ([23], Eq. 19),

$$I_n(p) = \frac{1}{2\pi}\int_0^{2\pi}\cos(nz)\exp(p\cos z)\,dz. \tag{24}$$
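Identities (23) and (24) make the $\kappa=0$ case tractable. The sine term in (23) integrates to zero, and by (24) with $n=1$ the cosine term gives $2\pi I_1(2ra/\sigma_N^2)$. The phase integral therefore equals $2\pi e^{j\phi_Y}I_1\big(2ra/\sigma_N^2\big)$.

The small numerical check below uses the standard library only. The power series for $I_n$ and the function names are choices of this sketch.

```python
import cmath
import math


def bessel_i(n, x, tol=1e-17, max_terms=500):
    """I_n(x) for integer n >= 0 from its power series."""
    term = (x / 2.0) ** n / math.factorial(n)
    total = term
    for m in range(1, max_terms):
        term *= (x / 2.0) ** 2 / (m * (m + n))
        total += term
        if term <= tol * total:
            break
    return total


def bessel_i_integral(n, p, n_z=256):
    """Eq. (24): I_n(p) = (1/2pi) int_0^{2pi} cos(n z) exp(p cos z) dz (trapezoidal rule)."""
    h = 2.0 * math.pi / n_z
    total = sum(math.cos(n * i * h) * math.exp(p * math.cos(i * h)) for i in range(n_z))
    return total * h / (2.0 * math.pi)


def phase_integral(p, phi_y, n_phi=256):
    """Left-hand side of (23): int_0^{2pi} e^{j phi} exp(p cos(phi - phi_y)) d phi."""
    h = 2.0 * math.pi / n_phi
    return h * sum(cmath.exp(1j * i * h) * math.exp(p * math.cos(i * h - phi_y))
                   for i in range(n_phi))


if __name__ == "__main__":
    p, phi_y = 3.0, 0.4
    print(phase_integral(p, phi_y), 2.0 * math.pi * cmath.exp(1j * phi_y) * bessel_i(1, p))
```

The trapezoidal rule is used because the integrands are smooth and periodic, for which it converges very quickly.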

Thus, from ([23], Eq. 17), the estimate is

$$\widehat{S^\beta}_B = e^{j\phi_Y}\,\frac{\int_0^\infty a^{2v+2k+\beta-2}\exp\left(-\frac{k\sigma_N^2+\sigma_S^2}{\sigma_S^2\sigma_N^2}\,a^2\right)I_1\left(\frac{2ra}{\sigma_N^2}\right)da}{\int_0^\infty a^{2v+2k-2}\exp\left(-\frac{k\sigma_N^2+\sigma_S^2}{\sigma_S^2\sigma_N^2}\,a^2\right)I_0\left(\frac{2ra}{\sigma_N^2}\right)da}. \tag{25}$$

After simplification (see Appendix C), the result is

$$\widehat{S^\beta}_B = e^{j\phi_Y}\sqrt{\gamma}\,\sigma_N^{\beta}\left(\frac{\xi}{k+\xi}\right)^{\frac{\beta+1}{2}}\frac{\Gamma\left(v+k+\frac{\beta}{2}\right)}{\Gamma\left(v+k-\frac{1}{2}\right)}\,\frac{\Phi\left(v+k+\frac{\beta}{2};\,2;\,\frac{\xi\gamma}{k+\xi}\right)}{\Phi\left(v+k-\frac{1}{2};\,1;\,\frac{\xi\gamma}{k+\xi}\right)}, \tag{26}$$

where $\Phi(\cdot\,;\cdot\,;\cdot)$ is the confluent hypergeometric function, $\gamma = |Y|^2/\sigma_N^2$ is the a posteriori SNR and $\xi = \sigma_S^2/\sigma_N^2$ is the a priori SNR. Here, $\sigma_N^2$ is the noise variance, $\sigma_S^2$ is the clean speech variance and $|Y|^2$ is the power of the noisy signal.

The final estimator is obtained by reversing the compression:

$$\hat S_B = \big|\widehat{S^\beta}_B\big|^{1/\beta}\exp\big(j\angle\widehat{S^\beta}_B\big) = \big|\widehat{S^\beta}_B\big|^{1/\beta}\exp(j\phi_Y). \tag{27}$$

This estimator is called the phase-Blind Estimator of Complex Coefficients (BECOCO). Note that for $\kappa=0$, i.e., in the phase-blind case, the complex magnitudes of CUP and the spectral amplitudes of AUP are not the same. All derived estimators are listed in Table 1 along with their final equations.
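The closed form (26) can be cross-checked against direct quadrature of (25). The sketch below does both with the standard library. It evaluates Kummer's function $\Phi(a;b;z)={}_1F_1(a;b;z)$ and $I_n$ by power series, and integrates the amplitude integrals with Simpson's rule. Parameter values and function names are illustrative assumptions. The quadrature path is meant only for moderate $2ra/\sigma_N^2$, where $I_n$ does not overflow.

```python
import cmath
import math


def hyp1f1(a, b, z, tol=1e-16, max_terms=2000):
    """Kummer's confluent hypergeometric function 1F1(a; b; z) by its power series (z >= 0)."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def bessel_i(n, x, tol=1e-17, max_terms=500):
    """I_n(x) for integer n >= 0 from its power series."""
    term = (x / 2.0) ** n / math.factorial(n)
    total = term
    for m in range(1, max_terms):
        term *= (x / 2.0) ** 2 / (m * (m + n))
        total += term
        if term <= tol * total:
            break
    return total


def becoco_compressed(r, sigma_n2, sigma_s2, k, v, beta):
    """|S_B^beta| from the closed form (26)."""
    xi, gamma = sigma_s2 / sigma_n2, r * r / sigma_n2
    z = xi * gamma / (k + xi)
    log_gamma_ratio = math.lgamma(v + k + beta / 2.0) - math.lgamma(v + k - 0.5)
    return (math.sqrt(gamma) * sigma_n2 ** (beta / 2.0) * (xi / (k + xi)) ** ((beta + 1.0) / 2.0)
            * math.exp(log_gamma_ratio)
            * hyp1f1(v + k + beta / 2.0, 2.0, z) / hyp1f1(v + k - 0.5, 1.0, z))


def becoco_quadrature(r, sigma_n2, sigma_s2, k, v, beta, n=4000):
    """|S_B^beta| from the amplitude integrals of (25) (Simpson's rule, n even)."""
    alpha = k / sigma_s2 + 1.0 / sigma_n2  # = (k sigma_N^2 + sigma_S^2) / (sigma_S^2 sigma_N^2)
    b = 2.0 * r / sigma_n2
    hi = b / (2.0 * alpha) + 10.0 / math.sqrt(alpha)
    h = hi / n
    num = den = 0.0
    for i in range(n + 1):
        a = i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        g = math.exp(-alpha * a * a)
        num += w * a ** (2 * v + 2 * k + beta - 2) * g * bessel_i(1, b * a)
        den += w * a ** (2 * v + 2 * k - 2) * g * bessel_i(0, b * a)
    return num / den


def becoco(y, sigma_n2, sigma_s2, k, v, beta):
    """Eq. (27): undo the compression and keep the noisy phase."""
    mag = becoco_compressed(abs(y), sigma_n2, sigma_s2, k, v, beta)
    return mag ** (1.0 / beta) * cmath.exp(1j * cmath.phase(y))


if __name__ == "__main__":
    print(becoco_compressed(2.0, 1.0, 1.0, 1.0, 0.5, 0.5),
          becoco_quadrature(2.0, 1.0, 1.0, 1.0, 0.5, 0.5))
```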

Table 1. Proposed estimators and their mathematical representations

Estimator (Eq.) | Mathematical equation | Speech prior / Noise prior
MOSIE (18) | $\widehat{A^\beta} = E\{A^\beta\mid y\}$ | Gamma / Generalized Gamma
BECOCO (26) | $\widehat{S^\beta}_B$, closed form with confluent hypergeometric functions, Eq. (26) | Nakagami / Generalized Gamma
ADP (17) | $\widehat{A^\beta}_D$, ratio of parabolic cylinder functions at $\phi_S=\tilde\phi_S$, Eq. (17) | Gamma / Generalized Gamma
CDP (22) | $\widehat{S^\beta}_D = E\{A^\beta\mid y,\tilde\phi_S\}\,e^{j\tilde\phi_S} = \widehat{A^\beta}_D\,e^{j\tilde\phi_S}$ | Gamma / Generalized Gamma
AUP (15) | $\widehat{A^\beta}$, phase-integrated ratio of parabolic cylinder functions, Eq. (15) | Gamma / Generalized Gamma
CUP-NG (10) | $\widehat{S^\beta} = E\{A^\beta e^{j\phi_S}\mid y,\tilde\phi_S\}$, Eq. (10) | Nakagami / Generalized Gamma
CUP-GG (13) | $\widehat{S^\beta} = E\{A^\beta e^{j\phi_S}\mid y,\tilde\phi_S\}$, Eq. (13) | Gamma / Generalized Gamma

V. ANALYSIS OF ESTIMATORS

This section analyzes the estimators for various values of the concentration parameter $\kappa$.

First, consider the phase-blind case. When $\kappa=0$, the CUP and AUP estimators reduce to special cases, called BECOCO and MOSIE. With the parameters set to 0.5, BECOCO and MOSIE provide reduced attenuation and protect speech components.

Second, consider the phase-aware case, i.e., $\kappa>0$, in which there is some certainty about the phase. Fig. 1 plots the behaviour of the estimators as $\kappa$ varies from 0 to $\infty$; increasing $\kappa$ corresponds to increasing certainty about the phase. The characteristic curves show that CUP-GG and AUP-GG perform better than the CUP estimator of [23] because of reduced attenuation. As $\kappa\to\infty$, the CUP and AUP responses coincide, so at such large values the result no longer depends on the concentration parameter. For small $\kappa$, Fig. 1 shows a clear difference between the CUP and AUP amplitudes. As $\kappa$ increases (e.g., $\kappa=10$, $\kappa=20$), the difference between their input-output characteristics decreases. This shows how strongly the initial phase shapes the input-output characteristics of the estimators.

Fig. 1 shows that the concentration parameter determines the phase uncertainty. Fig. 2 therefore fixes the uncertainty at $\kappa=4$ and plots the input-output characteristics of CUP and AUP for different phase differences $\Delta\phi=\tilde\phi_S-\phi_Y$.

A larger phase difference results in more suppression in the input-output characteristics. For $\Delta\phi=0$ and $\Delta\phi=1/2$ in Fig. 2, the AUP and CUP curves are some distance apart, whereas for $\Delta\phi=2/3$ they are closer. Hence, suppression (attenuation) is stronger for large phase differences. Moreover, for a given phase difference CUP is more aggressive than AUP, which is observed both theoretically and practically.

The plots show the normalized noise input (normalized with the noise variance) on the x-axis against the normalized estimated amplitude on the y-axis.

[Figure 1. Input-output characteristics of AUP and CUP (curves: CUP, AUP, CUP-GG, AUP-GG) for a phase difference of 0.45 and concentration parameter $\kappa$ = 0, 10, 20 and $\infty$ (left to right). x-axis: normalized noise input (normalized with noise variance); y-axis: normalized estimated amplitude.]

[Figure 2. Input-output characteristics of AUP and CUP (curves: CUP, AUP, CUP-GG, AUP-GG) for $\kappa$ = 4 and phase differences $\Delta\phi$ = 0, 1/2 and 2/3 (left to right). Axes as in Figure 1.]

VI. COMBINING THE DERIVED ESTIMATOR WITH THE NMF APPROACH

As discussed above, statistical speech enhancement methods assume different PDFs for speech and noise and need no prior training. However, statistical approaches typically assume stationary noise, so their performance may degrade under highly non-stationary noise. Other approaches, such as NMF, use prior information from speech and noise databases and therefore perform well even under non-stationary noise. Combining the statistical and NMF approaches may thus give better results by exploiting the advantages of both. Such a combination is proposed in [25]. In this paper, the gain of the derived CUP-GG estimator is used in place of the estimator gain function.

The proposed speech enhancement method consists of two stages, as in [25]:

1. The statistical model-based speech enhancement method proposed in [24] is applied.
2. NMF is applied, with the estimator gain function replaced by the proposed CUP-GG estimator.

The noise and speech bases are updated online as proposed in [25]. The processing chain of the proposed method is shown in Fig. 3.

[Figure 3. Block diagram for the proposed method. Blocks: statistical model-based enhancement (input $Y(t)$, output $Y'(t)$); NMF (outputs $\hat S(t)$, $\hat N(t)$); SNR estimation; CUP-GG gain (output $\hat S_{\mathrm{Final}}(t)$); SPP estimation (output $p(t)$); maximum update rate determination; on-line bases update (outputs $W_s(t+1)$, $W_n(t+1)$).]

Different NMF techniques are compared with the proposed NMF-based speech enhancement that uses online bases update, with the Perceptual Evaluation of Speech Quality (PESQ) as the objective measure. The experimental results in Tables 2 to 5 show that the derived CUP-GG estimator performs better when combined with the NMF approach.
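The NMF stage can be illustrated with a generic sketch. It is not the online bases-update algorithm of [25]. The speech and noise bases $W_s$, $W_n$ are kept fixed, and the activations are estimated with the standard Lee-Seung multiplicative updates for the Kullback-Leibler divergence. A Wiener-type mask $\hat S/(\hat S+\hat N)$ stands in for the estimator gain that the paper replaces with CUP-GG. Plain Python lists keep the example dependency-free, and all names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Product of two list-of-lists matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kl_div(v, v_hat):
    """Generalized KL divergence D(V || V_hat) for strictly positive matrices."""
    return sum(x * math.log(x / y) - x + y
               for row, row_hat in zip(v, v_hat) for x, y in zip(row, row_hat))


def nmf_activations(v, w, iters=200, seed=0):
    """Estimate H >= 0 with V ~ W H for fixed bases W, using KL multiplicative updates."""
    rng = random.Random(seed)
    n_freq, n_comp, n_frames = len(w), len(w[0]), len(v[0])
    h = [[rng.uniform(0.5, 1.5) for _ in range(n_frames)] for _ in range(n_comp)]
    col_sums = [sum(w[f][q] for f in range(n_freq)) for q in range(n_comp)]
    for _ in range(iters):
        v_hat = matmul(w, h)
        ratio = [[v[f][t] / v_hat[f][t] for t in range(n_frames)] for f in range(n_freq)]
        for q in range(n_comp):
            for t in range(n_frames):
                h[q][t] *= sum(w[f][q] * ratio[f][t] for f in range(n_freq)) / col_sums[q]
    return h


def nmf_wiener_mask(v, w_s, w_n, iters=200):
    """Mask S_hat / (S_hat + N_hat) from concatenated speech and noise bases."""
    w = [row_s + row_n for row_s, row_n in zip(w_s, w_n)]
    h = nmf_activations(v, w, iters)
    n_s = len(w_s[0])
    s_hat, n_hat = matmul(w_s, h[:n_s]), matmul(w_n, h[n_s:])
    return [[s / (s + n) for s, n in zip(rs, rn)] for rs, rn in zip(s_hat, n_hat)]


if __name__ == "__main__":
    w_s = [[1.0, 0.1], [0.1, 1.0], [0.1, 0.1], [0.1, 0.1]]  # toy speech bases (4 bins x 2)
    w_n = [[0.1], [0.1], [1.0], [1.0]]                      # toy noise basis (4 bins x 1)
    h_true = [[1.0, 0.2, 2.0], [0.5, 1.5, 0.3], [0.8, 0.8, 0.1]]
    v = matmul([a + b for a, b in zip(w_s, w_n)], h_true)   # toy noisy magnitude spectrogram
    for row in nmf_wiener_mask(v, w_s, w_n, iters=500):
        print([round(m, 3) for m in row])
```

In the method described above, the Wiener-type mask would be replaced by the CUP-GG gain. The bases would also be updated online rather than held fixed.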

VII. RESULTS

To evaluate and compare the proposed estimators, 30 sentences (15 by male and 15 by female speakers) are taken from the phonetically balanced NOIZEUS speech corpus and corrupted by additive noise. A Hanning window with 75% overlap is used for analysis and synthesis of the speech signal. Four noise types are added to the clean speech at input SNRs of -10 dB, -5 dB, 0 dB, 5 dB and 10 dB:

- modulated pink noise;
- pink noise;
- babble noise;
- non-stationary factory noise.

The prior phase is estimated as in [10], which uses the fundamental frequency. The fundamental frequency is obtained by applying the pitch estimation algorithm PEFAC [29] to the noisy observation. The uncertainty parameter $\kappa$ is set for each time-frequency bin using [29] as

$$\kappa(k,l) = \begin{cases} 4\,P_V(l), & k f_s/N \le 4\ \text{kHz},\\ 2\,P_V(l), & k f_s/N > 4\ \text{kHz}, \end{cases} \tag{28}$$

where $P_V(l)$ is the probability that signal segment $l$ contains voiced speech. According to (28), the higher the probability of voiced speech, the larger the concentration parameter, i.e., the more the phase estimate is trusted.

The noise variance is estimated with the Speech Presence Probability (SPP) approach. The speech variance is estimated with the decision-directed approach [30], using a smoothing factor of 0.65 to reduce speech distortion.

The following measures are used:

- Amplitude effects: Signal to Noise Ratio (SNR) and Segmental Signal to Noise Ratio (SSNR).
- Phase effects: Phase Signal to Noise Ratio (PSNR), computed as in ([23], Eq. 22).
- Perceptual quality: Perceptual Evaluation of Speech Quality (PESQ) [30].
- Intelligibility: Short-Time Objective Intelligibility (STOI) [31], plotted for the different estimators.
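The per-bin parameter settings above can be sketched as follows. The sketch contains two functions: the concentration rule (28) and the decision-directed a priori SNR estimate with the smoothing factor 0.65 used in this work.

The sketch makes several assumptions:

- The factor 4 applies at or below 4 kHz and the factor 2 above it.
- $N$ is the DFT length and $f_s$ the sampling rate.
- $P_V(l)$ is supplied by a voicing detector.
- The noise variance is given, e.g., by an SPP-based tracker.
- The $-25$ dB floor on $\xi$ is a common practical safeguard.

The function names are hypothetical.

```python
def concentration(k_bin, n_fft, fs, p_voiced):
    """Eq. (28): von Mises concentration kappa(k, l) for DFT bin k_bin of segment l.

    Assumes the factor 4 applies for k * fs / N <= 4 kHz and the factor 2 above it.
    """
    if not 0.0 <= p_voiced <= 1.0:
        raise ValueError("p_voiced must be a probability")
    freq_hz = k_bin * fs / n_fft
    return (4.0 if freq_hz <= 4000.0 else 2.0) * p_voiced


def decision_directed(gamma, prev_amp2, sigma_n2, alpha=0.65, xi_min=10 ** (-25 / 10)):
    """A priori SNR xi = alpha |A_hat(l-1)|^2 / sigma_N^2 + (1 - alpha) max(gamma - 1, 0).

    gamma: a posteriori SNR |Y(k,l)|^2 / sigma_N^2 of the current frame.
    prev_amp2: squared amplitude estimate |A_hat(k,l-1)|^2 from the previous frame.
    xi_min: lower floor (-25 dB), an assumption of this sketch.
    """
    xi = alpha * prev_amp2 / sigma_n2 + (1.0 - alpha) * max(gamma - 1.0, 0.0)
    return max(xi, xi_min)


if __name__ == "__main__":
    # 16 kHz sampling, 512-point DFT, a segment with voicing probability 0.8
    print([concentration(k, 512, 16000, 0.8) for k in (0, 64, 128, 129, 256)])
    # One bin over a few frames; the Wiener gain is only a placeholder for the estimator gain.
    sigma_n2, prev_amp2 = 1.0, 0.0
    for power in (0.8, 3.0, 6.0, 2.0):
        xi = decision_directed(power / sigma_n2, prev_amp2, sigma_n2)
        gain = xi / (1.0 + xi)
        prev_amp2 = gain ** 2 * power
        print(round(xi, 3), round(gain, 3))
```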


Fig. 4 compares the ΔPESQ values, averaged over 30 sentences, of the estimators under modulated pink noise, pink noise, babble noise and non-stationary factory noise, where Δ denotes the difference between the PESQ scores of the processed and unprocessed signals. The corresponding numerical results are given in Appendix D (Tables 6 to 9). The results show that the proposed CUP estimators (10) and (13) yield larger PESQ improvements than the CUP estimator proposed in [14]. Because the uncertainty about the prior phase is taken into account, rather than the estimated phase being used directly, the PESQ improvement is significant even at low SNRs, and estimators (10) and (13) also reduce quality degradation at high SNRs. Overall, the highest PESQ improvements are obtained with the proposed estimators.


The CUP-NG estimator (10) outperforms CUP [14], achieving ΔPESQ values of 0.23, 0.47, 0.87, 0.79, 0.73 and 0.64 at input SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB and 15 dB, respectively, for modulated pink noise; 0.14, 0.41, 0.74, 0.73, 0.70 and 0.64 for pink noise; 0.13, 0.26, 0.47, 0.54, 0.56 and 0.50 for babble noise; and 0.11, 0.35, 0.68, 0.68, 0.64 and 0.62 for non-stationary factory noise. At 0 dB, the largest improvement is obtained for modulated pink noise and the smallest for babble noise. The proposed CUP-GG estimator (13) performs better still than estimator (10), as can be seen in Fig. 4: its largest ΔPESQ, 0.92, is obtained for modulated pink noise at 0 dB, and its smallest at 0 dB, 0.57, for babble noise. These results show that the proposed CUP-GG estimator (13), with a Gamma speech prior and a Generalized Gamma noise prior, provides better PESQ values even at low SNRs for different noise conditions. CUP-GG (13) also outperforms CUP [14], achieving ΔPESQ values of 0.31, 0.55, 0.92, 0.82, 0.77 and 0.66 at input SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB and 15 dB, respectively, for modulated pink noise; 0.17, 0.43, 0.81, 0.79, 0.76 and 0.65 for pink noise; 0.17, 0.28, 0.57, 0.58, 0.59 and 0.53 for babble noise; and 0.14, 0.39, 0.72, 0.67, 0.65 and 0.63 for non-stationary factory noise.
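The margins between the CUP variants can be read directly off the modulated pink noise column of Table 6. A small worked sketch using only the ΔPESQ values quoted above computes the per-SNR differences; it shows, for instance, that the margin of CUP-GG over CUP-NG shrinks from 0.08 at -10 dB to 0.02 at 15 dB. The helper name `gains` is illustrative.

```python
# ΔPESQ for modulated pink noise, copied from Table 6 (input SNRs -10 ... 15 dB)
snrs = [-10, -5, 0, 5, 10, 15]
cup = [0.2, 0.45, 0.76, 0.69, 0.56, 0.57]
cup_ng = [0.23, 0.47, 0.87, 0.79, 0.73, 0.64]
cup_gg = [0.31, 0.55, 0.92, 0.82, 0.77, 0.66]


def gains(base, other):
    """Per-SNR difference other - base, rounded to the table's two decimals."""
    return [round(o - b, 2) for b, o in zip(base, other)]


for snr, d_ng, d_gg, d_gg_ng in zip(snrs, gains(cup, cup_ng), gains(cup, cup_gg), gains(cup_ng, cup_gg)):
    print(f"{snr:>4} dB  CUP-NG vs CUP {d_ng:+.2f}  CUP-GG vs CUP {d_gg:+.2f}  CUP-GG vs CUP-NG {d_gg_ng:+.2f}")
```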


Fig. 5 plots ΔSTOI in percent, averaged over 30 sentences, for the four noise types, where Δ denotes the difference between the processed and unprocessed signals. The proposed CUP-GG estimator provides a better STOI improvement, of up to about 5 percent, and the STOI improvement of the CUP estimators decreases as the input SNR increases. For babble noise, (13) yields slightly negative STOI values at -10 dB and -5 dB; this may be overcome by cepstral smoothing as in [32]. The improved novel estimator MOSIE yields positive ΔSTOI values for all noise types except babble noise, and at low input SNRs the AUP estimator provides better STOI values, with differences of 0.15 to 0.4 for the different noises considered in this work. The higher STOI values of AUP compared with CUP indicate that AUP improves intelligibility at low input SNRs, and highlight both the importance of how the phase is taken into account and the artifacts it can introduce. The proposed BECOCO estimator yields negative values. The corresponding numerical values of the STOI improvement are given in Appendix E (Tables 10 to 13).


Fig. 6 (left) plots the SSNR values averaged over the four noises (modulated pink noise, pink noise, babble noise and non-stationary factory noise) and 30 sentences. The MOSIE and AUP estimators yield a higher SSNR than CUP, which suggests that better phase estimation and reconstruction methods could further improve the enhancement. Fig. 6 (right) plots the averaged Phase Signal to Noise Ratio (PSNR) values: the proposed CUP-GG provides the highest PSNR at low input SNRs, and the improved novel estimators MOSIE, AUP and BECOCO also show improved values. The corresponding numerical values of the average SSNR and PSNR are given in Appendix F (Tables 14 and 15).
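Because SSNR averages frame-level SNRs, it weights quiet and loud segments equally, unlike a global SNR. A minimal NumPy sketch follows; the frame length of 256, hop of 128 and per-frame clamping to [-10, 35] dB are common conventions assumed here, not values stated in the paper, and the PSNR of ([23], Eq. 22) is not reproduced.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=256, hop=128, lo=-10.0, hi=35.0, eps=1e-10):
    """Mean per-frame SNR in dB, with each frame's SNR clamped to [lo, hi]."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n = min(len(clean), len(enhanced))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        err = s - enhanced[start:start + frame_len]   # residual noise plus distortion
        snr = 10.0 * np.log10((np.sum(s**2) + eps) / (np.sum(err**2) + eps))
        frame_snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(frame_snrs))
```

Scaling the clean signal by 0.5 leaves an error of equal shape, so every frame scores 10·log10(4) ≈ 6.02 dB.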

[Figure 4 panels: ΔPESQ versus input signal SNR in dB, one panel per noise type; legend: Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [14], CUP-NG (10), CUP-GG (13).]

Figure 4. Comparison of ΔPESQ values for input SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB and 15 dB and noise types (from left to right) modulated pink noise, pink noise, babble noise and non-stationary factory noise, averaged over 30 sentences (15 spoken by 3 female speakers and 15 by male speakers) from the NOIZEUS speech corpus database.

[Figure 5 panels: ΔSTOI in [%] versus input signal SNR in dB, one panel per noise type; legend: Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [14], CUP-NG (10), CUP-GG (13).]

Figure 5. Comparison of ΔSTOI values for input SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB and 15 dB and noise types (from left to right) modulated pink noise, pink noise, babble noise and non-stationary factory noise, averaged over 30 sentences (15 spoken by 3 female speakers and 15 by male speakers) from the NOIZEUS speech corpus database.

[Figure 6 panels: Avg SSNR (left) and Avg PSNR (right) versus input signal SNR in dB; legend: Wiener, MOSIE, BECOCO, ADP, CDP, AUP, CUP, CUP-NG, CUP-GG.]

Figure 6. Comparison of SSNR (left) and PSNR (right) values averaged over four noises (modulated pink noise, pink noise, babble noise and non-stationary factory noise) and 30 sentences from the NOIZEUS speech corpus database, at input SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB and 15 dB.

7.1. Subjective listening tests and their analysis

Subjective listening tests were conducted in a quiet room according to ITU-T Recommendation P.835 [33]. In this method, the listener is instructed to rate the processed speech signal by attending to three aspects: the speech signal alone, the background noise alone and the overall effect, using a Mean Opinion Score (MOS). Fourteen normal-hearing listeners aged 18 to 20 years participated in the test. Sentences from the NOIZEUS database (phonetically balanced) were presented to the listeners over high-end headphones. As specified in [33], the signal distortion (SIG) scale is 5 (very natural, no degradation), 4 (fairly natural, little degradation), 3 (somewhat natural, somewhat degraded), 2 (fairly unnatural, fairly degraded) and 1 (very unnatural, very degraded). The background intrusiveness (BAK) scale is 5 (not noticeable), 4 (somewhat noticeable), 3 (noticeable but not intrusive), 2 (fairly conspicuous, somewhat intrusive) and 1 (very conspicuous, very intrusive). The overall effect (OVRL) is rated on the MOS scale 1 (bad), 2 (poor), 3 (fair), 4 (good) and 5 (excellent). Results are analyzed for signals corrupted by babble noise, modulated pink noise and non-stationary factory noise.


The subjective test results show that the proposed estimators CUP-NG (10) and CUP-GG (13), which take the phase information into account, yield a significant improvement at low SNRs. The listeners also noted reduced quality degradation at high SNRs. For signals corrupted by babble noise, CUP-GG achieves the best scores at 5 dB SNR, 3.6 for SIG, 2.5 for BAK and 3.1 for OVRL, which indicates fair and somewhat natural-sounding speech; the results are better still for the other two noises. Consistent with its higher PESQ scores, CUP-GG is also rated as somewhat natural sounding in the listening test. At 0 dB and -5 dB, CUP-GG (13) still gives fair and somewhat natural-sounding speech, whereas some intrusiveness is observed for CDP and MOSIE, as shown in Fig. 7. For modulated pink noise, performance is better than for babble noise, with 3.9 for SIG, 2.7 for BAK and 3.4 for OVRL, as shown in Fig. 8; this is consistent with the higher PESQ and STOI values obtained for this noise than for babble noise. For non-stationary factory noise, the results are generally better than for babble noise, as shown in Fig. 9. Both the phase-based amplitude modification in the proposed CUP-GG (13) and the estimators that account for phase uncertainty (MOSIE, AUP and BECOCO) yield improved results compared with traditional approaches.

[Figure 7 panels: SIG, BAK and OVRL scores for Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [7], CUP-NG (10) and CUP-GG (13).]

Figure 7. Comparison of subjective listening test scores, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by babble noise, at input SNRs of 5 dB, 0 dB and -5 dB (from left to right).

[Figure 8 and Figure 9 panels: SIG, BAK and OVRL scores for Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [7], CUP-NG (10) and CUP-GG (13).]

Figure 8. Comparison of subjective listening test scores, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by modulated pink noise, at input SNRs of 5 dB, 0 dB and -5 dB (from left to right).

Figure 9. Comparison of subjective listening test scores, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by non-stationary factory noise, at input SNRs of 5 dB, 0 dB and -5 dB (from left to right).

7.2. Combined approach of the proposed estimator with NMF

The proposed estimator is combined with NMF as proposed in [25]. The experiments use samples taken from the database, with the clean speech signals corrupted by different noises, namely modulated pink, street, babble, factory, F-16, M109 and white noise. The experiments are conducted under matched noise conditions. The STFT is computed using a Hanning window with a frame size of 1024 and 75% overlap. The parameter values used were

 smax  0.4,  nmax  0.5,  s  0.4,  n  0.9,  e  0.97 and are taken from [25]. The proposed method is


compared with the benchmark algorithms. The experimental results show that the proposed method outperforms the benchmark algorithms in terms of PESQ; the numerical values are listed in Tables 2 to 5. SE (statistical-model-based enhancement) is the speech enhancement method taken from [24], the NMF-based method is taken from [26], and SE+NMF+OU (OU: online bases update) is the method proposed in [25]. Finally, the proposed combination of NMF and the statistical-model-based approach of [25] with the proposed estimator gain, SE+NMF+OU+CUP-GG, is analyzed.
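For readers unfamiliar with template-based NMF enhancement, the core step can be sketched generically: pretrained speech and noise bases are held fixed, the activations are estimated with the Kullback-Leibler multiplicative update of Lee and Seung, and a Wiener-type mask is formed from the two partial reconstructions. This is an illustrative sketch only; it is not the online-bases-update method of [25], nor its combination with the statistical-model gain, and the function names and iteration count are hypothetical.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Activations H >= 0 for fixed bases W, via KL-divergence multiplicative updates."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    denom = W.T @ np.ones_like(V) + eps        # column sums of W, repeated over frames
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / denom          # keeps H non-negative
    return H


def nmf_wiener_mask(V, W_speech, W_noise, n_iter=200):
    """Wiener-type mask S_hat / (S_hat + N_hat) from fixed speech and noise bases."""
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V, W, n_iter)
    r = W_speech.shape[1]
    S_hat = W_speech @ H[:r]                   # speech part of the model
    N_hat = W_noise @ H[r:]                    # noise part of the model
    return S_hat / (S_hat + N_hat + 1e-12)
```

Applied to a magnitude spectrogram `V`, the enhanced magnitude would be `nmf_wiener_mask(V, W_s, W_n) * V`, recombined with the noisy phase for resynthesis.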


Tables 2 to 5. Performance comparison of PESQ values for different NMF algorithms and the proposed method, which combines the statistical-model-based method and NMF, for matched noise bases.

Modulated pink noise (SNR in dB)
Algorithm             -10     -5      0       5       10      15
SE [24]               0.29    0.56    1.32    2.31    2.52    2.91
NMF [26]              0.10    0.24    1.11    2.01    2.31    2.70
SE+NMF                0.31    0.61    1.38    2.39    2.61    3.01
SE+NMF+OU [25]        0.33    0.67    1.42    2.48    2.69    3.12
SE+NMF+OU+CUP-GG      0.39    0.78    1.67    2.68    2.95    3.31

-10

NMF [26] SE+NMF

0.15 0.08 0.17 0.18 0.21


SE [24]


SE+NMF+OU [25]

SE+NMF+OU+CUP-GG

Algorithm

-10 SE [24] NMF [26] SE+NMF SE+NMF+OU [25] SE+NMF+OU+CUP-GG

0.29 0.13 0.31 0.33 0.35

Street Noise SNR in dB -5 0 5 10 15

0.52 0.25 0.57 0.59 0.67

F-16 Noise SNR in dB -5 0 5 10 15

-10

2.2 1.97 2.28 2.34 2.58

2.27 1.98 2.32 2.23 2.63

-10

Factory Noise SNR in dB -5 0 5 10 15

0.17 0.1 0.19 0.19 0.23

1.19 0.98 1.23 1.29 1.46

1.91 1.75 2.02 2.12 2.31

Babble Noise SNR in dB -5 0 5 10 15 0.71 1.57 2.19 2.45 2.87 0.48 1.21 1.98 2.12 2.38 0.78 1.62 2.26 2.51 2.95 0.81 1.69 2.34 2.41 2.72 0.89 1.94 2.56 2.67 2.92

2.58 2.19 2.64 2.45 2.91

0.46 0.24 0.49 0.52 0.58

1.49 1.17 1.53 1.59 1.93

-10 0.23 0.07 0.26 0.28 0.31

2.23 1.99 2.29 2.36 2.41

2.27 2.03 2.39 2.48 2.67

0.21 0.11 0.22 0.24 0.28

0.26 0.12 0.31 0.36 0.62

0.81 0.65 0.88 0.91 1.47

1.71 1.59 1.79 1.84 2.16

1.96 1.71 2.11 2.19 2.41

M109 Noise SNR in dB -5 0 5 10 0.42 0.21 0.42 0.45 0.56

1.11 0.94 1.18 1.21 1.44

2.02 1.93 2.11 2.18 2.39

2.18 1.95 2.23 2.31 2.56

2.1 1.87 2.32 2.41 2.69

15 2.21 1.98 2.29 2.38 2.57

White noise (SNR in dB)
Algorithm             -10     -5      0       5       10      15
SE [24]               0.33    0.98    2.01    2.81    2.86    3.02
SE+NMF                0.34    0.99    2.17    2.89    2.92    3.12
SE+NMF+OU [25]        0.35    1.12    2.25    2.96    3.02    3.22
SE+NMF+OU+CUP-GG      0.40    1.18    2.49    3.11    3.19    3.39

VIII. CONCLUSION

In this paper, two CUP estimators, (10) and (13), are derived by assuming Nakagami and Gamma speech priors, respectively, and a Generalized Gamma noise prior. In addition, an amplitude estimator given uncertain prior phase information and an estimator of the complex speech coefficients for completely uncertain phase information are presented. The estimators are analyzed in terms of the objective performance measures PESQ and STOI and a subjective listening test according to ITU-T Recommendation P.835. The sensitivity of the different estimators to the prior phase information is also discussed. Furthermore, the derived CUP estimator is combined with a template-based NMF approach with online bases update. More sophisticated phase estimation methods could advance speech enhancement further, and combining the advantages of the statistical-model-based approach and NMF may lead to more robust speech enhancement methods for non-stationary noise.

APPENDIX A

(Derivation of the CUP estimator assuming a Nakagami PDF as speech prior and a Generalized Gamma prior for noise)

The CUP estimator assuming a Nakagami speech prior and a GGD noise prior is derived as in [14]. The speech prior is assumed to be Nakagami distributed,

$$
p_A(a) = \frac{2}{\Gamma(k)}\left(\frac{k}{\sigma_S^2}\right)^{k} a^{2k-1}\exp\!\left(-\frac{k}{\sigma_S^2}\,a^2\right),
\tag{A.1}
$$

and the noise prior is assumed to be a GGD:

ACCEPTED MANUSCRIPT 2 v pY S  y s     v   N2

 a   2 N 

2 v 1

  y  a 2  exp     2     N    

(A.2)
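As a sanity check on the speech prior (A.1), the Nakagami density with shape k and spread σ_S² coincides with SciPy's `scipy.stats.nakagami` with shape k and scale σ_S, and its second moment E[A²] equals σ_S². A minimal sketch; the function name `nakagami_prior` and the parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import nakagami


def nakagami_prior(a, k, sigma_s2):
    """Eq. (A.1): 2/Gamma(k) * (k/sigma_s2)^k * a^(2k-1) * exp(-k a^2 / sigma_s2), a > 0."""
    a = np.asarray(a, dtype=float)
    log_p = (np.log(2.0) - gammaln(k) + k * np.log(k / sigma_s2)
             + (2.0 * k - 1.0) * np.log(a) - k * a**2 / sigma_s2)
    return np.exp(log_p)


k, sigma_s2 = 0.7, 1.7                       # illustrative shape and speech variance
a = np.linspace(0.05, 5.0, 50)
# Same density as SciPy's Nakagami with shape k and scale sigma_S = sqrt(sigma_s2)
same = np.allclose(nakagami_prior(a, k, sigma_s2), nakagami.pdf(a, k, scale=np.sqrt(sigma_s2)))
# Second moment E[A^2] should equal the speech variance sigma_s2
m2, _ = quad(lambda x: x**2 * nakagami_prior(x, k, sigma_s2), 0, np.inf)
print(same, m2)
```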

The CUP estimator is derived as in [20] using





 Sˆ    E A e jS y,S 

  y  a 2  exp     2   p  S S dadS   N   S S   k 2 v 1 2 v 1 2 2  v v    ya 2  k  2 k 1  k 2  2  a  2  a  0 0   k    S2  a exp    S2 a    v  N2   N2    v  N2   N2  exp     N2   pS S S S dadS   k

2 

2  k  2 k 1  k 2  2 v  a  a e    a e xp   2 a  2  2   k 2   S    v  N   N  0 0   S   jS



2 v 1

2 v  a      v  N2   N2 

2 v 1









Sˆ     E A e jS y, S 

 0



2

 0

  k  r 2   2k 2v  2  exp exp    2  2  2 a 2 k 2 2 2 v 1   S N   k    v    S   N  N  N  0   4 v k k

 2  2 r cos    a   a  pS S S S dadS  N2    





  k  r 2   2k  2v 2  2 r cos      exp exp    2  2  a 2    2 a  a  pS S S S dadS 2 k 2 2 2 v 1  N2   k    v    S   N  N  N  0     S N  4 v k k




e jS

2






(A.3)



(A.4)

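After rearranging and simplifying with Gradshteyn and Ryzhik (2007, Eq. (3.462.1)),

$$
\int_0^\infty x^{p-1} e^{-\beta x^2 - \gamma x}\,dx = (2\beta)^{-p/2}\,\Gamma(p)\,\exp\!\left(\frac{\gamma^2}{8\beta}\right) D_{-p}\!\left(\frac{\gamma}{\sqrt{2\beta}}\right),
$$

where $D_{-p}(\cdot)$ is the parabolic cylinder function. By comparing terms, the obtained parameters are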

k N2   S2


p  2k  2v  1;  

  2 S

2 N



(A.5)

2r  cos 

 N2


Substituting all parameters and simplifying gives





 Sˆ    E A e j S y, S 





0

4 v k k   2k  2v    1   k     2 2  2  2 k 2 2v   k    v   S   N     N  S  


2

2

4 v k k   2k  2v  1   k     2 2  2  2 k 2 2v   k    v   S   N     N  S  


 0

 1    k  v   2  



 r2  2  e jS exp  2  exp   D 2 v  2 k   1   p  S S dS S S  2  N 

1   k  v   2 



 r2 exp  2 N



 2  exp    D 2 v  2 k 1   p S S S S dS  2  





(A.6)



 Sˆ    E A e jS y, S  2



  2k  2v    1   k     2  2  2     2k  2v  1    N  S  

    2

 r2  2  jS e exp exp  2   D 2v  2 k   1   pS S S S dS 0  2 N  2 2 r  2  exp exp 0   N2   2  D2v2k 1   pS S S S dS









(A.7)


APPENDIX B

(Derivation of the CUP estimator assuming a Gamma PDF as speech prior and a Generalized Gamma prior for noise)

The CUP estimator assuming a Gamma speech prior and a GGD noise prior is derived as in [10]. The speech prior is assumed to be Gamma distributed:

pA  a  

1  a a k 1 exp     k    k

The noise prior is assumed to be a GGD:

2 v pY S  y s     v   N2

 a   2 N 

2 v 1

  y  a 2  exp     2     N    

(B.1)

(B.2)



The CUP estimator is derived as in [20], using Gradshteyn and Ryzhik (2007, Eq. (3.462.1)):

$$
\int_0^\infty x^{p-1} e^{-\beta x^2 - \gamma x}\,dx = (2\beta)^{-p/2}\,\Gamma(p)\,\exp\!\left(\frac{\gamma^2}{8\beta}\right) D_{-p}\!\left(\frac{\gamma}{\sqrt{2\beta}}\right).
\tag{B.3}
$$
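Identity (B.3), Gradshteyn and Ryzhik Eq. (3.462.1), which is also used in Appendix A, can be checked numerically. The sketch compares direct quadrature of the left-hand side with the closed form, using SciPy's parabolic cylinder function `scipy.special.pbdv` (which returns D_v(x) and its derivative). The parameter values are arbitrary, chosen with β > 0 and p > 0 as the identity requires, and include a negative γ, which corresponds to a positive cosine term in the derivation; the function names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, pbdv


def lhs_3462(p, beta, gam):
    """Left side: integral_0^inf x^(p-1) exp(-beta x^2 - gam x) dx, by quadrature."""
    val, _ = quad(lambda x: x**(p - 1.0) * np.exp(-beta * x**2 - gam * x), 0.0, np.inf)
    return val


def rhs_3462(p, beta, gam):
    """Right side: (2 beta)^(-p/2) Gamma(p) exp(gam^2 / (8 beta)) D_{-p}(gam / sqrt(2 beta))."""
    d, _ = pbdv(-p, gam / np.sqrt(2.0 * beta))   # pbdv returns (D_v(x), D_v'(x))
    return (2.0 * beta) ** (-p / 2.0) * gamma(p) * np.exp(gam**2 / (8.0 * beta)) * d


cases = [(2.0, 1.0, 0.5), (3.5, 0.8, -1.2), (1.5, 2.0, 0.0)]
for p, beta, gam in cases:
    print(p, beta, gam, lhs_3462(p, beta, gam), rhs_3462(p, beta, gam))
```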

Comparing terms, we obtain the parameters

p  2v  k    1;  

 1 2 y cos S  Y  ;   2 N S  N2

2

 0

The numerator term is finally obtained as

2 v   k  k   v   N2

Let

 1   2 N 

2 v 1

 2   2 N 

 2 v  k   1    2  

  1 2 y cos     2   1 2 y cos S  Y   S Y        2 N  N2    D  S    2v  k    1 exp   S  2 v  k   1 2  2   8  N 2   N          

 2 y S cos S  Y    N2    N 2 2    N S  

  

Divide Numerator for above  with  S and  N2 then  

 S  2 y cos(S  Y ) 2 N 

(B.4)

Finally, the estimated signal is obtained from (3) and the simplified equation as

 2   2 2 v   k  k (v)  N2    N  2 v



 k   1   v   2  

2    2v  k    1 exp   D (2v  k   1)    2

(B.5)



Sˆ     E A e j S y, S  2

 0



  1 2 y cos     2   1 2  y cos S  Y   S Y         N2  1   2    N2 2 v  S   p   2v  k    1 exp  D 2 v  k   1  S   d S  2 k 2  2  2     S S S S   k    v   N   N    N  8  N 2   N2             1 2 y cos     2   1 2  y cos S  Y   S Y  2 v  k 1     2 v 1       2  N2  1   2   2    N2 2 v  S   p   2v  k  1 exp  D 2 v  k 1  S   d S 2 0   k  k   v   N2   N2    N2      S S S S 8  N 2   N2           2 v 1

 2 v  k   1    2  










 Sˆ    E A e j S y, S   k   1   v  

2











2

2  jS e exp   D (2 v  k  1)   p S S S S dS 0  2  2 2  exp 0  2  D(2vk 1)   pS S S S dS

(B.7)

(B.8)

  2v  k    1  2       2v  k  1   N2 

    2

 Sˆ    E A e j S y, S 

2   2   2   2 v  k    1 exp     D (2 v  k  1)   p S S S S dS 0   N2   2    k 1   v   2  2   2  2   2 v  k  1 exp     D (2 v  k 1)   p S S S S dS 0   N2   2 

(B.6)

APPENDIX C

Thus, from ([23], Eq. 17), we obtain the estimate as

 k N2   S2 2   2r  a  exp   a  I1   da 2 2 2 0 a    S N N     j Y  e   k N2   S2 2   2r  a  2v  2 k 2 a exp a  I0    da 2 2 2 0    S N N    




 SˆB 

2 v  2 k 2 

(C.1)

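Using ([20], Eq. 6.643.2 and Eq. 9.220.2) and simplifying sequentially results in

$$
\int_0^\infty x^{p-\frac{1}{2}} e^{-\alpha x}\, I_{2v}\!\left(2\beta\sqrt{x}\right) dx = \frac{\Gamma\!\left(p+v+\frac{1}{2}\right)}{\Gamma(2v+1)}\,\frac{1}{\beta}\, e^{\frac{\beta^2}{2\alpha}}\,\alpha^{-p}\, M_{-p,v}\!\left(\frac{\beta^2}{\alpha}\right)
\tag{C.2}
$$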

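and

$$
M_{\lambda,\mu}(z) = z^{\mu+\frac{1}{2}}\, e^{-\frac{z}{2}}\, \Phi\!\left(\mu-\lambda+\tfrac{1}{2};\, 2\mu+1;\, z\right),
\tag{C.3}
$$

where $\Phi(\cdot;\cdot;\cdot)$ is the confluent hypergeometric function.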
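Identities (C.2) and (C.3) can be checked numerically in the same way. The sketch below builds the Whittaker function from SciPy's `hyp1f1`, integrates the left side of (C.2) with the exponentially scaled Bessel function `ive` (so that the growth of I_{2v} and the decay of e^{-αx} are combined in one exponent without overflow), and also checks the special case p = 1/2, v = 0, where the right side reduces to e^{β²/α}/α. Function names and parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, hyp1f1, ive


def whittaker_m(lam, mu, z):
    """Eq. (C.3): M_{lam,mu}(z) = z^(mu+1/2) e^(-z/2) 1F1(mu - lam + 1/2; 2 mu + 1; z)."""
    return z ** (mu + 0.5) * np.exp(-z / 2.0) * hyp1f1(mu - lam + 0.5, 2.0 * mu + 1.0, z)


def lhs_6643(p, v, alpha, beta):
    """Left side of (C.2): integral_0^inf x^(p-1/2) e^(-alpha x) I_{2v}(2 beta sqrt(x)) dx."""
    def integrand(x):
        z = 2.0 * beta * np.sqrt(x)
        # ive(nu, z) = iv(nu, z) * exp(-z) for z >= 0, so exp(z) is added back to the exponent
        return x ** (p - 0.5) * np.exp(-alpha * x + z) * ive(2.0 * v, z)
    val, _ = quad(integrand, 0.0, np.inf, limit=200)
    return val


def rhs_6643(p, v, alpha, beta):
    """Right side of (C.2) in closed form."""
    return (gamma(p + v + 0.5) / gamma(2.0 * v + 1.0) / beta
            * np.exp(beta**2 / (2.0 * alpha)) * alpha ** (-p)
            * whittaker_m(-p, v, beta**2 / alpha))


cases = [(0.5, 0.0, 1.0, 0.7), (1.5, 0.5, 2.0, 1.0), (2.0, 1.0, 1.5, 0.8)]
for p, v, alpha, beta in cases:
    print(p, v, alpha, beta, lhs_6643(p, v, alpha, beta), rhs_6643(p, v, alpha, beta))
```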

Let



2

Y



 N2

 S2  N2

From (21)

 k N2   S2   2r  x  2 v  2 k  3  a exp x  I1   dx  2 0  S2 N2 Y    N   e j 2 2   k N   S   2r  x  2 v  2 k 3 exp   x  I0   da 2 0 a  S2 N2    N 

 SˆB 




The Numerator term is simplified as 0

a

5 1 2v2k     2 2

 k 2   2 exp   N 2 2 S  S N 

 2r  x   x  I 1   dx 2 2  N  2 






Comparing (25) with (22), we get

(C.4)

(C.5)


k N2   S2 5 1 r p  2v  2k    ;   ;v ;  2 2 2 2  S N 2 N Then numerator becomes



3


  r  2   k 2   2  2v  2 k    2    r  2  2 2   S2 N2 3    N2   N S S N   2    2v  2k      M    exp   2  3 1  2  2 v  2 k    ,   2   r   N  2(k N2   S2 )    S2 N2  k    S2   N N    2 2      

(C.6)

2

(C.7)


 r   S2 N2 Multiply and divide with  2  and using (23) 2 2  k    N S  N 3    2v  2k      (2 v  2 k    2)  1   2  2 2  (2v  2 k   2)       N    2  2k    ; 2 ;  2      2 2 k     k     Where



  2  2k    

Where  

Y

2 1   r  2    r  2  2 2   S2 N2 1    r     S2 N2  S N     ; 2 ;  2  exp M         N2  2(k N2   S2 )   2v  2 k    32 , 12    N2  k N2   S2  2 k      N2   k N2   S2     

2

 N2



 S2  N2

Similarly, the denominator is simplified as

(C.8)

3  3   2v  2k    (2 v  2 k  ) 3 2  3   2  2 2  (2v  2 k  2 )       N    2  2k  ;1;  2    (1) 2 k     k    

(C.9)

Substituting (27) and (26) into (24) results in


3    2v  2k      (2 v  2 k    2)  1   2  2 2  (2v  2 k   2)       N    2  2k    ; 2 ;  2     2 2 k     k     3  3   2v  2k    (2 v  2 k  ) 3 2  3   2  2 2  (2v  2 k  2 )       N    2  2k  ;1;  2   (1) 2 k     k    

(C.10)

APPENDIX D ΔPESQ values [Tables 6 to 9]

Table 6. Comparison of ΔPESQ values for different estimators in the case of modulated pink noise
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       0.09    0.09    0.13    0.15    0.18    0.14    0.20    0.23    0.31
-5        0.18    0.26    0.15    0.38    0.41    0.35    0.45    0.47    0.55
0         0.48    0.52    0.48    0.57    0.68    0.53    0.76    0.87    0.92
5         0.43    0.52    0.46    0.52    0.62    0.42    0.69    0.79    0.82
10        0.39    0.45    0.39    0.46    0.54    0.41    0.56    0.73    0.77
15        0.32    0.38    0.29    0.37    0.49    0.32    0.57    0.64    0.66

Table 7. Comparison of ΔPESQ values for different estimators in the case of pink noise
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       0.06    0.04    0.04    0.06    0.07    0.05    0.11    0.14    0.17
-5        0.19    0.18    0.16    0.24    0.30    0.18    0.35    0.41    0.43
0         0.41    0.36    0.39    0.46    0.57    0.43    0.67    0.74    0.81
5         0.39    0.36    0.37    0.45    0.59    0.44    0.68    0.73    0.79
10        0.38    0.33    0.36    0.42    0.53    0.43    0.66    0.70    0.76
15        0.31    0.28    0.27    0.38    0.48    0.35    0.61    0.64    0.65

Table 8. Comparison of ΔPESQ values for different estimators in the case of babble noise
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       0.03    0.08    0.03    0.06    0.07    0.04    0.07    0.13    0.17
-5        0.16    0.18    0.11    0.13    0.16    0.10    0.15    0.26    0.28
0         0.29    0.35    0.25    0.27    0.38    0.36    0.41    0.47    0.57
5         0.27    0.29    0.26    0.34    0.43    0.35    0.44    0.54    0.58
10        0.27    0.28    0.27    0.32    0.41    0.35    0.47    0.56    0.59
15        0.20    0.26    0.22    0.28    0.38    0.33    0.42    0.50    0.53

Table 9. Comparison of ΔPESQ values for different estimators in the case of non-stationary factory noise
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       0.04    0.09    0.03    0.04    0.08    0.06    0.04    0.11    0.14
-5        0.15    0.21    0.17    0.17    0.16    0.14    0.30    0.35    0.39
0         0.35    0.39    0.24    0.35    0.46    0.37    0.62    0.68    0.72
5         0.28    0.36    0.30    0.35    0.47    0.36    0.58    0.68    0.67
10        0.25    0.32    0.25    0.33    0.46    0.34    0.56    0.64    0.65
15        0.21    0.27    0.23    0.29    0.43    0.31    0.54    0.62    0.63

APPENDIX E ΔSTOI in [%] values [Tables 10 to 13]

Table 10. Comparison of ΔSTOI values (%) for different estimators in the case of modulated pink noise
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       -1.29   4.11    -0.79   6.27    -1.38   5.29    3.51    4.73    4.91
-5        -1.28   3.72    -0.62   3.47    -1.10   2.89    2.76    3.64    3.95
0         -1.11   2.97    -0.49   3.16    0.41    2.37    2.49    2.98    3.64
5         -0.47   2.16    -0.32   2.75    0.21    0.82    0.95    1.61    1.96
10        0.12    1.76    -0.19   2.03    0.10    0.16    0.26    0.49    0.44
15        0.25    0.99    -0.14   1.27    0.05    0.09    0.12    0.32    0.27

Table 11. Comparison of ΔSTOI values (%) for different estimators in the case of pink noise
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       -1.29   3.72    -0.94   5.81    -1.29   4.97    3.61    4.59    4.98
-5        -1.41   3.48    -0.72   3.74    -1.72   3.21    2.98    3.81    4.69
0         -1.09   2.58    -0.52   2.95    0.28    1.89    2.79    3.29    2.61
5         -0.42   1.95    -0.39   2.61    0.15    0.49    0.90    1.23    1.95
10        0.14    1.49    -0.26   1.98    0.04    0.07    0.44    1.15    1.46
15        0.21    0.81    -0.15   1.12    0.05    0.05    0.34    1.43    0.58

Table 12. Comparison of ΔSTOI values (%) for different estimators in the case of babble noise
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       -1.89   -2.74   -1.54   -2.13   -4.87   -2.51   -1.09   -0.35   -0.02
-5        -1.75   -2.23   -1.22   -1.86   -5.12   -1.19   -0.62   -0.19   -0.01
0         -1.49   -2.03   -1.08   -1.49   -0.38   -0.09   -0.17   -0.01   0.09
5         -1.32   -1.89   -0.96   -1.21   -0.32   0.29    0.25    0.37    0.06
10        -1.11   -1.39   -0.69   -1.02   -0.29   0.23    0.15    0.24    0.05
15        -0.94   -1.31   -0.52   -0.83   -0.27   0.07    0.02    0.01    0.03

Table 13. Comparison of ΔSTOI values (%) for different estimators in the case of non-stationary factory noise
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       -1.28   3.91    -0.84   5.11    -1.18   4.89    3.02    3.61    3.98
-5        -1.39   3.44    -0.77   3.14    -1.42   3.01    2.56    2.98    3.18
0         -1.12   2.83    -0.59   2.68    0.25    1.92    1.98    2.91    2.59
5         -0.42   2.04    -0.45   2.48    0.14    0.58    0.69    1.18    1.79
10        -0.11   1.81    -0.28   1.84    0.02    0.17    0.22    0.69    0.85
15        -0.01   0.91    -0.18   0.97    0.02    0.05    0.07    0.19    0.31

APPENDIX F

Table 14. Comparison of SSNR values averaged over four noises (modulated pink noise, pink noise, babble noise and non-stationary factory noise) for different estimators at different input SNRs
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       1.31    4.87    0.56    3.88    3.87    4.11    1.69    2.12    3.01
-5        1.80    5.32    0.91    5.05    5.04    5.32    2.31    4.27    5.21
0         5.62    7.18    5.87    7.11    7.10    7.89    5.95    6.54    6.94
5         7.12    8.71    7.21    8.59    8.57    9.21    8.23    8.42    8.97
10        10.76   11.84   10.82   11.76   11.76   12.45   11.21   11.77   12.15
15        14.03   14.79   14.11   14.68   14.67   15.22   13.78   14.23   14.68

Table 15. Comparison of PSNR values averaged over four noises (modulated pink noise, pink noise, babble noise and non-stationary factory noise) for different estimators at different input SNRs
SNR (dB)  Wiener  MOSIE   BECOCO  ADP     CDP     AUP     CUP     CUP-NG  CUP-GG
-10       4.87    4.83    4.85    4.85    3.74    4.82    4.01    5.07    6.08
-5        6.41    6.36    6.38    6.38    5.71    6.28    5.81    6.31    7.11
0         7.97    7.94    7.95    7.94    6.84    8.02    7.14    7.62    7.92
5         9.76    9.74    9.75    9.73    8.87    9.81    9.01    9.61    9.75
10        14.28   14.27   14.29   14.29   13.10   14.28   13.38   13.76   13.97
15        14.88   14.85   14.89   14.86   13.97   14.85   14.23   14.52   14.79

APPENDIX G

Table 16. Comparison of subjective listening test scores, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by babble noise, at input SNRs of 5 dB, 0 dB and -5 dB
Estimator    5 dB SNR          0 dB SNR          -5 dB SNR
             SIG   BAK   OVRL  SIG   BAK   OVRL  SIG   BAK   OVRL
Wiener       2.5   1.1   1.7   1.9   1.0   1.3   1.2   1.0   1.0
MOSIE (18)   2.3   1.0   1.5   1.9   1.4   1.2   1.3   1.1   1.0
BECOCO (26)  2.7   1.4   1.9   2.2   1.0   1.6   1.3   1.0   1.2
ADP (17)     3.2   1.6   2.3   2.8   1.2   1.9   1.5   1.1   1.1
CDP (22)     3.3   1.7   2.4   2.9   1.4   1.8   1.3   1.0   1.2
AUP (15)     2.9   1.5   2.2   2.5   1.1   1.8   1.4   1.1   1.1
CUP [7]      3.4   2.1   2.4   2.9   1.7   2.1   1.7   1.3   1.5
CUP-NG (10)  3.4   2.2   2.8   3.0   1.9   2.5   2.2   1.4   1.7
CUP-GG (13)  3.6   2.5   3.1   3.2   2.3   2.9   2.5   1.7   1.9

Table 17. Comparison of subjective listening test scores, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by modulated pink noise, at input SNRs of 5 dB, 0 dB and -5 dB
Estimator    5 dB SNR          0 dB SNR          -5 dB SNR
             SIG   BAK   OVRL  SIG   BAK   OVRL  SIG   BAK   OVRL
Wiener       2.7   1.3   1.9   2.2   1.2   1.6   1.4   1.0   1.3
MOSIE (18)   2.5   1.1   1.5   2.3   1.1   1.3   1.5   1.3   1.3
BECOCO (26)  2.9   1.6   2.1   2.5   1.4   1.9   1.5   1.4   1.4
ADP (17)     3.1   1.7   2.4   2.8   1.5   2.2   1.7   1.3   1.2
CDP (22)     3.3   1.7   2.6   2.9   1.6   2.4   1.6   1.2   1.3
AUP (15)     3.1   1.6   2.4   2.7   1.5   2.4   1.8   1.3   1.4
CUP [7]      3.4   2.3   2.8   3.1   2.1   2.6   1.9   1.5   1.6
CUP-NG (10)  3.7   2.2   2.8   3.4   1.9   2.7   2.4   1.7   1.9
CUP-GG (13)  3.9   2.7   3.4   3.5   2.5   3.1   2.7   1.9   2.2

Table 18. Comparison of subjective listening test scores, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by non-stationary factory noise, at input SNRs of 5 dB, 0 dB and -5 dB
Estimator    5 dB SNR          0 dB SNR          -5 dB SNR
             SIG   BAK   OVRL  SIG   BAK   OVRL  SIG   BAK   OVRL
Wiener       2.6   1.2   1.7   2.0   1.1   1.5   1.3   1.0   1.1
MOSIE (18)   2.4   1.1   1.3   2.1   1.0   1.2   1.3   1.0   1.2
BECOCO (26)  2.7   1.4   1.8   2.3   1.2   1.6   1.3   1.2   1.2
ADP (17)     2.8   1.5   2.3   2.6   1.3   2.0   1.5   1.1   1.4
CDP (22)     3.1   1.6   2.5   2.7   1.4   2.1   1.5   1.1   1.1
AUP (15)     3.0   1.4   2.2   2.5   1.3   2.2   1.5   1.2   1.3
CUP [7]      3.2   2.1   2.7   2.9   2.0   2.4   1.7   1.3   1.6
CUP-NG (10)  3.5   2.0   2.6   3.2   2.1   2.5   2.2   1.6   1.8
CUP-GG (13)  3.7   2.3   2.9   3.3   2.3   3.0   2.5   1.9   2.3

REFERENCES

[1] R. Martin, ―Speech enhancement based on minimum mean-square error estimation and super-Gaussian priors,‖ IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 845–856, Sep. 2005. [2] Breithaupt, C., Martin, R., 2003. MMSE estimation of magnitude squared DFT coefficients with supergaussian priors. In: Proceedings of the IEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Hong Kong, pp. 896–899 [3] Ephraim, Y., Malah, D., December 1984. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator. IEEE Trans Acoust, Speech, Signal Process. Vol.32,Issue 6, pp.1109–1121

ACCEPTED MANUSCRIPT

CR IP T

[4] Fodor, B., Fingscheidt, T., August 29-Sept 2, 2011. MMSE speech spectral amplitude estimation assuming non-Gaussian noise. In: Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011). Barcelona, Spain, pp. 2314– 2318 [5] C.H.You, S. N.Koh, and S. Rahardja, ―β-orderMMSEspectral amplitude estimation for speech enhancement,‖ IEEE Trans. Speech Audio Process., vol. 13, no. 4, pp. 475–486, Jul. 2005 [6] C.Breithaupt, M.Krawczyk, and R.Martin, ―Parameterized MMSE spectral magnitude estimation for the enhancement of noisy speech,‖ in IEEE International Conference on Acoustics Speech signal Process., Apr.2008, pp.4037-4040. [7] Hendriks, R.C.Gerkman, T.Gensen, ― DFT Domain Based Single Microphone Noise Reduction for Speech Enhancement: A survey of the State-of-the-art.‖ Morgan & Claypool, Colorado, USA. [8] K. Paliwal, K. W´ojcicki, and B. Shannon, ―The importance of phase in speech enhancement,‖ Speech Commun., vol. 53, no. 4, pp. 465-494, Apr.2011 [9] Kuldip Paliwal, Kamil Wojcicki, Belinda Schwerin ―Single-Channel speech enhancement using spectral subtraction in the short-time modulation domain‖, Speech communications, Vol.52, Issue 5, May 2010 [10] M. Krawczyk and T. Gerkman, ―STFT phase reconstruction in voiced speech for an improved single- channel speech enhancement,‖ IEEE/ACM Trans. Audio, Speech, Lang, Process., vol. 22, no. 12, pp.1931-1940, Dec.2014 [11] P. Mowlaee and J. Kulmer, ―Harmonic phase estimation in single-channel speech enhancement using phase decomposition and SNR information,‖ IEEE/ACMTrans. Audio, Speech, Lang. Process., vol. 23, no. 9, pp. 1521–1532, Sep. 2015

[12] Gerkmann, T., Krawczyk, M., Feb. 2013. MMSE-optimal spectral amplitude estimation given the STFT-phase. IEEE Signal Process. Lett. 20 (2), 129–132.
[13] Erkelens, J.S., Hendriks, R.C., Heusdens, R., Jensen, J., Aug. 2007. Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors. IEEE Trans. Audio, Speech, Lang. Process. 15 (6), 1741–175
[14] Gerkmann, T., Aug. 2014. Bayesian estimation of clean speech spectral coefficients given a priori knowledge of the phase. IEEE Trans. Signal Process. 62 (16), 4199–4208.
[15] Ephraim, Y., Malah, D., Dec. 1984. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal Process. 32 (6), 1109–1121.
[16] Chen, B., Loizou, P.C., 2005. Speech enhancement using a MMSE short time spectral amplitude estimator with Laplacian speech modelling. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Philadelphia, Pennsylvania, USA, pp. 1097–1100.
[17] Hendriks, R.C., Heusdens, R., 2010. On linear versus non-linear magnitude-DFT estimators and the influence of super-Gaussian speech priors. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Lotter, T., Vary, P., Jan. 2005. Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Appl. Signal Process. 2005 (7), 1110–1126.
[19] Martin, R., 2002. Speech enhancement using MMSE short time spectral estimation with gamma distributed priors. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Orlando, Florida, pp. 504–512.
[20] Martin, R., 2005. Speech enhancement based on minimum mean-square error estimation and supergaussian priors. IEEE Trans. Speech Audio Process. 13 (5), 845–856.
[21] Lu, Y., Loizou, P.C., 2008. A geometric approach to spectral subtraction. Speech Commun. 50 (6), 453–466.
[22] Zhang, Y., Zhao, Y., 2013. Real and imaginary modulation spectral subtraction for speech enhancement. Speech Commun. 55, 509–522.
[23] Dec. 2016. On MMSE-based estimation of amplitude and complex speech spectral coefficients under phase-uncertainty. IEEE/ACM Trans. Audio, Speech, Lang. Process. 24 (12).
[24] Rangachari, S., Loizou, P.C., 2006. A noise-estimation algorithm for highly non-stationary environments. Speech Commun. 48 (2), 220–231.
[25] Kwon, K., Shin, J.W., Kim, N.S., 2015. NMF-based speech enhancement using bases update. IEEE Signal Process. Lett. 22 (4), 450–454.
[26] Févotte, C., Bertin, N., Durrieu, J.-L., 2009. Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comput. 21 (3), 793–830.
[27] Gradshteyn, I.S., Ryzhik, I.M., 2007. Table of Integrals, Series, and Products, 7th ed. Academic Press, San Diego, CA, USA.
[28] Evans, M., Hastings, N., Peacock, B., 2000. von Mises distribution. In: Statistical Distributions, 4th ed. John Wiley & Sons, New York, pp. 191–192.
[29] Gonzalez, S., Brookes, M., Feb. 2014. PEFAC: a pitch estimation algorithm robust to high levels of noise. IEEE Trans. Audio, Speech, Lang. Process. 22 (2), 518–530.
[30] Loizou, P.C., 2007. Speech Enhancement: Theory and Practice. CRC Press/Taylor & Francis Group, Boca Raton, FL, USA.

[31] Taal, C.H., Hendriks, R.C., Heusdens, R., Jensen, J., Sep. 2011. An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio, Speech, Lang. Process. 19 (7), 2125–2136.
[32] Gerkmann, T., Martin, R., Nov. 2009. On the statistics of spectral amplitudes after variance reduction by temporal cepstrum smoothing and cepstral nulling. IEEE Trans. Signal Process. 57 (11), 4165–4174.
[33] ITU-T, 2003. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T Recommendation P.835.

Authors’ Profiles

Ravi Kumar Kandagatla was born in Markapur, India, in 1988. He received the Bachelor of Technology degree from Jawaharlal Nehru Technological University, Kakinada, in 2009 and the Master of Technology degree in Digital Electronics and Communication Systems from the same university in 2011. He is presently working as an Assistant Professor at Lakireddy Balireddy College of Engineering, Mylavaram, India, and has 6 years of teaching experience. He has three international publications. His research interest is speech processing.

Dr. P. V. Subbaiah graduated in ECE from Bangalore University and received his Master’s degree from Andhra University, Visakhapatnam, in 1982. JNTU, Hyderabad, conferred the Ph.D. degree on him in 1996 for his work on microwave antenna test facilities. He has 33 years of teaching experience in several reputed institutions, serving as Assistant Professor, Associate Professor, Professor and Head of the Department, and Principal. He is presently Professor of ECE at V.R. Siddhartha Engineering College, Vijayawada, and has served as Coordinator of the World Bank-funded TEQIP project since 2014. His areas of interest include microwave antennas, smart antennas and communications. He has published more than 100 research papers in national and international journals and conferences of repute. Ten research scholars have received their Ph.D. degrees under his supervision, and he is presently guiding three more. He is a member and fellow of various professional societies, namely ISTE, BMESI, IETE and IE (I). He received the Sir Thomas Ward Gold Prize from The Institution of Engineers (India).
