Accepted Manuscript
Speech Enhancement using MMSE estimation of Amplitude and Complex speech spectral coefficients under phase-uncertainty
Ravi Kumar Kandagatla, P.V. Subbaiah
PII: S0167-6393(16)30357-0
DOI: 10.1016/j.specom.2017.11.001
Reference: SPECOM 2497
To appear in:
Speech Communication
Received date: 9 December 2016
Revised date: 2 November 2017
Accepted date: 2 November 2017
Please cite this article as: Ravi Kumar Kandagatla , P.V. Subbaiah , Speech Enhancement using MMSE estimation of Amplitude and Complex speech spectral coefficients under phase-uncertainty, Speech Communication (2017), doi: 10.1016/j.specom.2017.11.001
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Title/ Author/ ABSTRACT page Name of the Authors 1) RAVI KUMAR KANDAGATLA 2) P. V. SUBBAIAH
Title: Speech Enhancement using MMSE Estimation of Amplitude and Complex Speech Spectral Coefficients under Phase-Uncertainty
Corresponding / First Author:
Affiliation(s) and address(es) of the author(s)
RAVI KUMAR KANDAGATLA
Assistant Professor
Laki reddy Bali reddy Engineering College
Mylavaram
Pin code: 521230
Krishna District, Andhra Pradesh
India
Email Id: [email protected]
Contact No: 9052544504
Co-Author: Dr. P.V. SUBBAIAH, Professor of ECE, Velagapudi Siddhartha Engineering College, Kanuru, Vijayawada, Krishna District, Andhra Pradesh
E-mail address of the corresponding author: [email protected]
Abstract: Traditional speech enhancement algorithms rely on amplitude-only processing, in which the spectral amplitudes of speech are modified and the phase is left unprocessed. Recently, Short Time Fourier Transform (STFT) based single-channel speech enhancement algorithms have been developed that take prior knowledge of the phase and its uncertainty into account. The uncertain knowledge of the phase is obtained from phase reconstruction algorithms. The goal of this paper is twofold. First, joint Minimum Mean Square Error (MMSE) estimates of the Complex speech coefficients given Uncertain Phase (CUP) are derived by assuming Nakagami and Gamma distributions for the speech coefficients and the Generalized Gamma distribution (GGD) for the noise. Estimators of the type Amplitudes given Uncertain Phase (AUP), which use the uncertain phase only for amplitude estimation and not for phase improvement, are also derived, as are novel phase-blind estimators that use the Nakagami or Gamma PDF as speech prior and the Generalized Gamma PDF as noise prior. All estimators that use uncertain prior phase information are compared, and the effect of the initial phase information on the enhancement process is analyzed with the novel estimators. The proposed CUP estimator outperforms the existing algorithms in terms of the objective performance measures Segmental Signal to Noise Ratio (SSNR), Phase Signal to Noise Ratio (PSNR), Perceptual Evaluation of Speech Quality (PESQ), and Short Time Objective Intelligibility (STOI). Second, a combination of the statistical approach and a Non-negative Matrix Factorization (NMF) based speech enhancement technique, in which the bases are updated on-line, is discussed. The proposed estimator gain is used with NMF and its performance is analyzed using the PESQ measure.
Index Terms / Keywords—Speech Enhancement, von Mises distribution, Generalized Gamma distribution, Phase Uncertainty, Non-negative Matrix Factorization.
I. INTRODUCTION
In mobile communication, background noises (train, car, cockpit, ...) have an adverse effect on the transmitted clean speech signal. The goal of speech enhancement is to improve the quality and intelligibility of a speech signal degraded by background noise. To combat background noise, various speech enhancement techniques based on filtering, sub-space and statistical approaches have been proposed. Statistical speech enhancement algorithms that use Bayesian estimators play an important role. In statistical estimators, the clean speech signal is estimated by assuming priors for the Discrete Fourier Transform (DFT) coefficients of the speech and noise components. Bayesian estimators estimate either the complex speech spectral coefficients, denoted S, or the real-valued clean speech amplitude, denoted A. Different estimators have been proposed by assuming Gaussian and non-Gaussian PDFs as speech priors [1-4]; estimators that incorporate amplitude compression [5,6] (for perceptually better enhanced speech) have also been proposed.
Most traditional speech enhancement methods process the spectral amplitudes [7], while the noisy phase is kept unprocessed. However, recent research [8-11] shows that the performance of enhancement algorithms may improve if the phase of the clean speech is known. Estimation of the clean speech phase is possible using clean speech amplitudes, as in [11]. Better performance is obtained by incorporating the clean speech phase in the estimator proposed by Gerkmann [12]: it is noted in [12] that incorporating the clean speech phase improves the Perceptual Evaluation of Speech Quality (PESQ) score by 0.35 relative to the Wiener filter. In [13] it was proposed that the complex speech coefficients can be modelled with a circularly symmetric probability density function, an assumption that leads to a uniform distribution of the phase. By using the estimated clean speech phase as prior knowledge of the uncertain phase, CUP estimators given the STFT phase are derived in [14].
The DFT coefficients of the clean speech signal are assumed to be Gaussian in [15]; this assumption holds well for long analysis frames. However, speech DFT coefficients are analyzed using short windows of around 20 ms to 30 ms. For such short durations, super-Gaussian distributions provide a better fit for the speech priors. The Gamma prior for speech DFT coefficients yields a smaller Kullback-Leibler (KL) divergence, and improved results are obtained using Laplacian and Gamma speech priors [16, 17]. Motivated by this, the CUP-GG estimator (GG: (G)amma and (G)eneralized Gamma), which uses the Gamma PDF as speech prior and the Generalized Gamma PDF as noise prior, is proposed in this paper [18-20]. It is also noted that the Nakagami distribution as speech prior is able to preserve speech spectral components; hence one more estimator, CUP-NG ((N)akagami and (G)eneralized Gamma), which uses the Nakagami PDF as speech prior and the Generalized Gamma PDF as noise prior, is proposed. In [12], CUP estimators are extended to different non-Gaussian distributions as noise priors. Bayesian estimators use the Maximum Likelihood (ML), Maximum A Posteriori (MAP) or Minimum Mean Square Error (MMSE) criterion. Among these, MMSE estimators preserve the first harmonics well; hence the proposed work performs estimation under the MMSE criterion. Bayesian estimation of clean speech magnitude coefficients using the MMSE criterion and a compression function (for perceptually better results) is proposed in [5]. It is further extended in [12] by considering the estimated clean speech phase.
The proposed CUP estimators use a non-Gaussian distribution as noise prior because it is noted in [19] that environmental noise is non-Gaussian, and because this choice yields less annoying residual noise. At low frequencies the KL divergence error stays smooth, which suggests that the Gamma model is the best assumption for the noise prior. At high frequencies, the KL divergence error of the Rayleigh model is smaller and stays smooth. To obtain an advantage in all frequency regions, the Gamma distribution is generalized to the Generalized Gamma Distribution (GGD), and different estimators using the GGD as noise prior are analyzed. In this work, estimators of the complex-valued clean speech components S and estimators of the clean speech spectral amplitudes A (A is the magnitude of S) are discussed. Estimators such as the Wiener filter, which is optimal in the Gaussian sense, and the Short-Time Spectral Amplitude (STSA) estimator, the optimal estimator of A, modify only the spectral amplitudes and keep the phase unchanged. Several advanced super-Gaussian estimators incorporate a compression parameter that is perceptually beneficial.
Motivated by the improved performance obtained by considering phase information in the spectral subtraction method [21] (reduced residual noise) and in modulation frequency domain processing methods [22], this paper discusses different estimators and the effect of the initial phase on the quality of the enhanced speech. By assuming the Generalized Gamma PDF as noise prior, a novel AUP estimator is derived as in [23], and two CUP estimators are developed using the Nakagami and Gamma PDFs as speech priors and the Generalized Gamma PDF as noise prior. These are named CUP-GG, where GG stands for Gamma and Generalized Gamma, and CUP-NG, where NG stands for Nakagami and Generalized Gamma. A novel phase-Blind Estimator of Complex Coefficients (BECOCO), for the case where the initial phase is completely uncertain or not available, is also derived.
Recently, speech enhancement algorithms have been developed for highly non-stationary noises [24]. Several speech enhancement algorithms have also been developed using Non-negative Matrix Factorization (NMF) [25, 26]. To combine the advantages of statistical approaches and the NMF approach, online methods have been proposed for updating the bases [25]. In this work, the performance of the combination of the derived estimator and NMF with online bases update is analyzed.
This paper is organized as follows. Section II gives the basic concept of phase-aware clean speech estimation and derives two CUP estimators: one assuming a Nakagami speech prior with a Generalized Gamma Distribution (GGD) noise prior, and one assuming a Gamma speech prior with a GGD noise prior. Section III derives the improved novel amplitude estimator, AUP. Section IV derives BECOCO. Section V analyses the proposed estimators. In Section VI the derived estimator is combined with NMF, Section VII discusses the results and Section VIII concludes. All the proposed estimators are listed in Table 1.
II. PRINCIPLES OF PHASE-AWARE CLEAN SPEECH ESTIMATION
The noisy speech in the STFT domain is assumed to be a superposition of clean speech and noise,

$$Y(k,i) = S(k,i) + N(k,i) \qquad (1)$$

where $Y(k,i)$ denotes the noisy speech, $S(k,i)$ the clean speech and $N(k,i)$ the noise, and $k$, $i$ are the frequency index and time index respectively. The complex spectral coefficients can be represented in terms of amplitude and phase as

$$Y = R\,e^{j\phi_Y}, \qquad S = A\,e^{j\phi_S}, \qquad W = N\,e^{j\phi_N} \qquad (2)$$

Here Y, S, W are the spectral components of the noisy speech, the clean speech and the noise respectively, represented in terms of their amplitudes R, A, N and phases $\phi_Y$, $\phi_S$, $\phi_N$.
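The additive model (1) and the polar form (2) can be illustrated for a single time-frequency bin. The sketch below uses made-up amplitudes and phases (all numerical values are hypothetical and chosen only for illustration):

```python
import cmath

# Hypothetical single time-frequency bin: clean speech S and noise W.
S = cmath.rect(1.0, 0.3)     # amplitude A = 1.0, phase phi_S = 0.3 rad
W = cmath.rect(0.5, -1.2)    # amplitude N = 0.5, phase phi_N = -1.2 rad
Y = S + W                    # additive model of Eq. (1)

# Polar form of Eq. (2): Y = R * exp(j * phi_Y)
R, phi_Y = cmath.polar(Y)

# Rebuilding Y from (R, phi_Y) recovers the complex coefficient.
Y_rebuilt = R * cmath.exp(1j * phi_Y)
```

Because the noise adds as a complex number, both the noisy amplitude R and the noisy phase phi_Y generally differ from the clean values A and phi_S, which is why the phase carries information worth estimating.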
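Assume that prior knowledge of the clean speech phase, in the form of an estimate $\tilde\phi_S$, is available. This is accomplished using the phase reconstruction algorithm from [10]. From [23], the phase-aware estimator can be written as

$$E\big(f(S)\mid y,\tilde\phi_S\big)=\frac{\int_0^{2\pi}\int_0^{\infty} f(S)\,p(y\mid a,\phi_S)\,p(a)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S\,da}{\int_0^{2\pi}\int_0^{\infty} p(y\mid a,\phi_S)\,p(a)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S\,da} \qquad (3)$$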
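To solve the above equation, priors for the speech and the noise have to be assumed. In this paper the speech spectral amplitude A is assumed to follow the Nakagami PDF

$$p_A(a)=\frac{2}{\Gamma(k)}\left(\frac{k}{\sigma_S^{2}}\right)^{k}a^{2k-1}\exp\left(-\frac{k}{\sigma_S^{2}}a^{2}\right) \qquad (4)$$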
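where $\Gamma(\cdot)$ is the gamma function and k is the shape parameter. Choosing k < 1 gives a super-Gaussian speech prior and k = 1 gives a Rayleigh prior. Further, the noise is assumed to follow the Generalized Gamma PDF

$$p_{Y|S}(y\mid s)=\frac{2v}{\Gamma(v)\,\sigma_N^{2}}\left(\frac{a}{\sigma_N^{2}}\right)^{2v-1}\exp\left(-\frac{|y-a|^{2}}{\sigma_N^{2}}\right) \qquad (5)$$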
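The role of the shape parameter k in the Nakagami prior (4) can be checked numerically. The following minimal sketch (function names are hypothetical, the integration rule is a simple midpoint sum) confirms that the density integrates to one and that k = 1 reproduces the Rayleigh density:

```python
import math

def nakagami_pdf(a, k, var_s):
    """Nakagami amplitude prior of Eq. (4) with shape k and speech variance var_s."""
    coef = 2.0 / math.gamma(k) * (k / var_s) ** k
    return coef * a ** (2.0 * k - 1.0) * math.exp(-k * a * a / var_s)

def rayleigh_pdf(a, var_s):
    """Rayleigh density with E[A^2] = var_s, used as the k = 1 reference."""
    return 2.0 * a / var_s * math.exp(-a * a / var_s)

def integrate(f, lo, hi, n=20000):
    """Midpoint rule; adequate for these smooth, rapidly decaying densities."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Super-Gaussian case k = 0.5: the density should still integrate to one.
total = integrate(lambda a: nakagami_pdf(a, 0.5, 1.0), 0.0, 12.0)
```

For k < 1 the density puts more mass near zero and in the tail than the Rayleigh density, which is the super-Gaussian behaviour referred to above.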
To solve (3), the only remaining quantity is the PDF of the clean speech phase given the estimated phase, $p(\phi_S\mid\tilde\phi_S)$. As proposed in [14], it is modelled using the von Mises distribution [28] as

$$p(\phi_S\mid\tilde\phi_S)=\frac{\exp\big(\kappa\cos(\phi_S-\tilde\phi_S)\big)}{2\pi I_0(\kappa)} \qquad (6)$$
PT
Where is the concentration parameter and I n . is the modified Bessel function of the first kind (order
CE
n). As large values of , the true clean speech phase is likely to be close to initial phase estimate and for small values of , the initial phase estimate yields only little information about the true clean speech phase. Now all the required terms are available to solve (3) and hence we can derive the proposed estimators.
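The effect of the concentration parameter on the von Mises prior (6) can be seen in a short numerical sketch. The helper names below are hypothetical, and the Bessel function is evaluated with its power series rather than a library call:

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x) via its power series sum((x/2)^(2m) / (m!)^2)."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def von_mises_pdf(phi, phi_prior, kappa):
    """Phase prior of Eq. (6) centred on the initial phase estimate phi_prior."""
    return math.exp(kappa * math.cos(phi - phi_prior)) / (2.0 * math.pi * bessel_i0(kappa))

# kappa = 0: no phase knowledge, the prior is uniform with density 1 / (2*pi).
uniform_value = von_mises_pdf(1.0, 0.0, 0.0)

# kappa = 5: the prior is concentrated around phi_prior.
peak_value = von_mises_pdf(0.3, 0.3, 5.0)
far_value = von_mises_pdf(0.3 + math.pi, 0.3, 5.0)
```

As kappa grows the density becomes sharper around the initial phase estimate, which is the sense in which kappa encodes the certainty of the prior phase information.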
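2.1. Derivations for Proposed CUP Estimators CUP-NG
The CUP estimator with a Nakagami speech prior and GGD noise is derived as in [14]. Assume the speech prior to be the Nakagami PDF

$$p_A(a)=\frac{2}{\Gamma(k)}\left(\frac{k}{\sigma_S^{2}}\right)^{k}a^{2k-1}\exp\left(-\frac{k}{\sigma_S^{2}}a^{2}\right) \qquad (7)$$

and the noise prior to be the GGD

$$p_{Y|S}(y\mid s)=\frac{2v}{\Gamma(v)\,\sigma_N^{2}}\left(\frac{a}{\sigma_N^{2}}\right)^{2v-1}\exp\left(-\frac{|y-a|^{2}}{\sigma_N^{2}}\right) \qquad (8)$$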
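The CUP estimator is derived as in [23] and results in

$$\hat S^{(\beta)}=E\big(A^{\beta}e^{j\phi_S}\mid y,\tilde\phi_S\big)=\frac{\displaystyle\int_0^{2\pi}\!\!\int_0^{\infty}a^{\beta}e^{j\phi_S}\,\frac{2}{\Gamma(k)}\Big(\frac{k}{\sigma_S^{2}}\Big)^{k}a^{2k-1}\exp\Big(-\frac{k}{\sigma_S^{2}}a^{2}\Big)\frac{2v}{\Gamma(v)\,\sigma_N^{2}}\Big(\frac{a}{\sigma_N^{2}}\Big)^{2v-1}\exp\Big(-\frac{|y-a|^{2}}{\sigma_N^{2}}\Big)\,p(\phi_S\mid\tilde\phi_S)\,da\,d\phi_S}{\displaystyle\int_0^{2\pi}\!\!\int_0^{\infty}\frac{2}{\Gamma(k)}\Big(\frac{k}{\sigma_S^{2}}\Big)^{k}a^{2k-1}\exp\Big(-\frac{k}{\sigma_S^{2}}a^{2}\Big)\frac{2v}{\Gamma(v)\,\sigma_N^{2}}\Big(\frac{a}{\sigma_N^{2}}\Big)^{2v-1}\exp\Big(-\frac{|y-a|^{2}}{\sigma_N^{2}}\Big)\,p(\phi_S\mid\tilde\phi_S)\,da\,d\phi_S} \qquad (9)$$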
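After simplification [see Appendix A], the final equation is obtained as

$$\hat S^{(\beta)}=E\big(A^{\beta}e^{j\phi_S}\mid y,\tilde\phi_S\big)=\frac{\Gamma(2k+2v+\beta-1)}{\Gamma(2k+2v-1)}\left[2\left(\frac{1}{\sigma_N^{2}}+\frac{k}{\sigma_S^{2}}\right)\right]^{-\beta/2}\frac{\displaystyle\int_0^{2\pi}e^{j\phi_S}\exp\Big(-\frac{r^{2}}{\sigma_N^{2}}\Big)\exp\Big(\frac{\nu^{2}}{2}\Big)D_{-(2v+2k+\beta-1)}(\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}{\displaystyle\int_0^{2\pi}\exp\Big(-\frac{r^{2}}{\sigma_N^{2}}\Big)\exp\Big(\frac{\nu^{2}}{2}\Big)D_{-(2v+2k-1)}(\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S} \qquad (10)$$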
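2.2. Derivations for Proposed CUP Estimators CUP-GG
Similarly, the CUP estimator with a Gamma speech prior and GGD noise is derived as in [14]. Assume the speech prior to be the Gamma PDF

$$p_A(a)=\frac{1}{\Gamma(k)\,\theta^{k}}\,a^{k-1}\exp\left(-\frac{a}{\theta}\right) \qquad (11)$$

and the noise prior to be the GGD

$$p_{Y|S}(y\mid s)=\frac{2v}{\Gamma(v)\,\sigma_N^{2}}\left(\frac{a}{\sigma_N^{2}}\right)^{2v-1}\exp\left(-\frac{|y-a|^{2}}{\sigma_N^{2}}\right) \qquad (12)$$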
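The final equation for the estimated clean speech signal is obtained as [see Appendix B]

$$\hat S^{(\beta)}=E\big(A^{\beta}e^{j\phi_S}\mid y,\tilde\phi_S\big)=\frac{\Gamma(2v+k+\beta-1)}{\Gamma(2v+k-1)}\left(\frac{2}{\sigma_N^{2}}\right)^{-\beta/2}\frac{\displaystyle\int_0^{2\pi}e^{j\phi_S}\exp\Big(\frac{\nu^{2}}{2}\Big)D_{-(2v+k+\beta-1)}(\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}{\displaystyle\int_0^{2\pi}\exp\Big(\frac{\nu^{2}}{2}\Big)D_{-(2v+k-1)}(\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S} \qquad (13)$$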
Solving the integral over the phase $\phi_S$ analytically is difficult, as it involves integrating the parabolic cylinder function $D_{\cdot}(\cdot)$. Since the phase ranges from 0 to 2π, equation (13) can be solved numerically. The gains are pre-computed and tabulated for given values of the shape parameters and the compression parameter β. The table is indexed by the a priori SNR, the a posteriori SNR, the concentration parameter and the phase difference.
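As a rough illustration of such a numerical solution, the ratio of integrals in (3) can be evaluated directly on a grid instead of through the parabolic cylinder form. The sketch below is not the paper's estimator: it assumes a complex Gaussian likelihood in place of the GGD, the Nakagami prior (4) and the von Mises prior (6), and the function name, grid sizes and truncation limit are hypothetical choices.

```python
import cmath
import math

def cup_estimate_numeric(y, phi_prior, kappa, var_s, var_n,
                         k=1.0, beta=1.0, n_amp=400, n_phase=180):
    """Evaluate the ratio of integrals in Eq. (3) on a midpoint grid for
    f(S) = A**beta * exp(j*phi_S), then undo the amplitude compression."""
    a_max = abs(y) + 6.0 * math.sqrt(var_s + var_n)  # heuristic truncation of the amplitude axis
    da = a_max / n_amp
    dphi = 2.0 * math.pi / n_phase
    num = 0j
    den = 0.0
    for i in range(n_amp):
        a = (i + 0.5) * da
        # Nakagami prior of Eq. (4); normalisation constants cancel in the ratio.
        prior_a = a ** (2.0 * k - 1.0) * math.exp(-k * a * a / var_s)
        for m in range(n_phase):
            phi = (m + 0.5) * dphi
            # ASSUMPTION: complex Gaussian likelihood, used here instead of the GGD.
            lik = math.exp(-abs(y - cmath.rect(a, phi)) ** 2 / var_n)
            # von Mises prior of Eq. (6), scaled by exp(-kappa) to avoid overflow.
            prior_phi = math.exp(kappa * (math.cos(phi - phi_prior) - 1.0))
            w = lik * prior_a * prior_phi
            num += (a ** beta) * cmath.exp(1j * phi) * w
            den += w
    s_beta = num / den
    return abs(s_beta) ** (1.0 / beta) * cmath.exp(1j * cmath.phase(s_beta))

y = cmath.rect(1.5, 0.4)  # hypothetical noisy coefficient
blind = cup_estimate_numeric(y, phi_prior=0.0, kappa=0.0, var_s=1.0, var_n=1.0)
aware = cup_estimate_numeric(y, phi_prior=1.0, kappa=50.0, var_s=1.0, var_n=1.0)
```

With kappa = 0, k = 1 and beta = 1 this setup reduces to the Wiener solution, so `blind` is close to 0.5 * y, while a large kappa pulls the phase of `aware` towards the prior phase. In practice such values would be computed once over a grid of SNRs, concentration parameters and phase differences and stored in a table, as described above.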
III. PHASE – AWARE AMPLITUDE ESTIMATION
AUP estimates the compressed speech amplitudes. To derive the novel AUP estimator, define

$$f(S)=|S|^{\beta}=A^{\beta} \qquad (14)$$
The parameter β is perceptually beneficial for 0 < β < 1, as given in [5]. Using this compression parameter β in phase-blind amplitude estimators may yield better results. From [23] and (13), the AUP estimator is obtained as

$$\widehat{A^{\beta}}=\frac{\Gamma(2v+k+\beta-1)}{\Gamma(2v+k-1)}\left(\frac{2}{\sigma_N^{2}}\right)^{-\beta/2}\frac{\displaystyle\int_0^{2\pi}\exp\Big(\frac{\nu^{2}}{2}\Big)D_{-(2v+k+\beta-1)}(\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}{\displaystyle\int_0^{2\pi}\exp\Big(\frac{\nu^{2}}{2}\Big)D_{-(2v+k-1)}(\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S} \qquad (15)$$

where $D_{\cdot}(\cdot)$ is the parabolic cylinder function with argument $\nu=-\frac{2|y|}{\sigma_N^{2}}\cos(\phi_S-\phi_Y)$. Here, $\phi_S-\phi_Y$ is the difference between the true phase and the observed phase. The von Mises phase prior is substituted into the above equation to obtain AUP. The integral is solved numerically with the help of a look-up table that is computed off-line. The enhanced speech amplitudes are obtained as $\hat A=\big(\widehat{A^{\beta}}\big)^{1/\beta}$. The final clean speech estimate is obtained by combining the estimated amplitudes with the noisy phase as

$$\hat S_{AUP}=\hat A\,\exp(j\phi_Y) \qquad (16)$$
From equation (16) it is noted that AUP uses the noisy phase when reconstructing the estimated signal; thus AUP only modifies (enhances) the spectral amplitudes. The motivation is that unnecessary artifacts arising during phase processing may be avoided.
Let us look at how AUP performs for κ → ∞ (known phase) and κ = 0 (phase-blind). When κ → ∞, the initial phase is treated as the true phase and hence the uncertainty is neglected. As κ → ∞, the von Mises distribution approaches a delta function, i.e., $p(\phi_S\mid\tilde\phi_S)=\delta(\phi_S-\tilde\phi_S)$. The von Mises distribution is then non-zero only when $\phi_S=\tilde\phi_S$, which yields the Amplitude estimator given Deterministic Phase information (ADP) [23]

$$\widehat{A^{\beta}}=\frac{\Gamma(2v+k+\beta-1)}{\Gamma(2v+k-1)}\left(\frac{2}{\sigma_N^{2}}\right)^{-\beta/2}\frac{D_{-(2v+k+\beta-1)}(\nu)}{D_{-(2v+k-1)}(\nu)} \qquad (17)$$
By choosing κ → ∞, the uncertainty in the phase estimate is neglected. For reconstruction, the noisy phase is used along with the amplitude estimate (as these estimators estimate only the spectral amplitudes). In the κ = 0 case, the von Mises distribution leads to a phase-blind estimator, as the initial phase does not provide useful information and the von Mises distribution reduces to a uniform distribution between −π and π with density 1/(2π). Solving (3) for $f(S)=|S|^{\beta}=A^{\beta}$, a parametric amplitude estimator is obtained as in [6]:

$$\widehat{A^{\beta}}=E\big(A^{\beta}\mid y\big) \qquad (18)$$

This phase-blind estimator is named MMSE estimation with Optimizable Speech model and Inhomogeneous Error criterion (MOSIE). Here the error function $e(A_k,\hat A_k)=A_k^{\beta}-\hat A_k^{\beta}$ with an amplitude compression parameter β is used (a non-linear, or inhomogeneous, distortion).
IV. PHASE-AWARE ESTIMATION OF COMPLEX COEFFICIENTS AND RELATIONS TO PHASE-AWARE AMPLITUDE ESTIMATION
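The phase-aware CUP estimator, which estimates the compressed complex speech spectral coefficients, is obtained by defining

$$f(S)=S^{(\beta)}=A^{\beta}e^{j\phi_S} \qquad (19)$$

and the estimated signal is obtained using (10) as

$$\hat S^{(\beta)}=E\big(A^{\beta}e^{j\phi_S}\mid y,\tilde\phi_S\big)=\frac{\Gamma(2v+k+\beta-1)}{\Gamma(2v+k-1)}\left(\frac{2}{\sigma_N^{2}}\right)^{-\beta/2}\frac{\displaystyle\int_0^{2\pi}e^{j\phi_S}\exp\Big(\frac{\nu^{2}}{2}\Big)D_{-(2v+k+\beta-1)}(\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S}{\displaystyle\int_0^{2\pi}\exp\Big(\frac{\nu^{2}}{2}\Big)D_{-(2v+k-1)}(\nu)\,p(\phi_S\mid\tilde\phi_S)\,d\phi_S} \qquad (20)$$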
It is noted that the above equation is similar to AUP but contains $e^{j\phi_S}$. The final estimated speech is obtained using

$$\hat S_{CUP}=\big|\hat S^{(\beta)}\big|^{1/\beta}\,\frac{\hat S^{(\beta)}}{\big|\hat S^{(\beta)}\big|} \qquad (21)$$
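The difference between the AUP reconstruction (16) and the CUP reconstruction (21) is only in where the phase comes from. A minimal sketch, with hypothetical function names and made-up input values:

```python
import cmath

def aup_reconstruct(a_beta_hat, beta, phi_y):
    """Eq. (16): undo the amplitude compression, then attach the noisy phase."""
    return a_beta_hat ** (1.0 / beta) * cmath.exp(1j * phi_y)

def cup_reconstruct(s_beta_hat, beta):
    """Eq. (21): undo the compression on the magnitude, keep the estimated phase."""
    mag = abs(s_beta_hat)
    if mag == 0.0:
        return 0j
    return mag ** (1.0 / beta) * (s_beta_hat / mag)

beta = 0.5
# AUP: compressed amplitude estimate 0.8, noisy phase 1.1 rad.
s_aup = aup_reconstruct(0.8, beta, phi_y=1.1)
# CUP: compressed complex estimate with magnitude 0.8 and its own phase 0.6 rad.
s_cup = cup_reconstruct(cmath.rect(0.8, 0.6), beta)
```

Both outputs have the same magnitude here, but AUP inherits the noisy phase whereas CUP carries the phase of the complex estimate itself.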
Note that in the CUP estimator the phase is not the initial phase estimate. Similar to ADP, the complex estimator CUP has to be discussed for known phase, i.e., κ → ∞, and for the phase-blind case, i.e., κ = 0. When κ → ∞, CUP reduces to

$$\hat S_D^{(\beta)}=E\big(A^{\beta}e^{j\phi_S}\mid y,\tilde\phi_S\big)=E\big(A^{\beta}\mid y,\tilde\phi_S\big)\,e^{j\tilde\phi_S}=\widehat{A^{\beta}}_D\,e^{j\tilde\phi_S} \qquad (22)$$

which is named Complex spectral speech coefficients given Deterministic Phase information (CDP). Comparing AUP and CUP for full certainty in the initial phase (κ → ∞), the estimated clean speech amplitudes are the same. CUP estimates the complex coefficients of the clean speech, whereas AUP estimates only the amplitudes. When the phase is perfectly known, the CUP phase estimate is the clean speech phase, whereas AUP uses the noisy phase for reconstruction.
For κ = 0 a closed-form solution is obtained as in [23]; assuming a Nakagami speech prior and a GGD noise prior, it yields the phase-Blind Estimator of Complex Coefficients (BECOCO). From ([23], Eq. 17 and Eq. 18) the phase integral is written as

$$e^{j\phi_Y}\int_{-\phi_Y}^{2\pi-\phi_Y}\big(\cos\phi+j\sin\phi\big)\exp\left(\frac{2ra}{\sigma_N^{2}}\cos\phi\right)d\phi \qquad (23)$$

and from ([23], Eq. 19)

$$I_n(p)=\frac{1}{2\pi}\int_0^{2\pi}\cos(nz)\,\exp(p\cos z)\,dz \qquad (24)$$
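The integral representation (24) is what turns the phase integral into modified Bessel functions. It can be checked numerically against the standard power series of $I_n$; the sketch below uses hypothetical helper names and only the Python standard library:

```python
import math

def bessel_i_integral(n, p, steps=2000):
    """I_n(p) from Eq. (24): (1 / 2*pi) * integral over [0, 2*pi] of cos(n z) exp(p cos z)."""
    h = 2.0 * math.pi / steps
    acc = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        acc += math.cos(n * z) * math.exp(p * math.cos(z))
    return acc * h / (2.0 * math.pi)

def bessel_i_series(n, p, terms=60):
    """I_n(p) from its power series sum((p/2)^(2m+n) / (m! (m+n)!))."""
    return sum((p / 2.0) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

i0_integral = bessel_i_integral(0, 1.0)
i1_integral = bessel_i_integral(1, 2.5)
```

The order n = 1 arises from the factor $e^{j\phi}$ in the complex estimator and n = 0 from the normalising integral, which is why both $I_1$ and $I_0$ appear in the phase-blind solution.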
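Thus from ([23], Eq. 17) we obtain the estimate as

$$\hat S_B^{(\beta)}=e^{j\phi_Y}\,\frac{\displaystyle\int_0^{\infty}a^{2v+2k-2+\beta}\exp\left(-\frac{k\sigma_N^{2}+\sigma_S^{2}}{\sigma_S^{2}\sigma_N^{2}}\,a^{2}\right)I_1\!\left(\frac{2ra}{\sigma_N^{2}}\right)da}{\displaystyle\int_0^{\infty}a^{2v+2k-2}\exp\left(-\frac{k\sigma_N^{2}+\sigma_S^{2}}{\sigma_S^{2}\sigma_N^{2}}\,a^{2}\right)I_0\!\left(\frac{2ra}{\sigma_N^{2}}\right)da} \qquad (25)$$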
After simplification the result is obtained as [see Appendix C]
(26)
3 2v 2k (2 v 2 k 2) 1 2 2 2 (2 v 2 k 2) N 2 2k ; 2 ; 2 k 2 2 k 3 3 2v 2k (2 v 2 k ) 3 2 3 2 2 2 (2 v 2 k 2 ) N 2 2k ;1; 2 (1) 2 k k
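where the function with arguments $(\cdot\,;\cdot\,;\cdot)$ is the confluent hypergeometric function, $|Y|^{2}/\sigma_N^{2}$ is the a posteriori SNR and $\sigma_S^{2}/\sigma_N^{2}$ is the a priori SNR. Here, $\sigma_N^{2}$ is the noise variance, $\sigma_S^{2}$ is the clean speech variance and $|Y|^{2}$ is the power of the noisy signal. The final estimator is obtained by reversing the compression:

$$\hat S_B=\big|\hat S_B^{(\beta)}\big|^{1/\beta}\exp\big(j\phi_{\hat S_B}\big)=\big|\hat S_B^{(\beta)}\big|^{1/\beta}\exp\big(j\phi_Y\big) \qquad (27)$$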
This estimator is named the phase-Blind Estimator of Complex Coefficients (BECOCO). Note that for κ = 0, i.e., the phase-blind case, the magnitudes of the CUP estimate and the spectral amplitudes of AUP are not the same. All the derived estimators are listed in Table 1, along with their final equations.
Table 1. Proposed Estimators and their Mathematical representations
Estimator | Mathematical equation | Speech prior | Noise prior
MOSIE | Eq. (18) | Gamma | Generalized Gamma
BECOCO | Eq. (26) | Nakagami | Generalized Gamma
ADP | Eq. (17) | Gamma | Generalized Gamma
CDP | Eq. (22) | Gamma | Generalized Gamma
AUP | Eq. (15) | Gamma | Generalized Gamma
CUP-NG | Eq. (10) | Nakagami | Generalized Gamma
CUP-GG | Eq. (13) | Gamma | Generalized Gamma
V. ANALYSIS OF ESTIMATORS
The estimators are analyzed here for various values of the concentration parameter κ. First consider the phase-blind case. When κ = 0, special cases of the CUP and AUP estimators result, which are named BECOCO and MOSIE. It is noted that, for a parameter value of 0.5, BECOCO and MOSIE provide reduced attenuation and protect speech components. Secondly, consider the phase-aware case, i.e., κ > 0, where there exists some certainty in the phase. The behaviour of the estimators for different values of the concentration parameter κ is plotted in Fig. 1. To understand the behaviour of the estimators, κ is varied from 0 to ∞; increasing κ from zero to infinity corresponds to increasing certainty in the phase. The characteristic curves show that CUP-GG and AUP-GG provide better performance than the CUP estimator proposed in [23] because of reduced attenuation. As κ → ∞ the CUP and AUP responses coincide, which indicates that the result is independent of the concentration parameter at such large values. It is noted from Fig. 1 that for smaller values of κ there is a clear difference between the amplitudes of CUP and AUP, and that at intermediate values between 0 and ∞, i.e., at κ = 10 and κ = 20, the difference between the CUP and AUP input-output characteristics decreases. This indicates the importance of the initial phase for the input-output characteristics of the estimators.
From Fig. 1 it is noted that the concentration parameter determines the uncertainty in the phase. The concentration parameter is therefore fixed at κ = 4 and the input-output curves are plotted for different phase differences. Fig. 2 presents the input-output characteristics of CUP and AUP at different phase differences $\phi_S-\phi_Y$ and fixed uncertainty κ = 4. A larger phase difference results in more suppression in the input-output characteristic. For the phase differences 0 and 1/2 in Fig. 2, the AUP and CUP curves are some distance apart, whereas for the phase difference 2/3 the AUP and CUP curves are closer together.
This indicates that the suppression or attenuation is greater for large phase differences. It is also noted that, when the phase difference is taken into account, CUP is more aggressive than AUP, which is observed both theoretically and practically. The plots of normalized noise input (normalized with the noise variance, on the X-axis) versus normalized estimated amplitude (on the Y-axis) are obtained as follows.
Figure 1. Input and output characteristics of AUP and CUP with the two remaining parameters set to 1, a phase difference of 0.45, and concentration parameter κ = 0, κ = 10, κ = 20, κ = ∞ (left to right). X-axis: Normalized Noise Input (Normalized with Noise Variance); Y-axis: Normalized Estimated Amplitude. Curves: CUP, AUP, CUP-GG, AUP-GG.
Figure 2. Input and output characteristics of AUP and CUP for κ = 4 and at different phase differences 0, 1/2, 2/3 (from left to right). X-axis: Normalized Noise Input (Normalized with Noise Variance); Y-axis: Normalized Estimated Amplitude. Curves: CUP, AUP, CUP-GG, AUP-GG.
VI. COMBINING DERIVED ESTIMATOR WITH NMF APPROACH
As discussed above, statistical speech enhancement methods assume different PDFs for speech and noise, and no a priori training is needed. However, statistical approaches assume stationary noise, which may reduce their performance under highly non-stationary noise. Other approaches such as NMF use a priori information from speech and noise databases and thus perform well even under non-stationary noise. Combining the statistical approach with NMF may therefore provide better results, as it takes advantage of both. The combination of the statistical and NMF approaches is proposed in [25]. In this paper, the gain of the derived CUP-GG estimator, obtained by the statistical approach, is used in place of the estimator gain function.
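To make the NMF side of this combination concrete, the sketch below shows a generic multiplicative update of the activations for fixed bases under a Euclidean cost. It is not the method of [25]: the on-line bases update and the CUP-GG gain are not included, and the toy matrices, function names and iteration count are hypothetical.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius error ||V - W H||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def update_activations(v, w, h, eps=1e-12):
    """One multiplicative update H <- H * (W^T V) / (W^T W H) with W held fixed."""
    wt = transpose(w)
    num = matmul(wt, v)
    den = matmul(matmul(wt, w), h)
    return [[h[r][c] * num[r][c] / (den[r][c] + eps)
             for c in range(len(h[0]))] for r in range(len(h))]

# Toy bases: column 0 plays the role of a speech basis, column 1 of a noise basis.
W = [[1.0, 0.1], [0.5, 0.4], [0.1, 1.0]]
H_true = [[2.0, 0.5], [0.3, 1.5]]
V = matmul(W, H_true)              # toy magnitude spectrogram (3 bins x 2 frames)

H = [[1.0, 1.0], [1.0, 1.0]]       # non-negative initial activations
err_before = frob_error(V, W, H)
for _ in range(200):
    H = update_activations(V, W, H)
err_after = frob_error(V, W, H)
```

Because the update is multiplicative, the activations stay non-negative, and the speech and noise parts of the reconstruction can then be separated column by column before a gain function is applied.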
The proposed speech enhancement method consists of two stages, as in [25]. In the first stage, the statistical model-based speech enhancement method proposed in [24] is used. In the second stage, NMF is used and the estimator gain function is replaced by the proposed CUP-GG estimator. The online update of the noise and speech bases is implemented as proposed in [25]. The process of the proposed method is given in Fig. 3.

Figure 3. Block diagram for proposed method (blocks: Statistical Model-based Enhancement, NMF, SNR Estimation, CUP-GG Gain, SPP Estimation, Maximum Update Rate Determination, On-line Bases Update; input Y(t), output final speech estimate).
Different NMF techniques are compared with the proposed NMF-based speech enhancement that uses online bases update. For comparison, the objective performance measure Perceptual Evaluation of Speech Quality (PESQ) is used. The experimental results in Tables 2 to 5 show that the derived CUP-GG estimator provides better performance when it is combined with the NMF approach.
VII. RESULTS
To evaluate and compare the performance of the proposed estimators, 30 sentences (15 by male speakers and 15 by female speakers) are taken from the phonetically balanced NOIZEUS speech corpus and corrupted by additive noise. A Hanning window with 75 % overlap is used for analysis and synthesis of the speech signal. In this work, modulated pink noise, pink noise, babble noise and non-stationary factory noise are added to the clean speech signal at input SNRs of -10 dB, -5 dB, 0 dB, 5 dB and 10 dB. The prior phase is estimated as in [10], which uses the fundamental frequency. The speech fundamental frequency is obtained by applying the pitch estimation algorithm PEFAC [29] to the noisy observation. The uncertainty parameter κ is set for each time-frequency bin using [29] as

$$\kappa_{k,l}=\begin{cases}4\,P_V(l), & \text{for } k f_s/N<4\ \text{kHz}\\ 2\,P_V(l), & \text{for } k f_s/N\ge 4\ \text{kHz}\end{cases} \qquad (28)$$

where $P_V(l)$ is the probability that signal segment l contains voiced speech. From the above equation it is clear that the higher the probability of voiced speech, the more the phase estimate is trusted and the larger the concentration parameter. The noise variance and the speech variance are obtained using the Speech Presence Probability (SPP) approach and the decision-directed approach [30] with a smoothing factor of 0.65 (to reduce speech distortions), respectively. Amplitude effects are evaluated using the Signal to Noise Ratio (SNR) and the Segmental Signal to Noise Ratio (SSNR), whereas phase effects are evaluated using the Phase Signal to Noise Ratio (PSNR). The PSNR is evaluated using ([23], Eq. 22). The perceptual quality of the enhanced speech signal is evaluated with the Perceptual Evaluation of Speech Quality (PESQ) measure [30]. To assess intelligibility, the Short-Time Objective Intelligibility (STOI) [31] is plotted for the different estimators.
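The per-bin parameter setup described above can be sketched as follows. The concentration rule assumes that Eq. (28) reads κ = 4 P_V(l) below 4 kHz and κ = 2 P_V(l) at or above 4 kHz, and the a priori SNR update is the commonly used decision-directed form with the smoothing factor 0.65 quoted in the text; function names and input values are hypothetical.

```python
def concentration(p_voiced, bin_index, fs, n_fft):
    """Concentration parameter per time-frequency bin, assuming the reading of
    Eq. (28) stated in the lead-in (4 * P_V below 4 kHz, 2 * P_V otherwise)."""
    freq_hz = bin_index * fs / n_fft
    scale = 4.0 if freq_hz < 4000.0 else 2.0
    return scale * p_voiced

def decision_directed_snr(prev_clean_power, noisy_power, noise_var, alpha=0.65):
    """ASSUMPTION: standard decision-directed a priori SNR estimate.
    prev_clean_power is |S_hat|^2 from the previous frame, noisy_power is |Y|^2."""
    posterior_snr = noisy_power / noise_var
    return (alpha * prev_clean_power / noise_var
            + (1.0 - alpha) * max(posterior_snr - 1.0, 0.0))

kappa_low = concentration(0.5, bin_index=10, fs=8000, n_fft=256)     # 312.5 Hz
kappa_high = concentration(0.5, bin_index=200, fs=16000, n_fft=256)  # 12.5 kHz
xi = decision_directed_snr(prev_clean_power=1.0, noisy_power=3.0, noise_var=1.0)
```

A segment with low voicing probability therefore receives a small concentration parameter, so the estimator falls back towards phase-blind behaviour where the phase reconstruction is least reliable.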
Fig. 4 compares graphically the ΔPESQ values, averaged over 30 sentences, of the proposed estimators under modulated pink noise, pink noise, babble noise and non-stationary factory noise, where Δ denotes the difference between the PESQ values of the processed and unprocessed signals. The respective numerical results are given in Appendix D (Table 6 to Table 9). The experimental results show that the proposed CUP estimators (10) and (13) provide a larger improvement in PESQ than the CUP estimator proposed in [14]. Because the uncertainty about the prior phase is taken into account, rather than the estimated phase being employed directly, there is a significant improvement in PESQ even at low SNRs. Estimators (10) and (13) also reduce the quality degradation at high SNRs, and the highest PESQ improvements are observed for the proposed estimators.
The CUP-NG estimator (10) performs better than CUP [14]. At SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB and 15 dB respectively, the PESQ improvements are 0.23, 0.47, 0.87, 0.79, 0.73, 0.64 for modulated pink noise; 0.14, 0.41, 0.74, 0.73, 0.7, 0.64 for pink noise; 0.13, 0.26, 0.47, 0.54, 0.56, 0.5 for babble noise; and 0.11, 0.35, 0.68, 0.68, 0.64, 0.62 for non-stationary factory noise. The highest PESQ improvement is obtained for modulated pink noise at 0 dB, and a low PESQ improvement is obtained for babble noise at 0 dB. The proposed CUP-GG estimator (13) provides better results than estimator (10), as observed in Fig. 4; a high PESQ improvement of 0.92 is obtained for modulated pink noise at 0 dB. The experimental results show that the proposed CUP-GG estimator (13), with a Gamma speech prior and a Generalized Gamma noise prior, provides better PESQ values even at low SNRs for different noise conditions. CUP-GG (13) performs better than CUP [14]. At SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB and 15 dB respectively, the PESQ improvements are 0.31, 0.55, 0.92, 0.82, 0.77, 0.66 for modulated pink noise; 0.17, 0.43, 0.81, 0.79, 0.76, 0.65 for pink noise; 0.17, 0.28, 0.57, 0.58, 0.59, 0.53 for babble noise; and 0.14, 0.39, 0.72, 0.67, 0.65, 0.63 for non-stationary factory noise.
In Fig. 5, the ΔSTOI values in %, averaged over 30 sentences, are plotted for the four noise types. Here Δ denotes the difference between the processed and unprocessed signals, expressed in percent. The proposed CUP-GG estimator provides better STOI improvement. The STOI improvement of the CUP estimators decreases with increasing input SNR. CUP-GG (13) provides a STOI improvement of up to 5 percent. For babble noise, (13) results in negative STOI values; this may be overcome by using cepstral smoothing as in [32]. The improved novel estimator MOSIE provides positive STOI improvement, and at low input SNRs the AUP estimator provides better STOI values, by a margin of 0.15 to 0.4, for the noises considered in this work. The higher STOI values of AUP compared with CUP indicate that intelligibility improves with the AUP estimator at low input SNRs, and show both the importance of the phase and the artifacts that arise when it is processed. The proposed BECOCO shows negative values. The respective numerical values of the STOI improvement are given in Appendix E (Table 10 to Table 13).
In Fig. 6, the results are plotted by averaging the SSNR values obtained over four noises (Modulated Pink noise, Pink noise, Babble noise and Non-stationary factory noise) and 30 sentences. The quality of the enhanced signal is better with the MOSIE and AUP estimators than with CUP. Thus, better phase estimation and reconstruction methods may advance the enhancement process further. Fig. 6 also plots the averaged Phase Signal to Noise Ratio (PSNR) values; the proposed CUP-GG provides better values, and the improved novel estimators MOSIE, AUP and BECOCO also show improved values. The respective numerical values of Avg SSNR and Avg PSNR are given in Appendix F (Table 14 to Table 15).
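The SSNR definition is not restated in this section. As a rough illustration only, the Python sketch below assumes the commonly used frame-wise definition, in which the SNR of each frame is clamped to a fixed range before averaging. The frame length of 256 samples, the clamping range of -10 dB to 35 dB and the function name are assumptions made for the example, not values taken from the paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB, with each frame clamped to [lo_db, hi_db].

    Assumed definition: non-overlapping frames, frames with zero clean
    energy are skipped, error-free frames count as hi_db.
    """
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        error_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        if signal_energy == 0.0:
            continue
        if error_energy == 0.0:
            snr = hi_db
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr, lo_db), hi_db))
    if not values:
        raise ValueError("no frames with non-zero clean energy")
    return sum(values) / len(values)


# Toy check: attenuating the clean signal by 10 % leaves an error 20 dB below it.
clean = [math.sin(0.05 * n) for n in range(2048)]
enhanced = [0.9 * x for x in clean]
print(segmental_snr(clean, enhanced))
```

Averaging such per-utterance values over sentences and noise types would give a single number per estimator and input SNR, which is the kind of quantity plotted in Fig. 6.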
[Figure 4: four panels, each plotting Δ PESQ against input signal SNR in dB. Legend: Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [14], CUP-NG (10), CUP-GG (13).]
Figure 4. Comparison of PESQ values for different input SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, and noise types (from left to right) Modulated Pink noise, Pink noise, Babble noise, Non-Stationary Factory noise, averaged over 30 sentences (15 spoken by 3 female and 15 by male speakers) from the NOIZEUS speech corpus database.
[Figure 5: four panels, each plotting Δ STOI in [%] against input signal SNR in dB. Legend: Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [14], CUP-NG (10), CUP-GG (13).]
Figure 5. Comparison of STOI values for different input SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, and noise types (from left to right) Modulated Pink noise, Pink noise, Babble noise, Non-Stationary Factory noise, averaged over 30 sentences (15 spoken by 3 female and 15 by male speakers) from the NOIZEUS speech corpus database.
[Figure 6: two panels plotting Avg SSNR and Avg PSNR against input signal SNR in dB. Legend: Wiener, MOSIE, BECOCO, ADP, CDP, AUP, CUP, CUP-NG, CUP-GG.]
Figure 6. Comparison of SSNR and PSNR values averaged over four different noises (Modulated Pink noise, Pink noise, Babble noise, Non-Stationary Factory noise) and over 30 sentences from the NOIZEUS speech corpus database at different input SNRs of -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB (from left to right).
7.1. Subjective listening tests and their analysis
Subjective listening tests were held in a quiet room according to ITU-T Recommendation P.835 [33]. In this method the listener is instructed to rate the processed speech signal by attending to three aspects in turn: the speech signal alone, the background noise alone, and the overall effect, using the Mean Opinion Score (MOS). A group of 14 normal-hearing listeners (aged 18-20 years) participated in the test. Sentences from the NOIZEUS database (phonetically balanced) were presented to the listeners over high-end headphones. As described in [33], the signal distortion (SIG) scale is: 5 - Very natural, no degradation; 4 - Fairly natural, little degradation; 3 - Somewhat natural, somewhat degraded; 2 - Fairly unnatural, fairly degraded; 1 - Very unnatural, very degraded. The background intrusiveness (BAK) scale is: 5 - Not noticeable; 4 - Somewhat noticeable; 3 - Noticeable but not intrusive; 2 - Fairly conspicuous, somewhat intrusive; 1 - Very conspicuous, very intrusive. The overall effect (OVRL) is rated on the Mean Opinion Score scale: 1 - bad; 2 - poor; 3 - fair; 4 - good; 5 - excellent. Results are analyzed for signals corrupted by Babble noise, Modulated Pink noise and Non-Stationary factory noise.
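As a small illustration of how the three ratings described above could be aggregated, the Python sketch below averages per-listener (SIG, BAK, OVRL) triples into mean scores. The function name and the four example ratings are hypothetical; they are not data from the listening test reported here.

```python
from statistics import mean


def mean_opinion_scores(ratings):
    """Average per-listener (SIG, BAK, OVRL) ratings given on the 1-5 scales."""
    for triple in ratings:
        if len(triple) != 3 or any(not 1 <= r <= 5 for r in triple):
            raise ValueError(f"invalid rating triple: {triple!r}")
    sig, bak, ovrl = zip(*ratings)
    return {"SIG": mean(sig), "BAK": mean(bak), "OVRL": mean(ovrl)}


# Hypothetical ratings from four listeners for one processed sentence.
example_ratings = [(4, 3, 3), (3, 2, 3), (4, 3, 4), (3, 2, 2)]
scores = mean_opinion_scores(example_ratings)
print(scores)
```

In the test described above, such means would be formed over the 14 listeners and the presented sentences for each estimator, noise type and input SNR.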
The subjective test results show that the proposed estimators CUP-NG (10) and CUP-GG (13), which take phase information into account, give a significant improvement at low SNRs. The listeners noted a reduction in quality degradation at high SNRs. When listeners attended to the processed signal corrupted by babble noise, the CUP-GG estimator gave the better values of 3.6 for SIG, 2.5 for BAK and 3.1 for OVRL at 5 dB SNR, which indicates fair and somewhat natural sounding speech. Better results are also obtained for the other two noises listed. The objective performance measure PESQ is better for the proposed CUP-GG, and the subjective listening test likewise indicates somewhat natural sounding speech. As the SNR moves to 0 dB and -5 dB, the proposed CUP-GG (13) gives fair and somewhat natural sound, while some intrusiveness is observed for CDP and MOSIE, as shown in Fig. 7. When listeners attended to the processed signal under Modulated pink noise, the performance was better than for Babble noise, with 3.9 for SIG, 2.7 for BAK and 3.4 for OVRL, as shown in Fig. 8. This result is expected, as the PESQ and STOI values are better than for babble noise. Under Non-Stationary Factory noise the results are better than for Babble noise, as shown in Fig. 9. The phase-based amplitude modification in the proposed estimator CUP-GG (13) and the estimators under phase uncertainty (MOSIE, AUP, BECOCO) show improved results over traditional approaches.
[Figure 7: three panels showing SIG, BAK and OVRL scores for the estimators Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [7], CUP-NG (10), CUP-GG (13).]
Figure 7. Comparison of subjective listening test values, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by Babble noise at different input SNRs of 5 dB, 0 dB, -5 dB (from left to right).
[Figure 8: three panels showing SIG, BAK and OVRL scores for the estimators Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [7], CUP-NG (10), CUP-GG (13).]
Figure 8. Comparison of subjective listening test values, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by Modulated-Pink noise at different input SNRs of 5 dB, 0 dB, -5 dB (from left to right).
[Figure 9: three panels showing SIG, BAK and OVRL scores for the estimators Wiener, MOSIE (18), BECOCO (26), ADP (17), CDP (22), AUP (15), CUP [7], CUP-NG (10), CUP-GG (13).]
Figure 9. Comparison of subjective listening test values, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by Non-Stationary Factory noise at different input SNRs of 5 dB, 0 dB, -5 dB (from left to right).
7.2. Combined approach of Proposed estimator with NMF
The proposed estimator is combined with NMF as proposed in [25]. The experiments are conducted by taking samples from the database. The clean speech signals are corrupted by different noises: street noise, babble noise, factory noise, F16 noise, M109 noise and white noise. The experiments are conducted under matched noise conditions. The STFT is computed using a Hanning window with a frame size of 1024 and 75% overlap. The parameter values used were smax = 0.4, nmax = 0.5, s = 0.4, n = 0.9, e = 0.97 and are taken from [25]. The proposed method is compared with the benchmark algorithms. The experimental results show that the proposed method outperforms the benchmark algorithms in terms of the PESQ measure. The numerical values are listed in Table 2 to Table 5. SE (Statistical-model based Enhancement) is the speech enhancement method taken from [24]. The NMF based method is taken from [26], and the SE+NMF+OU (OU: online-bases update) method is proposed in [25]. The combined approach of NMF and the statistical-model based approach as in [25] with the proposed estimator gain, SE+NMF+OU+CUP-GG, is now analyzed.
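The STFT setting stated above (frame size 1024, 75% overlap) fixes the hop size at 256 samples. The Python sketch below shows the corresponding framing and checks that the overlapped windows sum to a constant, which is what makes overlap-add resynthesis straightforward. It assumes a periodic Hann window; the text only says "Hanning window", so that choice and all function names are assumptions for illustration.

```python
import math

FRAME_LEN = 1024                          # frame size stated in the text
OVERLAP = 0.75                            # 75 % overlap stated in the text
HOP = int(FRAME_LEN * (1.0 - OVERLAP))    # 256 samples


def hann_window(n):
    """Periodic Hann window of length n (assumed variant)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    """Split x into overlapping, Hann-windowed frames (input to a per-frame DFT)."""
    w = hann_window(frame_len)
    return [
        [w[i] * x[start + i] for i in range(frame_len)]
        for start in range(0, len(x) - frame_len + 1, hop)
    ]


def window_overlap_sum(frame_len=FRAME_LEN, hop=HOP):
    """Sum of all shifted windows covering each sample position within one hop."""
    w = hann_window(frame_len)
    return [sum(w[i + m * hop] for m in range(frame_len // hop)) for i in range(hop)]


frames = frame_signal([0.0] * 4096)
print(HOP, len(frames), min(window_overlap_sum()), max(window_overlap_sum()))
```

With a hop of one quarter of the frame, the four overlapping periodic Hann windows add up to the same value at every sample, so no additional amplitude correction beyond a constant scale is needed at synthesis.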
Tables 2-5. Performance comparison of PESQ values for different NMF algorithms with the proposed method, which combines the statistical-model based method and NMF, for matched noise bases.

Modulated Pink Noise
Algorithm            SNR in dB:  -10    -5     0      5      10     15
SE [24]                          0.29   0.56   1.32   2.31   2.52   2.91
NMF [26]                         0.1    0.24   1.11   2.01   2.31   2.7
SE+NMF                           0.31   0.61   1.38   2.39   2.61   3.01
SE+NMF+OU [25]                   0.33   0.67   1.42   2.48   2.69   3.12
SE+NMF+OU+CUP-GG                 0.39   0.78   1.67   2.68   2.95   3.31
-10
NMF [26] SE+NMF
0.15 0.08 0.17 0.18 0.21
SE [24]
SE+NMF+OU [25]
SE+NMF+OU+CUP-GG
Algorithm
-10 SE [24] NMF [26] SE+NMF SE+NMF+OU [25] SE+NMF+OU+CUP-GG
0.29 0.13 0.31 0.33 0.35
Street Noise SNR in dB -5 0 5 10 15
0.52 0.25 0.57 0.59 0.67
F-16 Noise SNR in dB -5 0 5 10 15
-10
2.2 1.97 2.28 2.34 2.58
2.27 1.98 2.32 2.23 2.63
-10
Factory Noise SNR in dB -5 0 5 10 15
0.17 0.1 0.19 0.19 0.23
1.19 0.98 1.23 1.29 1.46
1.91 1.75 2.02 2.12 2.31
Babble Noise SNR in dB -5 0 5 10 15 0.71 1.57 2.19 2.45 2.87 0.48 1.21 1.98 2.12 2.38 0.78 1.62 2.26 2.51 2.95 0.81 1.69 2.34 2.41 2.72 0.89 1.94 2.56 2.67 2.92
2.58 2.19 2.64 2.45 2.91
0.46 0.24 0.49 0.52 0.58
1.49 1.17 1.53 1.59 1.93
-10 0.23 0.07 0.26 0.28 0.31
2.23 1.99 2.29 2.36 2.41
2.27 2.03 2.39 2.48 2.67
0.21 0.11 0.22 0.24 0.28
0.26 0.12 0.31 0.36 0.62
0.81 0.65 0.88 0.91 1.47
1.71 1.59 1.79 1.84 2.16
1.96 1.71 2.11 2.19 2.41
M109 Noise SNR in dB -5 0 5 10 0.42 0.21 0.42 0.45 0.56
1.11 0.94 1.18 1.21 1.44
2.02 1.93 2.11 2.18 2.39
2.18 1.95 2.23 2.31 2.56
2.1 1.87 2.32 2.41 2.69
15 2.21 1.98 2.29 2.38 2.57
White Noise
Algorithm            SNR in dB:  -10    -5     0      5      10     15
SE [24]                          0.33   0.98   2.01   2.81   2.86   3.02
SE+NMF                           0.34   0.99   2.17   2.89   2.92   3.12
SE+NMF+OU [25]                   0.35   1.12   2.25   2.96   3.02   3.22
SE+NMF+OU+CUP-GG                 0.4    1.18   2.49   3.11   3.19   3.39
VIII. CONCLUSION
In this paper, two CUP estimators (10), (13) are derived by assuming Nakagami and Gamma speech priors and a Generalized Gamma noise prior. An amplitude estimator given uncertain prior phase information and an estimator of the complex speech coefficients are presented for completely uncertain phase information. The different estimators are analyzed in terms of the objective performance measures PESQ and STOI and of subjective listening tests according to ITU-T Recommendation P.835. The sensitivity of the different estimators to the prior phase information is also discussed. Secondly, the derived CUP estimator is combined with the template-based NMF approach with online-bases update. More sophisticated phase estimation methods may advance the speech enhancement process further. By combining the advantages of the statistical-based approach and NMF, more robust speech enhancement methods for non-stationary noises may be developed.

APPENDIX A
(Derivation of CUP Estimator by assuming Nakagami PDF for speech prior and Generalized Gamma Prior for noise)
The CUP estimator with a Nakagami speech prior and a GGD noise prior is derived as in [14]. Assume the speech prior to be the Nakagami PDF
$$p_A(a) = \frac{2}{\Gamma(k)}\left(\frac{k}{\sigma_S^{2}}\right)^{k} a^{2k-1}\exp\left(-\frac{k}{\sigma_S^{2}}\,a^{2}\right) \tag{A.1}$$
Assume the noise prior to be the GGD
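The Nakagami prior in (A.1) can be sanity-checked numerically: it should integrate to one and have second moment E[A^2] equal to the speech power. The Python sketch below does this with a simple midpoint rule. The shape and power values are illustrative choices, not parameters from the paper, and the helper names are hypothetical.

```python
import math


def nakagami_pdf(a, k, sigma_s2):
    """Nakagami amplitude prior with shape k and power sigma_s2 = E[A^2]."""
    if a <= 0.0:
        return 0.0
    coeff = 2.0 / math.gamma(k) * (k / sigma_s2) ** k
    return coeff * a ** (2.0 * k - 1.0) * math.exp(-k * a * a / sigma_s2)


def integrate(f, lo, hi, steps=20000):
    """Composite midpoint rule; adequate for this smooth, fast-decaying integrand."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


k, sigma_s2 = 1.5, 2.0                      # illustrative values only
upper = 12.0 * math.sqrt(sigma_s2)          # the tail beyond this is negligible
total = integrate(lambda a: nakagami_pdf(a, k, sigma_s2), 0.0, upper)
power = integrate(lambda a: a * a * nakagami_pdf(a, k, sigma_s2), 0.0, upper)
print(total, power)
```

The same check can be repeated for other shape values k to see how the prior concentrates around the square root of the speech power as k grows.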
$$p_{Y|S}(y|s) = \frac{2\,v^{v}}{\Gamma(v)\,\sigma_N^{2}}\left(\frac{a}{\sigma_N^{2}}\right)^{2v-1}\exp\left(-\frac{|y-a|^{2}}{\sigma_N^{2}}\right) \tag{A.2}$$
The CUP estimator is derived as in [20] using
Sˆ E A e jS y,S
y a 2 exp 2 p S S dadS N S S k 2 v 1 2 v 1 2 2 v v ya 2 k 2 k 1 k 2 2 a 2 a 0 0 k S2 a exp S2 a v N2 N2 v N2 N2 exp N2 pS S S S dadS k
2
2 k 2 k 1 k 2 2 v a a e a e xp 2 a 2 2 k 2 S v N N 0 0 S jS
2 v 1
2 v a v N2 N2
2 v 1
Sˆ E A e jS y, S
0
2
0
k r 2 2k 2v 2 exp exp 2 2 2 a 2 k 2 2 2 v 1 S N k v S N N N 0 4 v k k
2 2 r cos a a pS S S S dadS N2
k r 2 2k 2v 2 2 r cos exp exp 2 2 a 2 2 a a pS S S S dadS 2 k 2 2 2 v 1 N2 k v S N N N 0 S N 4 v k k
e jS
2
(A.3)
(A.4)
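After rearranging and simplifying using Gradshteyn and Ryzhik (2007, Eq. (3.462.1)),
$$\int_{0}^{\infty} x^{p-1} e^{-\beta x^{2}-\gamma x}\,dx = (2\beta)^{-p/2}\,\Gamma(p)\,\exp\left(\frac{\gamma^{2}}{8\beta}\right) D_{-p}\left(\frac{\gamma}{\sqrt{2\beta}}\right)$$
By comparing, the obtained terms are
$$p = 2k+2v-1; \qquad \beta = \frac{k\sigma_N^{2}+\sigma_S^{2}}{\sigma_S^{2}\sigma_N^{2}}; \qquad \gamma = -\frac{2r\cos(\phi_S-\phi_Y)}{\sigma_N^{2}} \tag{A.5}$$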
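The integral identity from Gradshteyn and Ryzhik, Eq. (3.462.1), used above can be verified numerically. The Python sketch below compares a midpoint-rule evaluation of the left-hand side with the closed form, with beta and gamma denoting the quadratic and linear exponent coefficients. The parabolic cylinder function D_{-p}(z) is evaluated through its standard representation in terms of Kummer functions. The parameter values and helper names are illustrative assumptions, not values from the paper.

```python
import math


def kummer_m(a, b, x, max_terms=500):
    """Kummer confluent hypergeometric function M(a, b, x) from its power series."""
    total, term = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * x / ((b + n) * (n + 1))
        total += term
        if abs(term) <= 1e-17 * abs(total):
            break
    return total


def pcf_d_neg(p, z):
    """Parabolic cylinder function D_{-p}(z) for p > 0 via Kummer functions."""
    h = 0.5 * z * z
    even = math.sqrt(math.pi) / math.gamma(0.5 * (p + 1.0)) * kummer_m(0.5 * p, 0.5, h)
    odd = (math.sqrt(2.0 * math.pi) * z / math.gamma(0.5 * p)
           * kummer_m(0.5 * (p + 1.0), 1.5, h))
    return 2.0 ** (-0.5 * p) * math.exp(-0.25 * z * z) * (even - odd)


def closed_form(p, beta, gamma):
    """Right-hand side of the identity."""
    z = gamma / math.sqrt(2.0 * beta)
    return ((2.0 * beta) ** (-0.5 * p) * math.gamma(p)
            * math.exp(gamma * gamma / (8.0 * beta)) * pcf_d_neg(p, z))


def numeric(p, beta, gamma, upper=20.0, steps=100000):
    """Left-hand side by the midpoint rule on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (p - 1.0) * math.exp(-beta * x * x - gamma * x)
    return total * h


# p = 2k + 2v - 1 with k = v = 1; beta and gamma chosen for illustration.
print(closed_form(3.0, 1.5, -2.0), numeric(3.0, 1.5, -2.0))
```

The series evaluation is intended for moderate arguments only; for large positive z the two Kummer terms nearly cancel, and a dedicated special-function library would be the safer choice.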
By Substituting all parameters after simplification
Sˆ E A e j S y, S
0
4 v k k 2k 2v 1 k 2 2 2 2 k 2 2v k v S N N S
2
2
4 v k k 2k 2v 1 k 2 2 2 2 k 2 2v k v S N N S
0
1 k v 2
r2 2 e jS exp 2 exp D 2 v 2 k 1 p S S dS S S 2 N
1 k v 2
r2 exp 2 N
2 exp D 2 v 2 k 1 p S S S S dS 2
(A.6)
Sˆ E A e jS y, S 2
2k 2v 1 k 2 2 2 2k 2v 1 N S
2
r2 2 jS e exp exp 2 D 2v 2 k 1 pS S S S dS 0 2 N 2 2 r 2 exp exp 0 N2 2 D2v2k 1 pS S S S dS
(A.7)
APPENDIX B
(Derivation of CUP Estimator by assuming Gamma PDF for speech prior and Generalized Gamma Prior for noise)
The CUP estimator with a Gamma speech prior and a GGD noise prior is derived as in [10]. Assume the speech prior to be the Gamma PDF
pA a
1 a a k 1 exp k k
Assume the noise prior as GGD as
2 v pY S y s v N2
a 2 N
2 v 1
y a 2 exp 2 N
(B.1)
(B.2)
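The CUP estimator is derived as in [20], using Gradshteyn and Ryzhik (2007, Eq. (3.462.1)):
$$\int_{0}^{\infty} x^{p-1} e^{-\beta x^{2}-\gamma x}\,dx = (2\beta)^{-p/2}\,\Gamma(p)\,\exp\left(\frac{\gamma^{2}}{8\beta}\right) D_{-p}\left(\frac{\gamma}{\sqrt{2\beta}}\right) \tag{B.3}$$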
By comparing the parameters we obtain parameters
p 2v k 1;
1 2 y cos S Y ; 2 N S N2
2
0
The Numerator term is obtained finally as
2 v k k v N2
Let
1 2 N
2 v 1
2 2 N
2 v k 1 2
1 2 y cos 2 1 2 y cos S Y S Y 2 N N2 D S 2v k 1 exp S 2 v k 1 2 2 8 N 2 N
2 y S cos S Y N2 N 2 2 N S
Divide Numerator for above with S and N2 then
S 2 y cos(S Y ) 2 N
(B.4)
Finally the estimated signal using (3) and using simplified equation as
2 2 2 v k k (v) N2 N 2 v
k 1 v 2
2 2v k 1 exp D (2v k 1) 2
(B.5)
Sˆ E A e j S y, S 2
0
1 2 y cos 2 1 2 y cos S Y S Y N2 1 2 N2 2 v S p 2v k 1 exp D 2 v k 1 S d S 2 k 2 2 2 S S S S k v N N N 8 N 2 N2 1 2 y cos 2 1 2 y cos S Y S Y 2 v k 1 2 v 1 2 N2 1 2 2 N2 2 v S p 2v k 1 exp D 2 v k 1 S d S 2 0 k k v N2 N2 N2 S S S S 8 N 2 N2 2 v 1
2 v k 1 2
Sˆ E A e j S y, S k 1 v
2
2
APPENDIX C
2 jS e exp D (2 v k 1) p S S S S dS 0 2 2 2 exp 0 2 D(2vk 1) pS S S S dS
(B.7)
(B.8)
2v k 1 2 2v k 1 N2
2
Sˆ E A e j S y, S
2 2 2 2 v k 1 exp D (2 v k 1) p S S S S dS 0 N2 2 k 1 v 2 2 2 2 2 v k 1 exp D (2 v k 1) p S S S S dS 0 N2 2
(B.6)
Thus from ([23] Eq. 17) we obtain the estimation as
k N2 S2 2 2r a exp a I1 da 2 2 2 0 a S N N j Y e k N2 S2 2 2r a 2v 2 k 2 a exp a I0 da 2 2 2 0 S N N
SˆB
2 v 2 k 2
(C.1)
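Using ([20] Eq. 6.643.2, Eq. 9.220.2) and simplifying sequentially results in
$$\int_{0}^{\infty} x^{p-\frac{1}{2}}\,e^{-\alpha x}\,I_{2v}\left(2\beta\sqrt{x}\right)dx = \frac{\Gamma\left(p+v+\frac{1}{2}\right)}{\Gamma(2v+1)}\,\beta^{-1}\,e^{\frac{\beta^{2}}{2\alpha}}\,\alpha^{-p}\,M_{-p,v}\left(\frac{\beta^{2}}{\alpha}\right) \tag{C.2}$$
and
$$M_{\lambda,\mu}(z) = z^{\mu+\frac{1}{2}}\,e^{-\frac{z}{2}}\,\Phi\left(\mu-\lambda+\tfrac{1}{2};\,2\mu+1;\,z\right) \tag{C.3}$$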
Let
2
Y
N2
S2 N2
From (21)
k N2 S2 2r x 2 v 2 k 3 a exp x I1 dx 2 0 S2 N2 Y N e j 2 2 k N S 2r x 2 v 2 k 3 exp x I0 da 2 0 a S2 N2 N
SˆB
The Numerator term is simplified as 0
a
5 1 2v2k 2 2
k 2 2 exp N 2 2 S S N
2r x x I 1 dx 2 2 N 2
Compare (25) with (22) we get
(C.4)
(C.5)
k N2 S2 5 1 r p 2v 2k ; ;v ; 2 2 2 2 S N 2 N Then numerator becomes
3
r 2 k 2 2 2v 2 k 2 r 2 2 2 S2 N2 3 N2 N S S N 2 2v 2k M exp 2 3 1 2 2 v 2 k , 2 r N 2(k N2 S2 ) S2 N2 k S2 N N 2 2
(C.6)
2
(C.7)
r S2 N2 Multiply and divide with 2 and using (23) 2 2 k N S N 3 2v 2k (2 v 2 k 2) 1 2 2 2 (2v 2 k 2) N 2 2k ; 2 ; 2 2 2 k k Where
2 2k
Where
Y
2 1 r 2 r 2 2 2 S2 N2 1 r S2 N2 S N ; 2 ; 2 exp M N2 2(k N2 S2 ) 2v 2 k 32 , 12 N2 k N2 S2 2 k N2 k N2 S2
2
N2
S2 N2
Similarly denominator is simplified as
(C.8)
ACCEPTED MANUSCRIPT 3 3 2v 2k (2 v 2 k ) 3 2 3 2 2 2 (2v 2 k 2 ) N 2 2k ;1; 2 (1) 2 k k
(C.9)
Substitute (27) and (26) in (24) it results in
3 2v 2k (2 v 2 k 2) 1 2 2 2 (2v 2 k 2) N 2 2k ; 2 ; 2 2 2 k k 3 3 2v 2k (2 v 2 k ) 3 2 3 2 2 2 (2v 2 k 2 ) N 2 2k ;1; 2 (1) 2 k k
APPENDIX D
ΔPESQ values (Table 6 to Table 9)

Table 6. Comparison of PESQ values for different estimators in case of Modulated pink noise
SNR in dB   Wiener   MOSIE   BECOCO   ADP    CDP    AUP    CUP    CUP-NG   CUP-GG
-10         0.09     0.09    0.13     0.15   0.18   0.14   0.2    0.23     0.31
-5          0.18     0.26    0.15     0.38   0.41   0.35   0.45   0.47     0.55
0           0.48     0.52    0.48     0.57   0.68   0.53   0.76   0.87     0.92
5           0.43     0.52    0.46     0.52   0.62   0.42   0.69   0.79     0.82
10          0.39     0.45    0.39     0.46   0.54   0.41   0.56   0.73     0.77
15          0.32     0.38    0.29     0.37   0.49   0.32   0.57   0.64     0.66
CUPNG
CUPGG
0.14 0.41 0.74 0.73 0.7 0.64
0.17 0.43 0.81 0.79 0.76 0.65
CUPNG
CUPGG
0.13 0.26 0.47 0.54 0.56 0.5
0.17 0.28 0.57 0.58 0.59 0.53
SNR in dB Wiener MOSIE
0.06 0.19 0.41 0.39 0.38 0.31
BECOCO
0.04 0.18 0.36 0.36 0.33 0.28
0.04 0.16 0.39 0.37 0.36 0.27
-10 -5 0 5 10 15
Table 7. Comparison of PESQ values for different estimators in case of pink noise ADP
0.06 0.24 0.46 0.45 0.42 0.38
CDP
0.07 0.3 0.57 0.59 0.53 0.48
AUP
0.05 0.18 0.43 0.44 0.43 0.35
CUP
0.11 0.35 0.67 0.68 0.66 0.61
Table 8. Comparison of PESQ values for different estimators in case of Babble noise SNR in dB Wiener MOSIE -10 -5 0 5 10 15
0.03 0.16 0.29 0.27 0.27 0.2
0.08 0.18 0.35 0.29 0.28 0.26
BECOCO
0.03 0.11 0.25 0.26 0.27 0.22
ADP
0.06 0.13 0.27 0.34 0.32 0.28
CDP
0.07 0.16 0.38 0.43 0.41 0.38
AUP
0.04 0.1 0.36 0.35 0.35 0.33
CUP
0.07 0.15 0.41 0.44 0.47 0.42
(C.10)

Table 9. Comparison of PESQ values for different estimators in case of Non-stationary factory noise
SNR in dB   Wiener   MOSIE   BECOCO   ADP    CDP    AUP    CUP    CUP-NG   CUP-GG
-10         0.04     0.09    0.03     0.04   0.08   0.06   0.04   0.11     0.14
-5          0.15     0.21    0.17     0.17   0.16   0.14   0.3    0.35     0.39
0           0.35     0.39    0.24     0.35   0.46   0.37   0.62   0.68     0.72
5           0.28     0.36    0.3      0.35   0.47   0.36   0.58   0.68     0.67
10          0.25     0.32    0.25     0.33   0.46   0.34   0.56   0.64     0.65
15          0.21     0.27    0.23     0.29   0.43   0.31   0.54   0.62     0.63
APPENDIX E
Δ STOI in [%] values (Table 10 to Table 13)

Table 10. Comparison of STOI values for different estimators in case of Modulated pink noise
SNR in dB   Wiener   MOSIE   BECOCO   ADP    CDP     AUP    CUP    CUP-NG   CUP-GG
-10         -1.29    4.11    -0.79    6.27   -1.38   5.29   3.51   4.73     4.91
-5          -1.28    3.72    -0.62    3.47   -1.1    2.89   2.76   3.64     3.95
0           -1.11    2.97    -0.49    3.16   0.41    2.37   2.49   2.98     3.64
5           -0.47    2.16    -0.32    2.75   0.21    0.82   0.95   1.61     1.96
10          0.12     1.76    -0.19    2.03   0.1     0.16   0.26   0.49     0.44
15          0.25     0.99    -0.14    1.27   0.05    0.09   0.12   0.32     0.27
Table 11. Comparison of STOI values for different estimators in case of pink noise
SNR in dB   Wiener   MOSIE   BECOCO   ADP    CDP     AUP    CUP    CUP-NG   CUP-GG
-10         -1.29    3.72    -0.94    5.81   -1.29   4.97   3.61   4.59     4.98
-5          -1.41    3.48    -0.72    3.74   -1.72   3.21   2.98   3.81     4.69
0           -1.09    2.58    -0.52    2.95   0.28    1.89   2.79   3.29     2.61
5           -0.42    1.95    -0.39    2.61   0.15    0.49   0.9    1.23     1.95
10          0.14     1.49    -0.26    1.98   0.04    0.07   0.44   1.15     1.46
15          0.21     0.81    -0.15    1.12   0.05    0.05   0.34   1.43     0.58
Table 12. Comparison of STOI values for different estimators in case of Babble noise SNR in dB Wiener MOSIE
-1.89 -1.75 -1.49 -1.32 -1.11 -0.94
-2.74 -2.23 -2.03 -1.89 -1.39 -1.31
-10 -5 0 5 10 15
BECOCO
-1.54 -1.22 -1.08 -0.96 -0.69 -0.52
ADP
-2.13 -1.86 -1.49 -1.21 -1.02 -0.83
CDP
-4.87 -5.12 -0.38 -0.32 -0.29 -0.27
AUP
CUP
CUPNG
CUPGG
-2.51 -1.19 -0.09 0.29 0.23 0.07
-1.09 -0.62 -0.17 0.25 0.15 0.02
-0.35 -0.19 -0.01 0.37 0.24 0.01
-0.02 -0.01 0.09 0.06 0.05 0.03
Table 13. Comparison of STOI values for different estimators in case of Non-stationary factory noise
SNR in dB   Wiener   MOSIE   BECOCO   ADP    CDP     AUP    CUP    CUP-NG   CUP-GG
-10         -1.28    3.91    -0.84    5.11   -1.18   4.89   3.02   3.61     3.98
-5          -1.39    3.44    -0.77    3.14   -1.42   3.01   2.56   2.98     3.18
0           -1.12    2.83    -0.59    2.68   0.25    1.92   1.98   2.91     2.59
5           -0.42    2.04    -0.45    2.48   0.14    0.58   0.69   1.18     1.79
10          -0.11    1.81    -0.28    1.84   0.02    0.17   0.22   0.69     0.85
15          -0.01    0.91    -0.18    0.97   0.02    0.05   0.07   0.19     0.31
APPENDIX F

Table 14. Comparison of SSNR values averaged over four different noises (Modulated Pink noise, Pink noise, Babble noise, Non-Stationary Factory noise) for different estimators at different input SNRs
SNR in dB   Wiener   MOSIE   BECOCO   ADP     CDP     AUP     CUP     CUP-NG   CUP-GG
-10         1.31     4.87    0.56     3.88    3.87    4.11    1.69    2.12     3.01
-5          1.8      5.32    0.91     5.05    5.04    5.32    2.31    4.27     5.21
0           5.62     7.18    5.87     7.11    7.1     7.89    5.95    6.54     6.94
5           7.12     8.71    7.21     8.59    8.57    9.21    8.23    8.42     8.97
10          10.76    11.84   10.82    11.76   11.76   12.45   11.21   11.77    12.15
15          14.03    14.79   14.11    14.68   14.67   15.22   13.78   14.23    14.68
Table 15. Comparision of PSNR values Averaged for four different noises Modulated Pink noise,Pink
Noise, Babble Noise, Non-Stationary Factory Noises for different estimators at different input SNRs 4.87 6.41 7.97 9.76 14.28 14.88
4.83 6.36 7.94 9.74 14.27 14.85
4.85 6.38 7.95 9.75 14.29 14.89
ADP
CDP
4.85 6.38 7.94 9.73 14.29 14.86
3.74 5.71 6.84 8.87 13.1 13.97
AUP
4.82 6.28 8.02 9.81 14.28 14.85
CUP
CUPNG
4.01 5.81 7.14 9.01 13.38 14.23
5.07 6.31 7.62 9.61 13.76 14.52
CUPGG 6.08 7.11 7.92 9.75 13.97 14.79
APPENDIX G
-10 -5 0 5 10 15
BECOCO
SNR in dB Wiener MOSIE
Table 16. Comparison of subjective listening test values, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by Babble noise at different input SNRs of 5 dB, 0 dB, -5 dB
                  5 dB SNR            0 dB SNR            -5 dB SNR
Estimator         SIG   BAK   OVRL    SIG   BAK   OVRL    SIG   BAK   OVRL
Wiener            2.5   1.1   1.7     1.9   1     1.3     1.2   1     1
MOSIE (18)        2.3   1     1.5     1.9   1.4   1.2     1.3   1.1   1
BECOCO (26)       2.7   1.4   1.9     2.2   1     1.6     1.3   1     1.2
ADP (17)          3.2   1.6   2.3     2.8   1.2   1.9     1.5   1.1   1.1
CDP (22)          3.3   1.7   2.4     2.9   1.4   1.8     1.3   1     1.2
AUP (15)          2.9   1.5   2.2     2.5   1.1   1.8     1.4   1.1   1.1
CUP [7]           3.4   2.1   2.4     2.9   1.7   2.1     1.7   1.3   1.5
CUP-NG (10)       3.4   2.2   2.8     3     1.9   2.5     2.2   1.4   1.7
CUP-GG (13)       3.6   2.5   3.1     3.2   2.3   2.9     2.5   1.7   1.9
Table 17. Comparison of subjective listening test values, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by Modulated-Pink noise at different input SNRs of 5 dB, 0 dB, -5 dB
                  5 dB SNR            0 dB SNR            -5 dB SNR
Estimator         SIG   BAK   OVRL    SIG   BAK   OVRL    SIG   BAK   OVRL
Wiener            2.7   1.3   1.9     2.2   1.2   1.6     1.4   1     1.3
MOSIE (18)        2.5   1.1   1.5     2.3   1.1   1.3     1.5   1.3   1.3
BECOCO (26)       2.9   1.6   2.1     2.5   1.4   1.9     1.5   1.4   1.4
ADP (17)          3.1   1.7   2.4     2.8   1.5   2.2     1.7   1.3   1.2
CDP (22)          3.3   1.7   2.6     2.9   1.6   2.4     1.6   1.2   1.3
AUP (15)          3.1   1.6   2.4     2.7   1.5   2.4     1.8   1.3   1.4
CUP [7]           3.4   2.3   2.8     3.1   2.1   2.6     1.9   1.5   1.6
CUP-NG (10)       3.7   2.2   2.8     3.4   1.9   2.7     2.4   1.7   1.9
CUP-GG (13)       3.9   2.7   3.4     3.5   2.5   3.1     2.7   1.9   2.2
Table 18. Comparison of subjective listening test values, averaged over 30 sentences from the NOIZEUS speech corpus database corrupted by Non-Stationary Factory noise at different input SNRs of 5 dB, 0 dB, -5 dB
                  5 dB SNR            0 dB SNR            -5 dB SNR
Estimator         SIG   BAK   OVRL    SIG   BAK   OVRL    SIG   BAK   OVRL
Wiener            2.6   1.2   1.7     2     1.1   1.5     1.3   1     1.1
MOSIE (18)        2.4   1.1   1.3     2.1   1     1.2     1.3   1     1.2
BECOCO (26)       2.7   1.4   1.8     2.3   1.2   1.6     1.3   1.2   1.2
ADP (17)          2.8   1.5   2.3     2.6   1.3   2       1.5   1.1   1.4
CDP (22)          3.1   1.6   2.5     2.7   1.4   2.1     1.5   1.1   1.1
AUP (15)          3     1.4   2.2     2.5   1.3   2.2     1.5   1.2   1.3
CUP [7]           3.2   2.1   2.7     2.9   2     2.4     1.7   1.3   1.6
CUP-NG (10)       3.5   2     2.6     3.2   2.1   2.5     2.2   1.6   1.8
CUP-GG (13)       3.7   2.3   2.9     3.3   2.3   3       2.5   1.9   2.3
REFERENCES
[1] R. Martin, "Speech enhancement based on minimum mean-square error estimation and super-Gaussian priors," IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 845–856, Sep. 2005.
[2] Breithaupt, C., Martin, R., 2003. MMSE estimation of magnitude squared DFT coefficients with supergaussian priors. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Hong Kong, pp. 896–899.
[3] Ephraim, Y., Malah, D., December 1984. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal Process. 32 (6), 1109–1121.
[4] Fodor, B., Fingscheidt, T., August 29-Sept 2, 2011. MMSE speech spectral amplitude estimation assuming non-Gaussian noise. In: Proceedings of the 19th European Signal Processing Conference (EUSIPCO 2011). Barcelona, Spain, pp. 2314–2318.
[5] C. H. You, S. N. Koh, and S. Rahardja, "β-order MMSE spectral amplitude estimation for speech enhancement," IEEE Trans. Speech Audio Process., vol. 13, no. 4, pp. 475–486, Jul. 2005.
[6] C. Breithaupt, M. Krawczyk, and R. Martin, "Parameterized MMSE spectral magnitude estimation for the enhancement of noisy speech," in IEEE International Conference on Acoustics, Speech and Signal Process., Apr. 2008, pp. 4037–4040.
[7] Hendriks, R.C., Gerkmann, T., Jensen, J., "DFT Domain Based Single Microphone Noise Reduction for Speech Enhancement: A Survey of the State-of-the-art." Morgan & Claypool, Colorado, USA.
[8] K. Paliwal, K. Wójcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Commun., vol. 53, no. 4, pp. 465–494, Apr. 2011.
[9] Kuldip Paliwal, Kamil Wójcicki, Belinda Schwerin, "Single-channel speech enhancement using spectral subtraction in the short-time modulation domain," Speech Communication, vol. 52, issue 5, May 2010.
[10] M. Krawczyk and T. Gerkmann, "STFT phase reconstruction in voiced speech for an improved single-channel speech enhancement," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1931–1940, Dec. 2014.
[11] P. Mowlaee and J. Kulmer, "Harmonic phase estimation in single-channel speech enhancement using phase decomposition and SNR information," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 9, pp. 1521–1532, Sep. 2015.
[12] Gerkmann, T., Krawczyk, M., Feb. 2013. MMSE-optimal spectral amplitude estimation given the STFT-phase. IEEE Signal Process. Lett. 20 (2), 129–132.
[13] Erkelens, J.S., Hendriks, R.C., Heusdens, R., Jensen, J., Aug. 2007. Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors. IEEE Trans. Audio, Speech, Lang. Process. 15 (6), 1741–175
[14] T. Gerkmann, "Bayesian estimation of clean speech spectral coefficients given a priori knowledge of the phase," IEEE Trans. Signal Process., vol. 62, no. 16, pp. 4199–4208, Aug. 2014.
[15] Ephraim, Y., Malah, D., December 1984. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust., Speech, Signal Process. 32 (6), 1109–1121.
[16] Chen, B., Loizou, P.C., 2005. Speech enhancement using a MMSE short time spectral amplitude estimator with Laplacian speech modelling. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Philadelphia, Pennsylvania, USA, pp. 1097–1100.
[17] Hendriks, R.C., Heusdens, R., 2010. On linear versus non-linear magnitude DFT estimators and the influence of super-Gaussian speech priors. In: Proceedings of IEEE International Conference Acoustics.
[18] Lotter, T., Vary, P., Jan. 2005. Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Appl. Signal Process. 2005 (7), 1110–1126.
[19] Martin, R., 2002. Speech enhancement using MMSE short time spectral estimation with gamma distributed priors. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Orlando, Florida, pp. 504–512.
[20] Martin, R., 2005. Speech enhancement based on minimum mean square error estimation and supergaussian priors. IEEE Trans. Speech Audio Process. 13 (5), 845–856.
[21] Y. Lu and P. C. Loizou, "A geometric approach to spectral subtraction," Speech Commun., vol. 50, no. 6, pp. 453–466, 2008.
[22] Y. Zhang and Y. Zhao, "Real and imaginary modulation spectral subtraction for speech enhancement," Speech Communication, vol. 55, pp. 509–522, 2013.
[23] "On MMSE-Based Estimation of Amplitude and Complex Speech Spectral Coefficients Under Phase Uncertainty," IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24, no. 12, Dec. 2016.
[24] Sundarrajan Rangachari, Philipos C. Loizou. A noise-estimation algorithm for highly non-stationary environments. Speech Commun. 2006, vol. 48, issue 2, pp. 220–231.
[25] Kisoo Kwon, Jong Won Shin, Nam Soo Kim. NMF-based speech enhancement using bases update. IEEE Sig. Process. Lett. 2015; 22(4): 450–4.
[26] Cédric Févotte, Nancy Bertin, Jean-Louis Durrieu. Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis. Neural Comput. 2009; 21(3): 793–830.
[27] Gradshteyn, I.S., Ryzhik, I.M., 2007. Table of Integrals, Series and Products, 7th ed. Academic, San Diego, CA, USA.
[28] Evans, M., Hastings, N., Peacock, B., 2000. von Mises distribution. Statistical Distributions, 4th ed. John Wiley & Sons, New York, pp. 191–192.
[29] S. Gonzalez and M. Brookes, "PEFAC - a pitch estimation algorithm robust to high levels of noise," IEEE Trans. Audio, Speech, Language Process., vol. 22, no. 2, pp. 518–530, Feb. 2014.
[30] Loizou, P.C., 2007. Speech Enhancement - Theory and Practice. CRC Press/Taylor & Francis Group, Boca Raton, FL, USA.
[31] Taal, C.H., Hendriks, R.C., Heusdens, R., Jensen, J., Sep. 2011. An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio, Speech, Lang. Process. 19 (7), 2125–2136.
[32] Gerkmann, T., Martin, R., Nov. 2009. On the statistics of spectral amplitudes after variance reduction by temporal cepstrum smoothing and cepstral nulling. IEEE Trans. Signal Process. Vol. 52, Issue 11, 4165–4174.
[33] ITU-T P.835, 2003. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T Recommendation P.835.
Authors' Profiles
Ravi Kumar Kandagatla was born in Markapur, India, in 1988. He received the Bachelor of Technology degree from Jawaharlal Nehru Technological University, Kakinada, in 2009 and the Master of Technology degree in Digital Electronics and Communication Systems from Jawaharlal Nehru Technological University, Kakinada, in 2011. He is presently working as an Assistant Professor at Lakireddy Balireddy College of Engineering, Mylavaram, India. He has 6 years of teaching experience and 3 international publications. His research interest is speech processing.
Dr. P. V. Subbaiah graduated in ECE from Bangalore University and received his Master's degree from Andhra University, Visakhapatnam, in 1982. JNTU, Hyderabad, conferred the Ph.D. degree on him in 1996 for his work on Microwave Antenna Test Facilities. He has 33 years of teaching experience in different reputed institutions as Assistant Professor, Associate Professor, Professor, Head of the Department and Principal. He is presently Professor of ECE at V.R. Siddhartha Engineering College, Vijayawada, and has served as the Coordinator of the World Bank funded TEQIP Project since 2014. His areas of interest include Microwave Antennas, Smart Antennas and Communications. He has published more than 100 research papers in national and international journals and conferences of repute. Ten research scholars have received their Ph.D. degrees under his supervision, and he is presently guiding three more scholars toward their Ph.D. He is a Member and Fellow of various professional societies, namely ISTE, BMESI, IETE and IE (I). He received the Sir Thomas Ward Gold Prize from The Institution of Engineers (India).