Non-stationary sources separation based on maximum likelihood criterion using source temporal–spatial model

ARTICLE IN PRESS
Neurocomputing 000 (2017) 1–9
Contents lists available at ScienceDirect
Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Jiong Li a,∗, Hang Zhang a, Menglan Fan a, Jiang Zhang b

a College of Communications and Engineering, PLA University of Science and Technology, 1 Haifuxiang, Nanjing, China
b Nanjing Institute of Telecommunication Engineering, 18 Houbiaoying, Nanjing, China

Article info
Article history: Received 27 October 2016; Revised 15 May 2017; Accepted 30 August 2017; Available online xxx. Communicated by Zhi Yong Liu.
MSC: 00-01; 99-00
Keywords: Blind source separation; Maximum likelihood; Expectation maximization method; Non-stationary signal processing; Autoregressive model; Hidden Markov model

Abstract
In this paper, we propose a new non-stationary source separation algorithm, referred to as autoregressive hidden Markov Gaussian process (AR-HMGP), for sources that are non-stationary and temporally correlated. The algorithm employs a generative model to track the non-stationarity of the sources: the temporal dependencies of the sources are represented by an autoregressive (AR) model, and the distribution of the associated innovation process is described by a non-stationary Gaussian process with a hidden Markov model (HMM). We further explore the maximum likelihood (ML) method to estimate the parameters of the source model using the expectation maximization (EM) algorithm. Our main findings are: (a) AR-HMGP outperforms the other three tested BSS algorithms for non-stationary source separation, and its effectiveness is also corroborated on a real instantaneous mixture system; (b) both independent and dependent non-stationary sources are successfully separated; (c) the proposed algorithm is robust with respect to noise, while the other three algorithms are not.

1. Introduction

Source signal mixing has attracted considerable attention in various applications, such as communication systems, speech systems and biomedical signal processing. In particular, a target signal can easily be extracted by a filter if there is no spectral overlap; when the spectra overlap, however, the filtering approach fails. In this case, blind source separation (BSS) is a good way to solve the problem. BSS is a technique that attempts to extract a set of unknown sources mixed by an unknown mixing matrix, given only a set of their mixture signals [1]. Over the past several decades, BSS has been widely researched. It has been used to remove interference from communication signals [2], separate speech signals from different speakers [3,4], extract electroencephalography (EEG) signals [5,6], and so on. Meanwhile, many BSS algorithms have been developed. There are two common types of BSS algorithms: one based on independent component analysis (ICA) [1,3,6] and the other based on decorrelation [7,8]. Decorrelation-based BSS algorithms use covariance matrices of



Corresponding author. E-mail address: [email protected] (J. Li).

© 2017 Published by Elsevier B.V.

the estimated sources at two or more lags and combine the matrix characteristics to identify the mixing system and then recover the sources. These methods can separate mixtures of dependent sources and mixtures of multiple Gaussian sources, provided that the sources have sufficient spectral diversity; ICA-based algorithms are invalid in that case. ICA-based BSS algorithms, in contrast, minimize a cost function built on the dependence of the estimated sources. They assume that the sources are mutually independent and that at most one Gaussian source exists. Under these conditions, ICA-based algorithms can separate the mixtures regardless of the sources' spectra, even when all sources have identical spectra, which is significant for communication systems: normal communication can be maintained without consuming additional power and frequency resources to eliminate interference or jamming. Decorrelation-based algorithms, however, are invalid when the sources have identical spectra. Therefore, one must select different algorithms for different application environments, which brings much inconvenience to the application of BSS. The main reason is that these two families of methods analyse the temporal structure and the spatial structure of the sources separately, so the joint temporal–spatial structure information is not fully utilized. The algorithm proposed in this paper addresses both of the aforementioned cases, in which both temporal and

http://dx.doi.org/10.1016/j.neucom.2017.08.034 0925-2312/© 2017 Published by Elsevier B.V.

Please cite this article as: J. Li et al., Non-stationary sources separation based on maximum likelihood criterion using source temporal– spatial model, Neurocomputing (2017), http://dx.doi.org/10.1016/j.neucom.2017.08.034


spatial structure information are taken into consideration. Recently, other methods have also been proposed for BSS, such as bounded component analysis (BCA) [9–11] and non-negative matrix factorization (NMF) [12–14]. However, since these methods rely on geometric properties of the received signals, they are very sensitive to noise; from the standpoint of practical application of blind separation, we do not discuss them further. The temporal structure of each source in BSS is often described by an autoregressive (AR) model or a hidden Markov model (HMM). Both ICA-based methods [15,16] and decorrelation-based methods [15,17] can separate AR sources. It is worth noting that the AR model alone can only represent stationary signals, whereas most signals in practice are non-stationary. Non-stationary signals are often described by HMMs, and BSS algorithms for HMM sources can separate non-stationary sources [18–20]. However, those methods are ICA-based and cannot separate dependent sources. Some researchers have also studied blind separation of moving-average (MA) sources [21,22], but the MA model is not prevalent in BSS because it requires estimating a backward finite impulse response (FIR) filter, which has high computational complexity compared with the AR model. The statistical distribution models prevalent in BSS include the Gaussian mixture model (GMM) [2,23], the generalized Gaussian distribution (GGD) [24] and the generalized hyperbolic model [25]; however, these models represent stationary signals. The authors of [26] combined an HMM with a GMM to capture the non-stationary and statistical characteristics of non-stationary signals respectively, obtaining good separation results, but the complexity of that algorithm grows exponentially with a linear increase in model order.

Considering the source model, we would like it to represent different signals well while keeping the number of parameters as small as possible. Since the AR model represents the temporal structure of signals with linear complexity, we select it as the frame of our source model. The innovation of the AR model is described by a non-stationary Gaussian process, where an HMM represents the non-stationarity. The effectiveness of this approach for tracking non-stationarity is shown in Section 4: a small AR order and a small number of innovation states already give strong performance. Estimating the model parameters is an incomplete-data problem with latent variables, typically solved under the maximum likelihood (ML) criterion. Two common ML algorithms are the Markov chain Monte Carlo (MCMC) algorithm [27,28] and the expectation maximization (EM) algorithm [23,29,30]. Since the EM algorithm is simple and converges faster than MCMC [31], it is employed in this paper to realize BSS. The main contributions of our work are summarized as follows.
• Both temporal and spatial structure information of the sources are considered.
• The problem of non-stationary source separation is solved for dependent sources and for sources with similar spectra.
• Exploiting the temporal–spatial structure of non-stationary source signals in the ML criterion leads to better performance.
The remainder of this paper is organized as follows. Section 2 introduces the three models used in this paper: the system model, the source distribution model and the observation distribution model. The proposed algorithm is described in Section 3. Section 4 demonstrates our experimental results, and conclusions are drawn in Section 5.

Fig. 1. The structure of sources (ellipse stands for model parameters whereas circle stands for variable).

2. Models

2.1. System model

The instantaneous mixture model is given by

X(t) = A S(t)    (1)

where S(t) = [s_1(t), s_2(t), …, s_N(t)]^T denotes the vector of N sources, A ∈ R^{M×N} denotes the mixing matrix, X(t) = [x_1(t), x_2(t), …, x_M(t)]^T denotes the vector of M observations, and [·]^T denotes the transpose operator. BSS aims at constructing a demixing matrix W to obtain Ŝ(t) = W X(t), an estimate of S(t), possibly with different order and scale. To simplify the analysis, we only discuss the case N = M; a dimension-reduction method [32] can be used to transform an over-determined mixture (M > N) into a determined one (M = N).

2.2. Sources distribution model

In this paper, a temporal–spatial structure model is employed to represent the sources: the temporal dependencies of each source are described by an AR model, and the distribution of the associated innovation process is described by a non-stationary Gaussian process (HMGP), where an HMM represents the non-stationarity. Assuming there are N sources, their vector AR model is given by Boudjellal et al. [17]

S(t) = Σ_{l=1}^{L} D_l S(t − l) + V(t)    (2)
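As a concrete illustration of the three-layer generative model (Markov states, state-conditional Gaussian innovations, vector AR filtering) together with the instantaneous mixture of Eq. (1), the following is a minimal simulation sketch. All parameter values here are hypothetical examples of ours, not values used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (hypothetical example values)
N, T, L, K = 2, 1000, 1, 2            # sources, samples, AR order, HMM states

# Source-model parameters (hypothetical)
pi = np.array([0.5, 0.5])             # initial state probabilities pi_k
P = np.array([[0.9, 0.1],             # transition probabilities p_ik
              [0.2, 0.8]])
Sigma = [0.1 * np.eye(N), 2.0 * np.eye(N)]   # per-state innovation covariances
D = [np.diag([0.5, -0.5])]            # AR coefficient matrices D_l
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])            # mixing matrix

# Layer 1: first-order Markov state sequence z(1..T)
z = np.empty(T, dtype=int)
z[0] = rng.choice(K, p=pi)
for t in range(1, T):
    z[t] = rng.choice(K, p=P[z[t - 1]])

# Layer 2: state-conditional Gaussian innovations V(t), zero mean
V = np.stack([rng.multivariate_normal(np.zeros(N), Sigma[z[t]])
              for t in range(T)], axis=1)

# Layer 3: sources via the vector AR model S(t) = sum_l D_l S(t-l) + V(t), Eq. (2)
S = np.zeros((N, T))
for t in range(T):
    S[:, t] = sum(D[l] @ S[:, t - l - 1] for l in range(min(L, t))) + V[:, t]

X = A @ S                             # instantaneous mixture, Eq. (1)
```

With the true A known, W = A^{-1} recovers the sources exactly; BSS must achieve this without knowing A.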

where D_l = diag[d_{1l}, d_{2l}, …, d_{Nl}] denotes the coefficient matrix of the vector AR model, L denotes the order of the vector AR model, and V(t) = [v_1(t), v_2(t), …, v_N(t)]^T denotes the vector of the innovation process. The model is envisioned as a generative model with three layers, as depicted in Fig. 1. First, a state sequence (z(1), z(2), …, z(T)) is generated by a Markov process, where z(t) ∈ {1, 2, …, K}. Then, the innovation process is generated by the Gaussian distribution associated with the state sequence. Finally, the sources are generated according to the vector AR model. Two properties of the generative model need to be clarified:

(1) The state process is a first-order Markov process, i.e. the state at time t, z(t), only depends on the state at time t − 1, z(t − 1):

Pr(z(t) | z(t − 1), z(t − 2), …, z(1)) = Pr(z(t) | z(t − 1)).    (3)

(2) The value of the innovation process at time t, V(t), depends on no variable other than the state at time t:

f_{V|z}(V(t) | z(t), Ξ) = f_{V|z}(V(t) | z(t)),    (4)

where Ξ represents the set of variables other than z(t). The initial state probability of the Markov process is π_k = Pr(z(1) = k), and the transition probability from state i to state k is p_{ik} = Pr(z(t) = k | z(t − 1) = i), i, k = 1, 2, …, K. According to Eq. (4), the


conditional probability density function (CPDF) of the innovation process vector V(1:T) with respect to (w.r.t.) the state sequence z(1:T) is given by

f_{V|z}(V(1:T) | z(1:T)) = Π_{t=1}^{T} f_{V|z}(V(t) | z(t))    (5)

The CPDF of V(t) w.r.t. z(t) = k is

f_{V|z}(V(t) | z(t) = k) = N(V(t); μ_k, Σ_k)    (6)

where N(·; ·, ·) denotes the Gaussian density function, and

μ_k = [μ_{k1}, μ_{k2}, …, μ_{kN}]^T,
Σ_k = [σ²_{k11} σ²_{k12} … σ²_{k1N}; σ²_{k21} σ²_{k22} … σ²_{k2N}; …; σ²_{kN1} σ²_{kN2} … σ²_{kNN}],  with σ²_{kij} = σ²_{kji}, i, j = 1, 2, …, N,

are the mean vector and covariance matrix of V(t) in the kth state, respectively. In this paper we assume μ_k = 0. For a joint Gaussian distribution, if the covariances σ²_{kij} = 0 (i ≠ j), the sources are mutually independent; otherwise they are dependent. Therefore, the proposed source model is able to represent dependent sources. At the initial time S(1) = V(1), so the probability density function (PDF) of S(1) is

f_S(S(1)) = Σ_{k=1}^{K} Pr(z(1) = k) f_{S|z}(S(1) | z(1) = k) = Σ_{k=1}^{K} π_k N(S(1); 0, Σ_k)    (7)

The joint probability density function (JPDF) of the sources at time t is given by

f_S(S(t)) = Σ_{k=1}^{K} Pr(z(t) = k) N(S(t); Σ_{l=1}^{L} D_l S(t − l), Σ_k)    (8)

2.3. Observed signals distribution model

Substituting Eq. (2) into Eq. (1), the vector AR model of the observations is obtained:

X(t) = Σ_{l=1}^{L} A D_l S(t − l) + A V(t) = Σ_{l=1}^{L} A D_l A^{−1} X(t − l) + A V(t) = Σ_{l=1}^{L} M_l X(t − l) + A V(t)    (9)

where M_l = A D_l A^{−1}. At the initial time, X(1) = A V(1), so

f_X(X(1)) = Σ_{k=1}^{K} π_k N(X(1); 0, R_k)    (10)

where R_k = A Σ_k A^T. The joint likelihood function (JLF) of the observation X(t) under the condition z(t − 1) = i is given by

f_{X|z}(X(t) | z(t − 1) = i) = Σ_{k=1}^{K} Pr(z(t) = k | z(t − 1) = i) f_{X|z}(X(t) | z(t) = k) = Σ_{k=1}^{K} p_{ik} N(X(t); Σ_{l=1}^{L} M_l X(t − l), R_k)    (11)

The JLF of the observations is formulated as

f_X(X(1:T); θ) = Σ_{k_1,k_2,…,k_T=1}^{K} Pr(z(1:T) = (k_1, k_2, …, k_T)) · f_{X|z}(X(1:T) | z(1:T) = (k_1, k_2, …, k_T); θ)
= [Σ_{k=1}^{K} Pr(z(1) = k) f_{X|z}(X(1) | z(1) = k; θ)] · Π_{t=2}^{T} Σ_{k_1,k_2=1}^{K} Pr(z(t) = k_1 | z(t − 1) = k_2) f_{X|z}(X(t) | z(t) = k_1; θ)    (12)

where θ = ({Σ_k}_{k=1}^{K}, {D_l}_{l=1}^{L}, A, π, P) ∈ Θ denotes the unknown parameter set and Θ denotes the unknown parameter space. Fig. 2 shows the probabilistic generative model of the observations.

Fig. 2. Graphical models of observations. (a) At the initial time step t = 1. (b) Transition from time step t = 1 to T. The solid square represents one time slice.

3. Proposed algorithm

3.1. Construction of the auxiliary function based on the ML criterion

The ML estimate of the unknown parameter set θ is given by

θ̂ = arg max_{θ ∈ Θ} log f_X(X(1:T); θ)    (13)

According to Eqs. (11) and (12), the logarithm of the JLF is given by

log f_X(X(1:T); θ) = log Σ_{k=1}^{K} π_k N(X(1); 0, R_k) + Σ_{t=2}^{T} log Σ_{k_1,k_2=1}^{K} p_{k_2 k_1} N(X(t); Σ_{l=1}^{L} M_l X(t − l), R_{k_1})    (14)

The complete data in the EM algorithm [30] is chosen to be F = (X(1:T), y(1:T)), where y(1:T) is a discrete hidden state index sequence, y(t) = [y_1(t), y_2(t), …, y_K(t)]^T. The PDF of y(t) is given by

f_y(y(t)) = Σ_{k=1}^{K} Pr(z(t) = k) δ(y_k(t) − 1)    (15)


where δ(·) denotes the Dirac δ function, and

y_k(t) = 1 if z(t) = k, 0 otherwise.    (16)

The joint log-likelihood of the observation sequence X(1:T) and the discrete hidden state index sequence y(1:T) is given by

log f_{X,y}(X(1:T), y(1:T); θ) = Σ_{k=1}^{K} y_k(1) log(π_k N(X(1); 0, R_k)) + Σ_{t=2}^{T} Σ_{k_1,k_2=1}^{K} y_{k_1}(t) y_{k_2}(t − 1) log(p_{k_2 k_1} N(X(t); Σ_{l=1}^{L} M_l X(t − l), R_{k_1}))    (17)
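The basic building block of the likelihoods above is the state-conditional observation density N(X(t); Σ_l M_l X(t−l), R_k) from Eqs. (11) and (14). A minimal sketch of its evaluation follows; the helper name `phi` and the example values are ours, not the paper's:

```python
import numpy as np

def phi(X, t, M, R_k):
    """Evaluate N(X(t); sum_l M_l X(t-l), R_k), the state-conditional
    observation density of Eq. (11). X has shape (N, T); M is a list of
    the matrices M_l = A D_l A^{-1}; R_k = A Sigma_k A^T."""
    N = X.shape[0]
    # Conditional mean: sum over available lags (zero for t before lag 1)
    mean = sum(M_l @ X[:, t - l - 1] for l, M_l in enumerate(M) if t - l - 1 >= 0)
    r = X[:, t] - mean
    _, logdet = np.linalg.slogdet(R_k)       # stable log-determinant
    return np.exp(-0.5 * (r @ np.linalg.solve(R_k, r) + logdet + N * np.log(2 * np.pi)))

# Hypothetical example: one AR lag, identity innovation covariance
X = np.array([[1.0, 0.8, 0.5],
              [0.2, 0.1, -0.3]])
M = [0.5 * np.eye(2)]
R = np.eye(2)
p = phi(X, 2, M, R)   # a positive scalar density value
```

Working in the log domain (returning the exponent directly) is preferable for long sequences, since the product over t in Eq. (12) underflows quickly.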

The auxiliary function based on the ML criterion is obtained by taking the expectation of Eq. (17) w.r.t. the conditional distribution of the hidden state indices, f(y(1:T) | X(1:T)). The auxiliary function is given by

L = E_{y|X}[log f_{X,y}(X(1:T), y(1:T); θ)] = F_1 + F_2 + F_3    (18)

where

F_1 = Σ_{k=1}^{K} γ_k(1) log π_k    (19)

F_2 = Σ_{t=1}^{T−1} Σ_{k_1,k_2=1}^{K} ξ_{k_1 k_2}(t) log p_{k_1 k_2}    (20)

F_3 = Σ_{t=1}^{T} Σ_{k=1}^{K} γ_k(t) log ϕ_k(t)    (21)

and γ_k(t) = Pr(z(t) = k | X, θ), ξ_{k_1 k_2}(t) = Pr(z(t) = k_1, z(t + 1) = k_2 | X, θ), ϕ_k(t) = f_X(X(t) | z(t) = k, θ). From Eqs. (19)–(21) it can be concluded that the initial state probability π of the HMM can be estimated by optimizing F_1, the state transition probability P of the HMM by optimizing F_2, and the parameter set θ̄ = {{Σ_k}_{k=1}^{K}, {D_l}_{l=1}^{L}, A} by optimizing F_3.

3.2. EM algorithm

The EM method maximizes the auxiliary function in an iterative manner that alternates between an E-step and an M-step.

3.2.1. E-step
The E-step calculates the posterior probabilities γ_k(t) and ξ_{ij}(t). It is difficult to calculate them directly, because that requires traversing all possible state sequences. Fortunately, the well-known and relatively efficient Baum–Welch procedure [33] can be extended to this case. Recall the forward procedure and define

α_k(t) = f(X(1:t), z(t) = k | θ)    (22)

which is the probability of seeing the partial observation sequence X(1), X(2), …, X(t) and ending up in state k at time instance t. It implies that α_k(t) satisfies

α_k(1) = π_k ϕ_k(1)
α_k(t + 1) = [Σ_{j=1}^{K} α_j(t) p_{jk}] ϕ_k(t + 1)
f_X(X(1:T) | θ) = Σ_{k=1}^{K} α_k(T)    (23)

The backward procedure is similar. Define

β_k(t) = f(X(t + 1:T) | z(t) = k, θ)    (24)

which is the probability of the partial observation sequence X(t + 1), X(t + 2), …, X(T) given that the chain is in state k at time instance t. β_k(t) satisfies

β_k(T) = 1
β_k(t) = Σ_{j=1}^{K} p_{kj} ϕ_j(t + 1) β_j(t + 1)
f_X(X(1:T) | θ) = Σ_{k=1}^{K} π_k ϕ_k(1) β_k(1)    (25)

Then γ_k(t) and ξ_{ij}(t) are easily derived as

γ_k(t) = α_k(t) β_k(t) / Σ_{j=1}^{K} α_j(t) β_j(t)    (26)

ξ_{ij}(t) = α_i(t) p_{ij} β_j(t + 1) ϕ_j(t + 1) / Σ_{i′=1}^{K} Σ_{j′=1}^{K} α_{i′}(t) p_{i′j′} β_{j′}(t + 1) ϕ_{j′}(t + 1)    (27)

3.2.2. M-step
The M-step maximizes the auxiliary function w.r.t. θ, which is equivalent to maximizing F_1, F_2 and F_3 w.r.t. π, P and θ̄ respectively. First, the optimization of π can be written as

π̂ = arg max_π Σ_{k=1}^{K} γ_k(1) log π_k,   s.t. Σ_{k=1}^{K} π_k = 1    (28)

One can obtain the updating rule for π_k by introducing a Lagrange multiplier λ and setting the derivative equal to zero:

∂/∂π_k [ Σ_{k=1}^{K} γ_k(1) log π_k + λ (Σ_{k=1}^{K} π_k − 1) ] = 0    (29)

This gives π_k ∝ γ_k(1); combined with the constraint Σ_{k=1}^{K} π_k = 1 (and noting that Σ_{k=1}^{K} γ_k(1) = 1), the update equation for π_k is

π_k = γ_k(1)    (30)

The optimization of P can be written as

P̂ = arg max_P F_2 = arg max_P Σ_{t=1}^{T−1} Σ_{i,j=1}^{K} ξ_{ij}(t) log p_{ij},   s.t. Σ_{j=1}^{K} p_{ij} = 1    (31)

In a similar way, using a Lagrange multiplier with the constraint Σ_{j=1}^{K} p_{ij} = 1, one obtains the updating rule for p_{ij}:

p_{ij} = Σ_{t=1}^{T−1} ξ_{ij}(t) / Σ_{j=1}^{K} Σ_{t=1}^{T−1} ξ_{ij}(t) = Σ_{t=1}^{T−1} ξ_{ij}(t) / Σ_{t=1}^{T−1} γ_i(t)    (32)
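As a concrete illustration of Eqs. (22)–(32), the following sketch implements the forward-backward recursions and the closed-form π and P updates. The per-step rescaling is our addition for numerical stability and is not part of the paper's derivation; all function and variable names are ours:

```python
import numpy as np

def e_step(phi, pi, P):
    """Scaled forward-backward. phi[t, k] = f(X(t) | z(t) = k).
    Returns gamma (Eq. (26)) and xi (Eq. (27))."""
    T, K = phi.shape
    alpha = np.zeros((T, K)); beta = np.zeros((T, K)); c = np.zeros(T)
    alpha[0] = pi * phi[0]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                     # forward recursion, Eq. (23)
        alpha[t] = (alpha[t - 1] @ P) * phi[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):            # backward recursion, Eq. (25)
        beta[t] = (P @ (phi[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, K, K))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * P * (phi[t + 1] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()
    return gamma, xi

def m_step_hmm(gamma, xi):
    """Closed-form HMM updates: pi_k = gamma_k(1) (Eq. (30)) and
    p_ij = sum_t xi_ij(t) / sum_t gamma_i(t) (Eq. (32))."""
    pi_new = gamma[0]
    P_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    return pi_new, P_new

# Tiny hypothetical example: 3 time steps, 2 states
phi_vals = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
pi0 = np.array([0.6, 0.4])
P0 = np.array([[0.9, 0.1], [0.2, 0.8]])
gamma, xi = e_step(phi_vals, pi0, P0)
pi_new, P_new = m_step_hmm(gamma, xi)
```

By construction, each row of gamma and each updated row of P_new sums to one, matching the probability constraints of Eqs. (28) and (31).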



As F_3 cannot be maximized in closed form, a gradient-based optimization algorithm is employed to obtain a suboptimal solution for θ̄ = {{Σ_k}_{k=1}^{K}, {D_l}_{l=1}^{L}, A}. Combining with Eq. (11), the expansion of Eq. (21) is given by

F_3 = Σ_{t=1}^{T} Σ_{k=1}^{K} −(γ_k(t)/2) [ (X(t) − Σ_{l=1}^{L} A D_l A^{−1} X(t − l))^T (A Σ_k A^T)^{−1} (X(t) − Σ_{l=1}^{L} A D_l A^{−1} X(t − l)) + log|A Σ_k A^T| ] + u
= Σ_{t=1}^{T} Σ_{k=1}^{K} −(1/2) γ_k(t) B_k(t) + u    (33)

where u is a constant independent of θ̄, and

B_k(t) = X^T(t) W^T C_k W X(t) − Σ_{l=1}^{L} X^T(t) W^T C_k D_l W X(t − l) − Σ_{l=1}^{L} X^T(t − l) W^T D_l C_k W X(t) + Σ_{l_1=1}^{L} Σ_{l_2=1}^{L} X^T(t − l_1) W^T D_{l_1} C_k D_{l_2} W X(t − l_2) − 2 log|W| − log|C_k|    (34)

where W = A^{−1} denotes the demixing matrix and C_k = Σ_k^{−1}. The updating rules for θ̄ based on gradient optimization are formulated as follows.

Update W:

W ← W + η ∂F_3/∂W    (35)

where η denotes the convergence factor,

∂F_3/∂W = −Σ_{t=1}^{T} Σ_{k=1}^{K} (1/2) γ_k(t) ∂B_k(t)/∂W    (36)

∂B_k(t)/∂W = 2 C_k W X(t) X^T(t) − 2 Σ_{l=1}^{L} C_k D_l W [X(t) X^T(t − l) + X(t − l) X^T(t)] + Σ_{l_1=1}^{L} Σ_{l_2=1}^{L} D_{l_1} C_k D_{l_2} W [X(t − l_1) X^T(t − l_2) + X(t − l_2) X^T(t − l_1)] − 2 W^{−T}    (37)

Update C_k:

C_k ← C_k + η ∂F_3/∂C_k    (38)

∂F_3/∂C_k = −Σ_{t=1}^{T} (1/2) γ_k(t) ∂B_k(t)/∂C_k    (39)

∂B_k(t)/∂C_k = W X(t) X^T(t) W^T + Σ_{l_1=1}^{L} Σ_{l_2=1}^{L} D_{l_1} W X(t − l_1) X^T(t − l_2) W^T D_{l_2} − C_k^{−1} − Σ_{l=1}^{L} W X(t) X^T(t − l) W^T D_l − Σ_{l=1}^{L} D_l W X(t − l) X^T(t) W^T    (40)

Update D_l:

D_l ← D_l + η ∂F_3/∂D_l    (41)

∂F_3/∂D_l = −Σ_{t=1}^{T} Σ_{k=1}^{K} (1/2) γ_k(t) ∂B_k(t)/∂D_l    (42)

∂B_k(t)/∂D_l = −C_k W X(t) X^T(t − l) W^T − C_k W X(t − l) X^T(t) W^T + Σ_{l′=1}^{L} C_k D_{l′} W X(t − l′) X^T(t − l) W^T + Σ_{l′=1}^{L} W X(t − l) X^T(t − l′) W^T D_{l′} C_k    (43)

In summary, Fig. 3 shows the flow chart of the proposed AR-HMGP algorithm.

Fig. 3. Flow chart of the proposed AR-HMGP algorithm.

3.3. Parameters initialization

Although the EM algorithm converges fast, it may converge to a local optimum of the auxiliary function instead of the global one. Hence, the initialization of the parameters is essential for finding a good solution and for minimizing the number of EM iterations needed to attain a local optimum. In this subsection, we introduce an initialization method. The demixing matrix is initialized using principal component analysis (PCA), so that the initial source estimates are mutually uncorrelated. The well-known Yule–Walker equations can be used to estimate the coefficient matrices:

D_{YW} = [Σ_{t=1}^{T} X(t) X_L^T(t)] [Σ_{t=1}^{T} X_L(t) X_L^T(t)]^{−1}    (44)

where X_L(t) = [x_1(t − 1), …, x_1(t − L), …, x_N(t − 1), …, x_N(t − L)]^T and diag(D_l) = [D_{YW}(n, (n − 1)L + l)]_{n=1}^{N}. The covariance matrix of the innovation process is initialized to Σ_k = E[X(1:T) X^T(1:T)]. The initial state probability and state transition probability are initialized to π_k = 1/K and P = (1/K) 1, respectively, where 1 is a K × K all-ones matrix.
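The two initialization steps (PCA whitening of the demixing matrix, and a least-squares fit of the AR coefficients in the spirit of the Yule–Walker estimate of Eq. (44)) can be sketched as follows. This is our illustrative implementation under those assumptions, not the authors' code:

```python
import numpy as np

def init_demixing_pca(X):
    """Initialize W by PCA whitening so that W X has identity covariance,
    i.e. the initial source estimates are mutually uncorrelated."""
    C = np.cov(X)
    eigval, eigvec = np.linalg.eigh(C)
    return np.diag(eigval ** -0.5) @ eigvec.T

def init_ar_coeffs(s, L):
    """Least-squares AR(L) fit for one estimated source s(t):
    regress s(t) on s(t-1), ..., s(t-L)."""
    T = len(s)
    # Column l holds s(t - l - 1) for t = L .. T-1
    Phi = np.column_stack([s[L - l - 1:T - l - 1] for l in range(L)])
    d, *_ = np.linalg.lstsq(Phi, s[L:], rcond=None)
    return d
```

Applying `init_ar_coeffs` to each whitened source channel gives the diagonal entries of the initial D_l matrices.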


The recommended settings for the parameters L and K depend on several considerations. If K = 1 (so the innovation process is stationary Gaussian), the source model can only represent stationary sources, and AR-HMGP reduces to second-order-statistics BSS [4]; in this case, only spectral diversity is used and the true distribution of the innovations is ignored. If L = 0 (so the source model has only spatial structure), only spatial structure information is used to separate the sources [26]. Hence, K = 1 and L = 0 should not be used simultaneously. Both temporal and spatial structure information are used when K ≥ 2 and L ≥ 1. Although more parameters describe a source more exactly, ML-based BSS algorithms often work well even with only approximately correct source descriptions [34,35]; hence, very large L and K are unnecessary. Based on our experiments, the suggested values are 1 ≤ L ≤ 4 and 2 ≤ K ≤ 4.

4. Simulations

Two performance indexes are employed to evaluate the proposed algorithm: the signal-to-interference ratio (SIR) and the similarity coefficient ρ. The SIR, defined for the case of no permutations, is given by

SIR = (1/N) Σ_{n=1}^{N} 10 log_{10} SIR_n    (45)

where SIR_n = Σ_{t=1}^{T} (W_{n,1:N} A_{1:N,n} s_n(t))² / Σ_{t=1}^{T} Σ_{n′=1, n′≠n}^{N} (W_{n,1:N} A_{1:N,n′} s_{n′}(t))². Clearly, the larger the SIR, the better the separation. The similarity coefficient ρ is given by

ρ_i = |⟨ŝ_i(t), s_i(t)⟩| / sqrt(Σ_{t=1}^{T} ŝ_i(t)² · Σ_{t=1}^{T} s_i(t)²)    (46)

where ⟨·, ·⟩ denotes the inner product operation and ŝ_i(t) denotes the estimate of s_i(t). A total of four algorithms are compared: the proposed AR-HMGP, joint approximate diagonalization of eigenmatrices with temporal decorrelation (JADE_TD) [36], EASI [37] and FastICA [38]. JADE_TD is a decorrelation-based algorithm; EASI and FastICA are ICA-based algorithms.

4.1. Noiseless

The first experiment tests the separation of two audio signals, where the two sources are selected randomly from eight speech signals (sampled at 8 kHz) of length 1000. Fig. 4 shows the separation performance as a function of the number of Markov model states K, with the order of the vector AR model fixed at 3. Already at K = 2, AR-HMGP achieves a surplus of around 10–34 dB over the other three algorithms. As expected, the performance improves as K increases; however, for K > 4 there is little further improvement.

Fig. 4. Separation performance as a function of K.

The sources in the second experiment are generated in the same way as in the first. Fig. 5 shows the separation performance as a function of the order L of the vector AR model, with the number of Markov states fixed at 4. The simulation results demonstrate that the proposed algorithm is robust w.r.t. L. When L = 1, AR-HMGP already outperforms the other three algorithms. As expected, the performance improves as L increases; however, for L > 4 there is little further improvement.
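Both performance indexes can be computed directly from their definitions; the following sketch evaluates Eq. (45) under the no-permutation assumption and Eq. (46). The function names are ours:

```python
import numpy as np

def sir_db(W, A, S):
    """SIR of Eq. (45), assuming no permutation between sources and outputs.
    W: demixing matrix, A: mixing matrix, S: true sources (N, T)."""
    N = S.shape[0]
    G = W @ A                                    # global system matrix
    vals = []
    for n in range(N):
        sig = np.sum((G[n, n] * S[n]) ** 2)      # target component in output n
        interf = sum(np.sum((G[n, m] * S[m]) ** 2) for m in range(N) if m != n)
        vals.append(10 * np.log10(sig / interf))
    return np.mean(vals)

def similarity(s_hat, s):
    """Similarity coefficient rho of Eq. (46)."""
    return abs(s_hat @ s) / np.sqrt((s_hat @ s_hat) * (s @ s))
```

By the Cauchy–Schwarz inequality, ρ lies in [0, 1], reaching 1 exactly when the recovered signal is a scaled copy of the source.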

Fig. 5. Separation performance as a function of L.

4.2. Noisy data

Since sources inevitably suffer noise corruption in practical applications, the sensitivity of the separation performance to additive white Gaussian noise (AWGN) is studied in this subsection. The sensitivity is measured using the signal-to-noise ratio (SNR), given by

SNR = (1/N) Σ_{n=1}^{N} 10 log_{10} [var(x_n(t)) / var(ω_n(t))]    (47)

where ω_n(t) denotes the AWGN added to the observation x_n(t) and var(·) denotes the variance. The third experiment uses two synthetic sources generated by the generative model with the following parameters: L = 3, D_1 = diag([0.7, 0.2]), D_2 = diag([−0.4, −0.5]), D_3 = diag([−0.3, 0.3]), K = 4, π = [0.25, 0.25, 0.25, 0.25], Σ_1 = diag([0.3162, 0.5477]), Σ_2 = diag([0.3162, 0.8367]), Σ_3 = diag([0.7071, 0.5477]), Σ_4 = diag([0.7071, 0.8367]), and

P = [5/8 1/8 1/8 1/8;
     1/8 5/8 1/8 1/8;
     1/8 1/8 5/8 1/8;
     1/8 1/8 1/8 5/8].

1000 samples are generated for each source.
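To impose a target SNR in the sense of Eq. (47), each observation channel can be corrupted with white Gaussian noise whose variance is scaled to the channel's signal variance. A minimal sketch (the helper name is ours):

```python
import numpy as np

def add_awgn(X, snr_db, rng=None):
    """Add white Gaussian noise to each observation channel so that
    var(x_n) / var(w_n) = 10^(snr_db / 10), matching Eq. (47) per channel."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.empty_like(X, dtype=float)
    for n in range(X.shape[0]):
        noise_var = np.var(X[n]) / 10 ** (snr_db / 10)
        noisy[n] = X[n] + rng.normal(scale=np.sqrt(noise_var), size=X[n].shape)
    return noisy

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 10000))      # hypothetical observations
Y = add_awgn(X, 10.0, rng)           # observations at roughly 10 dB SNR
```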


Fig. 6. Similarity coefficients between sources and recovered signals versus SNR.

Fig. 8. Separation results of SIR versus the length of sources.

Table 1. Dependent sources separation results of SIR (dB).

               EASI      FastICA   JADE_TD   AR-HMGP
SNR = 10 dB    13.5713   13.9719   13.0614   36.4302
SNR = 19 dB    14.1882   14.2091   14.4861   47.6530
Noiseless      14.1713   14.5713   14.4124   58.8592

Fig. 6 shows the similarity coefficients between the sources and the recovered signals for SNR = 0 to 20 dB. One can observe that the performance of all tested algorithms improves with increasing SNR, and that AR-HMGP clearly outperforms the other three algorithms.

The fourth experiment involves two synthetic sources whose parameters are identical to those in the third experiment. Fig. 7 shows the separation performance as a function of SNR. The proposed algorithm performs well even at low SNR, and the separation result improves as the SNR increases. Compared with the other three algorithms, AR-HMGP achieves a gain of around 10–40 dB. This is mainly because the sources are approximately Gaussian and have similar spectra (i.e. the AR model coefficients of the two sources are close), in which case decorrelation-based and ICA-based algorithms barely work.

Fig. 7. Separation results of SIR versus SNR.

The fifth experiment uses two synthetic sources with the same parameters as the third experiment, except that the coefficient matrices of the vector AR model are changed to D_1 = diag([0.7, 0.1]), D_2 = diag([−0.4, −0.3]), D_3 = diag([−0.3, 0.9]). Fig. 8 shows the separation performance as a function of the length of the sources with SNR = 15 dB. Since the AR coefficients of the two sources differ substantially, i.e. the sources have large spectral diversity as well as strong temporal structure, the three classical BSS algorithms also work well in this case. The performance of all four algorithms improves with increasing source length; the proposed algorithm outperforms the other three, especially for short sources.

4.3. Dependent sources

The sixth experiment tests the separation of dependent sources, using two synthetic sources. The covariance matrices of the four innovation process states are

Σ_1 = [1 0.5; 0.5 1],  Σ_2 = [3 1.5; 1.5 1],  Σ_3 = [1 1.5; 1.5 4],  Σ_4 = [3 3; 3 4].

The other parameters of the sources are identical to those in the fifth experiment. Table 1 shows the SIR separation results of the four BSS algorithms for dependent sources at SNR = 10 dB, SNR = 19 dB and in the noiseless case. One can observe that our algorithm outperforms the three classical BSS algorithms for dependent source separation: at SNR = 10 dB it surpasses them by around 21 dB, and the gain increases with SNR.

4.4. Real data

In the last experiment, we build a wired communication system to emulate an instantaneous mixture environment. The principle block diagram and a photograph are presented in Figs. 9 and 10, respectively. A vector signal generator generates the two sources; each source is then divided into two signals by a splitter. After this, two of the signals are chosen for amplitude attenuation. Then two mixed signals are obtained


center frequency 2 MHz, transmission power −10 dBm. The attenuation values of the two attenuators are both −6 dB. The sample rate of the data acquisition card is 40 MHz. From Fig. 11, one can observe that the crosstalk of our algorithm is much better than those of the other three algorithms. 5. Conclusion

Fig. 9. The principle block diagram of simulation instantaneous mixture system.

In this paper, we proposed an effective temporal–spatial BSS algorithm is proposed. Specifically, the proposed algorithm is based on ML criterion using EM optimization approach. AR-HMGP is able to separate non-stationary sources which may be dependent and have overlap spectrum, there is great separation gain compared with three classical BSS algorithms. In this paper, a generative model is used to effectively represent the temporal–spatial structure of source where the temporal structure is described by using AR model and spatial structure is described by using HMGP model. Both temporal and spatial structures are considered in the algorithm and significant performances are achieved. The proposed algorithm is also able to separate dependent sources and has high robustness against noise. References

Fig. 10. Physical photo of the simulated instantaneous mixture system.

Fig. 11. Separation results for the different BSS algorithms. (a), (b) EASI; (c), (d) FastICA; (e), (f) JADETD; (g), (h) AR-HMGP.

from two combiners and sampled by the data acquisition card. Finally, the samples are processed by a PC to recover the sources. The hardware specifications are an Intel(R) Core(TM) i5-4590 PC and a PCI4012 acquisition card (40 Msps/ch, 12-bit A/D); the software package used to collect the measurement data is TopView2000. Two typical non-stationary signals, a tone signal and an amplitude-modulation signal, are generated by the vector signal generator. The system parameters are set as follows. The frequency and transmission power of the tone signal are 2 MHz and 0 dBm, respectively. The parameters of the amplitude-modulation signal are: alphabet set {±1, ±3, ±5}, symbol rate 40 Ksps,
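The splitter/attenuator/combiner stage of this hardware setup realizes the instantaneous model x = As. The following is a rough software sketch, not the authors' code; the window length, the random symbol draw, and the unit through-path gains are illustrative assumptions, while the sample rate, signal parameters, and the −6 dB attenuation are taken from the experiment description.

```python
import numpy as np

fs = 40e6                     # sample rate of the acquisition card (40 MHz)
t = np.arange(4000) / fs      # illustrative observation window

# Source 1: 2 MHz tone.
s1 = np.cos(2 * np.pi * 2e6 * t)

# Source 2: amplitude-modulated signal, alphabet {±1, ±3, ±5},
# symbol rate 40 ksps, 2 MHz carrier (symbol draw is random here).
rng = np.random.default_rng(1)
symbols = rng.choice([-5, -3, -1, 1, 3, 5],
                     size=int(len(t) * 40e3 / fs) + 1)
sym_idx = (t * 40e3).astype(int)          # which symbol each sample sees
s2 = symbols[sym_idx] * np.cos(2 * np.pi * 2e6 * t)

# Instantaneous mixing: each attenuator path contributes a fixed gain,
# so the two combiner outputs obey x = A s.  A -6 dB attenuation is a
# voltage gain of 10**(-6/20), roughly 0.5.
g = 10 ** (-6 / 20)
A = np.array([[1.0, g],
              [g, 1.0]])
S = np.vstack([s1, s2])
X = A @ S                     # what the data acquisition card records

print(X.shape)  # (2, 4000)
```

Since A is well conditioned, the mixtures are (in the noiseless case) exactly invertible, which is what the BSS algorithms must recover blindly.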

[1] C. Jutten, J. Herault, Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture, Signal Process. 24 (1) (1991) 1–10.
[2] Y. Duan, H. Zhang, Noisy blind signal-jamming separation algorithm based on VBICA, Wireless Personal Commun. 74 (2) (2014) 307–324.
[3] G. Bao, Z. Ye, Y. Zhou, A compressed sensing approach to blind separation of speech mixture based on a two-layer sparsity model, IEEE Trans. Audio Speech Lang. Process. 21 (5) (2013) 899–906.
[4] M. Souden, S. Araki, K. Kinoshita, T. Nakatani, A multichannel MMSE-based framework for speech source separation and noise reduction, IEEE Trans. Audio Speech Lang. Process. 21 (9) (2013) 1913–1928.
[5] P. Ahmadian, S. Sanei, L. Ascari, L. González-Villanueva, U.M. Alessandra, Constrained blind source extraction of readiness potentials from EEG, IEEE Trans. Neural Syst. Rehab. Eng. 21 (4) (2013) 567–575.
[6] M. Mohammadi, S.H. Sardouie, M.B. Shamsollahi, Denoising of interictal EEG signals using ICA and time-varying AR modeling, in: Proceedings of Biomedical Engineering, 2014.
[7] S. Van Gerven, D. Van Compernolle, Signal separation by symmetric adaptive decorrelation: stability, convergence, and uniqueness, IEEE Trans. Signal Process. 43 (7) (1995) 1602–1612.
[8] H.C. Wu, J.C. Principe, A unifying criterion for blind source separation and decorrelation: simultaneous diagonalization of correlation matrices, in: Neural Networks for Signal Processing, 1997, pp. 496–505.
[9] H.A. Inan, A.T. Erdogan, Convolutive bounded component analysis algorithms for independent and dependent source separation, IEEE Trans. Neural Netw. Learn. Syst. 26 (4) (2015) 697–708.
[10] S. Cruces, Bounded component analysis of noisy underdetermined and overdetermined mixtures, IEEE Trans. Signal Process. 63 (9) (2015).
[11] Q. Su, Y. Shen, C. Deng, W. Zhao, L. Zhang, A fast algorithm for the separation of dependent sources based on bounded component analysis, in: Proceedings of the IEEE/CIC International Conference on Communications in China, 2016, pp. 1–6.
[12] X. Li, X. Wang, Q. Fu, Y. Yan, Dynamic group sparsity for non-negative matrix factorization with application to unsupervised source separation, in: Proceedings of the IEEE International Workshop on Acoustic Signal Enhancement, 2016, pp. 1–5.
[13] S. Mirzaei, H.V. Hamme, Y. Norouzi, Under-determined reverberant audio source separation using Bayesian non-negative matrix factorization, Speech Commun. 81 (2016) 129–137.
[14] Y. Mitsufuji, S. Koyama, H. Saruwatari, Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2016, pp. 56–60.
[15] C.A. Palagan, K.P. Geetha, K. Maniysundar, R.K. Thampi, A prediction method for blind source separation of speech signals by instantaneous mixing plus autoregressive model, Int. J. Appl. Eng. Res. 10 (2) (2015) 2353–2369.
[16] H. Kameoka, T. Yoshioka, M. Hamamura, J.L. Roux, K. Kashino, Statistical model of speech signals based on composite autoregressive system with application to blind source separation, Lect. Notes Comput. Sci. 6365 (2010) 245–253.
[17] A. Boudjellal, A. Mesloub, K. Abed-Meraim, A. Belouchrani, Separation of dependent autoregressive sources using joint matrix diagonalization, IEEE Signal Process. Lett. 22 (8) (2015) 1180–1183.
[18] A. Mukherjee, S. Maiti, A. Datta, Spectrum sensing for cognitive radio using blind source separation and hidden Markov model, in: Proceedings of the International Conference on Advanced Computing & Communication Technologies, 2014, pp. 409–414.

[19] D.H.T. Vu, R. Haeb-Umbach, Blind speech separation exploiting temporal and spectral correlations using 2D-HMMs, in: Proceedings of the Signal Processing Conference, 2013, pp. 1–5.
[20] R. Choudrey, S. Roberts, Bayesian ICA with hidden Markov model sources, in: Proceedings of Independent Component Analysis, 2003.
[21] J. Liu, K. Li, Z. He, L. Mei, An MA model based blind source separation algorithm, in: Proceedings of the IEEE Region 10 Conference TENCON 99, vol. 2, 2000, pp. 1363–1366.
[22] E. Doron, A. Yeredor, P. Tichavsky, Cramér–Rao-induced bound for blind separation of stationary parametric Gaussian sources, IEEE Signal Process. Lett. 14 (6) (2007) 417–420.
[23] B. Liu, V.G. Reju, A.W.H. Khong, V.V. Reddy, A GMM post-filter for residual crosstalk suppression in blind source separation, IEEE Signal Process. Lett. 21 (8) (2014) 942–946.
[24] X.L. Li, T. Adali, Noncircular complex ICA by generalized Householder reflections, IEEE Trans. Signal Process. 61 (24) (2013) 6423–6430.
[25] H. Snoussi, J. Idier, Bayesian blind separation of generalized hyperbolic processes in noisy and underdeterminate mixtures, IEEE Trans. Signal Process. 54 (9) (2006) 3257–3269.
[26] F. Gu, H. Zhang, D. Zhu, Blind separation of non-stationary sources using continuous density hidden Markov models, Dig. Signal Process. 23 (5) (2013) 1549–1564.
[27] E. Jacquier, M. Johannes, N. Polson, MCMC maximum likelihood for latent state models, J. Econ. 137 (2) (2007) 615–640.
[28] W. Chan, S.W. Raudenbush, Maximum likelihood estimation in generalized linear mixed models using Monte Carlo methods: application to small-area estimation of breast cancer mortality, Chin. J. Appl. Probab. Stat. 22 (22) (2006) 69–80.
[29] M. Sahmoudi, K. Abed-Meraim, M. Lavielle, E. Kuhn, Blind source separation of noisy mixtures using a semi-parametric approach with application to heavy-tailed signals, in: Proceedings of the 2005 European Signal Processing Conference, 2005, pp. 397–415.
[30] T. Routtenberg, J. Tabrikian, MIMO-AR system identification and blind source separation for GMM-distributed sources, IEEE Trans. Signal Process. 57 (5) (2009) 1717–1730.
[31] T. Ryden, EM versus Markov chain Monte Carlo for estimation of hidden Markov models: a computational perspective, Bayesian Anal. 3 (4) (2008) 659–716.
[32] H. Abdi, L.J. Williams, Principal component analysis, Wiley Interdiscipl. Rev. Comput. Stat. 2 (4) (2010) 433–459.
[33] J.B. Bilmes, A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, International Computer Science Institute, Tech. Rep., 1997.
[34] S.A. Cruces-Alvarez, A. Cichocki, S.I. Amari, On a new blind signal extraction algorithm: different criteria and stability analysis, IEEE Signal Process. Lett. 9 (8) (2002) 233–236.
[35] J.F. Cardoso, Infomax and maximum likelihood for blind source separation, IEEE Signal Process. Lett. 4 (4) (1997) 112–114.
[36] K.R. Muller, P. Philips, A. Ziehe, JADETD: combining higher-order statistics and temporal information for blind source separation (with noise), in: Proceedings of Independent Component Analysis, 2010, pp. 87–92.
[37] J.F. Cardoso, B.H. Laheld, Equivariant adaptive source separation, IEEE Trans. Signal Process. 44 (12) (1997) 3017–3030.
[38] A. Hyvarinen, Fast and robust fixed-point algorithms for independent component analysis, IEEE Trans. Neural Netw. 10 (3) (1999) 626–634.

Jiong Li received the B.S. degree in communications engineering from Ocean University of China, Qingdao, China, in 2011 and the M.S. degree in communications engineering from PLA University of Science and Technology, Nanjing, China, in 2014. He is currently working toward the Ph.D. degree in information and communications engineering with the College of Communications Engineering, PLA University of Science and Technology. His current research interests include 5G self-interference cancellation, signal processing in communications, and blind source separation.

Hang Zhang received the B.S. degree from PLA University of Science and Technology, Nanjing, China, in 1984 and the M.S. degree from Southeast University, Nanjing, China, in 1989. She is now a professor and Ph.D. supervisor at PLA University of Science and Technology. Her research interests include wireless communication, satellite communication, and signal processing in communications.

Menglan Fan received the B.S. degree in communications engineering from PLA University of Science and Technology, Nanjing, China, in 2014. She is currently working toward the M.S. degree in information and communications engineering with the College of Communications Engineering, PLA University of Science and Technology. Her current research interests include cognitive radio technology, green communications, and blind source separation.

Jiang Zhang received the B.S. degree in communications engineering from Taiyuan University of Technology, Taiyuan, China, in 2006, and the M.S. and Ph.D. degrees in information and communications engineering from PLA University of Science and Technology, Nanjing, China, in 2009 and 2013, respectively. He is currently a researcher at the Nanjing Institute of Telecommunication Engineering. His current research interests include satellite communication, signal processing in communications, and blind source separation.
