On the convergence of ICA algorithms with weighted orthogonal constraint


Digital Signal Processing 28 (2014) 39–47


Xingjia Tang, Jimin Ye ∗, Xiufang Zhang
School of Mathematics and Statistics, Xidian University, Xi'an 710071, China

Article info
Article history: Available online 22 February 2014.
Keywords: Independent component analysis; Weighted orthogonal constraint; Symmetric MDWUM; Convergence analysis.

Abstract — A class of independent component analysis (ICA) algorithms with a weighted orthogonal constraint, enforced by weighted orthogonalization, has recently been proposed. It has been shown in the literature that weighted orthogonal constrained ICA algorithms retain the equivariance property and achieve better convergence speed, separation ability and steady-state misadjustment, but their convergence has not yet been analyzed. The goal of this paper is to fill this gap. First, we characterize the stationary points of these algorithms when the symmetric Minimum Distance Weighted Unitary Mapping (MDWUM) is used to enforce the weighted orthogonalization. Second, we prove the monotonic convergence of the weighted orthogonal constrained fixed-point ICA algorithms using the symmetric MDWUM for convex contrast functions, and extend the result to nonconvex contrast functions by adding a weighted orthogonal constraint term to the contrast function. Together with the boundedness of the contrast function, this implies the convergence of fixed-point ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM. Simulation results show that the adaptive ICA algorithms using the symmetric MDWUM are more accurate than their pre-whitening counterparts, and that the fixed-point ICA algorithms using the symmetric MDWUM converge monotonically. © 2014 Elsevier Inc. All rights reserved.

1. Introduction

ICA consists of recovering statistically independent but otherwise unobserved source signals from their linear mixtures, without prior knowledge of the source signals or the mixing coefficients. As an active topic in signal processing, ICA has been widely applied in seismic exploration, mobile communication, speech processing, array signal processing, biomedical engineering, etc. [1]. ICA algorithms are derived from search methods that maximize a given contrast function, which reflects a measure of the independence or non-Gaussianity of the source signals. Whitening is a common prerequisite of ICA; however, ICA algorithms based on pre-whitened data lack the equivariance property. To solve this problem, by transforming the pre-whitening into a weighted orthogonal constraint, a new definition of the contrast function was proposed in [2], and various weighted orthogonal constrained

✩ This work was supported by the National Natural Science Foundation of China under grants 61075117 and 11101322.
∗ Corresponding author. E-mail addresses: [email protected] (X. Tang), [email protected] (J. Ye), [email protected] (X. Zhang).
http://dx.doi.org/10.1016/j.dsp.2014.02.005
1051-2004/© 2014 Elsevier Inc. All rights reserved.

adaptive ICA algorithms and a weighted orthogonal constrained Fast ICA algorithm were proposed in [2–5].
A major early result on the convergence of one-unit ICA algorithms was provided in [6], where it is shown that the one-unit Fast ICA algorithm has cubic convergence for the special choice of the kurtosis contrast function. In [7], Regalia and Kofidis showed the monotonic convergence of one-unit fixed-point ICA algorithms for a larger set of contrast functions. In [8], Shen and Hüper used a dynamical systems framework for the local convergence analysis of Fast ICA. For symmetric ICA algorithms, Oja proposed a convergence analysis of the symmetric Fast ICA algorithm based on the kurtosis contrast function [9], which also appeared in [10]. Furthermore, in [11], both one-unit and symmetric ICA algorithms were shown to approach the Cramér–Rao lower bound in certain scenarios. Recently, a comprehensive result on the convergence of ICA algorithms with symmetric orthogonalization was reported in [12].
The goal of this article is to contribute to the convergence analysis of the recently proposed ICA algorithms with weighted orthogonal constraint. First, we introduce the weighted orthogonal constraint and the symmetric MDWUM for enforcing it. Second, a convenient characterization of the stationary points of both gradient ascent and fixed-point ICA algorithms under this constraint


using the symmetric MDWUM is provided. Then, we analyze the monotonic convergence of the weighted orthogonal constrained fixed-point ICA algorithm using the symmetric MDWUM for convex contrast functions, and extend the analysis to the nonconvex case. Finally, we support the analytical results by simulation experiments.

2. ICA with weighted orthogonal constraint

In the ICA model, the observed signals are the outputs of a group of sensors, each of which receives a mixture of multiple source signals. When n source signals st = [s1(t), . . . , sn(t)]^T are received by m sensors, the observed signals xt = [x1(t), . . . , xm(t)]^T are generated according to

xt = A st,  t = 1, 2, . . .    (1)

where A is the mixing matrix. Without loss of generality, we assume that the source signals have zero mean and unit variance, and that at most one of them is Gaussian distributed. Solving the ICA problem amounts to finding a matrix B, called the separating matrix, which makes the matrix transformation

yt = B xt,  t = 1, 2, . . .    (2)

be an estimation of the source signals, where yt = [y1(t), . . . , yn(t)]^T are called the separated signals [1]. Apart from the observed signals and the statistical independence assumption on the source signals, nothing else can be used in solving the ICA problem. Therefore, a contrast function based on the independence or non-Gaussianity of the source signals is designed first; its maximum point then yields the optimal separating matrix. Generally, two kinds of optimizing updates are used in ICA algorithms. Gradient ascent update:

Bk+1 = Bk + ηk ∇B J(Bk)    (3)

Fixed-point update:

Bk+1 = ∇B J(Bk)    (4)

where J(B) is the contrast function, ηk is the learning step-size and ∇B J(B) is the gradient of the contrast function with respect to B [1]. As a pre-processing step for ICA, pre-whitening makes the problem somewhat easier. The pre-whitened observed signals are given by

vt = V xt,  t = 1, 2, . . .    (5)

where vt = [v1(t), . . . , vn(t)]^T have unit variance and V is the pre-whitening matrix. In fact, the separating matrix B can be factorized into the product of a new separating matrix W and the pre-whitening matrix; the separated outputs (2) can then be written as

yt = B xt = W V xt = W vt,  t = 1, 2, . . .    (6)

In addition, since the source signals have unit variance by the basic assumption of the ICA model, the separated signals should also have unit variance; hence

E{yt yt^T} = E{W vt vt^T W^T} = W E{vt vt^T} W^T = W W^T = I    (7)

which illustrates that the new separating matrix W is orthogonal when the observed signals are pre-whitened [1,13]. Equivariance is a significant property of adaptive ICA algorithms, meaning that the evolution law of the global transfer matrix Ck = Bk A is independent of the mixing matrix [3,14]. However, because of the error accumulated in the pre-whitening stage, the

ICA algorithms based on pre-whitened data often lack the equivariance property. As a result, when the mixing matrix is badly conditioned or some source signals are weak, the separation results may be very poor. To overcome this weakness, suppose the pre-whitening matrix has been obtained; instead of multiplying it from the left onto the observed signals xt to obtain the pre-whitened signals vt, we multiply the pre-whitening matrix from the right onto W, so that the separating matrix B = WV should satisfy

B Rx B^T = W V E{xt xt^T} V^T W^T = W E{V xt xt^T V^T} W^T = W E{vt vt^T} W^T = W W^T = I    (8)

where Rx is the covariance matrix of the observed signals. In other words, the separating matrix B is weighted orthogonal when the observed signals are not pre-whitened. Based on this, a new definition of contrast is obtained by replacing the pre-whitening with condition (8), which is called the weighted orthogonal constraint.

Definition 1. (See [3].) An alternative contrast is a multivariate mapping J(·) from the set of probability distributions of yt = B xt to the real numbers R, such that J(xt, B) attains its maximum (minimum) if and only if B = P D A^{-1} for any t, where B is an n × m full rank matrix satisfying B Rx B^T = I, P is a permutation matrix and D is a nonsingular diagonal matrix.

The new contrast can be formulated as the following optimization model



max J(xt, B)  ∀t,   s.t.  B Rx B^T = I,  B ∈ R^{n×m}_wort    (9)

where R^{n×m}_wort denotes the set of n × m weighted orthogonal matrices. In this way, the degradation of the equivariance property caused by the accumulated error in estimating the pre-whitening matrix is avoided. This constrained optimization problem can be solved by a projection method [13,15–17]: the optimization problem is first handled with an unconstrained learning step, either the gradient ascent update or the fixed-point update, and after each iteration the updated B is projected weighted-orthogonally onto the constraint set so that it satisfies the constraint. To sum up, the ICA algorithms with weighted orthogonal constraint based on the projection method consist of the following two steps [1,3]:
Step 1: an unconstrained ICA learning step based on the classic contrast, (3) or (4), ignoring the weighted orthogonal constraint.
Step 2: enforcement of the weighted orthogonal constraint (8), i.e., making the updated B weighted orthogonal.

3. Symmetric MDWUM and related lemmas

In this section, the symmetric MDWUM is introduced first; then two Lemmas about the symmetric MDWUM are proposed and one Lemma about the generalized eigen-decomposition of a matrix pencil is recalled. For simplicity, from now on we assume that m = n. In the projection-based ICA algorithms with weighted orthogonal constraint, the weighted orthogonalization of B must be carried out after the traditional update. Generally, two methods are used [12,18,19]: one is the Gram–Schmidt procedure based on the QR factorization, which converts the columns of the updated B to weighted orthogonal vectors; the other is the symmetric MDWUM, which maps its argument to the closest weighted orthogonal matrix in both the Frobenius norm and the induced 2-norm sense. In other


words, given a nonsingular square matrix Q, the symmetric MDWUM maps Q to M^MDWUM_WU(Q), the weighted orthogonal matrix closest to Q; namely, M^MDWUM_WU(Q) satisfies

‖M^MDWUM_WU(Q) − Q‖_F^2 = min_{B ∈ R^{n×n}_wort} ‖B − Q‖_F^2
  = min_{B ∈ R^{n×n}_wort} Tr[(B − Q)(B − Q)^T]
  = min_{B ∈ R^{n×n}_wort} Tr[Rx^{-1} − B Q^T − Q B^T + Q Q^T]
  = −2 max_{B ∈ R^{n×n}_wort} Tr[B Q^T] + c    (10)

where the subscript "WU" indicates that the mapped matrix is a "weighted unitary" matrix and c collects the terms that do not depend on B. Suppose the matrix Q has the singular value decomposition (SVD) Q = UQ ΣQ VQ^T, where ΣQ is a diagonal matrix consisting of the square roots of the eigenvalues of Q Q^T, and UQ and VQ are the corresponding unitary matrices. Later, in Theorem 3, we will obtain

M^MDWUM_WU(Q) = UQ VQ^T Rx^{-1/2}    (11)

In fact, we can rewrite

Q = UQ ΣQ VQ^T = (UQ ΣQ UQ^T)(UQ VQ^T Rx^{-1/2}) Rx^{1/2} = Λ B Rx^{1/2}    (12)

where Λ = UQ ΣQ UQ^T is positive definite and B = UQ VQ^T Rx^{-1/2} is weighted orthogonal, namely, B Rx^{1/2} is orthogonal: B Rx^{1/2} (B Rx^{1/2})^T = B Rx B^T = I.
Before the convergence analysis of this new ICA algorithm, we first propose the following two Lemmas about the symmetric MDWUM and recall one Lemma about the generalized eigen-decomposition of a matrix pencil that will be used later.

Lemma 1. For a full rank square matrix Q and a weighted orthogonal matrix B, M^MDWUM_WU(B Rx^{1/2} Q) = B Rx^{1/2} M^MDWUM_WU(Q).

When B is weighted orthogonal, B Rx^{1/2} is orthogonal, so this Lemma follows directly from (11).

Lemma 2. For a full rank square matrix Q, M^MDWUM_WU(Q) = Rx^{-1/2} if and only if Q is a positive-definite matrix.

Together with (11), this Lemma is easily proved using the eigenvalue decomposition and the SVD.

Lemma 3. (See [20].) Suppose the matrix pencil (A, B) is regular. Then
1. all the generalized eigenvalues of the matrix pencil (A, B) are real, i.e., λi ∈ R, i = 1, . . . , n;
2. there exists an n × n matrix C such that C B C^T = In and C A C^T = diag(λ1, . . . , λn).

This Lemma shows that the eigenvalue problem of a regular matrix pencil behaves like that of a Hermitian matrix.

4. Stationary points of ICA algorithms with weighted orthogonal constraint

For the convergence of an ICA algorithm, we should analyze both "which points" the algorithm converges to and "how" it converges. In this section, we obtain a general characterization of the stationary points of the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM.
A separating matrix is referred to as a stationary point of an ICA algorithm if the iterations starting from this point remain at the same point; once the algorithm reaches one of these stationary points, it becomes "trapped" there. Based on this, we propose the following two Theorems, which provide the stationary point characterization of the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM, for the gradient ascent update and the fixed-point update respectively. Here, we assume that the gradient of the contrast function with respect to B exists and is of full rank over the constraint set.

Theorem 1. For the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM and the gradient ascent update, where the step-size is sufficiently small such that B in (3) is a full rank matrix, a weighted orthogonal matrix B∗ is a stationary point of the algorithm if and only if there exists a matrix S satisfying

∇B J(B∗) = B∗ Rx^{1/2} S  and  S = S^T    (13)

Proof. A weighted orthogonal matrix B∗ is a stationary point of this ICA algorithm if and only if B∗ is mapped back to itself after the gradient ascent update and the symmetric MDWUM, which can be written as

B∗ = M^MDWUM_WU(B∗ + ηk ∇B J(B∗)) = B∗ Rx^{1/2} M^MDWUM_WU(Rx^{-1/2} + ηk Rx^{1/2} B∗^T ∇B J(B∗))    (14)

where the last equality follows from B∗ Rx B∗^T = I and Lemma 1. As a result, the stationarity condition on B∗ can be rewritten as

M^MDWUM_WU(Rx^{-1/2} + ηk Rx^{1/2} B∗^T ∇B J(B∗)) = Rx^{-1/2}    (15)

Since the step-size is sufficiently small, the argument of M^MDWUM_WU above is a full rank matrix. Therefore, according to Lemma 2, B∗ is a stationary point if and only if Rx^{-1/2} + ηk Rx^{1/2} B∗^T ∇B J(B∗) is a positive-definite matrix, which is further equivalent to Rx^{1/2} B∗^T ∇B J(B∗) being a symmetric matrix. In other words, there exists a matrix S satisfying

Rx^{1/2} B∗^T ∇B J(B∗) = S  and  S = S^T  ⇒  ∇B J(B∗) = B∗ Rx^{1/2} S    (16)  □

Here, if Rx^{1/2} B∗^T ∇B J(B∗) is a positive-definite matrix, then B∗ is a stationary point for any nonnegative step-size. If Rx^{1/2} B∗^T ∇B J(B∗) is a symmetric matrix with negative eigenvalues, then B∗ is a stationary point only for nonnegative step-sizes satisfying [12]

ηk < 1 / |λmin(Rx^{1/2} B∗^T ∇B J(B∗))|,  ∀k ≥ 0    (17)

Theorem 2. For the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM and the fixed-point update (4), a weighted orthogonal matrix B∗ is a stationary point if and only if there exists a matrix S satisfying (13).

Proof. A weighted orthogonal matrix B∗ is a stationary point of this ICA algorithm if and only if B∗ is mapped back to itself after the fixed-point update and the symmetric MDWUM, which can be written as

B∗ = M^MDWUM_WU(∇B J(B∗)) = B∗ Rx^{1/2} M^MDWUM_WU(Rx^{1/2} B∗^T ∇B J(B∗))    (18)

where the last equality follows from B∗ Rx B∗^T = I and Lemma 1. As a result, the stationarity condition on B∗ can be rewritten as

M^MDWUM_WU(Rx^{1/2} B∗^T ∇B J(B∗)) = Rx^{-1/2}    (19)


Then, according to Lemma 2 and the full rank assumption on the gradient of the contrast function with respect to B, B∗ is a stationary point if and only if Rx^{1/2} B∗^T ∇B J(B∗) is a positive-definite matrix, which is in particular symmetric. In other words, there exists a matrix S satisfying (16).  □

5. Monotonic convergence of fixed-point ICA algorithms with weighted orthogonal constraint

For the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM, the convergence analysis is obstructed by the complications caused by the symmetric MDWUM. In this section, we analyze "how" this fixed-point ICA algorithm converges. First, we concentrate on the monotonic convergence analysis for convex contrast functions; then we extend it to the nonconvex case. Based on Theorems 1 and 2, we first propose the following Theorem, which gives the solution of (10) and will be useful in the proof of monotonic convergence.

Theorem 3. Given a full rank matrix Q, the unique global maximum of the problem

max Tr(B Q^T)  s.t.  B Rx B^T = I    (20)

is given by B∗ = M^MDWUM_WU(Q).

Proof. Based on Theorems 1 and 2, a local optimum B∗ of this problem should satisfy

Q = B∗ Rx^{1/2} S  and  S = S^T    (21)

Assuming Q has the SVD Q = UQ ΣQ VQ^T, for the local optimum points of the above problem we can get

Q^T Q = S^T (B∗ Rx^{1/2})^T (B∗ Rx^{1/2}) S = S^T S = VQ ΣQ ΣQ VQ^T = (VQ Σ VQ^T)^T (VQ Σ VQ^T)    (22)

Here, (B∗ Rx^{1/2})^T B∗ Rx^{1/2} = B∗ Rx^{1/2} (B∗ Rx^{1/2})^T = B∗ Rx B∗^T = I, because B∗ Rx^{1/2} is an orthogonal square matrix, and Σ = F ΣQ, where F is a diagonal matrix with 1's and/or −1's on the diagonal. So we can conclude that S is a matrix with

S = VQ Σ VQ^T  and  S = S^T    (23)

Then, according to (21), a local optimum point of the above problem should satisfy

B∗ = Q S^{-1} Rx^{-1/2} = UQ F^{-1} VQ^T Rx^{-1/2}    (24)

Among all the local optimum points specified by (24), the value of Tr(B∗ Q^T) = Tr(UQ F^{-1} VQ^T Rx^{-1/2} VQ ΣQ UQ^T) at the global optimum point, achieved for F = I, is strictly greater than its values at the other local optima obtained for F ≠ I. Hence the global optimum point B∗ of the above problem is obtained by projecting Q onto the set of weighted orthogonal matrices using the symmetric MDWUM (11).  □

5.1. Monotonic convergence for convex contrast functions

Based on Theorem 3, for convex contrast functions we can propose the following Theorem about the monotonic convergence of the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM and the fixed-point update.

Theorem 4. An ICA algorithm with weighted orthogonal constraint using the symmetric MDWUM, corresponding to the optimization setting in the fixed-point update (4), where J is a smooth convex contrast function (over a convex set containing the set of weighted orthogonal matrices) that is bounded on the set of weighted orthogonal matrices, is monotonically convergent to one of the stationary points defined by Theorem 2.

Proof. Since J is a smooth convex function of B (over a convex set containing the set of weighted orthogonal matrices), we have

J(Bk+1) ≥ J(Bk) + Tr[∇B J(Bk)^T (Bk+1 − Bk)]    (25)

With the fixed-point update and the symmetric MDWUM, for any given Bk which is not a stationary point of problem (9),

Bk ≠ M^MDWUM_WU(∇B J(Bk))  and  Bk+1 = M^MDWUM_WU(∇B J(Bk))    (26)

According to Theorem 3, these guarantee that

Tr[∇B J(Bk)^T Bk+1] > Tr[∇B J(Bk)^T Bk]    (27)

So, combining (25), for any Bk, Bk+1 pair in (4) we have

J(Bk+1) > J(Bk)    (28)

Therefore, for convex contrast functions the algorithm is monotonically convergent.  □

5.2. Extension of monotonic convergence to nonconvex contrast functions

For nonconvex contrast functions in the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM and the fixed-point update, we need to modify the update expression so that the resulting algorithm is still guaranteed to be monotonically convergent. When the Hessian matrix ∂/∂b (∂J/∂b)^T (denoted H_J(B)) of J(B) with respect to the vectorized coordinates b = [B_{:,1}^T B_{:,2}^T · · · B_{:,m}^T]^T is not positive-semidefinite, we can introduce an auxiliary contrast function [12,21]

P(B) = J(B) + γ Tr(B Rx B^T − I)    (29)

over the domain D = {B | B Rx B^T ⪯ I}, i.e., the set of B for which I − B Rx B^T is positive-semidefinite. The gradient of P(B) with respect to B can then be written as

∇B P(B) = ∇B J(B) + γ B Rx    (30)

The Hessian matrix ∂/∂b (∂P/∂b)^T of P(B) with respect to the vectorized coordinates b is given by

H_P(B) = H_J(B) + γ R̄x    (31)

where γ R̄x denotes the m^2 × m^2 Hessian matrix of the second term in (29), with

R̄x = diag(Rx, Rx, . . . , Rx)

a block-diagonal matrix with m copies of Rx on its diagonal. It is easy to see that R̄x is a positive-definite matrix. The first term H_J(B) is a symmetric matrix and the second term γ R̄x is positive definite, so the matrix pencil (H_J(B), R̄x) is a regular matrix pencil. According to Lemma 3, there exists a nonsingular matrix C satisfying C H_J(B) C^T = diag(λ1, λ2, . . . , λ_{m^2}), where λi, i = 1, 2, . . . , m^2, denote the generalized eigenvalues of the pencil (H_J(B), R̄x), and C R̄x C^T = I. These yield


Fig. 1. Convergence stability performance curves of various adaptive NG-ICA algorithms.

C H_P(B) C^T = diag(λ1, λ2, . . . , λ_{m^2}) + γ I    (32)

from which it is easy to see that the Hessian matrix H_P(B) is positive-semidefinite if γ satisfies [12]

γ ≥ −inf_{B∈D} λmin(H_J(B))    (33)

where H_J(B) has negative eigenvalues over D because J is nonconvex. In this case H_P(B) is positive-semidefinite over D, so P(B) is convex over D. Therefore, by Theorem 4, the corresponding new ICA algorithm is monotonically convergent. Apart from the symmetric MDWUM step, the fixed-point update corresponding to this new contrast function can be written as

Bk+1 = ∇B P(Bk) = γ Bk Rx + ∇B J(Bk)    (34)
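As a rough numerical illustration, the following sketch iterates the modified update (34) and projects back with the symmetric MDWUM after every step. The kurtosis-style contrast gradient `grad_J`, the penalty weight and all test data are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def mdwum(Q, Rx):
    # symmetric MDWUM (11): M(Q) = U V^T Rx^{-1/2}
    U, _, Vt = np.linalg.svd(Q)
    w, E = np.linalg.eigh(Rx)
    return U @ Vt @ E @ np.diag(w ** -0.5) @ E.T

def fixed_point_step(B, Rx, grad_J, gamma):
    # modified fixed-point update (34), followed by projection onto
    # the weighted orthogonal constraint set via the symmetric MDWUM
    return mdwum(gamma * B @ Rx + grad_J(B), Rx)

rng = np.random.default_rng(1)
A = rng.uniform(-1, 1, (3, 3))
s = rng.uniform(-1, 1, (3, 20000))       # sub-Gaussian toy sources
x = A @ s
Rx = x @ x.T / x.shape[1]

# illustrative gradient of a kurtosis-style contrast J(B) = sum_i E{y_i^4}
grad_J = lambda B: 4 * ((B @ x) ** 3) @ x.T / x.shape[1]

B = mdwum(rng.standard_normal((3, 3)), Rx)
for _ in range(50):
    B = fixed_point_step(B, Rx, grad_J, gamma=10.0)
# every iterate satisfies the weighted orthogonal constraint (8)
assert np.allclose(B @ Rx @ B.T, np.eye(3), atol=1e-6)
```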

To sum up, together with the boundedness of the contrast function, the monotonic convergence of the fixed-point ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM is implied. The weighted orthogonality in this paper is an extension of plain orthogonality (the weighting matrix is extended from the identity matrix I to the covariance matrix Rx of the mixtures), so the conclusions in this paper extend those of Ref. [12]. Moreover, the ICA algorithms without pre-whitening using the symmetric MDWUM have the equivariance property, which makes the corresponding algorithms work more reliably, so the relevant convergence analysis is necessary. In addition, as in the extension of monotonic convergence to the nonconvex case, our proofs are more involved than the corresponding proofs in [12]; we also use simulation experiments to support our analysis.

6. Simulation experiments

In this section, we provide three groups of experiments to illustrate the convergence, stability and accuracy of the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM, and to verify the monotonic convergence of the new fixed-point ICA algorithm. In the experiments, we use 4 observation sensors, the following signals and some speech signals [2–5,22–24]:

s1(t): square wave signal sign(cos(2π × 155t))
s2(t): high-frequency sinusoidal signal sin(2π × 800t)
s3(t): phase modulation signal sin(2π × 300t + 6 cos(2π × 60t))
s4(t): uniformly distributed noise signal in [−1, 1].
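For reference, the four analog test signals and the mixing model (1) can be generated as follows; the 0.5 s duration and the seeds are our own choices, while the sampling period 0.0001 s matches the one used in the simulations.

```python
import numpy as np

fs = 10000                       # sampling rate, 1/Ts with Ts = 0.0001 s
t = np.arange(0, 0.5, 1 / fs)    # 0.5 s of samples (duration assumed)

s1 = np.sign(np.cos(2 * np.pi * 155 * t))                          # square wave
s2 = np.sin(2 * np.pi * 800 * t)                                   # high-frequency sinusoid
s3 = np.sin(2 * np.pi * 300 * t + 6 * np.cos(2 * np.pi * 60 * t))  # phase-modulated
s4 = np.random.default_rng(0).uniform(-1, 1, t.size)               # uniform noise in [-1, 1]

S = np.vstack([s1, s2, s3, s4])                      # 4 x N source matrix
A = np.random.default_rng(1).uniform(-1, 1, (4, 4))  # random mixing matrix
X = A @ S                                            # observed mixtures, model (1)
```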

The convergence stability performance is measured by the crosstalk error [25]

Ect(k) = Σ_{i=1}^{n} ( Σ_{j=1}^{n} |c_ij| / max_k |c_ik| − 1 ) + Σ_{j=1}^{n} ( Σ_{i=1}^{n} |c_ij| / max_k |c_kj| − 1 )    (35)

where Ck = Bk A = {c_ij}; the monotonic convergence is described directly by the kurtosis contrast function. The kurtosis is the fourth-order cumulant of a random variable, and is often used to measure its non-Gaussianity. In the real case, the kurtosis of a variable y is generally defined as [1]

kurt(y) = E{y^4} − 3 (E{y^2})^2    (36)

where, if y is whitened, E{y^2} = 1.
The following three groups of experiments simulate the adaptive and batch gradient ICA algorithms and the fixed-point ICA algorithm, using the adaptive natural gradient ICA (NG-ICA) algorithm, the batch NG-ICA algorithm and the fast fixed-point ICA (Fast ICA) algorithm respectively.
In experiment 1, we select the above 4 signals s1(t), s2(t), s3(t), s4(t) as the source signals. The weighted orthogonal constrained adaptive NG-ICA algorithm using the symmetric MDWUM, the adaptive NG-ICA algorithm based on pre-whitened data and the traditional adaptive NG-ICA algorithm are executed simultaneously. In all three algorithms the step-size is 0.01. The convergence stability performance curves of the different algorithms are given in Fig. 1.
In experiment 2, we select a speech signal and the above 3 signals s2(t), s3(t), s4(t) as the source signals, whose waveforms are plotted in Fig. 2. The weighted orthogonal constrained batch NG-ICA algorithm using the symmetric MDWUM and the batch NG-ICA algorithm based on pre-whitened data are executed simultaneously. In both algorithms the step-size is 0.02. Similarly, the separated signal waveforms achieved by the batch NG-ICA algorithm


Fig. 2. Source signals waveform using a speech signal and 3 analog signals.

Fig. 3. Separated signals waveform achieved by batch NG-ICA algorithm with symmetric MDWUM.

using the symmetric MDWUM, and the convergence stability performance curves of the different algorithms, are given in Figs. 3 and 4.
In experiment 3, we select 3 speech signals and the above noise signal s4(t) as the source signals, whose waveforms are plotted in Fig. 5. The convergence stability performance of the weighted orthogonal constrained Fast ICA algorithm using the symmetric MDWUM and of the Fast ICA algorithm based on pre-whitened data are compared in Fig. 7. Meanwhile, the separated signal waveforms achieved by the Fast ICA algorithm using the symmetric MDWUM are given in Fig. 6. Here, the update rule of the weighted orthogonal constrained Fast ICA algorithm is

bi = E{xt g(bi xt)} − E{g′(bi xt)} E{xt (bi xt)} Bk^T Bk    (37)

where bi, i = 1, . . . , n, is the i-th row of Bk, and g(y) = [g1(y1), g2(y2), . . . , gn(yn)]^T is a vector of increasing odd functions called activation functions; for super-Gaussian source signals we usually set gi(yi) = 2 tanh(yi), i = 1, . . . , n, and for sub-Gaussian source signals we usually set gi(yi) = yi^3, i = 1, . . . , n. In addition, to verify the monotonic convergence of the weighted orthogonal constrained Fast ICA algorithm using the symmetric MDWUM, the kurtosis contrast function values of the weighted orthogonal constrained Fast ICA algorithm and of the Fast ICA algorithm based on pre-whitened data are compared directly in Fig. 8.
It should be emphasized that, in all experiments, the symmetric MDWUM must be applied after the basic update of Bk in every iteration of the weighted orthogonal constrained ICA algorithms. In the simulations, the mixing matrix is generated randomly in each run with elements drawn from the uniform distribution on [−1, 1]; the sampling period of the source signals is Ts = 0.0001 s. The convergence stability performance curves and the kurtosis contrast function curves are plotted as mean values over 200 runs, and the kurtosis contrast function values are the means of the absolute kurtosis of all separated signals.
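The two performance measures used in the experiments, the crosstalk error (35) and the kurtosis (36), can be estimated directly from samples. A minimal numpy sketch follows; the test matrix, sample sizes and seed are our own illustrative choices.

```python
import numpy as np

def crosstalk_error(C):
    """Crosstalk error (35) of the global transfer matrix C = Bk A.
    It is zero exactly when C is a permutation-times-diagonal matrix,
    i.e., when separation is perfect up to scaling and ordering."""
    C = np.abs(C)
    rows = (C / C.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (C / C.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return rows.sum() + cols.sum()

def kurt(y):
    """Sample kurtosis (36): kurt(y) = E{y^4} - 3 (E{y^2})^2."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

# a permutation-times-diagonal global matrix gives zero crosstalk error
P = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, -1.0],
              [3.0, 0.0, 0.0]])
print(crosstalk_error(P))           # 0.0

rng = np.random.default_rng(2)
g = rng.standard_normal(200000)     # Gaussian: kurt(g) is close to 0
u = rng.uniform(-1, 1, 200000)      # sub-Gaussian uniform: close to -2/15
```

For a whitened signal (E{y^2} = 1) the kurtosis reduces to E{y^4} − 3, so the mean absolute kurtosis of the separated signals grows as they become less Gaussian, which is what Fig. 8 tracks.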


Fig. 4. Convergence stability performance curves of various batch NG-ICA algorithms.

Fig. 5. Source signals waveform using 3 speech signals and 1 analog signal.

From Figs. 1, 4 and 7, we can see that the crosstalk errors of the ICA algorithms with pre-whitening converge gradually, as do those of the ICA algorithms with the symmetric MDWUM; but for the adaptive NG-ICA algorithm the latter reaches smaller values at convergence, which shows that the new adaptive NG-ICA algorithm with weighted orthogonal constraint using the symmetric MDWUM is more accurate. Meanwhile, although no pre-whitening is applied to the observed signals in the weighted orthogonal constrained ICA algorithms, the convergence speed and stability of the batch algorithms are essentially the same as those based on pre-whitened data. From Figs. 2, 3, 5 and 6, we can see that the batch NG-ICA and Fast ICA algorithms with the symmetric MDWUM separate speech signals effectively. In addition, Fig. 8 shows that the Fast ICA algorithm with the symmetric MDWUM is monotonically convergent: the mean kurtosis contrast function value of the separated signals increases monotonically to a constant, following the same trajectory as the Fast ICA algorithm with pre-whitening.

7. Conclusion

In this paper, various analytical results on the convergence behavior of the ICA algorithms with weighted orthogonal constraint are presented. In particular:
1) A convenient characterization of the stationary points of the ICA algorithms with weighted orthogonal constraint using the symmetric MDWUM is obtained: a weighted orthogonal matrix B∗ is a stationary point if and only if the gradient of the contrast function with respect to B equals B∗ Rx^{1/2} multiplied from the right by a symmetric matrix.
2) For convex contrast functions, the monotonic convergence of the fixed-point ICA algorithm with weighted orthogonal constraint using the symmetric MDWUM is proved; the result extends to nonconvex contrast functions by adding a weighted orthogonal constraint term to the contrast function.
3) Simulation results show that these new adaptive ICA algorithms using the symmetric MDWUM are better in terms


Fig. 6. Separated signals waveform achieved by Fast ICA algorithm with symmetric MDWUM.

Fig. 7. Convergence stability performance curves of various Fast ICA algorithms.

Fig. 8. Kurtosis contrast functions of various Fast ICA algorithms.

of accuracy than the ones with pre-whitening, and that the new fixed-point ICA algorithms using the symmetric MDWUM converge monotonically. Furthermore, the global stationary point of these new ICA algorithms with weighted orthogonal constraint can be obtained by a conventional ICA optimization step followed by the symmetric MDWUM.

References
[1] A. Hyvarinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, New York, 2001.
[2] Y. Jimin, H. Ting, New fast-ICA algorithms for blind source separation without pre-whitening, Commun. Comput. Inf. Sci. 225 (2011) 579–585.
[3] Y. Jimin, J. Haihong, Z. Qingrui, Adaptive weighted orthogonal constrained algorithm for blind source separation, Digit. Signal Process. 23 (2013) 514–521.
[4] Z. Xiaolong, Z. Xianda, D. Zizhe, et al., Adaptive nonlinear PCA algorithms for blind source separation without pre-whitening, IEEE Trans. Circuits Syst. 53 (3) (2006) 745–752.
[5] T. Xingjia, Z. Xiufang, Y. Jimin, Adaptive step-size natural gradient ICA algorithm with weighted orthogonalization, Circuits Syst. Signal Process. 2013 (6), 1–11.
[6] A. Hyvarinen, E. Oja, A fast fixed-point algorithm for independent component analysis, Neural Comput. 9 (7) (1997) 1483–1492.
[7] P.A. Regalia, E. Kofidis, Monotonic convergence of fixed-point algorithms for ICA, IEEE Trans. Neural Netw. 14 (4) (2003) 943–949.
[8] H. Shen, K. Huper, Local convergence analysis of FastICA, Lecture Notes Comput. Sci., vol. 3889, 2006, pp. 893–900.
[9] E. Oja, Convergence of the symmetrical FastICA algorithm, in: Proc. 9th Int. Conf. Neural Inf. Process., vol. 3, Nov. 2002, pp. 1368–1372.
[10] E. Oja, Y. Zhijian, The FastICA algorithm revisited: convergence analysis, IEEE Trans. Neural Netw. 17 (6) (2006) 1370–1381.
[11] P. Tichavsky, Z. Koldovsky, E. Oja, Performance analysis of the FastICA algorithm and Cramér–Rao bounds for linear independent component analysis, IEEE Trans. Signal Process. 54 (4) (2006) 1189–1203.
[12] A.T. Erdogan, On the convergence of ICA algorithms with symmetric orthogonalization, IEEE Trans. Signal Process. 57 (6) (2009) 2209–2221.
[13] J.H. Manton, Optimization algorithms exploiting unitary constraints, IEEE Trans. Signal Process. 50 (3) (2002) 635–650.
[14] J.F. Cardoso, B. Laheld, Equivariant adaptive source separation, IEEE Trans. Signal Process. 44 (12) (1996) 3017–3030.
[15] S.C. Douglas, S.I. Amari, S.Y. Kung, On gradient adaptation with unit-norm constraints, IEEE Trans. Signal Process. 48 (6) (2000) 1843–1847.
[16] P.A. Absil, R. Mahony, R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.
[17] A. Edelman, T. Arias, S.T. Smith, The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl. 20 (2) (1999) 303–353.
[18] J.F. Cardoso, On the performance of orthogonal source separation algorithms, in: Proc. EUSIPCO, vol. 7, 1994, pp. 776–779.
[19] M.T. Chu, N.T. Trendafilov, On a differential equation approach to the weighted orthogonal Procrustes problem, Stat. Comput. 8 (1998) 125–133.
[20] M.C. Pease, Methods of Matrix Algebra, Academic Press, New York, 1965.
[21] D.P. Bertsekas, A. Nedic, A. Ozdaglar, Convex Analysis and Optimization, Athena Scientific, New York, 2003.
[22] Z. Xiaolong, Z. Xianda, Y. Jimin, Natural gradient-based recursive least-squares algorithm for blind source separation, Sci. China Ser. F 47 (1) (2004) 55–65.
[23] P. Pajunen, J. Karhunen, Least-squares method for blind source separation based on non-linear PCA, Int. J. Neural Syst. 8 (5–6) (1998) 601–612.
[24] H.H. Yang, S. Amari, Adaptive on-line learning algorithms for blind separation: maximum entropy and minimum mutual information, Neural Comput. 9 (7) (1997) 1457–1482.
[25] Z. Xiaolong, Z. Xianda, Blind source separation based on optimally selected estimating functions, J. Xidian Univ. 30 (3) (2003) 335–339.

Xingjia Tang was born in Gansu, China, in 1987. He is a student working toward his M.S. degree in Operational Research and Cybernetics at Xidian University, Xi'an, China, and will be a researcher in the Spectral Imaging Technology Laboratory of the Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences. His research interests include blind signal processing and spectral image processing.

Jimin Ye was born in Shaanxi, China, in 1967. He received his M.S. degree in mathematics from Shaanxi Normal University, Xi'an, China, in 1990, and his Ph.D. degree in information and communication engineering from Xidian University, Xi'an, China, in 2005. Currently, he is a Professor in the School of Mathematics and Statistics of Xidian University. His research interests include statistical signal processing and neural networks in signal processing.

Xiufang Zhang was born in Shandong, China, in 1988. She is a student working toward her M.S. degree in Applied Mathematics at Xidian University, Xi'an, China. Her research interests include data processing and Bayesian networks.