A stable MCA learning algorithm


Computers and Mathematics with Applications 56 (2008) 847–860 www.elsevier.com/locate/camwa

Dezhong Peng a,∗, Zhang Yi a, Jian Cheng Lv a, Yong Xiang b

a Computational Intelligence Laboratory, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, PR China
b School of Engineering and Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3217, Australia

∗ Corresponding author. E-mail addresses: [email protected], [email protected] (D. Peng).

Received 31 December 2006; received in revised form 17 December 2007; accepted 4 January 2008

Abstract

Minor component analysis (MCA) is an important statistical tool for signal processing and data analysis. Neural networks can be used to extract the minor component from input data online. Compared with traditional algebraic approaches, a neural network method has lower computational complexity. The stability of neural network learning algorithms is crucial to practical applications. In this paper, we propose a stable MCA neural network learning algorithm, which has more satisfactory numerical stability than some existing MCA algorithms. The dynamical behavior of the proposed algorithm is analyzed via the deterministic discrete time (DDT) method, and conditions are obtained that guarantee convergence. Simulations are carried out to illustrate the theoretical results.
© 2008 Elsevier Ltd. All rights reserved.

Keywords: Neural networks; Minor component analysis (MCA); Deterministic discrete time (DDT) system; Eigenvector; Eigenvalue

1. Introduction

The eigenvector associated with the smallest eigenvalue of the correlation matrix of the input data is called the minor component (MC). In many information processing areas, it is important to extract the minor component online from high-dimensional input data streams. As an important feature extraction technique, minor component analysis (MCA) has been widely applied in many fields of signal processing and data analysis, such as total least squares (TLS) [1], moving target indication [2], clutter cancellation [3], curve and surface fitting [4], digital beamforming [5] and frequency estimation [6]. In the past several years, many neural network learning algorithms have been proposed to solve the MCA problem, such as [4,7–12]. These learning algorithms can extract the minor component from input data without computing the correlation matrix in advance, which makes the neural network method more suitable for real-time applications and gives it lower computational complexity than traditional algebraic approaches such as eigenvalue decomposition (EVD) and singular value decomposition (SVD). However, there is a divergence problem in some existing MCA algorithms [7,8]. For example, the OJAn algorithm [4], the MCA EXIN algorithm [7] and the LUO algorithm [11] may suffer


from the divergence of the weight vector norm, as discussed in [7]. In order to guarantee convergence, several self-stabilizing MCA learning algorithms have been proposed, for instance, the MOLLER algorithm [8], the CHEN algorithm [9] and the DOUGLAS algorithm [10]. In these self-stabilizing algorithms, the weight vector of the neuron is guaranteed to converge to a normalized minor component. In this paper, following the studies in [8–10], we propose a stable neural network learning algorithm for minor component analysis, which has more satisfactory numerical stability than some existing MCA algorithms.

The convergence of learning algorithms is crucial to practical applications [13–19]. Usually, MCA learning algorithms are described by stochastic discrete time (SDT) systems, and it is very difficult to study the convergence of an SDT system directly. In [8–10], the dynamics of MCA algorithms are analyzed indirectly via a deterministic continuous time (DCT) method. To use the DCT method, the learning rate of the algorithm must approach zero. Unfortunately, due to round-off limitations and tracking requirements, this restrictive condition cannot be satisfied in many practical applications, where a constant learning rate is usually selected [16]. Recently, a deterministic discrete time (DDT) method has been used to study the dynamics of SDT systems [14–19]. The DDT method transforms the SDT system into a corresponding DDT system; convergence results for the original SDT system can then be obtained indirectly by analyzing the dynamics of the corresponding DDT system. The DDT method allows a constant learning rate and preserves the discrete time nature of the original SDT system. Compared with the traditional DCT method, the DDT method is a more reasonable analysis approach for stochastic MCA learning algorithms [16]. In this paper, we analyze the dynamical behavior of the proposed MCA algorithm via the DDT method and obtain conditions that guarantee convergence.

This paper is organized as follows. In Section 2, a stable MCA algorithm is proposed. In Section 3, the dynamics of the proposed algorithm are analyzed via the DDT method. Simulations are carried out to further illustrate the theoretical results in Section 4. Conclusions are given in Section 5.

2. A stable MCA algorithm

Suppose that the input sequence {x(k) | x(k) ∈ R^n (k = 0, 1, 2, ...)} is a zero-mean stationary stochastic process, and consider a linear neuron with the following input–output relation:
\[
y(k) = w^T(k)x(k), \qquad k = 0, 1, 2, \ldots,
\]

where y(k) is the neuron output and w(k) ∈ R^n is the weight vector of the neuron. In [20], in order to solve the problem of principal component analysis (PCA), Xu proposed the following PCA learning algorithm
\[
w(k+1) = w(k) + \eta\left[\left(2 - w^T(k)w(k)\right)y(k)x(k) - y^2(k)w(k)\right], \tag{1}
\]

where η > 0 is the learning rate. PCA is a statistical method for extracting the principal component from input data. The principal component is the direction in which the input data have the largest variance, in contrast to the minor component, which is the direction in which the input data have the smallest variance. Thus, a corresponding MCA learning algorithm can be obtained by simply reversing the sign in (1):
\[
w(k+1) = w(k) - \eta\left[\left(2 - w^T(k)w(k)\right)y(k)x(k) - y^2(k)w(k)\right]. \tag{2}
\]

Unfortunately, theoretical analysis and simulation results show that algorithm (2) is divergent, that is, the norm of the weight vector w(k) approaches infinity. In order to avoid this divergence, we add a penalty term 0.5[1 − w^T(k)w(k)]w(k) to (2) and obtain the following stable MCA learning algorithm:
\[
w(k+1) = w(k) + 0.5\left[1 - w^T(k)w(k)\right]w(k) + \eta\left(w^T(k)w(k) - 2\right)y(k)x(k) + \eta y^2(k)w(k). \tag{3}
\]
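For concreteness, the following is a minimal NumPy sketch (ours, not part of the paper) of one stochastic step of rule (3); the name mca_update and the argument names w, x, eta are our own.

```python
def mca_update(w, x, eta):
    """One stochastic step of the proposed MCA rule (3).

    w, x : 1-D NumPy arrays (weight vector w(k) and input sample x(k))
    eta  : constant learning rate eta > 0
    """
    y = w @ x                            # neuron output y(k) = w^T(k) x(k)
    ww = w @ w                           # squared norm w^T(k) w(k)
    return (w
            + 0.5 * (1.0 - ww) * w       # penalty term 0.5 [1 - w^T(k)w(k)] w(k)
            + eta * (ww - 2.0) * y * x   # eta (w^T(k)w(k) - 2) y(k) x(k)
            + eta * y**2 * w)            # eta y^2(k) w(k)
```

The rule requires only a few inner products and scalar-vector operations per sample, which is the computational advantage over EVD/SVD mentioned in the introduction.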

The reasons for adding this penalty term are: (1) it reduces the deviation of the weight vector norm from unity, i.e., it keeps the proposed algorithm bounded; (2) it improves the stability of the learning algorithm, as will be illustrated by Theorem 3 in Section 3. By applying the conditional expectation operator E{w(k + 1) | w(0), x(i), i < k} to (3) and identifying the conditional expected value as the next iterate, the following DDT system is obtained:
\[
w(k+1) = w(k) + 0.5\left[1 - w^T(k)w(k)\right]w(k) + \eta\left(w^T(k)w(k) - 2\right)Rw(k) + \eta\, w^T(k)Rw(k)\, w(k), \tag{4}
\]


where R = E[x(k)x^T(k)] is the correlation matrix of the input sequence {x(k) | x(k) ∈ R^n (k = 0, 1, 2, ...)}. In this paper, we analyze the dynamics of (4) to interpret the convergence of algorithm (3) indirectly. The DDT system (4) is an "average" of the original stochastic system (3), so the convergence results for (4) reflect some "average" dynamical behavior of (3).
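As an illustration (our own sketch, not from the paper), the averaged system (4) can be iterated directly once R is available; the function name ddt_iterate and its arguments are assumptions.

```python
import numpy as np

def ddt_iterate(R, w0, eta, num_steps):
    """Iterate the DDT system (4) for a given correlation matrix R."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(num_steps):
        ww = w @ w                  # w^T(k) w(k)
        Rw = R @ w                  # R w(k)
        wRw = w @ Rw                # w^T(k) R w(k)
        w = w + 0.5 * (1.0 - ww) * w + eta * (ww - 2.0) * Rw + eta * wRw * w
    return w
```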

For convenience of analysis, we next give some preliminaries. Since the correlation matrix R is a symmetric positive definite matrix, there exists an orthonormal basis of R^n composed of the eigenvectors of R. Let λ1, ..., λn be the eigenvalues of R, ordered by λ1 > ··· > λn > 0. Suppose that {vi | i = 1, 2, ..., n} is an orthonormal basis of R^n such that each vi is a unit eigenvector of R associated with the eigenvalue λi. Thus, for each k ≥ 0, the weight vector w(k) can be represented as
\[
w(k) = \sum_{i=1}^{n} z_i(k)\,v_i, \tag{5}
\]

where z_i(k) (i = 1, 2, ..., n) are the coefficients of w(k) in this basis. From (4) and (5), it holds for all k ≥ 0 that
\[
z_i(k+1) = \left[1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i\right] z_i(k), \qquad i = 1, 2, \ldots, n. \tag{6}
\]
According to the properties of the Rayleigh quotient, it holds that
\[
\lambda_n w^T(k)w(k) \le w^T(k)Rw(k) \le \lambda_1 w^T(k)w(k), \tag{7}
\]
for all k ≥ 0.
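To see how the coordinate form (6) is used, the short check below (our own sketch, with an arbitrary test matrix) iterates (4) directly and iterates (6) in the eigenbasis, and confirms that the two descriptions coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eta = 4, 0.02
A = rng.standard_normal((n, n))
R = A @ A.T / n + np.eye(n)            # symmetric positive definite test matrix
lam, V = np.linalg.eigh(R)             # columns of V are the eigenvectors v_i

w = rng.standard_normal(n)
z = V.T @ w                            # coordinates z_i(k) of w(k)
for _ in range(50):
    ww, wRw = w @ w, w @ R @ w
    w = w + 0.5 * (1 - ww) * w + eta * (ww - 2) * (R @ w) + eta * wRw * w   # system (4)
    z = (1.5 + (eta * lam - 0.5) * ww + eta * wRw - 2 * eta * lam) * z      # recursion (6)

print(np.allclose(V.T @ w, z))         # True: the two descriptions agree
```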

3. Dynamical analysis

In this section, we study the dynamical behavior of DDT system (4) in the following three steps. Step 1: in Theorem 1, we prove that, under some mild conditions, the weight vector in (4) is bounded. Step 2: in Theorem 2, it is proven that the weight vector in (4) converges to the minor component under some conditions. Step 3: in Theorem 3, we analyze the stability of DDT system (4).

In order to analyze the convergence of DDT system (4), we first prove a preliminary result in the following Lemma 1: under some conditions, 1.5 + (ηλi − 0.5)w^T(k)w(k) + ηw^T(k)Rw(k) − 2ηλi ≥ 0 for every i (i = 1, 2, ..., n) in (6). This means that the projection of the weight vector w(k) onto the eigenvector vi, denoted z_i(k) = w^T(k)vi (i = 1, 2, ..., n), does not change its sign under (6).

Lemma 1. Suppose that ηλ1 < 0.25 and ‖w(k)‖² ≤ 1 + 2/(1 − 4ηλn). Then
\[
1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i \ge 0, \qquad i = 1, \ldots, n,
\]
for all k ≥ 0.

Proof. Two cases will be considered to complete the proof.

Case 1: 2 ≤ ‖w(k)‖² ≤ 1 + 2/(1 − 4ηλn). By ηλ1 < 0.25, it follows from (7) that
\[
\begin{aligned}
& 1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i \\
&\quad = 1 + (0.25 - \eta\lambda_i)\left[2 - w^T(k)w(k)\right] - 0.25\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) \\
&\quad \ge 1 + (0.25 - \eta\lambda_n)\left[2 - w^T(k)w(k)\right] - 0.25\,w^T(k)w(k) + \eta\lambda_n w^T(k)w(k) \\
&\quad = 1.5 - 2\eta\lambda_n - (0.5 - 2\eta\lambda_n)\,\|w(k)\|^2 \\
&\quad \ge 1.5 - 2\eta\lambda_n - (0.5 - 2\eta\lambda_n)\left(1 + \frac{2}{1 - 4\eta\lambda_n}\right) = 0.
\end{aligned}
\]


Case 2: ‖w(k)‖² < 2. By ηλ1 < 0.25, it follows from (7) that
\[
\begin{aligned}
& 1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i \\
&\quad = 1 + (0.25 - \eta\lambda_i)\left[2 - w^T(k)w(k)\right] - 0.25\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) \\
&\quad \ge 1 - 0.25\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) \\
&\quad \ge 1 - (0.25 - \eta\lambda_n)\,w^T(k)w(k) \\
&\quad > 0.5 + 2\eta\lambda_n > 0.
\end{aligned}
\]
Combining the above two cases completes the proof. □

Next, let us prove the boundedness of DDT system (4). To guarantee the boundedness of all trajectories of (4), we introduce the following definition of an invariant set.

Definition 1. A compact set S ⊂ R^n is called an invariant set of (4) if, for any w(0) ∈ S, the trajectory of (4) starting from w(0) remains in S for all k ≥ 0.

Clearly, an invariant set is of interest because it provides a way to guarantee the non-divergence of (4). The following lemma provides an invariant set for DDT system (4).

Lemma 2. Denote
\[
S = \left\{\, w \;\middle|\; w \in R^n \ \text{and}\ \|w\|^2 \le 1 + \frac{2}{1 - 4\eta\lambda_n} \right\}.
\]
If ηλ1 < 0.25, then S is an invariant set of (4).

Proof. To complete the proof, two cases will be considered.

Case 1: 2 ≤ ‖w(k)‖² ≤ 1 + 2/(1 − 4ηλn). By ηλ1 < 0.25, it holds from (7) that
\[
\begin{aligned}
& 1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i \\
&\quad = 1 + (0.25 - \eta\lambda_i)\left[2 - w^T(k)w(k)\right] - 0.25\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) \\
&\quad \le 1 - 0.25\,w^T(k)w(k) + \eta\lambda_1 w^T(k)w(k) < 1, \qquad i = 1, \ldots, n.
\end{aligned} \tag{8}
\]
Using Lemma 1, it follows from (6) and (8) that
\[
\begin{aligned}
\|w(k+1)\|^2 &= \sum_{i=1}^{n} z_i^2(k+1) \\
&= \sum_{i=1}^{n} \left[1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i\right]^2 z_i^2(k) \\
&\le \|w(k)\|^2 \le 1 + \frac{2}{1 - 4\eta\lambda_n}.
\end{aligned} \tag{9}
\]
Case 2: ‖w(k)‖² < 2.


By ηλ1 < 0.25 and Lemma 1, it follows from (6) and (7) that
\[
\begin{aligned}
\|w(k+1)\|^2 &= \sum_{i=1}^{n} z_i^2(k+1) \\
&= \sum_{i=1}^{n} \left[1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i\right]^2 z_i^2(k) \\
&= \sum_{i=1}^{n} \left\{1 + (0.25 - \eta\lambda_i)\left[2 - w^T(k)w(k)\right] - 0.25\,w^T(k)w(k) + \eta\,w^T(k)Rw(k)\right\}^2 z_i^2(k) \\
&\le \left[1 + (0.25 - \eta\lambda_n)\left(2 - \|w(k)\|^2\right) - (0.25 - \eta\lambda_1)\|w(k)\|^2\right]^2 \|w(k)\|^2 \\
&\le \left[1 + (0.25 - \eta\lambda_n)\left(2 - \|w(k)\|^2\right)\right]^2 \|w(k)\|^2 \\
&\le \max_{0 \le s \le 1 + \frac{2}{1 - 4\eta\lambda_n}} \left[1 + (0.25 - \eta\lambda_n)(2 - s)\right]^2 s.
\end{aligned} \tag{10}
\]
By ηλ1 < 0.25, it can be calculated that, for all s ∈ [0, 1 + 2/(1 − 4ηλn)],
\[
\left[1 + (0.25 - \eta\lambda_n)(2 - s)\right]^2 s \le \frac{16\,(1.5 - 2\eta\lambda_n)^3}{27\,(1 - 4\eta\lambda_n)}. \tag{11}
\]
For ηλ1 < 0.25, it holds from (10) and (11) that
\[
\|w(k+1)\|^2 \le \frac{16\,(1.5 - 2\eta\lambda_n)^3}{27\,(1 - 4\eta\lambda_n)} < \frac{2}{1 - 4\eta\lambda_n} < 1 + \frac{2}{1 - 4\eta\lambda_n}. \tag{12}
\]
From (9) and (12), we conclude that if ηλ1 < 0.25 and w(k) ∈ S, then ‖w(k+1)‖² ≤ 1 + 2/(1 − 4ηλn), i.e., w(k+1) ∈ S. The proof is completed. □
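The maximum in (11) can be verified numerically; the script below (our own check, with an arbitrary admissible value of ηλn) evaluates the left-hand side of (11) on a fine grid of s and compares it with the stated bound.

```python
import numpy as np

eta_lam_n = 0.1                                  # any value with eta * lambda_n < 0.25
a = 0.25 - eta_lam_n
s = np.linspace(0.0, 1.0 + 2.0 / (1.0 - 4.0 * eta_lam_n), 100001)
lhs = (1.0 + a * (2.0 - s)) ** 2 * s             # [1 + (0.25 - eta*lambda_n)(2 - s)]^2 s
bound = 16.0 * (1.5 - 2.0 * eta_lam_n) ** 3 / (27.0 * (1.0 - 4.0 * eta_lam_n))
print(lhs.max() <= bound + 1e-12)                # True: bound (11) holds on the whole interval
```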

Next, using Lemmas 1 and 2, we prove that if some mild conditions are satisfied, the squared norm ‖w(k)‖² of the weight vector becomes less than a constant ζ (0 < ζ < 2) after some iterations of (4).

Theorem 1. If ηλ1 < 0.25 and w(0) ∈ S, then there exist a constant ζ < 2 and a positive integer N such that ‖w(k)‖² ≤ ζ for all k ≥ N.

Proof. By Lemma 2, S is an invariant set, so w(k) ∈ S for all k ≥ 0. For ηλ1 < 0.25, clearly,
\[
0 < \frac{1 - 4\eta\lambda_n}{1 - 2\eta\lambda_1 - 2\eta\lambda_n} < 2.
\]
Next, three cases will be considered to complete the proof.

Case 1: 0 ≤ ‖w(k)‖² ≤ (1 − 4ηλn)/(1 − 2ηλ1 − 2ηλn). By ηλ1 < 0.25 and Lemma 1, it follows from (6) and (7) that
\[
\begin{aligned}
\|w(k+1)\|^2 &= \sum_{i=1}^{n}\left[1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i\right]^2 z_i^2(k) \\
&= \sum_{i=1}^{n}\left\{1.5 - 0.5\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - \left[2 - w^T(k)w(k)\right]\eta\lambda_i\right\}^2 z_i^2(k) \\
&\le \sum_{i=1}^{n}\left[1.5 - 0.5\|w(k)\|^2 + \eta\lambda_1\|w(k)\|^2 - \left(2 - \|w(k)\|^2\right)\eta\lambda_n\right]^2 z_i^2(k) \\
&= \left[1.5 - 2\eta\lambda_n - (0.5 - \eta\lambda_n - \eta\lambda_1)\,\|w(k)\|^2\right]^2 \|w(k)\|^2 \\
&= \left[1 + \frac{1 - 4\eta\lambda_n}{2}\left(1 - \frac{1 - 2\eta\lambda_1 - 2\eta\lambda_n}{1 - 4\eta\lambda_n}\,\|w(k)\|^2\right)\right]^2 \|w(k)\|^2 \\
&\le \left[1 + \frac{1}{2}\left(1 - \frac{1 - 2\eta\lambda_1 - 2\eta\lambda_n}{1 - 4\eta\lambda_n}\,\|w(k)\|^2\right)\right]^2 \|w(k)\|^2 \\
&= \left[1.5 - \frac{1 - 2\eta\lambda_n - 2\eta\lambda_1}{2 - 8\eta\lambda_n}\,\|w(k)\|^2\right]^2 \|w(k)\|^2 \\
&\le \max_{0 \le s \le \frac{1 - 4\eta\lambda_n}{1 - 2\eta\lambda_1 - 2\eta\lambda_n}} \left[1.5 - \frac{1 - 2\eta\lambda_n - 2\eta\lambda_1}{2 - 8\eta\lambda_n}\,s\right]^2 s.
\end{aligned}
\]
By ηλ1 < 0.25, it can be calculated that
\[
\left[1.5 - \frac{1 - 2\eta\lambda_n - 2\eta\lambda_1}{2 - 8\eta\lambda_n}\,s\right]^2 s \le \frac{1 - 4\eta\lambda_n}{1 - 2\eta\lambda_1 - 2\eta\lambda_n},
\]
for all s ∈ [0, (1 − 4ηλn)/(1 − 2ηλ1 − 2ηλn)]. Thus ‖w(k+1)‖² ≤ (1 − 4ηλn)/(1 − 2ηλ1 − 2ηλn). It therefore holds that if
\[
\|w(0)\|^2 \le \frac{1 - 4\eta\lambda_n}{1 - 2\eta\lambda_1 - 2\eta\lambda_n},
\]
then
\[
\|w(k)\|^2 \le \frac{1 - 4\eta\lambda_n}{1 - 2\eta\lambda_1 - 2\eta\lambda_n} < 2, \qquad k \ge 0.
\]

Case 2: (1 − 4ηλn)/(1 − 2ηλ1 − 2ηλn) < ‖w(k)‖² < 2. By ηλ1 < 0.25, it follows from (7) that
\[
\begin{aligned}
& 1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i \\
&\quad = 1.5 - 0.5\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - \left[2 - w^T(k)w(k)\right]\eta\lambda_i \\
&\quad \le 1.5 - 0.5\|w(k)\|^2 + \eta\lambda_1\|w(k)\|^2 - \left(2 - \|w(k)\|^2\right)\eta\lambda_n \\
&\quad = 1.5 - 2\eta\lambda_n - (0.5 - \eta\lambda_1 - \eta\lambda_n)\,\|w(k)\|^2 \le 1.
\end{aligned}
\]
Then, using Lemma 1, it follows from (6) that
\[
\|w(k+1)\|^2 = \sum_{i=1}^{n}\left[1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i\right]^2 z_i^2(k) \le \|w(k)\|^2.
\]
This means that if (1 − 4ηλn)/(1 − 2ηλ1 − 2ηλn) < ‖w(k)‖² < 2, then ‖w(k)‖² is monotonically decreasing. Combining this with the analysis of Case 1, we obtain that if
\[
\frac{1 - 4\eta\lambda_n}{1 - 2\eta\lambda_1 - 2\eta\lambda_n} < \|w(0)\|^2 < 2,
\]
then ‖w(k)‖² ≤ ‖w(0)‖² < 2 for all k ≥ 0.

Case 3: 2 ≤ ‖w(k)‖² ≤ 1 + 2/(1 − 4ηλn).


By ηλ1 < 0.25 and 2 ≤ ‖w(k)‖², it follows from (7) that
\[
\begin{aligned}
& 1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i \\
&\quad = 1 + (0.25 - \eta\lambda_i)\left[2 - \|w(k)\|^2\right] - 0.25\|w(k)\|^2 + \eta\,w^T(k)Rw(k) \\
&\quad \le 1 - 0.25\|w(k)\|^2 + \eta\lambda_1\|w(k)\|^2 \\
&\quad = 1 - (0.25 - \eta\lambda_1)\,\|w(k)\|^2 \le 0.5 + 2\eta\lambda_1.
\end{aligned} \tag{13}
\]
Using Lemma 1, it follows from (6) and (13) that
\[
\begin{aligned}
\|w(k+1)\|^2 &= \sum_{i=1}^{n} z_i^2(k+1) \\
&= \sum_{i=1}^{n}\left[1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i\right]^2 z_i^2(k) \\
&\le (0.5 + 2\eta\lambda_1)^2\,\|w(k)\|^2 = \beta^2\,\|w(k)\|^2 \le \beta^{2(k+1)}\,\|w(0)\|^2,
\end{aligned}
\]
where β = 0.5 + 2ηλ1. Since ηλ1 < 0.25, clearly 0 < β < 1. Thus, there must exist a positive integer N such that ‖w(N)‖² < 2. Then, using the analysis of Cases 1 and 2, we obtain that for all k ≥ N,
\[
\|w(k)\|^2 \le \max\left\{\|w(N)\|^2,\; \frac{1 - 4\eta\lambda_n}{1 - 2\eta\lambda_1 - 2\eta\lambda_n}\right\} < 2.
\]
Combining the analyses of Cases 1–3 completes the proof. □

At this point, the boundedness of DDT system (4) has been proved. Next, we prove that, under some mild conditions,
\[
\lim_{k \to \infty} w(k) = \pm v_n \tag{14}
\]
in (4), where v_n is the minor component, i.e., the unit eigenvector associated with the smallest eigenvalue of the correlation matrix. From (5), for each k ≥ 0, w(k) can be represented as
\[
w(k) = \sum_{i=1}^{n-1} z_i(k)\,v_i + z_n(k)\,v_n, \tag{15}
\]
where z_i(k) (i = 1, ..., n) are the coefficients of w(k) in the eigenbasis. Clearly, the convergence of w(k) is determined by the convergence of z_i(k) (i = 1, ..., n). From (15), the convergence result (14) can be obtained by proving the following two conclusions:

Conclusion 1. lim_{k→∞} z_i(k) = 0, (i = 1, 2, ..., n − 1).

Conclusion 2. lim_{k→∞} z_n(k) = ±1.

The following Lemmas 3 and 4 prove Conclusions 1 and 2, respectively.

Lemma 3. Suppose that ηλ1 < 0.25. If w(0) ∈ S and w^T(0)v_n ≠ 0, then
\[
\lim_{k \to \infty} z_i(k) = 0, \qquad i = 1, 2, \ldots, n-1.
\]

Proof. By ηλ1 < 0.25, it holds from (7) that
\[
\eta\,w^T(k)Rw(k) - 0.25\,w^T(k)w(k) \le \eta\lambda_1 w^T(k)w(k) - 0.25\,w^T(k)w(k) \le 0. \tag{16}
\]


Using Lemma 1, we have that, for any i (i = 1, ..., n),
\[
1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i \ge 0, \tag{17}
\]
for all k ≥ 0. Using Theorem 1, there must exist a constant ζ < 2 and a positive integer N such that
\[
\|w(k)\|^2 \le \zeta, \tag{18}
\]
for all k ≥ N. By ηλ1 < 0.25, it follows from (7) and (16)–(18) that, for all k ≥ N,
\[
\begin{aligned}
\frac{1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i}{1.5 + (\eta\lambda_n - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_n}
&\le \frac{1 + (0.25 - \eta\lambda_i)\left[2 - w^T(k)w(k)\right]}{1 + (0.25 - \eta\lambda_n)\left[2 - w^T(k)w(k)\right]} \\
&\le \frac{1 + (0.25 - \eta\lambda_i)\,(2 - \zeta)}{1 + (0.25 - \eta\lambda_n)\,(2 - \zeta)} \\
&\le \frac{1 + (0.25 - \eta\lambda_{n-1})\,(2 - \zeta)}{1 + (0.25 - \eta\lambda_n)\,(2 - \zeta)} = \gamma, \qquad i = 1, 2, \ldots, n-1,
\end{aligned} \tag{19}
\]
where
\[
\gamma = \frac{1 + (0.25 - \eta\lambda_{n-1})\,(2 - \zeta)}{1 + (0.25 - \eta\lambda_n)\,(2 - \zeta)}.
\]
Since ζ < 2 and ηλ1 < 0.25, it holds that
\[
0 < \gamma < 1. \tag{20}
\]
By w^T(0)v_n ≠ 0, clearly z_n(0) ≠ 0, and then z_n(k) ≠ 0 for k > 0. Using Lemma 1, it follows from (6) and (19) that
\[
\left[\frac{z_i(k+1)}{z_n(k+1)}\right]^2 = \left[\frac{1.5 + (\eta\lambda_i - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_i}{1.5 + (\eta\lambda_n - 0.5)\,w^T(k)w(k) + \eta\,w^T(k)Rw(k) - 2\eta\lambda_n}\right]^2 \left[\frac{z_i(k)}{z_n(k)}\right]^2 \le \gamma^2\left[\frac{z_i(k)}{z_n(k)}\right]^2 \le \gamma^{2(k-N+1)}\left[\frac{z_i(N)}{z_n(N)}\right]^2, \qquad i = 1, 2, \ldots, n-1,
\]
for all k ≥ N. From (20), clearly,
\[
\lim_{k \to \infty} \frac{z_i(k)}{z_n(k)} = 0, \qquad i = 1, 2, \ldots, n-1.
\]
Given the invariance of S, z_n(k) must be bounded; thus
\[
\lim_{k \to \infty} z_i(k) = 0, \qquad i = 1, 2, \ldots, n-1.
\]
The proof is completed. □

Lemma 4. Suppose that ηλ1 < 0.25. If w(0) ∈ S and w^T(0)v_n ≠ 0, then
\[
\lim_{k \to \infty} z_n(k) = \pm 1.
\]

Proof. By Lemma 3, w(k) converges to the direction of the minor component v_n as k → ∞. Suppose that at time k_0 the weight vector w(k) has converged to the direction of v_n, i.e.,
\[
w(k_0) = z_n(k_0)\,v_n. \tag{21}
\]


From (6), it holds that
\[
z_n(k+1) = z_n(k)\left[(1.5 - 2\eta\lambda_n) + (2\eta\lambda_n - 0.5)\,z_n^2(k)\right] = z_n(k)\left[1 + (0.5 - 2\eta\lambda_n)\left(1 - z_n^2(k)\right)\right], \tag{22}
\]
for all k ≥ k_0. Since w(0) ∈ S, using Lemma 2, it holds that
\[
|z_n(k)|^2 \le \|w(k)\|^2 \le 1 + \frac{2}{1 - 4\eta\lambda_n},
\]
for all k ≥ 0. For ηλ1 < 0.25, clearly,
\[
1 + (0.5 - 2\eta\lambda_n)\left(1 - z_n^2(k)\right) \ge 0, \tag{23}
\]
for all k ≥ 0. Thus, from (22) and (23), it holds that
\[
|z_n(k+1)| = |z_n(k)|\left[1 + (0.5 - 2\eta\lambda_n)\left(1 - z_n^2(k)\right)\right], \tag{24}
\]
for all k ≥ k_0. For ηλ1 < 0.25,
\[
\frac{|z_n(k+1)|}{|z_n(k)|} = 1 + (0.5 - 2\eta\lambda_n)\left(1 - z_n^2(k)\right)
\begin{cases}
> 1, & \text{if } |z_n(k)| < 1, \\
= 1, & \text{if } |z_n(k)| = 1, \\
< 1, & \text{if } |z_n(k)| > 1,
\end{cases} \tag{25}
\]
for all k ≥ k_0. From (25), clearly, |z_n(k)| = 1 is a potential stable equilibrium point of (24). Next, three cases will be considered to complete the proof.

Case 1: |z_n(k_0)| ≤ 1. By ηλn ≤ ηλ1 < 0.25 and |z_n(k_0)| ≤ 1, it can be calculated from (24) that
\[
|z_n(k_0+1)| = (2\eta\lambda_n - 0.5)\,|z_n(k_0)|^3 + (1.5 - 2\eta\lambda_n)\,|z_n(k_0)| \le 1.
\]
Clearly, it holds that |z_n(k)| ≤ 1 for all k ≥ k_0. On the other hand, it holds from (25) that |z_n(k)| is monotonically increasing for all k ≥ k_0. Thus, |z_n(k)| must converge to the equilibrium point 1 as k → ∞.

Case 2: |z_n(k)| > 1 for all k ≥ k_0. From (25), |z_n(k)| is monotonically decreasing for all k ≥ k_0. On the other hand, |z_n(k)| has the lower bound 1 for all k ≥ k_0. Clearly, |z_n(k)| converges to the equilibrium point 1 as k → ∞.

Case 3: |z_n(k_0)| > 1 and there exists a positive integer M (M > k_0) such that |z_n(M)| ≤ 1. Since |z_n(M)| ≤ 1, in the same way as in Case 1, it can be proven that |z_n(k)| must converge to the equilibrium point 1 as k → ∞.

From the analysis of the above three cases, we obtain
\[
\lim_{k \to \infty} |z_n(k)| = 1. \tag{26}
\]
At the same time, from (22) and (23), we have that if z_n(k_0) ≥ 0, then z_n(k) ≥ 0 for all k > k_0, and if z_n(k_0) ≤ 0, then z_n(k) ≤ 0 for all k > k_0. Thus, it holds from (26) that
\[
\lim_{k \to \infty} z_n(k) = \pm 1.
\]
This completes the proof. □
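The scalar recursion (22) is easy to experiment with; the toy iteration below (ours, with an arbitrary admissible ηλn and arbitrary starting points) shows |z_n(k)| approaching 1 from either side of the equilibrium.

```python
eta_lam_n = 0.1                 # any value with eta * lambda_n < 0.25
for z0 in (0.3, 1.4, -0.7):     # starting points below and above |z_n| = 1
    z = z0
    for _ in range(200):
        z = z * (1 + (0.5 - 2 * eta_lam_n) * (1 - z * z))   # recursion (22)
    print(z0, z)                # converges to +1 or -1, matching the sign of z0
```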



In Lemma 3, we have proven that if ηλ1 < 0.25, w(0) ∈ S and w^T(0)v_n ≠ 0, then
\[
\lim_{k \to \infty} z_i(k) = 0, \qquad i = 1, 2, \ldots, n-1. \tag{27}
\]
In Lemma 4, it is proven that if ηλ1 < 0.25, w(0) ∈ S and w^T(0)v_n ≠ 0, then
\[
\lim_{k \to \infty} z_n(k) = \pm 1. \tag{28}
\]


Since
\[
w(k) = \sum_{i=1}^{n-1} z_i(k)\,v_i + z_n(k)\,v_n,
\]
from (27) and (28) we can easily draw the following conclusion.

Theorem 2. If ηλ1 < 0.25, w(0) ∈ S and w^T(0)v_n ≠ 0, then in (4),
\[
\lim_{k \to \infty} w(k) = \pm v_n.
\]

At this point, we have finished the proof of convergence of DDT system (4). Next, we further study the stability of (4). Clearly, the set of all equilibrium points of (4) is {v_1, ..., v_n} ∪ {−v_1, ..., −v_n} ∪ {0}. The following Theorem 3 establishes the stability of the equilibrium points v_n and −v_n of (4).

Theorem 3. The equilibrium points v_n and −v_n are locally asymptotically stable if ηλ1 < 0.25. All other equilibrium points of (4) are unstable.

Proof. Denote
\[
G(w) = w + \eta\left[\left(w^T w - 2\right)Rw + w^T R w\, w\right] + 0.5\left[1 - w^T w\right]w,
\]
so that (4) reads w(k+1) = G(w(k)). Then it follows that
\[
\frac{\partial G}{\partial w} = I + 2\eta C - \eta\, w^T w\, C - \eta\, w^T C w\, I - 4\eta\, C w w^T, \tag{29}
\]
where
\[
C = \frac{1}{4\eta} I - R,
\]
and I is the identity matrix. For the equilibrium point 0, it holds from (29) that
\[
\left.\frac{\partial G}{\partial w}\right|_{0} = I + 2\eta C = 1.5 I - 2\eta R = J_0.
\]
Denote the ith eigenvalue of J_0 by α_0^{(i)}. For ηλ1 < 0.25, it holds that
\[
\alpha_0^{(i)} = 1.5 - 2\eta\lambda_i \ge 1.5 - 2\eta\lambda_1 > 1, \qquad i = 1, 2, \ldots, n.
\]
According to Lyapunov theory, the equilibrium point 0 is unstable. For the equilibrium points ±v_j (j = 1, ..., n), it follows from (29) that
\[
\left.\frac{\partial G}{\partial w}\right|_{v_j} = \left(1 + \eta\lambda_j\right)I - \eta R - \left(1 - 4\eta\lambda_j\right)v_j v_j^T = J_j.
\]
Denote the ith eigenvalue of J_j by α_j^{(i)}. Clearly,
\[
\alpha_j^{(i)} = 1 + \eta\left(\lambda_j - \lambda_i\right) \ \text{ if } i \ne j, \qquad \alpha_j^{(i)} = 4\eta\lambda_j \ \text{ if } i = j.
\]
For any j ≠ n, it holds that
\[
\alpha_j^{(n)} = 1 + \eta\left(\lambda_j - \lambda_n\right) > 1.
\]
Clearly, the equilibrium points ±v_j (j ≠ n) are unstable. For the equilibrium points ±v_n, since ηλ1 < 0.25, it follows that
\[
\alpha_n^{(i)} = 1 + \eta\left(\lambda_n - \lambda_i\right) < 1 \ \text{ if } i \ne n, \qquad \alpha_n^{(i)} = 4\eta\lambda_n < 4\eta\lambda_1 < 1 \ \text{ if } i = n.
\]
Thus, ±v_n are asymptotically stable. The proof is completed. □
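As a numerical illustration of Theorem 3 (our own sketch, not part of the paper), one can build the Jacobian J_j at each equilibrium ±v_j from the closed form above and inspect its spectral radius: it falls below one only at the minor component.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eta = 4, 0.02
A = rng.standard_normal((n, n))
R = A @ A.T / n + np.eye(n)              # test correlation matrix with eta * lambda_1 < 0.25
lam, V = np.linalg.eigh(R)               # ascending order: index 0 is lambda_n

I = np.eye(n)
for j in range(n):
    vj, lj = V[:, j], lam[j]
    Jj = (1 + eta * lj) * I - eta * R - (1 - 4 * eta * lj) * np.outer(vj, vj)
    print(j, np.abs(np.linalg.eigvals(Jj)).max())   # < 1 only for j = 0 (the minor component)
```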



In this section, we have analyzed the dynamics of the DDT system (4) associated with the proposed stable MCA learning algorithm (3) and proved that the weight vector of (4) converges to the stable equilibrium points ±v_n under some conditions.

4. Simulation results

In this section, we provide two simulation results to illustrate the convergence and stability of the proposed MCA algorithm (3) in a deterministic case and a stochastic case, respectively. In these simulations, we compare the performance of the proposed MCA algorithm (3) with that of the following MCA algorithms:

(1) MOLLER MCA algorithm [8]
\[
w(k+1) = w(k) - \eta\left[2 w^T(k)w(k) - 1\right]y(k)x(k) + \eta y^2(k)w(k), \tag{30}
\]
(2) DOUGLAS MCA algorithm [10]
\[
w(k+1) = w(k) - \eta\, w^T(k)w(k)\, w^T(k)w(k)\, y(k)x(k) + \eta y^2(k)w(k), \tag{31}
\]
(3) OJAm MCA algorithm [21]
\[
w(k+1) = w(k) - \eta\, y(k)x(k) + \frac{\eta y^2(k)w(k)}{w^T(k)w(k)\, w^T(k)w(k)}. \tag{32}
\]
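For reference, the three competing stochastic rules (30)–(32) can be written as one-step update functions alongside rule (3); the sketch below is ours and mirrors the earlier mca_update function (w, x are 1-D NumPy arrays, eta is the learning rate).

```python
def moller_update(w, x, eta):      # MOLLER rule (30)
    y, ww = w @ x, w @ w
    return w - eta * (2 * ww - 1) * y * x + eta * y**2 * w

def douglas_update(w, x, eta):     # DOUGLAS rule (31)
    y, ww = w @ x, w @ w
    return w - eta * ww * ww * y * x + eta * y**2 * w

def ojam_update(w, x, eta):        # OJAm rule (32)
    y, ww = w @ x, w @ w
    return w - eta * y * x + eta * y**2 * w / (ww * ww)
```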

In order to measure the convergence speed and accuracy of these algorithms, we compute the norm of w(k) and the direction cosine at the kth update by [22]:
\[
\text{Direction Cosine}(k) = \frac{\left|w^T(k)\, v_n\right|}{\|w(k)\| \cdot \|v_n\|},
\]
where v_n is the eigenvector associated with the smallest eigenvalue of the correlation matrix R. Clearly, if Direction Cosine(k) converges to 1, then the weight vector w(k) approaches the direction of the minor component v_n.

Firstly, let us consider performance comparisons in the deterministic case. By taking the conditional expectation, the DDT systems associated with the above three MCA algorithms (30)–(32) are obtained as follows:

(1) DDT system associated with the MOLLER MCA algorithm (30)
\[
w(k+1) = w(k) - \eta\left[2 w^T(k)w(k) - 1\right]Rw(k) + \eta\, w^T(k)Rw(k)\, w(k), \tag{33}
\]
(2) DDT system associated with the DOUGLAS MCA algorithm (31)
\[
w(k+1) = w(k) - \eta\, w^T(k)w(k)\, w^T(k)w(k)\, Rw(k) + \eta\, w^T(k)Rw(k)\, w(k), \tag{34}
\]
(3) DDT system associated with the OJAm MCA algorithm (32)
\[
w(k+1) = w(k) - \eta\, Rw(k) + \frac{\eta\, w^T(k)Rw(k)\, w(k)}{w^T(k)w(k)\, w^T(k)w(k)}. \tag{35}
\]


Fig. 1. Convergence of ‖w(k)‖ in the deterministic case.

Fig. 2. Convergence of Direction Cosine of w(k) in the deterministic case.

We randomly generate a 4 × 4 symmetric positive definite matrix:
\[
R^* =
\begin{bmatrix}
1.6064 & 0.2345 & 0.6882 & 1.2175 \\
0.2345 & 1.6153 & 0.3753 & 0.0938 \\
0.6882 & 0.3753 & 1.8981 & 0.6181 \\
1.2175 & 0.0938 & 0.6181 & 1.1799
\end{bmatrix}.
\]

We use DDT systems (4) and (33)–(35) to extract the unit eigenvector associated with the smallest eigenvalue of R*, with η = 0.02 and randomly generated w(0). The simulation results, evaluated over 100 Monte Carlo runs, are presented in Figs. 1 and 2, which show the convergence of ‖w(k)‖ and Direction Cosine(k), respectively. From the simulation results, we find that ‖w(k)‖ in DDT system (4) converges to 1 faster than in the other three systems.

Next, let us use stochastic input data to compare the stability of the proposed algorithm (3) with that of the MOLLER algorithm (30), the DOUGLAS algorithm (31) and the OJAm algorithm (32). In this simulation, the input data sequence is generated by [21]
\[
x(k) = C \cdot h(k),
\]
where C = randn(5, 5)/5 and h(k) ∈ R^{5×1} is Gaussian and randomly produced. The four MCA learning algorithms (3) and (30)–(32) are used to extract the minor component from the input data sequence {x(k)}. Figs. 3 and 4 show the convergence of ‖w(k)‖ and Direction Cosine(k), respectively. From the simulation results, we find that the Direction Cosine(k) of all these MCA algorithms converges to 1 at approximately the same speed, and that the weight vector norm ‖w(k)‖ of the proposed MCA algorithm (3) has better numerical stability than that of the other three algorithms.
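A minimal version of the stochastic experiment can be reproduced as follows (our sketch; the learning rate 0.02 and the number of steps are assumptions, since the paper does not state them for this case).

```python
import numpy as np

rng = np.random.default_rng(2)
n, eta, steps = 5, 0.02, 5000                # eta and steps are assumed values
C = rng.standard_normal((n, n)) / 5          # input model x(k) = C h(k), h(k) Gaussian
v_n = np.linalg.eigh(C @ C.T)[1][:, 0]       # true minor component of R = C C^T

w = rng.standard_normal(n)
for _ in range(steps):
    x = C @ rng.standard_normal(n)
    y, ww = w @ x, w @ w
    w = w + 0.5 * (1 - ww) * w + eta * (ww - 2) * y * x + eta * y**2 * w   # rule (3)

print(np.linalg.norm(w), abs(w @ v_n) / np.linalg.norm(w))   # both should approach 1
```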


Fig. 3. Convergence of ‖w(k)‖ in the stochastic case.

Fig. 4. Convergence of Direction Cosine of w(k) in the stochastic case.

5. Conclusions

In this paper, we propose a stable MCA learning algorithm which has more satisfactory convergence and stability than some existing MCA approaches. The dynamics of the proposed algorithm are analyzed by the deterministic discrete time (DDT) method. It is proven that if some mild conditions on the learning rate and the initial weight vector are satisfied, the proposed algorithm converges to the minor component with unit norm. At the same time, the stability analysis shows that the minor component is the asymptotically stable equilibrium point of the proposed algorithm. Performance comparisons between our proposed algorithm and some existing MCA approaches are carried out in the deterministic case and the stochastic case, and they illustrate the performance improvement of the former.

Acknowledgments

This work was supported by the National Science Foundation of China under Grants 60471055 and 60702072, the Australian Research Council under Grant DP0773446, and the Youth Science and Technology Foundation of UESTC under Grant L08011001JX0774.

References

[1] K. Gao, M.O. Ahmad, M.N. Swamy, Learning algorithm for total least squares adaptive signal processing, Electron. Lett. 28 (4) (1992) 430–432.


[2] R. Klemm, Adaptive airborne MTI: An auxiliary channel approach, Proc. Inst. Elect. Eng. pt. F 134 (1987) 269–276.
[3] S. Barbarossa, E. Daddio, G. Galati, Comparison of optimum and linear prediction technique for clutter cancellation, Proc. Inst. Elect. Eng. pt. F 134 (1987) 277–282.
[4] L. Xu, E. Oja, C. Suen, Modified Hebbian learning for curve and surface fitting, Neural Netw. 5 (1992) 441–457.
[5] J.W. Griffiths, Adaptive array processing, a tutorial, Proc. Inst. Elect. Eng. pt. F 130 (1983) 3–10.
[6] G. Mathew, V. Reddy, Development and analysis of a neural network approach to Pisarenko's harmonic retrieval method, IEEE Trans. Signal Process. 42 (1994) 663–667.
[7] G. Cirrincione, M. Cirrincione, J. Herault, S.V. Huffel, The MCA EXIN neuron for the minor component analysis, IEEE Trans. Neural Netw. 13 (1) (2002) 160–187.
[8] R. Möller, A self-stabilizing learning rule for minor component analysis, Int. J. Neural Syst. 14 (2004) 1–8.
[9] T. Chen, S. Amari, Unified stabilization approach to principal and minor components extraction, Neural Netw. 14 (2001) 1377–1387.
[10] S.C. Douglas, S.Y. Kung, S. Amari, A self-stabilized minor subspace rule, IEEE Signal Process. Lett. 5 (12) (1998) 330–332.
[11] F.L. Luo, R. Unbehauen, A minor subspace analysis algorithm, IEEE Trans. Neural Netw. 8 (5) (2005) 1149–1155.
[12] E. Oja, Principal components, minor components, and linear neural networks, Neural Netw. 5 (1992) 927–935.
[13] Z. Yi, Y. Fu, H.J. Tang, Neural networks based approach computing eigenvectors and eigenvalues of symmetric matrix, Comput. Math. Appl. 47 (2004) 1155–1164.
[14] P.J. Zufiria, On the discrete-time dynamics of the basic Hebbian neural-network nodes, IEEE Trans. Neural Netw. 13 (6) (2002) 1342–1352.
[15] Q. Zhang, On the discrete-time dynamics of a PCA learning algorithm, Neurocomputing 55 (2003) 761–769.
[16] Z. Yi, M. Ye, J.C. Lv, K.K. Tan, Convergence analysis of a deterministic discrete time system of Oja's PCA learning algorithm, IEEE Trans. Neural Netw. 16 (6) (2005) 1318–1328.
[17] D. Peng, Z. Yi, Dynamics of generalized PCA and MCA learning algorithms, IEEE Trans. Neural Netw. 18 (6) (2007) 1777–1784.
[18] D. Peng, Z. Yi, Convergence analysis of the OJAn MCA learning algorithm by the deterministic discrete time method, Theoret. Comput. Sci. 378 (1) (2007) 87–100.
[19] D. Peng, Z. Yi, A modified Oja-Xu MCA learning algorithm and its convergence analysis, IEEE Trans. Circuits Syst. II 54 (4) (2007) 348–352.
[20] L. Xu, Least mean square error reconstruction principle for self-organizing neural-nets, Neural Netw. 6 (1993) 627–648.
[21] D.Z. Feng, W.X. Zheng, Y. Jia, Neural network learning algorithms for tracking minor subspace in high-dimensional data stream, IEEE Trans. Neural Netw. 16 (3) (2005) 513–521.
[22] C. Chatterjee, Z. Kang, V.P. Roychowdhury, Algorithm for accelerated convergence of adaptive PCA, IEEE Trans. Neural Netw. 11 (2) (2000) 338–355.