Neural Networks 23 (2010) 865–871

A self-stabilizing MSA algorithm in high-dimension data stream

Xiangyu Kong a,∗, Changhua Hu a, Chongzhao Han b

a The Xi’an Research Institute of High Technology, Xi’an, Shaanxi 710025, PR China
b School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, PR China
∗ Corresponding author. E-mail address: [email protected] (X. Kong).

Article history: Received 15 October 2008; received in revised form 1 April 2010; accepted 6 April 2010.

Keywords: Minor subspace analysis (MSA); Neural network; Eigenvalue; Eigenvector; Adaptive algorithm

Abstract: Minor subspace analysis (MSA) is a statistical method for extracting the subspace spanned by all the eigenvectors associated with the minor eigenvalues of the autocorrelation matrix of a high-dimension vector sequence. In this paper, we propose a self-stabilizing neural network learning algorithm for tracking the minor subspace in a high-dimension data stream. The dynamics of the proposed algorithm are analyzed via the corresponding deterministic continuous time (DCT) system and stochastic discrete time (SDT) system methods. The proposed algorithm provides efficient online learning for tracking the MS and can track an orthonormal basis of the MS. Computer simulations are carried out to confirm the theoretical results. © 2010 Elsevier Ltd. All rights reserved.

1. Introduction

The minor subspace (MS) is the subspace spanned by all the eigenvectors associated with the minor eigenvalues of the autocorrelation matrix of a high-dimension data stream, and the minor component (MC) is the eigenvector associated with the smallest eigenvalue of the correlation matrix of the input data. In many information processing areas it is important to track the MS online (or to extract the MC) from a high-dimensional input data stream. As an important tool for signal processing and data analysis, MSA has been applied to adaptive direction-of-arrival (DOA) estimation (Durrani & Shatman, 1983), total least squares (TLS) (Gao, Ahmad, & Swamy, 1994; Oja & Karhunen, 1994), data compression in data communications, computer vision (Cirrincione, 1998), curve and surface fitting (Xu, Oja, & Suen, 1992), etc. Hence, it is of interest to find MSA learning algorithms with low computational complexity for adaptive signal processing applications.

Adaptive algorithms for tracking one minor component have been proposed by several authors (Cirrincione, Herault, & Van Huffel, 2002; Feng, Bao, & Jiao, 1998; Luo & Unbehauen, 1997a; Mao, Fan, & Li, 2006; Oja, 1992; Peng, Zhang, Lv, & Xiang, 2007, 2008; Peng, Zhang, & Xiang, 2008; Xu et al., 1992; Zhang & Leung, 2000, etc.). These learning algorithms can extract the minor component from the input data without computing the correlation matrix in advance, which makes neural network methods more suitable for real-time applications and gives them lower computational complexity than traditional algebraic approaches such as eigenvalue decomposition (EVD) and singular value decomposition (SVD). The dynamics of many minor component analysis (MCA) algorithms have been studied (Cirrincione et al., 2002; Taleb & Cirrincione, 1999). A divergence problem of the weight vector norm has been found in some existing MCA algorithms, such as the OJAn algorithm (Xu et al., 1992) and the LUO algorithm (Luo & Unbehauen, 1997a, 1997b), and sudden divergence (divergence occurring in finite time) has also been found, under certain conditions, in algorithms such as the LUO, OJA (Xu et al., 1992) and OJA+ (Oja, 1992) algorithms. In order to guarantee convergence, several self-stabilizing MCA learning algorithms have been proposed (Chen & Amari, 2001; Douglas, Kung, & Amari, 1998; Möller, 2004); in these algorithms, the weight vector of the neuron is guaranteed to converge to a normalized minor component. Recently, an efficient MCA algorithm, called the OJAm algorithm, was proposed in Feng, Zheng, and Jia (2005); it has a simple expression and works more satisfactorily than other MCA algorithms for tracking one MC.

In the recent decade, many algorithms (Chiang & Chen, 1999; Douglas et al., 1998; Luo & Unbehauen, 1997a, 1997b; Mathew & Reddy, 1994; Mathew, Reddy, & Dasgupta, 1995; Ouyang, Bao, Liao, & Ching, 2001) for tracking the MS or MC have been proposed on the basis of feedforward neural network models. Mathew and Reddy (1994) proposed an MS algorithm based on a feedback neural network structure with a sigmoid activation function. Using the inflation method, Luo and Unbehauen (1997a, 1997b) proposed an MSA algorithm that does not need any normalization operation. Douglas et al. (1998) presented a self-stabilizing minor subspace rule that needs neither periodic normalization nor matrix inversion. Chiang and Chen (1999) showed that a learning algorithm can extract multiple MCs in parallel with appropriate initialization instead of the inflation method. On the basis of an information criterion, Ouyang et al. (2001) developed an adaptive MC tracker that automatically finds the MS without using the inflation method. Recently, Feng et al. (2005) proposed the OJAm algorithm and extended it for tracking multiple MCs or the MS, which makes the corresponding state matrix tend to a column-orthonormal basis of the MS.

The objective of this paper is to find a more satisfactory learning algorithm for adaptively tracking the MS. For neural network learning algorithms, convergence is crucial to practical applications. Usually, MSA (or MCA) learning algorithms are described by stochastic discrete time systems. Traditionally, the convergence of MSA algorithms is analyzed via a corresponding DCT system, but some restrictive conditions must be satisfied in this method, and it is recognized that using only the DCT method does not reveal some of the most important features of these algorithms. The SDT method uses the stochastic discrete learning laws directly to analyze the temporal behavior of MCA algorithms and has received more and more attention (Cirrincione et al., 2002; Feng et al., 2005; Taleb & Cirrincione, 1999). In this paper, we propose a self-stabilizing MCA algorithm and extend it for tracking the MS; the resulting algorithm has more satisfactory numerical stability than some existing MSA algorithms, and its dynamics are analyzed via the corresponding DCT and SDT methods.

The paper is organized as follows. In Section 2, a self-stabilizing MCA algorithm is presented and its performance is analyzed. Section 3 is devoted to a self-stabilizing MSA algorithm and its performance analysis. In Section 4, the simulation results are shown. Finally, Section 5 concludes the paper.

2. A self-stabilizing algorithm for tracking one MC

2.1. A self-stabilizing MCA algorithm

Let us consider a single linear neuron with the following input–output relation:

y(t) = W^T(t)X(t), (t = 0, 1, 2, ...), (1)

where y(t) is the neuron output, the input sequence {X(t) | X(t) ∈ R^n (t = 0, 1, 2, ...)} is a zero-mean stationary stochastic process, and W(t) ∈ R^n (t = 0, 1, 2, ...) is the weight vector of the neuron. Let R = E[X(t)X^T(t)] denote the autocorrelation matrix of the input sequence X(t), and let λ_i and v_i (i = 1, 2, ..., N) denote the eigenvalues and the corresponding orthonormal eigenvectors of R, respectively. We can arrange the orthonormal eigenvectors v_1, v_2, ..., v_N such that the corresponding eigenvalues are in nondecreasing order: 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_N. Although linear neurons are the simplest units from which to build neural networks, they have many important applications in signal processing. Interestingly, a linear neuron trained by an unsupervised constrained Hebbian rule (Oja, 1992) can track the principal component of an input vector sequence. Similarly, a linear neuron trained by an unsupervised constrained anti-Hebbian rule (Oja, 1992; Xu et al., 1992) can adaptively extract the minor component from a multi-dimensional data stream. The dynamics of some major MCA algorithms, such as the OJA, OJAn, OJA+, LUO and FENG algorithms, have been studied, and the MCA EXIN algorithm, based on the gradient flow of the Rayleigh quotient of the autocorrelation matrix R (= E[X(t)X^T(t)]) on R^n − {0}, was presented as follows (Cirrincione et al., 2002):

W(t + 1) = W(t) − α(t)(W^T(t)W(t))^{-1} [y(t)X(t) − y^2(t)W(t)(W^T(t)W(t))^{-1}],

where α(t) is the learning rate, which controls the stability and rate of convergence of the algorithm. The MCA EXIN algorithm has been analyzed in detail, and it was concluded that it is the best MCA neuron in terms of stability (no finite-time divergence), speed, and accuracy. However, using the same analytical approach, it is easy to show that MCA EXIN may converge to infinity. In order to avoid this possible divergence while preserving the good performance of MCA EXIN as far as possible, we propose a modified algorithm as follows:

W(t + 1) = W(t) − α(t)(W^T(t)W(t))^{-1} [y(t)X(t) − (y^2(t) + 1 − ‖W(t)‖^4)(W^T(t)W(t))^{-1} W(t)]. (2)

The difference between the MCA EXIN algorithm and the modified one is that the latter refers to the OJA+ algorithm and adds a term (1 − ‖W(t)‖^4)W(t). This gives our algorithm the satisfactory one-tending property (explained later), and its performance outperforms some existing MCA algorithms such as the OJAm algorithm.

2.2. The convergence analysis via DCT method

Usually, MCA learning algorithms are described by SDT systems. It is very difficult to study the convergence of an SDT system directly, so the dynamics of most MCA algorithms are proved indirectly via a corresponding DCT system. This method is very simple and is an efficient analysis tool within the limits of validity of the asymptotic theory. According to stochastic approximation theory (Kushner & Clark, 1976; Ljung, 1977), it can be shown that if some conditions are satisfied, then the asymptotic limit of the discrete learning algorithm (2) can be obtained by solving the corresponding continuous-time differential equation:

dW(t)/dt = −(W^T(t)W(t))^{-1} [y(t)X(t) − (y^2(t) + 1 − ‖W(t)‖^4)(W^T(t)W(t))^{-1} W(t)]. (3)

Assume that X(t) is stationary and not correlated with W(t). Taking the expectation on both sides of Eq. (3), Eq. (3) can be approximated by the following ordinary differential equation (ODE):

dW(t)/dt = −(W^T(t)W(t))^{-1} [RW(t) − (W^T(t)RW(t) + 1 − ‖W(t)‖^4)(W^T(t)W(t))^{-1} W(t)]. (4)
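To make the update concrete, the following minimal Python sketch runs the stochastic rule (2) on a synthetic zero-mean stream; the data model, learning rate and iteration count are our own illustrative assumptions (loosely modeled on the simulation setup of Section 4), not settings prescribed by the algorithm.

```python
import numpy as np

# Illustrative sketch of the proposed MCA update (2); data model and step size are assumptions.
rng = np.random.default_rng(1)
n, steps, alpha = 5, 5000, 0.1
C = rng.standard_normal((n, n)) / n
X = C @ rng.standard_normal((n, steps))        # zero-mean input stream X(t)

W = rng.standard_normal(n)
W /= np.linalg.norm(W)                         # unit initial modulus (see the OTP in Section 2.4)
for t in range(steps):
    x = X[:, t]
    y = W @ x                                  # y(t) = W^T(t) X(t)
    g = float(W @ W)                           # W^T(t) W(t); note ||W||^4 = g^2
    # update (2): W <- W - alpha * g^{-1} [ y X - (y^2 + 1 - ||W||^4) g^{-1} W ]
    W = W - alpha / g * (y * x - (y * y + 1.0 - g * g) / g * W)

R = C @ C.T                                    # true autocorrelation of X(t)
_, V = np.linalg.eigh(R)                       # eigenvalues in ascending order
v1 = V[:, 0]                                   # eigenvector of the smallest eigenvalue
print(np.linalg.norm(W), abs(W @ v1))          # expected to approach 1 and 1, respectively
```

With a small enough learning rate one would expect ‖W‖ to drift toward one and the direction cosine toward one, which is what the following analysis establishes for the averaged dynamics.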

The asymptotic property of (4) approximates that of (3), and the asymptotic property of (4) is ensured by the following theorem.

Theorem 1. Let R be a semipositive definite matrix, and let λ_1 and v_1 be, respectively, its smallest eigenvalue and the corresponding normalized eigenvector with nonzero first component. If the initial weight vector W(0) satisfies W^T(0)v_1 ≠ 0 and λ_1 is single, then lim_{t→∞} W(t) = ±v_1, i.e. W(t) tends to ±v_1 asymptotically as t → ∞.

Proof. Denote the N eigenvalues of R by λ_1, λ_2, ..., λ_N, where λ_1 is the smallest eigenvalue, and denote a set of corresponding normalized eigenvectors by v_1, v_2, ..., v_N. Then R and W(t) can be written as

R = Σ_{i=1}^{N} λ_i v_i v_i^T,   W(t) = Σ_{i=1}^{N} f_i(t) v_i. (5)


Then it can be obtained that

dW(t)/dt = Σ_{i=1}^{N} (df_i(t)/dt) v_i
= [−Σ_{i=1}^{N} ‖W(t)‖^2 (λ_i f_i(t) v_i) + (W^T(t)RW(t) + 1 − ‖W(t)‖^4) Σ_{i=1}^{N} (f_i(t) v_i)] / ‖W(t)‖^4
= Σ_{i=1}^{N} ((−λ_i ‖W(t)‖^2 + (W^T(t)RW(t) + 1 − ‖W(t)‖^4)) f_i(t) v_i) ‖W(t)‖^{-4}. (6)

Then, we have

df_i(t)/dt = (−λ_i ‖W(t)‖^2 + (W^T(t)RW(t) + 1 − ‖W(t)‖^4)) f_i(t) ‖W(t)‖^{-4},   ∀i = 1, 2, ..., N. (7)

Because f_i(t) = W^T(t)v_i and W^T(0)v_1 ≠ 0, f_1(t) ≠ 0 (∀t ≥ 0) holds. Define

γ_i(t) = f_i(t)/f_1(t)   (i = 2, ..., N). (8)

Then the following differential equation is obtained:

dγ_i(t)/dt = (λ_1 − λ_i) ‖W(t)‖^2 f_i(t) f_1(t) / (‖W(t)‖^4 f_1^2(t)) = (λ_1 − λ_i) γ_i(t) / ‖W(t)‖^2, (9)

whose solution on [0, ∞) is

γ_i(t) = γ_i(0) exp( ∫_0^t (λ_1 − λ_i)/‖W(τ)‖^2 dτ ),   ∀i = 2, ..., N. (10)

If λ_i > λ_1 (that is, the smallest eigenvalue is single, not multiple), then γ_i(t) tends to zero as t → ∞ (∀i = 2, ..., N); consequently lim_{t→∞} f_i(t) = 0 (∀i = 2, ..., N). Thus we obtain

lim_{t→∞} W(t) = lim_{t→∞} ( Σ_{i=1}^{N} f_i(t) v_i ) = lim_{t→∞} f_1(t) v_1. (11)

From (11), it follows that

lim_{t→∞} ‖W(t)‖ = lim_{t→∞} ‖f_1(t) v_1‖ = lim_{t→∞} |f_1(t)|. (12)

However, by differentiating W^T W along the solution of (4), it holds that

dW^T(t)W(t)/dt = −2(W^T(t)W(t))^{-2} [‖W(t)‖^2 W^T(t)RW(t) − (W^T(t)RW(t) + 1 − ‖W(t)‖^4) ‖W(t)‖^2]
= −2(W^T(t)W(t))^{-1} [W^T(t)RW(t) − (W^T(t)RW(t) + 1 − ‖W(t)‖^4)]
= 2(W^T(t)W(t))^{-1} [1 − ‖W(t)‖^4]
{ > 0 for ‖W(t)‖ < 1; < 0 for ‖W(t)‖ > 1; = 0 for ‖W(t)‖ = 1. (13)

This shows that lim_{t→∞} ‖W(t)‖ = 1, so lim_{t→∞} f_1(t) = ±1, which gives lim_{t→∞} W(t) = ±v_1. This completes the proof of the theorem. □

The asymptotic behavior of the MCA ODE can only be considered within the limits of validity of this asymptotic theory, so the above theorem's result is only approximately valid in the first part of the time evolution of the MCA learning law, i.e., in approaching the minor component.

2.3. The divergence analysis

Cirrincione et al. (2002) found that, in an exact solution of the differential equation associated with some MCA algorithms such as the LUO, OJAn and MCA EXIN algorithms, the weight vector length does not deviate from its initial value. However, when a numerical procedure (such as Euler's method) is applied, all these rules are plagued by ''divergence or sudden divergence'' of the weight vector length. Clearly, analysis of the ordinary differential equation alone is not sufficient to determine the convergence of the weight vector length of MCA algorithms. Thus, it is necessary and important to analyze the temporal behavior of MCA algorithms via the stochastic discrete law. The purpose of this section is the analysis of the temporal behavior of the proposed algorithm, using not only the ODE approximation but, above all, the stochastic discrete laws. Cirrincione et al. (2002) found a divergence phenomenon in finite time (also called sudden divergence) for the LUO algorithm (OJA and OJA+ also exhibit this phenomenon under certain conditions). Sudden divergence is very adverse for practical application. Does the proposed algorithm suffer from sudden divergence? In this section, we study the proposed algorithm in detail. Averaging the behavior of the weight vector after the first critical point has been reached (t ≥ t_0), the following formula can be written:

W(t) = ‖W(t)‖ v_1,   ∀t ≥ t_0, (14)

where v_1 is the unit eigenvector associated with the smallest eigenvalue of R. From (2), it holds that

‖W(t + 1)‖^2 = ‖W(t)‖^2 + ‖ΔW(t)‖^2 + 2W^T(t)ΔW(t).

Neglecting the second-order term in α(t), the above equation can be regarded as a discretization of the following ODE:

d‖W(t)‖^2/dt = E{2W^T(t)ΔW(t)}
= E{2W^T(t) · {−(W^T(t)W(t))^{-1} [y(t)X(t) − (y^2(t) + 1 − ‖W(t)‖^4)(W^T(t)W(t))^{-1} W(t)]}}
= −2(W^T(t)W(t))^{-2} [‖W(t)‖^2 W^T(t)RW(t) − (W^T(t)RW(t) + 1 − ‖W(t)‖^4) ‖W(t)‖^2]
= 2(W^T(t)W(t))^{-1} [1 − ‖W(t)‖^4]. (15)

The above equation can be approximated as

dp/dt = (2/p)(1 − p^2), (16)

where p = ‖W(t)‖^2. Define the instant of time at which the MC direction is approached as t_0 and the corresponding value of the squared weight modulus as p_0. The solution of (16) is given by

|1 − p^2| = |1 − p_0^2| e^{−4(t − t_0)}   if p_0 ≠ 1,
p = p_0   if p_0 = 1. (17)

Fig. 1 shows these results for different values of p_0. From the above results, it can be seen that the norm of the weight vector increases or decreases to one according to the initial weight modulus, and sudden divergence does not happen in finite time. From (17), it is obvious that the rate of increase or decrease of the weight modulus depends only on the initial weight modulus and is unrelated to the eigenvalues of the autocorrelation matrix of the input vector.

Fig. 1. The asymptotic behavior of the ODE for different values of the initial conditions.
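The closed-form behavior (17) is easy to verify numerically. The sketch below (our own check, with an arbitrary initial modulus p_0) integrates (16) by Euler's method and compares the result with (17).

```python
import numpy as np

# Numerical check of the ODE (16) against its closed-form solution (17); p0 is an arbitrary choice.
p0, t0, dt, T = 1.6, 0.0, 1e-3, 3.0
ts = np.arange(t0, T, dt)

p, p_num = p0, []
for t in ts:
    p_num.append(p)
    p += dt * (2.0 / p) * (1.0 - p * p)        # Euler step of dp/dt = (2/p)(1 - p^2)
p_num = np.array(p_num)

# (17): |1 - p^2| = |1 - p0^2| e^{-4(t - t0)}, i.e. p^2 = 1 - (1 - p0^2) e^{-4(t - t0)}
p_closed = np.sqrt(1.0 - (1.0 - p0 ** 2) * np.exp(-4.0 * (ts - t0)))
print(np.max(np.abs(p_num - p_closed)))        # small discretization error; both curves tend to 1
```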


2.4. The convergence analysis via SDT method

The above analysis is based on a fundamental theorem of stochastic approximation theory, so the result obtained is an approximation valid under certain conditions. Using the stochastic discrete laws is a direct analytical method. The purpose of this section is the analysis of the temporal behavior of our MCA neuron and of the relation between dynamic stability and the learning rate, mainly by using the SDT system, following the approach of Feng et al. (2005). From (2), it holds that

‖W(t + 1)‖^2 = W^T(t + 1)W(t + 1)
= ‖W(t)‖^2 − 2α(t)(W^T(t)W(t))^{-2} (‖W(t)‖^2 y^2(t) − (y^2(t) + 1 − ‖W(t)‖^4) ‖W(t)‖^2)
+ α^2(t)(W^T(t)W(t))^{-4} (‖W(t)‖^4 y^2(t) ‖X(t)‖^2 − 2‖W(t)‖^2 y^2(t)(y^2(t) + 1 − ‖W(t)‖^4) + (y^2(t) + 1 − ‖W(t)‖^4)^2 ‖W(t)‖^2)
= ‖W(t)‖^2 + 2α(t)(W^T(t)W(t))^{-1} (1 − ‖W(t)‖^4) + O(α^2(t))
≈ ‖W(t)‖^2 + 2α(t)(W^T(t)W(t))^{-1} (1 − ‖W(t)‖^4). (18)

Hence, if the learning factor is small enough and the input vector is bounded, we can carry out the following analysis by neglecting the second-order terms in α(t):

‖W(t + 1)‖^2 / ‖W(t)‖^2 ≈ 1 + 2α(t)(W^T(t)W(t))^{-2} (1 − ‖W(t)‖^4)
{ > 1 for ‖W(0)‖ < 1; < 1 for ‖W(0)‖ > 1; = 1 for ‖W(0)‖ = 1. (19)

This shows that ‖W(t + 1)‖ tends to one whether or not ‖W(t)‖ is equal to one, which is called the one-tending property (OTP), i.e. the weight modulus remains constant (‖W(t)‖^2 → 1 at convergence). The OTP indicates that a W(0) with modulus one should be selected as the initial value of the proposed algorithm; thus some practical limitations (Zufiria, 2002) that may result from an inappropriate initial value and a large learning factor can be avoided.

3. Algorithm for tracking MS

The MCA learning algorithm given in Section 2.1 extracts only one component. We can easily extend the algorithm for tracking the multiple MCs or the MS. We know that the eigenvectors associated with the r smallest eigenvalues of the autocorrelation matrix R of the data vector are defined as the minor components (MCs), and r is referred to as the number of the minor components. The eigenvector associated with the smallest eigenvalue of the autocorrelation matrix R of the data vector is called the smallest component, and the subspace spanned by the minor components is called the MS. Let U = [u_1, u_2, ..., u_r] ∈ R^{N×r} denote the weight matrix, where u_i ∈ R^{N×1} represents the ith column vector of U and also denotes the weight vector of the ith neuron of a multiple-input multiple-output (MIMO) linear neural network. The input–output relation of the MIMO linear neural network is described by

y(t) = U^T(t)x(t). (20)

The extended learning algorithm for training the weight matrix is represented as

U(t + 1) = U(t) − μ(t)[x(t)y^T(t) − U(t){U^T(t)U(t)}^{-1} (y(t)y^T(t) + I − {U^T(t)U(t)}^2)]{U^T(t)U(t)}^{-1}. (21)

It should be noted that (21) is not a trivial extension of (2). Although (2) has many extended forms, it may be difficult to find for them the corresponding Lyapunov functions needed to analyze their stability.

3.1. The convergence analysis

Under conditions similar to those defined in Zufiria (2002), using the techniques of stochastic approximation theory (Kushner & Clark, 1976; Ljung, 1977), we can deduce the corresponding averaging differential equation

dU(t)/dt = −[RU(t) − U(t){U^T(t)U(t)}^{-1} (U^T(t)RU(t) + I − {U^T(t)U(t)}^2)]{U^T(t)U(t)}^{-1}. (22)

We can give the energy function associated with (22) as follows:

E(U) = (1/2) tr{(U^T RU)(U^T U)^{-1}} + (1/2) tr{U^T U + (U^T U)^{-1}}. (23)

The gradient of E(U) with respect to U is given by

∇E(U) = RU(U^T U)^{-1} − U(U^T RU)(U^T U)^{-2} + U[I − (U^T U)^{-2}]
= {RUU^T U − U(U^T RU + I − (U^T U)^2)}(U^T U)^{-2}
= {RU − U(U^T U)^{-1}(U^T RU + I − (U^T U)^2)}(U^T U)^{-1}. (24)

Clearly, (22) is equivalent to the following equation:

dU/dt = −∇E(U). (25)

Differentiating E(U) along the solution of (22) yields

dE(U)/dt = (dU(t)/dt)^T ∇E(U) = −(dU(t)/dt)^T (dU(t)/dt). (26)

Since the extended form of the algorithm (21) has the Lyapunov function E(U), which has a lower bound (LaSalle, 1976), the corresponding averaging equation converges to the invariance set P = {U | ∇E(U) = 0} from any initial value U(0).
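For illustration, the matrix update (21) can be prototyped in a few lines. The sketch below is our own (the dimensions, learning rate, iteration count and the data model x(t) = B·g(t) are assumptions roughly following the simulations of Section 4), not the authors' code.

```python
import numpy as np

# Illustrative sketch of the MSA update (21); all settings here are our own assumptions.
rng = np.random.default_rng(2)
N, r, steps, mu = 11, 5, 20000, 0.01
B = rng.standard_normal((N, N)) / N

U = rng.standard_normal((N, r))
U /= np.linalg.norm(U, axis=0)                 # unit-norm columns as an initial state
for t in range(steps):
    x = B @ rng.standard_normal(N)             # x(t) = B g(t), g(t) white Gaussian
    y = U.T @ x                                # y(t) = U^T(t) x(t), Eq. (20)
    G = U.T @ U
    Ginv = np.linalg.inv(G)
    # Eq. (21): U <- U - mu [ x y^T - U G^{-1} (y y^T + I - G^2) ] G^{-1}
    U = U - mu * (np.outer(x, y) - U @ Ginv @ (np.outer(y, y) + np.eye(r) - G @ G)) @ Ginv

print(np.linalg.norm(U.T @ U - np.eye(r)))     # deviation from orthonormality; expected to become small
_, V = np.linalg.eigh(B @ B.T)                 # eigenvectors of the true autocorrelation
Lr = V[:, :r]                                  # an orthonormal basis of the true MS
print(np.linalg.norm(U - Lr @ (Lr.T @ U)))     # residual outside the MS; expected to shrink
```

In (21) the term involving (I − {U^T(t)U(t)}^2) is what pulls U^T U toward the identity, so no explicit orthonormalization step is performed in this sketch.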

3.2. The divergence analysis

Theorem 2. If the learning factor μ(t) is small enough and the input vector is bounded, then the state flows of the proposed learning algorithm (21) for tracking the MS are bounded.

Proof. Since the learning factor μ(t) is small enough and the input vector is bounded, we have

‖U^T(t + 1)U(t + 1)‖_F^2 = tr[U^T(t + 1)U(t + 1)]
= tr{{U(t) − μ(t)[x(t)y^T(t) − U(t){U^T(t)U(t)}^{-1}(y(t)y^T(t) + I − {U^T(t)U(t)}^2)]{U^T(t)U(t)}^{-1}}^T
× {U(t) − μ(t)[x(t)y^T(t) − U(t){U^T(t)U(t)}^{-1}(y(t)y^T(t) + I − {U^T(t)U(t)}^2)]{U^T(t)U(t)}^{-1}}}
≈ tr[U^T(t)U(t)] − 2μ(t) tr[({U^T(t)U(t)}^2 − I){U^T(t)U(t)}^{-1}]
= tr[U^T(t)U(t)] − 2μ(t)[tr{U^T(t)U(t)} − tr{U^T(t)U(t)}^{-1}]. (27)

Notice that in the previous formula the second-order terms associated with the learning factor have been neglected. It holds that

‖U^T(t + 1)U(t + 1)‖_F^2 / ‖U^T(t)U(t)‖_F^2 ≈ 1 − 2μ(t)[tr{U^T(t)U(t)} − tr{U^T(t)U(t)}^{-1}] / ‖U^T(t)U(t)‖_F^2
= 1 − 2μ(t)[1 − tr{U^T(t)U(t)}^{-1} / tr{U^T(t)U(t)}]. (28)

It is obvious that when tr{U^T(t)U(t)} is large enough, (1 − tr{U^T(t)U(t)}^{-1}/tr{U^T(t)U(t)}) > 0, which results in ‖U^T(t + 1)U(t + 1)‖_F^2 / ‖U^T(t)U(t)‖_F^2 < 1. Thus, the state flow of the proposed algorithm is bounded. This completes the proof of the theorem. □

3.3. Landscape of nonquadratic criteria and global asymptotical convergence

Given U ∈ R^{N×r} in the domain Ω = {U | 0 < U^T RU < ∞, U^T U ≠ 0}, we analyze the following nonquadratic criterion (NQC) for tracking the MS:

min_U E(U) = (1/2) tr{(U^T RU)(U^T U)^{-1}} + (1/2) tr{U^T U + (U^T U)^{-1}}. (29)

Feng et al. (2005) analyzed the landscape of the nonquadratic criterion of the OJAm algorithm in detail. We can follow the analysis method used for the OJAm algorithm to analyze our algorithm; here we only give the resulting theorems. The landscape of E(U) is depicted by the following Theorems 3 and 4.

Theorem 3. U is a stationary point of E(U) in the domain Ω if and only if U = L_r Q, where L_r ∈ R^{N×r} consists of r eigenvectors of R, and Q is an r × r orthogonal matrix.

Theorem 4. In the domain Ω, E(U) has a global minimum that is attained when and only when U = L_(n) Q, where L_(n) = [v_1, v_2, ..., v_r]. At the global minimum, E(U) = (1/2) Σ_{i=1}^{r} λ_i + r. All the other stationary points U = L_r Q (L_r ≠ L_(n)) are saddle (unstable) points of E(U).

The proofs of Theorems 3 and 4 follow Section IV of Feng et al. (2005), to which they are analogous in most parts. From the above theorems, it is obvious that the minimum of E(U) automatically orthonormalizes the columns of U, and at the minimum of E(U), U only yields an arbitrary orthonormal basis of the MS, not the multiple MCs themselves. The global asymptotical convergence of the proposed algorithm (21), considered through its gradient rule (22), is given by the next theorem.

Theorem 5. Given the ordinary differential equation (22) and the initial value U(0) ∈ Ω, U(t) globally asymptotically converges to a point in the set U = L_(n) Q as t → ∞, where L_(n) = [v_1, v_2, ..., v_r] and Q denotes an r × r orthogonal matrix.

Remark. The OJAn, LUO, MCA EXIN, FENGm and OJAm algorithms have been extended for tracking the MS in the manner of Eq. (21), and simulations have been performed (Feng et al., 2005). It is concluded that the state matrices of the OJAn, LUO, MCA EXIN and FENGm algorithms do not converge to an orthonormal basis of the MS, whereas that of OJAm does. From the previous analysis, we can conclude that the proposed algorithm (2) can be extended for tracking the MS and converges to an orthonormal basis of the MS.
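Theorem 4 also provides a convenient numerical sanity check, since it gives the minimum value of (29) in closed form. The snippet below is our own sketch; the test matrix R and the dimensions are arbitrary choices.

```python
import numpy as np

# Sanity check of Theorem 4: at U = L_(n) Q, E(U) should equal 0.5 * (sum of the r smallest
# eigenvalues of R) + r.  R, N and r below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
N, r = 8, 3
A = rng.standard_normal((N, N))
R = A @ A.T / N                                # a symmetric positive semidefinite test matrix

def E(U, R):
    G = U.T @ U
    Ginv = np.linalg.inv(G)
    return 0.5 * np.trace(U.T @ R @ U @ Ginv) + 0.5 * np.trace(G + Ginv)   # criterion (29)

lams, V = np.linalg.eigh(R)                    # eigenvalues in ascending order
L_n = V[:, :r]                                 # the r minor eigenvectors
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))   # an arbitrary r x r orthogonal matrix

print(E(L_n @ Q, R), 0.5 * lams[:r].sum() + r)     # the two values should coincide
print(E(L_n @ Q + 0.1 * rng.standard_normal((N, r)), R))   # a perturbed U should give a larger value
```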

4. Computer simulations

4.1. Simulation experiment on MCA algorithm

In this section, we provide simulation results to illustrate the convergence and stability of the proposed MCA algorithm. Since the OJAm and Douglas algorithms are self-stabilizing and have better performance than other MCA algorithms, we compare the performance of the proposed MCA algorithm with these algorithms in the simulations below. In the simulations, we use the above three algorithms to extract the minor component from an input data sequence generated by X(t) = C · y(t), where C = randn(5, 5)/5 and y(t) ∈ R^{5×1} is Gaussian and randomly produced. In order to measure the convergence speed of the learning algorithms, we compute the norm of W(t) and the direction cosine at the t-th update:

Direction Cosine(t) = W^T(t) · v_1 / (‖W(t)‖ · ‖v_1‖),

where v_1 is the unit eigenvector associated with the smallest eigenvalue of R. If Direction Cosine(t) converges to 1, W(t) must converge to the direction of the minor component v_1. Figs. 2 and 3 show the simulation curves of the convergence of ‖W(t)‖ and Direction Cosine(t) (DC), respectively. The learning constant in the OJAm and the proposed algorithm is fixed at 0.3, while the learning constant in the Douglas algorithm is taken as 0.1. All the algorithms start from the same initial value, which is randomly produced and normalized to modulus one. From the simulation results, we can easily see that although the weight norm and the Direction Cosine of the OJAm, Douglas and the proposed algorithms all converge, the convergence precision of the proposed algorithm is the best. Since the OJAn, MCA EXIN and FENGm algorithms are divergent and the OJAm outperforms them (Feng et al., 2005), the proposed algorithm for tracking one MC appears to work more satisfactorily than most existing MCA algorithms.

Fig. 2. Convergence of ‖W(t)‖.

Fig. 3. Convergence of Direction Cosine(t).

4.2. Simulation experiment on MSA algorithm

In this section, we provide simulation results to illustrate the convergence and stability of the proposed MSA algorithm. The self-stabilizing Douglas algorithm is extended for tracking the MS in the manner of Eq. (21). Since the OJAm algorithm has better performance than other MSA algorithms, we compare the performance of the proposed MSA algorithm with the OJAm and Douglas algorithms in the simulations below. Here, an MS of dimension 5 is tracked. The vector data sequence is generated by X(t) = B · y(t), where B is randomly produced. In order to measure the convergence speed and precision of the learning algorithms, we compute the norm of the state matrix at the t-th update,

ρ(U(t)) = ‖U^T(t)U(t)‖_2,

and the deviation of the state matrix from orthogonality at the t-th update, defined as

dist(U(t)) = ‖U^T(t)U(t)[diag(U^T(t)U(t))]^{-1} − I_r‖_F.
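These performance indices are straightforward to compute; the helpers below are our own sketches with hypothetical names (and the absolute value in the direction cosine, plus reading ‖·‖_2 as the spectral norm, are our assumptions).

```python
import numpy as np

# Sketches of the performance indices used in this section; the function names are hypothetical.
def direction_cosine(W, v1):
    # |W^T v1| / (||W|| ||v1||); the absolute value allows for convergence to either +v1 or -v1
    return abs(W @ v1) / (np.linalg.norm(W) * np.linalg.norm(v1))

def rho(U):
    # rho(U) = ||U^T U||_2, read here as the spectral norm of U^T U
    return np.linalg.norm(U.T @ U, ord=2)

def dist_from_orthogonality(U):
    # dist(U) = || U^T U [diag(U^T U)]^{-1} - I_r ||_F
    G = U.T @ U
    D = np.diag(np.diag(G))
    return np.linalg.norm(G @ np.linalg.inv(D) - np.eye(U.shape[1]), ord='fro')

# For a column-orthonormal U, rho is 1 and dist is 0:
U, _ = np.linalg.qr(np.random.default_rng(3).standard_normal((11, 5)))
print(rho(U), dist_from_orthogonality(U))
```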



This simulation is divided into two parts. In the first simulation, let B = (1/11) randn(11, 11), and let y(t) ∈ R^{11×1} be Gaussian, spatially-temporally white, and randomly produced. We run the algorithms starting from the same initial value U(0), which is randomly produced and normalized to modulus one. The learning constants of the OJAm and Douglas algorithms are fixed at 0.02 and 0.01, respectively, while the learning constant of the proposed algorithm is taken as 0.01. Figs. 4 and 5 show the norm of the state matrix and the deviation of the state matrix from orthogonality versus the iteration number, respectively.

Fig. 4. The evolution curves of the norm of the state matrix.

Fig. 5. The deviation from the orthogonality.

In the second simulation, let B = (1/31) randn(31, 31), and let y(t) ∈ R^{31×1} be Gaussian, spatially-temporally white, and randomly produced. The learning constants of the OJAm and Douglas algorithms are fixed at 0.04 and 0.02, respectively, while the learning constant of the proposed algorithm is taken as 0.02. The other conditions are similar to those of the first simulation. The simulation results are shown in Figs. 6 and 7.

Fig. 6. The evolution curves of the norm of the state matrix.

Fig. 7. The deviation from the orthogonality.

From the simulation results, we can easily see that the state matrices of the OJAm, Douglas and the proposed algorithms all converge to an orthonormal basis of the MS, while the convergence precision of the state matrix of the proposed algorithm is the best, and residual deviations of the state matrices from orthogonality remain in the OJAm and Douglas algorithms. Since the OJAm is superior to the OJAn, MCA EXIN and FENGm algorithms, the proposed algorithm appears to work more satisfactorily than most existing MSA algorithms.

5. Conclusion

A self-stabilizing MCA learning algorithm has been presented and extended for tracking the MS, yielding a self-stabilizing MSA algorithm. The theoretical analysis of the proposed MCA algorithm is given via the corresponding deterministic continuous time (DCT) and stochastic discrete time (SDT) systems. The global asymptotic stability of the averaging equation of the proposed MSA algorithm has been studied. The simulation experiments have shown that the proposed MCA algorithm can efficiently extract one MC and works satisfactorily, and that the proposed MSA algorithm makes the corresponding state matrix tend to a column-orthonormal basis of the MS, with performance superior to that of other MSA algorithms in high-dimension data streams.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 60736026, the China Postdoctoral Foundation (20080441273), and the China Postdoctoral Especial Foundation (20080148).


References

Chen, T. P., & Amari, S. (2001). Unified stabilization approach to principal and minor components extraction. Neural Networks, 14, 1377–1387.
Chiang, C. T., & Chen, Y. H. (1999). On the inflation method in adaptive noise-subspace estimator. IEEE Transactions on Signal Processing, 47, 1125–1129.
Cirrincione, G. (1998). A neural approach to the structure from motion problem. Ph.D. dissertation, LIS INPG Grenoble, Grenoble, France.
Cirrincione, G., Cirrincione, M., Herault, J., & Van Huffel, S. (2002). The MCA EXIN neuron for the minor component analysis. IEEE Transactions on Neural Networks, 13, 160–187.
Douglas, S. C., Kung, S. Y., & Amari, S. (1998). A self-stabilized minor subspace rule. IEEE Signal Processing Letters, 5, 328–330.
Durrani, T. S., & Shatman, K. C. (1983). Eigenfilter approaches to adaptive array processing. Proceedings F, 130.
Feng, D. Z., Bao, Z., & Jiao, L. C. (1998). Total least mean squares algorithm. IEEE Transactions on Signal Processing, 46, 2122–2130.
Feng, D. Z., Zheng, W. X., & Jia, Y. (2005). Neural network learning algorithms for tracking minor subspace in high-dimensional data stream. IEEE Transactions on Neural Networks, 16, 513–521.
Gao, K., Ahmad, M. O., & Swamy, M. N. (1994). A constrained anti-Hebbian learning algorithm for total least squares estimation with applications to adaptive FIR and IIR filtering. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 11, 718–729.
Kushner, H. J., & Clark, D. S. (1976). Stochastic approximation methods for constrained and unconstrained systems. New York: Springer-Verlag.
LaSalle, J. P. (1976). The stability of dynamical systems. Philadelphia, PA: SIAM.
Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, AC-22, 551–575.
Luo, F. L., & Unbehauen, R. (1997a). A generalized learning algorithm of minor component. In Proceedings of the international conference on acoustics, speech and signal processing: Vol. 4 (pp. 3229–3232).
Luo, F. L., & Unbehauen, R. (1997b). A minor subspace analysis algorithm. IEEE Transactions on Neural Networks, 8, 1149–1155.
Mao, Y., Fan, X., & Li, X. (2006). A class of self-stabilizing MCA learning algorithms. IEEE Transactions on Neural Networks, 17, 1634–1638.
Mathew, G., & Reddy, V. U. (1994). Orthogonal eigensubspace estimation using neural networks. IEEE Transactions on Signal Processing, 42, 1803–1811.
Mathew, G., Reddy, V. U., & Dasgupta, S. (1995). Adaptive estimation of eigensubspace. IEEE Transactions on Signal Processing, 43, 401–411.
Möller, R. (2004). A self-stabilizing learning rule for minor component analysis. International Journal of Neural Systems, 14, 1–8.
Oja, E. (1992). Principal component, minor component and linear neural networks. Neural Networks, 5, 927–935.
Oja, E., & Karhunen, J. (1994). A constrained anti-Hebbian learning algorithm for total least squares estimation with applications to adaptive FIR and IIR filtering. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 41, 718–729.
Ouyang, S., Bao, Z., Liao, G. S., & Ching, P. C. (2001). Adaptive minor component extraction with modular structure. IEEE Transactions on Signal Processing, 49, 2127–2137.
Peng, D. Z., Zhang, Y., Lv, J. C., & Xiang, Y. (2007). A neural networks learning algorithm for minor component analysis and its convergence analysis. Neurocomputing, 71, 1748–1752.
Peng, D. Z., Zhang, Y., Lv, J. C., & Xiang, Y. (2008). A stable MCA learning algorithm. Computers and Mathematics with Applications, 56, 847–860.
Peng, D. Z., Zhang, Y., & Xiang, Y. (2008). On the discrete time dynamics of a self-stabilizing MCA learning algorithm. Mathematical and Computer Modelling, 47, 903–916.
Taleb, A., & Cirrincione, G. (1999). Against the convergence of the minor component analysis neurons. IEEE Transactions on Neural Networks, 10, 207–210.
Xu, L., Oja, E., & Suen, C. (1992). Modified Hebbian learning for curve and surface fitting. Neural Networks, 5, 441–457.
Zhang, Q., & Leung, Y.-W. (2000). A class of learning algorithms for principal component analysis and minor component analysis. IEEE Transactions on Neural Networks, 11, 529–533.
Zufiria, P. J. (2002). On the discrete-time dynamics of the basic Hebbian neural network node. IEEE Transactions on Neural Networks, 13, 1342–1352.

Xiangyu Kong is a postdoctoral researcher at the Xi’an Research Institute of High Technology. He received the Bachelor degree from Beijing Institute of Technology in 1990, the Master degree from the Xi’an Research Institute of High Technology in 2000, and the Ph.D. degree in Control Science & Engineering from Xi’an Jiaotong University, Xi’an, China, in 2005. His current interests cover adaptive signal processing, nonlinear Volterra system modeling and its applications.

Changhua Hu is a Professor at the Xi’an Research Institute of High Technology. He received the Ph.D. degree in Control Science & Engineering from Northwest Polytechnical University, Xi’an, China, in 1996. His current interests cover measurement and control, adaptive control, and fault diagnosis.

Chongzhao Han is a Professor at Xi’an Jiaotong University. He received the Bachelor degree from Xi’an Jiaotong University, Xi’an, China, in 1968, and the Master degree from the Chinese Academy of Sciences in 1982. His current interests cover nonlinear system identification and control, information fusion theory and applications, adaptive signal processing, and intelligent decision-making systems.