ARTICLE IN PRESS
Signal Processing 86 (2006) 3490–3495
www.elsevier.com/locate/sigpro

Fast communication

Convergence analysis of the NOJA algorithm using the ODE approach

Samir Attallah^a,*, Jonathan H. Manton^b,1, Karim Abed-Meraim^c

a Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Singapore
b Department of Information Engineering, Research School of Information Sciences and Engineering, The Australian National University, Canberra, ACT 0200, Australia
c Telecom Paris, TSI Department, 46, rue Barrault, Paris Cedex 13, France

Received 8 July 2005; received in revised form 9 May 2006; accepted 21 June 2006. Available online 18 July 2006.

Abstract

Some time ago, we proposed two normalized versions of Oja's algorithm (NOja and NOOja) and gave some results concerning their convergence performance for the estimation of the signal and noise subspaces of a sequence of random vectors. In this paper, we analyse the convergence of NOja using the ordinary differential equation (ODE) approach.
© 2006 Elsevier B.V. All rights reserved.

Keywords: NOja's algorithm; NOOja's algorithm; Subspace estimation; ODE analysis
1. Summary
Let {r(k)} be a sequence of N×1 wide-sense stationary random vectors with covariance matrix C = E[r(k) r^T(k)], and consider the problem of extracting the principal subspace of C. To tackle this problem, we have proposed the following algorithm (NOja) [1].^2

Initialization of the algorithm: W(0) is any arbitrary matrix such that W^T(0) W(0) = I.

Algorithm at iteration i:

  y(i)   = W^T(i) r(i),
  z(i)   = W(i) y(i),
  y_w(i) = W^T(i) z(i),
  ∇̂J(i)  = −r(i) [2 y^T(i) − y_w^T(i)] + z(i) y^T(i),
  W(i+1) = W(i) − β ∇̂J(i),                                            (1)

where W denotes an N×P real matrix (P < N) which represents the principal subspace estimate. Note that for noise subspace estimation we need to invert the sign of the fixed step size β in (1).

*Corresponding author. Tel.: +65 6516 2114; fax: +65 6779 1103. E-mail addresses: [email protected] (S. Attallah), [email protected] (J.H. Manton), [email protected] (K. Abed-Meraim).
1 The second author would like to acknowledge the support of the Australian Research Council Special Research Centre for Ultra-Broadband Information Networks (CUBIN).
2 We would like to point out a typo in the instantaneous gradient of the NOja algorithm in [1], which is corrected here.
0165-1684/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2006.06.014
For the normalized case, while keeping the equations in (1) unchanged, noise subspace estimation can be achieved if β is directly replaced by

  β̂_opt(i) = μ [‖r(i)‖² (y_w^T(i) y(i) − 2‖y(i)‖²) + ‖y(i)‖⁴] / [‖ ‖r(i)‖² (y_w^T(i) − 2 y^T(i)) + ‖y(i)‖² y^T(i) ‖² + γ],   (2)

where μ and γ are two positive constants (0 < μ < 1) which help improve the numerical stability of the algorithm [2]. It is worthwhile to note that (2) represents the instantaneous estimate of the algorithm step size that is optimal only for noise subspace estimation. Changing the sign of μ does not lead to an optimal step size for signal subspace estimation, so we restrict its use to noise subspace estimation; for signal subspace estimation, it is better to use a fixed step size. For the sake of analysis, we rewrite the weight-updating equation as follows:

  W(i+1) = W(i) + d(i) [2 r(i) r^T(i) W(i) − r(i) r^T(i) W(i) W^T(i) W(i) − W(i) W^T(i) r(i) r^T(i) W(i)],   (3)

where d(i) ∈ ℝ is a sequence of positive step sizes (signal subspace estimation) such that

  Σ_{i=1}^∞ d(i) = ∞.   (4)

The effect of β_opt on the convergence performance of the algorithm will be assessed through simulations.

2. Analysis of NOja

In the following, C is assumed to be positive definite and symmetric with distinct eigenvalues.

2.1. Signal subspace estimation

The ordinary differential equation (ODE) associated with (1) is

  dW/dt = C W (I − W^T W) + (I − W W^T) C W.   (5)

In fact, (5) is a negative gradient flow. Specifically, define the function

  f(W) = tr(C − 2 W^T C W + W^T C W W^T W).   (6)

Its directional derivative in the direction R is

  D_R f(W) = −4 tr(R^T C W) + 2 tr(R^T C W W^T W) + 2 tr(R^T W W^T C W).   (7)

Therefore, with respect to the Riemannian metric ⟨D, R⟩ = 2 tr(R^T D), the gradient G of f is^3

  G = −2 C W + C W W^T W + W W^T C W   (8)
    = −[C W (I − W^T W) + (I − W W^T) C W].   (9)

3 Recall that G is defined by the requirement D_R f = ⟨G, R⟩ for all R.

Lemma 1. The function f in (6) is a lower bounded function with compact sub-level sets.

Proof. Let R be any matrix such that C = R^T R. Then f can be written in the alternative form

  f(W) = ‖R (I − W W^T)‖²,   (10)

where ‖·‖ is the Frobenius norm. Thus, f(W) is lower bounded by zero. For any scalar c, the sub-level set {W : f(W) ≤ c} is closed; to prove it is compact, it therefore suffices to show it is bounded. Using the bound (1 − x)² > x − 2, valid for any scalar x since (1 − x)² − (x − 2) = (x − 3/2)² + 3/4 > 0, observe that

  ‖I − W W^T‖² ≥ Σ_{i=1}^N (1 − (W W^T)_{ii})²   (11)
              = Σ_{i=1}^N (1 − Σ_{j=1}^P W_{ij}²)²   (12)
              > Σ_{i=1}^N (Σ_{j=1}^P W_{ij}² − 2)   (13)
              = ‖W‖² − 2N.   (14)

This, together with the bound

  f(W) ≥ λ_min{C} ‖I − W W^T‖²,   (15)

proves that f(W) ≤ c implies ‖W‖ is bounded. □

The above facts are enough to conclude that the ODE (5) does not diverge. The following lemma goes further and rules out the possibility that the ODE has a limit cycle. It is remarked that this is a non-trivial result because there exist gradient flows with compact sub-level sets which do not converge to a point.

Lemma 2 (Pointwise convergence). For any initial condition, the ODE (5) converges to a point W satisfying C W (I − W^T W) + (I − W W^T) C W = 0.

Proof. Observe that the cost function f in (6) is real analytic and satisfies the conditions in Lemma 1. A not widely known result by Lojasiewicz [3] on real analytic gradient flows implies that the associated
negative gradient flow, which is (5) in this case, must therefore converge pointwise (see Appendix). □

By appealing to a result in [4], the main convergence result can now be stated.

Theorem 1 (Self-calibrated PCA). The flow (5) is a self-calibrated principal component flow, meaning that for almost any initial condition W(0) satisfying W^T(0) W(0) > 0, (5) converges to a point W(∞) having the following properties: (1) W^T(∞) W(∞) = I; (2) the columns of W(∞) span the principal subspace of C.

Proof. Pointwise convergence is proved in Lemma 2. That W^T(t) W(t) → I is proved in [4] using a Lyapunov argument. LaSalle's invariance principle then implies it suffices to consider (5) restricted to the stable manifold W^T W = I. However, if W^T W = I then (5) reduces to Oja's flow, and it is well known [5] that the Oja flow converges to the principal subspace of C. □

2.2. Noise subspace estimation

The noise subspace is found by inverting the sign of d(i). However, the manifold W^T W = I, which was proved to be stable for principal subspace estimation, now becomes an unstable manifold [4]. Therefore, unless we force W^T W = I, as done in the NOOja algorithm [1], the NOja algorithm might not, in general, produce a sequence of normalized W's. This observation is tested through simulations.

3. Simulation results and discussion

In the following, we choose r(i) to be a sequence of independent Gaussian random vectors with covariance matrix C = R R^T, where R is generated randomly for every realization so as to avoid simulation results that depend on a particular choice of C. In addition, P = 2 and W(0) = D, where D_{i,j} = δ(j − i). In order to measure the errors in subspace estimation, we compute the ensemble averages of the performance factors [6]

  ρ(i) = (1/r0) Σ_{r=1}^{r0} [tr(W_r^T(i) E1 E1^T W_r(i)) / tr(W_r^T(i) E2 E2^T W_r(i))],   (16)
  η(i) = (1/r0) Σ_{r=1}^{r0} ‖W_r^T(i) W_r(i) − I‖_F,

where the number of algorithm runs is r0 = 40, the subscript r indicates that the associated variable depends on the particular run, and E1 (respectively, E2) is the minor (N − P)-dimensional subspace (respectively, the principal P-dimensional subspace). For noise subspace estimation, E1 and E2 are interchanged. It should be
Fig. 1. Performance measures ρ and η for signal subspace estimation using a fixed step size (β = 0.001).
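As an illustrative numerical check (our sketch, not part of the original analysis: the diagonal C, step size, and iteration count are arbitrary choices made so the principal subspace is known exactly), one can integrate the flow (5) with explicit Euler steps and evaluate single-run versions of the performance factors (16); both ρ and η should vanish, consistent with Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 4, 2
C = np.diag([5.0, 4.0, 1.0, 0.5])        # known spectrum: principal subspace = span(e1, e2)

def performance_factors(W, C, P):
    # Single-run versions of rho and eta in (16): rho is the energy of W in the
    # minor subspace E1 over its energy in the principal subspace E2; eta is the
    # orthonormality error ||W^T W - I||_F.
    _, evecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    E2, E1 = evecs[:, -P:], evecs[:, :-P]
    rho = np.trace(W.T @ E1 @ E1.T @ W) / np.trace(W.T @ E2 @ E2.T @ W)
    eta = np.linalg.norm(W.T @ W - np.eye(P), "fro")
    return rho, eta

# Integrate the ODE (5), dW/dt = C W (I - W^T W) + (I - W W^T) C W, by Euler steps.
W, _ = np.linalg.qr(rng.standard_normal((N, P)))   # W(0) with W^T(0) W(0) = I
dt = 1e-3
for _ in range(20000):
    W = W + dt * (C @ W @ (np.eye(P) - W.T @ W) + (np.eye(N) - W @ W.T) @ C @ W)

rho, eta = performance_factors(W, C, P)
print(rho, eta)   # both tend to zero, as Theorem 1 predicts
```

The strong eigengap between λ2 = 4 and λ3 = 1 makes the convergence fast enough that both measures reach numerical zero well within the 20000 Euler steps.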
It should be clear that ρ measures the subspace estimation error, whereas η measures the orthogonality of the subspace vectors.

Case 1: β is fixed. Figs. 1 and 2 show the average behavior of the NOja algorithm when used for signal and noise subspace estimation. Clearly, the two results confirm the ODE analysis, which predicts that only signal subspace estimation is possible. We note that good orthogonality is also achieved in the signal subspace case.

Case 2: β is normalized for noise subspace estimation. Fig. 3 shows the average behavior of
Fig. 2. Performance measures ρ and η for noise subspace estimation using a fixed step size (β = 0.001).

Fig. 3. Performance measures ρ and η for noise subspace estimation using β_opt (μ = 0.001, γ = 0.01).
the NOja algorithm when used for noise subspace estimation with the optimal step size β_opt given in (2). We have observed from the simulations that when μ in (2) is relatively small, the NOja algorithm converges to the right noise subspace most of the time, and the ℓ2-norms of the noise subspace vectors (the column-vector norms of W(i)) neither go to zero nor diverge to ∞. As clearly shown in Fig. 4, the norms of the noise subspace vectors tend to a value close to 1.

However, for relatively large μ, the subspace error ρ will still look good (see Fig. 5), although the ℓ2-norms of the noise subspace vectors decay to 0, as shown in Fig. 6. This observation can be explained by the fact that the column-vector norms of W(i) tend to zero while the column vectors themselves remain more or less orthogonal to the signal subspace. As a result, the numerator in (16) tends to zero much faster than the denominator, which leads to the results obtained.
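A minimal single-run sketch of the stochastic iteration (1) with the normalized step (2) for noise subspace estimation is given below; the data model follows the simulation setup above, but the parameter values and seed are our illustrative choices, not the authors' exact code.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 6, 2
mu, gamma = 0.001, 0.01

A = rng.standard_normal((N, N))        # r(i) = A g(i), g ~ N(0, I), so C = A A^T
W = np.eye(N)[:, :P]                   # W(0) = D, with D_ij = delta(j - i)

for i in range(50000):
    r = A @ rng.standard_normal(N)
    y = W.T @ r                        # y(i)
    z = W @ y                          # z(i)
    yw = W.T @ z                       # y_w(i)
    # Normalized step (2), used for noise subspace estimation
    num = (r @ r) * (yw @ y - 2.0 * (y @ y)) + (y @ y) ** 2
    den = np.linalg.norm((r @ r) * (yw - 2.0 * y) + (y @ y) * y) ** 2 + gamma
    beta = mu * num / den
    # Update (1) with the corrected instantaneous gradient
    grad = -np.outer(r, 2.0 * y - yw) + np.outer(z, y)
    W = W - beta * grad

print(np.linalg.norm(W, axis=0))       # column norms stay bounded for small mu
```

For small μ the column norms of W stay bounded away from 0 and ∞, consistent with the behavior in Fig. 4; the norm decay discussed above can be probed by increasing μ (e.g. towards 0.05).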
Fig. 4. ℓ2-norm of the noise subspace vectors using β_opt (μ = 0.001, γ = 0.01).

Fig. 5. Performance measures ρ and η for noise subspace estimation using β_opt (μ = 0.05, γ = 0.01).

Fig. 6. ℓ2-norm of the noise subspace vectors using β_opt (μ = 0.05, γ = 0.01).
In contrast, if we use NOOja [1], where W^T(i) W(i) = I is enforced at each iteration, the algorithm is guaranteed to converge towards the noise subspace, since the norms of the subspace vectors remain equal to one at all times, even when μ is relatively large. Simulation results involving NOOja have already been given in [1].
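NOOja's exact normalization steps are given in [1]; as a generic illustration of the idea of enforcing W^T(i) W(i) = I at each iteration (a stand-in, not the NOOja update itself), a thin QR factorization restores orthonormality while preserving the column span:

```python
import numpy as np

def reorthonormalize(W):
    """Return an orthonormal basis of span(W) via a thin QR factorization.

    Generic stand-in for NOOja's normalization step: the column span is
    preserved while W^T W = I is restored exactly after every update.
    """
    Q, R = np.linalg.qr(W)
    return Q * np.sign(np.diag(R))     # fix column signs for determinism

W = np.random.default_rng(3).standard_normal((6, 2))
W = reorthonormalize(W)
print(np.allclose(W.T @ W, np.eye(2)))   # True
```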
4. Conclusion

The simulation results show that, for a fixed step size, the NOja algorithm works for signal subspace estimation but fails when used for noise subspace estimation. This is, of course, in agreement with the ODE analysis. However, when the optimal step size is used along with a small value for μ, it is possible for the NOja algorithm to estimate the noise subspace correctly most of the time. Including the effect of β_opt in the ODE analysis is a challenging task that remains an open problem.

Acknowledgments

The second author acknowledges informative conversations with Professors Uwe Helmke and Iven Mareels.

Appendix

In the following, we give the theorem by S. Lojasiewicz (translated from the French [3]) which was used to prove Lemma 2.

Theorem 2. Let f : ℝ^n → ℝ be an analytic function such that f ≥ 0 in a neighborhood of 0 and f(0) = 0. Then there exists a neighborhood U of zero such that every trajectory y_x, with y_x(0) = x ∈ U, of the system y′ = −grad f(y) is defined over [0, ∞), has finite length, and converges uniformly towards a point of Z = {x : f(x) = 0} as t → ∞.

Proof. A detailed proof is given in [3]. □
References

[1] S. Attallah, K. Abed-Meraim, Fast algorithms for subspace tracking, IEEE Signal Process. Lett. 8 (7) (July 2001) 203–206.
[2] S. Haykin, Adaptive Filters, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[3] S. Lojasiewicz, Sur les trajectoires du gradient d'une fonction analytique, Seminari di Geometria 1982–1983, pp. 115–117.
[4] J.H. Manton, I.M.Y. Mareels, S. Attallah, An analysis of the fast subspace tracking algorithm NOJA, in: Proceedings of the IEEE ICASSP'02, Florida, USA.
[5] W.Y. Yan, U. Helmke, J.B. Moore, Global analysis of Oja's flow for neural networks, IEEE Trans. Neural Networks 5 (September 1994) 674–683.
[6] S.C. Douglas, S.-Y. Kung, S.-I. Amari, A self-stabilized minor subspace rule, IEEE Signal Process. Lett. 5 (12) (December 1998) 328–330.