Neural Networks 17 (2004) 233–245 www.elsevier.com/locate/neunet
Generalized neural networks for spectral analysis: dynamics and Liapunov functions

José M. Vegas a, Pedro J. Zufiria b,*

a Departamento de Matemática Aplicada, Facultad de Ciencias Matemáticas, UCM, 28040 Madrid, Spain
b Grupo de Redes Neuronales, Departamento de Matemática Aplicada a las Tecnologías de la Información, ETSI de Telecomunicación, UPM, 28040 Madrid, Spain

* Corresponding author. Tel.: +34-91-3367284; fax: +34-91-3367289. E-mail address: [email protected] (P.J. Zufiria).
Received 20 March 2002; accepted 29 May 2003
Abstract

This paper analyzes the local and global behavior of several dynamical systems which generalize some artificial neural network (ANN) semilinear models originally designed for principal component analysis (PCA) in the characterization of random vectors. Those systems implicitly performed the spectral analysis of correlation (i.e. symmetric positive definite) matrices. Here, the proposed generalizations cover both nonsymmetric matrices as well as fully nonlinear models. Local stability analysis is performed via linearization, and global behavior is analyzed by constructing several Liapunov functions. © 2003 Elsevier Ltd. All rights reserved.

Keywords: Hebbian networks; Principal component analysis; Homogeneous functions; Nonlinear dynamics; Stability; Global behavior; Liapunov functions

0893-6080/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.neunet.2003.05.001
1. Introduction

Principal component analysis (PCA) is a very useful and simple technique in statistics and signal processing (Johnson & Wichern, 1992). PCA measures redundancy in the data via the correlation between the involved variables and provides a smaller set of related variables with less redundancy, being especially suitable under the Gaussian assumption. For example, it is widely used in the context of signal source characterization under the stationarity hypothesis, where all the information is gathered in a random vector x ∈ Rⁿ. PCA is based on the spectral analysis of the second-order moment matrix that statistically characterizes such a random vector. This matrix is the so-called correlation matrix, which happens to be the variance–covariance matrix in the zero-mean case. Traditionally, PCA first computes an estimate of such correlation matrix; then a standard numerical method for spectral analysis can be applied (e.g. an iterative scheme for computing the eigenvector associated with the largest matrix eigenvalue (Golub & Van Loan, 1983), and so on). Since PCA only uses second-order statistics, it can also be used as a useful preprocessing step for independent component analysis (ICA) (Hyvärinen, Karhunen, & Oja, 2001) when Gaussianity cannot be guaranteed.

Neural models have been extensively used for PCA on-line computations (Haykin, 1994; Diamantaras & Kung, 1996). In the seminal work of Oja (1982) a neural net, called Oja's net, was proposed as a model that, by taking successive samples of a random vector x ∈ Rⁿ, computes a unitary vector w⁽¹⁾ along the direction of the first principal component associated with such samples. Hence, the intermediate step of estimating the correlation matrix is not required. The asymptotic behavior of the stochastic iterative process defined by Oja's net can be characterized via stochastic approximation theory (Ljung, 1977; Kushner & Yin, 1997), provided some assumptions are satisfied (Zufiria, 2002). This way, the so-called ODE method links such asymptotic behavior with the study of the invariant set corresponding to a continuous-time deterministic dynamical system associated with the original process. Precisely, the procedure proposed by Oja leads to the ODE

dw/dt = Aw − (wᵀAw)w  (1)
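To make Eq. (1) concrete, the following minimal numerical sketch (ours, not part of the original paper) integrates the ODE with a forward-Euler scheme for a sample correlation matrix built from synthetic data; the step size and iteration count are illustrative choices.

    import numpy as np

    def oja_flow(A, w0, dt=0.01, steps=20_000):
        # Forward-Euler integration of dw/dt = A w - (w^T A w) w.
        w = np.array(w0, dtype=float)
        for _ in range(steps):
            Aw = A @ w
            w += dt * (Aw - (w @ Aw) * w)
        return w

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 3))       # synthetic samples of the random vector x
    A = X.T @ X / len(X)                     # sample correlation matrix (symmetric PSD)
    w = oja_flow(A, rng.standard_normal(3))

    vals, vecs = np.linalg.eigh(A)
    v1 = vecs[:, -1]                         # dominant eigenvector, direct spectral analysis
    print(np.linalg.norm(w), abs(w @ v1))    # both close to 1

The final weight vector has unit norm and is aligned with the dominant eigenvector, as predicted.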
where A (n × n, symmetric and positive semidefinite) is the correlation matrix of the population x, and w ∈ Rⁿ will, presumably, evolve towards the above-mentioned w⁽¹⁾. It is worth mentioning that, independently of the probabilistic interpretation of the matrix A, the equilibrium points of Eq. (1) are the eigenvectors (with norm one) of A, and, if the largest eigenvalue λ₁ of A is simple, the pair (w⁽¹⁾, −w⁽¹⁾) of associated eigenvectors are the only asymptotically stable equilibrium points of Eq. (1); also, their regions of attraction cover the whole of Rⁿ except for a hyperplane (i.e. a set of measure zero). In fact, as proven in Tolmos, Vegas, and Zufiria (1999), symmetry is not a necessary condition for obtaining this asymptotic behavior.

One of the main features of Oja's method is its implementability as a simple neural network. This fact implies a set of advantages concerning its adaptability, hardware implementation, etc., which can be very relevant in practical applications (Berzal & Zufiria, 1998; Berzal, Zufiria, & Rodríguez, 1996). In Eq. (1) the nonlinear term represents a decay element that prevents the network weights from growing out of bounds. In that sense, Zhang and Bao (1995) address the possibility of improving Oja's net convergence properties by modifying such nonlinear term in the learning law. The proposed generalization is

dw/dt = Aw − a(w)w  (2)
where the scalar function a must be appropriately chosen (Tolmos et al., 1999; Zhang & Bao, 1995). The possibility of carrying out a direct study of the dynamics in Eq. (2) leads to strong results on global convergence. These results guarantee the robustness of the iterative scheme with respect to the values selected for neural network initialization. This paper includes some extensions of the results obtained in Zhang and Bao (1995).

On the other hand, in Samardzija and Waterland (1991) and Tolmos et al. (1999), also in a neural implementation framework, the behavior of the extension of Eq. (1) to nonlinear operators of the form

dw/dt = F(w) − a(w)w  (3)

is considered, where F : Rⁿ → Rⁿ and a : Rⁿ → R are homogeneous functions with deg(a) > deg(F). This generalization is also motivated by recent work on the design of dynamical systems for solving algebraic and optimization problems (Helmke & Moore, 1996). In Samardzija and Waterland (1991) and Tolmos et al. (1999) a local study of the dynamics of a specific form of Eq. (3) is carried out, although general stability results are not given. The derived stability conditions for some special cases, being local, have a limited applicability as spectral analysis methods from a numerical point of view.

On the other hand, it is well known that Liapunov functions can be a very useful tool for global system analysis. They provide very useful information on domains of attraction, which can be interpreted in terms of convergence robustness. Therefore, the determination of Liapunov functions for the systems considered in this paper opens new possibilities for characterizing the convergence, robustness and efficiency of the associated procedures.

This paper is divided in three parts. In Section 2 we prove a global stability result for semilinear nets of Oja type as generalized by Zhang and Bao (1995). In Section 3 the nonlinear homogeneous case defined by Eq. (3) is analyzed, proving a local stability result which is rather complete for systems of gradient type. In Section 4 a number of global Liapunov functions are constructed in order to obtain further information on the domains of attraction of the local attractors studied in Sections 2 and 3. We also include an Appendix A containing remarks on terminology, as well as the statements of some results on the general theory of dynamical systems which are used in the proofs.

2. The semilinear case

Let A be a real n × n matrix, and consider the differential Eq. (1). In the symmetric, positive semidefinite case Oja shows that, if the eigenvalues of A are λ₁ > λ₂ > ··· > λₙ ≥ 0, then (i) the equilibrium points of Eq. (1) are {0, ±w⁽¹⁾, ±w⁽²⁾, …, ±w⁽ⁿ⁾}, where w⁽ⁱ⁾ is a unit eigenvector associated with λᵢ; (ii) w(t) converges as t → ∞ to some normalized eigenvector of A; and (iii) unless w(0) lies in the (n−1)-dimensional subspace spanned by w⁽²⁾, …, w⁽ⁿ⁾, w(t) converges to ±w⁽¹⁾. The other eigenvectors are saddle points and thus, from the practical viewpoint, are unobservable, requiring a substantial modification of the system in order to observe them as stable equilibria (Berzal et al., 1996; Berzal & Zufiria, 1998; Sanger, 1989).

In Zhang and Bao (1995) the same result is proved while keeping the symmetry and definiteness properties of A, but allowing the nonlinearity a(w)w corresponding to the learning term to vary as stated in Theorem 1, which we prove here in a more general case.
Theorem 1. Let A be a real n × n matrix with eigenvalues λ₁, λ₂, …, λₙ ordered so that Re λ₁ ≥ Re λ₂ ≥ ··· ≥ Re λₙ, and assume that λ₁ is real, positive and simple. Let a : Rⁿ → R be locally Lipschitz and such that:

(a) There exists M > 0 such that a(w) > λmax(A + Aᵀ) for all w ∈ Rⁿ with ‖w‖ > M.
(b) For all w ≠ 0 there exist 0 < N₁ < N₂ such that a(uw) < λ₁ for 0 < u < N₁, a(uw) > λ₁ for u > N₂, and u ↦ a(uw) is strictly increasing on [N₁, N₂].

Consider the equation

ẇ = Aw − a(w)w.  (4)
Let H be the (n−1)-dimensional (generalized) eigenspace associated with the eigenvalues λ₂, …, λₙ. Then:

(i) All solutions of Eq. (4) are bounded as t → ∞.
(ii) There exist exactly two eigenvectors w⁽¹⁾₊, w⁽¹⁾₋ associated with λ₁ satisfying a(w⁽¹⁾₊) = a(w⁽¹⁾₋) = λ₁.
(iii) H is invariant under Eq. (4).
(iv) Each of the half-spaces H₊, H₋ limited by H is invariant, and solutions starting from initial values w(0) lying in one of these half-spaces converge as t → ∞ to the eigenvector w⁽¹⁾₊ or w⁽¹⁾₋ lying on the same half-space as w(0).

Proof. (i) Let V = ‖w‖². Then dV/dt = 2(wᵀAw − a(w)‖w‖²) ≤ (λmax(A + Aᵀ) − 2a(w))‖w‖², which is negative for ‖w‖ > M. This shows that all solutions remain bounded as t → ∞.

(ii) Let v⁽¹⁾ be a given eigenvector associated with λ₁. Hypothesis (b) implies that u ↦ a(uv⁽¹⁾) takes on the value λ₁ for a unique u = u₊ in (0,∞), and the same holds for u ↦ a(u(−v⁽¹⁾)) for some unique u = u₋ ∈ (0,∞). Then w⁺ = u₊v⁽¹⁾ and w⁻ = −u₋v⁽¹⁾ satisfy the properties stated in (ii); these are the vectors w⁽¹⁾₊, w⁽¹⁾₋ of the statement. Observe that, in general, ‖w⁺‖ ≠ ‖w⁻‖.

(iii) is obvious, since H is invariant under both operators w ↦ Aw (by construction) and w ↦ −a(w)w (a scalar multiple of the identity).

(iv) Let {e₁, …, eₙ₋₁} be a basis of H. Then {w⁺, e₁, …, eₙ₋₁} is a basis of Rⁿ in which system (4) takes the form

ẋ = λ₁x − a(x,y)x,  ẏ = Cy − a(x,y)y  (5)

where x ∈ R is the coordinate along w⁺ and y = (y₁, …, yₙ₋₁)ᵀ ∈ Rⁿ⁻¹ represents the coordinates with respect to {e₁, …, eₙ₋₁}. The eigenvalues of C are λ₂, …, λₙ, with the same associated eigenvectors as A. Obviously, both 'axes' x = 0 and y = 0 are invariant. Define z = y/x, or y = xz. Then

ż = −(1/x²)ẋy + (1/x)ẏ = −(1/x²)(λ₁ − a)xy + (1/x)(C − aI)y = −(λ₁ − a)z + (C − aI)z = (C − λ₁I)z =: Bz.

Since the eigenvalues of C have real parts strictly smaller than λ₁, we see that z(t) → 0 exponentially fast as t → ∞. And since all solutions are bounded for t ≥ 0, we also have y(t) = x(t)z(t) → 0 as t → ∞.

Assume that (w⁺)ᵀw(0) > 0, so that both w(0) and w⁺ lie on the same half-space H₊. This means that x(0) > 0. On the invariant line y = 0, Eq. (5) reduces to

ẋ = λ₁x − a(x,0)x = (λ₁ − a(x,0))x =: X(x).  (6)

Our hypotheses imply that λ₁ − a(x,0) vanishes only for some x = x̄ > 0, while λ₁ − a(x,0) > 0 for 0 < x < x̄ and λ₁ − a(x,0) < 0 for x > x̄. x̄ is, therefore, a global attractor for Eq. (6) on the half-line {x ∈ R : x > 0}. We must show that (x̄,0) is also a global attractor for Eq. (4) on the whole half-space {(x,y) ∈ Rⁿ : x > 0}. This part requires some of the concepts on invariant sets and omega-limits given in Appendix A.

Let (x(t),y(t)) be any solution with x(0) > 0. We have already shown that (x(t),y(t)) remains bounded as t → ∞ and x(t) > 0 for all t. Consider the omega-limit set of the bounded trajectory γ = {(x(t),y(t))}, that is,

Ω = ω(γ) = {(ξ,η) : (ξ,η) = lim_{k→∞}(x(tₖ), y(tₖ)) for some sequence tₖ → ∞}.

It is known (see Appendix A) that this set is nonempty, closed, bounded and invariant for the equation under consideration, and attracts the trajectory γ in the sense that the distance d((x(t),y(t)), Ω) between the generic point (x(t),y(t)) and the set Ω tends to zero as t → ∞. In our case, the invariance of the half-space {(x,y) : x > 0} together with the fact that y(t) → 0 as t → ∞ imply that Ω is contained in the closed half-line {(x,0) : x ≥ 0}. The largest closed, bounded, invariant set contained in that half-line is the closed 'interval' {(x,0) : 0 ≤ x ≤ x̄}. Therefore, Ω must be contained in this interval. On the other hand, since λ₁ − a(x,0) > 0 for x < x̄, the quantity x(λ₁ − a(x,y)) is strictly positive for 0 < x ≤ ε, ‖y‖ ≤ ε if ε > 0 is sufficiently small. Since y(t) → 0 as t → ∞, there exists T > 0 such that ‖y(t)‖ < ε for all t ≥ T. But then, since ẋ(t) = x(t)(λ₁ − a(x(t),y(t))), we have ẋ(t) > 0 for all t ≥ T that satisfy 0 ≤ x(t) ≤ ε. Now, it is obviously impossible for such a function x(t) to have a sequence tₖ → ∞ for which x(tₖ) → 0. In other words, (0,0) cannot be an omega-limit point of {(x(t),y(t))}; that is, (0,0) ∉ Ω and Ω is contained in the open half-line {(x,0) : x > 0}. But then the only nonempty, closed, bounded, invariant subset contained in the open half-line {(x,0) : x > 0} is {(x̄,0)}. Hence, Ω = {(x̄,0)} and Theorem 1 is proved. □
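As a numerical illustration of Theorem 1 (our own check, not from the paper), take a nonsymmetric matrix A whose largest eigenvalue is real, positive and simple, together with the simple choice a(w) = ‖w‖², which fulfills hypotheses (a) and (b): along every ray, a(uw) = u²‖w‖² increases strictly from 0 to ∞.

    import numpy as np

    A = np.array([[3.0, 1.0, 0.0],
                  [0.0, 1.0, 2.0],
                  [0.0, 0.0, 0.5]])   # nonsymmetric; eigenvalues 3, 1, 0.5

    def flow(w, dt=1e-3, steps=200_000):
        for _ in range(steps):
            w = w + dt * (A @ w - (w @ w) * w)   # a(w) = ||w||^2
        return w

    rng = np.random.default_rng(1)
    w_inf = flow(rng.standard_normal(3))

    # The limit is an eigenvector for lambda_1 = 3 scaled so that a(w) = ||w||^2 = 3.
    print(w_inf @ w_inf)                          # approx 3
    print(np.linalg.norm(A @ w_inf - 3 * w_inf))  # approx 0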
3. The nonlinear homogeneous case

As mentioned earlier, in Samardzija and Waterland (1991) an extension of Eq. (1) is proposed for the case in which w ↦ Aw is substituted by a homogeneous operator w ↦ Fₚ(w) of degree p, and a(w) is assumed to be a homogeneous function of degree q > p (see Eq. (3)). Some of the results in Samardzija and Waterland (1991) are not stated clearly, and, in particular, the stability properties of the equilibrium points of Eq. (3) are not given. These properties are nontrivial, as shown in that same paper, where an example is given with n = 2, p = 3,
q = 4 which possesses four equilibrium points, some of them stable, some unstable.

Let us recall some basic definitions and facts of the spectral theory of homogeneous operators. In order to keep the notation in agreement with that of Section 4, the degree of homogeneity will be denoted here by p − 1.

Definition 1. Let F : Rⁿ → Rⁿ be homogeneous of degree p − 1.

(i) u ∈ Rⁿ is an eigenvector of F if u ≠ 0 and F(u) = λu for some real λ, which is called the associated eigenvalue. If λ > 0, u is said to be a positive eigenvector.
(ii) A ray E_u = {tu : t > 0} is an eigenray if all its elements are eigenvectors. E_u is a positive eigenray if its elements are positive eigenvectors.

Proposition 1. Let F : Rⁿ → Rⁿ be homogeneous of degree p − 1.

(i) If u satisfies F(u) = λu, then w = tu satisfies F(w) = (t^{p−2}λ)w.
(ii) For a ray E_u to be a (positive) eigenray it is necessary and sufficient that any of its members (say u) be a (positive) eigenvector.
(iii) Euler's formula: DF(w)w = (p−1)F(w) for all w ∈ Rⁿ.
(iv) If u is an eigenvector of F with associated eigenvalue λ, then u is an eigenvector of the linearized operator DF(u), with associated eigenvalue (p−1)λ.

Proof. (i) F(tu) = t^{p−1}F(u) = t^{p−1}λu = t^{p−2}λ(tu). (ii) is an immediate consequence of (i). (iii) is a well-known result. (iv) By Euler's formula, F(u) = λu implies DF(u)u = (p−1)F(u) = (p−1)λu. □

Proposition 2. Consider the system in general form

ẇ = F(w) − a(w)w  (7)

where F : Rⁿ → Rⁿ is homogeneous of degree p − 1 and a : Rⁿ → R₊ is homogeneous of degree q > p − 1 such that a(w) > 0 for w ≠ 0 (it will be denoted as positive definite). Then:

(i) The nonzero equilibrium points of Eq. (7) are the positive eigenvectors w of F with associated eigenvalue a(w).
(ii) In every positive eigenray E_w̄ there exists exactly one vector w = t̄w̄ which is an equilibrium of Eq. (7), and t̄ is given by

t̄ = (λ/a(w̄))^{1/(q−p+2)}

where λ is the eigenvalue associated with w̄.

Proof. (i) is obvious. (ii) Let w̄ satisfy F(w̄) = λw̄. Then tw̄ is an eigenvector with associated eigenvalue t^{p−2}λ. But the equation a(tw̄) = t^{p−2}λ, that is, t^q a(w̄) = t^{p−2}λ, has a unique solution t̄ = (λ/a(w̄))^{1/(q−p+2)}. Then w = t̄w̄ is an eigenvector with associated eigenvalue a(w). □

3.1. Local stability results

The local stability results for equilibria of Eq. (7) are a consequence of a general result in numerical linear algebra, which is related to the so-called method of deflation. The version of the method we need here is the following:

Proposition 3. Let A be a real n × n matrix, let λ be a simple eigenvalue of A with associated right eigenvector v, and let b ∈ Rⁿ be an arbitrary vector. Then the eigenvalues of the 'deflated' matrix C = A − vbᵀ are λ − bᵀv (with associated right eigenvector v) together with the rest of the eigenvalues of A, with the same associated left eigenvectors.

Proof. v is a right eigenvector of A − vbᵀ associated with the eigenvalue λ − bᵀv, since

(A − vbᵀ)v = λv − v(bᵀv) = (λ − bᵀv)v.

Let uᵀ be a left eigenvector of A associated with an eigenvalue μ different from λ. The general property of orthogonality between right and left eigenvectors associated with different eigenvalues implies that uᵀv = 0, which in turn implies

uᵀ(A − vbᵀ) = uᵀA − (uᵀv)bᵀ = μuᵀ.

This shows that uᵀ is also a left eigenvector of A − vbᵀ associated with the same eigenvalue μ. □
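Proposition 3 is easy to verify numerically. In the following sketch (ours), the matrix A and the vector b are arbitrary illustrative data; A is symmetrized only so that its spectrum is real.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))
    A = (A + A.T) / 2                      # symmetric, so the spectrum is real and a.s. simple
    vals, vecs = np.linalg.eigh(A)
    lam, v = vals[-1], vecs[:, -1]         # a simple eigenvalue with right eigenvector v
    b = rng.standard_normal(4)

    C = A - np.outer(v, b)                 # deflated matrix C = A - v b^T
    deflated = np.sort(np.linalg.eigvals(C).real)
    expected = np.sort(np.append(vals[:-1], lam - b @ v))
    print(np.allclose(deflated, expected)) # True: spectrum is {lam - b^T v} plus the rest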
Let us go back to Eq. (7), and let w be an equilibrium point of Eq. (7). Then F(w) = λw with λ = a(w).

Theorem 2. Let F : Rⁿ → Rⁿ and a : Rⁿ → R₊ be homogeneous of degrees p − 1 and q > p − 1, respectively, a being positive definite. Let w be a nonzero equilibrium point of system (7). Then

(i) w is an eigenvector of F with associated eigenvalue a(w).
(ii) (p−1)a(w) is an eigenvalue of DF(w).
(iii) w is asymptotically stable if (p−1)a(w) is a simple eigenvalue of DF(w) and the other eigenvalues have real parts strictly smaller than a(w).

Proof. The Jacobian matrix of F(w) − a(w)w at w is

J := DF(w) − a(w)I − wDa(w)

where Da(w) is the gradient of a considered as a row vector, that is, Da(w) = ∇a(w)ᵀ (Auchmuty, 1989). As shown above, w itself is an eigenvector of DF(w) with associated eigenvalue (p−1)a(w). This means that the Jacobian matrix J has the special structure assumed in the hypotheses of Proposition 3, with A = DF(w) − a(w)I, v = w and bᵀ = Da(w). As a consequence, if

{μ₁ = (p−1)a(w), μ₂, …, μₙ}

are the eigenvalues of DF(w) and μ₁ is simple, then the eigenvalues of A = DF(w) − a(w)I are {μ₁ − a(w) = (p−2)a(w), μ₂ − a(w), …, μₙ − a(w)}, and then the eigenvalues of J are {μ₂ − a(w), …, μₙ − a(w)} together with

μ₁ − a(w) − Da(w)w = (p−2)a(w) − qa(w) = (p−2−q)a(w)

because of Euler's formula Da(w)w = qa(w) applied to the q-homogeneous function a. Since q ≥ p − 1 and a(w) > 0, this eigenvalue is always negative, and the stability properties of w as an equilibrium point of Eq. (7) will be decided by the other eigenvalues of J, that is, by the signs of the real parts of μᵢ − a(w) for i = 2, …, n. If Re μᵢ < a(w) for i = 2, …, n, all eigenvalues of the Jacobian matrix at w have negative real parts, which means that w is locally exponentially stable. □

Example 1. Consider the following example, taken from Samardzija and Waterland (1991), in which

F(w₁,w₂) = (−w₁³ + w₁²w₂, w₂³ + w₁w₂² − 2w₁²w₂)ᵀ,  a(w₁,w₂) = (w₁² + w₂²)²

so that p − 1 = 3, q = 4. The eigenrays of F are generated by the following eigenvectors:

    Eigenvector    Eigenvalue
    ±(1, 0)        −1
    ±(1, 1)         0
    ±(1, −1)       −2
    ±(0, 1)         1

The equilibrium points of the associated dynamical system

ẇ₁ = −w₁³ + w₁²w₂ − (w₁² + w₂²)²w₁
ẇ₂ = w₂³ + w₁w₂² − 2w₁²w₂ − (w₁² + w₂²)²w₂

are (0,0) and w̄± = ±(0,1). As expected, since a is positive definite, only the positive eigenrays are detected as equilibria of the dynamical system. Concerning stability,

(p−1)a(w̄±) = 3,  DF(w̄±) = [0 0; 1 3].

Therefore, condition (iii) of Theorem 2 implies that both w̄₊ and w̄₋ are asymptotically stable. This dynamic behavior is claimed in Samardzija and Waterland (1991) only in terms of numerical experience. We have thus rigorously proven that this is indeed the case.
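The stability claim of Example 1 can also be checked by direct simulation. The sketch below (ours, with illustrative step size and initial conditions) integrates ẇ = F(w) − a(w)w with a forward-Euler scheme.

    import numpy as np

    def F(w):
        w1, w2 = w
        return np.array([-w1**3 + w1**2 * w2,
                         w2**3 + w1 * w2**2 - 2 * w1**2 * w2])

    a = lambda w: (w[0]**2 + w[1]**2) ** 2

    rng = np.random.default_rng(3)
    for _ in range(5):
        w = rng.standard_normal(2)
        w /= np.linalg.norm(w)              # start on the unit circle
        for _ in range(200_000):
            w = w + 1e-3 * (F(w) - a(w) * w)
        # Generic starts approach (0, 1) or (0, -1); initial points on one of the
        # other (non-positive) eigenrays would instead tend to the origin.
        print(np.round(w, 4))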
Note that the hypotheses of the theorem hold in the following two cases:

• F is linear, that is, F(w) = Aw, A has a real largest eigenvalue λ₁ which is positive and simple, and w is any one of the two associated eigenvectors which satisfy a(w) = λ₁. In this case, Theorem 2 is a restatement of Theorem 1, with the additional assumption of homogeneity on a.
• DF is symmetric, which means that F is the gradient ∇f of a homogeneous function f of degree p, and w/‖w‖ is a nondegenerate local maximum of f on the unit sphere S = {w ∈ Rⁿ : ‖w‖ = 1}, as shown in Theorem 3.

The connection between eigenvectors of ∇f and critical points of f on the unit sphere S has been extensively considered in the theory of homogeneous functions in Hilbert space, because of its relevance in the study of bifurcation problems. In the context of this paper we will only deal with those critical points which are maxima of f, but a much more general analysis is possible, both for minima and saddle points.

Theorem 3. Let f, a : Rⁿ → R₊ be homogeneous positive definite functions of degrees p and q, respectively, with q > p − 2. Consider the system

ẇ = ∇f(w) − a(w)w.  (8)

Then:

(i) Every positive eigenray E_w̄ of ∇f contains exactly one equilibrium w = t̄w̄ of Eq. (8), where t̄ is given by the formula

t̄ = (pf(w̄)/(a(w̄)‖w̄‖²))^{1/(q−p+2)}.

(ii) If w̄ is a nondegenerate local maximum of f on S, then w = t̄w̄ is an asymptotically stable equilibrium of Eq. (8).

Remark. w̄ is a nondegenerate maximum if D²L(w̄) is nondegenerate (thus negative definite) on the tangent hyperplane to S at w̄, where L is the Lagrangian function associated with the constrained optimization problem {max f(w) : ‖w‖² − 1 = 0}.

Proof. (i) is just a restatement of part (ii) of Proposition 2, together with the observation that λ can be directly expressed in terms of w̄: indeed, if ∇f(w̄) = λw̄ with λ > 0, then

pf(w̄) = (Euler) = ⟨∇f(w̄), w̄⟩ = λ⟨w̄, w̄⟩ = λ‖w̄‖²
which means that

λ = pf(w̄)/‖w̄‖²

and the stated formula obtains.

(ii) If w̄ is a local maximum of f on S, the Lagrange multiplier theory implies that ∇f(w̄) = λw̄ for some real λ, which must be positive since f(tw̄) = t^p f(w̄) is increasing for t ∈ (0,∞). Thus E_w̄ is a positive eigenray of ∇f, and w = t̄w̄ (with t̄ given by the above formula) is an equilibrium point of Eq. (8). In order to study the local stability properties of w, we first observe that D²f(w) = t̄^{p−2}D²f(w̄). Let μ̄₁, μ̄₂, …, μ̄ₙ be the eigenvalues of D²f(w̄), μ̄₁ = (p−2)λ being the one corresponding to the eigenvector w̄:

D²f(w̄)w̄ = (Euler) = (p−2)∇f(w̄) = (p−2)λw̄.

By Lagrange multiplier theory, the Hessian of the Lagrangian function

L(w) = f(w) − ½λ(‖w‖² − 1)

must be negative semidefinite along the directions orthogonal to w̄ (Bertsekas, 1996). Our nondegeneracy assumption actually requires that it be negative definite. But D²L(w̄) = D²f(w̄) − λI is a symmetric matrix, w̄ is an eigenvector of D²f(w̄) (hence of D²L(w̄)) with associated eigenvalue μ̄₁ = (p−2)λ ((p−3)λ for D²L(w̄)), and the other eigenvectors are necessarily orthogonal to w̄. By our previous discussion on the sign of D²L(w̄) on w̄⊥, these other eigenvalues (which are μ̄₂ − λ, …, μ̄ₙ − λ) must be negative. Therefore

μ̄ᵢ < λ  for  i = 2, …, n.

If we now express this in terms of w instead of w̄, we find:

(a) D²f(w) = t̄^{p−2}D²f(w̄). Hence the eigenvalues of D²f(w) are μᵢ = t̄^{p−2}μ̄ᵢ, i = 1, 2, …, n.
(b) a(w) = t̄^{p−2}λ.

Therefore, the n − 1 eigenvalues μᵢ = t̄^{p−2}μ̄ᵢ of D²f(w) (i = 2, …, n) satisfy

μᵢ = t̄^{p−2}μ̄ᵢ < t̄^{p−2}λ = a(w)

which is just the hypothesis stated in Theorem 2. □

4. Global Liapunov functions

Liapunov functions have already been used in the context of Hebbian models for PCA. They represent a quantity that decreases monotonically in time such as, for instance, the evolution of the energy or the mutual information between input and output variables (Plumbley, 1995). In general, Liapunov functions provide estimates (subsets) of the domains of attraction associated with asymptotically stable equilibria. There is no general method to construct Liapunov functions, and there is no knowledge available concerning the accuracy of the obtained domain estimates. Nevertheless, the partial information provided by Liapunov functions can be sufficient to guarantee a minimum size of the domains of attraction and, consequently, some robustness properties of the system behavior. In this section, different Liapunov functions are studied for some of the semilinear and nonlinear neural systems that have been analyzed in the previous paragraphs.

System

ẇ = ∇f(w) − a(w)w  (9)

where f, a : Rⁿ → R are positive and homogeneous functions of degrees p and q, respectively, has been studied in Section 3, obtaining local stability results for local maxima of f on the unit sphere S. The case a(w) = pf(w) = ⟨w, ∇f(w)⟩,

ẇ = ∇f(w) − pf(w)w  (10)

which was analyzed in Tolmos et al. (1999), is a nonlinear version of Oja's net and can be fully analyzed by Liapunov function techniques. Taking into account that

∇f(w) − pf(w)w = f(w)∇(ln f(w) − (p/2)‖w‖²)  (11)

we conclude that

V = −ln f(w) + (p/2)‖w‖²  (12)

satisfies V̇ = −f(w)‖∇V‖², and is, therefore, a strict Liapunov function on the domain Rⁿ\{0}. It is interesting to mention that, in addition, it is a Liapunov function for the associated discrete system w(k+1) = w(k) + ∇f(w(k)) − pf(w(k))w(k), as was pointed out by Zhang and Leung (1995) for Oja's net.
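As a sanity check (ours, not from the paper), take the quartic f(w) = ¼Σᵢwᵢ⁴, which is homogeneous of degree p = 4 and positive definite, and verify numerically that the V of Eq. (12) is nonincreasing along a discretized trajectory of Eq. (10); the step size is an illustrative choice.

    import numpy as np

    p = 4
    f      = lambda w: 0.25 * np.sum(w**4)          # homogeneous of degree p = 4
    grad_f = lambda w: w**3

    def V(w):                                       # Eq. (12)
        return -np.log(f(w)) + (p / 2) * (w @ w)

    rng = np.random.default_rng(4)
    w = rng.standard_normal(5)
    values = []
    for _ in range(50_000):
        w = w + 1e-4 * (grad_f(w) - p * f(w) * w)   # Euler step of Eq. (10)
        values.append(V(w))
    print(all(np.diff(values) <= 1e-12))            # V is (numerically) nonincreasing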
A detailed analysis of Eq. (10) leads to alternative Liapunov functions. As frequently happens in dynamical systems built from homogeneous functions, one can benefit from using polar coordinates r = ‖w‖, u = w/‖w‖, thus obtaining

ṙ = (1/r)⟨w, ẇ⟩ = (1/r)⟨w, ∇f(w) − pf(w)w⟩ = (1/r)(pf(w) − pf(w)r²) = r^{p−1}(1 − r²)pf(u)

u̇ = ẇ/r − (ṙ/r²)w = r^{p−2}(∇f(u) − pf(u)u) = r^{p−2}∇_S f(u) = r^{p−2}f(u)∇_S ln f(u)

where ∇_S f(u) = ∇f(u) − ⟨∇f(u), u⟩u is the 'projected' or 'reduced' gradient on the unit sphere S, representing the gradient of f as a function defined on the Riemannian manifold S. We can see that both equations contain a common scalar factor r^{p−2}f(u), which is positive and nowhere vanishing outside of the origin. Now, it is a known fact (see Appendix A) that simplification of that common factor gives a related dynamical system

ṙ = pr(1 − r²),  u̇ = ∇_S ln f(u)

which has the same orbits as the original system (10). The advantage of this simplified system is that it is uncoupled, a fact which enables us to prove easily the result stated in Theorem 4, eliminating the nondegenerate-maximum hypothesis and providing a precise estimate of the region of attraction of local maxima of f on S as attractors of Eq. (10).

Theorem 4. Let w̄ be a strict local maximum of f on S = {‖w‖² = 1}, and let V ⊂ S be its region of attraction for the gradient system u̇ = ∇_S ln f(u) on S. Then w̄ is a local attractor of Eq. (10), its region of attraction being Ṽ = {w ∈ Rⁿ\{0} : w/‖w‖ ∈ V}.

Remark. Observe that f(w) = ½⟨Aw, w⟩ corresponds exactly to Oja's net, with the same conclusions regarding domains of attraction.

4.1. Some variants: radial a(w)

For the general case, with a(w) unrelated to f(w) in Eq. (9), it does not seem possible to easily extend the results shown above, not even in the semilinear case, that is, f(w) = ½⟨w, Aw⟩. We have a fairly obvious exception when a(w)w is a gradient, which implies that a must be radial: if a(w) = h(‖w‖²) and h(z) = H′(z), then

∇f(w) − h(‖w‖²)w = ∇(f(w) − ½H(‖w‖²)).  (13)

On the other hand, if a(w)/f(w) is radial (say a(w)/f(w) = H′(‖w‖²)), then

∇f(w) − a(w)w = ∇f(w) − f(w)H′(‖w‖²)w = f(w)∇(ln f(w) − ½H(‖w‖²)).  (14)

These formulas clearly imply Proposition 4:

Proposition 4.

(i) V₂(w) = −f(w) + ½H(‖w‖²) is a strict Liapunov function for the system

dw/dt = ∇f(w) − H′(‖w‖²)w.  (15)

(ii) V₃(w) = −ln f(w) + ½H(‖w‖²) is a strict Liapunov function for

dw/dt = ∇f(w) − f(w)H′(‖w‖²)w.

Note that V₃ corresponds to Eq. (12) for H(z) = pz. Similar expressions could be obtained if a(w)/f(w)^m were radial for some m ≠ 1; then the Liapunov function would be

V₄(w) = −(1/(1−m))f(w)^{1−m} + ½H(‖w‖²)

and so on.

The rest of the Liapunov functions will deal with the semilinear problem shown in Eq. (4), first studied by Zhang and Bao (1995). A will be assumed to be positive definite, and a radially unbounded and positive.

4.2. Quadratic Liapunov functions and invariant ellipsoids

In Tolmos et al. (1999) the global attractivity (except for w = 0) of the unit sphere S₁ for system (10) was proven. For doing so, the Liapunov function V = (‖w‖² − 1)² was employed, showing that V̇ = −4pf(w)V. See also the previously obtained equation ṙ = r^{p−1}(1 − r²)pf(u). This property does not extend to the general formulation (9), except for the semilinear case, that is, f(w) = ½⟨w, Aw⟩.

Theorem 5. Let a(w) = ⟨w, Mw⟩ with A and M symmetric positive definite. Let P be the unique solution of the Liapunov equation

AᵀP + PA = 2M.  (16)

Then

V₅(w) = (⟨w, Pw⟩ − 1)²  (17)

is a Liapunov function of

ẇ = Aw − ⟨w, Mw⟩w  (18)

and the ellipsoid {w ∈ Rⁿ : ⟨w, Pw⟩ = 1} is invariant and attracts all orbits except w = 0.

Indeed, direct calculations show that V₅ satisfies

V̇₅ = −4a(w)V₅  (19)

thus proving Theorem 5. In particular, if M = A (Oja's net) we have P = I, and the ellipsoid becomes the unit sphere, as was already known. If M = I (that is, a(w) = ‖w‖²), the Liapunov function just obtained can be combined with the one found above for the radial case (V₂(w)) to obtain more information. In any case, the presence of invariant ellipsoids for the semilinear case suggests the possibility of using generalized polar coordinates. In this paper we will not pursue this topic further.
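Before moving on, Theorem 5 is easy to test numerically. The sketch below (ours) relies on scipy.linalg.solve_continuous_lyapunov, which solves aX + Xaᵀ = q, so passing a = Aᵀ and q = 2M yields the P of Eq. (16); the matrices A and M and the initial condition are illustrative choices.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    A = np.array([[2.0, 0.3], [0.3, 1.0]])   # symmetric positive definite
    M = np.array([[1.0, 0.2], [0.2, 1.5]])   # symmetric positive definite

    # solve_continuous_lyapunov(a, q) solves a@X + X@a^T = q, so P below
    # satisfies A^T P + P A = 2M (A is symmetric here anyway).
    P = solve_continuous_lyapunov(A.T, 2 * M)

    w = np.array([0.1, -2.0])
    for _ in range(100_000):
        w = w + 1e-3 * (A @ w - (w @ M @ w) * w)   # Euler step of Eq. (18)
    print(w @ P @ w)                                # approx 1: the orbit lands on the ellipsoid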
4.3. Liapunov functions that depend on the diagonal form of A

If we assume that in system (4) A is already in diagonal form, A = diag(λ₁, λ₂, …, λₙ) with λ₁ > λ₂ > ··· > λₙ > 0, several possibilities can be considered.

4.3.1. a(w) = a₁(w₁) + a₂(w₂) + ··· + aₙ(wₙ)

Assume a(w) has this form, with aᵢ′(wᵢ)wᵢ > 0 for wᵢ ≠ 0. Then Eq. (4) takes the form

ẇᵢ = wᵢ[λᵢ − Σₖ aₖ(wₖ)],  i = 1, …, n  (20)

which corresponds to the type of neural networks introduced by Cohen and Grossberg in the context of associative memories (Cohen & Grossberg, 1983); they show that the net has a global Liapunov function

V₆(w) = −Σᵢ₌₁ⁿ λᵢaᵢ(wᵢ) + ½ Σᵢ,ⱼ₌₁ⁿ aᵢ(wᵢ)aⱼ(wⱼ)  (21)

which in our case reduces to

V₆(w) = −Σᵢ₌₁ⁿ λᵢaᵢ(wᵢ) + ½a(w)².  (22)

Indeed, one can easily prove that

V̇₆ = −Σᵢ aᵢ′(wᵢ)wᵢ[−λᵢ + a(w)]².  (23)

For the case of Oja's net, a(w) = ⟨w, Aw⟩ = Σᵢ λᵢwᵢ², so that aᵢ(wᵢ) = λᵢwᵢ². This implies that

V₇(w) = −Σᵢ λᵢaᵢ(wᵢ) + ½a(w)² = −Σᵢ λᵢ²wᵢ² + ½⟨w, Aw⟩² = −⟨Aw, Aw⟩ + ½⟨w, Aw⟩²

which does not depend now on the diagonal structure of A. In fact, one can directly prove that V₇ is a Liapunov function, although the proof depends on some non-obvious cancellations, which suggests that the extension of this particular functional form to other apparently close cases may be difficult.
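Formula (23) implies that V₆ is nonincreasing along trajectories of Eq. (20). The following sketch (ours) checks this numerically for the Oja case aᵢ(wᵢ) = λᵢwᵢ², with an illustrative diagonal matrix A = diag(3, 2, 1) and step size.

    import numpy as np

    lam = np.array([3.0, 2.0, 1.0])          # A = diag(lam)
    a_i = lambda w: lam * w**2               # a_i(w_i) = lam_i w_i^2 (Oja's net)

    def V6(w):                               # Eq. (22)
        aw = np.sum(a_i(w))
        return -np.sum(lam * a_i(w)) + 0.5 * aw**2

    rng = np.random.default_rng(5)
    w = rng.standard_normal(3)
    vals = []
    for _ in range(100_000):
        w = w + 1e-4 * w * (lam - np.sum(a_i(w)))   # Eq. (20), componentwise
        vals.append(V6(w))
    print(all(np.diff(vals) <= 1e-12))              # V6 is (numerically) nonincreasing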
4.3.2. a(w) homogeneous of degree q

If a(w) cannot be written as a sum Σᵢ aᵢ(wᵢ), no generalization of the V₆(w) of Section 4.3.1 is readily available. As a final result in our search for Liapunov functions, we give one such function, which can be constructed when a(w) is differentiable, positive and homogeneous of degree q ≥ 2, a case which covers many interesting applications. As before, we will start from the diagonal form

ẇᵢ = wᵢ[λᵢ − a(w)],  i = 1, 2, …, n.  (24)

By means of the change of variables zᵢ = wᵢ₊₁/w₁ previously considered in Section 2, and renaming w₁ = x, we obtain

ẋ = λ₁x − a(x, xz)x = λ₁x − x^{q+1}a(1,z),  ż = Bz  (25)

where B = diag(b₁, b₂, …, bₙ₋₁) with bᵢ = λᵢ₊₁ − λ₁ < 0. For convenience of notation, we will denote dᵢ = −bᵢ = λ₁ − λᵢ₊₁ and D = −B = diag(dᵢ); D is then positive definite and diagonal. The first equation is a Bernoulli equation: the change of variable u = x^{−q} transforms Eq. (25) into

u̇ = −qλ₁u + qa(1,z),  ż = −Dz.  (26)

If ū = a(1,0)/λ₁, then v = u − ū and z satisfy

v̇ = −qλ₁v + qg(z),  ż = −Dz  (27)

where g(z) = a(1,z) − a(1,0) vanishes at z = 0.

Proposition 5. Let a : Rⁿ → R be differentiable and homogeneous of degree q. Let λ₁ > λ₂ ≥ ··· ≥ λₙ > 0 be given. Let ū = a(1,0)/λ₁, dᵢ = λ₁ − λᵢ₊₁ for i = 1, 2, …, n−1, and D = diag(d₁, d₂, …, dₙ₋₁). Then there exists a constant K₀ > 0 such that for all K > K₀,

V₇(v,z) := |v| + K((zᵀD⁻¹z)^{1/2} + (zᵀD⁻¹z)^{q/2})  (28)

is a Liapunov function for system (27), and therefore

V₈(w) = |1/w₁^q − a(1,0)/λ₁| + (K/w₁)(Σᵢ₌₂ⁿ wᵢ²/(λ₁ − λᵢ))^{1/2} + (K/w₁^q)(Σᵢ₌₂ⁿ wᵢ²/(λ₁ − λᵢ))^{q/2}

is a Liapunov function for Eq. (24) on the invariant half-space w₁ > 0 (similarly for w₁ < 0). In particular, all trajectories lying on that half-space converge to (ū, 0) in the (u,z) coordinates as t → ∞ (similarly for (−ū, 0)).

Proof. It is clear that g(z) = a(1,z) − a(1,0) satisfies |g(z)| ≤ L(‖z‖ + ‖z‖^q) for some constant L > 0, since g(0) = 0, a (and hence g) is assumed to be differentiable, and the homogeneity of a implies that |a(1,z)| ≤ L₁‖(1,z)‖^q, where L₁ is the maximum of |a| on the unit sphere. Let V = V₇ be as in Eq. (28). Then

V̇ = (v/|v|)v̇ + K[(zᵀD⁻¹z)^{−1/2} + q(zᵀD⁻¹z)^{(q−2)/2}]zᵀD⁻¹ż
  = (v/|v|)(−qλ₁v + qg(z)) + K[(zᵀD⁻¹z)^{−1/2} + q(zᵀD⁻¹z)^{(q−2)/2}]zᵀD⁻¹(−Dz)
  = −qλ₁|v| + q(v/|v|)g(z) − K[(zᵀD⁻¹z)^{−1/2} + q(zᵀD⁻¹z)^{(q−2)/2}]‖z‖².

Now, since v/|v| lies between −1 and 1, it is clear that

(v/|v|)g(z) ≤ |g(z)|.

On the other hand,

zᵀD⁻¹z ≤ ‖z‖²/(λ₁ − λ₂)  ⟹  −(zᵀD⁻¹z)^{−1/2} ≤ −(λ₁ − λ₂)^{1/2}‖z‖⁻¹

and

zᵀD⁻¹z ≥ ‖z‖²/(λ₁ − λₙ)  ⟹  −(zᵀD⁻¹z)^{(q−2)/2} ≤ −‖z‖^{q−2}/(λ₁ − λₙ)^{(q−2)/2}.

This means that

V̇ ≤ −qλ₁|v| + q|g(z)| − K[(λ₁ − λ₂)^{1/2}‖z‖ + (q/(λ₁ − λₙ)^{(q−2)/2})‖z‖^q]
  ≤ −qλ₁|v| + qL(‖z‖ + ‖z‖^q) − K[(λ₁ − λ₂)^{1/2}‖z‖ + (q/(λ₁ − λₙ)^{(q−2)/2})‖z‖^q].

It is now obvious that by choosing K sufficiently large one can make V̇ negative definite, thus ending the proof (the case w₁ < 0 is trivially equivalent). □

Systems like Eq. (24) belong to the larger class of quasi-homogeneous autonomous equations, for which a fairly developed theory is available (Coll, Gasull, & Prohens, 1997). However, no Liapunov function had been given so far.

Example 2. Consider the following equation

ẋ = 2x − (x² + 2xy + 4y²)x,  ẏ = y − (x² + 2xy + 4y²)y

where A = [2 0; 0 1] and a = x² + 2xy + 4y² is positive definite and homogeneous of degree q = 2. The equilibrium points are (±√2, 0) (eigenvectors associated with the largest eigenvalue 2) and (0, ±1/2) (eigenvectors associated with eigenvalue 1). In the 'homogeneous coordinates' u = x⁻², v = u − ū = u − 1/2, z = y/x introduced above, the system takes the form

v̇ = −4v + 2(2z + 4z²),  ż = −z

in the half-space x > 0. In this case, V₇(v,z) = V(v,z) = |v| + K|z| + Kz² satisfies

V̇ = (v/|v|)v̇ + K(z/|z|)ż + 2Kzż = (v/|v|)(−4v + 4z + 8z²) + (K(z/|z|) + 2Kz)(−z)
  = −4|v| + (v/|v|)(4z + 8z²) − K|z| − 2Kz².

Now,

(v/|v|)z ≤ |(v/|v|)z| = |z|

which implies

V̇ ≤ −4|v| + (4 − K)|z| + (8 − 2K)z².

Therefore, V will be a Liapunov function if we choose K > 4 (say K = 5). Furthermore, for K = 5 we have

V̇ ≤ −4|v| − |z| − 2z².

In the original coordinates (restricted to x > 0),

V = |1/x² − 1/2| + 5|y|/x + 5y²/x²

which can be written more conveniently as

V₇ = (1/(2x²))[|2 − x²| + 10x|y| + 10y²].

For fixed V = V₀ > 0, the level curve V = V₀ can easily be constructed on the first quadrant and then reflected about the x axis. For y > 0, the curve

|2 − x²| + 10xy + 10y² = 2x²V₀

can be decomposed in two parts. When 0 < x < √2, we have

2 − x² + 10xy + 10y² = 2x²V₀

which corresponds to an arc of hyperbola crossing the x axis at x = √(2/(1 + 2V₀)) (which is always less than √2). As for those x > √2,

x² − 2 + 10xy + 10y² = 2x²V₀,  or  (1 − 2V₀)x² + 10xy + 10y² = 2,

represents another hyperbola (as can be easily checked) which crosses the x axis at x = √(2/(1 − 2V₀)) if V₀ < 1/2 and does not cross it if V₀ ≥ 1/2 (which is the largest value of V along the half-line {x > √2, y = 0}). Fig. 1 shows some level curves of V, for values of V between 0 and 1, as well as some trajectories of the dynamical system.

Fig. 1. Level curves and trajectories for Example 2.

Remarks. (i) As we saw before, in the 'homogeneous' coordinates V satisfies

V̇ ≤ −4|v| + (4 − K)|z| + (8 − 2K)z² = −4|v| − |z| − 2z²

when K = 5. Now, in Liapunov function theory one (ideally) wants to obtain a relation of the type V̇ ≤ −kV for some k > 0. In our case,

−4|v| − |z| − 2z² ≤ −k(|v| + 5|z| + 5z²)

will hold if

4|v| + |z| + 2z² ≥ k(|v| + 5|z| + 5z²)

and this is true if 4 ≥ k, 1 ≥ 5k and 2 ≥ 5k. Our best choice is k = 1/5:

V̇ ≤ −(1/5)V

and then we can assert that

V(x(t), y(t)) ≤ V(x(0), y(0))e^{−t/5}  for t ≥ 0

(in any system of coordinates). This gives an upper estimate of the time it takes the system to reach some particular level set V = V₀ given V(x(0), y(0)), and enables us to obtain relevant information regarding the time discretization of our continuous-time dynamical system, a very important topic which lies beyond the purposes of this paper.

(ii) The choice of V is rather arbitrary. For instance, one could allow for two different constants K₁, K₂:

V = |1/x² − 1/2| + K₁|y|/x + K₂y²/x²

so that for large values of K₂ we could obtain ellipses instead of hyperbolas. For example, if

V = |1/x² − 1/2| + 5|y|/x + 20y²/x²

the curves

(1 − 2V₀)x² + 10xy + 40y² = 2

(for x > √2) would correspond to arcs of ellipses for V₀ sufficiently small.

(iii) Finally, the results of Section 4.2 can be applied, since

a(x,y) = x² + 2xy + 4y² = (x y)[1 1; 1 4](x y)ᵀ

so that M = [1 1; 1 4] in the notation of Section 4.2. One easily finds that

P = [1/2 2/3; 2/3 4]

is the unique symmetric solution of the Liapunov matrix equation AP + PA = 2M, and therefore

V₅ = (½x² + (4/3)xy + 4y² − 1)²

is a Liapunov function for our system, and

½x² + (4/3)xy + 4y² = 1

is a globally attracting invariant ellipse (except for the zero solution), which necessarily contains all nonzero equilibria as well as the separatrices connecting the saddles (0, ±1/2) with each of the local attractors (±√2, 0) (see Fig. 2).

Fig. 2. Globally attracting invariant ellipse for Example 2.
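As a numerical companion to Example 2 (our own sketch, with illustrative step size and initial condition), one can integrate the system in the half-space x > 0 and watch both the decay of V₇ and the convergence of ⟨w, Pw⟩ = ½x² + (4/3)xy + 4y² to 1, as predicted by remark (iii).

    import numpy as np

    a = lambda x, y: x**2 + 2*x*y + 4*y**2

    def step(x, y, dt=1e-3):
        s = a(x, y)
        return x + dt*(2*x - s*x), y + dt*(y - s*y)

    def V7(x, y):                       # Liapunov function on the half-space x > 0
        return abs(1/x**2 - 0.5) + 5*abs(y)/x + 5*y**2/x**2

    ellipse = lambda x, y: 0.5*x**2 + (4/3)*x*y + 4*y**2   # <w, Pw> of remark (iii)

    x, y = 0.7, 0.6
    for k in range(20_000):
        if k % 5_000 == 0:
            print(f"V7 = {V7(x, y):8.5f}   <w,Pw> = {ellipse(x, y):8.5f}")
        x, y = step(x, y)
    print(f"limit: ({x:.4f}, {y:.4f})")   # approaches (sqrt(2), 0) ~ (1.4142, 0)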
5. Concluding remarks

Two previously proposed generalizations of the well-known Oja network have been analyzed in this paper. One of them (Zhang & Bao, 1995) has been generalized further to include nonsymmetric matrices, so that its applicability has been extended beyond the standard component analysis applications. Concerning Samardzija and Waterland (1991), we have proved a particular case of a theorem that was not clearly stated in that paper. The correct statement and proof of a general result on homogeneous networks opens the door to new developments in the theory of functional networks, the design of dynamical systems for solving algebraic and optimization problems, and related information processing fields where higher-order correlations may be relevant.

Several Liapunov functions have been constructed in the second part of the paper, with the aim of providing some practical results concerning global behavior (e.g. domains of attraction) and robustness of the dynamical models.

The fruitful use of homogeneous coordinates z = y/x reflects a much deeper structure than our computations suggest. Indeed, it is known (Helmke & Moore, 1996; Yan, Helmke, & Moore, 1994) that Oja's model can be reinterpreted in matrix terms as Ẇ = AW − WWᵀAW, where W is an n × k matrix, and one can show that the Stiefel manifold St(k,n) = {W ∈ R^{n×k} : WᵀW = I_k} of n × k matrices with orthonormal columns is invariant under the flow (in fact, it is globally attracting, since ‖I_k − WᵀW‖²
is a global Liapunov function; see Yan et al. (1994)). This preservation of orthogonality is a very important property of Oja's flow, and it is very helpful in computing eigenvectors of symmetric matrices, due to their inherent orthogonality. These authors also show that Oja's flow restricted to the Stiefel manifold is just the gradient flow associated with the Rayleigh quotient R(W) = tr(WᵀAW). Many other algorithms that preserve orthogonality have been proposed, mostly in the context of optimization problems and, more recently, in the neural network area; see Amari (1999), Edelman, Arias, and Smith (1998) and Fiori (2001). It would be very interesting to extend these methods to the general models analyzed in this paper, but there is no clear indication for it, except in the case of Section 4.2 (that is, ẇ = Aw − wwᵀMw).
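As an illustration of this matrix flow (our own sketch, with synthetic data and an illustrative step size), a short forward-Euler integration of Ẇ = AW − W(WᵀAW) shows the distance to the Stiefel manifold decaying and then staying near zero.

    import numpy as np

    rng = np.random.default_rng(6)
    n, k = 6, 2
    S = rng.standard_normal((n, n))
    A = S @ S.T / n                       # symmetric positive semidefinite

    W = rng.standard_normal((n, k))       # generic (non-orthonormal) start
    for t in range(30_000):
        if t % 10_000 == 0:
            print(np.linalg.norm(np.eye(k) - W.T @ W))   # decays towards 0
        AW = A @ W
        W = W + 1e-3 * (AW - W @ (W.T @ AW))             # Oja flow in matrix form
    print(np.linalg.norm(np.eye(k) - W.T @ W))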
Acknowledgements

This work has been partially supported by the Proyecto Multidisciplinar de Investigación y Desarrollo 10978 of the Universidad Politécnica de Madrid, the coordinated project PB97-0566-C02 of the Programa Sectorial de PGC de la Dirección General de Enseñanza Superior e Investigación Científica in the MEC, and projects BFM2000-1475 and BFM2003-02960 of the Programa Nacional de PGC in the MCyT of Spain.
Appendix A

In this section we include some remarks concerning notation, as well as some definitions which may help the reader. Note that some of the concepts presented below may have different versions in other sources of the literature.

1. The standard inner product of vectors x, y ∈ Rⁿ is denoted by ⟨x, y⟩ or xᵀy. The Euclidean norm of a vector x is denoted by ‖x‖; thus ‖x‖² = xᵀx. The (Frobenius) norm of an n × m matrix M is ‖M‖ = (Σᵢⱼ mᵢⱼ²)^{1/2}.

2. λmax(M) and λmin(M) denote the largest and smallest eigenvalues of a symmetric matrix M. One has λmin(M)‖x‖² ≤ xᵀMx ≤ λmax(M)‖x‖². For a general square matrix M, xᵀ(M + Mᵀ)x = 2xᵀMx, and therefore λmin(M + Mᵀ)‖x‖² ≤ 2xᵀMx ≤ λmax(M + Mᵀ)‖x‖².

3. If x ∈ Rⁿ and A ⊂ Rⁿ, d(x, A) denotes the distance from x to A, that is, d(x, A) = inf{‖x − y‖ : y ∈ A}. The δ-neighborhood N_δ(A) is defined as {y ∈ Rⁿ : d(y, A) < δ}, and is an open set containing A.

4. The boundary of A ⊂ Rⁿ is denoted by ∂A.

5. For a function f : Rⁿ → Rᵐ, Df(x) is the Jacobian matrix of f at the point x. If m = 1, Df(x) is a row vector, whose transpose Df(x)ᵀ is the gradient ∇f(x) (a column vector). If A ⊂ Rⁿ is an open set, then f : A → Rᵐ is a C¹ function (on A) if all partial derivatives exist and are continuous on A.

6. f : A ⊂ Rⁿ → Rᵐ is a Lipschitz function if there exists a constant L such that ‖f(x) − f(y)‖ ≤ L‖x − y‖ for all x, y ∈ A. Any such constant is said to be a Lipschitz constant for f (on A), and the smallest of them is called the Lipschitz constant of f (on A), usually denoted by Lip f. The Mean Value Theorem implies that if A is open and convex, f is C¹ on A and there exists a constant L such that ‖Df(y)‖ ≤ L for all y ∈ A, then f is Lipschitz on A and L is a Lipschitz constant. If every x ∈ A has a neighborhood U_x such that f is Lipschitz on U_x ∩ A, then f is said to be locally Lipschitz on A. If A is open, a sufficient condition for this to happen is that f be C¹ on A.

7. A function f : Rⁿ → Rᵐ is homogeneous of degree k ∈ R if f(tx) = t^k f(x) for all t > 0 and all x ∈ Rⁿ. By differentiating both sides with respect to t and setting t = 1, one obtains Euler's formula: Df(x)x = kf(x) for all x ∈ Rⁿ. This is a necessary and sufficient condition for a C¹ function to be homogeneous of degree k.

8. Let an autonomous differential equation

ẋ = f(x)  (A1)

be given in Rⁿ. All differential equations appearing in this paper are C¹, and therefore satisfy one of the sufficient conditions guaranteeing existence and uniqueness of solutions. The solution through x₀, satisfying x(0) = x₀, is denoted by x(t; x₀) whenever the specification of x₀ becomes necessary. Otherwise, solutions are simply written as x(t), y(t) and so on.

9. γ⁺(x₀) = {x(t; x₀) : t ≥ 0} is the positive orbit (or semiorbit) through x₀. The orbit (or full orbit) through x₀ is γ(x₀) = {x(t; x₀) : t ∈ R}. The word trajectory is also used in the same sense. If γ(x̄) = {x̄}, then f(x̄) = 0 and conversely; x̄ is said to be an equilibrium (point) of Eq. (A1).

10. If the system ẏ = b(y)f(y) is obtained from system (A1) by multiplying the vector field f by a positive scalar function b(y) > 0, then both systems have the same orbits with a different time parametrization. This result may be used to simplify the qualitative analysis of Eq. (A1).

11. A set A ⊂ Rⁿ is positively (resp. negatively) invariant if x₀ ∈ A implies x(t; x₀) ∈ A for all t ≥ 0 (resp. t ≤ 0). A is invariant if it is both positively and negatively invariant. Thus, if A is invariant and x₀ ∈ A, then γ(x₀) ⊂ A. This means that A consists of full orbits.

12. An invariant set A attracts a solution x(t) if d(x(t), A) → 0 as t → ∞. If A attracts all orbits starting at points in some neighborhood of A, then A is said to be a local attractor. If it attracts all orbits, then it is a global attractor.

13. The ω-limit set of a bounded trajectory x(t) is the set of all points y ∈ Rⁿ which can be obtained as limits y = lim_{k→∞} x(tₖ) for some sequence tₖ → ∞. If x(t) is the solution through x₀, the ω-limit set is denoted ω(x₀) or ω(γ⁺(x₀)). It can be shown (Amann, 1990) that ω(x₀) is nonempty, compact, connected, invariant and attracts x(t). In particular, if ω(x₀) is a singleton {x̄},
then x̄ = lim_{t→∞} x(t) and x̄ is necessarily an equilibrium point. In general, if A is an invariant set, then it attracts a bounded orbit γ⁺(x₀) if and only if ω(x₀) ⊂ A. If γ⁺(x₀) is bounded and contained in some set A, and M is the largest compact invariant set contained in A (assuming such a set exists), then necessarily ω(x₀) ⊂ M, which means that M attracts γ⁺(x₀) (and any other bounded positive orbit remaining in A).

14. If A ⊂ Rⁿ is an open set and V : A → R is a C¹ function, the derivative of V along the solutions of Eq. (A1) is defined as

V̇(y) = DV(y)f(y) = ∇V(y)ᵀf(y) = Σₖ₌₁ⁿ (∂V/∂xₖ)(y)fₖ(y).

It is also denoted by dV/dt. Under these hypotheses, V is a Liapunov function for Eq. (A1) on A if V̇(y) ≤ 0 for all y ∈ A. If V̇(y) < 0 for all y ∈ A which is not an equilibrium point of Eq. (A1), then V is said to be a strict Liapunov function (on A). Observe that, in contrast with other definitions, neither sign nor boundary conditions on ∂A are imposed on V itself.

15. If V is a Liapunov function for Eq. (A1) on A, γ⁺(x₀) is bounded and γ⁺(x₀) ⊂ A, then V̇(y) = 0 for all y ∈ ω(x₀). Moreover, if there exists a set M which is the largest compact invariant set contained in {y ∈ A : V̇(y) = 0}, then d(x(t; x₀), M) → 0 as t → ∞ (LaSalle invariance principle). In particular, if M consists of a single point {x̄} (which means that the set {y ∈ A : V̇(y) = 0} contains no full orbits besides the equilibrium point x̄), then all trajectories which remain in A for t ≥ 0, bounded and bounded away from ∂A, tend to x̄ as t → ∞.
References

Amann, H. (1990). Ordinary differential equations. Walter de Gruyter.
Amari, S.-i. (1999). Natural gradient learning for over- and under-complete bases in ICA. Neural Computation, 11, 1875–1883.
Auchmuty, G. (1989). Unconstrained variational principles for eigenvalues of real symmetric matrices. SIAM Journal of Mathematical Analysis, 20(5), 1186–1207.
Bertsekas, D. (1996). Nonlinear programming. Athena Scientific.
Berzal, A., & Zufiria, P. J. (1998). Linearized training of Hebbian neural networks: Application in image processing. In Engineering benefits from neural networks (pp. 1–8). Gibraltar: Systems Engineering Association.
Berzal, J. A., Zufiria, P. J., & Rodríguez, L. (1996). Implementing the Karhunen–Loève transform via improved neural networks. In Solving engineering problems with neural networks. International Conference on Engineering Applications of Neural Networks, EANN'96 (pp. 375–378). London.
Cohen, M. A., & Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man and Cybernetics, 13(5), 815–826.
Coll, B., Gasull, A., & Prohens, R. (1997). Differential equations defined by the sum of two quasi-homogeneous vector fields. Canadian Journal of Mathematics, 49(2), 212–231.
Diamantaras, K. I., & Kung, S. Y. (1996). Principal component neural networks. Theory and applications. New York: Wiley.
Edelman, A., Arias, T. A., & Smith, S. T. (1998). The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2), 303–353.
Fiori, S. (2001). A theory of learning by weight flow on Stiefel–Grassman manifold. Neural Computation, 13(7), 1625–1647.
Golub, G. H., & Van Loan, C. F. (1983). Matrix computations. North Oxford Academic/Johns Hopkins University Press.
Haykin, S. (1994). Neural networks. A comprehensive foundation. Macmillan Publishing Company.
Helmke, U., & Moore, J. B. (1996). Optimization and dynamical systems. London: Springer.
Hyvärinen, A., Karhunen, J., & Oja, E. (2001). Independent component analysis. New York: Wiley.
Johnson, R. A., & Wichern, D. W. (1992). Applied multivariate statistical analysis. New Jersey: Prentice Hall.
Kushner, H. J., & Yin, G. G. (1997). Stochastic approximation algorithms and applications. New York: Springer.
Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, AC-22, 551–575.
Oja, E. (1982). A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15, 267–273.
Plumbley, M. D. (1995). Lyapunov functions for convergence of principal component algorithms. Neural Networks, 8(1), 11–23.
Samardzija, N., & Waterland, R. L. (1991). A neural network for computing eigenvectors and eigenvalues. Biological Cybernetics, 65, 211–214.
Sanger, T. D. (1989). Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2, 459–473.
Tolmos, P., Vegas, J. M., & Zufiria, P. J. (1999). Estabilidad de sistemas dinámicos, asociados a redes neuronales, diseñados para el cálculo de autovectores de operadores lineales y no lineales (II). XVI CEDYA/VI Congreso de Matemática Aplicada (pp. 1619–1626).
Yan, W.-Y., Helmke, U., & Moore, J. B. (1994). Global analysis of Oja's flow for neural networks. IEEE Transactions on Neural Networks, 5(5), 674–683.
Zhang, Q., & Bao, Z. (1995). Dynamical system for computing the eigenvectors associated with the largest eigenvalue of a positive definite matrix. IEEE Transactions on Neural Networks, 6(3), 790–791.
Zhang, Q., & Leung, Y.-W. (1995). Energy function for one-unit Oja algorithm. IEEE Transactions on Neural Networks, 6(5), 1291–1293.
Zufiria, P. J. (2002). On the discrete-time dynamics of the basic Hebbian neural network node. IEEE Transactions on Neural Networks, 13(6), 1342–1352.