J. Math. Anal. Appl. 286 (2003) 618–635 www.elsevier.com/locate/jmaa
Chaoticity generated by a learning model Fernanda Botelho ∗ and James E. Jamison Department of Mathematical Sciences, The University of Memphis, Memphis, TN 38152, USA Received 17 June 2002 Submitted by S. Heikkila
Abstract In this paper we consider a learning rule whose underlying space, possibly infinite dimensional, is equipped with an inner product. The rule proposed is a generalization of Oja’s maximum eigenfilter algorithm. We study its convergence properties and iterative behavior. We observe a whole variety of dynamical behaviors. We establish conditions on parameter values generating chaoticity as well as asymptotic convergence. 2003 Elsevier Inc. All rights reserved.
* Corresponding author.
E-mail addresses: [email protected] (F. Botelho), [email protected] (J.E. Jamison).
0022-247X/$ – see front matter © 2003 Elsevier Inc. All rights reserved. doi:10.1016/S0022-247X(03)00505-5

1. Introduction
Learning is one of our brain's attributes, crucial to environmental adaptation. In fact, exposure to outside stimuli forces an internal adjustment and, ultimately, an educated performance. The development of mathematical models that reproduce simplified versions of real learning is of great interest since, as a result, specific brain functions may be executed with a higher degree of efficiency. Biological observations have identified some basic principles upon which a model should rely; these are often referred to in the literature as the Hebbian postulates of learning. In simplistic terms, a paradigm of the brain might consist of a finite number of interconnected nodes where quantifiable information is collected. Information flowing through the connecting channels changes via a linear factor, called a connecting weight. At discrete times, and in a synchronous or asynchronous way, this information is additively collected at the nodes. Spearheaded by Hebb's work, several researchers have proposed different models of learning (cf. [3,8,10,11]). Oja's model is a discrete algorithm which
defines, if convergence occurs, a vector of connecting weights from an initial assignment and outside stimuli. This model behaves as a maximum eigenfilter or principal component analyzer.
In this paper, we consider a discrete algorithm acting on a Hilbert space. The algorithm proposed is a straightforward generalization of Oja's model (cf. [10]) and depends on a compact, self-adjoint operator, denoted by C. This operator is directly related to the input set and determines a partition of the underlying Hilbert space into an orthogonal direct sum of finite-dimensional eigenspaces plus a possibly infinite-dimensional space, the kernel of C. Even though we are in a more general setting, a stability property identical to the one observed in Oja's case can be asserted; see [10] and [5]. Under some controllable assumptions, local stability occurs when the set of fixed points lies in the eigenspace attached to the largest eigenvalue; see Theorem 6.2. We can only ensure this type of stability along directions transversal to the given eigenspace; therefore the algorithm studied may be classified as a principal component analyzer. Global learning seems to emerge as a superposition of various pseudo-learning stages occurring in finite-dimensional invariant subspaces.
We start by determining the stationary connecting weight vectors and then study their stability properties. We study the dynamical behavior of our system restricted to each eigenspace. This approach was inspired by the beamforming effect, a type of directed learning task that allows us to focus on a detail embedded in a background of interferences. We found that, under parameter constraints, chaoticity may be observed. On the other hand, global convergence may also occur. The discussion of Oja's rule presented in [6] refers to the possibility of quasi-cyclic formation and fluctuation behaviors. These types of phenomena might, in fact, appear, as shown in Proposition 5.2 and Theorem 5.1.
Moreover, Theorem 5.1 shows that, under certain conditions, an initial weight assignment lying in some subregion of an eigenspace, evolves along a specified direction, toward a unit vector. In other words, this might suggest that a present pseudo-learning is strongly influenced by a past one.
2. Background and notation
We start by describing an interpretation of Oja's model (cf. [6]), a learning algorithm that aims to determine a vector of connecting weights from an initial choice of parameters and an initial input vector. We extend Oja's setting by considering an underlying Hilbert space and a natural extension of Oja's learning rule. As presented in [6], let $\xi$ be an $n$-dimensional random input vector following a joint probability distribution. The output value, denoted by $V$, is the outcome of the network's action on $\xi$ and is given by the summation $V = \sum_{j=1}^{n} \omega_j \xi_j$, where $\omega_j$ represents the weight of the connection from the node $j$. The main goal is to define a converging algorithm that determines the network's internal parameters $\omega_j$. Given an initial parameter set $\omega$, an $n$-vector, and an input vector $\xi$, we consider the expected parameter correction values to be defined by $\Delta\omega_i = \langle V(\xi_i - V\omega_i)\rangle$, where $\langle\cdot\rangle$ denotes the expected value.
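Substituting the expression for $V$ we obtain
\[
\Delta\omega_i = \Big\langle \sum_{j=1}^{n} \omega_j \xi_j \xi_i - \sum_{j,k} \omega_j \omega_k \xi_j \xi_k \omega_i \Big\rangle = \sum_{j=1}^{n} C_{ij}\omega_j - \sum_{j,k} \omega_j C_{jk} \omega_k \omega_i .
\]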
The algorithm considered is given by $\omega_{\mathrm{new}} = \omega_{\mathrm{old}} + \Delta\omega$. The algorithm is convergent if the expected correction value is zero. It follows that $\Delta\omega = 0$ iff $C\omega - (\omega^t C\omega)\omega = 0$, where $C$ is a matrix of expected values with the $(i,j)$-entry equal to $\langle \xi_i \xi_j \rangle$, $\omega$ is a column vector of connecting weights, and $\omega^t$ is the transpose of $\omega$.
Inspired by Oja's mathematical model, we consider $C$ to be a compact linear operator acting on a Hilbert space $H$, with inner product $\langle\cdot,\cdot\rangle$. Our system is defined to be $\phi(\omega) = \omega + C\omega - \langle\omega, C\omega\rangle\omega$. Our main goal is to determine the fixed points of $\phi$ and their stability behavior. The operator $C$ determines an orthogonal splitting of $H$, $H = (\bigoplus_i H_i) \oplus \mathrm{Ker}(C)$. Let $B$ be an orthonormal basis of eigenvectors of $C$. Since $C$ is compact there are countably many nonzero eigenvalues, each one associated with a finite-dimensional eigenspace, cf. [15]. We denote by $H_i$ the eigenspace spanned by such eigenvectors, say $\{\omega_1^i, \ldots, \omega_{k_i}^i\} \subset B$, and by $\lambda_i$ the corresponding eigenvalue. Let $P_i$ be the projection onto $H_i$. An element $v$ in $H$ is uniquely decomposed into the sum of its components, i.e., $v = \sum_i P_i(v)$, and we also have that $P_i(v) = \sum_{j=1}^{k_i} \langle v, \omega_j^i\rangle\, \omega_j^i$.

3. Existence of stationary solutions
In this section we determine all fixed points of $\phi(\omega) = \omega + C\omega - \langle\omega, C\omega\rangle\omega$, where $C$ is a linear operator. This is equivalent to solving the equation $C\omega - \langle\omega, C\omega\rangle\omega = 0$.

Proposition 3.1. An element $z \in H$ is a solution to the equation $C\omega - \langle\omega, C\omega\rangle\omega = 0$ if and only if $z \in \mathrm{Ker}(C)$ or $z$ is a unit eigenvector of $C$.

Proof. We first notice that vectors in the kernel of $C$ and unit eigenvectors are, in fact, solutions of the given equation. Now, we consider $z$, not in $\mathrm{Ker}(C)$, such that $Cz - \langle z, Cz\rangle z = 0$. The inner products
\[
\langle z, Cz - \langle z, Cz\rangle z\rangle = \langle z, Cz\rangle - \langle z, Cz\rangle \|z\|^2 = 0
\]
and
\[
\langle Cz, Cz - \langle z, Cz\rangle z\rangle = \|Cz\|^2 - \langle z, Cz\rangle^2 = 0
\]
imply that $\|z\| = 1$ or $\langle z, Cz\rangle = 0$, and $\|Cz\| = |\langle z, Cz\rangle|$. Since $z$ is not in $\mathrm{Ker}(C)$, we have $\|Cz\| \neq 0$, hence $\langle z, Cz\rangle \neq 0$ and therefore $\|z\| = 1$. It remains to show that $z$ is an eigenvector. The Cauchy–Schwarz inequality, together with $\|z\| = 1$ and $\|Cz\| = |\langle z, Cz\rangle|$, yields the existence of a scalar $\lambda$ for which $Cz = \lambda z$. ✷
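Proposition 3.1 can be checked numerically in a small finite-dimensional setting. The sketch below is only an illustration, assuming a diagonal matrix in place of the operator $C$; the spectrum `EIGS`, the test vectors, and all function names are hypothetical choices, not taken from the paper.

```python
def inner(u, v):
    """Euclidean inner product, standing in for <.,.> on H."""
    return sum(a * b for a, b in zip(u, v))


def apply_C(eigs, w):
    """Apply the diagonal operator C = diag(eigs) to w."""
    return [lam * x for lam, x in zip(eigs, w)]


def phi(eigs, w):
    """One step of phi(w) = w + Cw - <w, Cw> w."""
    Cw = apply_C(eigs, w)
    s = inner(w, Cw)
    return [x + c - s * x for x, c in zip(w, Cw)]


def residual(eigs, w):
    """Norm of Cw - <w, Cw> w; it vanishes exactly at the fixed points of phi."""
    Cw = apply_C(eigs, w)
    s = inner(w, Cw)
    return sum((c - s * x) ** 2 for x, c in zip(w, Cw)) ** 0.5


# Hypothetical spectrum: two nonzero eigenvalues and a one-dimensional kernel.
EIGS = [0.8, 0.3, 0.0]
unit_eigvec = [0.0, 1.0, 0.0]    # unit eigenvector for the eigenvalue 0.3
kernel_vec = [0.0, 0.0, 2.5]     # arbitrary element of Ker(C)
scaled_eigvec = [0.0, 2.0, 0.0]  # eigenvector that is not of unit norm

for name, w in [("unit eigenvector", unit_eigvec),
                ("kernel vector", kernel_vec),
                ("scaled eigenvector", scaled_eigvec)]:
    print(name, "residual:", residual(EIGS, w))
```

The first two residuals vanish, while the eigenvector of norm 2 is not a stationary point, in line with the requirement that eigenvectors be of unit norm.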
The following corollary is a consequence of Proposition 3.1. It characterizes the solution set of the equation considered. We denote by $S_i$ the unit sphere in $H_i$.

Corollary 3.1. The solution set of $C\omega - \langle\omega, C\omega\rangle\omega = 0$ is a disjoint union of countably many connected components $\mathrm{Ker}(C) \cup \{S_i\}_i$.

The example below describes the particular case of independent inputs, that is, the case in which the $(i,j)$-entry of $C$ satisfies $\langle\xi_i\xi_j\rangle = \langle\xi_i\rangle\langle\xi_j\rangle$. This example will be studied in detail throughout this paper since it gives good insight into the general case and shows many of the dynamical features encountered.

Example. Independent inputs. Consider $C = \xi^t\xi$, where $\xi = \{\xi_j\}_j$ is a sequence in $l_2$. Assume $C$ is defined in $l_2$, i.e., given $\omega \in l_2$, $C\omega = \langle\xi, \omega\rangle\xi$. The operator $C$ is compact since it is the limit of a sequence of finite-rank operators $\{C_n\}$, where $C_n = \tilde\xi_n^t\tilde\xi_n$ and $\tilde\xi_n = (\xi_1, \xi_2, \ldots, \xi_n, 0, 0, \ldots)$. In fact, we have that
\[
\|(C - C_n)\omega\|_2^2 = \|\tilde\xi_n\|_2^2\, \langle \xi - \tilde\xi_n, \omega - \tilde\omega_n\rangle^2 + \|\xi - \tilde\xi_n\|_2^2\, \langle\xi, \omega\rangle^2 \le 2\,\|\xi\|_2^2\, \|\omega\|_2^2\, \|\xi - \tilde\xi_n\|_2^2
\]
(with $\tilde\omega_n$ presumably the analogous truncation of $\omega$); consequently $\|C - C_n\| \le \sqrt{2}\,\|\xi\|_2\, \|\xi - \tilde\xi_n\|_2$, and $\|C - C_n\|$ converges to 0 as $n$ grows to infinity. The operator $C$ has two eigenvalues, $\lambda_0 = 0$ and $\lambda_1 = \|\xi\|^2$; the eigenspace associated with $\lambda_1$ has dimension 1 and is spanned by $\xi$, $\mathrm{Sp}\{\xi\}$. This way, we define an orthogonal splitting of $l_2$, $l_2 = \mathrm{Sp}\{\xi\} \oplus \mathrm{Ker}(C)$. The solution set of the equation $C\omega - \langle\omega, C\omega\rangle\omega = 0$ is $\mathrm{Ker}(C) \cup \{\pm\xi/\|\xi\|\}$.
4. Independent inputs
In this section we consider the particular case of independent inputs, as described in the previous example. The main goal is to characterize the overall dynamics of $\phi$. The structure of the operator $C$ allows us to reduce the problem to a one-parameter family of real-valued functions. We start by reviewing some definitions and setting appropriate notation. Then, we describe the one-dimensional reduction and collect relevant dynamical properties of the family introduced. Finally, we transfer the results to the map $\phi$. These include the identification of those regions in the underlying space where convergence as well as chaotic behavior occur.
Input independence amounts to saying that $C$ is self-adjoint and has rank one. The operator $C$ is therefore given by a product $\xi^t\xi$, where $\xi = \{\xi_j\}_j$ is a sequence in $l_2$ (or $\mathbb{R}^n$). The kernel of $C$ is the hyperplane $\{v \in l_2 : \langle v, \xi\rangle = 0\}$ and $l_2$ admits an orthogonal splitting determined by $C$, $l_2 = \mathrm{Sp}\{\xi\} \oplus \mathrm{Ker}(C)$. We consider $v$, a fixed point of $\phi(\omega) = \omega + C\omega - \langle\omega, C\omega\rangle\omega$, not in $\mathrm{Ker}(C)$. We denote by $P_\xi$ and $P_K$ the standard projections onto the spaces $\mathrm{Sp}\{\xi\}$ and $\mathrm{Ker}(C)$, respectively. Therefore, $v$ has the decomposition $v = v_\xi + v_K$, where $v_\xi = P_\xi(v)$ and $v_K = P_K(v)$. We also have that
\[
\phi(v) = (1 + \|\xi\|^2 - \|\xi\|^2\|v_\xi\|^2)\, v_\xi + (1 - \|\xi\|^2\|v_\xi\|^2)\, v_K .
\]
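The closed-form expression for $\phi(v)$ in terms of $v_\xi$ and $v_K$ can be compared against the defining formula $\phi(v) = v + Cv - \langle v, Cv\rangle v$ with $Cv = \langle\xi, v\rangle\xi$. The following is a minimal sketch in $\mathbb{R}^3$; the vectors `XI` and `V` and the function names are illustrative assumptions.

```python
import math


def inner(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def phi_rank_one(xi, v):
    """phi(v) = v + Cv - <v, Cv> v for the rank-one operator Cv = <xi, v> xi."""
    c = inner(xi, v)
    Cv = [c * x for x in xi]
    s = inner(v, Cv)
    return [a + b - s * a for a, b in zip(v, Cv)]


def split(xi, v):
    """Return (v_xi, v_K), the projections of v onto Sp{xi} and Ker(C)."""
    t = inner(v, xi) / inner(xi, xi)
    v_xi = [t * x for x in xi]
    v_K = [a - b for a, b in zip(v, v_xi)]
    return v_xi, v_K


def phi_closed_form(xi, v):
    """(1 + lam - lam |v_xi|^2) v_xi + (1 - lam |v_xi|^2) v_K, lam = |xi|^2."""
    lam = inner(xi, xi)
    v_xi, v_K = split(xi, v)
    n2 = inner(v_xi, v_xi)
    return [(1 + lam - lam * n2) * p + (1 - lam * n2) * q
            for p, q in zip(v_xi, v_K)]


XI = [0.6, 0.3, 0.2]   # hypothetical input vector, lam = 0.49
V = [0.4, -1.0, 0.7]   # hypothetical weight vector

gap = max(abs(a - b)
          for a, b in zip(phi_rank_one(XI, V), phi_closed_form(XI, V)))
print("largest componentwise difference:", gap)
```

The two evaluations agree up to rounding, and $\xi/\|\xi\|$ is left fixed by `phi_rank_one`, as the example of Section 3 predicts.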
We review some concepts in dynamical systems concerning the stability behavior of invariant sets (cf. [7,14]).

Definition 4.1. Suppose $A$ is an invariant set under a map $f$, defined in a metric space $(X, d)$. The set $A$ is said to be locally attracting or stable if every neighborhood $U$ of $A$ contains $U_0$, an open neighborhood of $A$, such that the orbit of every point in $U_0 \setminus A$ (a point in $U_0$ but not in $A$) lies entirely in $U$. The set $A$ is said to be locally repelling or unstable if there exists an open neighborhood $U$ of $A$ such that the orbit of every point in $U \setminus A$ is not contained in $U$. An invariant set that is neither attracting nor repelling is said to be of saddle-type. The stable set of an invariant set $A$ is the set of all points that evolve toward $A$ under iterations of $f$. In other words,
\[
W^s(A) = \{\, x \in X \mid d(f^n(x), A) \to 0 \text{ as } n \text{ increases to infinity} \,\}.
\]
Similarly, we define the unstable set of $A$, $W^u(A)$, to be the set of all points $x$ in $X$ for which there exists a sequence $\{a_n\}_{n=1}^{\infty}$ such that $f(a_n) = a_{n-1}$, $f(a_1) = x$, and $d(a_n, A) \to 0$ as $n$ increases to infinity. If $A$ reduces to a single point then we refer to $A$ as being an attracting, repelling, or saddle equilibrium point.

4.1. One-dimensional reduction
Here we reduce the dynamics of $\phi$ to the dynamics of a one-parameter family of real-valued functions $f_\lambda$. These functions are obtained by composing the norm function with the projection of $\phi$ onto $\mathrm{Sp}\{\xi\}$. The next lemma asserts the reduction.

Lemma 4.1. For every $n$, $P_\xi\phi^n(v) = \phi^n(v_\xi)$.

Proof. We recall that $\phi(v) = (1 + \|\xi\|^2 - \|\xi\|^2\|v_\xi\|^2)v_\xi + (1 - \|\xi\|^2\|v_\xi\|^2)v_K$, $\phi(v_\xi) = (1 + \|\xi\|^2 - \|\xi\|^2\|v_\xi\|^2)v_\xi$, and $P_K(\phi(v)) = (1 - \|\xi\|^2\|v_\xi\|^2)v_K$. In particular, this implies the statement for $n = 1$. Now proceeding by induction, we have that
\[
\phi^n(v) = \phi^n(v_\xi) + \prod_{j=0}^{n-1} \left(1 - \|\xi\|^2\|\phi^j(v_\xi)\|^2\right) v_K ,
\]
where $\phi^n(v_\xi) = (1 + \|\xi\|^2 - \|\xi\|^2\|\phi^{n-1}(v_\xi)\|^2)\,\phi^{n-1}(v_\xi)$. ✷
The previous lemma also implies that $\|P_\xi(\phi(v))\| = |1 + \|\xi\|^2 - \|\xi\|^2\|v_\xi\|^2|\,\|v_\xi\|$. We simplify notation by setting $\lambda = \|\xi\|^2$, $x = \|v_\xi\|$, and $f_\lambda(x) = |1 + \lambda - \lambda x^2|\,x$. The dynamical behavior of $f_\lambda$ reflects the behavior of the iterates of $\phi$. Indeed, $f_\lambda^n(x)$ is equal to the distance of $\phi^n(v_\xi)$ from the hyperplane $\mathrm{Ker}(C)$. More precisely, we have that $\phi(v_\xi) = P_\xi(\phi(v)) = \mathrm{sign}(1 + \lambda - \lambda\|v_\xi\|^2)\, f_\lambda(\|v_\xi\|)\,(v_\xi/\|v_\xi\|)$. The dynamics of $f_\lambda$ is fairly standard. As the parameter $\lambda$ increases we observe a period-doubling phenomenon and a route to chaos; see Fig. 1 and cf. [1].
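The reduction can be illustrated by iterating $\phi$ along $\mathrm{Sp}\{\xi\}$ in the signed coordinate $t$ of $v = t\,\xi/\|\xi\|$, where $\phi$ acts as $t \mapsto (1 + \lambda - \lambda t^2)t$, and comparing absolute values with the iterates of $f_\lambda$. This is a sketch under that parametrization; the values `LAM` and `T0` and the function names are hypothetical.

```python
def f(lam, x):
    """f_lambda(x) = |1 + lam - lam x^2| x on the nonnegative reals."""
    return abs(1 + lam - lam * x * x) * x


def phi_on_span(lam, t):
    """phi restricted to Sp{xi}, in the coordinate t of v = t * xi/||xi||."""
    return (1 + lam - lam * t * t) * t


def iterate(step, lam, x, n):
    """Apply the one-parameter map `step` n times, starting from x."""
    for _ in range(n):
        x = step(lam, x)
    return x


LAM = 0.7   # hypothetical value of lambda = ||xi||^2
T0 = -1.6   # hypothetical signed coordinate of the starting point

for n in range(6):
    signed = iterate(phi_on_span, LAM, T0, n)
    print(n, signed, iterate(f, LAM, abs(T0), n))
```

In each row the absolute value of the signed iterate matches the iterate of $f_\lambda$; the sign records on which side of $\mathrm{Ker}(C)$ the orbit currently lies.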
Fig. 1. Graphs of fλ for different parameter values.
Summarizing the relevant features of the dynamics encountered in the family $\{f_\lambda\}_\lambda$: for each $\lambda$, $f_\lambda$ has three fixed points, $0$, $1$, and $\sqrt{(2+\lambda)/\lambda}$. Two of those fixed points, namely $0$ and $\sqrt{(2+\lambda)/\lambda}$, are repelling, since their respective derivatives are $1 + \lambda$ and $5 + 2\lambda$. We also observe that $f_\lambda$ has a critical point at $x = \sqrt{(1+\lambda)/(3\lambda)}$ and
\[
f_\lambda\!\left(\sqrt{\frac{1+\lambda}{3\lambda}}\right) = \frac{2}{3}\sqrt{\frac{(1+\lambda)^3}{3\lambda}}
\]
is a relative maximum value. The fixed point 1 is attracting if and only if $\lambda < 1$. In the neighborhood of $\lambda = x = 1$, the family $f_\lambda$ goes through a flip bifurcation and consequently a period-two orbit emerges. As $\lambda$ increases, we observe a period-doubling cascade. Periodic points appear, with periods following Sharkovskii's ordering, cf. [12]. Figure 2 represents the bifurcation diagram of $\{f_\lambda\}_\lambda$ and the approximate bifurcation values of the parameter $\lambda$.
We review in Definition 4.2 a straightforward generalization of chaos, first introduced by Li and Yorke (cf. [2,4]).

Definition 4.2. A transformation $\phi$ acting on a Hilbert space $H$ is said to be chaotic in the sense of Li and Yorke if there exists an uncountable subset $S$ of $H$ satisfying the following two properties:
(1) For every $k = 1, 2, \ldots$ there is a periodic point in $H$ of period $k$.
(2) For every $x$ and $y$ in $S$ with $x \neq y$, and $z$ a periodic point,
\[
\liminf_{n\to\infty} \|\phi^n(x) - \phi^n(y)\| = 0, \qquad \limsup_{n\to\infty} \|\phi^n(x) - \phi^n(y)\| > 0,
\]
and
\[
\limsup_{n\to\infty} \|\phi^n(x) - \phi^n(z)\| > 0 .
\]
Such a set S is referred to as a scrambled set.
Fig. 2. The bifurcation diagram for fλ with λ ∈ [0, 2].
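A rough way to reproduce individual columns of such a bifurcation diagram is to iterate $f_\lambda$ past a transient and then look for the smallest period at which the orbit closes up. The sketch below assumes a generic starting point `x0 = 0.5`, a fixed burn-in length, and a numerical tolerance; these settings and the function names are illustrative and not taken from the paper.

```python
def f(lam, x):
    """f_lambda(x) = |1 + lam - lam x^2| x."""
    return abs(1 + lam - lam * x * x) * x


def attractor_period(lam, x0=0.5, burn_in=5000, max_period=64, tol=1e-9):
    """Smallest p <= max_period with |f^p(x) - x| < tol after the transient.

    Returns None when no such p is found, which may indicate a longer
    period or erratic behavior at this parameter value.
    """
    x = x0
    for _ in range(burn_in):
        x = f(lam, x)
    y = x
    for p in range(1, max_period + 1):
        y = f(lam, y)
        if abs(y - x) < tol:
            return p
    return None


for lam in (0.5, 0.9, 1.1, 1.2):
    print(lam, attractor_period(lam))
```

For the sampled values below 1 the orbit settles on the fixed point 1, while for values slightly above 1 it settles on an orbit of period two, consistent with the flip bifurcation at $\lambda = 1$ described above.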
We state a theorem, proved in [2], to be applied in forthcoming results.

Theorem 4.1. A continuous interval map with a periodic point of period 3 is chaotic.

Corollary 4.1. If $\lambda \ge 3\sqrt{3}/2 - 1$ then $f_\lambda$ is chaotic.

Proof. If $\lambda \ge 3\sqrt{3}/2 - 1$ then $f_\lambda([0, \sqrt{(\lambda+1)/\lambda}\,]) \supseteq [0, \sqrt{(\lambda+1)/\lambda}\,]$. Therefore $f_\lambda$ has a periodic point of period 3 and the statement follows from Theorem 4.1. ✷

4.2. Asymptotic behavior
Based upon the dynamics of the family $f_\lambda$ we study the long-term behavior of orbits under $\phi$. We identify those regions in the Hilbert space $H$ ($l_2$ or $\mathbb{R}^n$) where $\phi$-orbits become unbounded, convergent, or present an erratic trace. We recall notation previously set that will be used in the following results: $\lambda = \|\xi\|^2$, $x = \|v_\xi\|$, $v_\xi$ (and $v_K$) is the projection of $v$ onto $\mathrm{Sp}\{\xi\}$ (and $\mathrm{Ker}(C)$, respectively), $\phi(v) = (1 + \|\xi\|^2 - \|\xi\|^2\|v_\xi\|^2)v_\xi + (1 - \|\xi\|^2\|v_\xi\|^2)v_K$, and $f_\lambda(x) = |1 + \lambda - \lambda x^2|\,x$. In addition, if $v \in \mathrm{Sp}\{\xi\}$ then $\phi(v) = \mathrm{sign}(1 + \lambda - \lambda\|v\|^2)\, f_\lambda(\|v\|)\,(v/\|v\|)$.

Theorem 4.2. If $v$ is such that $\|v_\xi\| \ge \sqrt{(\lambda+2)/\lambda}$ then $\{\phi^n(v)\}_n$ is unbounded.
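Proof. First, we observe that $\phi$ leaves the eigenspace $\mathrm{Sp}\{\xi\}$ invariant. If $\|v_\xi\| > \sqrt{(\lambda+2)/\lambda}$, the derivative satisfies $f_\lambda'(\|v_\xi\|) > 5$, which implies that
\[
\left| f_\lambda(\|v_\xi\|) - \sqrt{\frac{\lambda+2}{\lambda}} \right| > 5 \left| \|v_\xi\| - \sqrt{\frac{\lambda+2}{\lambda}} \right|
\]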
Fig. 3. Two orbits converging to a fixed point with λ = 0.3 and 0.7, respectively.
and
\[
\left\| \phi^n(v_\xi) - \sqrt{\frac{\lambda+2}{\lambda}}\, \frac{\xi}{\|\xi\|} \right\| > 5^n \left\| v_\xi - \sqrt{\frac{\lambda+2}{\lambda}}\, \frac{\xi}{\|\xi\|} \right\| .
\]
Furthermore, if $\|v_\xi\| = \sqrt{(\lambda+2)/\lambda}$ then $\{\phi^n(v_\xi)\}_n$ is periodic of period 2 and the projection of each $\phi^n(v)$ onto $\mathrm{Ker}(C)$ is given by
\[
\prod_{j=0}^{n-1} \left(1 - \|\xi\|^2\|\phi^j(v_\xi)\|^2\right) v_K = \prod_{j=0}^{n-1} (-1 - \lambda)\, v_K .
\]
Then the norm of this projection is unbounded as $n$ grows toward infinity. This shows that, in both cases, $\phi^n(v)$ becomes unbounded as $n$ increases. ✷

The following theorem establishes sufficient conditions for stability of the stationary solutions $\pm\xi/\|\xi\|$.

Theorem 4.3. If $\lambda < 1$, $\|v_\xi\| < \sqrt{(\lambda+1)/\lambda}$, and $\langle v, \xi\rangle > 0$ (or $< 0$) then $\{\phi^n(v)\}_n$ converges to $\xi/\|\xi\|$ (or $-\xi/\|\xi\|$, respectively).

Proof. If $\lambda < 1$, $f_\lambda$ leaves invariant the interval $[0, \sqrt{(\lambda+1)/\lambda}\,]$. For every $x \in (\sqrt{(\lambda+1)/\lambda}, \sqrt{(\lambda+2)/\lambda}\,)$ the derivative of $f_\lambda$ satisfies the inequality $f_\lambda'(x) > 2$. This implies that some iterate of $x$ under $f_\lambda$ is in the interval $[0, \sqrt{(\lambda+1)/\lambda}\,]$.
Next, we consider $x$ in the open interval $(0, \sqrt{(\lambda+1)/\lambda}\,)$ and $\lambda \le 1/2$ (or equivalently $1 \le \sqrt{(1+\lambda)/(3\lambda)}$). Under these assumptions and if, in addition, $x \le 1$, the sequence $\{f_\lambda^n(x)\}$ is increasing and convergent to the only possible attracting fixed point, the point 1 (cf. Fig. 3). The same is true for $x \ge (-\lambda + \sqrt{\lambda^2 + 4\lambda}\,)/(2\lambda)$, since the image of the interval $[(-\lambda + \sqrt{\lambda^2 + 4\lambda}\,)/(2\lambda), \sqrt{(\lambda+1)/\lambda}\,]$ is the interval $[0, 1]$. Moreover, for $x \in [1, (-\lambda + \sqrt{\lambda^2 + 4\lambda}\,)/(2\lambda)]$, the derivative of $f_\lambda$ satisfies the inequality $|f_\lambda'(x)| \le \max\{1 - 2\lambda, 1/2\} < 1$. This implies that the sequence $\{f_\lambda^n(x)\}$ converges to 1.
On the other hand, if $x$ is in the open interval $(0, \sqrt{(\lambda+1)/\lambda}\,)$ and $\lambda > 1/2$ (or equivalently $1 > \sqrt{(1+\lambda)/(3\lambda)}$), the orbit of the critical point $\{f_\lambda^n(\sqrt{(\lambda+1)/(3\lambda)}\,)\}_n$ converges to 1, since $f_\lambda$ has negative Schwarzian derivative (see [13]). In fact, the Schwarzian derivative of $f_\lambda$, when restricted to each open interval $(0, \sqrt{(\lambda+1)/(3\lambda)}\,)$ or $(\sqrt{(\lambda+1)/(3\lambda)}, \sqrt{(\lambda+1)/\lambda}\,)$, is given by
\[
\frac{f_\lambda'''(x)}{f_\lambda'(x)} - \frac{3}{2}\left(\frac{f_\lambda''(x)}{f_\lambda'(x)}\right)^{2} = \frac{-6\lambda}{1 + \lambda - 3\lambda x^2} - \frac{3}{2}\left(\frac{-6\lambda x}{1 + \lambda - 3\lambda x^2}\right)^{2}
\]
\[
= \frac{-6\lambda(1 + \lambda - 3\lambda x^2) - 3(18)\lambda^2 x^2}{(1 + \lambda - 3\lambda x^2)^2} = \frac{-6\lambda(1 + \lambda) - 36\lambda^2 x^2}{(1 + \lambda - 3\lambda x^2)^2} < 0 .
\]
Arguments from kneading theory, relying upon the ordering of orbits' itineraries, allow us to conclude that the orbit of any point in the open interval $(0, \sqrt{(\lambda+1)/\lambda}\,)$ converges to 1 (see [9] or [1]). This analysis implies that $\{\|\phi^n(v_\xi)\|\}_n$, with $\|v_\xi\| < \sqrt{(\lambda+1)/\lambda}$, converges to 1 and also that the sequence $\{\phi^n(v_\xi)\}_n$ converges to either $\xi/\|\xi\|$ or $-\xi/\|\xi\|$, depending on whether $v_\xi$ is in the positive or the negative half-space determined by $\mathrm{Ker}(C)$, i.e., $\{v \in H : \langle v, \xi\rangle > 0\}$ or $\{v \in H : \langle v, \xi\rangle < 0\}$, respectively.
To complete the proof we need to control the projection of $\phi^n(v)$ onto $\mathrm{Ker}(C)$. Since the sequence $\|\phi^n(v_\xi)\|$ converges to 1, $1 - \|\xi\|^2\|\phi^n(v_\xi)\|^2$ converges to $1 - \|\xi\|^2 = 1 - \lambda$, and then there exists $j_0$ after which $|1 - \|\xi\|^2\|\phi^n(v_\xi)\|^2| < 1 - \lambda/2$. This implies that, for $n > j_0$,
\[
\prod_{j=0}^{n-1} \left|1 - \|\xi\|^2\|\phi^j(v_\xi)\|^2\right| \le \left(1 - \frac{\lambda}{2}\right)^{n-j_0} \prod_{j=0}^{j_0-1} \left|1 - \|\xi\|^2\|\phi^j(v_\xi)\|^2\right| .
\]
This sequence converges to zero and the sequence $\{\phi^n(v)\}$ is asymptotic to $\mathrm{Sp}\{\xi\}$. ✷
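The statements of Theorems 4.2 and 4.3 can be observed in a small simulation. The sketch below works in $\mathbb{R}^3$ with a rank-one operator built from a hypothetical vector `XI` (so that $\lambda = 0.49 < 1$); the starting points, iteration counts, and function names are illustrative assumptions.

```python
import math


def inner(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def phi(xi, v):
    """phi(v) = v + Cv - <v, Cv> v with Cv = <xi, v> xi."""
    c = inner(xi, v)
    Cv = [c * x for x in xi]
    s = inner(v, Cv)
    return [a + b - s * a for a, b in zip(v, Cv)]


def orbit_end(xi, v, n):
    """Return phi^n(v)."""
    for _ in range(n):
        v = phi(xi, v)
    return v


def dist(u, v):
    """Euclidean distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


XI = [0.6, 0.3, 0.2]                     # hypothetical input; lambda = 0.49
LAM = inner(XI, XI)
UNIT = [x / math.sqrt(LAM) for x in XI]  # the stationary solution xi/||xi||
V_PLUS = [0.4, -1.0, 0.7]                # <v, xi> > 0 and ||v_xi|| small
V_MINUS = [-a for a in V_PLUS]           # mirrored start, <v, xi> < 0
V_FAR = [3.0 * u for u in UNIT]          # ||v_xi|| = 3 > sqrt((lam + 2)/lam)

print(dist(orbit_end(XI, V_PLUS, 300), UNIT))
print(dist(orbit_end(XI, V_MINUS, 300), [-u for u in UNIT]))
print(math.sqrt(inner(orbit_end(XI, V_FAR, 5), orbit_end(XI, V_FAR, 5))))
```

The first two distances are negligible, so the kernel component has died out and the orbit has reached $\pm\xi/\|\xi\|$ according to the sign of $\langle v, \xi\rangle$; the third value shows the rapid growth of the norm for a start beyond $\sqrt{(\lambda+2)/\lambda}$.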
Remark. We also notice that:
(1) There exists an increasing sequence $\{a_n\}_n$ such that
\[
a_1 = \sqrt{\frac{\lambda+1}{\lambda}} < a_2 < \cdots < a_n < \cdots \to \sqrt{\frac{\lambda+2}{\lambda}},
\]
$f_\lambda(a_i) = a_{i-1}$, and if $\|v_\xi\| = a_i$ then $\phi^i(v) \in \mathrm{Ker}(C)$. If $v$ has norm $a_i$ then $f_\lambda^i(\|v\|) = 0$ and therefore $\phi^i(v) \in \mathrm{Ker}(C)$.
(2) If $a_i < \|v_\xi\| < a_{i+1}$ then $\{\phi^n(v_\xi)\}_n$ converges to $(-1)^i(\xi/\|\xi\|)$.

Theorem 4.4. If $\lambda \ge 3\sqrt{3}/2 - 1$ then $\phi$ is chaotic in the sense of Li and Yorke.

Proof. Corollary 4.1 implies that $f_\lambda$ is chaotic. For every positive integer there exists a periodic orbit of that given period, entirely contained in the interval $[0, \sqrt{(\lambda+1)/\lambda}\,]$. Moreover, there exists a scrambled set $S$ such that every orbit of a point in $S$ lies entirely in $[0, \sqrt{(\lambda+1)/\lambda}\,]$. If $a$ is a periodic point of $f_\lambda$ then $a\xi/\|\xi\|$ is a periodic point of $\phi$, and $S_\phi = \{a\xi/\|\xi\| : a \in S\}$ is a scrambled set of $\phi$. Given $v_0$ and $v_1$ in $\mathrm{Sp}\{\xi\}$,
\[
\|\phi(v_0) - \phi(v_1)\| = |f_\lambda(\|v_0\|) - f_\lambda(\|v_1\|)|,
\]
which proves the statement. ✷
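The threshold $3\sqrt{3}/2 - 1 \approx 1.598$ in Corollary 4.1 and Theorem 4.4 is the parameter value at which the relative maximum of $f_\lambda$ reaches the right endpoint $\sqrt{(\lambda+1)/\lambda}$ of the interval. A short check, with hypothetical function names and sample parameters, might look like this:

```python
import math


def f(lam, x):
    """f_lambda(x) = |1 + lam - lam x^2| x."""
    return abs(1 + lam - lam * x * x) * x


def critical_point(lam):
    """Critical point sqrt((1 + lam)/(3 lam)) of f_lambda."""
    return math.sqrt((1 + lam) / (3 * lam))


def peak_value(lam):
    """Relative maximum of f_lambda, attained at the critical point."""
    return f(lam, critical_point(lam))


def right_endpoint(lam):
    """Right endpoint sqrt((lam + 1)/lam) of the interval in Corollary 4.1."""
    return math.sqrt((lam + 1) / lam)


def covers_interval(lam):
    """True when the peak reaches the right endpoint of the interval."""
    return peak_value(lam) >= right_endpoint(lam)


LAMBDA_STAR = 3 * math.sqrt(3) / 2 - 1

print(LAMBDA_STAR, peak_value(LAMBDA_STAR), right_endpoint(LAMBDA_STAR))
for lam in (1.5, 1.7):
    print(lam, covers_interval(lam))
```

At `LAMBDA_STAR` the peak and the endpoint coincide up to rounding; the covering property fails for the sample value below the threshold and holds for the one above it.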
5. The general case
In this section we study the dynamical behavior of $\phi$, considering $C$ a compact, self-adjoint, and nonnegative operator defined on a Hilbert space $H$. The compactness implies that the spectrum of $C$ satisfies $\sigma(C) \subseteq \{\lambda_1, \lambda_2, \ldots, \lambda_n, \ldots\} \cup \{0\}$ and that each eigenspace $\mathrm{Ker}(C - \lambda_i I)$ is finite dimensional. The self-adjointness implies that $\{\mathrm{Ker}(C - \lambda I)\}_{\lambda \in \sigma(C)}$ is an orthogonal family of closed subspaces. The operator $C$ determines a splitting of $H$ into invariant subspaces
\[
H = \bigoplus_{\lambda \in \sigma(C)} \mathrm{Ker}(C - \lambda I) \oplus \mathrm{Ker}(C).
\]
The direct sum above has countably many nontrivial terms. We denote by $\lambda_1, \lambda_2, \ldots, \lambda_n, \ldots$ all nonzero eigenvalues of $C$, listed in decreasing order, and $H_i = \mathrm{Ker}(C - \lambda_i I)$ represents the eigenspace associated with the eigenvalue $\lambda_i$. In addition, $\{\lambda_i\}_i$ is a decreasing sequence, either finite or convergent to 0. We recall the transformation
\[
\phi : H \to H, \qquad \omega \mapsto \omega + C\omega - \langle\omega, C\omega\rangle\omega .
\]
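Lemma 5.1. $\mathrm{Ker}(C)$ and each $H_i$ are invariant subspaces under $\phi$.

Proof. If $\omega \in \mathrm{Ker}(C)$ then $\phi(\omega) = \omega$. If $\omega \in H_i$ then $C\omega = \lambda_i\omega$. Therefore
\[
\phi(\omega) = \omega + \lambda_i\omega - \lambda_i\|\omega\|^2\omega = (1 + \lambda_i - \lambda_i\|\omega\|^2)\,\omega \in H_i . \qquad ✷
\]

Remark. We remark that $\phi(\omega)$ is a real scalar multiple of $\omega$ if $\omega \in H_i$, for some $i$.

Lemma 5.1 leads to a natural representation of a $\phi$-iterate of an element in $H$, as presented in the proposition below. An element $\omega \in H$ is written as $\omega = \omega_K + \omega_O$, where $\omega_K$ and $\omega_O$ are the projections of $\omega$ onto $\mathrm{Ker}(C)$ and $\bigoplus_i H_i$, respectively. Moreover, $\omega_O$ has a unique representation $\sum_{i=1}^{\infty} \omega_i$ with $\omega_i \in H_i$.

Proposition 5.1. For each positive integer $n$,
\[
\phi^n(\omega) = \prod_{j=0}^{n-1} \left(1 - \langle\phi^j(\omega_O), C\phi^j(\omega_O)\rangle\right) \omega_K + \phi^n(\omega_O).
\]

Proof. The proof follows from an induction argument and the fact that
\[
\phi(\omega) = (1 - \langle\omega_O, C\omega_O\rangle)\,\omega_K + \omega_O + C\omega_O - \langle\omega_O, C\omega_O\rangle\omega_O = a_0\,\omega_K + \phi(\omega_O),
\]
where $a_0 = 1 - \langle\omega_O, C\omega_O\rangle$. ✷

5.1. Reduction method
Our main goal is to understand the long-term dynamical behavior of the $\phi$-orbits. For that, we reduce $\phi$, via a quotient map $h$, to an infinite-parameter map on $l_2^+$, the nonnegative cone of $l_2$, i.e., the set of all 2-summable sequences with nonnegative entries. The maps $h$ and $F$ are given by
\[
h : H \to l_2^+, \qquad \omega \mapsto \{\|\omega_i\|\}_i ,
\]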
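and
\[
F : l_2^+ \to l_2^+, \qquad \{x_i\}_i \mapsto \Big\{ \Big| 1 + \lambda_i - \sum_{j=1}^{\infty} \lambda_j x_j^2 \Big|\, x_i \Big\}_i .
\]
We denote $f_{\lambda_i}(\{x_i\}) = |1 + \lambda_i - \sum_{j=1}^{\infty} \lambda_j x_j^2|\, x_i$, for future reference. Both maps, $h$ and $F$, are well-defined since $\|\omega\|^2 = \|\omega_K\|^2 + \sum_{i=1}^{\infty} \|\omega_i\|^2 \ge \sum_{i=1}^{\infty} \|\omega_i\|^2$ and $\{|1 + \lambda_i - \sum_{j=1}^{\infty} \lambda_j x_j^2|\}_i$ is bounded. It is a straightforward calculation, presented in the proof of Lemma 5.2 below, to check the commutativity of the following diagram:
\[
\begin{array}{ccc}
H & \xrightarrow{\ \phi\ } & H \\
{\scriptstyle h}\downarrow & & \downarrow{\scriptstyle h} \\
l_2^+ & \xrightarrow{\ F\ } & l_2^+
\end{array}
\]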
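We say that $\phi$ induces a natural transformation on $l_2^+$, via the quotient map $h$, whose dynamics captures important features of the dynamics of $\phi$.

Lemma 5.2. For each positive integer $n$ and $\omega \in H$, $h(\phi^n(\omega)) = F^n(h(\omega))$.

Proof. First, we check commutativity for $n = 1$,
\[
h(\phi(\omega)) = h\Big(\phi\Big(\omega_K + \sum_i \omega_i\Big)\Big) = h\Big(\omega_K - \sum_{i=1}^{\infty} \langle\omega_i, \lambda_i\omega_i\rangle\,\omega_K + \phi\Big(\sum_i \omega_i\Big)\Big)
\]
\[
= \Big\{ \Big| 1 + \lambda_i - \sum_{k=1}^{\infty} \lambda_k\|\omega_k\|^2 \Big|\, \|\omega_i\| \Big\}_i = F\big(\{\|\omega_i\|\}_i\big) = F(h(\omega)).
\]
Second, given a bounded sequence of real numbers $\{\alpha_i\}_i$, we have that
\[
h\Big(\phi\Big(\omega_K + \sum_{i=1}^{\infty} \alpha_i\omega_i\Big)\Big) = h\Big(\sum_i \Big(1 + \lambda_i - \sum_k \lambda_k\|\alpha_k\omega_k\|^2\Big)\alpha_i\omega_i\Big)
\]
\[
= \Big\{ \Big| 1 + \lambda_i - \sum_{k=1}^{\infty} \lambda_k\|\alpha_k\omega_k\|^2 \Big|\, \|\alpha_i\omega_i\| \Big\}_i = F\big(\{\|\alpha_i\omega_i\|\}_i\big) = F\Big(h\Big(\omega_K + \sum_{i=1}^{\infty} \alpha_i\omega_i\Big)\Big).
\]
Finally, since $\phi(\omega) = \alpha_0\omega_K + \sum_i \alpha_i\omega_i$, where $\alpha_i = 1 + \lambda_i - \sum_k \lambda_k\|\omega_k\|^2$, the previous calculations show that
\[
h\phi^2(\omega) = h\phi(\phi(\omega)) = F h(\phi(\omega)) = F F h(\omega) = F^2 h(\omega).
\]
The general statement follows by induction. ✷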
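The commutativity $h \circ \phi^n = F^n \circ h$ can be checked in a finite-dimensional model. The sketch below assumes $H = \mathbb{R}^4$ with a diagonal operator having a two-dimensional eigenspace, a one-dimensional eigenspace, and a one-dimensional kernel; the constants `DIAG`, `LAMBDAS`, `BLOCKS`, `W0` and the function names are hypothetical.

```python
import math

LAMBDAS = [0.9, 0.4]          # hypothetical nonzero eigenvalues, decreasing
BLOCKS = [(0, 2), (2, 3)]     # index ranges of H_1 and H_2; index 3 is Ker(C)
DIAG = [0.9, 0.9, 0.4, 0.0]   # C as a diagonal matrix on R^4


def phi(w):
    """phi(w) = w + Cw - <w, Cw> w for C = diag(DIAG)."""
    Cw = [d * x for d, x in zip(DIAG, w)]
    s = sum(x * c for x, c in zip(w, Cw))
    return [x + c - s * x for x, c in zip(w, Cw)]


def h(w):
    """Quotient map: the norms of the components of w in each eigenspace."""
    return [math.sqrt(sum(w[k] ** 2 for k in range(a, b))) for a, b in BLOCKS]


def F(x):
    """Induced map on the cone: x_i -> |1 + lam_i - sum_j lam_j x_j^2| x_i."""
    s = sum(lam * xi * xi for lam, xi in zip(LAMBDAS, x))
    return [abs(1 + lam - s) * xi for lam, xi in zip(LAMBDAS, x)]


def iterate(step, x, n):
    """Apply `step` n times, starting from x."""
    for _ in range(n):
        x = step(x)
    return x


W0 = [0.3, -0.8, 1.1, 0.5]    # hypothetical starting vector

for n in range(1, 4):
    print(n, h(iterate(phi, W0, n)), iterate(F, h(W0), n))
```

Both columns agree up to rounding, and the kernel coordinate of `W0` plays no role in either of them.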
5.2. Asymptotic behavior
We reduced the dynamical analysis of $\phi$ to the map $F$. Here, we start by stating dynamical properties of $\phi$ that remain unchanged under the filtering action of $h$. Next, we identify regions in the underlying Hilbert space where orbits are convergent, oscillatory, or chaotic. We denote by $\mathrm{Per}(\phi)$ and $\mathrm{Per}(F)$ the sets of all the periodic points of $\phi$ and $F$, respectively. We also denote by $\phi_j$ the restriction of $\phi$ to the eigenspace $H_j$ and by $f_{\lambda_j}$ the real-valued function defined on the nonnegative reals, given by $f_{\lambda_j}(x) = |1 + \lambda_j - \lambda_j x^2|\,x$.

Proposition 5.2.
(1) $h(\mathrm{Per}(\phi)) = \mathrm{Per}(F)$.
(2) $f_{\lambda_j}$ is chaotic in the sense of Li and Yorke if and only if $\phi_j$ is.
(3) If $f_{\lambda_j}$ has an attracting fixed point then $\phi_j$ has an attracting sphere of fixed points.

Proof. Lemma 5.2 implies that $h(\mathrm{Per}(\phi)) \subseteq \mathrm{Per}(F)$. Let $x = \{x_i\}_i \in l_2^+$ be a point such that $F^{n_0}(x) = x$ for some positive integer $n_0$. The axiom of choice allows us to select $v$ such that $h(v) = x$. Lemma 5.1 implies that $P_j\phi(v) \in \mathrm{Sp}\{v_j\}$, for each $j$, where $v$ is represented by $\sum_i v_i$, $v_i \in H_i$, and $\|v_i\| = x_i$. Therefore, $P_j\phi^{n_0}(v)$ is either $v_j$ or $-v_j$ and $v$ is a periodic point of $\phi$, as claimed in the first statement.
To prove the second statement we start by observing that a given period of $f_{\lambda_j}$, say $k$, is always attained by some orbit contained in the interval $[0, \sqrt{(1+\lambda_j)/\lambda_j}\,]$. This implies that $\phi_j$ has a periodic orbit of period $k$, the $\phi_j$-orbit of a lift (via $h$) of any element of a $k$-periodic orbit of $f_{\lambda_j}$ contained in $[0, \sqrt{(1+\lambda_j)/\lambda_j}\,]$. That all periods of $f_{\lambda_j}$ are attainable by orbits fully contained in $[0, \sqrt{(1+\lambda_j)/\lambda_j}\,]$ is clearly true if $f_{\lambda_j}(\sqrt{(1+\lambda_j)/(3\lambda_j)}\,) \le \sqrt{(1+\lambda_j)/\lambda_j}$, since then $[0, \sqrt{(1+\lambda_j)/\lambda_j}\,]$ is $f_{\lambda_j}$-invariant. If $f_{\lambda_j}(\sqrt{(1+\lambda_j)/(3\lambda_j)}\,) > \sqrt{(1+\lambda_j)/\lambda_j}$ then there exists an invariant Cantor set in $[0, \sqrt{(1+\lambda_j)/\lambda_j}\,]$, restricted to which $f_{\lambda_j}$ is topologically conjugate to a left shift defined on the set of all sequences consisting of zeroes and ones; see [1]. Such sequences describe the itineraries of all possible orbits of $f_{\lambda_j}$. Therefore, $f_{\lambda_j}$ restricted to that Cantor set has periodic points of all periods and the first statement in Definition 4.2 holds. It remains to show that the existence of a scrambled set is preserved. This follows from the identities
\[
\|\phi_j^n(v) - \phi_j^n(\omega)\| = |h\phi_j^n(v) - h\phi_j^n(\omega)| = |f_{\lambda_j}^n h(v) - f_{\lambda_j}^n h(\omega)|,
\]
where $v, \omega \in H_j$. Moreover, the only possible attracting fixed point of $f_{\lambda_j}$ is the point 1, whose lift to the eigenspace $H_j$ is a unit sphere of fixed points under $\phi_j$. The last statement in the proposition is a consequence of Lemma 5.3 stated below. ✷

Let $g_{\lambda_j}(x) = (1 + \lambda_j - \lambda_j x^2)\,x$; we notice that $g_{\lambda_j}$ is an odd function and $g_{\lambda_j}(x) = \mathrm{sign}(1 + \lambda_j - \lambda_j x^2)\, f_{\lambda_j}(|x|)$. In addition, we also have that
\[
g_{\lambda_j}^n(\|v\|) = \prod_{i=1}^{n} \mathrm{sign}\big(1 + \lambda_j - \lambda_j f_{\lambda_j}^i(\|v\|)^2\big)\, f_{\lambda_j}^n(\|v\|), \qquad (1)
\]
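where $v \in H_j$. In particular, this implies that if the orbit of $\|v\|$ under $f_{\lambda_j}$ is contained in the interval $[0, \sqrt{(1+\lambda_j)/\lambda_j}\,]$ then $g_{\lambda_j}^n(\|v\|) = f_{\lambda_j}^n(\|v\|)$ for every $n$.

Lemma 5.3. For each $v \in H_j$ and $n$ a positive integer,
\[
\phi_j^n(v) = g_{\lambda_j}^n(\|v\|)\, \frac{v}{\|v\|} .
\]
If 1 is an attracting fixed point of $f_{\lambda_j}$ and $\|v\|$ lies in its stable set then
\[
\Big\| \phi_j^n(v) - \frac{v}{\|v\|} \Big\| = |f_{\lambda_j}^n(\|v\|) - 1| .
\]

Proof. To prove the first statement we use induction. For $n = 1$, we have that
\[
\phi_j(v) = g_{\lambda_j}(\|v\|)\, \frac{v}{\|v\|} .
\]
Now, we assume that
\[
\phi_j^n(v) = g_{\lambda_j}^n(\|v\|)\, \frac{v}{\|v\|} .
\]
Then
\[
\phi_j^{n+1}(v) = \phi_j(\phi_j^n(v)) = g_{\lambda_j}(\|\phi_j^n(v)\|)\, \frac{\phi_j^n(v)}{\|\phi_j^n(v)\|} = g_{\lambda_j}\big(|g_{\lambda_j}^n(\|v\|)|\big)\, \frac{g_{\lambda_j}^n(\|v\|)\, v/\|v\|}{|g_{\lambda_j}^n(\|v\|)|}
\]
\[
= \mathrm{sign}\big(g_{\lambda_j}^n(\|v\|)\big)\, g_{\lambda_j}\big(|g_{\lambda_j}^n(\|v\|)|\big)\, \frac{v}{\|v\|} = g_{\lambda_j}^{n+1}(\|v\|)\, \frac{v}{\|v\|} .
\]
If $\|v\|$ lies in the basin of attraction of 1, an attracting fixed point of $f_{\lambda_j}$, then the orbit of $\|v\|$ under $f_{\lambda_j}$ is contained in the interval $[0, \sqrt{(1+\lambda_j)/\lambda_j}\,]$. This implies that $g_{\lambda_j}^n(\|v\|) = f_{\lambda_j}^n(\|v\|)$ and therefore $\|\phi_j^n(v) - v/\|v\|\| = |f_{\lambda_j}^n(\|v\|) - 1|$. ✷

The next theorem establishes conditions on the eigenvalues of $C$ that guarantee chaotic and stable behaviors.

Theorem 5.1.
(1) If there exists $\lambda_j$ such that $\lambda_j \ge 3\sqrt{3}/2 - 1$ then $\phi$ is chaotic in the sense of Li and Yorke.
(2) If $C$ has norm less than 1 then, for every $j$ and $v \in H_j$, $\{\phi_j^n(v)\}_n$ is convergent, periodic of period two, or divergent whenever $\|v\| < \sqrt{(\lambda_j+2)/\lambda_j}$, $\|v\| = \sqrt{(\lambda_j+2)/\lambda_j}$, or $\|v\| > \sqrt{(\lambda_j+2)/\lambda_j}$, respectively.

Proof. The first statement follows from Theorem 4.3, Proposition 5.2, and Definition 4.2. The hypothesis in the second statement ensures that every eigenvalue is less than 1; therefore computations already used in the proof of Theorem 4.3 imply that 1 is the only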
attracting fixed point of $f_{\lambda_j}$ for every $j$. In addition, we also have that the unique critical point of $f_{\lambda_j}$, $x = \sqrt{(1+\lambda_j)/(3\lambda_j)}$, satisfies the following:
\[
f_{\lambda_j}\!\left(\sqrt{\frac{1+\lambda_j}{3\lambda_j}}\right) < \sqrt{\frac{\lambda_j+1}{\lambda_j}}
\]
and
\[
f_{\lambda_j}\!\left(\sqrt{\frac{1+\lambda_j}{3\lambda_j}}\right)
\begin{cases}
= 1 & \text{if } \lambda_j = \tfrac{1}{2}, \\
> 1 & \text{if } \lambda_j \neq \tfrac{1}{2}.
\end{cases}
\]
If $v \in H_j$ is nonzero and has norm less than $\sqrt{(1+\lambda_j)/\lambda_j}$ then its orbit, under $\phi_j$, converges to the positive scalar multiple of $v$ that belongs to the unit sphere $S_j$. If $v = O$ or has norm equal to $\sqrt{(1+\lambda_j)/\lambda_j}$ then $\phi(v) = O$. There is an increasing sequence of numbers $\{r_i\}_i$ between $\sqrt{(1+\lambda_j)/\lambda_j}$ and $\sqrt{(2+\lambda_j)/\lambda_j}$, converging to $\sqrt{(2+\lambda_j)/\lambda_j}$, such that for every $v$ with norm equal to some $r_i$ there exists $n_i$ for which $\phi_j^{n_i}(v) = O$. The orbit of $v$ whose norm satisfies $r_i < \|v\| < r_{i+1}$ converges to $(-1)^{i+1}v/\|v\|$; this is a consequence of Lemma 5.3. If $v$ has norm greater than $\sqrt{(2+\lambda_j)/\lambda_j}$ then $\{\phi_j^n(v)\}_n$ has an oscillatory behavior with corresponding norms approaching $\infty$. ✷

Remark. The eigenspace $H_j$ is partitioned into a countable set of regions bounded by spheres with increasing radii,
\[
r_0 = \sqrt{\frac{1+\lambda_j}{\lambda_j}} < r_1 < r_2 < \cdots < r_n < \cdots \to r_\infty = \sqrt{\frac{2+\lambda_j}{\lambda_j}} .
\]
Some $\phi_j$-iterate of $v$, with norm $r_i$ ($i = 0, 1, \ldots$), is equal to $O$. If $v$ has norm $r_\infty$ then the orbit of $v$ is periodic of period two, oscillating between $v/\|v\|$ and $-v/\|v\|$. If the norm of $v$ is either less than $r_0$ or greater than $r_\infty$ then the orbit $\{\phi_j^n(v)\}_n$ converges to $v/\|v\|$ or diverges, respectively. The orbit of some $v$, with norm between $r_i$ and $r_{i+1}$, converges to either $v/\|v\|$ or $-v/\|v\|$, depending on $i$ being even or odd. This occurs because some terms in the product $\prod_{i=1}^{n} \mathrm{sign}(1 + \lambda_j - \lambda_j f_{\lambda_j}^i(\|v\|)^2)$, incorporated in Eq. (1), are negative.
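The radii $r_i$ of the remark can be approximated numerically. On $[r_0, r_\infty]$ the map $f_{\lambda_j}$ is increasing, so one plausible construction, assumed here rather than taken from the paper, is to obtain $r_{i+1}$ from $r_i$ by solving $f_{\lambda_j}(r_{i+1}) = r_i$ with bisection. The parameter value `0.5`, the number of radii, and the function names are illustrative.

```python
import math


def f(lam, x):
    """f_lambda(x) = |1 + lam - lam x^2| x."""
    return abs(1 + lam - lam * x * x) * x


def radii(lam, count):
    """Return [r_0, ..., r_count] with f(r_{i+1}) = r_i, found by bisection.

    Assumes f is increasing on [r_0, r_inf], f(r_0) = 0 and f(r_inf) = r_inf,
    so each preimage is unique in the bracket [r_i, r_inf].
    """
    r0 = math.sqrt((1 + lam) / lam)
    r_inf = math.sqrt((2 + lam) / lam)
    rs = [r0]
    for _ in range(count):
        lo, hi, target = rs[-1], r_inf, rs[-1]
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if f(lam, mid) < target:
                lo = mid
            else:
                hi = mid
        rs.append(0.5 * (lo + hi))
    return rs


RS = radii(0.5, 6)
print(RS)
print("r_inf =", math.sqrt((2 + 0.5) / 0.5))
```

The computed radii increase and accumulate at $r_\infty$, and iterating $f_{\lambda_j}$ from $r_i$ reaches $0$ after $i + 1$ steps up to rounding, which corresponds to the $\phi_j$-orbit landing on $O$.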
6. Stability analysis
Here we investigate the stability behavior of the algorithm introduced in this paper. The algorithm is represented by the action of a nonlinear transformation $\phi(\omega) = \omega + C\omega - \langle\omega, C\omega\rangle\omega$, where $C$ is a nonnegative, compact, self-adjoint operator defined on a Hilbert space $H$. We follow techniques similar to those presented in [6] to show that if $\|C\| < 1$, or equivalently all the eigenvalues of $C$ are less than 1, then the unit sphere contained in the eigenspace associated with the largest eigenvalue is the only stable set relative to perturbations along orthogonal directions. The remaining eigenspaces, different from $\mathrm{Ker}(C)$, are of saddle-type with a finite-dimensional unstable direction. Therefore, we may conclude that this algorithm presents a principal component filtering property.
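We denote by $\lambda_1, \ldots, \lambda_n, \ldots$ the nonzero eigenvalues of $C$, listed in decreasing order, and $H_i$ represents the eigenspace associated with $\lambda_i$. The derivative of $\phi$ at some generic point $\omega$ in $H_i$ along an orthogonal direction $v$ is given by
\[
D\phi(\omega)v = (\mathrm{Id} + C - \lambda_i\|\omega\|^2\,\mathrm{Id})\,v .
\]
This is a linear transformation with eigenvalues $1 + \lambda_j - \lambda_i\|\omega\|^2$ for $j \neq i$ and $1 - \lambda_i\|\omega\|^2$, if $\mathrm{Ker}(C)$ is nontrivial. Next, the following results state the stability of the different eigenspaces and properties thereof.

Proposition 6.1. $\mathrm{Ker}(C)$ is locally unstable relative to $\phi$.

Proof. An element $v$ in $H$ admits the decomposition
\[
v_K + \sum_{i=1}^{\infty} v_i
\]
with $v_i \in H_i$ and $v_K \in \mathrm{Ker}(C)$. The $\phi$-invariance of each $H_i$ and the induction method imply that
\[
\|\phi^n(v)\|^2 \ge \sum_{i=1}^{\infty} \Big(1 + \lambda_i - \sum_{j=1}^{\infty} \lambda_j\|\phi_j^{n-1}(v)\|^2\Big)^{2} \|\phi_i^{n-1}(v)\|^2 ,
\]
where $\phi_j$ denotes the composition of $\phi$ with the projection onto $H_j$. We choose a neighborhood $W$ of $\mathrm{Ker}(C)$ as follows:
\[
W = \Big\{ \omega \in H : \sum_{j=1}^{\infty} \|\omega_j\|^2 \le \frac{1}{2} \Big\} .
\]
If $v \in W$ is such that $v_1 = v_2 = \cdots = v_{k-1} = 0$ but $v_k \neq 0$ then
\[
\|\phi(v)\|^2 \ge \sum_{i=k}^{\infty} \Big(1 + \lambda_i - \sum_{j=k}^{\infty} \lambda_j\|v_j\|^2\Big)^{2} \|v_i\|^2 \ge \Big(1 + \lambda_k - \sum_{j=k}^{\infty} \lambda_j\|v_j\|^2\Big)^{2} \|v_k\|^2 \ge \Big(1 + \frac{\lambda_k}{2}\Big)^{2} \|v_k\|^2 .
\]
Therefore $\|\phi(v)\| \ge (1 + \lambda_k/2)\|v_k\|$. In general, we may say that, if $\phi^s(v) \in W$, for $s = 0, \ldots, n-1$, then
\[
\|\phi^n(v)\|^2 \ge \prod_{s=0}^{n-1} \Big(1 + \lambda_k - \sum_{j=1}^{\infty} \lambda_j\|\phi_j^s(v)\|^2\Big)^{2} \|v_k\|^2 \ge \Big(1 + \frac{\lambda_k}{2}\Big)^{2n} \|v_k\|^2 .
\]
These considerations imply that the orbit of any $v \in W \setminus \mathrm{Ker}(C)$ cannot lie entirely in $W$ since the sequence $(1 + \lambda_k/2)^n$ converges to infinity as $n$ increases. ✷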
Remark. Given an eigenspace Hi , the space Hi ⊕ W, W as defined in the proof above, is a neighborhood of Hi . For every neighborhood U of O and v ∈ U ∩ W but not in Ker(C), the orbit {φ n (v)}n does not lie entirely in W, as shown in the previous proof. On the other hand, if v ∈ Ker(C) we have that φ n (v) = v. Therefore Hi is neither locally stable nor unstable. The previous proposition asserts the stability behavior of a large set of fixed points, namely those in the kernel of C. It remains to establish the behavior of those fixed points that lie in unit spheres, each one of them contained in some eigenspace of C. The Hartman–Grobman theorem is a classical result in dynamical systems that establishes sufficient conditions for the existence of a local conjugacy between a given differentiable map and its linear approximation, around an isolated fixed point. This theorem is a cornerstone in the local stability analysis of fixed points or stationary solutions. Before stating the Hartman–Grobman theorem, we review the definition of hyperbolic endomorphism, cf. [14]. Definition 6.1. T : E → E, a bounded linear map on a Banach space E, is said to be hyperbolic if there exist a decomposition of E into a direct sum of two invariant subspaces, E = E1 ⊕ E2 , and two positive constants c and λ < 1 such that: (1) The restriction of T to E1 , T1 , is an expansion, i.e., for all n 0, T1n cλ−n . (2) The restriction of T to E2 , T2 , is a contraction, i.e., for all n 0, T2n cλn . Theorem 6.1 (Hartman–Grobman [14]). A small Lipschitzian perturbation of a hyperbolic operator, L : E → E, defined on a Banach space E, is topologically conjugate to L. More precisely, there exists 2 > 0 such that for every φ, a continuous and bounded Lipschitz map with Lipschitz constant less than 2, there exists a homeomorphism H for which L + φ = H LH −1 . As already mentioned, we want to study the stability of those fixed points of φ, not in the kernel of C. 
Each of these fixed points lies in some unit sphere $S_i$ contained in the eigenspace $H_i$. Clearly, isolated fixed points occur only for eigenvalues of multiplicity one. The Hartman–Grobman theorem will allow us to decide the local stability of each fixed point along the orthogonal complement of the eigenspace where the given fixed point lies. Let $u \in S_i \subset H_i$ and let $N_u$ be a neighborhood of $u$. Without loss of generality, we assume that there exists some positive $\varepsilon$ for which $N_u = \{u + v : \|v\| < \varepsilon\}$. Since $\phi$ is sufficiently differentiable, Taylor's expansion applies, and we have $\phi(u + v) = u + D\phi(u)v + \text{higher order terms}$. The derivative of $\phi$ at $u$, when restricted to $H_i^\perp$, is given by $D\phi(u) : H_i^\perp \to H_i^\perp$,
$v \mapsto (\mathrm{Id} + C - \lambda_i \mathrm{Id})v$.
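The restricted derivative can be checked directly from the definition $\phi(\omega) = \omega + C\omega - \langle \omega, C\omega \rangle \omega$. The following is a sketch of that computation, under the assumption that the Hilbert space is real, so that the inner product is symmetric; $\omega$ and $v$ denote arbitrary vectors.

```latex
\begin{align*}
D\phi(\omega)v
  &= v + Cv - \langle \omega, C\omega \rangle v
     - \bigl( \langle v, C\omega \rangle + \langle \omega, Cv \rangle \bigr)\omega \\
  &= v + Cv - \langle \omega, C\omega \rangle v - 2\langle v, C\omega \rangle \omega
     && \text{($C$ selfadjoint, real inner product)} \\
\intertext{At $\omega = u$ with $Cu = \lambda_i u$, $\|u\| = 1$, and $v \in H_i^\perp$:}
\langle u, Cu \rangle &= \lambda_i, \qquad
\langle v, Cu \rangle = \lambda_i \langle v, u \rangle = 0, \\
D\phi(u)v &= v + Cv - \lambda_i v = (\mathrm{Id} + C - \lambda_i \mathrm{Id})v .
\end{align*}
```

Since $C$ is selfadjoint and leaves $H_i$ invariant, it also leaves $H_i^\perp$ invariant, so the expression above does map $H_i^\perp$ into itself.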
We define the map $\psi$ on the space $H_i^\perp$ as $\psi(v) = \phi(u + v) - u$. If the neighborhood $N_u$ is sufficiently small, we may say that $\psi$ is a small Lipschitz perturbation of $D\phi(u)$ with
$O$ an isolated fixed point of $\psi$. Therefore, if the Hartman–Grobman theorem applies, $\psi$ is locally topologically conjugate to $D\phi(u)$. The linear transformation $D\phi(u)$ has eigenvalues equal to $1 + \lambda_j - \lambda_i$ (for $j \neq i$) and $1 - \lambda_i$, if $\mathrm{Ker}(C)$ is nontrivial.

Theorem 6.2. If $\|C\| < 1$ and $u \in S_1$, then $u$ is locally stable along $H_1^\perp$.

Proof. We notice that $D\phi(u)$, when restricted to $H_1^\perp$, is hyperbolic. In fact, its eigenvalues are equal to $1 + \lambda_j - \lambda_1$ (for $j \neq 1$) and $1 - \lambda_1$, if $\mathrm{Ker}(C)$ is nontrivial. Since $\lambda_1$ is the largest eigenvalue and $\lambda_1 \le \|C\| < 1$, they are all positive and less than 1. The statement in the theorem follows from the Hartman–Grobman theorem. ✷

Remark. We observe that, under the assumption $\|C\| < 1$ and given $u$ in $S_i$, with $i \neq 1$, $u$ is a saddle type fixed point. The derivative $D\phi(u)$ along $H_i^\perp$ has eigenvalues both less and greater than one. In fact, there are only $i - 1$ eigenvalues that are greater than one.
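The eigenvalue count in the remark above can be checked numerically in a finite-dimensional toy case. The sketch below is only an illustration under assumptions not made in the paper: the space is $\mathbb{R}^4$, $C$ is a diagonal matrix with hypothetical eigenvalues 0.5, 0.3, 0.1, 0, and the helper names `phi` and `transverse_multipliers` are ours. Because $C$ is diagonal, the restriction of $D\phi(e_i)$ to the orthogonal complement of $e_i$ is itself diagonal, so its diagonal entries, estimated here by central differences, are its eigenvalues.

```python
# Hypothetical finite-dimensional illustration: C = diag(lams), so the unit
# eigenvectors are the standard basis vectors (indices are 0-based here).

def phi(w, lams):
    """Oja-type map phi(w) = w + Cw - <w, Cw> w for C = diag(lams)."""
    q = sum(l * x * x for l, x in zip(lams, w))  # <w, Cw>
    return [x + l * x - q * x for l, x in zip(lams, w)]


def transverse_multipliers(i, lams, h=1e-6):
    """Estimate the diagonal of D phi(e_i) on the complement of e_i.

    Returns a dict {j: d phi_j / d w_j at e_i} for every j != i, computed
    with a central difference of step h.
    """
    n = len(lams)
    u = [1.0 if k == i else 0.0 for k in range(n)]
    result = {}
    for j in range(n):
        if j == i:
            continue
        plus = list(u)
        minus = list(u)
        plus[j] += h
        minus[j] -= h
        result[j] = (phi(plus, lams)[j] - phi(minus, lams)[j]) / (2 * h)
    return result


LAMS = [0.5, 0.3, 0.1, 0.0]  # assumed eigenvalues, all below 1, last one in Ker(C)
AT_TOP = transverse_multipliers(0, LAMS)    # fixed point in S_1
AT_THIRD = transverse_multipliers(2, LAMS)  # fixed point in S_3
```

With these assumed values, the multipliers at the top eigenvector are close to 0.8, 0.6 and 0.5, all inside $(0, 1)$, while at the third eigenvector they are close to 1.4, 1.2 and 0.9, so exactly $i - 1 = 2$ of them exceed one, in agreement with the formula $1 + \lambda_j - \lambda_i$.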
7. Conclusions

In this paper we have analyzed convergence properties of Oja's learning rule, now acting on a Hilbert space. Though this rule is defined with a single output node, this approach might bring insight into a multi-valued output setting. We assume that the input set of values determines a compact, nonnegative and selfadjoint operator $C$ acting on a Hilbert space $H$. This operator defines an algorithm, which is a straightforward generalization of Oja's model, $\phi(\omega) = \omega + C\omega - \langle \omega, C\omega \rangle \omega$. The conditions above imply that $C$ has at most countably many nonzero eigenvalues, with zero the only possible accumulation point. Spectral theory asserts that $H$ can be split into a direct sum of countably many $\phi$-invariant subspaces.

Parameter regions that assure stability and chaoticity of $\phi$ are characterized in Theorems 4.3, 4.4, and 5.1. Following these results we may say that learning occurs in many (inputs)-to-one (output) independent clusters of nodes where some pseudo learning occurs. Moreover, except for possibly finitely many such clusters, Theorem 5.1 also implies that learning always happens within a certain region of initial data. If, within desirable constraints, initial data evolves to some stable point and learning takes place, Theorem 6.2 shows how Oja's algorithm stabilizes to some fixed point lying in a sphere of possible values. This fact captures some dependence between present and precedent learning.

Chaotic behavior, as previously mentioned, may only take place on finitely many eigenspaces. In this situation, the algorithm does not converge and a natural set of connecting weights cannot be selected. Nevertheless, as the eigenvalues decrease toward zero, convergence occurs in larger and larger regions, allowing a natural set of connecting weights to emerge. Theorem 5.1 and the subsequent remark identify initial data which evolves towards an oscillatory behavior; this might represent a bi-stability type of phenomenon.
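The stabilization described above for $\|C\| < 1$ can be observed in a small simulation. The following is a minimal sketch and not part of the analysis: it assumes a three-dimensional space, a diagonal $C$ with hypothetical eigenvalues 0.5, 0.3, 0.1, and an arbitrarily chosen initial weight vector; the names `phi`, `iterate`, `LAMS`, `W0` and `W_FINAL` are illustrative only.

```python
# Toy iteration of phi(w) = w + Cw - <w, Cw> w with C = diag(lams).
# All numerical values below are assumptions chosen for illustration.

def phi(w, lams):
    """One step of the Oja-type rule for a diagonal operator C."""
    q = sum(l * x * x for l, x in zip(lams, w))  # <w, Cw>
    return [x + l * x - q * x for l, x in zip(lams, w)]


def iterate(w, lams, steps):
    """Apply phi repeatedly and return the final weight vector."""
    for _ in range(steps):
        w = phi(w, lams)
    return w


LAMS = [0.5, 0.3, 0.1]   # assumed eigenvalues, so ||C|| = 0.5 < 1
W0 = [0.9, 0.2, -0.1]    # assumed initial weights near the sphere S_1
W_FINAL = iterate(W0, LAMS, 200)
```

For this particular choice, the components along the second and third eigenvectors shrink by factors approaching 0.8 and 0.6 per step, and `W_FINAL` ends up numerically indistinguishable from the unit eigenvector of the largest eigenvalue. Other initial data or larger eigenvalues need not behave this way.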
References

[1] R. Devaney, Introduction to Chaotic Dynamical Systems, Benjamin–Cummings, 1986.
[2] T.Y. Li, J. Yorke, Period three implies chaos, Amer. Math. Monthly 82 (1975) 985–992.
[3] K. Kingsley, P. Adams, Formation of new connections by a generalisation of Hebbian learning, 2001, preprint.
[4] P.E. Kloeden, Chaotic difference equations in $\mathbb{R}^n$, Austral. Math. Soc. Ser. A 31 (1981) 217–225.
[5] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, 1994.
[6] J. Hertz, A. Krogh, R. Palmer, Introduction to the Theory of Neural Computation, Addison–Wesley, 1991.
[7] M. Hirsch, S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, 1974.
[8] J.J. Hopfield, Learning algorithms and probability distributions in feed-forward and feed-back networks, Proc. Nat. Acad. Sci. USA 84 (1982) 8429–8433.
[9] J. Milnor, W. Thurston, On iterated maps of the interval, Princeton, 1976, preprint.
[10] E. Oja, Principal components, minor components, and linear neural networks, Neural Networks 5 (1992) 927–936.
[11] E. Oja, A simplified neuron model as a principal component analyzer, J. Math. Biol. 15 (1982) 267–273.
[12] A. Sarkovski, Coexistence of cycles of a continuous map of a line into itself, Ukraïn. Mat. Zh. 16 (1964) 61–71.
[13] D. Singer, Stable orbits and bifurcation of maps of the interval, SIAM J. Appl. Math. 35 (1978) 260–267.
[14] M. Shub, Global Stability of Dynamical Systems, Springer-Verlag, 1987.
[15] J. Weidmann, Linear Operators in Hilbert Spaces, in: Graduate Texts in Mathematics, Springer-Verlag, 1980.