Universal learning network and its application to chaos control



Neural Networks 13 (2000) 239–253, Pergamon, www.elsevier.com/locate/neunet

Contributed article

Kotaro Hirasawa*, Xiaofeng Wang, Junichi Murata, Jinglu Hu, Chunzhi Jin

Department of Electrical and Electronic Systems Engineering, Graduate School of Information Science and Electrical Engineering, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka 812-8581, Japan

Received 10 September 1998; received in revised form 29 September 1999; accepted 29 September 1999

Abstract

Universal Learning Networks (ULNs) are proposed and their application to chaos control is discussed. ULNs provide a generalized framework for modeling and controlling complex systems. They consist of a number of interconnected nodes, where each node may contain any continuously differentiable nonlinear function and each pair of nodes can be connected by multiple branches with arbitrary time delays. Therefore, physical systems that can be described by differential or difference equations, as well as their controllers, can be modeled in a unified way, so ULNs may form a superset of neural networks and fuzzy neural networks. To optimize ULNs, a generalized learning algorithm is derived that incorporates both first order derivatives (gradients) and higher order derivatives. The derivatives are calculated by forward or backward propagation schemes. These schemes extend Back Propagation Through Time (BPTT) and Real Time Recurrent Learning (RTRL) of Williams and Zipser in the sense that they can deal with generalized node functions, generalized network connections with multiple branches of arbitrary time delays, generalized criterion functions and higher order derivatives. As an application of ULNs, a chaos control method using the maximum Lyapunov exponent of ULNs is proposed. The maximum Lyapunov exponent of a ULN can be formulated using its higher order derivatives, and the ULN parameters can be adjusted so that the maximum Lyapunov exponent approaches a target value. The simulation results show that a fully connected ULN with three nodes is able to display chaotic behavior. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Universal learning networks; Neural networks; Higher order derivatives calculation; Chaos; Lyapunov exponent

* Corresponding author. Tel.: +81-92-642-3907; fax: +81-92-642-3962.

0893-6080/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0893-6080(99)00100-8

1. Introduction

Since the first neuron model was proposed in the 1940s (McCulloch & Pitts, 1943), and especially since the revival of artificial neural networks in the 1980s (Hopfield, 1982; Rumelhart et al., 1986), a variety of neural networks have been devised and are now applied in many fields. The vast majority of neural networks in use are those whose parameters or weights are tuned by gradient-based supervised learning. This category includes feedforward neural networks (Rumelhart & McClelland, 1986) or multi-layer perceptrons (Rosenblatt, 1958), various types of recurrent neural networks (Williams & Zipser, 1989, 1990), radial basis function networks (Moody & Darken, 1989), and some networks with special architectures, such as time delay neural networks (Lin, Dayhoff & Ligomenides, 1995). These networks seemingly have different architectures and are trained by distinct training algorithms. In essence, however, they can be unified in a single framework with regard to both their architecture and their learning algorithms. Universal Learning Networks (ULNs) have been proposed, as the name indicates, to provide a universal framework for this class of neural networks, and moreover to model and control complex systems, because most complex systems in the real world can be modeled by networks in which the nodes represent the processing elements and the branches between the nodes describe the relations among the processes. The objective of ULNs is to unify both the variety of network architectures that can describe complex systems and their learning algorithms; this provides a consistent viewpoint for the various kinds of networks. ULNs are expected to bring a further benefit as well: generalizing the architecture and the gradient-based learning algorithms gives the networks new abilities. Allowing a high degree of freedom in the architecture gives ULNs more flexibility and representational power, so they can be useful and effective tools for modeling and controlling large-scale nonlinear complex systems, including physical, social and economic phenomena.

In addition to calculating the first order derivatives of the signals flowing in the network, which gradient-based learning requires, the generalized ULN learning algorithm is equipped with a systematic mechanism for calculating their second and higher order derivatives. For second order derivatives, Buntine and Weigend (1994) summarized many computational methods for feedforward neural networks. However, it is generally recognized that calculating second order derivatives is difficult for recurrent neural networks (Buntine & Weigend, 1994; Wang & Lin, 1998). In contrast, higher order derivatives in general, including second order ones, can be calculated systematically in ULNs for both static and dynamic networks. This allows elaborate criterion functions to be used in learning, which enables more sophisticated specifications of network or system performance.

This paper describes ULNs and their application to chaos control problems, paying particular attention to higher order derivatives and their applications. Before going into the details, it is worth outlining, as preliminary knowledge, the salient features of ULNs and how they differ from existing alternatives. A ULN consists of a number of interconnected nodes. Each node may contain any continuously differentiable function, and each pair of nodes can be connected by multiple branches with arbitrary time delays. Networks having multiple-length time delays can be converted into networks having only unit-length time delays by inserting an appropriate number of additional nodes between the nodes (Hirasawa, Ohbayashi, Fujita & Koga, 1996). However, zero or negative time delays cannot be represented by unit time delays. Therefore, the following become possible using ULNs:

1. Many kinds of physical systems described by differential or difference equations, as well as their controllers, can be modeled naturally in a unified manner.
2. Application-oriented modeling can be carried out: black box neural networks can be installed in ULNs when detailed information on the system is not available, whereas, when a priori information is available, networks such as fuzzy inference networks or special networks that encode this information can be used.

Learning of a ULN is defined as the determination of the parameter values that minimize a criterion function of the network. Generalized learning algorithms are derived from two ideas. One is that static networks are merely special dynamic networks; the other is that the derivatives of the node outputs can be obtained either by propagating the changes of the node outputs caused by changes in the parameters forward in time/space through the network (Hirasawa, Ohbayashi, Koga & Harada, 1996), or by propagating the changes in the criterion function caused by changes in the other node outputs backward in time/space (Hirasawa, Ohbayashi & Murata, 1995). The former idea follows from the observation that the training algorithm of static networks is obtained from that of dynamic networks simply by deleting the feedback loops and setting the time delays to zero. The latter idea extends the ideas used in Real Time Recurrent Learning (RTRL) and Back Propagation Through Time (BPTT) (Williams & Zipser, 1989, 1990), and the proposed algorithms can deal with generalized node functions, generalized network connections with multiple branches of arbitrary time delays, generalized criterion functions and higher order derivatives. Thus, a very general forward or backward propagation learning algorithm can be obtained for ULNs, irrespective of the functions and connections of the nodes and of the type of criterion function.

When applied to controller design problems, ULNs exhibit the following advantages. ULNs allow any nonlinear functions to be embedded in their nodes, so both a controlled system and its controller can be represented by a single ULN. If the controlled system is unknown, it can also be identified by the ULN learning algorithm (Han, Hirasawa, Ohbayashi & Fujita, 1996). An optimal controller can then be obtained by off-line gradient learning. ULN controllers are similar to traditional nonlinear optimal controllers (Shimizu, 1994) in that both are constructed by minimizing criterion functions or performance indexes. However, in traditional controller design, deriving the control laws requires considerable mathematical knowledge, and the resulting laws are in most cases feedforward control laws in which the control signals to the plant are given as functions of time. ULN controllers, on the other hand, form feedback control schemes, and their parameters are determined by numerical optimization.
Therefore, although ULN controllers do not attain better performance indexes because of their structural constraints, they exhibit better robustness against external disturbances owing to their feedback configuration. The fundamental difference between ULN control and conventional control is that controller design can be regarded as parameter optimization, which simplifies the design problem. This paper also proposes a new chaos control method that uses the higher order derivatives of networks to generate or eliminate chaotic phenomena. One of the earliest studies of chaos in the control field is the paper by Kalman (1956), who first found non-synchronous oscillations in a two-dimensional sampled-data control system. In 1990, Ott, Grebogi and Yorke (1990) developed the OGY method, in which chaotic behavior is eliminated by adjusting system parameters when the chaotic orbit approaches a periodic orbit. Since then, many methods for controlling chaos that stabilize chaotic behavior have been developed (Pyragas, 1992; Shinbrot, Grebogi, Ott & Yorke, 1993). However, all of these methods concentrate on avoiding chaos.


Fig. 1. Architecture of Universal Learning Network. (Figure: nodes N_i, N_j and N_k connected in a loop; each pair of nodes, e.g. N_i and N_j, is linked by branches 1, ..., |B(i,j)| carrying time delays D_ij(1), ..., D_ij(p), ..., D_ij(|B(i,j)|).)

On the other hand, along with the development of neural networks, chaotic behavior, one of the characteristics of neural networks, has been discussed in many papers: for example, Aihara's chaotic neural networks (Aihara, Takabe & Toyoda, 1990), Chen's chaotic simulated annealing by a neural network model with transient chaos (Chen & Aihara, 1995), Falcioni's correlation functions and relaxation properties in chaotic dynamics and statistical mechanics (Falcioni, Isola & Vulpiani, 1990), and Ikeda's high-dimensional chaotic behavior in systems with time-delay feedback (Ikeda & Matsumoto, 1987). All these papers, however, use special nonlinear functions or special mechanisms to model chaotic phenomena; more recently, a more systematic discussion has been carried out on how to avoid chaos or take advantage of it (Ogorzalek, 1994). Moreover, the synchronization of chaos has also been studied (Hirasawa, Misawa, Murata, Ohbayashi & Hu, 1998; Kocaver & Ogorzalek, 1993). It is well known (Chen & Lai, 1997a) that chaotic behavior is mainly characterized by the maximum Lyapunov exponent of a system: if the maximum Lyapunov exponent of a system with bounded node outputs is positive, the system behaves chaotically; otherwise, it is stable. In Chen's papers (Chen & Lai, 1997a,b), Lyapunov exponents are used actively to make the controlled system behave chaotically or non-chaotically. They showed that, for any given n-dimensional discrete-time smooth system that may originally be non-chaotic or even asymptotically stable, a positive state feedback controller with a uniformly bounded control gain sequence can be designed to make all Lyapunov exponents of the controlled system strictly positive. In general, however, the control gain sequence may not be so simple, because the system Jacobian needs to be calculated at each control step. It should be noted that this situation has recently been improved (Chen & Lai, 1998).
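The link between the sign of the maximum Lyapunov exponent and chaos can be illustrated on a one-dimensional map $x_{t+1} = f(x_t)$, whose maximum Lyapunov exponent is the long-run average of $\ln|f'(x_t)|$ along an orbit, i.e. a quantity built from first order derivatives. The Python sketch below uses the logistic map $f(x) = r x (1 - x)$ purely as an illustrative system; it is not a ULN and not the method of this paper, and the function name and default values are hypothetical.

```python
import math


def max_lyapunov_logistic(r, x0=0.3, n_transient=1000, n_steps=100_000):
    """Estimate the maximum Lyapunov exponent of the map x -> r * x * (1 - x).

    For a one-dimensional map the exponent is the time average of ln|f'(x_t)|,
    with the first order derivative f'(x) = r * (1 - 2x).
    """
    x = x0
    for _ in range(n_transient):  # discard the transient part of the orbit
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        total += math.log(abs(r * (1.0 - 2.0 * x)))  # ln|f'(x_t)|
        x = r * x * (1.0 - x)
    return total / n_steps


if __name__ == "__main__":
    for r in (3.2, 4.0):
        lam = max_lyapunov_logistic(r)
        label = "chaotic" if lam > 0 else "stable"
        print(f"r = {r}: maximum Lyapunov exponent ~ {lam:.3f} ({label})")
```

For r = 4 the estimate should be close to ln 2 (about 0.693, chaotic), while for r = 3.2 the orbit settles on a period-2 cycle and the exponent is negative (0.5 ln 0.16, about -0.92).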
The chaos control method proposed in this paper is based on the following: (1) the maximum Lyapunov exponent of a network system, rather than all of its Lyapunov exponents, can be controlled simply by adjusting the network parameters instead of using feedback control as in Chen's method, so the network can be made to behave chaotically or stably by selecting appropriate parameter values; (2) the maximum Lyapunov exponent itself can be expressed by first order derivatives, so second order derivatives can be used to drive the maximum Lyapunov exponent toward its target value when a gradient method is applied to optimize the network parameters. Compared with the previous research above, the proposed method therefore has the following distinctive points:

1. The proposed chaos control method can not only eliminate chaotic phenomena but also generate chaos, as Chen's method can. It can therefore be used both to take advantage of chaos and to avoid it.
2. No special mechanism is needed to control chaos. The maximum Lyapunov exponent of a network with bounded outputs is changed, so that chaotic phenomena are generated or eliminated, only by adjusting the parameters of commonly used networks.

In this paper, among the features and applications of ULNs, the higher order derivative calculation method and its application to chaos control are highlighted. In Section 2, the structure and learning algorithms of ULNs are described. The specific procedures for calculating the higher order derivatives are presented in Sections 3 and 4: Section 3 proposes the backward propagation procedure, while Section 4 describes the forward propagation scheme and compares the computational complexities of the two procedures. In Section 5, a new chaos control method is proposed, in which the maximum Lyapunov exponent of a dynamic system is driven toward its target value by adjusting the system parameters using the second order derivatives of the ULN. Section 6 presents simulation results showing the generation and die-out of chaotic phenomena. Section 7 presents the conclusions.

2. Universal learning networks

In this section, we discuss the structure and learning algorithms of ULNs.

2.1. Structure of ULNs

As stated in Section 1, the purpose of ULNs is to provide a general framework for modeling and controlling the complex systems widely found in the real world. It is generally recognized that any dynamic system can be described by a set of related equations (Isermann, Ernst & Nelles, 1997). Depending on the prior knowledge available about the system, the equations may be fully known, partly known, or totally unknown. To model such dynamic systems, we introduce a learning network consisting of two kinds of elements: nodes and branches. The nodes correspond to the equations and the branches to their relations. The nodes may contain continuously differentiable nonlinear functions, including functions realized by a subnetwork, e.g. a sigmoid neural network or a neuro-fuzzy network, and each pair of nodes can be connected by multiple branches with arbitrary time delays. Multiple branches with arbitrary time delays make it possible to model dynamic systems with a very compact network (Hirasawa et al., 1996). Such learning networks are Universal Learning Networks (ULNs) (see Fig. 1). When no prior knowledge is used and only sigmoid functions are used in all nodes, for instance, the learning network reduces to a common sigmoid neural network. One of the distinctive features of the ULN is that it incorporates prior knowledge, if available, in modeling. For example, dynamics commonly expressed by differential or difference equations can be treated directly in ULNs, and if some information on the system is available, it may be incorporated as fuzzy networks in ULNs. ULNs are therefore expected to perform better than common neural networks in system modeling and control applications. In the following sections, however, we stress the importance of the higher order derivatives of ULNs by explaining in more detail how they are calculated and how they can be applied to chaos control.

Fig. 2. Ordered derivative and ordinary partial derivative. (Figure: x feeds y through f, and both x and y feed z through g.)
The generic equation that describes the behavior of ULNs is

$$h_j(t) = f_j\big(\{h_i(t - D_{ij}(p)) \mid i \in JF(j),\ p \in B(i,j)\},\ \{r_n(t) \mid n \in N(j)\},\ \{\lambda_m(t) \mid m \in M(j)\}\big), \quad (j \in J,\ t \in T), \tag{1}$$

where $h_j(t)$ is the output value of node j at time t; $r_n(t)$ the value of the nth external input variable at time t; $\lambda_m(t)$ the value of the mth parameter at time t; $f_j$ the nonlinear function of node j; $D_{ij}(p)$ the time delay of the pth branch from node i to node j; J the set of nodes {j}; JF(j) the set of nodes that are connected to node j; JB(j) the set of nodes to which node j is connected; N the set of external input variables {n}; $N(j) = \{n \mid r_n \text{ is fed to node } j\}$; M the set of parameters {m}; $M(j) = \{m \mid h_j \text{ is partially differentiable with respect to } \lambda_m\}$; B(i,j) the set of branches from node i to node j; and T the discrete set of sampling instants. ULNs operate on a discrete-time basis. Each pair of nodes i and j may have multiple branches between them, i.e. the set B(i,j) may contain more than one element, and the parameters $\lambda_m(t)$ can be time-varying. The node functions $f_j(\cdot)$ can be any continuously differentiable functions; typically, sigmoid functions are employed, in which case Eq. (1) can be written specifically as

$$h_j(t) = f_j(a_j(t)) = \frac{1}{1 + e^{-\phi_j a_j(t)}}, \tag{2}$$

$$a_j(t) = \sum_{i \in JF(j)} \sum_{p \in B(i,j)} \omega_{ij}(p)\, h_i(t - D_{ij}(p)) + \theta_j, \tag{3}$$
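To make Eqs. (2) and (3) concrete, the following Python sketch simulates a small sigmoid ULN in which one pair of nodes is joined by several branches with different delays. It is an illustrative sketch, not code from the paper: the `Branch` record, the function `simulate_uln`, the initial value `h_init` used for $t < 0$, and the restriction to delays of at least one step (which lets every $h_j(t)$ be computed from earlier outputs only) are all assumptions; external inputs $r_n(t)$ are omitted.

```python
import math
from collections import namedtuple

# Hypothetical record for one branch p in B(i, j): the output of node `src`
# reaches node `dst` after `delay` sampling steps and is scaled by `weight`.
Branch = namedtuple("Branch", ["src", "dst", "delay", "weight"])


def simulate_uln(n_nodes, branches, theta, phi, n_steps, h_init=0.1):
    """Return history[t][j] = h_j(t) for t = 0, ..., n_steps - 1.

    h_j(t) = 1 / (1 + exp(-phi_j * a_j(t)))                                 (Eq. 2)
    a_j(t) = sum over branches i -> j of w_ij(p) * h_i(t - D_ij(p)) + theta_j (Eq. 3)
    """
    if any(b.delay < 1 for b in branches):
        raise ValueError("this sketch only handles delays of at least one step")
    history = []

    def past_output(i, t):
        # Outputs before t = 0 are fixed to the assumed initial value.
        return history[t][i] if t >= 0 else h_init

    for t in range(n_steps):
        a = list(theta)  # every net input starts from the node's threshold
        for b in branches:
            a[b.dst] += b.weight * past_output(b.src, t - b.delay)
        history.append([1.0 / (1.0 + math.exp(-phi[j] * a[j])) for j in range(n_nodes)])
    return history


if __name__ == "__main__":
    branches = [
        Branch(0, 1, 1, 2.0),   # node 0 -> node 1, delay 1
        Branch(0, 1, 3, -1.5),  # a second branch 0 -> 1, delay 3
        Branch(1, 0, 1, 1.0),   # feedback branch 1 -> 0, delay 1
    ]
    for t, h in enumerate(simulate_uln(2, branches, [0.0, -0.5], [1.0, 4.0], 5)):
        print(t, [round(v, 4) for v in h])
```

Because the two branches from node 0 to node 1 carry different delays, node 1 at time t combines $h_0(t-1)$ and $h_0(t-3)$, which is the multi-branch structure of Fig. 1.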

where $\omega_{ij}(p)$ is the weight parameter of the pth branch from node i to node j, $\theta_j$ the threshold parameter of node j, and $\phi_j$ the slope parameter of node j. The set of ULNs can therefore be regarded as a framework for modeling more general systems, in the sense that the output signal of each node, after being processed by the node's function, is transferred to other nodes through multiple branches with arbitrary time delays and is then processed again by the functions of those nodes.

2.2. Learning of ULNs

Learning of ULNs is realized by minimizing a criterion function L with the gradient method:

$$\lambda_1(t) \leftarrow \lambda_1(t) - \gamma \frac{\partial^+ L}{\partial \lambda_1(t)}, \tag{4}$$

where γ is the learning coefficient, assigned a small positive value, and $\partial^+ L/\partial \lambda_1(t)$ is the ordered derivative defined by Werbos (Jang & Sun, 1995; Werbos, 1974), i.e. the net change of the criterion function L caused by a change of $\lambda_1(t)$, with the other independent variables held fixed. We now explain ordered derivatives in more detail (Jang & Sun, 1995). The difference between the ordered derivative and the ordinary partial derivative lies in the way we view the function to be differentiated. Consider the simple network shown in Fig. 2, where z is a function of x and y, and y is in turn a function of x:

$$y = f(x), \tag{5}$$

$$z = g(x, y). \tag{6}$$

For the ordinary partial derivative $\partial z/\partial x$, we assume that all the other input variables (in this case, y) are constant:

$$\frac{\partial z}{\partial x} = \frac{\partial g(x, y)}{\partial x}. \tag{7}$$

In other words, we assume the direct inputs x and y are independent, ignoring the fact that y is actually a function of x. For the ordered derivative, we take this indirect causal relationship into consideration:

$$\frac{\partial^+ z}{\partial x} = \frac{\partial g(x, f(x))}{\partial x} = \left.\frac{\partial g(x, y)}{\partial y}\right|_{y=f(x)} \frac{\partial f(x)}{\partial x} + \left.\frac{\partial g(x, y)}{\partial x}\right|_{y=f(x)}. \tag{8}$$

Therefore, the ordered derivative takes into consideration both the direct and the indirect paths that make up the causal relationship. The criterion function L usually consists of two parts, a fundamental part E and an extended part $E_x$:

$$L = E + E_x. \tag{9}$$
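The difference between Eqs. (7) and (8) can be checked numerically. In the Python sketch below, the choices $f(x) = x^2$ and $g(x, y) = xy$ are illustrative assumptions; with them the partial derivative at $x = 2$ is $y = 4$, while the ordered derivative, which also follows the path through y, is $3x^2 = 12$, matching a finite difference in which y is recomputed from x.

```python
def f(x):
    """y = f(x) from Eq. (5); the choice x**2 is illustrative."""
    return x * x


def g(x, y):
    """z = g(x, y) from Eq. (6); the choice x * y is illustrative."""
    return x * y


def partial_dz_dx(x):
    """Eq. (7): y is held constant, so dg/dx = y evaluated at y = f(x)."""
    return f(x)


def ordered_dz_dx(x):
    """Eq. (8): indirect path (dg/dy)(df/dx) plus direct path dg/dx."""
    y = f(x)
    dg_dy = x        # partial derivative of x * y with respect to y
    df_dx = 2.0 * x  # derivative of x**2
    dg_dx = y        # partial derivative of x * y with respect to x
    return dg_dy * df_dx + dg_dx


def finite_difference(x, eps=1e-6):
    """Central difference in which the perturbation of x also flows through y."""
    return (g(x + eps, f(x + eps)) - g(x - eps, f(x - eps))) / (2.0 * eps)


if __name__ == "__main__":
    x = 2.0
    print(partial_dz_dx(x), ordered_dz_dx(x), finite_difference(x))
```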

The fundamental part E is defined as a differentiable function of relevant node outputs and parameters at appropriate time instants:

$$E = E(\{h_r(s)\}, \{\lambda_m(s)\}), \quad (r \in J_0,\ m \in M_0,\ s \in T_0), \tag{10}$$

where $J_0 = \{r \mid h_r \text{ is related to evaluation of the criterion function}\}$, $M_0 = \{m \mid \lambda_m \text{ is related to evaluation of the criterion function}\}$ and $T_0 = \{s \mid \text{time instant } s \text{ is related to evaluation of the criterion function}\}$. A typical choice of E is the sum of errors between the network outputs and their desired values. The function E defined above is general in the sense that almost all criterion functions used for neural network training take the form of E. The extended part $E_x$ is a function of the derivatives of the node outputs $h_r(s)$. This part can therefore express notions that cannot be represented by the fundamental criterion E, such as smoothness, robustness and stability, and thus gives ULNs a variety of advantageous features. The price one may have to pay for these advantages is computational complexity in the gradient-based optimization of the generalized criterion function L: the gradient of L includes not only the first order derivatives of the node outputs $h_r(s)$ but also their higher order derivatives, since $E_x$ itself contains derivatives of $h_r(s)$. In the subsequent sections, we propose systematic methods for calculating these derivatives. The higher order derivatives do incur additional computational cost; however, the calculation itself is not complicated and is easily implemented. Since E is a differentiable function of the node outputs $h_r(s)$, calculating the derivatives of E and calculating those of $h_r(s)$ are essentially identical. For clarity, therefore, in the following two sections we discuss the first and higher order derivatives of E only. Their computational complexity is evaluated and compared in Section 4.3.

3. A derivatives calculation method by backward propagation

In this section, a backward calculation method for the first and second order derivatives of the criterion function E (Eq. (10)) with respect to the parameters is presented; it can easily be extended to the nth order derivatives. These algorithms are extensions of Back Propagation Through Time (BPTT) (Williams & Zipser, 1989, 1990). BPTT assumes sigmoid node functions, the squared error between the teacher signal and the actual signal, a single branch per connection and a delay of one sampling time. ULNs, in contrast, adopt generalized node functions, generalized network connections with arbitrary time delays and generalized criterion functions. Time-varying and time-invariant parameters, as well as dynamic and static networks, are also treated in a unified way. One of the most distinctive features of ULNs is that higher order derivatives of such generalized networks, which were earlier considered fairly hard to obtain (Buntine & Weigend, 1994; Wang & Lin, 1998), can be calculated systematically.

3.1. The first order derivatives

Let us evaluate the change of the criterion function E caused by a change of a time-varying parameter $\lambda_1(t)$ at time $t = t_1$. Note that the parameter is changed at time $t_1$ only, while its values at other time instants are kept unchanged. The parameter may influence the function directly or indirectly. The direct influence can be evaluated by the partial derivative $\partial E/\partial \lambda_1(t_1)$. To evaluate the indirect influence, we need to take into account some intermediate variables that bridge the function E and the parameter $\lambda_1(t_1)$. As these intermediate variables we can adopt the node outputs $h_d(t_1)$ that are directly affected by the parameter.
Then, the first order derivative $\partial^+ E/\partial \lambda_1(t_1)$ can be represented as follows:

$$\frac{\partial^+ E}{\partial \lambda_1(t_1)} = \sum_{d \in JD(\lambda_1)} \left[ \frac{\partial^+ E}{\partial h_d(t_1)} \frac{\partial h_d(t_1)}{\partial \lambda_1(t_1)} \right] + \frac{\partial E}{\partial \lambda_1(t_1)}, \tag{11}$$

where $t_1$ is a certain specified time instant, $\lambda_1(t_1)$ the value of the parameter at time $t_1$, and $JD(\lambda_1) = \{d \mid h_d(t_1) \text{ is partially differentiable w.r.t. } \lambda_1(t_1)\}$. The first term on the right-hand side of Eq. (11) corresponds to the first term on the right-hand side of Eq. (8), and likewise for the second term. Since the node outputs $h_d(t_1)$ may affect E directly or indirectly, we need to consider the ordered derivative $\partial^+ E/\partial h_d(t_1)$ here. Now, let us denote the ordered derivative as $\delta_1(j,t)$:

$$\delta_1(j,t) = \frac{\partial^+ E}{\partial h_j(t)}. \tag{12}$$

Its evaluation again requires the help of some intermediate variables, which leads to the following back-propagation algorithm:

$$\delta_1(j,t) = \sum_{k \in JB(j)} \sum_{p \in B(j,k)} \left[ \delta_1(k, t + D_{jk}(p)) \frac{\partial h_k(t + D_{jk}(p))}{\partial h_j(t)} \right] + \frac{\partial E}{\partial h_j(t)}, \quad (j \in J,\ t \in T). \tag{13}$$
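The backward recursion (13), combined with Eq. (11), can be sketched in Python for the sigmoid ULN of Eqs. (2) and (3). In this illustrative sketch the time-varying parameter $\lambda_1(t_1)$ is taken to be the threshold of one node at a single instant $t_1$, so that $JD(\lambda_1)$ contains only that node and $\partial E/\partial \lambda_1(t_1) = 0$. The criterion $E = \frac{1}{2}\sum_s (h_r(s) - \text{target})^2$ over all instants, the helper names, the initial output value for $t < 0$ and the restriction to delays of at least one step are assumptions, not the paper's code.

```python
import math
from collections import namedtuple

# Hypothetical branch record: output of `src` reaches `dst` after `delay` steps.
Branch = namedtuple("Branch", ["src", "dst", "delay", "weight"])


def simulate(n_nodes, branches, theta, phi, n_steps, h_init=0.1, theta_shift=None):
    """Sigmoid ULN of Eqs. (2)-(3); theta_shift = (node, t1, amount) changes the
    threshold of one node at time t1 only, i.e. a time-varying parameter."""
    if any(b.delay < 1 for b in branches):
        raise ValueError("this sketch only handles delays of at least one step")
    h = []
    for t in range(n_steps):
        a = list(theta)
        if theta_shift is not None and theta_shift[1] == t:
            a[theta_shift[0]] += theta_shift[2]
        for b in branches:
            s = t - b.delay
            a[b.dst] += b.weight * (h[s][b.src] if s >= 0 else h_init)
        h.append([1.0 / (1.0 + math.exp(-phi[j] * a[j])) for j in range(n_nodes)])
    return h


def criterion(h, out_node, target):
    """Assumed criterion E = 0.5 * sum_s (h_r(s) - target)^2 with T0 = T."""
    return 0.5 * sum((h_t[out_node] - target) ** 2 for h_t in h)


def backward_deltas(h, branches, phi, out_node, target):
    """delta[t][j] = d+E/dh_j(t), propagated backward in time with Eq. (13)."""
    n_steps, n_nodes = len(h), len(h[0])
    delta = [[0.0] * n_nodes for _ in range(n_steps)]
    for t in reversed(range(n_steps)):
        for j in range(n_nodes):
            d = h[t][j] - target if j == out_node else 0.0  # direct effect dE/dh_j(t)
            for b in branches:
                tk = t + b.delay
                if b.src == j and tk < n_steps:
                    hk = h[tk][b.dst]
                    # dh_k(t + D_jk(p)) / dh_j(t) for the sigmoid node k.
                    dhk_dhj = phi[b.dst] * hk * (1.0 - hk) * b.weight
                    d += delta[tk][b.dst] * dhk_dhj  # indirect effect via node k
            delta[t][j] = d
    return delta


def grad_threshold(h, delta, phi, node, t1):
    """Eq. (11) with lambda1(t1) = theta_node(t1): only h_node(t1) depends on it
    directly, and E has no direct dependence on the threshold."""
    hd = h[t1][node]
    return delta[t1][node] * phi[node] * hd * (1.0 - hd)


if __name__ == "__main__":
    branches = [Branch(0, 1, 1, 2.0), Branch(0, 1, 3, -1.5),
                Branch(1, 0, 1, 1.0), Branch(1, 1, 2, 0.7)]
    theta, phi, n_steps = [0.2, -0.5], [1.0, 4.0], 8
    h = simulate(2, branches, theta, phi, n_steps)
    delta = backward_deltas(h, branches, phi, out_node=1, target=0.3)
    eps = 1e-6
    for t1 in (0, 3, 6):
        g = grad_threshold(h, delta, phi, node=0, t1=t1)
        e_plus = criterion(simulate(2, branches, theta, phi, n_steps, theta_shift=(0, t1, eps)), 1, 0.3)
        e_minus = criterion(simulate(2, branches, theta, phi, n_steps, theta_shift=(0, t1, -eps)), 1, 0.3)
        print(t1, g, (e_plus - e_minus) / (2 * eps))
```

A single backward sweep yields $\delta_1(j,t)$ for every node and instant, so the gradient with respect to the threshold at any $t_1$ is then a one-line product, which is what makes the backward scheme attractive when there are many parameters.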

The first term on the right-hand side indicates the indirect effect, calculated by taking the outputs of the downstream nodes k as the intermediate variables, and the second term shows the direct effect of the node output $h_j(t)$ on the criterion E. Substituting $\delta_1(j,t)$ into Eq. (11), we obtain the first order derivative of E with respect to a time-varying parameter $\lambda_1(t)$ at time $t_1$; the derivative with respect to a time-invariant parameter $\lambda_1$ can be obtained in a similar way. For static networks, the formulas reduce to simpler forms. The procedures for calculating the first order derivatives for time-varying parameters, time-invariant parameters and static networks are summarized as follows.

• For time-varying parameters

$$\frac{\partial^+ E}{\partial \lambda_1(t_1)} = \sum_{d \in JD(\lambda_1)} \delta_1(d, t_1) \frac{\partial h_d(t_1)}{\partial \lambda_1(t_1)} + \frac{\partial E}{\partial \lambda_1(t_1)}. \tag{14}$$

• For time-invariant parameters

$$\frac{\partial^+ E}{\partial \lambda_1} = \sum_{t' \in T} \left[ \sum_{d \in JD(\lambda_1)} \delta_1(d, t') \frac{\partial h_d(t')}{\partial \lambda_1} \right] + \frac{\partial E}{\partial \lambda_1}. \tag{15}$$

• For static networks

$$\frac{\partial^+ E}{\partial \lambda_1} = \sum_{d \in JD(\lambda_1)} \delta_1(d) \frac{\partial h_d}{\partial \lambda_1} + \frac{\partial E}{\partial \lambda_1}, \tag{16}$$

$$\delta_1(j) = \sum_{k \in JB(j)} \delta_1(k) \frac{\partial h_k}{\partial h_j} + \frac{\partial E}{\partial h_j}, \quad (j \in J). \tag{17}$$

The summation over $d \in JD(\lambda_1)$ in Eqs. (14)–(16) reflects the fact that there may be a number of nodes whose outputs are partially differentiable with respect to a given $\lambda_1$. While a change of a time-varying parameter at a specific time instant $t_1$ directly affects the node outputs $h_d(t)$ at time $t = t_1$ only, a change in a time-invariant parameter directly changes $h_d(t)$ at every time instant. Therefore, we need to consider $\partial h_d(t')/\partial \lambda_1$ at every time $t'$ and sum them up; this is why, among Eqs. (14)–(16), only Eq. (15) contains a summation over $t'$. In algorithm (13)–(17), the derivative $\partial^+ E/\partial \lambda_1$ is calculated by propagating $\delta_1$, the derivative of E with respect to the node outputs, backward in time (for dynamic networks) or backward in space (for static networks), which is in essence identical to the back-propagation algorithm for ordinary neural networks. In other words, the learning algorithm includes ordinary back-propagation as a special case.

3.2. The second order derivatives

The first order derivative calculation discussed in the preceding subsection can easily be extended to the nth order derivatives. Details of the nth order derivative calculation by backward propagation can be found in Hirasawa et al. (1995). In this subsection, we give the calculation method for second order derivatives as an example. Higher order derivative calculation gives ULNs the following advantage. If only first order derivatives can be calculated, the criterion function to be minimized by a gradient-based method must be a function of the network outputs, e.g. an output error. A mechanism for higher order derivative calculation, however, makes it possible to include derivatives of the network outputs in the criterion function. This allows sophisticated notions such as sensitivity, smoothness, stability and robustness to be built into the network (Han, Hirasawa, Hu & Murata, 1998; Hirasawa, Wang, Murata & Hu, 1998; Ohbayashi, Hirasawa, Murata & Harada, 1996). Let $t_1$ and $t_2$ be certain specified time instants, and $\lambda_1(t)$ and $\lambda_2(t)$ be parameters. Then, by differentiating Eq. (14) with respect to $\lambda_2(t_2)$, we obtain the second order derivative $\partial^{+2} E/\partial \lambda_1(t_1) \partial \lambda_2(t_2)$ as

$$\frac{\partial^{+2} E}{\partial \lambda_1(t_1)\, \partial \lambda_2(t_2)} = \sum_{d \in JD(\lambda_1)} \left[ \frac{\partial^+}{\partial \lambda_2(t_2)} \left( \frac{\partial h_d(t_1)}{\partial \lambda_1(t_1)} \right) \delta_1(d, t_1) + \frac{\partial h_d(t_1)}{\partial \lambda_1(t_1)} \frac{\partial^+ \delta_1(d, t_1)}{\partial \lambda_2(t_2)} \right] + \frac{\partial^+}{\partial \lambda_2(t_2)} \left( \frac{\partial E}{\partial \lambda_1(t_1)} \right). \tag{18}$$

Let us define

$$\delta_{12}(d, t_1) = \frac{\partial^+ \delta_1(d, t_1)}{\partial \lambda_2(t_2)} = \frac{\partial^+}{\partial \lambda_2(t_2)} \left( \frac{\partial^+ E}{\partial h_d(t_1)} \right).$$

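It can be calculated by differentiating Eq. (13) with respect to $\lambda_2(t_2)$:

$$\delta_{12}(j,t) = \sum_{k \in JB(j)} \sum_{p \in B(j,k)} \left[ \delta_{12}(k, t + D_{jk}(p)) \frac{\partial h_k(t + D_{jk}(p))}{\partial h_j(t)} + \frac{\partial^+}{\partial \lambda_2(t_2)} \left( \frac{\partial h_k(t + D_{jk}(p))}{\partial h_j(t)} \right) \delta_1(k, t + D_{jk}(p)) \right] + \frac{\partial^+}{\partial \lambda_2(t_2)} \left( \frac{\partial E}{\partial h_j(t)} \right), \quad (j \in J,\ t \in T). \tag{19}$$

In Eqs. (18) and (19), $\partial^+(\partial h_d(t_1)/\partial \lambda_1(t_1))/\partial \lambda_2(t_2)$, $\partial^+(\partial E/\partial \lambda_1(t_1))/\partial \lambda_2(t_2)$, $\partial^+(\partial h_k(t + D_{jk}(p))/\partial h_j(t))/\partial \lambda_2(t_2)$ and $\partial^+(\partial E/\partial h_j(t))/\partial \lambda_2(t_2)$ can be calculated by applying the calculation procedure for first order derivatives, with E replaced by $\partial h_d(t_1)/\partial \lambda_1(t_1)$, $\partial E/\partial \lambda_1(t_1)$, $\partial h_k(t + D_{jk}(p))/\partial h_j(t)$ and $\partial E/\partial h_j(t)$, respectively. Because E is replaced with a new function such as $\partial h_d(t_1)/\partial \lambda_1(t_1)$ as mentioned above, and because ULNs adopt the generalized node function of Eq. (1) instead of sigmoidal functions, the higher order derivatives can be calculated systematically. For time-invariant parameters and static ULNs, the calculation procedure is rewritten as follows.

• For time-invariant parameters

$$\frac{\partial^{+2} E}{\partial \lambda_1\, \partial \lambda_2} = \sum_{t' \in T} \sum_{d \in JD(\lambda_1)} \left[ \delta_{12}(d, t') \frac{\partial h_d(t')}{\partial \lambda_1} + \frac{\partial^+}{\partial \lambda_2} \left( \frac{\partial h_d(t')}{\partial \lambda_1} \right) \delta_1(d, t') \right] + \frac{\partial^+}{\partial \lambda_2} \left( \frac{\partial E}{\partial \lambda_1} \right), \tag{20}$$

$$\delta_{12}(j,t) = \sum_{k \in JB(j)} \sum_{p \in B(j,k)} \left[ \delta_{12}(k, t + D_{jk}(p)) \frac{\partial h_k(t + D_{jk}(p))}{\partial h_j(t)} + \frac{\partial^+}{\partial \lambda_2} \left( \frac{\partial h_k(t + D_{jk}(p))}{\partial h_j(t)} \right) \delta_1(k, t + D_{jk}(p)) \right] + \frac{\partial^+}{\partial \lambda_2} \left( \frac{\partial E}{\partial h_j(t)} \right), \quad (j \in J,\ t \in T). \tag{21}$$

• For static networks

$$\frac{\partial^{+2} E}{\partial \lambda_1\, \partial \lambda_2} = \sum_{d \in JD(\lambda_1)} \left[ \delta_{12}(d) \frac{\partial h_d}{\partial \lambda_1} + \frac{\partial^+}{\partial \lambda_2} \left( \frac{\partial h_d}{\partial \lambda_1} \right) \delta_1(d) \right] + \frac{\partial^+}{\partial \lambda_2} \left( \frac{\partial E}{\partial \lambda_1} \right), \tag{22}$$

$$\delta_{12}(j) = \sum_{k \in JB(j)} \left[ \delta_{12}(k) \frac{\partial h_k}{\partial h_j} + \frac{\partial^+}{\partial \lambda_2} \left( \frac{\partial h_k}{\partial h_j} \right) \delta_1(k) \right] + \frac{\partial^+}{\partial \lambda_2} \left( \frac{\partial E}{\partial h_j} \right), \quad (j \in J). \tag{23}$$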

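The nth order derivative $\partial^{+n} E/\partial \lambda_1(t_1) \partial \lambda_2(t_2) \cdots \partial \lambda_n(t_n)$ can be obtained by iteratively evaluating the equations for $\delta_1, \delta_{12}, \ldots, \delta_{12\cdots n}$, where $\delta_{12\cdots n} = \partial^+ \delta_{12\cdots(n-1)}/\partial \lambda_n$. Details of the calculation method for nth order derivatives can be found in Hirasawa et al. (1995).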
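As a check of the static-network procedure, the Python sketch below applies Eqs. (16), (17), (22) and (23) by hand to an assumed two-node chain $h_1 = s(w_1 x)$, $h_2 = s(w_2 h_1)$, with the logistic sigmoid s and criterion $E = \frac{1}{2}(h_2 - y)^2$, taking $\lambda_1 = w_1$ and $\lambda_2 = w_2$. The network, names and numerical values are illustrative. The terms $\partial^+(\partial h_1/\partial w_1)/\partial w_2$ and $\partial^+(\partial E/\partial w_1)/\partial w_2$ vanish here, because $\partial h_1/\partial w_1$ does not depend on $w_2$ and E has no direct dependence on $w_1$.

```python
import math


def sig(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(w1, w2, x):
    """Static chain: h1 = s(w1 * x), h2 = s(w2 * h1)."""
    h1 = sig(w1 * x)
    return h1, sig(w2 * h1)


def energy(w1, w2, x, y):
    """Criterion E = 0.5 * (h2 - y)^2."""
    return 0.5 * (forward(w1, w2, x)[1] - y) ** 2


def derivatives(w1, w2, x, y):
    """Return (d+E/dw1, d+^2E/dw1 dw2) via Eqs. (16), (17), (22) and (23)."""
    h1, h2 = forward(w1, w2, x)
    s1 = h1 * (1.0 - h1)           # s'(a1)
    s2 = h2 * (1.0 - h2)           # s'(a2)
    s2_pp = s2 * (1.0 - 2.0 * h2)  # s''(a2)
    dh1_dw1 = s1 * x               # JD(w1) = {node 1}
    dh2_dh1 = s2 * w2

    # First order deltas, Eq. (17), propagated backward from node 2.
    delta1_2 = h2 - y                    # dE/dh2
    delta1_1 = delta1_2 * dh2_dh1        # node 1 affects E only via node 2
    first = delta1_1 * dh1_dw1           # Eq. (16); dE/dw1 has no direct term

    # Second order deltas, Eq. (23), with lambda2 = w2.
    delta12_2 = s2 * h1                  # d+(dE/dh2)/dw2 = dh2/dw2
    d_dh2dh1_dw2 = s2_pp * h1 * w2 + s2  # d+(s'(a2) * w2)/dw2
    delta12_1 = delta12_2 * dh2_dh1 + d_dh2dh1_dw2 * delta1_2
    second = delta12_1 * dh1_dw1         # Eq. (22); the other terms vanish
    return first, second


if __name__ == "__main__":
    w1, w2, x, y = 0.7, -1.3, 1.5, 0.2
    first, second = derivatives(w1, w2, x, y)
    e = 1e-4
    fd = (energy(w1 + e, w2 + e, x, y) - energy(w1 + e, w2 - e, x, y)
          - energy(w1 - e, w2 + e, x, y) + energy(w1 - e, w2 - e, x, y)) / (4 * e * e)
    print(first, second, fd)
```

The mixed finite difference printed last should agree with the recursively computed second order derivative to several digits.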

4. A derivatives calculation method by forward propagation

In this section, we introduce a new scheme for calculating the nth order derivatives of the criterion function E by forward propagation. In the forward propagation calculation, the changes of node outputs with respect to changes in the parameters propagate forwards, whereas in the backward propagation calculation the changes of the criterion function with respect to changes in node outputs propagate backwards. Details of the nth order derivative calculation by forward propagation can be found in Hirasawa et al. (1996); in this section, only the first and second order derivatives are discussed as examples. These algorithms are also extensions of Real Time Recurrent Learning (RTRL) of Williams and Zipser (1989, 1990). As in the case of backward propagation, the extensions cover generalized node functions, generalized network connections with arbitrary time delays and generalized criterion functions. Also, time-varying and time-invariant parameters, and dynamic and static networks, can be treated in a unified manner. The higher order derivatives can also be calculated systematically, as with backward propagation in the preceding section.


K. Hirasawa et al. / Neural Networks 13 (2000) 239–253

4.1. The first order derivatives

In contrast to the backward propagation procedure described in the previous section, let us evaluate the indirect effect of a change in a parameter on the criterion function E by taking the node outputs that directly influence E as the intermediate variables. Then the first order derivative of E with respect to a time-varying parameter $\lambda_1(t)$ at time $t_1$ can be calculated as follows:

$$\frac{\partial^{\dagger}E}{\partial\lambda_1(t_1)}=\sum_{r\in J_0}\sum_{s\in T_0}\frac{\partial E}{\partial h_r(s)}\,\frac{\partial^{\dagger}h_r(s)}{\partial\lambda_1(t_1)}+\frac{\partial E}{\partial\lambda_1(t_1)},\tag{24}$$

where $t_1$ is a specified time instant; $\lambda_1(t_1)$ is the value of the parameter at time $t_1$; $J_0=\{r \mid h_r \text{ is related to the evaluation of } E\}$; and $T_0=\{s \mid E \text{ is evaluated at time instant } s\}$.

Eq. (24) shows that the change of the criterion function caused by a change of the parameters can be reduced to the changes of the outputs of the nodes related to the evaluation of E. The ordered derivative of node output $h_j(t)$ with respect to the time-varying parameter $\lambda_1(t_1)$,

$$P_1(j,t,\lambda_1(t_1))=\frac{\partial^{\dagger}h_j(t)}{\partial\lambda_1(t_1)},\tag{25}$$

can be calculated using the following forward propagation procedure:

$$P_1(j,t,\lambda_1(t_1))=\sum_{i\in JF(j)}\sum_{p\in B(i,j)}\frac{\partial h_j(t)}{\partial h_i(t-D_{ij}(p))}\,P_1(i,t-D_{ij}(p),\lambda_1(t_1))+\frac{\partial h_j(t)}{\partial\lambda_1(t_1)},\quad (j\in J,\ t\in T).\tag{26}$$

The first term on the right-hand side is the indirect effect, calculated by taking the outputs of the upstream nodes i as the intermediate variables, while the second term is the direct effect of the parameter value $\lambda_1(t_1)$ on node output $h_j(t)$. For time-invariant parameters and static networks, Eqs. (24) and (26) reduce to the following simpler forms.

• For time-invariant parameters

$$\frac{\partial^{\dagger}E}{\partial\lambda_1}=\sum_{r\in J_0}\sum_{s\in T_0}\frac{\partial E}{\partial h_r(s)}\,P_1(r,s,\lambda_1)+\frac{\partial E}{\partial\lambda_1},\tag{27}$$

$$P_1(j,t,\lambda_1)=\sum_{i\in JF(j)}\sum_{p\in B(i,j)}\frac{\partial h_j(t)}{\partial h_i(t-D_{ij}(p))}\,P_1(i,t-D_{ij}(p),\lambda_1)+\frac{\partial h_j(t)}{\partial\lambda_1},\quad (j\in J,\ t\in T),\tag{28}$$

where $P_1(j,t,\lambda_1)=\partial^{\dagger}h_j(t)/\partial\lambda_1$.

• For static networks

$$\frac{\partial^{\dagger}E}{\partial\lambda_1}=\sum_{r\in J_0}\frac{\partial E}{\partial h_r}\,P_1(r,\lambda_1)+\frac{\partial E}{\partial\lambda_1},\tag{29}$$

$$P_1(j,\lambda_1)=\sum_{i\in JF(j)}\frac{\partial h_j}{\partial h_i}\,P_1(i,\lambda_1)+\frac{\partial h_j}{\partial\lambda_1},\quad (j\in J),\tag{30}$$

where $P_1(j,\lambda_1)=\partial^{\dagger}h_j/\partial\lambda_1$.
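Eqs. (27) and (28) can be made concrete on a recurrent sigmoid network of the form used later in Section 6.1 (Eqs. (50)–(51)), with one branch of delay 1 between every pair of nodes. The sketch below is an illustration under stated assumptions: the criterion $E=\tfrac12\sum_t (h_0(t)-\text{target})^2$ (so $J_0=\{0\}$), the choice of a single weight as $\lambda_1$, and the function names `simulate`, `sequence_criterion` and `forward_gradient` are all hypothetical.

```python
import math


def simulate(w, theta, phi, h0, steps):
    """Sigmoid network h_j(t) = 1 / (1 + exp(-phi_j * a_j)),
    a_j = sum_i w[i][j] * h_i(t-1) + theta_j (w[i][j]: weight from node i to node j)."""
    n = len(h0)
    traj = [list(h0)]
    for _ in range(steps):
        prev = traj[-1]
        a = [sum(w[i][j] * prev[i] for i in range(n)) + theta[j] for j in range(n)]
        traj.append([1.0 / (1.0 + math.exp(-phi[j] * a[j])) for j in range(n)])
    return traj


def sequence_criterion(w, theta, phi, h0, steps, target):
    # Hypothetical criterion: E = 0.5 * sum_{t=1..steps} (h_0(t) - target)^2.
    traj = simulate(w, theta, phi, h0, steps)
    return 0.5 * sum((h[0] - target) ** 2 for h in traj[1:])


def forward_gradient(w, theta, phi, h0, steps, target, a_node, b_node):
    """dE/dw[a_node][b_node] via the forward recursion for P1 (Eqs. (27)-(28), D_ij = 1)."""
    n = len(h0)
    traj = simulate(w, theta, phi, h0, steps)
    p1 = [0.0] * n  # P1(j, 0): the initial state does not depend on the weight
    grad = 0.0
    for t in range(1, steps + 1):
        prev, cur = traj[t - 1], traj[t]
        slope = [phi[j] * cur[j] * (1.0 - cur[j]) for j in range(n)]  # dh_j/da_j
        new_p1 = []
        for j in range(n):
            # indirect effect: sum_i dh_j(t)/dh_i(t-1) * P1(i, t-1)
            indirect = sum(slope[j] * w[i][j] * p1[i] for i in range(n))
            # direct effect dh_j(t)/dlambda, non-zero only at the weight's target node
            direct = slope[j] * prev[a_node] if j == b_node else 0.0
            new_p1.append(indirect + direct)
        p1 = new_p1
        grad += (cur[0] - target) * p1[0]  # Eq. (27): dE/dh_0(t) * P1(0, t)
    return grad
```

As the next paragraph explains, no explicit sum over time of direct effects is needed: `p1` already carries every earlier effect of the weight forward in time. The price is one such `p1` vector per parameter.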

In the backward propagation scheme, the first order derivative of E with respect to a time-invariant parameter $\lambda_1$ required a summation over $t'\in T$ (see Eq. (15)), because a change in a time-invariant parameter directly changes $h_d(t)$ at every time. The forward propagation procedure for time-invariant parameters contains no such summation. The reason is that, in the forward propagation scheme, the sensitivity of a node output $h_r(s)$ to a change in $\lambda_1$ is evaluated by an ordered derivative that already incorporates all the direct and indirect effects. The indirect effects are calculated by following a chain of spatial and temporal causes and effects, as shown in Eqs. (26), (28) and (30). Therefore, $P_1(j,t,\lambda_1)$ includes all the effects at any time $t'\le t$ caused by the change of $\lambda_1$.

4.2. The second order derivatives

Let $t_1$ and $t_2$ be specified time instants, and let $\lambda_1(t)$ and $\lambda_2(t)$ be time-varying parameters. Then the second order derivative $\partial^{\dagger 2}E/\partial\lambda_1(t_1)\,\partial\lambda_2(t_2)$ can be obtained by differentiating Eq. (24) with respect to $\lambda_2(t_2)$:

$$\frac{\partial^{\dagger 2}E}{\partial\lambda_1(t_1)\,\partial\lambda_2(t_2)}=\sum_{r\in J_0}\sum_{s\in T_0}\left[\frac{\partial^{\dagger}}{\partial\lambda_2(t_2)}\left(\frac{\partial E}{\partial h_r(s)}\right)\frac{\partial^{\dagger}h_r(s)}{\partial\lambda_1(t_1)}+\frac{\partial E}{\partial h_r(s)}\,\frac{\partial^{\dagger 2}h_r(s)}{\partial\lambda_1(t_1)\,\partial\lambda_2(t_2)}\right]+\frac{\partial^{\dagger}}{\partial\lambda_2(t_2)}\left(\frac{\partial E}{\partial\lambda_1(t_1)}\right).\tag{31}$$


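Then, introducing

$$P_2(j,t,\lambda_1(t_1),\lambda_2(t_2))=\frac{\partial^{\dagger 2}h_j(t)}{\partial\lambda_1(t_1)\,\partial\lambda_2(t_2)}=\frac{\partial^{\dagger}P_1(j,t,\lambda_1(t_1))}{\partial\lambda_2(t_2)}\tag{32}$$

and differentiating Eq. (26) with respect to $\lambda_2(t_2)$, the following iterative formula for $P_2$ is obtained:

$$P_2(j,t,\lambda_1(t_1),\lambda_2(t_2))=\sum_{i\in JF(j)}\sum_{p\in B(i,j)}\left[\frac{\partial^{\dagger}}{\partial\lambda_2(t_2)}\left(\frac{\partial h_j(t)}{\partial h_i(t-D_{ij}(p))}\right)P_1(i,t-D_{ij}(p),\lambda_1(t_1))+\frac{\partial h_j(t)}{\partial h_i(t-D_{ij}(p))}\,P_2(i,t-D_{ij}(p),\lambda_1(t_1),\lambda_2(t_2))\right]+\frac{\partial^{\dagger}}{\partial\lambda_2(t_2)}\left(\frac{\partial h_j(t)}{\partial\lambda_1(t_1)}\right),\quad (j\in J,\ t\in T).\tag{33}$$

The first order derivatives appearing in Eqs. (31) and (33), for example $\partial^{\dagger}(\partial E/\partial h_r(s))/\partial\lambda_2(t_2)$, can be calculated by replacing E with $\partial E/\partial h_r(s)$ and applying the first order derivative calculation again. As in the preceding section, this reuse of the first order procedure, with E replaced by a first order derivative, is what allows the higher order derivatives to be calculated systematically by forward propagation. Furthermore, as in the preceding section, Eqs. (31) and (33) can be transformed into the following simplified expressions for time-invariant parameters and static networks.

• For time-invariant parameters

$$\frac{\partial^{\dagger 2}E}{\partial\lambda_1\,\partial\lambda_2}=\sum_{r\in J_0}\sum_{s\in T_0}\left[\frac{\partial^{\dagger}}{\partial\lambda_2}\left(\frac{\partial E}{\partial h_r(s)}\right)P_1(r,s,\lambda_1)+\frac{\partial E}{\partial h_r(s)}\,P_2(r,s,\lambda_1,\lambda_2)\right]+\frac{\partial^{\dagger}}{\partial\lambda_2}\left(\frac{\partial E}{\partial\lambda_1}\right),\tag{34}$$

$$P_2(j,t,\lambda_1,\lambda_2)=\sum_{i\in JF(j)}\sum_{p\in B(i,j)}\left[\frac{\partial^{\dagger}}{\partial\lambda_2}\left(\frac{\partial h_j(t)}{\partial h_i(t-D_{ij}(p))}\right)P_1(i,t-D_{ij}(p),\lambda_1)+\frac{\partial h_j(t)}{\partial h_i(t-D_{ij}(p))}\,P_2(i,t-D_{ij}(p),\lambda_1,\lambda_2)\right]+\frac{\partial^{\dagger}}{\partial\lambda_2}\left(\frac{\partial h_j(t)}{\partial\lambda_1}\right),\quad (j\in J,\ t\in T),\tag{35}$$

where $P_2(j,t,\lambda_1,\lambda_2)=\partial^{\dagger 2}h_j(t)/\partial\lambda_1\,\partial\lambda_2$.

• For static networks

$$\frac{\partial^{\dagger 2}E}{\partial\lambda_1\,\partial\lambda_2}=\sum_{r\in J_0}\left[\frac{\partial^{\dagger}}{\partial\lambda_2}\left(\frac{\partial E}{\partial h_r}\right)P_1(r,\lambda_1)+\frac{\partial E}{\partial h_r}\,P_2(r,\lambda_1,\lambda_2)\right]+\frac{\partial^{\dagger}}{\partial\lambda_2}\left(\frac{\partial E}{\partial\lambda_1}\right),\tag{36}$$

$$P_2(j,\lambda_1,\lambda_2)=\sum_{i\in JF(j)}\left[\frac{\partial^{\dagger}}{\partial\lambda_2}\left(\frac{\partial h_j}{\partial h_i}\right)P_1(i,\lambda_1)+\frac{\partial h_j}{\partial h_i}\,P_2(i,\lambda_1,\lambda_2)\right]+\frac{\partial^{\dagger}}{\partial\lambda_2}\left(\frac{\partial h_j}{\partial\lambda_1}\right),\quad (j\in J),\tag{37}$$

where $P_2(j,\lambda_1,\lambda_2)=\partial^{\dagger 2}h_j/\partial\lambda_1\,\partial\lambda_2$.

The nth order derivative $\partial^{\dagger n}E/\partial\lambda_1(t_1)\,\partial\lambda_2(t_2)\cdots\partial\lambda_n(t_n)$ can be obtained by calculating $P_1, P_2, \ldots, P_n$ iteratively, where

$$P_3(j,t,\lambda_1(t_1),\lambda_2(t_2),\lambda_3(t_3))=\frac{\partial^{\dagger 3}h_j(t)}{\partial\lambda_1(t_1)\,\partial\lambda_2(t_2)\,\partial\lambda_3(t_3)},\tag{38}$$

$$\vdots$$

$$P_n(j,t,\lambda_1(t_1),\ldots,\lambda_n(t_n))=\frac{\partial^{\dagger n}h_j(t)}{\partial\lambda_1(t_1)\cdots\partial\lambda_n(t_n)}.\tag{39}$$

Details of the calculation of the nth order derivatives can be found in Hirasawa et al. (1996).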
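The $P_2$ recursion of Eq. (35) can be checked on the smallest possible case: a single hypothetical node $h(t)=\tanh(w\,h(t-1)+b)$ with one self-loop branch of delay 1, taking $\lambda_1=w$ and $\lambda_2=b$ as time-invariant parameters. The node, the parameter choice and the name `tanh_node_second_order` are illustrative assumptions. For this node the three terms of Eq. (35) collapse to $f''(a)\,\frac{da}{db}\,\frac{da}{dw}+f'(a)\,(w P_2+P_{1,b})$, where $da/dw$ and $da/db$ are ordered derivatives built from the previous $P_1$ values.

```python
import math


def tanh_node_second_order(w, b, h0, steps):
    """Forward propagation of P1 and P2 for one hypothetical node
    h(t) = tanh(w * h(t-1) + b) with a single self-loop of delay 1,
    taking lambda1 = w and lambda2 = b (time-invariant parameters)."""
    h = h0
    p1_w = p1_b = p2_wb = 0.0  # h(0) is fixed, so all sensitivities start at zero
    for _ in range(steps):
        a = w * h + b
        h_new = math.tanh(a)
        f1 = 1.0 - h_new ** 2       # f'(a)
        f2 = -2.0 * h_new * f1      # f''(a)
        da_dw = h + w * p1_w        # ordered derivative of a with respect to w (values at t-1)
        da_db = 1.0 + w * p1_b      # ordered derivative of a with respect to b (values at t-1)
        # Eq. (35) for this node, using P2(t-1) and P1_b(t-1) before they are overwritten.
        p2_wb = f2 * da_db * da_dw + f1 * (w * p2_wb + p1_b)
        p1_w = f1 * da_dw           # Eq. (28) for lambda1 = w
        p1_b = f1 * da_db           # Eq. (28) for lambda2 = b
        h = h_new
    return h, p1_w, p1_b, p2_wb


print(tanh_node_second_order(0.8, 0.1, 0.2, 15))
```

The second order term is updated before the first order ones because Eq. (35) needs $P_1$ and $P_2$ at the previous time step. A central finite difference of $P_{1,w}$ with respect to $b$ (or of $P_{1,b}$ with respect to $w$) should reproduce `p2_wb`.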


4.3. Computational complexity of backward and forward propagation schemes

Comparing the forward propagation scheme with the backward propagation scheme discussed in the previous section, we can see that they are similar in form. However, their computational loads are very different. Since $P_1(j,t,\lambda_1(t_1))$ is a function of the parameter $\lambda_1(t_1)$, $P_1$ must be calculated for every parameter. On the other hand, $\delta_1(j,t)$ in the backward propagation scheme does not depend on the parameters, so only a single time function $\delta_1(j,t)$ has to be calculated for each node j. As a result, for the first order derivatives the forward propagation scheme is more expensive by a factor equal to the number of parameters involved. For this reason, the forward propagation algorithm is rarely used to calculate first order derivatives.

The situation is different for the higher order derivatives, where the forward propagation scheme requires the lower computational load. Let $|J|$ be the number of nodes, $|T|$ the number of sampling instants and $|M|$ the number of parameters. As explained in Hirasawa et al. (1995, 1996), for time-invariant parameters the computational load of the nth order derivatives is proportional to $|J|^2|T||M|^n$ by forward propagation, whereas it is proportional to $|J|^{2n}|T|^n$ by backward propagation. In practical applications such as controller design, a large number of sampling instants is usually involved. Therefore, when higher order derivatives are required, the forward propagation scheme is strongly recommended.
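The two proportionality laws can be compared numerically. The sketch below sets both proportionality constants to 1, which is an assumption, so only the ratios are meaningful. It uses a three-node network with 15 parameters (9 weights, 3 thresholds and 3 slopes, as in the fully connected sigmoid network of Section 6.1) and a hypothetical $|T| = 1000$ sampling instants.

```python
def forward_load(n_nodes, n_times, n_params, order):
    """Relative load of forward propagation: |J|^2 |T| |M|^n (constant factor set to 1)."""
    return n_nodes ** 2 * n_times * n_params ** order


def backward_load(n_nodes, n_times, order):
    """Relative load of backward propagation: |J|^(2n) |T|^n (constant factor set to 1)."""
    return n_nodes ** (2 * order) * n_times ** order


# Three nodes, 15 parameters, 1000 sampling instants (assumed values).
for n in (1, 2, 3):
    print(n, forward_load(3, 1000, 15, n), backward_load(3, 1000, n))
```

For n = 1 this gives 135000 against 9000, exactly the factor $|M| = 15$ discussed above. For n = 2 it gives 2025000 against 81000000, so the $|T|^n$ factor already dominates at second order.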

5. A chaos control method

In this section, a new chaos control method is proposed that generates or eliminates chaotic phenomena by adjusting the parameters of ULNs whose signals are bounded (Hirasawa et al., 1998). The criterion function used for training the parameters is built on the maximum Lyapunov exponent, which can be expressed in terms of the first order derivatives of ULNs. Because the maximum Lyapunov exponent is itself a first order derivative, training the parameters by the gradient method requires the second order derivatives of ULNs described in the previous sections. The fundamental idea of the proposed method is that if the maximum Lyapunov exponent of the dynamic system becomes positive through parameter adjustment, the system behaves chaotically; otherwise, the system is stable. The purpose of this section is to verify whether commonly used neural networks can generate chaos by the proposed method, without any special mechanism for controlling the chaotic behavior, and to demonstrate that the higher order derivatives of ULNs can be used effectively to control chaos. Note that this method can be applied only to systems that have bounded node outputs, such as neural networks. Since a sigmoid neural network is a special instance of ULNs, the calculation method of the higher order derivatives given in the previous sections also applies to neural networks. Therefore, we adopt sigmoid neural networks as the system that is to behave chaotically. In this study, however, the distinctive features of ULNs such as multiple branches and arbitrary time delays are not used for controlling chaos.

Generally speaking, the Lyapunov exponent is an index that describes the rate at which trajectory discrepancies caused by a variation of the initial value expand. The Lyapunov exponents $\chi_j$ $(j\in J)$ of the $|J|$-dimensional dynamics

$$h_j(t)=f_j(h(t-1))\tag{40}$$

can be expressed as follows:

$$\chi_j=\lim_{L\to\infty}\frac{1}{L}\log_e\sigma_j(L),\tag{41}$$

$$DF_L=DF(h(L-1))\,DF(h(L-2))\cdots DF(h(0)),\qquad DF(h(t))=\begin{bmatrix}\dfrac{\partial f_1}{\partial h_1}&\dfrac{\partial f_1}{\partial h_2}&\cdots&\dfrac{\partial f_1}{\partial h_{|J|}}\\[2mm]\dfrac{\partial f_2}{\partial h_1}&\dfrac{\partial f_2}{\partial h_2}&\cdots&\dfrac{\partial f_2}{\partial h_{|J|}}\\[2mm]\vdots&\vdots&\ddots&\vdots\\[2mm]\dfrac{\partial f_{|J|}}{\partial h_1}&\dfrac{\partial f_{|J|}}{\partial h_2}&\cdots&\dfrac{\partial f_{|J|}}{\partial h_{|J|}}\end{bmatrix},\tag{42}$$

where $h(t)=(h_1(t),h_2(t),\ldots,h_{|J|}(t))$ and $\sigma_j(L)$ is the jth singular value of the matrix $DF_L$. Computing the Lyapunov exponents of multi-dimensional dynamics in this way is complicated, since many matrix operations and singular value calculations are necessary. Therefore, we adopt the maximum Lyapunov exponent given by Eqs. (43) and (44):

$$\chi=\lim_{L\to\infty}\frac{1}{Ls}\sum_{l=0}^{L-1}\log_e\frac{\sqrt{\sum_{r\in J}\Delta h_r(ls+s,\Delta h(ls))^2}}{\sqrt{\sum_{r\in J}\Delta h_r(ls)^2}},\tag{43}$$

$$\Delta h_r(ls+s,\Delta h(ls))=\sum_{r_1\in J}\frac{\partial^{\dagger}h_r(ls+s)}{\partial h_{r_1}(ls)}\,\Delta h_{r_1}(ls),\tag{44}$$

where $\Delta h(ls)=(\Delta h_1(ls),\Delta h_2(ls),\ldots,\Delta h_{|J|}(ls))$ is the variation of the node outputs at time ls, and $\Delta h(ls+s,\Delta h(ls))=(\Delta h_1(ls+s,\Delta h(ls)),\ldots,\Delta h_{|J|}(ls+s,\Delta h(ls)))$ is the resulting change of the dynamics at time ls+s caused by the variation $\Delta h(ls)$. When Eq. (43) is calculated, the variation $\Delta h(ls)$ is re-initialized every s sampling instants, as shown in Fig. 3, with $\Delta h(ls)$ taken in the same direction as $\Delta h(ls,\Delta h((l-1)s))$.
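Eqs. (43) and (44) amount to pushing a variation through the chain of Jacobians for s steps, recording its log growth, and re-initializing it to unit length in the same direction. The sketch below assumes the dynamics are supplied as two hypothetical callables, `step(h)` for Eq. (40) and `jacobian(h)` for the matrix of Eq. (42). It replaces the ordered derivative $\partial^{\dagger}h_r(ls+s)/\partial h_{r_1}(ls)$ by the equivalent product of one-step Jacobians, and a finite L stands in for the limit.

```python
import math


def max_lyapunov(step, jacobian, h0, s, L, dh0):
    """Estimate chi of Eq. (43) with the variation re-initialised every s steps."""
    h = list(h0)
    norm0 = math.sqrt(sum(v * v for v in dh0))
    dh = [v / norm0 for v in dh0]          # |dh(0)| = 1, so the denominator of Eq. (43) is 1
    n = len(h)
    total = 0.0
    for _ in range(L):
        for _ in range(s):
            jac = jacobian(h)              # jac[r][q] = d f_r / d h_q evaluated at h(t)
            dh = [sum(jac[r][q] * dh[q] for q in range(n)) for r in range(n)]
            h = step(h)
        growth = math.sqrt(sum(v * v for v in dh))  # |dh(ls + s, dh(ls))|
        total += math.log(growth)
        dh = [v / growth for v in dh]      # re-initialise, keeping the direction (Fig. 3)
    return total / (L * s)


# Usage: the logistic map h(t) = 4 h(t-1) (1 - h(t-1)), a standard chaotic test case.
chi = max_lyapunov(lambda h: [4.0 * h[0] * (1.0 - h[0])],
                   lambda h: [[4.0 - 8.0 * h[0]]],
                   h0=[0.3], s=1, L=5000, dh0=[1.0])
print(chi)
```

For the logistic map the estimate should land close to ln 2 ≈ 0.693, the known exponent of this map. A positive value signals chaos in the sense used by the proposed method.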


Fig. 3. Calculation of maximum Lyapunov exponent. (Sketch in the $(h_1, h_2, h_3)$ state space showing $\Delta h(0)$, $\Delta h(s,\Delta h(0))$, $\Delta h(s)$, $\Delta h(2s,\Delta h(s))$, $\Delta h(2s)$ and $\Delta h(3s,\Delta h(2s))$.)

5.1. Parameter training with gradient method

The chaotic phenomena of ULNs can be controlled by adjusting the parameters so that the maximum Lyapunov exponent given by Eq. (43) approaches a positive or negative target value. The following criterion function is therefore adopted:

$$E=(\chi-\chi_0)^2,\tag{45}$$

where $\chi_0$ is the target value of the maximum Lyapunov exponent. Since the maximum Lyapunov exponent is expressed by first order derivatives (Eqs. (43) and (44)), the second order derivatives described in the preceding section are needed when this criterion function is minimized by the gradient method. The parameters of ULNs are therefore trained as follows:

$$\lambda_m\leftarrow\lambda_m-\gamma\,\frac{\partial^{\dagger}E}{\partial\lambda_m},\tag{46}$$

$$\frac{\partial^{\dagger}E}{\partial\lambda_m}=2(\chi-\chi_0)\,\frac{\partial^{\dagger}\chi}{\partial\lambda_m},\tag{47}$$

$$\frac{\partial^{\dagger}\chi}{\partial\lambda_m}=\lim_{L\to\infty}\frac{1}{Ls}\sum_{l=0}^{L-1}\frac{\sum_{r\in J}\Delta h_r(ls+s,\Delta h(ls))\,\dfrac{\partial^{\dagger}\Delta h_r(ls+s,\Delta h(ls))}{\partial\lambda_m}}{\sum_{r\in J}\Delta h_r(ls+s,\Delta h(ls))^2},\tag{48}$$

$$\frac{\partial^{\dagger}\Delta h_r(ls+s,\Delta h(ls))}{\partial\lambda_m}=\sum_{r_1\in J}\frac{\partial^{\dagger 2}h_r(ls+s)}{\partial h_{r_1}(ls)\,\partial\lambda_m}\,\Delta h_{r_1}(ls),\tag{49}$$

where $\gamma$ is the learning coefficient. The derivatives $\partial^{\dagger}h_r(ls+s)/\partial h_{r_1}(ls)$ and $\partial^{\dagger 2}h_r(ls+s)/\partial h_{r_1}(ls)\,\partial\lambda_m$ in Eqs. (44) and (49) can be calculated by the higher order derivative computation with forward propagation described in Section 4. Forward propagation is used instead of backward propagation because, as explained in Section 4.3, it requires the lower computational load for higher order derivatives.

5.2. Local-minimum problem

When the maximum Lyapunov exponent is positive, its value changes abruptly with small variations of the parameters; the criterion function that includes it is therefore not monotonic but highly irregular (Fig. 5(a)), and the gradient method easily gets stuck in a local minimum. To optimize the parameters globally, a random search method is combined with the gradient method. During parameter training with the gradient method, whenever the criterion function increases, the parameters are changed randomly within a fixed range. The random search continues until it finds a value of the criterion function smaller than the one obtained at the last learning epoch of the gradient method. After that, learning with the gradient method resumes from the new parameters.

6. Computer simulations

6.1. Sample network for simulations

The sample network for the simulations is the fully connected recurrent neural network shown in Fig. 4, and the following sigmoidal functions are adopted as the node functions:

$$h_j(t)=f_j(a_j)=\frac{1.0}{1.0+\exp(-\phi_j a_j)},\tag{50}$$

$$a_j=\sum_{i\in JF(j)}w_{ij}\,h_i(t-1)+\theta_j,\tag{51}$$

where $\theta_j$ and $\phi_j$ are the threshold and slope parameters of the sigmoidal function, and $w_{ij}$ is the weight parameter from node i to node j. Sigmoid nodes are adopted in the simulations in order to investigate whether the most commonly used sigmoid neural networks can generate or eliminate chaos merely by adjusting their parameters. Simulations were carried out on two networks, with two nodes and three nodes, respectively. The results were almost the same, so only the results for the three-node network are reported in this paper. The initial conditions are shown in Table 1.

Fig. 4. Recurrent neural network for simulation. (Fully connected network of nodes h1, h2 and h3 with weights w11, w12, w13, w21, w22, w23, w31, w32 and w33.)

Table 1
Simulation conditions
Initial value of weights wij: random in [−2, 2]
Initial value of slopes φj: 1.0
Initial value of thresholds θj: 0.0
Learning coefficient γ of weights wij: 0.004
Learning coefficient γ of slopes φj: 0.00001
Learning coefficient γ of thresholds θj: 0.00001
Range of random searching: 0.25
Target value of maximum Lyapunov exponent χ0: 3.0 (chaos generation), −3.0 (chaos die-out)
Sampling instant s: 150
L in Eq. (43): 3
Initial variation Δh1(0), Δh2(0), Δh3(0): 0.01

6.2. Simulation results

Generation of chaotic phenomena. The target value of the maximum Lyapunov exponent is set at 3.0. After the network is trained by the combined gradient and random search method, the learning curves of the maximum Lyapunov exponent and of the weight, slope and threshold parameters of the sigmoidal functions are obtained as shown in Fig. 5(a)–(d). The network dynamics are shown in Fig. 6(a)–(d). At learning epoch 5, the trajectory shown in Fig. 6(a) converges to a fixed point, with a maximum Lyapunov exponent of −0.859. At learning epoch 50, the trajectory shown in Fig. 6(b) also converges, with a maximum Lyapunov exponent of −0.038. At learning epoch 450, the trajectory shown in Fig. 6(c) becomes chaotic, with a maximum Lyapunov exponent of 0.215. At learning epoch 1150, the maximum Lyapunov exponent reaches 0.359, indicating that the dynamics shown in Fig. 6(d) are fully chaotic. Although the target value of the maximum Lyapunov exponent is 3.0 in this example, the value achieved in the simulation is 0.359. Such a relatively large target value was set in order to investigate how large a maximum Lyapunov exponent the recurrent neural network in Fig. 4 can achieve. The simulations showed that the largest maximum Lyapunov exponent the fully connected three-node recurrent network could give was around 0.36.

Die-out of chaotic phenomena. Die-out of chaotic phenomena is realized by setting the target value of the maximum Lyapunov exponent at −3.0. The learning curves of the maximum Lyapunov exponent and of the weight, slope and threshold parameters of the sigmoidal functions are shown in Fig. 7(a)–(d), respectively. As stated in Section 5.2, a maximum Lyapunov exponent around or above zero is very sensitive to parameter changes, as shown in Fig. 5(a). Therefore, the maximum Lyapunov exponent can decrease very quickly, as shown in Fig. 7(a), when the target value $\chi_0$ is negative. The network dynamics at epoch 1 and epoch 2000 are shown in Fig. 8(a) and (b), respectively. The dynamics in Fig. 8(a) are chaotic, with a maximum Lyapunov exponent of 0.238. After 2000 training epochs, the network has been completely stabilized and its maximum Lyapunov exponent has decreased to −2.434.

6.3. Discussion

From Fig. 5, it is evident that as the maximum Lyapunov exponent increases through learning, the absolute values of the weight parameters tend to increase. The reason for this tendency is as follows. From Eqs. (26), (43) and (44), $|\partial^{\dagger}h_r(ls+s)/\partial h_{r_1}(ls)|$, in other words $|\partial h_j(t)/\partial h_i(t-D_{ij}(p))|$, must be large enough for the maximum Lyapunov exponent to increase. For the sigmoidal nodes of Eqs. (50) and (51),

$$\left|\frac{\partial h_j(t)}{\partial h_i(t-1)}\right|=|w_{ij}|\,|\phi_j|\,(1-h_j(t))\,h_j(t).\tag{52}$$

Since $(1-h_j(t))\,h_j(t)\le 1/4$, the product $|w_{ij}||\phi_j|$ must be large, so $|w_{ij}|$ should be large for the system to behave chaotically.

7. Conclusion

ULNs and their application to chaos control have been discussed. With regard to architecture and learning algorithms, a class of neural networks has been unified in the framework of ULNs, and salient features have been added. Architecturally, ULNs have a higher degree of freedom: arbitrary continuously differentiable functions in the nodes, multiple branches and arbitrary time delays. The ULN training algorithm, in either the backward or the forward propagation scheme, is equipped with a mechanism for calculating the higher order derivatives, a novel feature that common neural networks lack. ULNs are therefore natural and far-reaching extensions of discrete-time recurrent neural networks, useful for modeling and controlling large-scale complicated systems such as industrial plants and economic, social and life phenomena, and they can be applied widely to many kinds of problems (Han et al., 1998; Ohbayashi, Hirasawa, Hashimoto & Murata, 1996). In this paper, a new chaos control method was also proposed that can generate or eliminate chaotic phenomena by controlling the maximum Lyapunov exponent of dynamic systems that can be represented by ULNs. A method combining the gradient method with random search is used to minimize the criterion function, which is the squared difference between the desired maximum Lyapunov exponent and its actual value. The second order derivatives of ULNs are used effectively to train the network parameters in the gradient learning method. Simulation results show that the generation and elimination of chaotic phenomena of ULNs are easily realized by the proposed method. However, optimizing the parameters is fairly difficult when the maximum Lyapunov exponent is positive. Future work should find a more powerful optimization method and control the system's sensitivity and stability efficiently.

Fig. 5. Value of maximum Lyapunov exponent and parameters in the case of chaos generation: (a) maximum Lyapunov exponent; (b) weight parameters; (c) slope parameters; (d) threshold parameters.

Fig. 6. Network dynamics at various learning epochs in the case of chaos generation: (a) dynamics at epoch 5; (b) dynamics at epoch 50; (c) dynamics at epoch 450; (d) dynamics at epoch 1150.

Fig. 7. Value of maximum Lyapunov exponent and parameters in the case of chaos die-out: (a) maximum Lyapunov exponent; (b) weight parameters; (c) slope parameters; (d) threshold parameters.

Fig. 8. Network dynamics at various learning epochs in the case of chaos die-out: (a) dynamics at epoch 1; (b) dynamics at epoch 2000.

References

Aihara, K., Takabe, T., & Toyoda, M. (1990). Chaotic neural networks. Physics Letters A, 144, 333–340.
Buntine, W. L., & Weigend, A. S. (1994). Computing second derivatives in feed-forward networks: a review. IEEE Transactions on Neural Networks, 5, 480–488.
Chen, L., & Aihara, K. (1995). Chaotic simulated annealing by a neural network model with transient chaos. Neural Networks, 8, 915–930.
Chen, G., & Lai, D. (1997). Making a dynamic system chaotic: feedback control of Lyapunov exponents for discrete time dynamical systems. IEEE-CAS I, 44, 250–253.
Chen, G., & Lai, D. (1997). Anticontrol of chaos via feedback (pp. 367–372). Proceedings of IEEE conference on decision and control 1997, San Diego.
Chen, G., & Lai, D. (1998). Feedback anticontrol of discrete chaos. International Journal of Bifurcation and Chaos, 8, 1585–1590.
Falcioni, M., Isola, S., & Vulpiani, A. (1990). Correlation functions and relaxation properties in chaotic dynamics and statistical mechanics. Physics Letters A, 144, 234–240.
Han, M., Hirasawa, K., Ohbayashi, M., & Fujita, M. (1996). Modeling dynamic systems using universal learning network (pp. 1172–1177). Proceedings of IEEE conference on systems, man and cybernetics, Beijing, 5.
Han, M., Hirasawa, K., Hu, J., & Murata, J. (1998). Generalization ability of Universal Learning Network by using second order derivatives (pp. 1818–1823). Proceedings of IEEE international conference on systems, man, cybernetics, San Diego.
Hirasawa, K., Ohbayashi, M., & Murata, J. (1995). Universal Learning Network and computation of its higher order derivatives (pp. 1273–1277). Proceedings of IEEE international conference on neural networks, Perth.
Hirasawa, K., Ohbayashi, M., Fujita, F., & Koga, M. (1996). Universal learning networks. Transactions of the Institute of Electronic Engineers of Japan, 116-C (7), 794–801 (in Japanese).
Hirasawa, K., Ohbayashi, M., Koga, M., & Harada, M. (1996). Forward propagation Universal Learning Network (pp. 353–358). Proceedings of IEEE international conference on neural networks, Washington, DC.
Hirasawa, K., Misawa, J., Murata, J., Ohbayashi, M., & Hu, J. (1998). Clustering control of Chaos Universal Learning Network (pp. 1482–1487). Proceedings of IEEE conference on neural networks, Anchorage.
Hirasawa, K., Wang, X., Murata, J., & Hu, J. (1998). Chaos control using maximum Lyapunov number of Universal Learning Network (pp. 1702–1707). Proceedings of IEEE international conference on systems, man, cybernetics, San Diego.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences (USA), 2554–2558.
Ikeda, K., & Matsumoto, K. (1987). High-dimensional chaotic behavior in systems with time-delay feedback. Physica D, 29, 223–235.
Isermann, R., Ernst, S., & Nelles, O. (1997). Identification with dynamic neural networks—architecture, comparisons, applications (pp. 997–1022). Proceedings of the 11th IFAC symposium on system identification, Kitakyushu, 3.
Jang, J. S. R., & Sun, C. T. (1995). Neuro-fuzzy modeling and control. Proceedings of the IEEE, 83 (3), 378–406.
Kalman, R. E. (1956). Nonlinear aspect of sampled-data control systems (pp. 273–313). Proceedings on nonlinear circuit analysis.
Kocarev, L., & Ogorzalek, M. (1993). Transition in dynamical regime by driving—a method of control and synchronization of chaos. International Journal of Bifurcation and Chaos, 3, 479–483.
Lin, E. T., Dayhoff, J. E., & Ligomenides, P. A. (1995). Trajectory production with the adaptive time-delay neural network. Neural Networks, 8 (3), 447–461.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.
Moody, J., & Darken, C. J. (1989). Fast learning in networks of locally tuned processing units. Neural Computation, 1, 281–294.
Ogorzalek, M. (1994). Chaos control—how to avoid chaos or take advantage of it. Journal of the Franklin Institute, 331B (6), 681–704.
Ohbayashi, M., Hirasawa, K., Hashimoto, M., & Murata, J. (1996). Robust control using second order derivative of Universal Learning Network (pp. 1184–1189). Proceedings of IEEE international conference on systems, man, cybernetics, Beijing.
Ohbayashi, M., Hirasawa, K., Murata, J., & Harada, M. (1996). Robust learning control using Universal Learning Network (pp. 2208–2213). Proceedings of IEEE international conference on neural networks, Washington, DC.
Ott, E., Grebogi, C., & Yorke, J. A. (1990). Controlling chaos. Physical Review Letters, 64, 1196–1199.
Pyragas, K. (1992). Continuous control of chaos by self-controlling feedback. Physics Letters A, 170, 421–428.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing. Cambridge, MA: MIT Press.
Shimizu, K. (1994). Theory and computational method of optimal control. Tokyo: Korona (in Japanese).
Shinbrot, T., Grebogi, C., Ott, E., & Yorke, J. A. (1993). Using small perturbations to control chaos. Nature, 362 (3), 411–417.
Wang, Y. J., & Lin, C. T. (1998). A second order learning algorithm for multilayer networks based on block Hessian matrix. Neural Networks, 11, 1607–1622.
Werbos, P. (1974). Beyond regression: new tools for prediction and analysis in the behavioral sciences. Unpublished doctoral dissertation, Harvard University.
Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1 (2), 270–280.
Williams, R. J., & Zipser, D. (1990). Gradient-based learning algorithms for recurrent connectionist networks. College of Computer Science Technical Report No. NU-CCS-90-9, Northeastern University.