Copyright © 2001 IFAC
IFAC Conference on New Technologies for Computer Control, 19-22 November 2001, Hong Kong

NEW LEARNING ALGORITHM OF NEURO-FUZZY-NETWORK 1

Zhen-lei Wang*, Jian-hui Wang*, Shu-sheng Gu*

(*School of Information Science and Engineering, Northeastern University, Box 131, Shenyang, 110004, China)

1 Supported by Liaoning Province Science and Technology Foundation (No. 002011)

Abstract: Training of neuro-fuzzy-networks by conventional error backpropagation methods introduces considerable computational complexity due to the need for gradient evaluations. In this paper, concepts from the theory of stochastic learning automata are used instead. This method eliminates the need to compute gradients and hence affords a very simple implementation, particularly on low-end platforms such as personal computers. The neuro-fuzzy-network trained by the learning automaton approach is applied to a nonlinear multivariable system, the three-tank-system, and the simulation result is given. Copyright © 2001 IFAC

Keywords: neuro-fuzzy-network, stochastic learning automaton, nonlinear multivariable system, error backpropagation

1. INTRODUCTION

The neuro-fuzzy-network is used in diverse areas. The universal approximation property of the neuro-fuzzy-network was proved by Wang (1995), and applications in the control and identification of complex systems are reported in many papers (Horikawa, et al., 1992; Ishibuchi, et al., 1993; Sundareshan and Condarcure, 1998; Zhang and Morris, 1999). Despite the wide use of the neuro-fuzzy-network, a major obstacle to its successful deployment is the complexity of training. Earlier procedures attempt to extend the backpropagation approach that has proved very popular for static feedforward networks. Later, genetic algorithms (GAs) were used to train the neuro-fuzzy-network (Wang and Gu, 2000). The computation of the updating equations required for achieving a desired dynamical behavior can, however, be very tedious, so the importance of bypassing the computation of gradients is clearly evident. A new learning scheme based on the theory of stochastic learning automata is proposed in this paper.

Research on learning automata dates back to the early work of Tsetlin (1962) and has been developed since then by a number of others in various contexts (Lakshmivarahan, 1981; Narendra and Thathachar, 1989). Popularly investigated versions of these ideas, known as reinforcement learning methods, have been used in studies of animal learning. Reinforcement learning is based on the intuitive idea that if an action on a system results in an improvement in the state of the system, then the probability of repeating that action is strengthened.

In comparison with more popularly used neuro-fuzzy-network training procedures, reinforcement learning lies between supervised learning methods and unsupervised learning methods (such as competitive learning algorithms, which do not require knowledge of external signals for the evolution of the learning process). In order to guide the training in the proper direction, reinforcement learning methods use a scalar value or an indicator function that indicates whether the chosen inputs (or actions) have steered the network in the direction of accomplishing the learning task.

An outline of the paper is as follows. In Section 2, we show the architecture of the neuro-fuzzy-network. In Section 3, the training procedure is given in detail. In Section 4, we design a controller for a nonlinear multivariable system using the neuro-fuzzy-network. Section 5 is the conclusion.

2. ARCHITECTURE OF THE NEURO-FUZZY-NETWORK

The neuro-fuzzy-network used in this paper is a modification of the network proposed by Zhang and Abraham (1998). Figure 1 shows the general architecture of the network. Layer 1 is the input layer; $\{x_1, \dots, x_n\}$ is the input of the network. Layer 2 converts the input vector $x = (x_1, \dots, x_n)^T$ to $Z = (z_1, \dots, z_m)^T$ according to $Z = Wx$, where $W \in R^{m \times n}$; it is used to get the prompt input to Layer 3. Layer 3 is the fuzzification layer. The operation is

$$\mu_{A_i^k}(z_i) = \exp\left[-\left(\frac{z_i - a_i^k}{\sigma_i^k}\right)^2\right] \qquad (1)$$

where $a_i^k$ is the center of the fuzzy membership function and $\sigma_i^k$ is the width of the fuzzy membership function. The fuzzy membership function of the output is defined as

$$\mu_{B_j^k}(y_j) = \exp\left[-\left(\frac{y_j - b_j^k}{\delta_j^k}\right)^2\right] \qquad (2)$$

In (1) and (2), $A_i^k$ and $B_j^k$ are fuzzy sets in $U_i \subset R$ and $V_j \subset R$, respectively, and $x_i \in U_i$, $y_j \in V_j$ are linguistic variables for $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, p$. Layer 4 is the pessimistic-optimistic-operation layer. The operations are

$$u_k = \prod_{i=1}^{m} \mu_{A_i^k}(z_i) \qquad (3)$$

and

$$v_k = \left[\prod_{i=1}^{m} \mu_{A_i^k}(z_i)\right]^{1/m} \qquad (4)$$

where the layer input vector is $Z = (z_1, z_2, \dots, z_m)$. Layer 5 is the compensatory operation layer. The operation is

$$C(\beta_1, \beta_2) = \beta_1^{1-\eta} \beta_2^{\eta} \qquad (5)$$

where $\beta_i$ equals $u_k$ or $v_k$ and $\eta \in [0, 1]$. Layer 6 is the defuzzification layer. The operation in this layer is

$$y_j = \frac{\sum_{k=1}^{N} b_j^k \delta_j^k \left[\prod_{i=1}^{m} \mu_{A_i^k}(z_i)\right]^{1-\eta+\eta/m}}{\sum_{k=1}^{N} \delta_j^k \left[\prod_{i=1}^{m} \mu_{A_i^k}(z_i)\right]^{1-\eta+\eta/m}} \qquad (6)$$

where $N$ is the number of rules, $j = 1, 2, \dots, p$ and $k = 1, 2, \dots, N$.

Fig. 1 Architecture of the modified neuro-fuzzy-network
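To make the layer operations concrete, the sketch below gives a vectorized forward pass through eqs. (1)-(6). It is a minimal illustration, not code from the paper; NumPy and the shape conventions noted in the comments are assumptions.

```python
import numpy as np

def forward(x, W, a, sigma, b, delta, eta):
    """Forward pass of the compensatory neuro-fuzzy-network, eqs. (1)-(6).

    Assumed shapes (N rules, n inputs, m prompt inputs, p outputs):
      W: (m, n); a, sigma: (N, m); b, delta: (p, N); eta: scalar in [0, 1].
    """
    z = W @ x                                # Layer 2: Z = Wx
    mu = np.exp(-((z - a) / sigma) ** 2)     # Layer 3, eq. (1): (N, m)
    u = mu.prod(axis=1)                      # Layer 4, eq. (3): (N,)
    v = u ** (1.0 / mu.shape[1])             # Layer 4, eq. (4): (N,)
    c = u ** (1.0 - eta) * v ** eta          # Layer 5, eq. (5): (N,)
    num = (b * delta * c).sum(axis=1)        # Layer 6, eq. (6) numerator
    den = (delta * c).sum(axis=1)            # Layer 6, eq. (6) denominator
    return num / den                         # network output y: (p,)
```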

3. TRAINING PROCEDURE

3.1 Basic Updating Rules for Learning Automaton

A learning automaton interacts adaptively with the environment it operates in and updates its actions at each stage based on the response of the environment to these actions. Hence an automaton can be defined by the triple $(\alpha, \beta, T)$, where $\alpha$ denotes the set of actions $\alpha = \{\alpha_1, \alpha_2, \dots, \alpha_r\}$ available to the automaton at any stage, $\beta = \{\beta_1, \beta_2, \dots, \beta_m\}$ is the set of observed responses from the environment, which are used by the automaton as inputs, and $T$ is an updating algorithm which the automaton uses for selecting a particular action from the set $\alpha$ at any stage.
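To make the triple concrete, here is a minimal Python sketch of such an automaton. This is an illustration (the class and method names are invented); the updating rule $T$ is supplied by rules (8) and (9) given below.

```python
import random

class LearningAutomaton:
    """Stochastic learning automaton as the triple (alpha, beta, T):
    a finite action set, a binary environment response, and an
    updating rule T acting on the action probabilities p."""

    def __init__(self, actions):
        self.actions = list(actions)   # alpha = {alpha_1, ..., alpha_r}
        r = len(self.actions)
        self.p = [1.0 / r] * r         # uniform initial action probabilities

    def choose(self):
        # First step of a stage: sample an action index from p.
        u, acc = random.random(), 0.0
        for j, pj in enumerate(self.p):
            acc += pj
            if u <= acc:
                return j
        return len(self.p) - 1
```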

For a stochastic learning automaton, the updating algorithm specifies a rule for adjusting the probability $p_j(n)$ of choosing a particular action $\alpha_j$ at stage $n$. Such a rule may generally be described by a functional relation of the form

$$p_j(n+1) = F(p_j(n), \alpha(n), \beta(n)) \qquad (7)$$

The learning procedure at each stage hence consists of two sequential steps. In the first step the automaton chooses a specific action $\alpha(n) = \alpha_j$ from the finite set of actions available, and in the second step the probabilities of choosing the actions are updated depending on the response of the environment to the action chosen in the first step, which influences the choice of future actions.

For execution of training, the feedback signal from the environment, which triggers the updating of the action probabilities by the automaton, can be given by specifying an appropriate "error" function. The environmental response set $\beta(n)$ at any stage $n$ can then be selected as the binary set $\beta(n) = \{0, 1\}$, with $\beta = 0$ indicating that the action selected is not considered satisfactory by the environment and $\beta = 1$ indicating that the action selected is considered satisfactory. For a stochastic automaton with $r$ available actions, the updating rules can then be specified in a general form as follows. For the selected action at the $n$th stage, $\alpha(n) = \alpha_j$, if $\beta = 1$ then

$$p_j(n+1) = p_j(n) + \sum_{i=1, i \neq j}^{r} \gamma_i(p(n)), \qquad p_i(n+1) = p_i(n) - \gamma_i(p(n)), \quad i \neq j \qquad (8)$$

whereas if $\beta = 0$, then

$$p_j(n+1) = p_j(n) - \delta_j(p(n)), \qquad p_i(n+1) = p_i(n) + \frac{1}{r-1}\delta_j(p(n)), \quad i \neq j \qquad (9)$$

where $\sum_{j=1}^{r} p_j(n) = \sum_{j=1}^{r} p_j(n+1) = 1$. The functions $\gamma_j(\cdot)$, $\delta_j(\cdot)$ are appropriately selected continuous-valued nonnegative functions.

Fig. 2 Learning configuration for neuro-fuzzy-network controller

An alternative way of specifying the updating algorithm is to define a state vector for the automaton and consider the transition of the state due to a certain action, which enables one to state the updating rule in terms of state transition probabilities. This approach has been quite popular in the development of learning automaton theory (Varshavskii and Vorontsova, 1963). For our application to neuro-fuzzy-network training, however, the action probability updating approach, with the updating algorithms specified in the form of (7), provides a simple and convenient framework.
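As a worked instance of (8) and (9), the sketch below uses the linear choices $\gamma_i(p) = a\,p_i$ and $\delta_j(p) = b\,p_j$. This particular choice is an assumption for illustration (the paper only requires continuous, nonnegative functions); it keeps the probabilities nonnegative and summing to one.

```python
def update_probs(p, j, beta, a=0.1, b=0.1):
    """Apply rule (8) if beta == 1, rule (9) if beta == 0.

    p: action probability vector; j: index of the selected action.
    Assumes gamma_i(p) = a * p_i and delta_j(p) = b * p_j.
    """
    r = len(p)
    q = list(p)
    if beta == 1:                        # eq. (8): reinforce action j
        q[j] = p[j] + sum(a * p[i] for i in range(r) if i != j)
        for i in range(r):
            if i != j:
                q[i] = p[i] - a * p[i]
    else:                                # eq. (9): penalize action j
        q[j] = p[j] - b * p[j]
        for i in range(r):
            if i != j:
                q[i] = p[i] + b * p[j] / (r - 1)
    return q
```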

3.2 Neuro-fuzzy-network Training

The network training is a problem of nonlinear parameter optimization, in which each parameter may change in three directions: increment, decrement, or keeping its value. The parameters that need to be trained in the neuro-fuzzy-network are $\{a_i^k, b_j^k, \sigma_i^k, \delta_j^k, w_{ij}, \eta\}$. The automaton actions are defined as either an increment or a decrement to any of these network parameters. If the number of parameters is $N$, this corresponds to a set of $2N$ single-parameter updating actions.

The environment for this learning configuration comprises the neuro-fuzzy-network itself with an appropriately specified error functional $E$ defined over the time interval $[t_0, t_f]$ by

$$E = \sum_{i \in K} \int_{t_0}^{t_f} f(y_i(t), y_i^d(t))\, dt \qquad (10)$$

where $K$ denotes the set of designated output nodes of the network and $y_i^d$ denote the desired output signals. In this paper, we adopt $f(y_i, y_i^d) = |y_i - y_i^d|$ to define the error functional $E$.

Rules (8) and (9) take a special form in training the neuro-fuzzy-network. Suppose $E(n)$ and $E(n+1)$ are the errors at the present step and at the next step after one of the actions (without loss of generality, the $i$th action) is applied. If $\beta = 0$, then

$$\sigma = \frac{1}{1 + \exp(-\tau_1 (E(n+1) - E(n))/E(n))} - 0.47,$$
$$p_j(n+1) = p_j(n) + \frac{\sigma}{r-1}\, p_i(n), \quad j \neq i, \qquad p_i(n+1) = (1 - \sigma)\, p_i(n).$$

If $\beta = 1$, then

$$\gamma = \frac{1}{1 + \exp(-\tau_2 (E(n) - E(n+1))/E(n))} - 0.47,$$
$$p_j(n+1) = (1 - \gamma)\, p_j(n), \quad j \neq i, \qquad p_i(n+1) = p_i(n) + \gamma (1 - p_i(n)),$$

where $\tau_1 > 1$, $\tau_2 > 1$.

A probability of selection is initially assigned to each action; a uniform distribution is used at the beginning for the action probabilities. As learning progresses, the probability associated with each action is changed: the more successful a particular action is at reducing the error, the more likely its selection will be in future stages.
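Assembled into one training stage, the procedure might look like the sketch below, reusing the `LearningAutomaton` sketch from Section 3.1. The error evaluator `eval_error` (computing (10)), the perturbation size `step`, and the choice to undo an unsatisfactory move are illustrative assumptions, not prescribed by the paper; $E(n) > 0$ is assumed.

```python
import math

def train_step(automaton, params, eval_error, step=0.01, tau1=2.0, tau2=2.0):
    """One stage: pick a parameter perturbation, judge it by the error
    functional (10), and update the action probabilities with the
    specialized rules above. Action 2k increments parameter k by
    `step`; action 2k+1 decrements it (2N actions for N parameters)."""
    E_n = eval_error(params)
    i = automaton.choose()                         # step 1: select action i
    k, sign = divmod(i, 2)
    params[k] += step if sign == 0 else -step
    E_next = eval_error(params)

    p, r = automaton.p, len(automaton.p)
    if E_next < E_n:                               # beta = 1: error decreased
        g = 1.0 / (1.0 + math.exp(-tau2 * (E_n - E_next) / E_n)) - 0.47
        pi = p[i]
        automaton.p = [(1.0 - g) * pj for pj in p]
        automaton.p[i] = pi + g * (1.0 - pi)
    else:                                          # beta = 0: error increased
        s = 1.0 / (1.0 + math.exp(-tau1 * (E_next - E_n) / E_n)) - 0.47
        pi = p[i]
        automaton.p = [pj + s * pi / (r - 1) for pj in p]
        automaton.p[i] = (1.0 - s) * pi
        params[k] -= step if sign == 0 else -step  # assumed: revert bad move
    return E_next
```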

4. APPLICATION IN THE COMPLEX SYSTEM CONTROL

The control and modeling of nonlinear multivariable systems play a more and more important part in the advancing automation of technical processes. Due to the ever increasing requirements of process control (e.g. response time, precision, transfer behavior), nonlinear controller designs are necessary (Zhang and Abraham, 1998). But traditional control designs need a precise mathematical model of the nonlinear system, which restricts the application of nonlinear controllers in industry. We adopt the modified neuro-fuzzy-network as the controller of a nonlinear multivariable system, the three-tank-system; it does not need a precise mathematical model. The learning configuration for the neuro-fuzzy-network controller is shown in Figure 2.

4.1 Equation of the nonlinear system

The equations of the three-tank-system can be written as:

$$\frac{dH}{dt} = A(H, t) + BQ, \qquad Y = CH$$

where

$$A(H) = \frac{1}{S}(-Q_{13} - Q_{10},\ Q_{32} - Q_{20},\ Q_{13} - Q_{32})^T, \qquad Q = (Q_1, Q_2)^T,$$

$$B = \frac{1}{S}\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},$$

$$Q_{10} = \alpha_0 S_n (2 g h_1)^{1/2}, \qquad Q_{20} = \alpha_2 S_n (2 g h_2)^{1/2},$$
$$Q_{13} = \alpha_1 S_n\, \mathrm{sgn}(h_1 - h_3)(2 g |h_1 - h_3|)^{1/2}, \qquad Q_{32} = \alpha_3 S_n\, \mathrm{sgn}(h_3 - h_2)(2 g |h_3 - h_2|)^{1/2}.$$

Here $h_i$ is the liquid level (unit: cm); $Q_1, Q_2$ are the supplying flow rates (cm³/sec); $S = 150$ is the section of the cylinder (cm²); $S_n = 0.5$ is the section of the leak opening (cm²); $g = 9.8$ m/s² is the earth acceleration; $\mathrm{sgn}(x)$ is the sign of the argument $x$; and $\alpha_i$ is the out-flow coefficient (a correcting factor, dimensionless, with real values ranging from 0 to 1).
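For readers who want to reproduce the plant, a forward-Euler simulation of these equations is sketched below. It is a minimal sketch: the function names, the clamping of negative levels, and the Euler scheme are illustrative choices, and $g$ is expressed in cm/s² to match the cm-based units.

```python
import math

S, Sn = 150.0, 0.5            # cylinder and leak-opening sections (cm^2)
g = 980.0                     # 9.8 m/s^2 expressed in cm/s^2
alpha = [0.3, 0.3, 0.3, 0.3]  # out-flow coefficients alpha_0..alpha_3

def flows(h):
    h1, h2, h3 = h
    q10 = alpha[0] * Sn * math.sqrt(2 * g * max(h1, 0.0))
    q20 = alpha[2] * Sn * math.sqrt(2 * g * max(h2, 0.0))
    q13 = alpha[1] * Sn * math.copysign(math.sqrt(2 * g * abs(h1 - h3)), h1 - h3)
    q32 = alpha[3] * Sn * math.copysign(math.sqrt(2 * g * abs(h3 - h2)), h3 - h2)
    return q10, q20, q13, q32

def euler_step(h, Q1, Q2, dt=0.5):
    """One Euler step of dH/dt = A(H) + BQ; levels in cm, flows in cm^3/s."""
    q10, q20, q13, q32 = flows(h)
    return [h[0] + dt * (Q1 - q13 - q10) / S,
            h[1] + dt * (Q2 + q32 - q20) / S,
            h[2] + dt * (q13 - q32) / S]
```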

4.2 The Experiment Result

In this section, we show the tracking performance and robustness of the neuro-fuzzy-network control system and compare its control effect with that of a traditional PID controller. The initial states and the outflow coefficients of the tanks are $h_1 = h_2 = h_3 = 0$ and $\alpha_0 = \alpha_1 = \alpha_2 = \alpha_3 = 0.3$, respectively; $T_0 = 0.5$ s is the sample time. The reference signals are given in (11) and (12) (a direct code transcription of these trajectories is sketched after Fig. 4):

$$r_1(k+1) = r_1(k) + 2.5 \quad (k < 201), \qquad r_1(k) = r_1(201) + 3\sin(2\pi(k-1)/100) \quad (200 < k < 401) \qquad (11)$$

$$r_2(k+1) = r_2(k) + 2 \quad (k < 201), \qquad r_2(k) = r_2(201) + 2\sin(2\pi(k-1)/100) \quad (200 < k < 401) \qquad (12)$$

The tracking performances delivered by the neuro-fuzzy-network controller and the PID controller are shown in Figure 3.

Fig. 3 Tracking Performance of Neuro-fuzzy-network Controller and PID

In the disturbance experiment, the reference signals are kept constant while the liquid levels of tank 1 and tank 2 are changed manually (liquid in tank 1 leaks from the leak opening, and some water is added to tank 2), which causes disturbances to the system. The experiment result is shown in Figure 4.

Fig. 4 Disturbance Experiment Result
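As referenced above, the reference trajectories (11) and (12) transcribe directly into code; the function name `reference` is an illustrative choice.

```python
import math

def reference(k, slope, amp, r0=0.0):
    """r(k) per (11)/(12): a ramp for k < 201, then a sine about r(201)."""
    if k < 201:
        return r0 + slope * (k - 1)      # r(k+1) = r(k) + slope
    r201 = r0 + slope * 200
    return r201 + amp * math.sin(2 * math.pi * (k - 1) / 100)

r1 = [reference(k, 2.5, 3.0) for k in range(1, 401)]   # eq. (11)
r2 = [reference(k, 2.0, 2.0) for k in range(1, 401)]   # eq. (12)
```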

5. CONCLUSION

In this paper, the feasibility of training a neuro-fuzzy-network with a learning automaton approach is demonstrated. The learning automaton approach enables the neuro-fuzzy-network to gain experience in the operating environment and to be trained based on that experience. The principal advantage of this learning algorithm is that it requires no complex computations (such as gradient evaluations), so the algorithm affords a simple implementation. The efficiency of the training approach was demonstrated in the design of controllers for a nonlinear multivariable plant. The simple design and good control performance show its ability in complex dynamical plant control.

REFERENCES

Horikawa, S., et al. (1992). On fuzzy modeling using fuzzy neural networks with the backpropagation algorithm. IEEE Trans. on Neural Networks, vol. 3, pp. 801-806.
Ishibuchi, H., et al. (1993). Learning of fuzzy neural networks from fuzzy inputs and fuzzy targets. Proc. 5th IFSA World Congr., vol. 1, pp. 147-150.
Lakshmivarahan, S. (1981). Learning Algorithms: Theory and Applications. Springer-Verlag, New York.
Narendra, K. S., M. A. L. Thathachar (1989). Learning Automata: An Introduction. Prentice-Hall, Englewood Cliffs, NJ.
Rescorla, R. A., A. R. Wagner (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In: Classical Conditioning II: Current Research and Theory, A. H. Black and W. R. Prokasy, Eds. Appleton-Century-Crofts, New York.
Rumelhart, D. E., et al. (1986). Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart and J. L. McClelland, Eds. MIT Press, Cambridge, MA, pp. 45-76.
Sundareshan, M. K., T. A. Condarcure (1998). Recurrent neural network training by a learning automaton approach for trajectory learning and control system design. IEEE Trans. on Neural Networks, vol. 9, pp. 354-368.
Tsetlin, M. (1962). On the behavior of finite automata in random media. Automatic Remote Control, vol. 22, pp. 1210-1219.
Varshavskii, V. I., I. P. Vorontsova (1963). On the behavior of stochastic automata with variable structure. Automatic Remote Control, vol. 24, pp. 327-333.
Wang, L. X. (1995). Adaptive Fuzzy Logic System and Control: Designs and Stability Analysis, pp. 242-246. State Defence Industry Press, Beijing.
Wang, Z. L., S. S. Gu (2000). FNN identifier based on real-valued genetic algorithms. Journal of Northeastern University (Natural Science), vol. 21, pp. 354-356.
Zhang, J., A. J. Morris (1999). Recurrent neuro-fuzzy networks for nonlinear process modelling. IEEE Trans. on Neural Networks, vol. 10, pp. 313-326.
Zhang, Y. Q., K. Abraham (1998). Compensatory neurofuzzy systems with fast learning algorithms. IEEE Trans. on Neural Networks, vol. 9, pp. 83-105.