Copyright © 2001 IFAC
IFAC Conference on New Technologies for Computer Control, 19-22 November 2001, Hong Kong
NEW LEARNING ALGORITHM OF NEURO-FUZZY-NETWORK¹

Zhen-lei Wang*
Jian-hui Wang*
Shu-sheng Gu*

(*School of Information Science and Engineering, Northeastern University, Box 131, Shenyang, 110004, China)
Abstract: Training of neuro-fuzzy-networks by conventional error backpropagation methods introduces considerable computational complexity because of the need for gradient evaluations. In this paper, concepts from the theory of stochastic learning automata are used instead. This method eliminates the need to compute gradients and hence affords a very simple implementation, particularly on low-end platforms such as personal computers. The neuro-fuzzy-network trained by the learning automaton approach is applied to a nonlinear multivariable system, the three-tank system, and simulation results are given. Copyright © 2001 IFAC
Keywords: neuro-fuzzy-network, stochastic learning automaton, nonlinear multivariable system, error backpropagation
1. INTRODUCTION
The neuro-fuzzy-network is used in diverse areas. The universal approximation property of the neuro-fuzzy-network was proved by Wang (1995), and applications to the control and identification of complex systems are reported in many papers (Horikawa, et al., 1992; Ishibuchi, et al., 1993; Sundareshan and Condarcure, 1998; Zhang and Morris, 1999). Despite the wide use of the neuro-fuzzy-network, a major obstacle to its successful deployment is the complexity of training. Earlier procedures attempt to extend the backpropagation approach that has proved very popular for static feedforward networks. Later, genetic algorithms were used to train the neuro-fuzzy-network (Wang and Gu, 2000). The computation of the updating equations required to achieve the desired dynamical behavior can, however, be very tedious. The importance of bypassing the computation of gradients is therefore clearly evident. A new learning scheme based on the theory of the stochastic learning automaton is proposed in this paper.

Research on learning automata dates back to the early work of Tsetlin (1962) and has been developed since then by a number of others in various contexts (Lakshmivarahan, 1981; Narendra and Thathachar, 1989). Popularly investigated versions of these ideas, known as reinforcement learning methods, have been used in studies of animal learning. Reinforcement learning is based on the intuitive idea that if an action on a system results in an improvement in the state of the system, then the probability of repeating that action is strengthened.

¹ Supported by Liaoning Province Science and Technology Foundation (No. 002011)
In comparison with more popularly used neuro-fuzzy-network training procedures, reinforcement learning lies between supervised learning methods and unsupervised learning methods (such as competitive learning algorithms, which do not require knowledge of external signals for the evolution of the learning process). In order to guide the training in the proper direction, reinforcement learning methods use a scalar value or an indicator function that indicates whether the chosen inputs (or actions) have steered the network in the direction of accomplishing the learning task.
An outline of the paper is as follows. In Section 2, we show the architecture of the neuro-fuzzy-network. In Section 3, the training procedure is given in detail. In Section 4, we design a controller for a nonlinear multivariable system using the neuro-fuzzy-network. Section 5 is the conclusion.

2. ARCHITECTURE OF THE NEURO-FUZZY-NETWORK

The neuro-fuzzy-network used in this paper is a modification of the network proposed by Zhang and Abraham (1998). Figure 1 shows the general architecture of the network. Layer1 is the input layer; {x_1, ..., x_n} is the input of the network. Layer2 converts the input vector x = (x_1, ..., x_n)^T to Z = (z_1, ..., z_m)^T according to Z = Wx, where W ∈ R^{m×n}; it is used to get the prompt input to Layer3. Layer3 is the fuzzification layer. The operation is

    μ_{A_i^k}(z_i) = exp[-((z_i - a_i^k)/σ_i^k)^2]    (1)

where a_i^k is the center and σ_i^k is the width of the fuzzy membership function. The fuzzy membership function of the output is defined as

    μ_{B_j^k}(y_j) = exp[-((y_j - b_j^k)/δ_j^k)^2]    (2)

In (1) and (2), A_i^k and B_j^k are fuzzy sets in U_i ⊂ R and V_j ⊂ R, respectively, and z_i ∈ U_i, y_j ∈ V_j are linguistic variables for i = 1, 2, ..., m and j = 1, 2, ..., p. Layer4 is the pessimistic-optimistic-operation layer. The operations are

    u_k = ∏_{i=1}^{m} μ_{A_i^k}(z_i)    (3)

and

    v_k = [∏_{i=1}^{m} μ_{A_i^k}(z_i)]^{1/m}    (4)

where the layer input vector is Z = (z_1, z_2, ..., z_m). Layer5 is the compensatory operation layer. The operation is

    C(β_1, β_2) = β_1^{1-η} β_2^{η}    (5)

where β_i equals u_k or v_k and η ∈ [0, 1]. Layer6 is the defuzzification layer. The operation in this layer is

    y_j = Σ_{k=1}^{N} b_j^k δ_j^k [∏_{i=1}^{m} μ_{A_i^k}(z_i)]^{1-η+η/m} / Σ_{k=1}^{N} δ_j^k [∏_{i=1}^{m} μ_{A_i^k}(z_i)]^{1-η+η/m}    (6)

where N is the number of rules, j = 1, 2, ..., p and i = 1, 2, ..., m.

Fig. 1 Architecture of the modified neuro-fuzzy-network
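To make the layer operations concrete, the following Python sketch evaluates one forward pass through (1)-(6). It is a minimal illustration under assumed array shapes; the function and variable names (nfn_forward, a, sigma, b, delta) are ours and are not defined in the paper.

```python
import numpy as np

def nfn_forward(x, W, a, sigma, b, delta, eta):
    """One forward pass of the compensatory neuro-fuzzy-network, eqs. (1)-(6).

    Assumed shapes: x (n,), W (m, n), a and sigma (N, m) rule-wise centers
    and widths, b and delta (p, N) output centers and widths, eta in [0, 1].
    """
    z = W @ x                                      # Layer2: Z = W x
    mu = np.exp(-((z[None, :] - a) / sigma) ** 2)  # Layer3, eq. (1)
    u = np.prod(mu, axis=1)                        # Layer4, eq. (3): pessimistic
    v = u ** (1.0 / mu.shape[1])                   # Layer4, eq. (4): optimistic
    c = u ** (1.0 - eta) * v ** eta                # Layer5, eq. (5)
    y = ((b * delta) @ c) / (delta @ c)            # Layer6, eq. (6): defuzzification
    return y
```

Note that c equals [∏ μ]^{1-η+η/m}, so the compensatory exponent in (6) arises directly from applying (5) to u_k and v_k.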
3. TRAINING PROCEDURE

3.1 Basic Updating Rules for Learning Automaton

A learning automaton interacts adaptively with the environment it operates in and updates its actions at each stage based on the response of the environment to these actions. Hence an automaton can be defined by the triple (α, β, T), where α = {α_1, α_2, ..., α_r} denotes the set of actions available to the automaton at any stage, β = {β_1, β_2, ..., β_m} is the set of observed responses from the environment, which are used by the automaton as inputs, and T is an updating algorithm which the automaton uses for selecting a particular action from the set α at any stage.

For a stochastic learning automaton, the updating algorithm specifies a rule for adjusting the probability p_j(n) of choosing a particular action α_j at stage n. Such a rule may generally be described by a functional relation of the form

    p_j(n+1) = F(p_j(n), α(n), β(n))    (7)

The learning procedure at each stage hence consists of two sequential steps. In the first step the automaton chooses a specific action α(n) = α_j from the finite set of available actions, and in the second step the probabilities of choosing the actions are updated depending on the response of the environment to that action, which influences the choice of future actions.

The environmental response set β(n) at any stage n can be selected as the binary set β(n) = {0, 1}, with β = 0 indicating that the selected action is not considered satisfactory by the environment and β = 1 indicating that it is considered satisfactory. For a stochastic automaton with r available actions, the updating rules can then be specified in a general form as follows. For the selected action α(n) = α_j at the n-th stage, if β = 1, then

    p_i(n+1) = p_i(n) - γ_i(p(n)),  i ≠ j
    p_j(n+1) = p_j(n) + Σ_{i=1, i≠j}^{r} γ_i(p(n))    (8)

whereas if β = 0, then

    p_j(n+1) = p_j(n) - δ_j(p(n))
    p_i(n+1) = p_i(n) + δ_j(p(n))/(r-1),  i ≠ j    (9)

where Σ_{j=1}^{r} p_j(n) = Σ_{j=1}^{r} p_j(n+1) = 1. The functions γ_j(·) and δ_j(·) are appropriately selected continuous-valued nonnegative functions.

An alternative way of specifying the updating algorithm is to define a state vector for the automaton and consider the transition of the state due to a certain action, which enables one to state the updating rule in terms of state transition probabilities. This approach has been quite popular in the development of learning automaton theory (Varshavskii and Vorontsova, 1963). For our application to neuro-fuzzy-network training, however, the action probability updating approach, with the updating algorithms specified in the form of (7), provides a simple and convenient framework.

Fig. 2 Learning configuration for the neuro-fuzzy-network controller (blocks: reference input, neuro-fuzzy-network, plant, error criterion, learning automaton, environment feedback)
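As a sketch of how (8) and (9) can be realized, the fragment below assumes the simple linear choices γ_i(p) = θ p_i and δ_j(p) = θ p_j with 0 < θ < 1; these particular functions, the parameter theta and the name update_probs are our own illustrative choices, not prescribed by the paper (the specific functions used for network training follow in Section 3.2).

```python
import numpy as np

def update_probs(p, j, beta, theta=0.1):
    """General automaton update (8)-(9) for the selected action j."""
    p = np.asarray(p, dtype=float).copy()
    r = len(p)
    if beta == 1:                        # favorable response: reward action j
        gamma = theta * p                # gamma_i(p) = theta * p_i (assumed)
        gamma[j] = 0.0
        p[j] += gamma.sum()              # eq. (8): p_j gains what the others lose
        p -= gamma                       # p_i <- p_i - gamma_i(p),  i != j
    else:                                # unfavorable response: penalize action j
        delta = theta * p[j]             # delta_j(p) = theta * p_j (assumed)
        p += delta / (r - 1)             # eq. (9): p_i <- p_i + delta/(r-1), i != j
        p[j] -= delta + delta / (r - 1)  # p_j <- p_j - delta_j(p)
    return p
```

Both branches preserve Σ_j p_j = 1, which is the constraint stated after (9).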
3.2 Neuro-fuzzy-network Training

The network training is a problem of nonlinear parameter optimization, and each parameter can be changed in three ways: incremented, decremented, or kept at its value. The parameters that need to be trained in the neuro-fuzzy-network are
{a_i^k, b_j^k, σ_i^k, δ_j^k, w_ij, η}. The automaton actions are defined as either an increment or a decrement to any of these network parameters. If the number of parameters is N, this corresponds to a set of 2N single-parameter updating actions.

The environment for this learning configuration comprises the neuro-fuzzy-network itself. For execution of training, the feedback signal from the environment, which triggers the updating of the action probabilities by the automaton, is given by specifying an appropriate error functional E defined over the time interval [t_0, t_f] by

    E = Σ_{i∈K} ∫_{t_0}^{t_f} f(y_i(t), y_i^d(t)) dt    (10)

where K denotes the set of designated output nodes of the network and y_i^d denote the desired output signals. In this paper we adopt f(y_i, y_i^d) = |y_i - y_i^d| to define the error functional E.

In training the neuro-fuzzy-network, the rule (7), in the general form (8)-(9), is specialized as follows. Suppose E(n) and E(n+1) are the errors at the present step and at the next step after one of the actions (without loss of generality, the i-th action) is used. If β = 0, then

    δ = 1/(1 + exp(-τ_1 (E(n+1) - E(n))/E(n))) - 0.47
    p_j(n+1) = p_j(n) + δ p_i(n)/(r-1),  j ≠ i
    p_i(n+1) = (1 - δ) p_i(n)

If β = 1, then

    γ = 1/(1 + exp(-τ_2 (E(n) - E(n+1))/E(n))) - 0.47
    p_j(n+1) = (1 - γ) p_j(n),  j ≠ i
    p_i(n+1) = p_i(n) + γ (1 - p_i(n))

where τ_1 > 1 and τ_2 > 1.
A probability of selection is assigned to each action; a uniform distribution over the actions is used at the beginning. As learning progresses, the probability associated with each action is changed: the more successful a particular action is at reducing the error, the more likely its selection will be in future stages.
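Putting the pieces together, one training iteration can be sketched as below. This is a hypothetical reading of the procedure: the fixed perturbation step, the rule that β = 1 is issued whenever the error functional (10) decreases, and the decision to discard a worsening parameter change are our assumptions, and all names are illustrative.

```python
import numpy as np

def train_step(params, p, evaluate_error, step=0.01, tau1=2.0, tau2=2.0):
    """One learning-automaton iteration over the 2N increment/decrement actions.

    params: flat array of the N trainable parameters {a, b, sigma, delta, w, eta}
    p:      action probabilities of length 2N (even index = increase parameter k,
            odd index = decrease parameter k)
    evaluate_error: callable returning the error functional E of eq. (10)
    """
    p = np.asarray(p, dtype=float).copy()
    r = len(p)
    i = np.random.choice(r, p=p)                 # step 1: choose an action
    k, sign = divmod(i, 2)
    E_old = max(evaluate_error(params), 1e-12)   # guard against division by zero
    trial = params.copy()
    trial[k] += step if sign == 0 else -step     # apply the increment or decrement
    E_new = evaluate_error(trial)

    if E_new < E_old:                            # beta = 1 (assumed success criterion)
        params = trial                           # keep the improving change
        g = 1.0 / (1.0 + np.exp(-tau2 * (E_old - E_new) / E_old)) - 0.47
        pi_old = p[i]
        p *= (1.0 - g)                           # p_j <- (1 - gamma) p_j,  j != i
        p[i] = pi_old + g * (1.0 - pi_old)       # p_i <- p_i + gamma (1 - p_i)
    else:                                        # beta = 0: discard the change
        d = 1.0 / (1.0 + np.exp(-tau1 * (E_new - E_old) / E_old)) - 0.47
        pi_old = p[i]
        p += d * pi_old / (r - 1)                # p_j <- p_j + delta p_i/(r-1), j != i
        p[i] = (1.0 - d) * pi_old                # p_i <- (1 - delta) p_i
    return params, p
```

A uniform initial distribution, p = np.full(2 * N, 1.0 / (2 * N)), is the natural starting point described above.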
4. APPLICATION IN THE COMPLEX SYSTEM CONTROL

The control and modeling of nonlinear multivariable systems play a more and more important part in the advancing automation of technical processes. Due to the ever increasing requirements of process control (e.g. response time, precision, transfer behavior), nonlinear controller designs are necessary (Zhang and Abraham, 1998). But traditional control designs need a precise mathematical model of the nonlinear system, which restricts the application of nonlinear controllers in industry. We adopt the modified neuro-fuzzy-network as the controller of a nonlinear multivariable system, the three-tank system; it does not need a precise mathematical model. The learning configuration for the neuro-fuzzy-network controller is shown in Figure 2.
4.1 Equation of the nonlinear system

The equations of the three-tank system can be written as

    dH/dt = A(H, t) + BQ
    Y = CH

where H = (h_1, h_2, h_3)^T is the vector of the liquid levels of the tanks (unit: cm), Q = (Q_1, Q_2)^T, and

    A(H) = (-Q_13 - Q_10, Q_32 - Q_20, Q_13 - Q_32)^T / S,
    B = (1/S) [1 0; 0 1; 0 0],    C = [1 0 0; 0 1 0],

    Q_10 = a_0 S_n (2g h_1)^{1/2},
    Q_13 = a_1 S_n sgn(h_1 - h_3)(2g|h_1 - h_3|)^{1/2},
    Q_32 = a_3 S_n sgn(h_3 - h_2)(2g|h_3 - h_2|)^{1/2},
    Q_20 = a_2 S_n (2g h_2)^{1/2}.

Q_1, Q_2 are the supplying flow rates [cm³/s], S = 150 is the section of a cylinder (cm²), S_n = 0.5 is the section of a leak opening (cm²), g = 9.8 m/s² is the earth acceleration, sgn(x) is the sign of the argument x, and a_i is the out-flow coefficient (a correcting factor, dimensionless, with real values ranging from 0 to 1).
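To make the plant model concrete, the sketch below integrates dH/dt = A(H) + BQ with a simple Euler scheme. The parameter values follow Section 4.1; the function names, the Euler integration, and the conversion of g to cm/s² (so that all lengths are in centimetres) are our assumptions.

```python
import numpy as np

S, Sn = 150.0, 0.5                        # cylinder and leak-opening sections (cm^2)
g = 9.8 * 100.0                           # earth acceleration, converted to cm/s^2
a = np.array([0.3, 0.3, 0.3, 0.3])        # out-flow coefficients a0..a3

def tank_derivative(H, Q):
    """Right-hand side of dH/dt = A(H) + B Q for levels H = (h1, h2, h3)."""
    h1, h2, h3 = H
    Q10 = a[0] * Sn * np.sqrt(2 * g * h1)
    Q13 = a[1] * Sn * np.sign(h1 - h3) * np.sqrt(2 * g * abs(h1 - h3))
    Q32 = a[3] * Sn * np.sign(h3 - h2) * np.sqrt(2 * g * abs(h3 - h2))
    Q20 = a[2] * Sn * np.sqrt(2 * g * h2)
    A = np.array([-Q13 - Q10, Q32 - Q20, Q13 - Q32]) / S
    B = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]) / S
    return A + B @ np.asarray(Q, dtype=float)

def simulate(H0, Q, T=0.5, steps=400):
    """Euler integration with the sample time T = 0.5 s used in Section 4.2."""
    H, traj = np.asarray(H0, dtype=float), []
    for _ in range(steps):
        H = np.maximum(H + T * tank_derivative(H, Q), 0.0)  # keep levels nonnegative
        traj.append(H.copy())
    return np.array(traj)
```

The output equation Y = CH simply reads out the levels of tanks 1 and 2, e.g. traj[:, :2].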
4.2 The Experiment Result

In this section we show the tracking performance and the robustness of the neuro-fuzzy-network control system and compare the control effect with a traditional PID controller. The initial states and the outflow coefficients of the tanks are h_1 = h_2 = h_3 = 0 and a_0 = a_1 = a_2 = a_3 = 0.3, respectively, and T = 0.5 s is the sample time.
The reference signals are given in (11) and (12):

    r_1(k+1) = r_1(k) + 2.5    (k < 201)
    r_1(k) = r_1(201) + 3 sin(2π(k-1)/100)    (200 < k < 401)    (11)

    r_2(k+1) = r_2(k) + 2    (k < 201)
    r_2(k) = r_2(201) + 2 sin(2π(k-1)/100)    (200 < k < 401)    (12)

The tracking performances delivered by the neuro-fuzzy-network controller and by the PID controller are shown in Figure 3.

Fig. 3 Tracking performance and square of error of the neuro-fuzzy-network controller and of the PID controller
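For completeness, the reference trajectories (11)-(12) can be generated as below; the starting values r1(1) = r2(1) = 0 are our assumption, since the paper does not state them.

```python
import numpy as np

def reference_signals(K=400):
    """Reference trajectories r1(k), r2(k) of eqs. (11) and (12), k = 1..K."""
    r1, r2 = np.zeros(K + 1), np.zeros(K + 1)   # index 0 unused; r1[1] = r2[1] = 0 assumed
    for k in range(1, 201):                     # ramp phase, k < 201
        r1[k + 1] = r1[k] + 2.5
        r2[k + 1] = r2[k] + 2.0
    for k in range(202, K + 1):                 # sinusoidal phase, 200 < k < 401
        r1[k] = r1[201] + 3.0 * np.sin(2 * np.pi * (k - 1) / 100)
        r2[k] = r2[201] + 2.0 * np.sin(2 * np.pi * (k - 1) / 100)
    return r1[1:], r2[1:]
```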
In the disturbance experiment, the reference signals are kept constant while the liquid levels of tank 1 and tank 2 are changed manually (liquid in tank 1 leaks from the leak opening and some water is added to tank 2), which causes disturbances to the system. The experiment result is shown in Figure 4.

Fig. 4 Disturbance experiment result (robustness of the neuro-fuzzy-network controller and of the PID system)
5. CONCLUSION
In this paper, the feasibility of training a neuro-fuzzy-network with a learning automaton approach is demonstrated. The learning automaton approach enables the neuro-fuzzy-network to gain experience in the operating environment and to be trained based on that experience. The principal advantage of this learning algorithm is that it requires no complex computations (such as gradient evaluations), so the algorithm affords a simple implementation. The efficiency of the training approach was demonstrated in the design of controllers for a nonlinear multivariable plant. The simple design and good control performance show its ability in complex dynamical plant control.
REFERENCES

Horikawa, S., et al. (1992). On fuzzy modeling using fuzzy neural networks with the backpropagation algorithm. IEEE Trans. on Neural Networks, vol. 3, pp. 801-806.
Ishibuchi, H., et al. (1993). Learning of fuzzy neural networks from fuzzy inputs and fuzzy targets. Proc. 5th IFSA World Congr., vol. 1, pp. 147-150.
Lakshmivarahan, S. (1981). Learning Algorithms: Theory and Applications, Springer-Verlag, New York.
Narendra, K. S., M. A. L. Thathachar (1989). Learning Automata: An Introduction, Englewood Cliffs, NJ: Prentice-Hall.
Rescorla, R. A., A. R. Wagner (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical Conditioning II: Current Research and Theory, A. H. Black and W. R. Prokasy, Eds. New York: Appleton-Century-Crofts.
Rumelhart, D. E., et al. (1986). Learning internal representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, pp. 45-76.
Sundareshan, M. K., T. A. Condarcure (1998). Recurrent neural network training by a learning automaton approach for trajectory learning and control system design. IEEE Trans. on Neural Networks, vol. 9, pp. 354-368.
Tsetlin, M. (1962). On the behavior of finite automata in random media. Automation and Remote Control, vol. 22, pp. 1210-1219.
Varshavskii, V. I., I. P. Vorontsova (1963). On the behavior of stochastic automata with variable structure. Automation and Remote Control, vol. 24, pp. 327-333.
Wang, L. X. (1995). Adaptive Fuzzy Logic System and Control: Designs and Stability Analysis, pp. 242-246, State Defence Industry Press, Beijing.
Wang, Z. L., S. S. Gu (2000). FNN identifier based on real-valued genetic algorithms. Journal of Northeastern University (Natural Science), vol. 21, pp. 354-356.
Zhang, J., A. J. Morris (1999). Recurrent neuro-fuzzy networks for nonlinear process modeling. IEEE Trans. on Neural Networks, vol. 10, pp. 313-326.
Zhang, Y. Q., K. Abraham (1998). Compensatory neurofuzzy systems with fast learning algorithms. IEEE Trans. on Neural Networks, vol. 9, pp. 83-105.