Reinforcement Adaptive Fuzzy Control for a Class of Nonlinear Uncertain Systems


Copyright © 2001 IFAC
IFAC Conference on New Technologies for Computer Control, 19-22 November 2001, Hong Kong

*Young H. Kim, **Frank L. Lewis, and ***Chiman Kwan

* Weapons Systems Studies Center, Korea Institute for Defense Analyses, Cheong Ryang P.O. Box 250, Seoul 130-650, Republic of Korea

** ARRI, The University of Texas, 7300 Jack Newell Blvd. S., Fort Worth, TX 76118, USA

*** IAI, Inc., 7519 Standish Place, Suite 200, Rockville, MD 20855, USA

Abstract: This paper describes the application of reinforcement learning techniques to feedback control for a class of nonlinear systems using an adaptive fuzzy system. The adaptive fuzzy system approximates the nonlinear characteristics of the plant on-line to achieve the desired tracking accuracy without any preliminary off-line learning or training phase. The reinforcement signal from a critic is used to find proper fuzzy rules for approximating the nonlinear dynamics of the plant. The approximation is performed on-line without any prior knowledge of the nonlinear terms in the plant, whence the term Reinforcement Adaptive Learning (RAL). The fuzzy system based on the RAL algorithm is referred to as the Reinforcement Adaptive Fuzzy (RAF) system. The RAL algorithm is derived from Lyapunov stability analysis, so that both tracking stability and error convergence can be guaranteed in the closed-loop system.

Key Words: Closed-loop control, Feedback control, Intelligent control, Nonlinear dynamic system, Fuzzy system

I. Introduction

Conventional linear controllers, such as proportional-derivative and proportional-integral controllers, are widely used in industrial motion control systems because of their simple structure, ease of design, low cost, and robustness to plant nonlinearities. Nevertheless, conventional control algorithms may have difficulty with systems that have highly nonlinear characteristics. To date, many sophisticated algorithms have been used to assist conventional linear controllers in highly nonlinear and uncertain environments [9]. One such approach is to employ artificial neural networks or fuzzy systems to deal with the nonlinear terms and uncertainties in the plant [2, 3, 6, 8]. Fuzzy control has recently found extensive applications in a wide variety of industrial systems and has attracted the attention of many control researchers because of its model-free approach [6]. An important adaptive fuzzy control system has been developed to incorporate human expert information systematically [10]. An adaptive fuzzy system is a fuzzy logic system equipped with a learning algorithm, in which the fuzzy logic system is constructed from a collection of fuzzy IF-THEN rules, and the learning algorithm adjusts the parameters of the fuzzy logic system according to the given numerical information.

A new adaptive fuzzy control algorithm is proposed in this paper so that the stability of the adaptive fuzzy control system is guaranteed, and the new reinforcement-learning design methodology is shown to be model-free and intelligent in feedback control. The proposed reinforcement control scheme consists of a performance evaluator, a critic, and an adaptive fuzzy system. The fuzzy system is used to approximate the nonlinear and uncertain characteristics of the plant according to the reinforcement signal from the critic. The critic evaluates the effect on plant performance of the state-dependent nonlinear terms or unpredictable uncertainties. The reinforcement signal is used to adjust the parameters of the fuzzy logic system, for example by finding proper fuzzy rules or tuning membership functions. The learning is performed on-line without any preliminary analysis of the nonlinear characteristics appearing in the plant, whence the term Reinforcement Adaptive Learning (RAL). The RAL algorithm is derived from Lyapunov stability analysis, so that both system stability and error convergence can be guaranteed in the closed-loop system. Stability is the main objective of all control systems.

II. Preliminaries

A. Stability: Nonlinear Dynamic Systems

Given $x(t) \in \mathbb{R}^n$ and a nonlinear function $h(x,t): \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}^n$, the differential equation

$$\dot{x}(t) = h(x,t), \qquad t_0 \le t, \tag{2.1}$$

has a differentiable solution $x(t)$ if $h(x,t)$ is continuous in $x(t)$ and $t$.

The solution $x(t)$ is said to be Uniformly Ultimately Bounded (UUB) if there exists a compact set $D$ such that, for all $x(t_0) = x_0 \in D$, there exist a bound $\delta > 0$ and a number $T(\delta, x_0)$ such that $\|x(t)\| \le \delta$ for all $t \ge t_0 + T$.

B. A Class of Nonlinear Systems

We assume that the plant can be represented as an nth-order nonlinear system of the form

$$\begin{aligned}
\dot{x}_1(t) &= x_2(t) \\
\dot{x}_2(t) &= x_3(t) \\
&\;\;\vdots \\
\dot{x}_n(t) &= g(x) + d(t) + u(t) \\
y(t) &= x_1(t)
\end{aligned} \tag{2.2}$$

where $g(x)$ is an unknown linear or nonlinear function, $u(t)$ is the control input, $y(t)$ is the output of the plant, and $x(t) = [x_1 \; x_2 \; \cdots \; x_n]^T$ is the state vector of the plant, which is assumed to be available for measurement. The signal $d(t)$ is the unknown external disturbance.

The control objective can be described as follows: given a desired trajectory $x_d(t) = [x_d \; \dot{x}_d \; \ddot{x}_d \; \cdots]^T$, find a control action $u(t)$ such that the plant follows the desired trajectory with acceptable accuracy.

Define the tracking error vector as

$$e(t) = x_d(t) - x(t). \tag{2.3}$$

The problem is thus to design a controller $u(t)$ that keeps the tracking error as small as possible. In the rest of the paper, we show how to derive $u(t)$ using a fuzzy intelligent mechanism to achieve this control objective.

[Fig. 1. Building Blocks of Adaptive Fuzzy System (block label: Adaptation/Learning Signal)]

C. Fuzzy Systems as Function Approximator

The basic configuration of an adaptive fuzzy system is shown in Fig. 1. An adaptive fuzzy system can be described as a typical fuzzy logic system equipped with training or learning elements. The operation of each component can be found in [10]. Without loss of generality, we consider a multi-input-single-output fuzzy system. A fuzzy system with the singleton fuzzifier, Gaussian membership functions, the product inference rule, and the center-average defuzzifier is used in this paper. The output of such a fuzzy system with $M$ inference rules has the following form [10]:

$$g(x;P) = \frac{\displaystyle\sum_{j=1}^{M} w_j \left[\prod_{i=1}^{n} \exp\!\left(-\left(\frac{x_i - c_{ji}}{s_{ji}}\right)^{2}\right)\right]}{\displaystyle\sum_{j=1}^{M} \left[\prod_{i=1}^{n} \exp\!\left(-\left(\frac{x_i - c_{ji}}{s_{ji}}\right)^{2}\right)\right]} = \frac{\displaystyle\sum_{j=1}^{M} w_j\, \varphi_j(\sigma(x;P_j))}{\displaystyle\sum_{j=1}^{M} \varphi_j(\sigma(x;P_j))} \tag{2.4}$$

where $\varphi_j(\sigma(x;P_j))$ is the product of the Gaussian membership values of rule $j$ and $w_j$ is the corresponding crisp output with unit membership function value.

The values $x$ and $g(\cdot)$ are the input and output of the fuzzy system, respectively. The vector $P_j$ is the set of all adjustable parameters of the Gaussian membership functions in the fuzzy system. The functions $\varphi_j(\sigma(x;P_j))$ are called fuzzy basis functions [10]. Since the fuzzy system (2.4) uses only the fuzzy IF-THEN rules for the purpose of approximating the nonlinear functions, we define the fuzzy system in the un-normalized form

$$g(x;P) = \sum_{j=1}^{M} w_j \left[\prod_{i=1}^{n} \exp\!\left(-\left(\frac{x_i - c_{ji}}{s_{ji}}\right)^{2}\right)\right] = \sum_{j=1}^{M} w_j\, \varphi_j(\sigma(x;P_j)). \tag{2.5}$$

The expression (2.5) can be written in the compact matrix form

$$g(x,P) = W^T \varphi(x,P) \tag{2.6}$$

with $W \in \mathbb{R}^{M \times N}$ and $\varphi(x,P) \in \mathbb{R}^{M \times N}$. The number $N$ is chosen to be 1 in (2.6), since we consider a multi-input-single-output fuzzy system. We assume that the number of fuzzy rules $M$ and the membership function parameters $P$ are given in this paper; the problem therefore becomes finding the parameters $W$ such that the system performance meets the specified requirement.

III. RAF Controller Design

A. Structure

Fig. 2 shows the detailed structure of the proposed RAF controller, illustrating the overall adaptive-learning scheme. The RAF controller combines an adaptive fuzzy system, which uses a reinforcement signal to update its parameters, with a performance evaluator, which uses an error based on the given desired trajectory. Ideally, the performance evaluator measures the system performance for the current system states and at the same time provides information to a critic, which supplies the reinforcement signal for the adaptive fuzzy system. The fuzzy system then generates the counter-signal necessary to overcome the nonlinear phenomena and the disturbances with which the performance evaluator cannot deal. The performance evaluator plus the gain term constitutes a conventional linear controller. In this architecture, the critic is defined as a pseudo-sign function, which outputs a negative or positive quantity depending on the signal from the performance evaluator. The reinforcement signal provided by the critic represents the degradation of system performance due to the nonlinear terms appearing in the plant and/or additional unknown bounded disturbances. Based on the approximation property of the fuzzy system [10], a new reinforcement learning algorithm is developed on the basis of Lyapunov theory, and the tracking stability is guaranteed in closed-loop control. The proposed scheme avoids the difficulty of obtaining the correct value for the fuzzy system output.

[Fig. 2. Structure of the Proposed RAF Controller]

B. Stability Analysis: Lyapunov Approach

First, the performance evaluator is defined in terms of the output tracking error as

$$r(t) = [\Lambda^T \;\; I]\, e(t) \tag{3.1}$$

where $\Lambda = [\lambda_1 \; \lambda_2 \; \cdots \; \lambda_{n-1}]^T$ is an appropriately chosen coefficient vector such that $s^{n-1} + \lambda_{n-1}s^{n-2} + \cdots + \lambda_1$ is Hurwitz (i.e., $e(t) \to 0$ exponentially as $r(t) \to 0$). Here $I$ denotes the identity vector. The performance evaluator can be viewed as a real-valued utility function of the plant performance: when $r(t)$ is small enough, the plant performance is good.

The critic for the reinforcement signal is defined in terms of the performance measure as

$$R(t) = R^{+}(t) + R^{-}(t) = \sigma^{+}(r) + \sigma^{-}(r) \tag{3.2}$$

with $\sigma^{+}(r) = \alpha^{+}/(1 + e^{-\alpha^{+} r})$ and $\sigma^{-}(r) = -\alpha^{-}/(1 + e^{\alpha^{-} r})$. Its behavior depends strongly on the critic slope gains $\alpha^{+}$ and $\alpha^{-}$.

The time derivative of the performance signal can be written from (2.2) and (3.1) as

$$\dot{r}(t) = f(x,t) - u(t) \tag{3.3}$$

where the nonlinear function $f(x,t)$ is a fairly complex function of $x(t)$ and $x_d(t)$. It is assumed that there are no external disturbances in the plant. From the architecture of the proposed RAF controller in Fig. 2, the control input to the plant $u(t)$ is given by

$$u(t) = K_v r(t) + \hat{f}(x,t). \tag{3.4}$$

The function $\hat{f}(x,t)$ is provided by the adaptive fuzzy system, and the fixed gain matrix satisfies $K_v = K_v^T > 0$.
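The building blocks defined in (2.5), (3.1), (3.2), and (3.4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes a single-output plant (so that $r(t)$ is a scalar and $[\Lambda^T \; I]$ reduces to $[\Lambda^T \; 1]$), and the function names, argument names, and array layout (`centers` and `widths` of shape `(M, n)`) are hypothetical choices made for this example.

```python
import numpy as np

def fuzzy_basis(x, centers, widths):
    """Un-normalized fuzzy basis functions of (2.5).

    x: state, shape (n,); centers, widths: Gaussian parameters, shape (M, n).
    Returns phi of shape (M,), one product of Gaussians per rule.
    """
    z = (x - centers) / widths
    # A product of exponentials equals the exponential of the summed exponents.
    return np.exp(-np.sum(z ** 2, axis=1))

def fuzzy_output(x, W, centers, widths, normalize=False):
    """Fuzzy system output: (2.5) by default, (2.4) if normalize=True."""
    phi = fuzzy_basis(x, centers, widths)
    y = W @ phi
    return y / phi.sum() if normalize else y

def performance_evaluator(e, lam):
    """Scalar version of (3.1): r = [Lambda^T 1] e (assumed single output)."""
    return float(np.dot(np.append(lam, 1.0), e))

def critic(r, alpha_plus=2.0, alpha_minus=2.0):
    """Pseudo-sign critic of (3.2): sigma_plus(r) + sigma_minus(r)."""
    sigma_plus = alpha_plus / (1.0 + np.exp(-alpha_plus * r))
    sigma_minus = -alpha_minus / (1.0 + np.exp(alpha_minus * r))
    return sigma_plus + sigma_minus

def control_input(r, f_hat, Kv):
    """Control law (3.4): linear feedback plus the fuzzy estimate f_hat."""
    return Kv * r + f_hat
```

With equal slope gains the critic is an odd function of $r$ that saturates at $\pm\alpha$, which is why it behaves like a smooth sign function of the performance measure.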

From (3.4), the time derivative of the performance measure signal (3.3) can be rewritten as

$$\dot{r}(t) = -K_v r(t) + \tilde{f}(x,t) \tag{3.5}$$

where the functional estimation error is defined as

$$\tilde{f}(x,t) = f(x,t) - \hat{f}(x,t).$$

According to the approximation property of fuzzy systems, the continuous nonlinear function $f(x,t)$ in (3.3) can be represented by a fuzzy system with some constant ideal values $W$ and a sufficient number of rules, i.e.,

$$f(x,t) = W^T \varphi(\sigma(x)) + \varepsilon(x) \tag{3.6}$$

with

$$\|\varepsilon(x)\| \le \varepsilon_M. \tag{3.7}$$

The functional reconstruction error $\varepsilon(x)$ is bounded by a known constant $\varepsilon_M$. We assume that the ideal values $W$ are bounded by a known positive value, so that

$$\|W\|_F \le W_M \tag{3.8}$$

with the value $W_M$ assumed to be known.

Let the fuzzy system functional estimate for the continuous nonlinear function $f(x,t)$ be given by

$$\hat{f}(x,t) = \hat{W}(t)^T \varphi(\sigma(x)). \tag{3.9}$$

The current value $\hat{W}(t)$ is provided by an adaptation algorithm.

Define the fuzzy system parameter errors as

$$\tilde{W}(t) = W - \hat{W}(t). \tag{3.10}$$

From (3.5)-(3.7) and (3.9), we have the following performance signal dynamics, which are useful for the stability proof:

$$\dot{r}(t) = -K_v r(t) + \tilde{W}(t)^T \varphi(\sigma(x)) + \varepsilon(x). \tag{3.11}$$

Theorem. Let the adaptive-learning rule for the fuzzy system parameters be

$$\dot{\hat{W}}(t) = F \varphi(\sigma(x)) R(t)^T - \kappa F \hat{W}(t) \tag{3.12}$$

with $F = F^T > 0$ design matrices determining the learning rate and $\kappa > 0$ design parameters governing the speed of convergence. Then the errors $r(t)$ and $\tilde{W}(t)$ are Uniformly Ultimately Bounded. Moreover, the performance measure $r(t)$ can be made arbitrarily small by increasing the fixed control gains $K_v$.

Proof. Consider the following positive definite Lyapunov function:

$$L(t) = \sum_{i=1}^{n} \left\{ \ln\!\left(1 + e^{\alpha^{+} r_i(t)}\right) + \ln\!\left(1 + e^{-\alpha^{-} r_i(t)}\right) \right\} + \tfrac{1}{2}\,\mathrm{tr}\!\left(\tilde{W}^T F^{-1} \tilde{W}\right). \tag{3.13}$$

It is easy to show that this Lyapunov function candidate is positive definite, continuous, and differentiable with respect to $r(t)$ and $t$. The time derivative of the Lyapunov function is given by

$$\dot{L}(t) = \sum_{i=1}^{n} \left\{ \alpha^{+}\left(1 + e^{-\alpha^{+} r_i(t)}\right)^{-1} - \alpha^{-}\left(1 + e^{\alpha^{-} r_i(t)}\right)^{-1} \right\} \dot{r}_i(t) + \mathrm{tr}\!\left(\tilde{W}^T F^{-1} \dot{\tilde{W}}\right). \tag{3.14}$$

Using the definition (3.2) and applying the performance signal dynamics (3.11), we have

$$\dot{L}(t) = -R^T K_v r + R^T \tilde{W}^T \varphi(\sigma(x)) + \mathrm{tr}\!\left(\tilde{W}^T F^{-1} \dot{\tilde{W}}\right) + R^T \varepsilon(x). \tag{3.15}$$

Taking the transpose and applying the trace operator to the second term, we have

$$\dot{L}(t) = -R^T K_v r + \mathrm{tr}\!\left(\tilde{W}^T \varphi(\sigma(x)) R^T\right) + \mathrm{tr}\!\left(\tilde{W}^T F^{-1} \dot{\tilde{W}}\right) + R^T \varepsilon(x). \tag{3.16}$$

The following property is used:

$$\mathrm{tr}(D^T a b^T) = a^T D b \tag{3.17}$$

with $a \in \mathbb{R}^n$, $b \in \mathbb{R}^l$, and $D \in \mathbb{R}^{n \times l}$. Inserting the adaptive-learning rule (3.12) into (3.16), and noting that $\dot{\tilde{W}} = -\dot{\hat{W}}$ because $W$ is constant, we obtain

$$\dot{L}(t) = -R^T K_v r + R^T \varepsilon(x) + \kappa\, \mathrm{tr}\!\left(\tilde{W}^T \hat{W}\right). \tag{3.18}$$

The expression (3.18) is bounded by

$$\dot{L}(r,\tilde{W}) \le -\alpha_m \lambda_m(K_v)\|r\| + \alpha_M \varepsilon_M + \kappa\, \mathrm{tr}\!\left\{\tilde{W}^T (W - \tilde{W})\right\} \tag{3.19}$$

with $\alpha_M = \max(\alpha^{+}, \alpha^{-})$ and $\alpha_m = \min(\alpha^{+}, \alpha^{-})$. Note that the relation $R(t)^T r(t) > 0$ for any $r(t) \ne 0$ has been used.

Using the inequality

$$\mathrm{tr}(\tilde{Z}^T \hat{Z}) = \mathrm{tr}\!\left\{\tilde{Z}^T (Z - \tilde{Z})\right\} \le \|\tilde{Z}\|_F \left(Z_M - \|\tilde{Z}\|_F\right) \tag{3.20}$$

yields

$$\dot{L}(r,\tilde{W}) \le -\alpha_m \lambda_m(K_v)\|r\| + \alpha_M \varepsilon_M + \kappa \left\{ \|\tilde{W}\|_F W_M - \|\tilde{W}\|_F^{2} \right\}. \tag{3.21}$$

The time derivative $\dot{L}(r,\tilde{W})$ is guaranteed to be negative if the condition

$$\|r(t)\| \ge \frac{\kappa W_M^{2}/4 + \alpha_M \varepsilon_M}{\alpha_m \lambda_m(K_v)} \equiv B_{r(t)} \tag{3.22}$$

is satisfied. Alternatively, rewriting (3.21) as

$$\dot{L}(r,\tilde{W}) \le -\alpha_m \lambda_m(K_v)\|r\| - (1-\theta)\kappa\|\tilde{W}\|_F^{2} - \theta\kappa\|\tilde{W}\|_F^{2} + \kappa W_M \|\tilde{W}\|_F + \alpha_M \varepsilon_M, \tag{3.23}$$

$\dot{L}(r,\tilde{W})$ is guaranteed to be negative as long as the condition

$$\|\tilde{W}(t)\|_F \ge \frac{W_M}{2\theta} + \sqrt{\frac{W_M^{2}}{4\theta^{2}} + \frac{\alpha_M \varepsilon_M}{\kappa\theta}} \equiv B_{\tilde{W}(t)} \tag{3.24}$$

holds with $0 < \theta < 1$. The regions $B_{r(t)}$ and $B_{\tilde{W}(t)}$ are the convergence regions for the performance measure (error) and for the parameter estimation errors of the adaptive fuzzy system, respectively. Therefore, $\dot{L}(r,\tilde{W})$ is negative outside a compact set. According to a standard Lyapunov theory extension [4], this demonstrates the UUB of both $\|r(t)\|$ and $\|\tilde{W}(t)\|_F$. ∎

It is interesting to note that $\|r(t)\|$ can be kept arbitrarily small by increasing the gain $K_v$ or the critic slope gain $\alpha_m$. The right-hand sides of (3.22) and (3.24) can be taken as practical bounds on $\|r(t)\|$ and $\|\tilde{W}(t)\|_F$, respectively. Moreover, the regions given by (3.22) and (3.24) represent the worst case one can have.

Remarks: The following remarks highlight some aspects of the proposed RAF controller.

(1) Persistent excitation of learning systems has been a challenging issue, since learning algorithms for nonlinear plants tend to stagnate in a preferred region. Owing to the second term in the RAF adaptive-learning rule (3.12), the persistent excitation condition is not required in the proposed learning scheme. In fact, our reinforcement learning rule is in the form of the $\sigma$-modification, for which the persistency-of-excitation condition is not needed in adaptive control [7].

(2) From the stability proof, the performance evaluator $\|r(t)\|$, which is defined in terms of the output tracking error in this paper, can be kept arbitrarily small by increasing the gain matrix $K_v$. This illustrates the Uniformly Ultimately Bounded stability definition in a practical sense. In practical situations, the output tracking errors are not exactly equal to zero. The best we can do is to guarantee that the error converges to a neighborhood of the origin.

(3) Finally, it is emphasized that the adjustable weight parameters $\hat{W}$ in the fuzzy system may be initialized at zero, and the fixed controller $K_v r(t)$ maintains stability until the fuzzy system learns the plant. This means that there is no off-line learning or trial-and-error phase, which would require an enormous amount of learning time. The detailed plant dynamics are completely unknown to the fuzzy system. Hence the term Adaptive-Learning: the effect of the plant dynamics does not need to be considered, and the controlled plant follows the desired trajectory in one process.

IV. Simulation Results

Simulations of the Duffing forced oscillation system were performed to demonstrate the effectiveness of the proposed control scheme. The MATLAB command "ode23," which uses a second/third-order Runge-Kutta method, was used to solve the differential equations in the simulations.

The Duffing forced oscillation system is given by [10]

$$\begin{aligned}
\dot{x}_1 &= x_2 \\
\dot{x}_2 &= -0.1x_2 - x_1^{3} + 12\cos t + u(t) + d(t) \\
y &= x_1.
\end{aligned} \tag{4.1}$$

The system shows chaotic behavior if $u(t) = 0$ and $d(t) = 0$. The reference signal selected here is $y_d(t) = \sin t$, and the external disturbance $d(t)$ is a square wave with amplitude $\pm 1$ and period $2\pi$. The initial state conditions are chosen to be $x_1(0) = 2$ and $x_2(0) = -2$ throughout the simulations. The linear gain terms are $K_v = 10$ and $\Lambda = 5$ in the simulations.

A. f(x,t): known and d(t): unknown square wave

Assuming that the nonlinear function in the plant is exactly known and that an unknown disturbance is present, we select the controller

$$u(t) = K_v r(t) + f(x,t) \tag{4.2}$$

with $f(x,t) = -x_1^{3} + 12\cos(t)$ and $d(t)$ unknown.

The tracking performance is shown in Fig. 3. It shows fairly good tracking performance with the known nonlinear function and the unknown disturbance. However, it is very difficult to have exact knowledge of the nonlinear function in a real plant. The effect of the disturbance on the performance is shown. As expected, inexact knowledge of the plant nonlinear characteristics produces a significant steady-state error. Even though a high gain $K_v$ can reduce the tracking error considerably, a steady-state error resulting from the nonlinear characteristics remains. In fact, the high gain is limited by the excitation of unmodeled high-frequency components in the plant or by measurement noise.

B. RAF Controller and d(t): unknown square wave

Assuming that the nonlinear function and the disturbance in the plant are completely unknown, we select the controller

$$u(t) = K_v r(t) + \hat{W}(t)^T \varphi(\sigma(x)) \tag{4.4}$$
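The weights used by the controller (4.4) evolve according to the adaptive-learning rule (3.12). As a hedged illustration, the sketch below applies one forward-Euler step of (3.12) for a scalar critic signal and evaluates the right-hand sides of the bounds (3.22) and (3.24). Forward Euler is a simplification chosen for this example (the paper integrates with MATLAB's ode23), the function names are hypothetical, and the values of $W_M$, $\varepsilon_M$, and $\theta$ in the usage lines are assumed for illustration only because the paper does not report them.

```python
import numpy as np

def learning_rule_step(W_hat, phi, R, F, kappa, dt):
    """One forward-Euler step of (3.12) for a scalar critic signal R.

    W_hat, phi: shape (M,); F: shape (M, M); kappa, dt: positive scalars.
    """
    W_dot = F @ (phi * R) - kappa * (F @ W_hat)
    return W_hat + dt * W_dot

def uub_bounds(kappa, W_M, eps_M, alpha_plus, alpha_minus, lam_min_Kv, theta=0.5):
    """Right-hand sides of (3.22) and (3.24); requires 0 < theta < 1."""
    a_M = max(alpha_plus, alpha_minus)
    a_m = min(alpha_plus, alpha_minus)
    B_r = (kappa * W_M ** 2 / 4.0 + a_M * eps_M) / (a_m * lam_min_Kv)
    B_W = W_M / (2.0 * theta) + np.sqrt(
        W_M ** 2 / (4.0 * theta ** 2) + a_M * eps_M / (kappa * theta)
    )
    return B_r, B_W

if __name__ == "__main__":
    # W_M = 10 and eps_M = 0.1 are assumed values, not taken from the paper.
    B_r, B_W = uub_bounds(kappa=0.01, W_M=10.0, eps_M=0.1,
                          alpha_plus=2.0, alpha_minus=2.0, lam_min_Kv=10.0)
    print(B_r, B_W)
```

For a constant basis vector and a constant critic signal, the $-\kappa F \hat{W}$ term makes the weights settle at $\varphi R / \kappa$ instead of growing without bound, which is the role the second term of (3.12) plays in Remark (1). The bound $B_r$ is inversely proportional to the gain, so doubling `lam_min_Kv` halves it.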

The design parameters are $F = \mathrm{diag}[30]$ and $\kappa = 0.01$. The initial parameter values are selected as $\hat{W}(0) = 0$. The critic is selected as

$$R(t) = R^{+}(t) + R^{-}(t) = \sigma^{+}(r) + \sigma^{-}(r) \tag{4.5}$$

with $\alpha^{+} = \alpha^{-} = 2$. The input and output universes of discourse are given as $x_1 \in [-2, 2]$, $x_2 \in [-2, 2]$, and $y \in [-2, 2]$.

The fuzzy membership functions are chosen along each state trajectory $x = [x_1 \; x_2]$ as follows:

$$\begin{aligned}
G(1) &= \frac{1}{1 + \exp(5\,(x_i + 1.25))}, & G(2) &= \exp\!\left[-(x_i + 0.75)^{2}\right], \\
G(3) &= \exp\!\left[-(x_i + 0.25)^{2}\right], & G(4) &= \exp\!\left[-x_i^{2}\right], \\
G(5) &= \exp\!\left[-(x_i - 0.25)^{2}\right], & G(6) &= \exp\!\left[-(x_i - 0.75)^{2}\right], \\
G(7) &= \frac{1}{1 + \exp(-5\,(x_i - 1.25))}.
\end{aligned}$$

The tracking performance is shown in Fig. 4. It shows excellent tracking performance even with the unknown nonlinear function provided by the RAF output.

[Fig. 3. Output Tracking: (a) x1(t), (b) x2(t) (dotted: desired, solid: actual)]

[Fig. 4. Output Tracking Performance: (a) x1(t), (b) x2(t) (dotted: desired, solid: actual)]

V. Conclusion

We have presented a new Reinforcement Adaptive Fuzzy scheme for the control of nonlinear plants. The proposed control scheme is fairly simple compared with other reinforcement learning methods that require gradient information, either directly or indirectly, on the critic or the plant. The reinforcement adaptive-learning rule for the fuzzy system parameters is derived on the basis of Lyapunov stability. Simulation results confirm that, after the fuzzy system has compensated perfectly or partially for the nonlinear characteristics of the controlled plant on-line, the plant output follows the desired trajectory very satisfactorily even in the presence of unknown nonlinear plant dynamics and disturbances.

References

[1] H. M. Kim and J. M. Mendel, "Fuzzy Basis Functions: Comparisons with Other Basis Functions," IEEE Trans. Fuzzy Systems, vol. 3, no. 2, pp. 158-168, 1995.
[2] C. C. Lee, "Fuzzy logic in control systems: fuzzy logic controller-Part I," IEEE Trans. Syst., Man, Cybern., vol. 20, no. 3, pp. 404-418, 1990.
[3] C. C. Lee, "Fuzzy logic in control systems: fuzzy logic controller-Part II," IEEE Trans. Syst., Man, Cybern., vol. 20, no. 3, pp. 419-435, 1990.
[4] F. L. Lewis, C. T. Abdallah, and D. M. Dawson, Control of Robot Manipulators. MacMillan, New York, 1993.
[5] F. L. Lewis, A. Yesildirek, and K. Liu, "Neural net robot controller with guaranteed tracking performance," IEEE Trans. Neural Networks, vol. 6, no. 3, pp. 703-715, 1995.
[6] J. M. Mendel, "Fuzzy logic systems for engineering: A tutorial," Proc. IEEE, vol. 83, no. 3, pp. 345-377, 1995.
[7] K. S. Narendra and A. M. Annaswamy, "A new adaptive law for robust adaptation without persistent excitation," IEEE Trans. Automat. Control, vol. 32, no. 2, pp. 134-145, 1987.
[8] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Networks, vol. 1, no. 1, pp. 4-27, 1990.
[9] J.-J. E. Slotine and W. Li, Applied Nonlinear Control. Prentice Hall, 1991.
[10] L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Englewood Cliffs, NJ: Prentice Hall, 1994.
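As an illustration of the simulation setup in Section IV.B, the closed loop formed by the Duffing plant (4.1), the controller (4.4), the learning rule (3.12), and the critic (4.5) can be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of the authors' code or results: it uses a hand-written fixed-step fourth-order Runge-Kutta integrator instead of MATLAB's ode23, it assumes one rule for every pair of membership functions (49 rules, a count the paper does not state), it takes the desired state as $x_d = [\sin t \;\; \cos t]^T$, and it models the square-wave disturbance as the sign of $\sin t$. All function names are hypothetical.

```python
import numpy as np

KV, LAM = 10.0, 5.0            # linear gains K_v and Lambda from Section IV
F_GAIN, KAPPA = 30.0, 0.01     # F = diag[30] and kappa from Section IV.B
ALPHA = 2.0                    # critic slope gains alpha+ = alpha- = 2

def memberships(xi):
    """Membership values G(1)..G(7) for one state component."""
    return np.array([
        1.0 / (1.0 + np.exp(5.0 * (xi + 1.25))),
        np.exp(-(xi + 0.75) ** 2),
        np.exp(-(xi + 0.25) ** 2),
        np.exp(-xi ** 2),
        np.exp(-(xi - 0.25) ** 2),
        np.exp(-(xi - 0.75) ** 2),
        1.0 / (1.0 + np.exp(-5.0 * (xi - 1.25))),
    ])

def basis(x):
    """Assumed rule base: all 7 x 7 membership products (49 basis functions)."""
    return np.outer(memberships(x[0]), memberships(x[1])).ravel()

def critic(r):
    """Critic (4.5) with equal slope gains."""
    return ALPHA / (1.0 + np.exp(-ALPHA * r)) - ALPHA / (1.0 + np.exp(ALPHA * r))

def disturbance(t):
    """Assumed square wave: amplitude 1, period 2*pi."""
    return 1.0 if np.sin(t) >= 0.0 else -1.0

def dynamics(t, z, learn):
    """Right-hand side of the closed loop; z = [x1, x2, W_hat (49 entries)]."""
    x1, x2, W = z[0], z[1], z[2:]
    e1, e2 = np.sin(t) - x1, np.cos(t) - x2      # e = x_d - x
    r = LAM * e1 + e2                            # performance evaluator (3.1)
    phi = basis((x1, x2))
    u = KV * r + W @ phi                         # controller (4.4)
    dx2 = -0.1 * x2 - x1 ** 3 + 12.0 * np.cos(t) + u + disturbance(t)
    dW = F_GAIN * (phi * critic(r) - KAPPA * W) if learn else np.zeros_like(W)
    return np.concatenate(([x2, dx2], dW))

def simulate(learn=True, t_end=20.0, dt=0.002):
    """Fixed-step RK4 integration; returns times, error e1(t), final state."""
    z = np.concatenate(([2.0, -2.0], np.zeros(49)))   # x(0) and W_hat(0) = 0
    n = int(round(t_end / dt))
    t_hist = np.arange(n + 1) * dt
    e1_hist = np.empty(n + 1)
    for k in range(n + 1):
        t = t_hist[k]
        e1_hist[k] = np.sin(t) - z[0]
        if k == n:
            break
        k1 = dynamics(t, z, learn)
        k2 = dynamics(t + dt / 2, z + dt / 2 * k1, learn)
        k3 = dynamics(t + dt / 2, z + dt / 2 * k2, learn)
        k4 = dynamics(t + dt, z + dt * k3, learn)
        z = z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return t_hist, e1_hist, z

if __name__ == "__main__":
    for learn in (False, True):
        t, e1, _ = simulate(learn=learn)
        print("learning" if learn else "linear only", np.max(np.abs(e1[t >= 15.0])))
```

Calling `simulate(learn=False)` keeps the weights at zero, so only the linear term $K_v r(t)$ acts; comparing its late-time tracking error with that of `simulate(learn=True)` gives a rough analogue of the comparison between a fixed controller and the RAF controller discussed in Section IV.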