Copyright © 2002 IFAC 15th Triennial World Congress, Barcelona, Spain
ROBUST ADAPTIVE CONTROL OF UNCERTAIN NONLINEAR SYSTEMS

Charles H. Dillon and Jason L. Speyer

Mechanical and Aerospace Engineering Department, University of California, Los Angeles, Los Angeles, CA 90095
Abstract: An adaptive control problem for a class of uncertain nonlinear systems is formulated as a disturbance attenuation problem. In particular, the class of problems having nominal linear dynamics perturbed by small polynomial functions of the state which multiply the state, control and measurements is considered. Using a perturbation approach, an approximate solution to the resulting differential game problem is found which provides an implementable form of the robust adaptive compensator. Copyright © 2002 IFAC

Keywords: Adaptive Control, Disturbance Rejection, Dynamic Programming, Differential Games, Nonlinear Control Systems, Perturbation Analysis
1. INTRODUCTION

In this paper, a disturbance attenuation problem is considered for a class of systems having nominal linear dynamics which are perturbed by small nonlinearities in the form of polynomial functions of the state. This is a particularly useful class of systems in that many nonlinear functions which occur in the dynamics of physical systems can be approximated in polynomial form.

In general, the solution to the disturbance attenuation problem for nonlinear systems is difficult to find, as a solution to two nonlinear partial differential equations must be found to exist over all time as a function of the state, and the sum of these two solutions must have a unique maximizing value at each point in time along the trajectory for a minimax solution to exist (Başar and Bernhard, 1991). An exact solution for the special case of linear systems having uncertain coefficients multiplying the control has been found (Chichka and Speyer, 1995; Yoneyama et al., 1997), which produces a compensator having a linear estimator structure but which requires the global maximization of a multimodal nonlinear function of the unknown parameters in the determination of the control.

The approach taken here is to use perturbation theory to develop an approximate form of the minimax adaptive compensator resulting from the disturbance attenuation problem for the class of systems having small polynomial functions of the state. The assumption which must be made is that there exist a small parameter ε, which multiplies the nonlinearities, and a disturbance attenuation bound θ^{-1} such that a solution to the disturbance attenuation problem exists. An approximate form of the robust adaptive compensator is then determined as an expansion in the small parameter ε.

(The work presented in this paper was performed under AFOSR Grant No. F49620-00-1-0154.)
By considering systems with polynomial-type nonlinearities, it is shown that the terms of the expansion have the characteristic of being separable, in that each term can be described as a function of time which multiplies a function of the current state. This provides a structure which can be propagated forward in time based on current data only, with the worst case state calculated via a simple algebraic relationship. In contrast, the nonseparable structure inherent in general nonlinear systems requires evaluation over all past and future data and an iterative solution in determining the worst case state.

A Kronecker product notation is utilized to provide an efficient means of representing the polynomial functions of the state vector. The symbol ⊗ is used to denote the Kronecker product A ⊗ B of two matrices A ∈ R^{p×q} and B ∈ R^{m×n}, defined as in (Bellman, 1995) as the block matrix formed by multiplying each element of A by the entire matrix B, such that

    A \otimes B \triangleq \begin{bmatrix} a_{11}B & \cdots & a_{1q}B \\ \vdots & \ddots & \vdots \\ a_{p1}B & \cdots & a_{pq}B \end{bmatrix}

or, more simply, A ⊗ B ≜ [a_{ij} B]. The notation A^{[j]} denotes the j-th Kronecker power of A, defined by

    A^{[2]} = A \otimes A, \qquad A^{[j+1]} = A \otimes A^{[j]} = A^{[j]} \otimes A
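As a brief illustrative aside (a sketch of my own, not part of the paper), this notation is easy to exercise numerically. The NumPy fragment below forms A ⊗ B, the Kronecker power x^{[j]}, and the polynomial perturbation Σ_j A_j (I_n ⊗ x^{[j]}) that appears below as equation (4); all dimensions and matrix values are arbitrary placeholders.

import numpy as np
from functools import reduce

def kron_power(M, j):
    # j-th Kronecker power M^[j]; with this convention M^[1] = M.
    return reduce(np.kron, [M] * j)

n, k = 2, 2                                   # placeholder sizes: n states, k polynomial terms
rng = np.random.default_rng(0)

A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 2))
A_kron_B = np.kron(A, B)                      # A (x) B = [a_ij B], here a 6 x 4 block matrix

x = rng.standard_normal((n, 1))               # state as a column vector
A_j = [rng.standard_normal((n, n ** (j + 1))) for j in range(1, k + 1)]

# Polynomial perturbation A~(x) = sum_{j=1..k} A_j (I_n (x) x^[j]) -- the form used in (4) below.
A_tilde = sum(A_j[j - 1] @ np.kron(np.eye(n), kron_power(x, j)) for j in range(1, k + 1))
print(A_kron_B.shape, A_tilde.shape)          # (6, 4) and (n, n)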
2. PROBLEM FORMULATION

The system considered is one having nominal linear dynamics and measurements which are perturbed by polynomial nonlinearities in the dynamics and measurements, multiplied by a small parameter ε, such that

    \dot{x}(t) = A(x(t),t;\epsilon)\,x(t) + B(x(t),t;\epsilon)\,u(t) + \Gamma(x(t),t;\epsilon)\,w(t)        (1)

    z(t) = H(x(t),t;\epsilon)\,x(t) + v(t)        (2)

where x(t) ≜ Σ_{n=0}^∞ ε^n x_n(t), with u(t), w(t), v(t) and z(t) defined similarly. The nonlinearities are introduced as polynomial forms such that

    A(x(t),t;\epsilon) \triangleq A_0(t) + \epsilon \tilde{A}(x(t),t)        (3)

with B(x(t),t;ε), Γ(x(t),t;ε), and H(x(t),t;ε) defined in similar fashion, and

    \tilde{A}(x(t),t) \triangleq \sum_{j=1}^{k} A_j(t)\left( I_n \otimes x^{[j]}(t) \right)        (4)

Likewise, B̃(x(t),t), Γ̃(x(t),t), and H̃(x(t),t) are defined in the same way such that, for j = 0, 1, ..., k,

    A_j(t) \in R^{n \times n^{j+1}}, \quad B_j(t) \in R^{n \times m n^{j}}, \quad \Gamma_j(t) \in R^{n \times q n^{j}}, \quad H_j(t) \in R^{p \times n^{j+1}}

with I_n denoting an n × n identity matrix.

A disturbance attenuation problem is then formulated which can be written as a differential game with cost given by

    J = \frac{1}{2}\|x(t_f)\|^2_{Q_f} - \frac{1}{\theta}\frac{1}{2}\|x(0) - \hat{x}_0\|^2_{P_0^{-1}} + \int_0^{t_f} \frac{1}{2}\left[ \|x(\tau)\|^2_Q + \|u(\tau)\|^2_R - \frac{1}{\theta}\left( \|w(\tau)\|^2_{W^{-1}} + \|z(\tau) - H(x(\tau),\tau;\epsilon)x(\tau)\|^2_{V^{-1}} \right) \right] d\tau        (5)

so that the minimax problem may be written as

    \min_{u} \; \max_{w,\,z,\,x(0)} \; J        (6)

3. ADAPTIVE CONTROL SOLUTION

To determine the approximate minimax adaptive compensator, the game problem (6) is separated into a control subproblem, which produces an optimal return function X^c(x(t),t), and a filtering subproblem, which produces an optimal accumulation function X^f(x(t),t). The worst case state x^*(t) is then determined by maximizing the sum of the optimal return function and the optimal accumulation function over x(t).

In general, this problem does not lend itself to an easily implementable solution. However, if the nonlinearities are multiplicative in nature, i.e. a polynomial-type nonlinearity, then an approximate solution based on an expansion in the small parameter ε leads to a series of terms which are separable in x(t) and t, so that each term in the expansion may be written as a polynomial function of x(t) with time-varying coefficients which are propagated by first order linear differential equations. The key assumption which must be made is that, for a given disturbance attenuation bound θ^{-1}, a parameter ε > 0 exists such that a unique continuous solution exists for both X^c(x(t),t) and X^f(x(t),t), and the maximizing value of their sum, x^*(t), is unique for all t ∈ [0, t_f].

3.1 Control Subproblem

The control subproblem consists of the process of determining the feedback strategies for u, w, and z which result in a saddle structure for the approximate cost to go from a given current time t to the final time t_f, given the current value of the state x(t). The cost to go is represented by

    J^c = J[t, t_f]        (7)

subject to the dynamics and measurements (1)-(2).
An approximate form of the optimal return function is then sought in the form of an expansion X^c(x(t),t) = Σ_{n=0}^∞ ε^n X_n^c(x(t),t), where

    X^c(x(t),t) = \min_{u} \max_{w,z} J^c        (8)

Using dynamic programming, the cost to go is then found which satisfies the partial differential equation

    \min_{u} \max_{w,z} \left\{ \frac{\partial X^c}{\partial t} + \left( \frac{\partial X^c}{\partial x} \right)^T \left[ A(x(t),t;\epsilon)x(t) + B(x(t),t;\epsilon)u(t) + \Gamma(x(t),t;\epsilon)w(t) \right] + \frac{1}{2}\left[ \|x(t)\|^2_Q + \|u(t)\|^2_R - \frac{1}{\theta}\left( \|w(t)\|^2_{W^{-1}} + \|z(t) - H(x(t),t;\epsilon)x(t)\|^2_{V^{-1}} \right) \right] \right\} = 0        (9)

with terminal condition

    X^c(x(t_f), t_f) = \frac{1}{2}\|x(t_f)\|^2_{Q_f}

Performing the extremization in (9) yields the saddle strategies

    u^*(x(t),t) = -R^{-1} B^T(x(t),t;\epsilon)\, \frac{\partial X^c}{\partial x}        (10)

    w^*(x(t),t) = \theta W \Gamma^T(x(t),t;\epsilon)\, \frac{\partial X^c}{\partial x}        (11)

    z^*(x(t),t) = H(x(t),t;\epsilon)\, x(t)        (12)

Substituting these strategies in (9) and collecting terms of X^c(x(t),t) in like powers of ε, a series of partial differential equations for the components X_n^c(x(t),t) of (8) is obtained. Beginning with the ε^0 terms, X_0^c(x(t),t) is determined as

    X_0^c(x(t),t) = \frac{1}{2}\|x(t)\|^2_{\Pi(t)}        (13)

where

    -\dot{\Pi}(t) = A_0^T(t)\Pi(t) + \Pi(t)A_0(t) + Q - \Pi(t)\left[ B_0(t)R^{-1}B_0^T(t) - \theta \Gamma_0(t)W\Gamma_0^T(t) \right]\Pi(t)        (14)

with boundary condition Π(t_f) = Q_f. This is, of course, just the familiar solution of the control subproblem for the linear disturbance attenuation problem (Başar and Bernhard, 1991). Effects of the nonlinearities, then, begin appearing in the first order term, X_1^c(x(t),t).
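As a hedged numerical aside (mine, not the paper's), the backward Riccati equation (14) can be integrated with standard tools. The sketch below uses scipy.integrate.solve_ivp with arbitrary, time-invariant placeholder matrices, and then forms the zeroth-order feedback gain R^{-1}B_0^T Π implied by (10) and (13).

import numpy as np
from scipy.integrate import solve_ivp

# Arbitrary time-invariant placeholder data (n = 2 states, scalar u and w).
n = 2
A0 = np.array([[0.0, 1.0], [-1.0, -0.5]])
B0 = np.array([[0.0], [1.0]])
G0 = np.array([[0.0], [1.0]])                       # stands in for Gamma_0
Q, Qf = np.eye(n), np.eye(n)
R, W = np.array([[1.0]]), np.array([[1.0]])
theta, tf = 0.2, 5.0

def riccati_rhs(t, pi_flat):
    # Equation (14): -dPi/dt = A0'Pi + Pi A0 + Q - Pi (B0 R^-1 B0' - theta G0 W G0') Pi
    Pi = pi_flat.reshape(n, n)
    S = B0 @ np.linalg.solve(R, B0.T) - theta * (G0 @ W @ G0.T)
    return -(A0.T @ Pi + Pi @ A0 + Q - Pi @ S @ Pi).ravel()

# Integrate backward from Pi(tf) = Qf to t = 0.
sol = solve_ivp(riccati_rhs, (tf, 0.0), Qf.ravel())
Pi0 = sol.y[:, -1].reshape(n, n)                    # Pi at t = 0
gain0 = np.linalg.solve(R, B0.T) @ Pi0              # zeroth-order gain R^-1 B0' Pi, cf. (10), (13)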
Substituting the value of the derivative of X_0^c(x(t),t), which from (13) is given by

    \frac{\partial X_0^c}{\partial x} = \Pi(t)\, x(t)        (15)

and using properties of Kronecker products (Bellman, 1995; Brewer, 1978; Rottella and Dauphin-Tanguy, 1988), X_1^c(x(t),t) can be written as a path integral, given by

    X_1^c(x(t),t) = \int_t^{t_f} \sum_{j=1}^{k} \tilde{q}_{1j}^{cT}(\tau)\, x^{[j+2]}(\tau)\, d\tau        (16)

where

    \tilde{q}_{1j}^{c}(\tau) \triangleq \mathrm{vec}\left\{ \Pi(\tau)\left[ A_j(\tau) - B_j(\tau)\left( R^{-1}B_0^T(\tau)\Pi(\tau) \otimes I_n^{[j]} \right) + \theta \Gamma_j(\tau)\left( W\Gamma_0^T(\tau)\Pi(\tau) \otimes I_n^{[j]} \right) \right] \right\}        (17)

and x(τ) is subject to the nominal linear dynamics given by

    \dot{x}(t) = \left[ A_0(t) - \left( B_0(t)R^{-1}B_0^T(t) - \theta \Gamma_0(t)W\Gamma_0^T(t) \right)\Pi(t) \right] x(t)        (18)

Since the nominal dynamics are linear, x(τ) in (16) can be written in terms of x(t) by

    x(\tau) = \Phi_0^c(\tau,t)\, x(t)        (19)

where Φ_0^c(τ,t) is a state transition matrix which is propagated by

    \frac{\partial}{\partial t}\Phi_0^c(\tau,t) = -\Phi_0^c(\tau,t)\left[ A_0(t) - \left( B_0(t)R^{-1}B_0^T(t) - \theta \Gamma_0(t)W\Gamma_0^T(t) \right)\Pi(t) \right]        (20)

Then, substituting the value of x(τ) from (19) as a function of x(t) in the path integral (16), the first order optimal return function X_1^c(x(t),t) can be written in terms of a separable function of x(t) and t by

    X_1^c(x(t),t) = \sum_{j=1}^{k} m_{1j}^{cT}(t)\, x^{[j+2]}(t)        (21)

where each m_{1j}^c(t) is defined by

    m_{1j}^c(t) \triangleq \int_t^{t_f} \Phi_0^{c\,[j+2]\,T}(\tau,t)\, \tilde{q}_{1j}^{c}(\tau)\, d\tau        (22)

Each m_{1j}^c(t), then, is an n^{j+2} × 1 vector which is a function of time only and, when differentiated with respect to time t, is propagated backward in time from a terminal boundary condition of m_{1j}^c(t_f) = 0 by the linear matrix differential equation

    -\dot{m}_{1j}^c(t) = \sum_{i=0}^{j+1} \left[ I_n^{[i]} \otimes \bar{A}_0^{cT}(t) \otimes I_n^{[j+1-i]} \right] m_{1j}^c(t) + \tilde{q}_{1j}^{c}(t)        (23)

where \bar{A}_0^c(t) is defined as

    \bar{A}_0^c(t) \triangleq A_0(t) - \left[ B_0(t)R^{-1}B_0^T(t) - \theta \Gamma_0(t)W\Gamma_0^T(t) \right]\Pi(t)        (24)

Higher order terms, X_n^c(x(t),t), with n > 1, are determined similarly as a path integral containing derivatives of the lower order terms in the expansion as polynomial functions of x(τ), with x(τ) subject again to the linear dynamics given by (18).
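To make the structure of (23) concrete, the following sketch (my own, with frozen placeholder data) builds the Kronecker-sum operator Σ_i I_n^{[i]} ⊗ Ā_0^{cT} ⊗ I_n^{[j+1−i]} and performs a backward Euler sweep from m_{1j}^c(t_f) = 0. An actual implementation would use the time-varying Ā_0^c(t) from (24) and q̃_{1j}^c(t) from (17).

import numpy as np
from functools import reduce

def kron_power(M, j):
    # j-th Kronecker power M^[j], with M^[0] taken as the 1 x 1 identity.
    return reduce(np.kron, [M] * j) if j > 0 else np.eye(1)

def kron_sum_operator(Abar, j, n):
    # sum_{i=0}^{j+1} I_n^[i] (x) Abar^T (x) I_n^[j+1-i], acting on n^(j+2)-vectors, cf. (23)
    return sum(np.kron(np.kron(kron_power(np.eye(n), i), Abar.T),
                       kron_power(np.eye(n), j + 1 - i))
               for i in range(j + 2))

# Backward Euler sweep of (23) from m(tf) = 0, with frozen placeholder data.
n, j, dt, steps = 2, 1, 0.01, 500
Abar = np.array([[0.0, 1.0], [-1.0, -0.8]])     # stand-in for Abar_0^c(t) of (24)
q_tilde = 0.1 * np.ones(n ** (j + 2))           # stand-in for q~_{1j}^c(t) of (17)
L = kron_sum_operator(Abar, j, n)

m = np.zeros(n ** (j + 2))                      # m_{1j}^c(tf) = 0
for _ in range(steps):
    m = m + dt * (L @ m + q_tilde)              # -dm/dt = L m + q~, stepped backward in time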
3.2 Filtering Subproblem

The filtering subproblem consists of the process of determining the accumulated cost and the maximizing disturbance from the initial time, t = 0, to the current time t, given that the past controls U_t ≜ {u(s), 0 ≤ s < t} and measurement history Z_t ≜ {z(s), 0 ≤ s < t}, as well as the initial state x(0), are known. As in the control subproblem, the approximate accumulated cost is found as an expansion in the small parameter ε, with each term a function of time t and the state x(t). The accumulated cost J^f is given by

    J^f = J[0, t]        (25)

subject to the dynamics and measurements (1)-(2), and with the past controls, U_t, and measurements, Z_t, known functions of time. An approximate form of the optimal accumulated cost is sought in the form of an expansion X^f(x(t),t) = Σ_{n=0}^∞ ε^n X_n^f(x(t),t) of the optimal accumulation function, where

    X^f(x(t),t) = \max_{w} J^f        (26)

The accumulated cost is then determined via dynamic programming by the partial differential equation

    -\frac{\partial X^f}{\partial t} - \max_{w} \left\{ -\left( \frac{\partial X^f}{\partial x} \right)^T \left[ A(x(t),t;\epsilon)x(t) + B(x(t),t;\epsilon)u(t) + \Gamma(x(t),t;\epsilon)w(t) \right] + \frac{1}{2}\left[ \|x(t)\|^2_Q + \|u(t)\|^2_R - \frac{1}{\theta}\left( \|w(t)\|^2_{W^{-1}} + \|z(t) - H(x(t),t;\epsilon)x(t)\|^2_{V^{-1}} \right) \right] \right\} = 0        (27)

Maximizing with respect to w yields the strategy

    w^*(x(t),t) = -\theta W \Gamma^T(x(t),t;\epsilon)\, \frac{\partial X^f}{\partial x}        (28)

Substituting (28) in (27) and collecting terms in like powers of ε gives a series of partial differential equations which can then be solved to give the terms in the expansion for the optimal accumulated cost, X^f(x(t),t).

Beginning with ε^0, the zeroth order accumulated cost, X_0^f(x(t),t), has the solution

    X_0^f(x(t),t) = -\frac{1}{\theta}\frac{1}{2}\|x(t) - \hat{x}(t)\|^2_{P^{-1}(t)} + \int_0^{t} \frac{1}{2}\left[ \|\hat{x}(\tau)\|^2_Q + \|u(\tau)\|^2_R - \frac{1}{\theta}\|z(\tau) - H_0(\tau)\hat{x}(\tau)\|^2_{V^{-1}} \right] d\tau        (29)

where

    \dot{\hat{x}}(t) = \left[ A_0(t) + \theta P(t) Q \right]\hat{x}(t) + B_0(t)u(t) + P(t)H_0^T(t)V^{-1}\left[ z(t) - H_0(t)\hat{x}(t) \right]        (30)

    \dot{P}(t) = A_0(t)P(t) + P(t)A_0^T(t) + \Gamma_0 W \Gamma_0^T - P(t)\left[ H_0^T(t)V^{-1}H_0(t) - \theta Q \right]P(t)        (31)

with initial conditions x̂(0) = x̂_0 and P(0) = P_0.
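The zeroth-order filter is an ordinary forward propagation. The following Euler sketch (my own; the matrices, the input u and the measurement z are placeholders, with z taken as a synthetic signal) integrates the estimator (30) and the Riccati equation (31) together forward in time.

import numpy as np

# Euler sketch of the zeroth-order estimator (30) and forward Riccati equation (31).
n = 2
A0 = np.array([[0.0, 1.0], [-1.0, -0.5]])
B0 = np.array([[0.0], [1.0]])
G0 = np.array([[0.0], [1.0]])
H0 = np.array([[1.0, 0.0]])
Q, W, V = np.eye(n), np.array([[1.0]]), np.array([[0.05]])
theta, dt, steps = 0.2, 0.001, 5000

x_hat, P = np.zeros((n, 1)), 0.1 * np.eye(n)        # x_hat(0), P(0)
for k in range(steps):
    t = k * dt
    u = np.array([[0.0]])                           # open-loop input for the sketch
    z = np.array([[np.sin(t)]])                     # synthetic measurement signal
    dx_hat = (A0 + theta * P @ Q) @ x_hat + B0 @ u \
             + P @ H0.T @ np.linalg.solve(V, z - H0 @ x_hat)                  # (30)
    dP = A0 @ P + P @ A0.T + G0 @ W @ G0.T \
         - P @ (H0.T @ np.linalg.solve(V, H0) - theta * Q) @ P                # (31)
    x_hat, P = x_hat + dt * dx_hat, P + dt * dP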
Collecting first order terms in ε in equation (27), the partial differential equation for the first order accumulated cost is satisfied by the path integral

    X_1^f(x(t),t) = \int_0^{t} \left\{ \left( \frac{\partial X_0^f}{\partial x} \right)^T \left[ -\tilde{A}(x(\tau),\tau)x(\tau) - \tilde{B}(x(\tau),\tau)u(\tau) \right] + \frac{1}{\theta} x^T(\tau)\tilde{H}^T(x(\tau),\tau)V^{-1}\left[ z(\tau) - H_0(\tau)x(\tau) \right] \right\} d\tau        (32)

where x(τ) is subject to the linear dynamics given by

    \dot{x}(t) = A_0(t)x(t) + B_0(t)u(t) + \Gamma_0(t)W\Gamma_0^T(t)P^{-1}(t)\left[ x(t) - \hat{x}(t) \right]        (33)

Since the dynamics for x(τ) are linear, the state x(τ) at time τ can be written in terms of the state x(t) at the current time t in the form

    x(\tau) = \Phi_0^f(\tau,t)\, x(t) + \Psi_0^f(\tau,t)        (34)

where

    \frac{\partial}{\partial t}\Phi_0^f(\tau,t) = -\Phi_0^f(\tau,t)\left[ A_0(t) + \Gamma_0(t)W\Gamma_0^T(t)P^{-1}(t) \right], \qquad \Phi_0^f(\tau,\tau) = I_n        (35)

    \frac{\partial}{\partial t}\Psi_0^f(\tau,t) = -B_0(t)u(t) + \Gamma_0(t)W\Gamma_0^T(t)P^{-1}(t)\hat{x}(t), \qquad \Psi_0^f(\tau,\tau) = 0        (36)

Again using properties of Kronecker products, and substituting (34) for x(τ), (32) can be manipulated into the form

    X_1^f(x(t),t) = \sum_{j=0}^{k+2} m_{1j}^{fT}(t)\, x^{[j]}(t)        (37)

such that the m_{1j}^f(t) terms may be differentiated with respect to time t to produce a system of linear differential equations which are forced by the control u(t), the measurement z(t), the zeroth order state estimate x̂(t) and the zeroth order Riccati solution P(t).

Similarly, higher order terms can then be generated by collecting terms in increasing powers of ε, and using the properties of the Kronecker product to arrange in the form

    X_n^f(x(t),t) = \sum_{j=0}^{nk+2} m_{nj}^{fT}(t)\, x^{[j]}(t)        (38)

with the terms m_{nj}^f(t) again generated by a system of first order linear differential equations which are forced by the controls u(t), the measurements z(t), the zeroth order state estimate x̂(t), the zeroth order Riccati solution P(t), as well as the lower order m_{nj}^f(t) terms.
3.3 Connection Condition

In the above control and filtering subproblems, the minimizing control and maximizing disturbance strategies are determined for a given value of the state x(t). To determine the worst case value of the current state x(t), the sum of the accumulated cost X^f(x(t),t) and the cost to go X^c(x(t),t) is maximized in an algebraic "connection condition". To do so, the state x(t) is expanded in powers of the small parameter ε, then terms in like powers of ε are maximized, beginning with ε^0. The result is a series of quadratic forms in the components x_n(t) of x(t), such that the maximizing values of each successive component x_n^*(t) may be determined analytically as an algebraic function of the previously determined components.

First, the state x(t) can be written in terms of its components as

    x(t) \triangleq \sum_{n=0}^{\infty} \epsilon^n x_n(t)

The sum of the optimal accumulation function X^f(x(t),t) and the optimal return function X^c(x(t),t) is then expanded in powers of ε and each term is then maximized, beginning with terms multiplied by ε^0. The zeroth order maximization is then

    \max_{x_0(t)} \left[ X_0^f(x(t),t) + X_0^c(x(t),t) \right]_{\epsilon=0} = \max_{x_0(t)} \left\{ \frac{1}{2}\|x_0(t)\|^2_{\Pi(t)} - \frac{1}{2}\|x_0(t) - \hat{x}(t)\|^2_{(\theta P(t))^{-1}} \right\}        (39)

This yields the familiar solution

    x_0^*(t) = \left[ I - \theta P(t)\Pi(t) \right]^{-1} \hat{x}(t)        (40)

Higher order terms are determined similarly, by collecting terms in even powers of ε, such that x_n^*(t) is determined by maximizing the quadratic form resulting from the summation of the components of the optimal return and accumulation functions up to ε^{2n} which are multiplied by ε^{2n}:

    x_n^*(t) = \arg\max_{x_n(t)} \sum_{j=0}^{2n} \frac{1}{j!} \frac{\partial^j}{\partial \epsilon^j} \left[ X_{2n-j}^f(x(t),t) + X_{2n-j}^c(x(t),t) \right]_{\epsilon=0}        (41)

The resulting function being maximized then contains a term which is quadratic in x_n(t), weighted by [Π(t) − (θP(t))^{-1}], and a term which is linear in x_n(t), multiplied by a polynomial function of the previously determined x_m^*(t), with m < n, such that each additional term in the expansion can be determined explicitly as a polynomial function of the previously determined terms.

The control may then be implemented as a truncated series from (10) as a function of the worst case state determined from (41), by collecting terms in powers of ε to the desired level of accuracy and substituting the appropriate values of x_n^*(t).
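For concreteness (again a sketch of my own, not from the paper), the zeroth-order connection condition (40) amounts to a single linear solve once Π(t), P(t) and x̂(t) are available; the numbers below are placeholders.

import numpy as np

# Zeroth-order connection condition (40): x0* = (I - theta P Pi)^(-1) x_hat.
# Pi, P and x_hat would come from (14), (31) and (30); here they are placeholders.
n, theta = 2, 0.2
Pi = np.array([[2.0, 0.3], [0.3, 1.5]])
P = np.array([[0.4, 0.1], [0.1, 0.6]])
x_hat = np.array([1.0, 0.5])

x0_star = np.linalg.solve(np.eye(n) - theta * P @ Pi, x_hat)

# Each higher-order component xn* then solves a linear problem with the same weighting
# [Pi - (theta P)^(-1)] and a polynomial forcing in x0*, ..., x_{n-1}* (cf. (41)).
print(x0_star)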
4. A SCALAR EXAMPLE

As an example, consider a scalar system with a small cubic nonlinearity in the dynamics, such that

    \dot{x} = a x + b u + \gamma w + \epsilon x^3        (42)

    z = h x + v        (43)

Beginning with the zeroth order terms, the zeroth order optimal return function, from (13), is given as

    X_0^c(x(t),t) = \frac{1}{2}\pi_0(t)\, x^2(t)        (44)

where, from (14),

    -\dot{\pi}_0 = 2 a \pi_0 - \pi_0^2\left( \frac{b^2}{r} - \theta \gamma^2 W \right) + q        (45)

with terminal condition π_0(t_f) = q_f. Similarly, the zeroth order accumulation function, from (29), is found as

    X_0^f(x(t),t) = -\frac{(x(t) - \hat{x}(t))^2}{2\theta p(t)} + \frac{1}{2}\int_0^{t} \left[ q\hat{x}^2(\tau) + r u^2(\tau) - \frac{(z(\tau) - h\hat{x}(\tau))^2}{\theta V} \right] d\tau        (46)

where, from (30)-(31),

    \dot{\hat{x}} = (a + \theta p q)\hat{x} + b u + \frac{p h}{V}(z - h\hat{x})        (47)

    \dot{p} = 2 a p - p^2\left( \frac{h^2}{V} - \theta q \right) + \gamma^2 W        (48)

with initial conditions x̂(0) = x̂_0 and p(0) = p_0. The zeroth order connection condition then produces a worst case x_0(t) in the form of (40),

    x_0^*(t) = \frac{\hat{x}(t)}{1 - \theta p(t)\pi_0(t)}        (49)

so that the zeroth order control, u_0(t), is implemented as

    u_0^*(t) = -\frac{b}{r}\pi_0(t)\, x_0^*(t) = -\frac{b}{r}\,\frac{\pi_0(t)\hat{x}(t)}{1 - \theta p(t)\pi_0(t)}        (50)

Next, first order terms are considered, with the first order optimal return function, X_1^c(x(t),t), determined as in (21) in the form

    X_1^c(x(t),t) = x^4(t)\, m_{14}^c(t)        (51)

with m_{14}^c(t) propagated backwards in time from m_{14}^c(t_f) = 0 as in (23) by

    -\dot{m}_{14}^c(t) = 4\left[ a - \left( \frac{b^2}{r} - \theta \gamma^2 W \right)\pi_0(t) \right] m_{14}^c(t) + \pi_0(t)        (52)
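A compact numerical sketch of this zeroth-order scalar compensator (my own illustration; all coefficient values are arbitrary placeholders and simple Euler integration is used purely for brevity): the backward passes (45) and (52) are computed offline from t_f, and the forward pass then propagates (47)-(48) while applying (49)-(50). The measurement z(t) is a placeholder.

import numpy as np

# Zeroth-order scalar compensator (45)-(50) plus the backward pass (52) for mc14.
a, b, gam, h = -1.0, 1.0, 1.0, 1.0
q, r, W, V, qf = 1.0, 1.0, 1.0, 0.1, 1.0
theta, tf, dt = 0.2, 5.0, 0.001
N = int(tf / dt)

# Backward passes: pi0 from (45) with pi0(tf) = qf, and mc14 from (52) with mc14(tf) = 0.
pi0, mc14 = np.empty(N + 1), np.empty(N + 1)
pi0[N], mc14[N] = qf, 0.0
S = b ** 2 / r - theta * gam ** 2 * W
for k in range(N, 0, -1):
    dpi = -(2 * a * pi0[k] - pi0[k] ** 2 * S + q)                  # (45)
    dm = -(4 * (a - S * pi0[k]) * mc14[k] + pi0[k])                # (52)
    pi0[k - 1] = pi0[k] - dt * dpi
    mc14[k - 1] = mc14[k] - dt * dm

# Forward pass: estimator (47), Riccati (48), worst-case state (49), control (50).
x_hat, p = 0.0, 1.0
for k in range(N):
    x0_star = x_hat / (1.0 - theta * p * pi0[k])                   # (49)
    u0 = -(b / r) * pi0[k] * x0_star                               # (50)
    z = 0.0                                                        # placeholder measurement
    dx_hat = (a + theta * p * q) * x_hat + b * u0 + (p * h / V) * (z - h * x_hat)   # (47)
    dp = 2 * a * p - p ** 2 * (h ** 2 / V - theta * q) + gam ** 2 * W               # (48)
    x_hat, p = x_hat + dt * dx_hat, p + dt * dp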
Next, the first order accumulation function is determined as in (37) in the form

    X_1^f(x(t),t) = \sum_{n=0}^{4} m_{1n}^f(t)\, x^n(t)        (53)

where m_{1n}^f(0) = 0 and

    \dot{m}_{10}^f(t) = m_{11}^f(t)\left[ -b u(t) + \frac{\gamma^2 W}{p(t)}\hat{x}(t) \right]        (54)

    \dot{m}_{11}^f(t) = -\left[ a + \frac{\gamma^2 W}{p(t)} \right] m_{11}^f(t) + 3\frac{\hat{x}(t)}{\theta p(t)} + 2 m_{12}^f(t)\left[ -b u(t) + \frac{\gamma^2 W}{p(t)}\hat{x}(t) \right]        (55)

    \dot{m}_{12}^f(t) = -2\left[ a + \frac{\gamma^2 W}{p(t)} \right] m_{12}^f(t) + 3\frac{\hat{x}(t)}{\theta p(t)} + 3 m_{13}^f(t)\left[ -b u(t) + \frac{\gamma^2 W}{p(t)}\hat{x}(t) \right]        (56)

    \dot{m}_{13}^f(t) = -3\left[ a + \frac{\gamma^2 W}{p(t)} \right] m_{13}^f(t) + \frac{\hat{x}(t)}{\theta p(t)} + 4 m_{14}^f(t)\left[ -b u(t) + \frac{\gamma^2 W}{p(t)}\hat{x}(t) \right]        (57)

    \dot{m}_{14}^f(t) = -4\left[ a + \frac{\gamma^2 W}{p(t)} \right] m_{14}^f(t) + \frac{1}{\theta p(t)}        (58)
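The coefficient system (54)-(58), as written above, is a linear time-varying system forced by u(t), x̂(t) and p(t) from the zeroth-order pass. A minimal forward-Euler sketch (mine; the inputs are frozen at placeholder values purely to show the mechanics):

import numpy as np

a, b, gam, W, theta = -1.0, 1.0, 1.0, 1.0, 0.2

def mf_rhs(m, u, x_hat, p):
    # Right-hand sides of (54)-(58) for m = (m_f10, ..., m_f14).
    m10, m11, m12, m13, m14 = m
    drive = -b * u + (gam ** 2 * W / p) * x_hat      # common factor -bu + (gamma^2 W / p) x_hat
    decay = a + gam ** 2 * W / p
    return np.array([
        m11 * drive,                                                        # (54)
        -decay * m11 + 3 * x_hat / (theta * p) + 2 * m12 * drive,           # (55)
        -2 * decay * m12 + 3 * x_hat / (theta * p) + 3 * m13 * drive,       # (56)
        -3 * decay * m13 + x_hat / (theta * p) + 4 * m14 * drive,           # (57)
        -4 * decay * m14 + 1.0 / (theta * p),                               # (58)
    ])

m, dt = np.zeros(5), 0.001                           # m_f1n(0) = 0
for _ in range(5000):
    m = m + dt * mf_rhs(m, u=0.0, x_hat=0.5, p=1.0)  # forward Euler step with frozen inputs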
The first order connection condition, determined from (41) with n = 1, then gives the worst case value of x_1(t) as

    x_1^*(t) = \frac{\theta p(t)}{1 - \theta p(t)\pi_0(t)} \left[ m_{11}^f(t) + 2 m_{12}^f(t)\, x_0^*(t) + 3 m_{13}^f(t)\, x_0^{*2}(t) + 4\left( m_{14}^c(t) + m_{14}^f(t) \right) x_0^{*3}(t) \right]        (59)

and the first order control, u_1(t), is implemented from (10) as

    u_1^*(t) = -\frac{b}{r}\left[ \pi_0(t)\, x_1^*(t) + 4 m_{14}^c(t)\, x_0^{*3}(t) \right]        (60)

Higher order terms, then, produce higher order polynomial functions of x(t), with coefficients which are propagated similarly via a system of first order linear differential equations.
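Finally, a small sketch (my own, with placeholder values standing in for quantities produced by (45)-(58)) showing how the pieces are assembled into the first-order correction (59)-(60) and the truncated control series:

# Assembling the first-order correction from previously computed pieces.
theta, b, r, eps = 0.2, 1.0, 1.0, 0.1      # eps stands in for the small parameter epsilon
p, pi0, x_hat = 1.0, 1.2, 0.5
mc14 = -0.05
mf11, mf12, mf13, mf14 = 0.01, -0.02, 0.005, 0.03

x0 = x_hat / (1.0 - theta * p * pi0)                                   # (49)
x1 = (theta * p / (1.0 - theta * p * pi0)) * (
        mf11 + 2 * mf12 * x0 + 3 * mf13 * x0 ** 2
        + 4 * (mc14 + mf14) * x0 ** 3)                                 # (59)

u0 = -(b / r) * pi0 * x0                                               # (50)
u1 = -(b / r) * (pi0 * x1 + 4 * mc14 * x0 ** 3)                        # (60)
u = u0 + eps * u1                                                      # truncated series from (10)
print(x0, x1, u)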
5. CONCLUSION

A robust adaptive compensator was developed for a class of nonlinear systems by considering an approximate form of the disturbance attenuation problem for systems with small nonlinearities. The class of systems considered, having small polynomial nonlinearities, has a convenient representation using a Kronecker product notation. By exploiting the properties of this particular form of nonlinearity, an approximate compensator structure was developed which gives an implementable approximation of the disturbance attenuation solution for this class of systems. The nominal compensator for the system is in the form of the familiar H∞ robust controller, with higher order terms determined algebraically as an explicit polynomial form. The coefficients of this polynomial form are a combination of terms which are propagated forward in time by first order linear ordinary differential equations forced by known control and measurement information, and coefficients which are propagated backwards in time, also by first order linear ordinary differential equations. The control can then be implemented as an explicit algebraic form.

6. REFERENCES

Başar, Tamer and Pierre Bernhard (1991). H∞-Optimal Control and Related Minimax Design Problems. Birkhäuser.
Bellman, Richard E. (1995). Introduction to Matrix Analysis. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics. Republication of the work first published by McGraw-Hill Book Company, Inc., 1960.
Brewer, John W. (1978). Kronecker products and matrix calculus in system theory. IEEE Transactions on Circuits and Systems CAS-25(9), 772-781.
Chichka, D. F. and J. L. Speyer (1995). An adaptive controller based on disturbance attenuation. IEEE Transactions on Automatic Control 40(7), 1220-1233.
Rottella, F. and G. Dauphin-Tanguy (1988). Non-linear systems: identification and optimal control. International Journal of Control 48(2), 525-544.
Yoneyama, Jun, Jason L. Speyer and Charles H. Dillon (1997). Robust adaptive control problem for linear systems with unknown parameters. Automatica 33(10), 1909-1916.