STOCHASTIC ADAPTIVE CONTROL
Copyright © IFAC Identification and System Parameter Estimation 1982, Washington D.C., USA, 1982

ON THE OPTIMAL ADAPTIVE CONTROL OF LINEAR SYSTEMS*

P. R. Kumar
Department of Mathematics, University of Maryland Baltimore County, 5401 Wilkens Avenue, Baltimore, Maryland 21228, USA

*The research reported here has been supported by the U.S. Army Research Office under Contract No. DAAG-29-80-K0038.

Abstract. The problem considered is that of optimal adaptive control, with respect to a quadratic cost criterion, of a standard linear, gaussian system. Attention is focused on the case where there is a finite set of parameterized systems, one of which, unknown to us, is the true system. Via a counterexample, it is demonstrated that a commonly used adaptive control scheme can be severely nonoptimal. To overcome this, a new family of optimal adaptive controllers is introduced for which all the properties usually desired of an adaptive control scheme (closed-loop identification, stability of the overall system, convergence to the optimal control law, and optimality of the cost incurred) can be proven. The salient feature of this adaptive control scheme is a novel identification criterion, obtained by adding to the usual least-squares error criterion a term which depends on the optimal cost associated with a parameter.

Keywords. Adaptive control; adaptive systems; stochastic control; linear systems.
INTRODUCTION

A commonly used adaptive scheme consists of making a least-squares estimate of the underlying parameters defining a standard linear, gaussian system, and then using the control input which would be optimal if the estimate were correct. This procedure of estimation and control is repeated at each time instant.

At least for the case in which there are only a finite number of possible parameter values, one of which, unbeknownst to us, is the true one, this scheme can be seriously deficient. What can happen is that the parameter estimates can "lock" on to a false parameter. When this false parameter is such that the optimal control law for it is severely nonoptimal for the true parameter, the resulting adaptive control scheme can incur a cost much larger than the optimal cost achievable.

The following is a counterexample to show that this undesirable sequence of events can indeed transpire. There is a system

x_{t+1} = a x_t + b u_t + e_{t+1},

where {e_t} is a sequence of independent N(0,1) (noise) random variables. We know that the unknown parameters (a,b) can be either (0,-1) or (1,1). We also have a cost criterion

\lim_{t \to \infty} \frac{1}{t} \sum_{s=0}^{t-1} (x_s^2 + 2 u_s^2)    (1)

by which the performance of the adaptive control scheme is measured. At each time instant t we make a least-squares estimate (a_t, b_t) of the unknown parameter as follows:

(a_t, b_t) = (0,-1) if \sum_{s=0}^{t-1} (x_{s+1} + u_s)^2 < \sum_{s=0}^{t-1} (x_{s+1} - x_s - u_s)^2,
(a_t, b_t) = (1,1) if \sum_{s=0}^{t-1} (x_{s+1} + u_s)^2 > \sum_{s=0}^{t-1} (x_{s+1} - x_s - u_s)^2.    (2)

With regard to the cost criterion (1), by solving the Algebraic Riccati Equation we know that the optimal control is

u_t = 0 if (a,b) = (0,-1), and u_t = -\frac{1}{2} x_t if (a,b) = (1,1).
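As an illustrative aside (not part of the original paper), these two control laws can be checked numerically by iterating the scalar Riccati equation for each candidate parameter pair; the helper scalar_dare below is a hypothetical name, and the snippet assumes the weights q = 1, r = 2 from (1) and unit noise variance.

```python
def scalar_dare(a, b, q=1.0, r=2.0, iters=200):
    """Fixed-point iteration of p = a^2 p - (a b p)^2 / (r + b^2 p) + q."""
    p = q
    for _ in range(iters):
        p = a * a * p - (a * b * p) ** 2 / (r + b * b * p) + q
    k = -(a * b * p) / (r + b * b * p)   # optimal feedback gain, u_t = k * x_t
    return p, k

for a, b in [(0.0, -1.0), (1.0, 1.0)]:
    p, k = scalar_dare(a, b)
    # With unit noise variance the optimal long-run average cost equals p.
    print(f"(a,b)=({a:+.0f},{b:+.0f}):  P = {p:.3f}   K = {k:+.3f}   cost = {p:.3f}")
```

The iteration returns P = 1, K = 0 for (0,-1) and P = 2, K = -1/2 for (1,1), matching the gains above; the corresponding optimal average costs are 1 and 2.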
Therefore, pretending at each time t that (a_t, b_t) is indeed the correct parameter, our adaptive choice of control inputs is

u_t = 0 if (a_t, b_t) = (0,-1),    u_t = -\frac{1}{2} x_t if (a_t, b_t) = (1,1).    (3)

Now if (a_t, b_t) = (1,1) for some t, then (a_{t+n}, b_{t+n}) = (1,1) for all n > 0. That is, if at any time the parameters are estimated to be (1,1), then the parameter estimates will never again be changed, i.e. the parameter estimates can "lock" on to the values (1,1).
The basic reason for this is that under the control u = -\frac{1}{2} x, the closed-loop systems associated with both (a,b) = (0,-1) and (a,b) = (1,1) look alike, and the estimation criterion (2) cannot distinguish between them. More precisely, observe the following sequence of implications:

(a_t, b_t) = (1,1)
\iff \sum_{s=0}^{t-1} (x_{s+1} + u_s)^2 > \sum_{s=0}^{t-1} (x_{s+1} - x_s - u_s)^2
\Rightarrow u_t = -\frac{1}{2} x_t
\Rightarrow x_{t+1} + u_t = x_{t+1} - x_t - u_t, so the same term is added to both sides of the inequality at time t
\Rightarrow \sum_{s=0}^{t} (x_{s+1} + u_s)^2 > \sum_{s=0}^{t} (x_{s+1} - x_s - u_s)^2
\Rightarrow (a_{t+1}, b_{t+1}) = (1,1)
\Rightarrow \dots \Rightarrow (a_{t+n}, b_{t+n}) = (1,1) for all n > 0.
Now if the true values of the parameters are (a,b) = (0,-1), and x_0 = 1, u_0 = 0 are the initial conditions, then

Prob(u_1 = -\frac{1}{2} x_1) = Prob((x_1 + u_0)^2 > (x_1 - x_0 - u_0)^2) = Prob(x_1^2 > (x_1 - 1)^2) = Prob(e_1 > \frac{1}{2}) \approx .31.

Hence Prob(u_t = -\frac{1}{2} x_t for all t \geq 1) \approx .31.

Thus the adaptive control scheme can stick at a strictly nonoptimal control law (cost = 2 versus optimal cost = 1) with probability of about 31%.
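The lock-on phenomenon is easy to reproduce numerically. The following Monte Carlo sketch is illustrative only and not from the paper; the horizon, number of trials, and seed are arbitrary choices. It simulates the certainty-equivalence scheme (2)-(3) against the true system (a,b) = (0,-1).

```python
import numpy as np

rng = np.random.default_rng(0)

def run(T=2000):
    x, u = 1.0, 0.0                          # initial conditions x_0 = 1, u_0 = 0
    s01 = s11 = 0.0                          # residual sums appearing in rule (2)
    cost, locked = x**2 + 2 * u**2, False
    for _ in range(1, T):
        x_next = -u + rng.standard_normal()  # true dynamics: a = 0, b = -1
        s01 += (x_next + u) ** 2             # residual under candidate (0,-1)
        s11 += (x_next - x - u) ** 2         # residual under candidate (1,1)
        x = x_next
        uses_11 = s11 < s01                  # rule (2): pick the smaller residual sum
        locked = locked or uses_11
        u = -0.5 * x if uses_11 else 0.0     # certainty-equivalence control (3)
        cost += x**2 + 2 * u**2
    return locked, cost / T

trials = [run() for _ in range(400)]
p_lock = np.mean([lk for lk, _ in trials])
cost_lock = np.mean([c for lk, c in trials if lk])
print(f"runs that ever estimate (1,1): {p_lock:.2f}")    # at least the ~.31 computed above
print(f"average cost over those runs: {cost_lock:.2f}")  # well above the optimal value 1
```

Runs that ever estimate (1,1) keep that estimate thereafter and incur an average cost approaching 2, while the remaining runs stay near the optimal cost of 1.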
The goal of this paper is to exhibit an adaptive control scheme which never encounters problems of this sort.

THE ADAPTIVE CONTROL SCHEME

Let

x_{t+1} = A(\theta^0) x_t + B(\theta^0) u_t + w_{t+1}    (4)

be the true system, with {w_{t+1}} a sequence of independent gaussian random vectors N(0,I). We do not know the value \theta^0. All we are given is a finite set \Theta with \theta^0 some element of it. We are also given a cost criterion

\lim_{t \to \infty} \frac{1}{t} \sum_{s=0}^{t-1} (x_s' C'C x_s + u_s' R u_s).    (5)

We assume that

(i) (A(\theta), B(\theta), C) is a minimal triple for each \theta \in \Theta;
(ii) R = R' > 0.    (6)

Under (6) it is known that there exists a unique (within the class of symmetric nonnegative definite matrices) solution P(\theta) of

P(\theta) = A'(\theta)[P(\theta) - P(\theta) B(\theta) (B'(\theta) P(\theta) B(\theta) + R)^{-1} B'(\theta) P(\theta)] A(\theta) + C'C,

and that P(\theta) > 0. Also, for the system with parameter \theta, u_t = K(\theta) x_t is the optimal feedback control for (5), where

K(\theta) := -(B'(\theta) P(\theta) B(\theta) + R)^{-1} B'(\theta) P(\theta) A(\theta),

and the resulting optimal cost (5) (both a.s. and in expectation, see [1]) is J(\theta) := trace P(\theta).
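As an illustration (not from the original paper), P(\theta), K(\theta) and J(\theta) can be computed for a given candidate by fixed-point iteration of the Riccati recursion, which converges from P = 0 under assumption (6); the matrices below are purely hypothetical.

```python
import numpy as np

def riccati(A, B, C, R, iters=500):
    """Iterate P <- A'[P - P B (B'PB + R)^{-1} B'P] A + C'C to a fixed point."""
    P = np.zeros((A.shape[0], A.shape[0]))
    for _ in range(iters):
        G = B.T @ P @ B + R
        P = A.T @ (P - P @ B @ np.linalg.solve(G, B.T @ P)) @ A + C.T @ C
    K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)    # u_t = K x_t
    return P, K, np.trace(P)                              # J(theta) = trace P(theta)

# Hypothetical candidate (A, B) and weights, for illustration only.
A = np.array([[0.5, 1.0],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.eye(2)
R = np.array([[1.0]])

P, K, J = riccati(A, B, C, R)
print("K(theta) =", K, "   J(theta) = trace P(theta) =", round(J, 3))
```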
Let \delta(t) be any positive sequence with

\lim_{t \to \infty} \delta(t) = +\infty and \lim_{t \to \infty} \delta(t)/t = 0.    (7)

Our new adaptive control scheme is the following. At each t = 0, 1, 2, ..., make an "estimate" \hat\theta_t according to

\hat\theta_t := \arg\min_{\theta \in \Theta} \{ \delta(t) \ln J(\theta) + \sum_{s=0}^{t-1} (x_{s+1} - A(\theta) x_s - B(\theta) u_s)'(x_{s+1} - A(\theta) x_s - B(\theta) u_s) \} for t = 0, 2, 4, 6, 8, ...,    (8)

\hat\theta_t := \hat\theta_{t-1} for t = 1, 3, 5, 7, 9, ... .

Apply the control input

u_t = K(\hat\theta_t) x_t.    (9)
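A minimal sketch of (8)-(9) follows; it is not the author's implementation. It reuses the hypothetical riccati() helper and the matrices A, B, C, R from the previous sketch, takes \delta(t) = \sqrt{t+1} as one admissible choice satisfying (7), and uses an entirely hypothetical candidate set.

```python
import numpy as np

def adaptive_run(candidates, true_index, C, R, T=400, seed=1):
    rng = np.random.default_rng(seed)
    KJ = [riccati(Ai, Bi, C, R)[1:] for Ai, Bi in candidates]   # (K(theta), J(theta)) per candidate
    A0, B0 = candidates[true_index]
    x = np.zeros(A0.shape[0])
    resid = [0.0] * len(candidates)          # least-squares sums appearing in (8)
    est = 0
    for t in range(T):
        if t % 2 == 0:                       # re-estimate only at even times, as in (8)
            delta = np.sqrt(t + 1.0)
            score = [delta * np.log(KJ[i][1]) + resid[i] for i in range(len(candidates))]
            est = int(np.argmin(score))
        u = KJ[est][0] @ x                   # control input (9): u_t = K(theta_hat_t) x_t
        x_next = A0 @ x + B0 @ u + rng.standard_normal(len(x))  # true system (4)
        for i, (Ai, Bi) in enumerate(candidates):
            e = x_next - Ai @ x - Bi @ u
            resid[i] += float(e @ e)
        x = x_next
    return est

cands = [(A, B),                                            # candidate from the sketch above
         (np.array([[1.0, 0.2], [0.0, 0.9]]), B)]           # a second hypothetical candidate
print("selected candidate index:", adaptive_run(cands, true_index=0, C=C, R=R))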
The crucial feature which makes our adaptive control law different from those previously studied is the term \delta(t) \ln J(\theta) in (8). Were it not for the presence of this term, \hat\theta_t would merely be the least-squares estimate. As it stands, \hat\theta_t is mildly biased in favor of those \theta for which J(\theta) is smaller.

THE MAIN RESULTS

Let (\Omega, \mathcal{F}, P) be the underlying probability space for the evolution of the system (4), (8), (9), that is, under the feedback law. We denote points in \Omega by \omega. Let N \subset \Omega with P(N) = 0 be a fixed null set. Our main results are:

(i) Closed-loop identification

If for \omega \notin N and \theta \in \Theta,

\limsup_{t \to \infty} \frac{1}{\ln t} \sum_{s=0}^{t-1} 1(\hat\theta_s = \theta) > 0, then

A(\theta) + B(\theta) K(\theta) = A(\theta^0) + B(\theta^0) K(\theta).

What this says is that for any \theta which occurs infinitely often, and with a certain time-density of occurrences, the closed-loop dynamics under K(\theta) are identified.

(ii) Overall stability of the adaptive control system

For every p \in [1, \infty) there is an M(p) < \infty such that

\limsup_{t \to \infty} \frac{1}{t} \sum_{s=0}^{t-1} (\|x_s\|^p + \|u_s\|^p) \leq M(p)

for every \omega \notin N. Thus the system is mean p-th power stable for p \geq 1.
(iii) Convergence of the adaptive controls to an optimal control law

\lim_{t \to \infty} \frac{1}{\ln t} \sum_{s=0}^{t-1} 1(u_s \neq K(\theta^0) x_s) = 0

for every \omega \notin N. This says that the time instants at which the inputs are not optimal are very rare.

(iv) Achievement of optimal cost

\lim_{t \to \infty} \frac{1}{t} \sum_{s=0}^{t-1} (x_s' C'C x_s + u_s' R u_s) = J(\theta^0)

for every \omega \notin N. Thus the actual cost achieved under operation is the optimal cost achievable for the system \theta^0. Hence one cannot achieve a better cost even if one knew the value of the true parameter \theta^0 at the start.

Thus this adaptive control scheme has all the properties usually desired of such schemes. For proofs of all these results, the reader is referred to [2].

DISCUSSION

The problem examined here falls within the general class of problems dealing with linear stochastic systems. In recent years much progress has been made [1, 3-9] in the study of such systems. This is especially true of the case where R = 0 in (5); such adaptive control schemes seek to minimize the variance of the output, without regard to the cost of control.

For the present case R > 0, much less seems to be known. For example, it does not even appear to be known whether \{\frac{1}{t} \sum_{s=0}^{t-1} (q x_s^2 + r u_s^2)\} is bounded when control laws appropriate to the case r > 0 are used. This is in contrast to the case r = 0, where such questions appear to have been satisfactorily resolved [8, 9]. Indeed, our results here resolve similar questions for the case R > 0 and even appear to be slightly stronger.

Another motivation for the work here is the closed-loop identification problem of result (i). This problem does not appear to have been adequately dealt with in the literature, at least for the case where \Theta, the parameter set, is finite, even though the nature of the problem appears to be part of the folklore.

The problem examined here is, however, limited in several respects. For example, the noise sequence is temporally uncorrelated. Also, the noise covariance is positive definite. These restrictions have been successfully overcome for the case R = 0. Clearly much remains to be done.
REFERENCES

[1] Hall, P. and C. C. Heyde (1980). Martingale Limit Theory and its Application. Academic Press.
[2] Kumar, P. R. (1981). Optimal adaptive control of linear quadratic gaussian systems. UMBC Mathematics Research Report.
[3] Åström, K. J. and B. Wittenmark (1973). On self-tuning regulators. Automatica, Vol. 9, pp. 185-199.
[4] Ljung, L. and B. Wittenmark (1974). Asymptotic properties of self-tuning regulators. Report 7404, Division of Automatic Control, Lund Institute of Technology.
[5] Ljung, L. (1977). On positive real transfer functions and the convergence of some recursive schemes. IEEE Transactions on Automatic Control, Vol. AC-22, pp. 539-550.
[6] Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, Vol. AC-22, pp. 551-575.
[7] Åström, K. J., U. Borisson, L. Ljung and B. Wittenmark (1977). Theory and applications of self-tuning regulators. Automatica, Vol. 13, pp. 457-476.
[8] Goodwin, G. C., P. J. Ramadge and P. E. Caines (1979). Recent results in stochastic adaptive control. Proceedings of the 1979 Conference on Information Sciences and Systems, Baltimore, Maryland, pp. 363-367.
[9] Egardt, B. (1979). Stability of Adaptive Controllers. Lecture Notes in Control and Information Sciences, Vol. 20, Springer-Verlag, Berlin.
Goodwin, G. C., P. J. Ramadge and P. E. Caines (1979). Recent results in stochastic adaptive control. Proceedings of the 1979 Conference on Information Sciences and Systems, Baltimore, Maryland, pp. 363-367. Egardt, B. (1979). Stability of adaptive controllers. Lecture note in Control and Information Sciences, Vol. 20, Springer-Ver1ag, Berlin.