Copyright © IFAC 10th Triennial World Congress, Munich, FRG, 1987

AN ERGODIC THEOREM FOR THE ADAPTIVE CONTROL OF RANDOM PARAMETER FINITE STATE STOCHASTIC SYSTEMS

P. E. Caines and S. P. Meyn

Department of Electrical Engineering, McGill University, 3480 University St., Montreal, P.Q., H3A 2A7, Canada

Abstract. The problem of the adaptive control of systems whose parameters are stochastic processes pushes the stochastic Lyapunov stability analysis technique [Chen and Caines, 1985] to its limits. A recently introduced technique [Caines, Meyn, Aloneftis, 1986; Meyn and Caines, 1987] analyses such systems by applying the ergodic theory of Markov processes to a suitable hyper-state system with stationary transition probabilities. The application in the cited papers is to AR(p) systems. In this paper the technique is applied to completely observable finite state Markovian systems with finite state Markovian parameter processes for which the control functions take values in a compact set.

Keywords: Parameter adaptive control, convergence and stability problems in adaptive control, stochastic adaptive control, recursive estimation.

Subject area: 14.4 Adaptive Control, Modelling and Identification.

Work partially supported by the Science and Engineering Research Council (U.K.) and the Consiglio Nazionale delle Ricerche (Italy).

1. INTRODUCTION

The parameter adaptive control problem for finite state Markov chains with parameter and control dependent transition probabilities, p_{ij}(θ, u), θ ∈ Θ (Θ a finite set), u ∈ U, has been addressed by Mandl (1974), Borkar and Varaiya (1979) and Kumar and Becker (1982), amongst others. Essentially, these authors addressed the problem of minimizing

$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N} c(x_{k+1}, u(x_k, \hat{\theta}_k)) = J \quad \text{a.s. } [P_{x_0}],$$

where the function u_k at any instant k ∈ Z+ is restricted to be an F_k^x measurable function of the form u_k = φ(x_k, θ̂_k), where F_k^x denotes the sigma field σ(x_0, ..., x_k) generated by the x process, φ : X × Θ → U is a function, and θ̂_k is F_k^x measurable (denoted θ̂_k ∈ F_k^x). The purpose of this is to restrict u_k to have the certainty equivalence (adaptive control) law form u_k = u(x_k, θ̂_k), where θ̂_k is interpreted as an estimator of θ. Kumar and Becker impose conditions that ensure X consists of only one ergodic class for each fixed (θ, u) ∈ Θ × U. They then calculate θ̂_k for each k ∈ Z+ by use of a modified version of the maximum likelihood method, wherein the likelihood function at θ is multiplied by a factor depending upon the optimal cost J_θ. It is then shown that the certainty equivalence adaptive control law results in asymptotically optimal performance, in the sense that the limit above attains its minimal value. This behaviour corresponds precisely to that displayed by the stochastic gradient and modified least squares stochastic adaptive control algorithms used for autoregressive moving average (ARMAX) systems (see e.g. [Goodwin, Ramadge, Caines, 1981], [Sin and Goodwin, 1982], [Chen, 1984]).

The present paper addresses the stochastic adaptive control problem for the general finite state, finite parameter set, compact control set system specified by axioms P1-P4 in Section 2 below. In this case the state x of the system is completely observed (as before) but the parameter value θ is not only partially observed but evolves in time as a stochastic process, a situation which we contend arises in most adaptive control problems. (Axiom P2 formalizes the idea that this θ process is exogenous to the system.) Stochastic adaptive control results for analogous problems may be found in the adaptive control literature: in particular, [Chen and Caines, 1985] establishes asymptotic stabilization results for the case of autoregressive systems with exogenous inputs (ARX) for which the unknown AR parameters are subject to bounded martingale difference disturbances, and [Caines and Chen, 1985] treats an optimal control problem for a totally observed non-linear system with a partially observed Markovian finite state parameter process. However, the first work to give an asymptotic analysis of the stochastic adaptive control of systems with an unbounded correlated partially observed parameter process is [Caines, Meyn, Aloneftis, 1986] and [Meyn and Caines, 1987].

The key innovation (applied there to completely observed ARX systems) is to view the entire closed loop adaptive control system as a Markov process with stationary transition probabilities (once an adaptive feedback control law has been selected). Then in any case for which one may show an invariant measure exists (possibly parameterized by the initial state of the system), the ergodic theorem for Markov processes may be invoked to conclude that limits such as

$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N} c(x_{k+1}, u(x_k, \hat{\theta}_k))$$

exist almost surely. It is then possible, in principle, to optimize the performance of the system over the class of adaptive control laws giving rise to invariant measures.

2. PROBLEM FORMULATION

We consider a finite state set X = {1, ..., N} on which evolves a discrete time parameter stochastic state process x = {x_k; k ∈ Z+}, where Z+ denotes the non-negative integers {0, 1, ...}; a finite parameter set Θ = {1, ..., M} on which evolves a discrete time parameter process θ = {θ_k; k ∈ Z+}; and a compact control set U ⊂ R^ν, for some ν ∈ Z+, on which evolves a control process u = {u_k; k ∈ Z+}. These processes are jointly distributed on the sample space Ω = (X × Θ × U)^{Z+} according to a probability P. We shall assume P satisfies the following axioms, where [a_s]_1^k denotes {a_s; 1 ≤ s ≤ k} for any sequence a. The first axiom constitutes our definition of a (u-)controlled Markov process.


P1. The Hyper-state [x_k; θ_k] is a Controlled Markov Process

$$P\left( \begin{bmatrix} x_{k+1} \\ \theta_{k+1} \end{bmatrix} = \begin{bmatrix} n \\ m \end{bmatrix} \,\middle|\, \begin{bmatrix} x \\ \theta \\ u \end{bmatrix}_0^k \right) = P\left( \begin{bmatrix} x_{k+1} \\ \theta_{k+1} \end{bmatrix} = \begin{bmatrix} n \\ m \end{bmatrix} \,\middle|\, x_k, \theta_k, u_k \right), \quad \forall k \in \mathbb{Z}_+,\ \forall n \in X,\ \forall m \in \Theta, \tag{2.1}$$

and the right hand side is continuous in u as a function from U to [0, 1]. Note that here the transition probabilities are also restricted to depend only on the most recent control. Observe that (2.1) implies that x is a [θ; u]-controlled Markov process and θ is an [x; u]-controlled Markov process.

P2. Future Parameters are Conditionally Independent of Past States and Controls

$$P\left( \theta_{k+1} = m \,\middle|\, x_{k+1}, \begin{bmatrix} x \\ \theta \\ u \end{bmatrix}_0^k \right) = P(\theta_{k+1} = m \mid \theta_k), \quad \forall k \in \mathbb{Z}_+,\ \forall m \in \Theta, \tag{2.2a}$$

$$P(\theta_0 = m \mid x_0) = P(\theta_0 = m), \quad \forall m \in \Theta. \tag{2.2b}$$

P3. Controls are Functions of Past States

For any k ∈ Z+, let F_k^x denote the σ-field σ{x_0, ..., x_k} generated by the process x on the interval [0, k]. Then

$$u_k \in \mathcal{F}_k^x, \quad \forall k \in \mathbb{Z}_+, \tag{2.3}$$

i.e. u_k is F_k^x measurable. We remark that P2 implies that θ is a Markov process by itself and, together with P1, implies that

$$P\left( x_{k+1} = n \,\middle|\, \begin{bmatrix} x \\ \theta \\ u \end{bmatrix}_0^k, \theta_{k+1} \right) = P(x_{k+1} = n \mid x_k, \theta_k, u_k), \quad \forall k \in \mathbb{Z}_+, \tag{2.4a}$$

$$P(x_0 = n \mid \theta_0) = P(x_0 = n). \tag{2.4b}$$

We further remark that a simple example of a system satisfying P1, P2, P3 is given as follows. Let w and θ be respectively R^n and Θ valued discrete time processes, where w is independently and identically distributed and θ is a Markov process which is independent of w. Suppose a non-anticipative control law u is defined by the Borel measurable functions u : Z+ × X^{Z+} → U, where u_k(x) = g(k, x_0^k), k ∈ Z+, and x ∈ X^{Z+}. Then, given a family of continuous (and hence Borel measurable) functions F_k : X × U × Θ × R^n → X and an initial random variable x_0, we obtain a state process x via the recursion

$$x_{k+1} = F_k(x_k, u_k, \theta_k, w_k), \quad k \in \mathbb{Z}_+. \tag{2.5}$$

It may be verified that [x; θ] satisfies P1-P3. Finally we add:

P4. Time Invariance

On assigning any values [n'; m'] to [x_k; θ_k] on the right hand side of (2.1), and any value m' to θ_k on the right hand side of (2.2), the resulting values of the conditional probabilities are independent of k ∈ Z+.

In the light of axiom P4 we shall write

$$p^{\theta}_{n,m} = P(\theta_{k+1} = n \mid \theta_k = m), \quad \forall k \in \mathbb{Z}_+.$$

Furthermore, we will use the following notation throughout this paper. A conditional probability will be understood to be a function from some appropriate product space into [0, 1]. For example, the conditional probability P(θ_{j_3} | x_{j_2}, θ_{j_1}), j_1, j_2, j_3 ∈ Z+, will represent a function which maps Θ × X × Θ into [0, 1], and furthermore we may make use of Bayes rule to make calculations such as

$$P(\theta_{j_3} \mid x_{j_2}, \theta_{j_1}) = \frac{P(\theta_{j_3}, \theta_{j_1} \mid x_{j_2})}{P(\theta_{j_1} \mid x_{j_2})}.$$

To introduce the control problem examined in this paper we first consider the case where θ is a constant process and u_k is permitted to be a function of [x_s]_0^k and the value of θ. Let the control objective be to minimize

$$J_K(u) = E\left( \sum_{k=0}^{K} \sum_{i=1}^{M} \sum_{j=1}^{N} c_{ij}\, 1_{\{u_k = i,\, x_k = j\}} \right), \tag{2.6}$$

where (u)_1^K = {u_1, ..., u_K}, u_j ∈ F_j^x, 1_{{u=i, x=j}} is the indicator function of the control-state pair (i, j) ∈ U × X, and where c_{ij} ≥ 0, 1 ≤ i ≤ M, 1 ≤ j ≤ N. Then we have a standard Markov process control problem which may be solved by the techniques of dynamic programming. It is well known that the resulting optimal control function u_k^0 = u_k^0([x_s]_0^k, θ), 1 ≤ k ≤ K, will be a Markovian control law, i.e. a state feedback control function of the form u_k^0 = u_k^0(x_k, θ), 1 ≤ k ≤ K.

An asymptotic sample path version of the minimization of (2.6) is the minimization of

$$J_\theta(u) \triangleq \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K} \left\{ \sum_{i=1}^{M} \sum_{j=1}^{N} c_{ij}\, 1_{\{u_k = i,\, x_k = j\}} \right\}$$

over the state feedback control laws of the form u_k = u(x_k), k ∈ Z+. When the control process u is of this form it is evident from P1 that x is a Markov process with stationary transition probabilities. Let us assume the process has only one ergodic class; then by the Ergodic Theorem for Finite Markov Chains (see e.g. [Feller, 1950])

$$J_\theta(u) = E_\infty\left( \sum_{i=1}^{M} \sum_{j=1}^{N} c_{ij}\, 1_{\{u(j) = i,\, x = j\}} \right) = \sum_{j=1}^{N} c_{u(j), j}\, P_\infty(x = j) \quad \text{w.p.1.}$$
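For a fixed parameter value and a fixed state feedback law u this last calculation is elementary to carry out numerically. The following Python sketch (the transition matrix and costs are arbitrary illustrative data, not taken from this paper) computes the stationary distribution of the closed loop chain and the resulting ergodic cost, assuming a single ergodic class.

```python
import numpy as np

# Illustrative sketch: ergodic cost J_theta(u) of a fixed state feedback law
# u : X -> U for one fixed parameter value theta, computed from the
# stationary distribution of the closed loop chain. All data are
# hypothetical examples.

# P[i, j] = P(x_{k+1} = j+1 | x_k = i+1) under the feedback law u
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
c = np.array([1.0, 0.5, 2.0])   # c[j] = c_{u(j+1), j+1}: cost in state j+1 under u

# Stationary distribution: normalized left eigenvector of P for eigenvalue 1,
# unique under the single ergodic class assumption.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

J = float(c @ pi)               # J_theta(u) = sum_j c_{u(j), j} P_inf(x = j)
print("stationary distribution:", pi, " ergodic cost J_theta(u):", J)
```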


Since here the minimization over the control laws u : X → U is a minimization over the finite set U^X, the minimum of J_θ(u) over u ∈ U^X exists for at least one u^0. As stated in Section 1, the work of Mandl, Borkar and Varaiya, and Kumar and Becker treats the adaptive version of this problem where θ is unknown (i.e. the control cannot be an explicit function of θ). In these papers θ is estimated by the maximum likelihood method, or by modifications of it, and in [Kumar and Becker, 1982] it is shown that the resulting certainty equivalence adaptive control law is asymptotically optimal.

In this paper we address the problem where x and θ are jointly distributed random processes. The class of admissible control laws consists of those laws that are time invariant continuous functions from the state space, and the set of posterior probabilities p_{θ,k}(j) = P(θ_k = j | x_0^k) of θ given the observations x_0^k, to a compact subset U of R^ν, ν ∈ Z+. Formally,

$$u : X \times C^M \to U, \qquad u : (x_k, p_{\theta,k}) \mapsto u_k, \tag{2.7}$$

where p_{θ,k} = (p_{θ,k}(1), ..., p_{θ,k}(M)),

$$C^M = \Big\{ p \in \mathbb{R}^M : p(j) \ge 0,\ 1 \le j \le M,\ \sum_{j=1}^{M} p(j) = 1 \Big\}$$

is the unit simplex, and u is a continuous function from X × C^M to U, where X is given the discrete topology (i.e. all points in X are open sets). The control problem we wish to solve is the minimization (when a minimum exists) of

$$\lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K} c(x_{k+1}, x_k, u_k) \tag{2.8}$$

(when the limit exists), where c is a bounded measurable function from X × X × U to R.
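A concrete instance of an admissible law in the class (2.7) may be helpful. The hard certainty equivalence choice θ̂_k = argmax_j p_{θ,k}(j), composed with a per-parameter feedback law, is not continuous in p_{θ,k}; averaging the per-parameter laws against the posterior is. The following sketch is purely illustrative: the array u0 and the convexity of U (so that the average remains in U) are assumptions, not part of the paper.

```python
import numpy as np

# Hypothetical sketch of an admissible law in the class (2.7):
# u(x, p) must be continuous in the posterior p on the simplex C^M.

M, N = 2, 3
u0 = np.array([[0.0, 1.0],      # u0[x, m]: an illustrative feedback value
               [0.5, 0.2],      # for state x+1 and parameter value m+1
               [1.0, 0.0]])

def u(x: int, p: np.ndarray) -> float:
    """Admissible law: posterior-averaged feedback, continuous in p."""
    return float(u0[x] @ p)     # convex combination, stays in U if U is convex

p = np.array([0.3, 0.7])        # an example posterior over Theta = {1, 2}
print(u(0, p))                  # control applied in state x = 1 (0-indexed)
```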


3. MAIN RESULTS

Lemma 3.1. Consider the process (x, θ, u) satisfying P1-P4, where u is given by (2.7). Then the process

$$\Phi_k \triangleq \begin{bmatrix} \theta_k \\ x_k \\ p_{\theta,k} \end{bmatrix}, \quad k \in \mathbb{Z}_+, \tag{3.1}$$

is a Markov process with stationary transition probabilities on the state space S ≜ Θ × X × C^M. Furthermore, this remains true if p_{θ,0} is taken to be any F_0^x measurable distribution (in which case {p_{θ,k}; k ∈ Z+} is not necessarily a sequence of conditional probability distributions).

Proof. First we derive the recursive filter generating p_{θ,k}, k ∈ Z+. By repeated applications of Bayes rule and assumptions P1 through P4 we may derive a recursive formula for the conditional distribution P(θ_k | x_0^k). For k ≥ 1,

$$P(\theta_k \mid x_0^k) = \frac{P(\theta_k, x_k \mid x_0^{k-1})}{P(x_k \mid x_0^{k-1})} = \frac{P(x_k \mid \theta_k, x_0^{k-1})\, P(\theta_k \mid x_0^{k-1})}{P(x_k \mid x_0^{k-1})} \quad \text{(by Bayes rule).} \tag{3.2}$$

Expanding the conditional probability of x_k given above and using equation (2.4) gives

$$P(x_k \mid \theta_k, x_0^{k-1}) = \sum_{l=1}^{M} P(x_k \mid \theta_k, x_0^{k-1}, \theta_{k-1} = l)\, P(\theta_{k-1} = l \mid \theta_k, x_0^{k-1}) = \sum_{l=1}^{M} P(x_k \mid x_{k-1}, u_{k-1}, \theta_{k-1} = l)\, P(\theta_{k-1} = l \mid \theta_k, x_0^{k-1}).$$

Applying Bayes rule twice more gives

$$P(x_k \mid \theta_k, x_0^{k-1}) = \sum_{l=1}^{M} P(x_k \mid x_{k-1}, u_{k-1}, \theta_{k-1} = l)\, P(\theta_k \mid x_0^{k-1}, \theta_{k-1} = l)\, P(\theta_{k-1} = l \mid x_0^{k-1})\, \big( P(\theta_k \mid x_0^{k-1}) \big)^{-1}, \tag{3.3}$$

and we note that P(θ_k | x_0^{k-1}, θ_{k-1} = l) = p^θ_{θ_k, l} by P2. This yields the recursive scheme for {p_{θ,k}(j) ≜ P(θ_k = j | x_0^k); 1 ≤ j ≤ M, k ∈ Z+} given by

$$p_{\theta,k}(j) = \frac{\displaystyle\sum_{l=1}^{M} P(x_k \mid x_{k-1}, u_{k-1}, \theta_{k-1} = l)\, p^{\theta}_{j,l}\, p_{\theta,k-1}(l)}{P(x_k \mid x_0^{k-1})}, \quad 1 \le j \le M. \tag{3.4}$$

Set

$$p_{\theta,k} = \begin{bmatrix} p_{\theta,k}(1) \\ \vdots \\ p_{\theta,k}(M) \end{bmatrix} \in \mathbb{R}^M, \quad k \in \mathbb{Z}_+,$$

and define the M × M matrix M_k, k ∈ Z+, by

$$[M_k]_{j,l} = P(x_{k+1} \mid x_k, u_k, \theta_k = l)\, p^{\theta}_{j,l}, \quad 1 \le j, l \le M. \tag{3.5}$$

(Note that the quantities in (3.5) are specified in P1 and P2.) Then (3.4) may conveniently be written

$$p_{\theta,k} = \frac{M_{k-1}\, p_{\theta,k-1}}{\lvert M_{k-1}\, p_{\theta,k-1} \rvert}, \quad k \ge 1, \tag{3.6}$$

where |a| denotes the sum of the positive entries of an M component vector a, and for k = 0

$$p_{\theta,0}(l) = P(\theta_0 = l \mid x_0).$$
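The recursion (3.6) is immediate to implement. In the following Python sketch the model data A (the state transition matrices for each parameter value) and Ptheta (the matrix of p^θ entries) are hypothetical placeholders, and a fixed control law is absorbed into A for brevity.

```python
import numpy as np

# Minimal sketch of one step of the filter recursion (3.6). All model data
# are hypothetical. A[m][i, j] = P(x_{k+1} = j+1 | x_k = i+1, theta_k = m+1)
# (control suppressed); Ptheta[j, l] = p^theta_{j+1, l+1}.

M = 2
A = np.array([[[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.2, 0.2, 0.6]],
              [[0.9, 0.05, 0.05],
               [0.3, 0.4, 0.3],
               [0.1, 0.1, 0.8]]])
Ptheta = np.array([[0.95, 0.10],
                   [0.05, 0.90]])

def filter_step(p, x_prev, x_next):
    """p_{theta,k} = M_{k-1} p_{theta,k-1} / |M_{k-1} p_{theta,k-1}|, eq. (3.6)."""
    # [M_{k-1}]_{j,l} = P(x_k | x_{k-1}, u_{k-1}, theta_{k-1} = l) * p^theta_{j,l}
    Mk = np.array([[A[l][x_prev, x_next] * Ptheta[j, l] for l in range(M)]
                   for j in range(M)])
    q = Mk @ p
    return q / q.sum()          # |a| = sum of the entries of a

p = np.array([0.5, 0.5])        # p_{theta,0}: any F_0^x measurable prior
for x_prev, x_next in [(0, 1), (1, 2), (2, 2)]:   # an observed trajectory of x
    p = filter_step(p, x_prev, x_next)
print("posterior p_theta:", p)
```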

We note that if p_{θ,0}(l) ≠ P(θ_0 = l | x_0) but is F_0^x measurable, then the recursion above can still be carried out, although it will not be generating the conditional distribution of θ. We verify that the Markov property

$$E[f(\Phi_{k+1}) \mid \Phi_0^k] = E[f(\Phi_{k+1}) \mid \Phi_k]$$

holds for all bounded measurable functions f : S → R. Because p_{θ,k} ∈ F_k^x, k ∈ Z+, we have

$$E[f(\Phi_{k+1}) \mid \Phi_0^k] = E[f(\Phi_{k+1}) \mid x_0^k, \theta_0^k, p_{\theta,k}] = E\left[ f\left( \begin{bmatrix} \theta_{k+1} \\ x_{k+1} \\ p_{\theta,k+1} \end{bmatrix} \right) \,\middle|\, x_0^k, \theta_0^k, p_{\theta,k} \right].$$


Since by (3.5) and (3.6) p_{θ,k+1} is a function of x_k, x_{k+1}, and p_{θ,k}, given by

$$p_{\theta,k+1} = \frac{M_k\, p_{\theta,k}}{\lvert M_k\, p_{\theta,k} \rvert},$$

an application of Axiom P1 shows that

$$E[f(\Phi_{k+1}) \mid \Phi_0^k] = E[f(\Phi_{k+1}) \mid x_k, \theta_k, p_{\theta,k}] = E[f(\Phi_{k+1}) \mid \Phi_k], \quad k \in \mathbb{Z}_+.$$

Hence {Φ_k; k ∈ Z+} is indeed a Markov process. We conclude that this Markov process has stationary transition probabilities by Axiom P4 and the observation that M_k is a fixed matrix-valued function of x_k, x_{k+1}, and p_{θ,k}. □


The next step is to prove that an invariant measure exists. First we topologize S with the product topology which results when Θ and X are given the counting topology (where each point is an open set) and C^M is given the topology induced by the usual one on R^M. The set S is compact and it may be verified that Φ has the Feller property. Hence we may apply a theorem of [Benes, 1968] to prove that an invariant measure μ_∞ exists.

For a measure μ on a measurable space (S, B) and a Markov transition function P (see [Doob, 1953, Chapter 3]), let Σ_I^μ denote the σ-field of invariant sets, that is, the sets A ∈ B with the property

$$P(s, A) = 1_A(s) \quad \text{a.e. } [\mu].$$

Then, because Lemma 3.1 holds and an invariant measure exists, we may apply the ergodic theorem for discrete time parameter Markov processes (see [Doob, 1953, Chapter 5]) to the problem under examination. Let μ_∞ be an invariant measure on S, let P_∞ be the corresponding invariant measure on the sample space S^{Z+}, let G denote the σ-field generated by the cylinder sets in S^{Z+}, and let Σ_I' ⊂ G be the σ-field consisting of the sets A^{Z+} for A ∈ Σ_I^{μ_∞}. Let σ(Φ) denote the shift [σ(Φ)]_τ = Φ_{τ+1}, τ ∈ Z+, and let Φ(Φ_0) denote the Markov process with initial condition Φ_0. Then we have

Theorem 3.1. For any f ∈ L^1(S^{Z+}, G, P_∞),

$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} f(\sigma^k(\Phi)) = E[f \mid \Sigma_I']$$

almost surely [P_{Φ_0}] for almost all Φ_0 [μ_∞], or almost surely [P_{Φ_0}] whenever the distribution of Φ_0 is absolutely continuous with respect to μ_∞. □


Corollary to Theorem 3.1. Let l_c be the bounded measurable function from (X × X × U)^{Z+} to R given by evaluating c at the initial coordinate,

$$l_c(\omega) = c(\omega_0), \qquad \omega = (\omega_0, \omega_1, \dots) \in (X \times X \times U)^{\mathbb{Z}_+},$$

where c is a bounded measurable function from X × X × U to R. Then, for every certainty equivalence adaptive control law in the class (2.7) applied to a process satisfying P1-P4,

$$\lim_{N \to \infty} \frac{1}{N} \sum_{k=0}^{N-1} c(x_{k+1}, x_k, u_k) = E[l_c \mid \Sigma_I'] \tag{2.13}$$

almost surely [P_∞] for almost all Φ_0 [μ_∞]. □

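As a numerical illustration of the corollary (it forms no part of the original development), the closed loop hyper-state process may be simulated and the sample average of a bounded cost observed to settle to a limit. In the sketch below the transition data, the admissible control law and the cost function are all hypothetical; the filter step implements (3.6).

```python
import numpy as np

# Hypothetical closed loop simulation: theta evolves exogenously (P2), x is
# controlled (P1), u_k = u(x_k, p_theta_k) is an admissible law (2.7), and
# the running average of c(x_{k+1}, x_k, u_k) is tracked, cf. (2.13).

rng = np.random.default_rng(0)
M, N = 2, 3
# P(x_{k+1} = j | x_k = i, u, theta = m) = (1 - u) A0[m][i, j] + u A1[m][i, j]
A0 = np.array([[[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]],
               [[0.3, 0.4, 0.3], [0.3, 0.4, 0.3], [0.3, 0.4, 0.3]]])
A1 = np.array([[[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]],
               [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]])
Ptheta = np.array([[0.95, 0.10],
                   [0.05, 0.90]])         # p^theta_{j, l}

def trans(x, u, m):                        # row of the controlled kernel
    return (1.0 - u) * A0[m][x] + u * A1[m][x]

def u_law(x, p):                           # admissible law (2.7), continuous in p
    return float(p[1])                     # U = [0, 1] here

def cost(x1, x0, u):                       # bounded measurable c : X x X x U -> R
    return float(x1 == x0) + 0.1 * u

x, m = 0, 0
p = np.full(M, 1.0 / M)                    # p_{theta,0}
avg, K = 0.0, 100_000
for k in range(K):
    u = u_law(x, p)
    m_next = rng.choice(M, p=Ptheta[:, m])
    x_next = rng.choice(N, p=trans(x, u, m))
    # filter step (3.6): [M_k]_{j,l} = P(x_{k+1} | x_k, u_k, theta_k = l) p^theta_{j,l}
    Mk = np.array([[trans(x, u, l)[x_next] * Ptheta[j, l] for l in range(M)]
                   for j in range(M)])
    q = Mk @ p
    p = q / q.sum()
    avg += (cost(x_next, x, u) - avg) / (k + 1)   # running ergodic average
    x, m = x_next, m_next
print("time-average cost after", K, "steps:", avg)
```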

REFERENCES

Benes, V.E. (1968). "Existence of Finite Invariant Measures for Markov Processes". J. Appl. Prob., 5, p.203.

Billingsley, P. (1968). Convergence of Probability Measures. John Wiley, N.Y.

Borkar, V. and P. Varaiya (1979). "Adaptive Control of Markov Chains, I: Finite Parameter Set". IEEE Trans. Aut. Control, AC-24, No. 6, p.953.

Caines, P.E. and H.F. Chen (1985). "Optimal Adaptive LQG Control for Systems with Finite State Process Parameters". IEEE Trans. Aut. Control, AC-30, p.185.

Caines, P.E., S.P. Meyn and A. Aloneftis (1986). "Stochastic Stability and the Ergodic Theory of Markov Processes with Applications to Adaptive Control". IFAC Conference on System Identification and Parameter Estimation, Lund, Sweden, July 1986.

Chen, H.F. (1984). "Recursive System Identification and Adaptive Control by Use of the Modified Least Squares Algorithm". SIAM J. Contr. Optimiz., 22, No. 5, p.758.

Chen, H.F. and P.E. Caines (1985). "On the Adaptive Control of a Class of Systems with Random Parameters and Disturbances". Automatica, 21, No. 6, p.737.

Doob, J.L. (1953). Stochastic Processes. John Wiley, N.Y.

Feller, W. (1950). An Introduction to Probability Theory and its Applications, Vol. 1. John Wiley, N.Y.

Goodwin, G.C., P.J. Ramadge and P.E. Caines (1980). "Discrete Time Multivariable Adaptive Control". IEEE Trans. Aut. Control, AC-25, p.449.

Goodwin, G.C., P.J. Ramadge and P.E. Caines (1981). "Discrete Time Stochastic Adaptive Control". SIAM J. Control Optimization, 19, p.829. Corrigendum: 20, No. 6, 1982, p.893.

Kumar, P.R. and A. Becker (1982). "A New Family of Optimal Adaptive Controllers for Markov Chains". IEEE Trans. Aut. Control, AC-27, p.137.

Mandl, P. (1974). "Estimation and Control in Markov Chains". Advances in Appl. Prob., 6, pp.40-60.

Meyn, S.P. and P.E. Caines (1987). "A New Approach to Stochastic Adaptive Control". IEEE Trans. Aut. Control, AC-32, No. 3, March 1987.

Sin, K.S. and G.C. Goodwin (1982). "Stochastic Adaptive Control Using a Modified Least Squares Algorithm". Automatica, 18, No. 3, p.315.