Automatica, Vol. 7, pp. 351-358. Pergamon Press, 1971. Printed in Great Britain.
Steady-State Optimal Control of Finite-State Machines*

P. DORATO‡
Optimality conditions for the feedback control of finite-state machines operating over an infinite time interval are especially applicable to systems whose closed-loop response is inherently periodic.

Summary: The basic problem considered here is the optimal control of finite-state machines operating over an infinite time interval. This is essentially a deterministic version of the theories of HOWARD [4] and EATON and ZADEH [5] on the control of Markov chains.† The deterministic theory presents certain special problems not found in the stochastic theory. This is especially so for finite-state machines whose closed-loop dynamics always include periodic motions, referred to here as cyclic machines. This class of dynamical system is of interest in economic and biological systems, where the system behavior is often inherently periodic. Necessary and sufficient conditions are given for optimal control. However, the necessity condition, given in Theorem 4, requires more restrictive assumptions than the sufficiency condition, given in Theorem 2. Also, while both cyclic machines and absorbing-state machines (in which all states are transient except for one absorbing state) are considered, the major emphasis is on cyclic machines.

INTRODUCTION
THE FINITE-STATE machine model for dynamical systems is becoming increasingly popular in control system theory. See for example the recent text of KALMAN, FALB and ARBIB [1] and the paper of ARBIB and ZEIGER [2]. There are three important reasons for this. One is that the finite-state model is in many ways the most suitable model to assume if computations are ultimately to be done on a digital computer. A second reason is that in the areas of biological, social, and economic systems, the finite-state model is a most natural model. Finally, the finite-state model sometimes provides insight into non-linear systems which a continuous model cannot easily provide.

In this paper we pursue the problem of optimally controlling a finite-state machine over an infinite time interval. The theory developed here is essentially a deterministic version of the theory developed by HOWARD [4] and EATON and ZADEH [5] on the control of Markov chains. Although some of the Markov-chain theory specializes easily to the deterministic case, certain results are peculiar to the deterministic case and require special development. This is especially true for the class of systems referred to here as "cyclic machines"; that is, systems whose closed-loop behavior has unavoidable periodic motions. One then attempts to choose the best periodic motion (a limit cycle, in the case of continuous models). This type of behavior may be found in economic systems [9, 10], heat-control systems [11], and in certain chemical process-control systems [3]. The general approach and notation used here follow those found in WONHAM [6].

* Received 22 April 1970; revised 27 October 1970. The original version of this paper was not presented at any IFAC meeting. It was recommended for publication in revised form by associate editor L. Meier.
† After completing the present study it came to the author's attention that related work on deterministic machines had been done by TRAIGER and GILL [13]. However, a basic difference between the present paper and Ref. [13] is that the former deals with closed-loop control while the latter deals with open-loop control. A significant difference in the statement of optimality conditions results between these two cases.
‡ Department of Electrical Engineering, Polytechnic Institute of Brooklyn, Brooklyn, New York.

FINITE-STATE MODEL AND PERFORMANCE INDEX
In a finite-state machine model, the state, denoted $x(n)$, and control input, denoted $u(n)$, assume at each instant of time $n$, $n = 0, 1, 2, \ldots$, values from a finite set. The state set is denoted $X$ and has elements numbered $1, 2, \ldots, J$. The control input set is denoted $U$ and has elements numbered $0, 1, \ldots, M$. For convenience the elements of the sets $X$ and $U$ are actually taken to be the above-indicated integers. The dynamics of the machine are specified by the next state function $\delta(x, u)$, i.e.

$$x(n+1) = \delta(x(n), u(n)). \tag{1}$$

It is assumed here that the machine is stationary, i.e. $\delta$ does not depend explicitly on $n$.
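As a minimal sketch of equation (1), the next state function can be stored as a lookup table and iterated over an input sequence. The table `DELTA` below is an assumption: it is the three-state, two-input machine inferred from the transitions used later in Example 2, and the name `state_path` is hypothetical.

```python
# Hypothetical next-state table delta(x, u), inferred from the transitions
# that appear in the worked examples (states 1..3, inputs 0..1).
DELTA = {
    (1, 0): 3, (1, 1): 2,
    (2, 0): 1, (2, 1): 3,
    (3, 0): 2, (3, 1): 1,
}

def state_path(x0, inputs):
    """Iterate equation (1): x(n+1) = delta(x(n), u(n))."""
    path = [x0]
    for u in inputs:
        path.append(DELTA[(path[-1], u)])
    return path

print(state_path(1, [1, 0, 0, 1]))
```

Because the machine is stationary, a single table indexed by (state, input) suffices; a time-varying machine would need the time index as a third key.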
A common representation for a stationary finite-state machine is the state diagram illustrated in Fig. 1. The input shown by each arrow indicates the state transition induced by that particular input. There is, of course, a one-to-one correspondence between the state diagram representation and the next state function $\delta(x, u)$.

FIG. 1. State diagram example $X = (1, 2, 3)$, $U = (0, 1)$.

A state feedback control law may be defined on a finite-state machine as a function $\phi(x, n)$, i.e.

$$u(n) = \phi(x(n), n). \tag{2}$$

In particular for a stationary control law, $u(n) = \phi(x(n))$, also denoted $u$, where

$$u_i = \phi(i), \quad i \in X. \tag{3}$$

The closed-loop dynamics for a given control law are illustrated in Fig. 2.

    i      1   2   3
    u_i    1   0   1

FIG. 2. Closed-loop state diagram for machine of Fig. 1.

Note that in the closed-loop system the system behavior is completely determined by the initial state. The control laws considered here will be restricted to be state feedback control laws.

It is assumed that the performance of a closed-loop machine is satisfactorily measured by the sum of loss functions $L(x, u)$, i.e.

$$V_n = \sum_{m=0}^{n} L(x(m), u(m)), \tag{4}$$

where $L(x, u)$ is a bounded non-negative real valued function of state and control input. Ultimately the object will be to minimize $V_n$ as $n$ approaches infinity, i.e. steady-state control. The loss function is assumed stationary, and the class of control laws considered here will be restricted to stationary control laws. With the other stationarity assumptions made here and with $n$ approaching infinity, it can be shown that this last assumption is not actually restrictive.

The steady-state control problem naturally divides into two cases, depending on the behavior of the closed-loop system and the properties of the loss function $L(x, u)$. If the closed-loop system has an absorbing state (a state which once entered is never left), the loss function is zero for this state, and every state leads to the absorbing state, then the steady-state problem essentially reduces to a first-entrance type of problem; that is, the limit of (4) as $n$ approaches infinity becomes

$$V_\infty = \sum_{m=0}^{\tau_{ia}} L(x(m), u(m)), \tag{5}$$

where $\tau_{ia}$ is the first entrance time to the absorbing state, $a$, starting from state $i$. For this class of problems it is convenient to augment the state set to $X_a = (a, X)$.

In the second class of problems, the closed-loop machine is assumed to be ultimately periodic, that is, periodic after some finite time. Machines with this property will be referred to as cyclic machines. In addition, it is assumed that the loss function is not identically zero for a complete cycle. Figure 3 illustrates these two classes of machines.

FIG. 3. Examples of absorbing-state and cyclic closed-loop machines: (a) absorbing-state machine; (b) cyclic machine.

Note that, in keeping with common usage [6], a machine with a cycle of period one, as in an absorbing-state machine, is not considered a cyclic machine. The period for cyclic machines must be greater than one. The set of states in a loop are referred to as the loop states, denoted $L$, and the period of the loop is denoted $P$. For example, for the cyclic machine in Fig. 3, $L = (1, 2)$ and $P = 2$. It is further assumed that the machine has only a single loop. Finally, it should be stressed that the separation of machines into cyclic and absorbing-state machines is based on the closed-loop properties of the machines. The determination of these closed-loop properties from the next state function is in general a nontrivial problem.
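For a single given control law, the loop reached from a starting state can at least be found by direct simulation. The following is a rough sketch under stated assumptions: `DELTA` is the machine inferred from the later examples, the function names are hypothetical, and the classification looks only at the period of the loop reached from one starting state (it does not check the zero-loss requirement for absorbing states, nor the single-loop assumption).

```python
# Hypothetical next-state table inferred from the worked examples.
DELTA = {
    (1, 0): 3, (1, 1): 2,
    (2, 0): 1, (2, 1): 3,
    (3, 0): 2, (3, 1): 1,
}

def find_loop(control, start):
    """Follow the closed-loop path from `start` until a state repeats.

    Returns the loop states in the order they are visited.
    """
    first_visit = {}
    path = []
    x = start
    while x not in first_visit:
        first_visit[x] = len(path)
        path.append(x)
        x = DELTA[(x, control[x])]
    return path[first_visit[x]:]

def classify(control, start):
    """Label the loop reached from `start` by its period only."""
    loop = find_loop(control, start)
    kind = "absorbing" if len(loop) == 1 else "cyclic"
    return kind, loop

u = {1: 1, 2: 0, 3: 1}   # stationary control law u_i of Fig. 2
print(classify(u, 3))    # loop states and period P = len(loop)
```

Simulation answers the question for one control law at a time; it does not address the harder problem of characterizing the loops over a whole class of control laws from the open-loop machine.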
The multiloop problem is not considered here. Also, linear sequential machines cannot be cyclic machines since the zero state is always an absorbing state [12]; hence cyclic machines are inherently nonlinear machines.

For cyclic machines the limit of $V_n$ as $n$ approaches infinity is infinite, hence a modified performance index must be introduced. An obvious choice is

$$V_n = \frac{1}{n} \sum_{m=0}^{n} L(x(m), u(m)), \tag{6}$$

which, for a cyclic machine, becomes equivalent, in the limit as $n$ approaches infinity, to

$$V_\infty = \frac{1}{P} \sum_{m=1}^{P} L(x(m), u(m)), \tag{7}$$

with a loop starting state. For cyclic machines then, (7) is taken as the relevant performance index.

PERFORMANCE ANALYSIS OF CLOSED-LOOP MACHINES

In this section the value of the performance index for a given control law is related to certain equations. These equations are then used in the next section in the statement of necessary and sufficient conditions for optimality. Proofs of the various lemmas and theorems may be found in the Appendix.

Following WONHAM [6] we define an operator $\mathcal{L}$, operating on bounded real valued functions* $v(j)$, $j \in X$, as follows:

$$\mathcal{L} v(i) = v(\delta(i, u_i)) - v(i). \tag{8}$$

Note that $\mathcal{L} v(i)$ represents a one-step change in the value of the function $v$ as the machine moves from state $i$ with feedback control $u_i$. The dependency of $\mathcal{L}$ on $u_i$ is denoted, when required, by writing $\mathcal{L}(u_i)$ in place of $\mathcal{L}$.

* Functions $v(j)$, $j \in X$, will be denoted $v$ in the usual way.

Lemma 1
For an absorbing-state problem

$$v(i) = \sum_{m=0}^{\tau_{ia}} L(x(m), u(m)), \quad x(0) = i, \quad i \in X_a, \tag{9}$$

satisfies the equation

$$\mathcal{L}(u_i) v(i) + L(i, u_i) = 0, \quad i \in X_a, \tag{10}$$

with boundary condition

$$v(a) = 0, \tag{11}$$

where $x$ is the state path corresponding to the starting state $i$ and feedback control $u$, and $\tau_{ia}$ is the first entrance time into the absorbing state $a$ starting from state $i$. Conversely, if $v$ is a function satisfying (10) and (11) and $x$ is the state path corresponding to the control law $u$ and starting state $i$, then $v$ is given, uniquely, by (9).

Functions $v$ which satisfy an equation like (10) are referred to as potentials [6]. The potential of Lemma 1 has a direct interpretation, in the absorbing-state case, as the performance index value (5) for the control law $u$, starting in state $i$.

Lemma 2
For a cyclic-machine problem, if $v(j)$ satisfies the equation

$$\mathcal{L}(u_i) v(i) + L(i, u_i) = \lambda, \quad i \in X, \tag{12}$$

then

$$\lambda = \frac{1}{P} \sum_{m=1}^{P} L(x(m), u(m)), \quad x(0) \in L, \tag{13}$$

where $x$ is the state path corresponding to the control law $u$ and any starting state in the loop $L$ generated by the above control law. Hereafter $\lambda$ will be referred to as the loop loss. In addition, a potential $v$ which satisfies (12) is given by

$$v(i) = \sum_{m=0}^{\tau_{ik} - 1} \bigl( L(x(m), u(m)) - \lambda \bigr), \quad i \in X - \{k\}; \qquad v(k) = 0, \quad k \in L, \tag{14}$$

where $x$ is the state path corresponding to control law $u$ starting in state $i \in X - \{k\}$, and $\tau_{ik}$ is the first entrance time to state $k$, which is assumed to be in the loop $L$, starting in state $i$.

Given a control law $u$, expression (14) can then be used to evaluate a potential which satisfies (12). This will be useful in the next section where necessary and sufficient conditions for optimality are given. Note that in the cyclic-machine case the performance index is given by $\lambda$ and not the potential $v(i)$. Note also that the potential given by (14) is not the only solution to (12). Since $\mathcal{L} c = 0$ for any constant $c$, the solution to (12) may contain an arbitrary constant. However, the value of $\lambda$ is unique, since the loop loss, given by (13), is unique. Expression (14) follows directly from Lemma 1. The difference in the upper limit of the sum in (9) and (14) arises simply because in the absorbing-state case the loss function is zero for the absorbing state, while in the cyclic-machine case the loss is generally non-zero for loop states. It should be recalled that $k$ is a loop state and not an absorbing state in expression (14). The following example illustrates the computation of the potential $v$ in Lemma 2.
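Before the worked example, expressions (13) and (14) can be sketched in code. This is an illustrative sketch only: the tables `DELTA` and `LOSS` are assumptions, transcribed from the machine and loss function used in Examples 1 and 2, the function names are hypothetical, and the single-loop assumption of the text is relied upon so that every state eventually reaches the chosen loop state `k`.

```python
from fractions import Fraction as F

# Hypothetical tables for the three-state machine of the worked examples.
DELTA = {(1, 0): 3, (1, 1): 2, (2, 0): 1, (2, 1): 3, (3, 0): 2, (3, 1): 1}
LOSS = {(1, 0): F(1), (1, 1): F(3, 4), (2, 0): F(3, 4),
        (2, 1): F(1), (3, 0): F(0), (3, 1): F(1)}

def loop_states(control, start):
    """Loop reached from `start` under the stationary law `control`."""
    first_visit, path, x = {}, [], start
    while x not in first_visit:
        first_visit[x] = len(path)
        path.append(x)
        x = DELTA[(x, control[x])]
    return path[first_visit[x]:]

def loop_loss(control, start):
    """Expression (13): average loss over one period of the loop."""
    loop = loop_states(control, start)
    return sum(LOSS[(i, control[i])] for i in loop) / len(loop)

def potential(control, k):
    """Expression (14): sum of (L - lambda) from i until k is first entered."""
    lam = loop_loss(control, k)
    v = {}
    for i in control:
        total, x = F(0), i
        while x != k:                      # empty sum when i == k, so v(k) = 0
            total += LOSS[(x, control[x])] - lam
            x = DELTA[(x, control[x])]
        v[i] = total
    return lam, v

def satisfies_12(control, lam, v):
    """Check equation (12) at every state under the given control law."""
    return all(v[DELTA[(i, u)]] - v[i] + LOSS[(i, u)] == lam
               for i, u in control.items())

lam, v = potential({1: 1, 2: 0, 3: 1}, k=1)
print(lam, v, satisfies_12({1: 1, 2: 0, 3: 1}, lam, v))
```

Exact rational arithmetic is used so that the equality in (12) can be tested without floating-point tolerance.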
Example 1
Consider the machine of Fig. 1. The problem is to compute $\lambda$, and $v$ which satisfies (12), for the control law and the loss function given below:

    i    u_i    L(i, 0)    L(i, 1)
    1     1       1         3/4
    2     0      3/4         1
    3     1       0          1

The closed-loop system is a cyclic machine and is shown in Fig. 2. Thus $L = (1, 2)$, $P = 2$, and $\lambda$ is given by

$$\lambda = \frac{L(1, 1) + L(2, 0)}{2} = \frac{3}{4}.$$

Let the state $k$ required in Lemma 2 be the loop state 1; then $\tau_{21} = 1$ and $\tau_{31} = 1$, and the potential from (14) is

$$v(1) = 0, \qquad v(2) = L(2, 0) - 3/4 = 3/4 - 3/4 = 0, \qquad v(3) = L(3, 1) - 3/4 = 1 - 3/4 = 1/4.$$

One can verify by direct substitution that the above potential does indeed satisfy (12).

OPTIMAL CONTROL

In this section some theorems giving necessary conditions and sufficient conditions for optimality are presented. An approximation-in-policy-space [4, 6] algorithm is also outlined.
Theorem 1. Sufficiency Condition for Absorbing-State Machines. If the potential $v^0$ corresponding to the control law $u^0$ is such that the inequality

$$0 \le \mathcal{L}(u_i) v^0(i) + L(i, u_i), \tag{15}$$

is satisfied for all $i \in X$ and all $u_i \in U$, then the control law $u^0$ is optimal, i.e.

$$\sum_{m=0}^{\tau^0_{ia}} L(x^0(m), u^0(m)) \le \sum_{m=0}^{\tau_{ia}} L(x(m), u(m)), \tag{16}$$

for all $u$, where the zero superscript indicates values corresponding to the control law $u^0$ and no superscript represents values corresponding to the control law $u$. It should be noted that in both cases the machine is started in the same state $i$, and the inequality (16) is satisfied for all $i \in X$.

Theorem 1 also follows directly from the stochastic results of Ref. [5]. However, the following theorem is not a special case of the corresponding stochastic results, given in Ref. [4].

Theorem 2. Sufficient Condition for Cyclic Machines. If the potential $v^0$ and loop loss $\lambda^0$ corresponding to the control law $u^0$ are such that the inequality

$$\lambda^0 \le \mathcal{L}(u_i) v^0(i) + L(i, u_i), \tag{17}$$

is satisfied for all $i \in X$ and all $u_i \in U$, then the control law $u^0$ is optimal, i.e.

$$\frac{1}{P^0} \sum_{m=1}^{P^0} L(x^0(m), u^0(m)) \le \frac{1}{P} \sum_{m=1}^{P} L(x(m), u(m)), \tag{18}$$

for all $u$, where the variables have the same meaning as in Theorem 1, but the starting state $i$ is a loop state, and $P^0$ and $P$ represent the periods of the loops $L^0$ and $L$ corresponding to the two control laws $u^0$ and $u$.

Of course, since the number of states is finite and the range of values of the control law is finite, it is actually possible to enumerate all possible control laws. More particularly, if $\#U$ and $\#X$ denote the number of elements in $U$ and the number of elements in $X$, then the number of possible control laws is $(\#U)^{\#X}$, since for every state $i$ there are $\#U$ possible choices of $u_i$. Thus one can test directly for the optimality of a given control law by direct comparison with all control laws. This involves $(\#U)^{\#X}$ computations. The advantage of the above sufficiency theorem is that only $(\#X) \cdot (\#U)$ computations are required to test for optimality, since for each state $\#U$ computations are required and $\#X$ states must be tested.

The next theorem indicates how a given control law can be modified to yield improved performance.

Theorem 3. Policy Improvement, Cyclic Machines. If the strict inequality

$$\lambda^A > \mathcal{L}(u^B_i) v^A(i) + L(i, u^B_i), \tag{19}$$

is satisfied for at least one state in the loop generated by $u^B$, and equality holds for all other states, then

$$\frac{1}{P^A} \sum_{m=1}^{P^A} L(x^A(m), u^A(m)) > \frac{1}{P^B} \sum_{m=1}^{P^B} L(x^B(m), u^B(m)), \tag{20}$$

i.e. the loop loss corresponding to the control $u^B$ is strictly less than the loop loss corresponding to $u^A$.

Theorem 3 has potential application to the development of an approximation-in-policy-space algorithm which yields a sequence of "improved" control laws. That is, one can attempt to improve the $n$th iteration policy, $u_i^{(n)}$, by choosing $u_i^{(n+1)}$ such that the inequality

$$\lambda^{(n)} \ge \mathcal{L}(u_i^{(n+1)}) v^{(n)}(i) + L(i, u_i^{(n+1)}), \tag{21}$$

is satisfied, with strict inequality for at least one loop state. It must then be shown that $\lambda^{(n)}$ converges to the optimal value. Indeed, since there are only a finite number of possible control laws, if $\lambda^{(n)}$ does converge, it will do so in finite time. However, it is not possible to prove convergence unless the sufficient condition (17) is also a necessary condition. Unfortunately, for this to hold, additional restrictions, not usually met in practice, are required on the structure of the closed-loop machine. The next theorem indicates the additional conditions required for necessity.
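As an aside on the cost comparison made for Theorem 2, the sketch below contrasts the $(\#X) \cdot (\#U)$ test of inequality (17) with direct enumeration of all $(\#U)^{\#X}$ control laws. It is a minimal illustration under stated assumptions: `DELTA` and `LOSS` are transcribed from the machine and loss function of the worked examples, all function names are hypothetical, and the single-loop assumption is relied upon (the loop is located from state 1).

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical tables for the three-state machine of the worked examples.
DELTA = {(1, 0): 3, (1, 1): 2, (2, 0): 1, (2, 1): 3, (3, 0): 2, (3, 1): 1}
LOSS = {(1, 0): F(1), (1, 1): F(3, 4), (2, 0): F(3, 4),
        (2, 1): F(1), (3, 0): F(0), (3, 1): F(1)}
STATES, INPUTS = (1, 2, 3), (0, 1)

def loop_states(control, start=1):
    """Loop reached from `start` (single-loop assumption)."""
    first_visit, path, x = {}, [], start
    while x not in first_visit:
        first_visit[x] = len(path)
        path.append(x)
        x = DELTA[(x, control[x])]
    return path[first_visit[x]:]

def loop_loss(control):
    """Loop loss, expression (13)."""
    loop = loop_states(control)
    return sum(LOSS[(i, control[i])] for i in loop) / len(loop)

def potential(control):
    """Loop loss and a potential from expression (14), with k a loop state."""
    lam, k = loop_loss(control), loop_states(control)[0]
    v = {}
    for i in STATES:
        total, x = F(0), i
        while x != k:
            total += LOSS[(x, control[x])] - lam
            x = DELTA[(x, control[x])]
        v[i] = total
    return lam, v

def passes_theorem_2(control):
    """Inequality (17) for every state and input: (#X)(#U) evaluations."""
    lam, v = potential(control)
    return all(lam <= v[DELTA[(i, u)]] - v[i] + LOSS[(i, u)]
               for i in STATES for u in INPUTS)

def brute_force_minimum():
    """Smallest loop loss over all (#U)**(#X) stationary control laws."""
    laws = (dict(zip(STATES, choice))
            for choice in product(INPUTS, repeat=len(STATES)))
    return min(loop_loss(law) for law in laws)

print(passes_theorem_2({1: 0, 2: 1, 3: 0}), brute_force_minimum())
```

The test is only sufficient: a control law can attain the brute-force minimum and still fail `passes_theorem_2`, which is the gap that the necessity theorem addresses.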
Theorem 4. Necessary Condition for Cyclic Machines. If the control law $u^0$ is optimal and the closed-loop machine always has a single loop with $L = X$ (a cyclic machine with this property will be referred to as an irreducible cyclic machine), then

$$\lambda^0 \le \mathcal{L}(u_i) v^0(i) + L(i, u_i), \tag{22}$$

for all $u_i \in U$ and all $i \in X$, where $\lambda^0$, $u^0$, and $v^0$ satisfy (12).

Unfortunately, most interesting machines are not irreducibly cyclic. For example, the machine of Fig. 1 is not, as demonstrated by the control law of Fig. 2, where $L = (1, 2) \ne X = (1, 2, 3)$. This is in contrast to the stochastic case, where the related concept of an irreducible ergodic chain, see Ref. [6], does correspond to an interesting class of problems. For irreducible cyclic machines with a bounded positive loss function it can be shown that the approximation-in-policy-space algorithm, (21), does indeed converge to the optimal. The proof is not given here since, as previously stated, this is a rather restricted class of problems. The next example illustrates the use of some of the above theorems.

Example 2
Again consider the machine of Fig. 1 with the loss function of Example 1. The problem now is to attempt to find a control law which minimizes the loop loss,

$$\frac{1}{P} \sum_{m=1}^{P} L(x(m), u(m)). \tag{23}$$

The control law which minimizes the "instantaneous loss" $L(x(m), u(m))$ is exactly the control law given in Example 1. Let us consider the optimality of this control law. From the computations of Example 1 the loop loss for this control law is $\lambda^{(0)} = 3/4$ and the potential is $v^{(0)}(1) = 0$, $v^{(0)}(2) = 0$, $v^{(0)}(3) = 1/4$. The right-hand side of inequality (17) evaluates to

$$\begin{aligned}
\mathcal{L}(0) v^{(0)}(1) + L(1, 0) &= v^{(0)}(3) - v^{(0)}(1) + 1 = 5/4, \\
\mathcal{L}(1) v^{(0)}(1) + L(1, 1) &= v^{(0)}(2) - v^{(0)}(1) + 3/4 = 3/4, \\
\mathcal{L}(0) v^{(0)}(2) + L(2, 0) &= v^{(0)}(1) - v^{(0)}(2) + 3/4 = 3/4, \\
\mathcal{L}(1) v^{(0)}(2) + L(2, 1) &= v^{(0)}(3) - v^{(0)}(2) + 1 = 5/4, \\
\mathcal{L}(0) v^{(0)}(3) + L(3, 0) &= v^{(0)}(2) - v^{(0)}(3) + 0 = -1/4, \\
\mathcal{L}(1) v^{(0)}(3) + L(3, 1) &= v^{(0)}(1) - v^{(0)}(3) + 1 = 3/4.
\end{aligned}$$

Note that for state 3 and control input 0, the inequality required in Theorem 2, with $\lambda^{(0)} = 3/4$, is violated; hence one cannot conclude that the above control law is optimal. Since the above machine is not irreducible cyclic, one cannot conclude either that the satisfaction of inequality (17) is necessary for optimality. If the control law is modified in state 3 to $u_3 = 0$, then the inequality of (19) is satisfied. Unfortunately, state 3 is not in the loop generated by this new control law, hence improved performance cannot be guaranteed as indicated by Theorem 3. Indeed, it can be verified directly that the new loop loss is identical to the initial loop loss. This illustrates the non-convergence of the approximation-in-policy-space algorithm for non-irreducible machines. However, consider the control law $u^{(1)}$:

    i          1   2   3
    u_i^(1)    0   1   0        (24)

Using the same computations as in Example 1, the loop loss for this control law is

$$\lambda^{(1)} = \frac{L(2, 1) + L(3, 0)}{2} = \frac{1}{2},$$

and the potential is (with $v^{(1)}(3) = 0$)

$$v^{(1)}(1) = 1/2, \qquad v^{(1)}(2) = 1/2, \qquad v^{(1)}(3) = 0.$$

In this case the right-hand side of inequality (17) evaluates to

$$\begin{aligned}
\mathcal{L}(0) v^{(1)}(1) + L(1, 0) &= v^{(1)}(3) - v^{(1)}(1) + 1 = 1/2, \\
\mathcal{L}(1) v^{(1)}(1) + L(1, 1) &= v^{(1)}(2) - v^{(1)}(1) + 3/4 = 3/4, \\
\mathcal{L}(0) v^{(1)}(2) + L(2, 0) &= v^{(1)}(1) - v^{(1)}(2) + 3/4 = 3/4, \\
\mathcal{L}(1) v^{(1)}(2) + L(2, 1) &= v^{(1)}(3) - v^{(1)}(2) + 1 = 1/2, \\
\mathcal{L}(0) v^{(1)}(3) + L(3, 0) &= v^{(1)}(2) - v^{(1)}(3) + 0 = 1/2, \\
\mathcal{L}(1) v^{(1)}(3) + L(3, 1) &= v^{(1)}(1) - v^{(1)}(3) + 1 = 3/2.
\end{aligned}$$

Since the inequality in Theorem 2 is now satisfied ($\lambda^{(1)} = 1/2$), the control law $u^{(1)}$ is optimal.

CONCLUSIONS
Although the theory presented here is essentially a special case of stochastic theory developed almost a decade ago, it does present certain novel features, perhaps the most notable being the notion of periodic motion optimization. Also of interest is the difficulty in obtaining reasonable necessary conditions for optimality for finite-state machines. However, this appears to be the price one pays for quantization; for example, necessary conditions for optimality in discrete-time systems are more restrictive than in continuous-time systems [7]. Efficient algorithms for the determination of an optimal control law for cyclic machines are also limited to the case of irreducible cyclic machines.

The cyclic-machine theory presented here is limited to single-loop machines. The extension to multi-loop machines would be of interest, as would be the development of some reasonable tests on the open-loop machine to determine the number of loops for a given class of control laws. It is also possible to develop, using the same arguments given here, a theory of finite-state deterministic games paralleling the stochastic theory of KUSHNER and CHAMBERLAIN [8]. However, this extension is very direct and yields few novel results.
REFERENCES

[1] R. E. KALMAN, P. L. FALB and M. A. ARBIB: Topics in Mathematical System Theory. McGraw-Hill, New York (1969).
[2] M. A. ARBIB and H. P. ZEIGER: On the relevance of abstract algebra to control theory. Automatica 5, 589-606 (1969).
[3] M. FJELD: Optimal control of multivariable periodic processes. Automatica 5, 497-506 (1969).
[4] R. A. HOWARD: Dynamic Programming and Markov Processes. M.I.T. Press, Cambridge, Mass. (1960).
[5] J. H. EATON and L. A. ZADEH: Optimal pursuit strategies in discrete-state probabilistic systems. ASME J. bas. Engng, 23-29 (March 1962).
[6] W. M. WONHAM: Lecture notes on stochastic control, Center for Dynamical Systems, Brown University, Providence, R.I. (February 1967).
[7] J. M. HOLTZMAN: Convexity and the maximum principle for discrete systems. IEEE Trans. Aut. Control AC-11, 30-35 (1966).
[8] H. J. KUSHNER and S. G. CHAMBERLAIN: Finite state stochastic games: Existence theorems and computational procedures. IEEE Trans. Aut. Control AC-14, 248-255 (June 1969).
[9] R. H. STROTZ, J. C. McANULTY and J. B. NAINES: Goodwin's non-linear theory of the business cycle: An electro-analog solution. Econometrica 21, 390-411 (1953).
[10] L. A. METZLER: The nature and stability of inventory cycles. Rev. econ. Stat. 23, 113-129 (1941).
[11] W. K. ROOTS and J. M. NIGHTINGALE: Two-position discontinuous temperature control in electric space heating and cooling processes. II. The controlled process. IEEE Trans. Appl. Ind. 70, 27-38 (1964).
[12] B. FRIEDLAND: Linear modular sequential circuits. IRE Trans. Circuit Theory CT-6, 61-68 (1959).
[13] I. L. TRAIGER and A. GILL: On an asymptotic optimization problem in finite directed, weighted graphs. Inform. Control 13, 527-533 (1968).

APPENDIX

Proofs of lemmas and theorems

Proof of lemma 1
The potential given by (9) can be written, for $i \ne a$,

$$v(i) = L(i, u_i) + \sum_{m=1}^{\tau_{ia}} L(x(m), u(m)) = L(i, u_i) + v(\delta(i, u_i)).$$

Hence, by definition of $\mathcal{L}$,

$$0 = L(i, u_i) + \mathcal{L} v(i),$$

and (10) is established for $i \ne a$. For $i = a$, $L(a, u_a) = 0$ and $\delta(a, u_a) = a$, so that (10) is again satisfied. Also, since $\tau_{aa} = 0$ and $L(a, u_a) = 0$, it follows that

$$v(a) = L(a, u_a) = 0,$$

hence the boundary condition (11) is also satisfied. Conversely, if $v(i)$ satisfies (10) for all $i$, then

$$\mathcal{L} v(x(m)) + L(x(m), u(m)) = 0, \quad \text{for } 0 \le m \le \tau_{ia}.$$

Summing up this equality from 0 to $\tau_{ia}$ and using the fact that, for $x(0) = i$ and $v(a) = 0$,

$$\sum_{m=0}^{\tau_{ia}} \mathcal{L} v(x(m)) = [v(x(1)) - v(x(0))] + [v(x(2)) - v(x(1))] + \cdots = [v(x(\tau_{ia})) - v(x(0))] = -v(i), \tag{A.1}$$

one obtains the desired result, equation (9), for $v(i)$. The uniqueness of the solution (9) follows directly from the same arguments as in Chapter I, section 3 of Ref. [6]. In particular, (10) can be written in vector form as

$$(P - I)\bar{v} + \bar{L} = 0, \tag{A.2}$$

where $\bar{v}$ and $\bar{L}$ are vectors with components $v(i)$ and $L(i, u_i)$, $i = 1, 2, \ldots, J$, respectively, $I$ is a $J$ by $J$ unit matrix, and $P$ is the state transition matrix (a matrix whose $ij$ entry is one if the one-step transition is from state $i$ to state $j$, and is zero otherwise) for the transient states $(1, 2, \ldots, J)$. Since the states $(1, \ldots, J)$ are transient there exists an $N$ such that $P^N = 0$, where $0$ is the zero matrix. Thus,

$$\sum_{m=0}^{\infty} P^m = \sum_{m=0}^{N} P^m.$$

Since the right sum above is finite, the inverse required to guarantee a unique solution to (A.2) is assured.

Proof of lemma 2
Since (12) is satisfied for all $i \in X$, it must be satisfied along a loop path, that is, with $x(0) = i \in L$,

$$\mathcal{L} v(x(m)) + L(x(m), u(m)) = \lambda, \quad \text{for } 1 \le m \le P.$$

One sums this equality from 1 to $P$, uses the fact that, by periodicity of the loop path, $x(P+1) = x(1)$, so that

$$\sum_{m=1}^{P} \mathcal{L} v(x(m)) = v(x(P+1)) - v(x(1)) = 0, \tag{A.3}$$

and obtains equation (13). That $v(i)$ given by (14) satisfies (12) follows directly from Lemma 1.

Proof of theorem 1
Since inequality (15) is satisfied for all $i \in X$ and all $u_i \in U$, it follows that for any path $\{x(m),\ 0 \le m \le \tau_{ia}\}$, starting in $i$ and using control law $u$,

$$0 \le \mathcal{L} v^0(x(m)) + L(x(m), u(m)).$$

If this inequality is summed from 0 to $\tau_{ia}$ and (A.1) is used, it then follows that

$$v^0(i) \le \sum_{m=0}^{\tau_{ia}} L(x(m), u(m)).$$

But the potential corresponding to $u^0$ is given, from Lemma 1, by

$$v^0(i) = \sum_{m=0}^{\tau^0_{ia}} L(x^0(m), u^0(m)),$$

hence (16) is established.

Proof of theorem 2
Since the inequality (17) is satisfied for all $i \in X$ and all $u_i \in U$, it follows that for the loop path $x$, corresponding to the control law $u$,

$$\lambda^0 \le \mathcal{L} v^0(x(m)) + L(x(m), u(m)), \quad \text{for } 1 \le m \le P.$$

Summing this inequality from 1 to $P$ and using (A.3), one obtains

$$\lambda^0 \le \frac{1}{P} \sum_{m=1}^{P} L(x(m), u(m)).$$

But from Lemma 2,

$$\lambda^0 = \frac{1}{P^0} \sum_{m=1}^{P^0} L(x^0(m), u^0(m)),$$

hence the conclusion that $u^0$ is optimal, equation (18), follows.

Proof of theorem 3
Let $x^B$ be the loop path generated by the control law $u^B$. Then by assumption $x^B(m)$ hits the state $i$, say at time $m = M$, such that the inequality (19) is satisfied, i.e.

$$\lambda^A > \mathcal{L} v^A(x^B(M)) + L(x^B(M), u^B(M)).$$

For all other values of $m$ one has, by assumption, equality, i.e.

$$\lambda^A = \mathcal{L} v^A(x^B(m)) + L(x^B(m), u^B(m)), \quad \text{for } 1 \le m \le P^B,\ m \ne M.$$

Summing over the loop from 1 to $P^B$ and using (A.3), one obtains

$$\lambda^A > \frac{1}{P^B} \sum_{m=1}^{P^B} L(x^B(m), u^B(m)).$$

From Lemma 2,

$$\lambda^A = \frac{1}{P^A} \sum_{m=1}^{P^A} L(x^A(m), u^A(m)),$$

hence the conclusion (20) is established.

Proof of theorem 4
The proof of Theorem 4 follows that of WONHAM [6] for irreducible ergodic Markov chains. Assume the conclusion false, i.e. assume that for some set of states $\tilde{X}$ and some control $\tilde{u}$ one has the strict inequality

$$\lambda^0 > \mathcal{L}(\tilde{u}_i) v^0(i) + L(i, \tilde{u}_i), \quad i \in \tilde{X},$$

and consider the new control law $u^*$ defined as follows:

$$u^*_i = \tilde{u}_i, \quad i \in \tilde{X}; \qquad u^*_i = u^0_i, \quad i \in X - \tilde{X}.$$

It will now be shown that $u^*$ yields a smaller value for the performance index than $u^0$, contradicting the optimality of $u^0$. Let $x^*$ denote the state path corresponding to the control law $u^*$; then

$$\lambda^0 > \mathcal{L} v^0(x^*(m)) + L(x^*(m), u^*(m)), \tag{A.4}$$

when $x^*(m) \in \tilde{X}$, and

$$\lambda^0 = \mathcal{L} v^0(x^*(m)) + L(x^*(m), u^*(m)), \tag{A.5}$$

when $x^*(m) \in X - \tilde{X}$. Since by the assumption of an irreducible cyclic machine every state is touched, it follows that inequality (A.4) is satisfied for some $m$. Adding the contributions of (A.4) and (A.5) for $1 \le m \le P^*$ and using (A.3), one obtains

$$\lambda^0 > \frac{1}{P^*} \sum_{m=1}^{P^*} L(x^*(m), u^*(m)).$$

Since $u^0$, $v^0$, and $\lambda^0$ satisfy (12) it follows that

$$\lambda^0 = \frac{1}{P^0} \sum_{m=1}^{P^0} L(x^0(m), u^0(m)),$$

and hence that the required contradiction is established.