Systems & Control Letters 34 (1998) 1–9
Direct construction of LQ regulator based on orthogonalization of signals: Dynamical output feedback

Yoshiaki Kawamura
Faculty of Engineering, Osaka Electro-Communication University, Hatsu-cho 18-8, Neyagawa, Osaka 572-8530, Japan

Received 21 March 1997; received in revised form 29 September 1997; accepted 29 September 1997
Abstract

A design method of the LQ regulator, in which unknown systems are directly optimized from response signals without the process of identification, is presented from a viewpoint of orthogonalization of signals. The basic idea for the state feedback is extended to dynamical output feedback. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Linear quadratic regulator; Output feedback; Orthogonality; Identification; Gradient method; Learning control
1. Introduction

This note deals with a direct design method of the LQ regulator which needs no separate process of identification. The basic algorithm, which has been proposed by Kawamura since 1985 [5, 6, 13], is briefly explained for the first-order system as the following iterative improvement of the state feedback gain G:

G ← G − ⟨z_unit^a, z_unit^b⟩ / ⟨z_unit^b, z_unit^b⟩,   (1)

where z_unit^a is the unit initial state response and z_unit^b is the unit impulse response, both observed under a closed-loop condition, and ⟨·, ·⟩ denotes their inner product in a functional space. Then G converges to the LQ optimal gain. The calculation needs no data on system parameters. We can find a prototype of such an approach in a pioneering experiment by Narendra and Streeter in the 1960s [12] on a simple matching problem. Although various adaptive control schemes have been developed for matching problems, the LQ problem still requires the usual identification. Recently, Furuta et al. gave solutions of the LQ and LQG problems in terms of the Markov parameters [2, 3]. This offers an alternative approach to the problems, which has the distinction that a drawn-out input–output causal relation of a system is directly taken into account in the controller design. However, the open-loop input–output relation is not necessarily a good description of the controlled (closed-loop) response of a real system. On the other hand, Eq. (1) is based on a quite simple property of the closed-loop response: orthogonality of the LQ optimal response signals [7]. This basic idea should be regarded as one of the most simple and essential properties of LQ optimality from the viewpoint of response signals. We can interpret the optimization as "orthogonalization of real responses". Although the original idea is based on state feedback, only the output is available in many control problems. Kawamura also derived a nonminimal realizing state space model whose state feedback is equivalent to dynamical output feedback, and suggested the possibility of applying it to the orthogonalization [8].
The main purpose of this note is to give a detailed discussion of this extension. The basic idea, which has mainly been published in Japanese, is explained first as a preliminary.
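The first-order update (1) can be sketched numerically. The plant numbers a, b and weights c, d below are illustrative assumptions, not taken from the paper; the gain is improved from inner products of simulated responses only, with no identification step.

```python
# A minimal numerical sketch of the update (1) for a first-order system.
# The plant numbers a, b and weights c, d are assumptions for illustration.

a, b, c, d = 1.2, 0.5, 1.0, 1.0   # x(t+1) = a x + b u,  z = (c x; d u)
L = 200                           # finite response horizon

def responses(g):
    """Closed-loop unit initial-state response z_a and unit impulse response z_b."""
    za, zb = [], []
    xa, xb = 1.0, 0.0             # x_a(0) = 1, x_b(0) = 0
    for t in range(L + 1):
        ua = g * xa
        ub = g * xb + (1.0 if t == 0 else 0.0)   # unit impulse on the extra input
        za.append((c * xa, d * ua))
        zb.append((c * xb, d * ub))
        xa, xb = a * xa + b * ua, a * xb + b * ub
    return za, zb

def inner(z1, z2):
    return sum(p * q + r * s for (p, r), (q, s) in zip(z1, z2))

g = 0.0
for _ in range(10):
    za, zb = responses(g)
    g -= inner(za, zb) / inner(zb, zb)   # update (1): orthogonalize z_a and z_b

# Reference value from the scalar Riccati equation, iterated to a fixed point:
P = 1.0
for _ in range(300):
    P = c**2 + a**2 * P - (a * b * P) ** 2 / (d**2 + b**2 * P)
g_star = -a * b * P / (d**2 + b**2 * P)
print(g, g_star)   # both approximately -1.17
```

Note that the iteration here is started from the unstable open loop (g = 0) and still converges, in line with the global convergence behavior discussed in Remark 8 below.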
2. Orthogonality of responses

2.1. Responses and inner product

The Kalman filter satisfies some orthogonality conditions on signals, and therefore we can verify its optimality without data on a system model. However, the corresponding (dual) relations have not attracted attention for the LQ problem because the duality has been understood through the Riccati equation. The general idea of this "orthogonality" on control given by Kawamura [7] plays a central role throughout this paper. Though the orthogonality is given for a general linear (possibly time-varying, stochastic and non-parametric) system, we restrict the discussion to the time-invariant deterministic state space model.

Let S be a linear time-invariant system

x(t + 1) = Ax(t) + Bu(t),   (2)

z(t) = [Cx(t); Du(t)],   (3)

defined on t = 0, 1, 2, ..., L, where x(t) ∈ R^n is the state, u(t) ∈ R^m is the controlling input, and z(t) is the controlled output (evaluation). We assume the following standard condition for LQ control.

Assumption 1. (A, B) is stabilizable, (C, A) is detectable and D^T D > 0.

Remark 2. It is natural for us to regard Cx(t) or x(t) as the response. However, this aspect does not imply the orthogonality. It is important to regard z(t) as the response, so that z(t)^T z(t) agrees with the cost x(t)^T C^T C x(t) + u(t)^T D^T D u(t).

Let z denote the whole signal {z(0), z(1), ..., z(L)}. Now, we define an inner product of two response signals by

⟨z_1, z_2⟩ = Σ_{t=0}^{L} z_1(t)^T z_2(t).   (4)

We study the infinite-horizon LQ control which minimizes the total cost V = lim_{L→∞} ⟨z, z⟩.

Consider a kind of state feedback

u(t) = Gx(t) + ũ(t),   (5)

where G is a time-invariant gain and ũ(t) is a supplementary input. Then the feedback transforms S into

x(t + 1) = (A + BG)x(t) + Bũ(t),   (6)

z(t) = [C; DG]x(t) + [0; D]ũ(t).   (7)

Let S_FB be this transformed (closed-loop) system with the following responses: z^a is an initial state response such that x^a(0) ∈ R^n and ũ^a(t) ≡ 0; z^b is an impulse response such that x^b(0) = 0 and ũ^b(t) ≡ ũ^b(0)δ(t), where ũ^b(0) ∈ R^m and δ(t) is the unit impulse at t = 0.

2.2. Basic orthogonality condition

As is well known, a suitable time-invariant state feedback gives the LQ control. Therefore, z^a is the optimal response if V is minimal with respect to G. This implies the following property.

Theorem 3. Suppose Assumption 1. Then the state feedback u(t) = Gx(t) is the infinite-horizon LQ control if and only if

lim_{L→∞} ⟨z^a, z^b⟩ = 0

for any x^a(0) and ũ^b(0).

This relation means, as a nature of the LQ control, that the closed-loop initial state response and the impulse response are orthogonal. We can also interpret the relation as orthogonality between the optimal response z^a and an additional perturbation z^b. Since we restrict the discussion to the state space model, we can prove the orthogonality in relation to the Riccati equation as follows (a proof is shown in Appendix A).

Corollary 4. Suppose Assumption 1. Then the orthogonality condition of Theorem 3 is equivalent to the algebraic Riccati equation and the gain equation

P = A^T PA + C^T C − A^T PB {D^T D + B^T PB}^{-1} B^T PA,   (8)

G = −{D^T D + B^T PB}^{-1} B^T PA,   (9)

with the restriction P ≥ 0.
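The orthogonality of Theorem 3 is easy to check numerically. In the sketch below the matrices are illustrative assumptions (not from the paper): the Riccati equation (8) is iterated to a fixed point, the gain (9) is formed, and the closed-loop initial-state and impulse responses are then orthogonal in the sense of (4).

```python
# Numerical check of Theorem 3 on an assumed 2-state example.
import numpy as np

A = np.array([[1.1, 0.3], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)
D = np.array([[1.0]])

# Solve the algebraic Riccati equation (8) by fixed-point iteration.
P = np.eye(2)
for _ in range(500):
    M = D.T @ D + B.T @ P @ B
    P = A.T @ P @ A + C.T @ C - A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A)
G = -np.linalg.solve(D.T @ D + B.T @ P @ B, B.T @ P @ A)   # gain equation (9)

def response(x0, u0, L=300):
    # z(t) = (Cx(t); Du(t)) for the closed loop u(t) = Gx(t) + u0*delta(t)
    x, z = np.asarray(x0, float).copy(), []
    for t in range(L):
        u = G @ x + (u0 if t == 0 else np.zeros(1))
        z.append(np.concatenate([C @ x, D @ u]))
        x = A @ x + B @ u
    return np.array(z)

za = response([1.0, -2.0], np.zeros(1))     # initial-state response z^a
zb = response([0.0, 0.0], np.array([1.0]))  # impulse response z^b
print(np.sum(za * zb))   # inner product (4): vanishes at the LQ gain
```

The same check at any non-optimal gain produces a clearly nonzero inner product, which is what the algorithm of Section 3 exploits.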
Remark 5. Let G* be the optimal gain. It is well known that Assumption 1 guarantees stability of (A + BG*). Therefore, the orthogonality condition implies the stability as well under Assumption 1. In addition, (A + BG) is stable if G is in a neighborhood of G*.

2.3. Sensitivity relation

The inner product is significant even for unoptimal systems as a sensitivity relation. Recall that z^a and z^b depend linearly on α(0)^T = (x^a(0)^T  ũ^b(0)^T). Then there exists an (n + m, n + m) matrix Σ which satisfies

α(0)^T Σ α(0) = ⟨z^a + z^b, z^a + z^b⟩.   (10)

Let

Σ = [Σ^aa  Σ^ab; Σ^ba  Σ^bb]   (11)

be its (n, m) division. In other words, Σ^aa, Σ^ab and Σ^bb correspond to ⟨z^a, z^a⟩, ⟨z^a, z^b⟩ and ⟨z^b, z^b⟩, respectively. It follows that

⟨z^a + z^b, z^a + z^b⟩ = ⟨z^a, z^a⟩ + 2⟨z^a, z^b⟩ + ⟨z^b, z^b⟩
 = x^a(0)^T Σ^aa x^a(0) + 2 x^a(0)^T Σ^ab ũ^b(0) + ũ^b(0)^T Σ^bb ũ^b(0),   (12)

regardless of whether G is optimal or not. Then Σ^aa, Σ^ab and Σ^bb express the 0th-, first- and second-order sensitivities with respect to the perturbation, respectively. Therefore, the orthogonality of Theorem 3 is equivalent to lim_{L→∞} Σ^ab = 0. We refer to Σ as the fundamental sensitivity matrix.

2.4. Dual orthogonality for estimation

Consider the estimation problem which is the dual of the above control problem; that is, the prediction error signals satisfy

x̃(t + 1|t) = A^T x̃(t|t − 1) + G^T ỹ(t|t − 1) + v(t),
ỹ(t|t − 1) = B^T x̃(t|t − 1) + w(t),   (13)

where v(t) and w(t) are independent zero-mean Gaussian white noises with covariances CC^T and DD^T, respectively. Let the signals be defined on t − L, t − L + 1, ... and let L → ∞. Then x̃(t|t − 1) is the minimum prediction error of the state and ỹ(t|t − 1) is the innovation process if G^T is the time-invariant Kalman gain. Their orthogonality relates to the LQ orthogonality as follows [7] (a proof is shown in Appendix B).

Theorem 6. The orthogonality conditions

lim_{L→∞} ⟨z^a, z^b⟩ = 0  (that is, lim_{L→∞} Σ^ab = 0),

lim_{L→∞} E{x̃(t + 1|t) ỹ(t|t − 1)^T} = 0

are dual relations in the sense that S_FB satisfies the former if and only if Eq. (13) satisfies the latter.
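The sensitivity structure (10)–(12) of Section 2.3 rests only on the linearity of z^a and z^b in their seeds, so the block Σ^ab can be assembled from responses to unit seeds and then predicts ⟨z^a, z^b⟩ for arbitrary seeds. The matrices and the non-optimal gain below are assumptions for illustration.

```python
# A small check of the sensitivity relation (10)-(12) at a non-optimal gain.
import numpy as np

A = np.array([[1.1, 0.3], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)
D = np.array([[1.0]])
G = np.array([[-0.9, -0.2]])   # stabilizing but not optimal
L = 300

def response(x0, u0):
    x, z = np.asarray(x0, float).copy(), []
    for t in range(L):
        u = G @ x + (u0 if t == 0 else np.zeros(1))
        z.append(np.concatenate([C @ x, D @ u]))
        x = A @ x + B @ u
    return np.array(z)

# Assemble the (n, m) block Sigma^ab column by column from unit seeds.
ea = [response(e, np.zeros(1)) for e in np.eye(2)]    # unit initial states
eb = [response(np.zeros(2), e) for e in np.eye(1)]    # unit impulses
Sab = np.array([[np.sum(za * zb) for zb in eb] for za in ea])

xa0, ub0 = np.array([0.5, -1.3]), np.array([2.0])
za = response(xa0, np.zeros(1))
zb = response(np.zeros(2), ub0)
print(np.sum(za * zb), xa0 @ Sab @ ub0)   # identical up to rounding
```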
3. Orthogonalization algorithm

3.1. Basic algorithm

We can obtain the LQ regulator of an unknown system by finding the gain G which orthogonalizes the signals. In this section we show the basic orthogonalization algorithm. In the kth step (k = 0, 1, 2, ...) of the iterative improvements on the gain, fix G at a matrix G_k and fix L at an integer L_k. Observe practical response signals of S_FB to calculate the sensitivity matrices. The basic algorithm of iterative improvements on the gain is

G_{k+1} = G_k − {Σ^bb_k}^{-1} {Σ^ab_k}^T,   (14)

where Σ^ab_k and Σ^bb_k are the sensitivity matrices in the kth step. Suppose the special case that n = m = 1 and x(0) = ũ(0) = 1. Then Eq. (14) agrees with Eq. (1).

We can easily calculate the fundamental sensitivity matrix for vector signals from inner products of responses (the following [9] is somewhat generalized from the original form [5, 6, 13], where Σ^ab and Σ^bb are separately given from ⟨z^a, z^b⟩ and ⟨z^b, z^b⟩). Let z_ki (i = 1, 2, ..., N_k) be the response signals observed in the kth step under the condition that x_ki(0) ∈ R^n and ũ_ki(t) = ũ_ki(0)δ(t). Then

[Σ^aa_k  Σ^ab_k; Σ^ba_k  Σ^bb_k] = {Λ_k}^{-1} { Σ_{i=1}^{N_k} Σ_{j=1}^{N_k} α_ki(0) ⟨z_ki, z_kj⟩ α_kj(0)^T } {Λ_k}^{-1},   (15)

where

Λ_k = Σ_{i=1}^{N_k} α_ki(0) α_ki(0)^T,   α_ki(0) = [x_ki(0); ũ_ki(0)].   (16)

The number N_k is larger than n + m so that Λ_k > 0. The period L_k is a sufficiently large constant or a variable which becomes sufficiently large as k → ∞. However, a careful choice of L_k is important for real systems because the effect of uncertain behavior of responses increases as L_k increases. Given an original gain G_0 we can use this algorithm without data on the system matrices. We obtain the LQ regulator by removing ũ(t) from S_FB after the orthogonalization.

Remark 7. In some problems the actual range of x(0) is a linear subspace of R^n. Then the optimal gain is undefined outside the range. Therefore, it is possible to replace Λ_k^{-1} with the pseudo-inverse of Λ_k. Its approximation by (Λ_k + εI)^{-1} is practically useful for the calculation (15), where ε is a small positive number and I is the unit matrix.

3.2. Relation to the Newton–Raphson class

The Newton–Raphson class (Kleinman's method is representative) is known as a useful method of solving the algebraic Riccati equation [1]. We can approximately analyze the convergence property of Eq. (14) in comparison with the Newton–Raphson method. Approximate L_k with ∞ provided that (A + BG_0) is stable and L_k is sufficiently large. Then we can rewrite Σ^aa_k and Eq. (14) in terms of the system matrices (refer to Appendix C) as

P_k = (A + BG_k)^T P_k (A + BG_k) + C^T C + G_k^T D^T D G_k,   (17)

G_{k+1} = −{D^T D + B^T P_k B}^{-1} B^T P_k A,   (18)

where P_k = lim_{L→∞} Σ^aa_k. These agree with Hewer's algorithm (the discrete-time Newton–Raphson method) [4]. We can regard Eq. (14) as a new member of the class, written in terms of signals. It is known that Hewer's algorithm converges rapidly (second-order convergence) to the LQ gain under Assumption 1 and stability of (A + BG_0). It means that Eq. (14) has the same convergence property.
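A sketch of the full algorithm (14)–(16) follows, on an assumed 2-state example (the matrices and the stabilizing initial gain are illustrative, not from the paper). Each step observes N_k > n + m closed-loop responses for random seeds α_ki(0); the system matrices enter only the simulation of responses, never the gain update.

```python
# Sketch of the orthogonalization algorithm (14)-(16) on assumed matrices.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.1, 0.3], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)
D = np.array([[1.0]])
n, m = 2, 1

def response(G, x0, u0, L=200):
    # combined response for seed (x0, u0): initial state plus impulse u0*delta(t)
    x, z = np.asarray(x0, float).copy(), []
    for t in range(L):
        u = G @ x + (u0 if t == 0 else np.zeros(m))
        z.append(np.concatenate([C @ x, D @ u]))
        x = A @ x + B @ u
    return np.array(z)

G = np.array([[-1.0, -0.5]])              # stabilizing (not optimal) initial gain
for k in range(8):
    Nk = 8                                # N_k > n + m trials
    alphas = rng.standard_normal((Nk, n + m))
    zs = [response(G, a[:n], a[n:]) for a in alphas]
    Lam = sum(np.outer(a, a) for a in alphas)                    # Eq. (16)
    S = sum(np.outer(alphas[i], alphas[j]) * np.sum(zs[i] * zs[j])
            for i in range(Nk) for j in range(Nk))
    Sigma = np.linalg.solve(Lam, np.linalg.solve(Lam, S).T).T    # Eq. (15)
    Sab, Sbb = Sigma[:n, n:], Sigma[n:, n:]
    G = G - np.linalg.solve(Sbb, Sab.T)                          # update (14)

# Reference: the LQ gain from the Riccati equations (8)-(9).
P = np.eye(2)
for _ in range(500):
    M = D.T @ D + B.T @ P @ B
    P = A.T @ P @ A + C.T @ C - A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A)
G_star = -np.linalg.solve(D.T @ D + B.T @ P @ B, B.T @ P @ A)
print(G, G_star)   # the signal-based gain matches the Riccati gain
```

Consistent with the Hewer comparison above, the iteration is started from a stabilizing gain here; the responses are noise-free, so Σ is recovered exactly and the convergence is second order.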
Remark 8. The above discussion is based on the approximation of L_k. Indeed, there exists an important difference between their convergence properties because L_k is actually finite in Eq. (14). The inner product is defined, and Eq. (14) has a meaning as a kind of Newton's method, even if (A + BG_k) is unstable. Some theoretical consideration and numerical simulation strongly suggest that if Assumption 1 holds, G_k converges to the optimal gain regardless of the initial stability. The global convergence means that any unstable feedback is stabilized within a finite k (refer to Remark 5). This behavior is owing to the Newton structure of Eq. (14). In general, the stabilization is more rapid for extremely unstable but stabilizable systems (refer to the example in Section 4.6). This tendency is similar to that of the Riccati difference equation.

4. Dynamical output feedback

4.1. Input–output model

We assume that x(t) is not directly measured, and we study this output-feedback problem in the following discussions provided that the system has no noise. The standard approach is based on the separation into state observation and state feedback. However, the separation makes the problem rather difficult for unknown systems from the viewpoint of orthogonalization. We directly apply the idea of orthogonalization to a nonminimal state-space model. Suppose that S, which is given by Eqs. (2) and (3), has a measurable output

y(t) = C′x(t),   (19)

where y(t) ∈ R^r. Suppose that z(t) is a function of measurable signals as

z(t) = [C″y(t); Du(t)].   (20)

It follows that C = C″C′. The detectability of (C, A) is satisfied provided that (C′, A) is detectable and C″ has full column rank. We assume Assumption 1 for the original system S throughout this paper. The input–output relation of S is

y(t) = a_1 y(t − 1) + a_2 y(t − 2) + ⋯ + a_n̄ y(t − n̄) + b_1 u(t − 1) + b_2 u(t − 2) + ⋯ + b_n̄ u(t − n̄),   (21)

where n̄ is a suitable integer. In fact, this relation is familiar for a SISO system as n̄ = n. Let

n_o = min{ j : rank[(C′)^T  A^T(C′)^T  ⋯  (A^T)^{j−1}(C′)^T] = rank[(C′)^T  A^T(C′)^T  ⋯  (A^T)^{n−1}(C′)^T] }.   (22)

Then n_o denotes the observability index of the observable subsystem of S. Generally speaking, n_o decreases as r increases. The above input–output relation is valid under the following assumption:

Assumption 9. n̄ ≥ n_o.
We study two extended models S̄_j (j = 1, 2) according as data on y(t) are immediately available to the control u(t) without delay or not. When a discussion is common to both cases, we omit the subscript j.

4.2. Nonminimal state space model I

First we assume existence of the delay. Define

X̄_1(t) = (y(t − 1)^T ... y(t − n̄)^T  u(t − 1)^T ... u(t − n̄)^T)^T   (23)

by measurable signals. Let S̄_1 be the (m + r)n̄th-order extended state space model

X̄(t + 1) = ĀX̄(t) + B̄u(t),
y(t) = C̄′X̄(t),   (24)
z(t) = [C̄X̄(t); Du(t)],

where

Ā_1 = [ a_1 ... a_n̄   b_1 ... b_n̄ ]
      [ I_{n̄−1}  0         0      ]
      [ 0 ... 0       0 ... 0     ]   (25)
      [ 0         I_{n̄−1}   0    ],

B̄_1 = (0 ... 0  I  0 ... 0)^T,
C̄′_1 = (a_1 ... a_n̄  b_1 ... b_n̄),   (26)
C̄_1 = C″C̄′_1,
D̄_1 = D.

In the equations, I_{n̄−1} is the unit matrix with (n̄ − 1) diagonal blocks and B̄_1 has the unit matrix as the (n̄ + 1)th block. Eq. (24) is formally a state equation though it is not a minimal realizing system.

Remark 10. The idea of the extended state space model is related to nonminimal order adaptive observers (e.g. [11]). Intuitively speaking, the extended model is what is obtained by regarding such an observer as a model of S. In the following discussion, however, the orthogonalization gives the LQ regulator without requiring parameters of the observer.

4.3. Nonminimal state space model II

Next, we assume nonexistence of the delay. Define another variable

X̄_2(t) = (y(t)^T ... y(t − n̄ + 1)^T  u(t − 1)^T ... u(t − n̄ + 1)^T)^T.

Let S̄_2 be the (m(n̄ − 1) + rn̄)th-order state space model

Ā_2 = [ a_1 ... a_n̄   b_2 ... b_n̄ ]
      [ I_{n̄−1}  0         0      ]
      [ 0 ... 0       0 ... 0     ]   (27)
      [ 0         0    I_{n̄−2}   ],

B̄_2 = (b_1  0 ... 0  I  0 ... 0)^T,
C̄′_2 = (I  0 ... 0  0 ... 0),
C̄_2 = C″C̄′_2,
D̄_2 = D,

where B̄_2 has I in the same block as B̄_1. This model is given in [8] for the SISO system.

4.4. Orthogonalization

We consider the orthogonality of the output feedback by means of the above state space models. The following relations justify this approach (a proof is given in Appendix D).

Lemma 11. Suppose Assumptions 1 and 9. Then S̄ satisfies the following relations:
(1) S and S̄ have the same input–output (u, y) relation provided that X̄(0) is given at t = 0.
(2) X̄(t) → 0 (t → ∞) implies x(t) → 0.
(3) (Ā, B̄) is stabilizable, (C̄, Ā) is detectable and D̄^T D̄ > 0.

Remark 12. The systems S and S̄ are not algebraically equivalent because x(t) does not uniquely determine X̄(t). In addition, S̄ does not uniquely determine S except when Eq. (21) corresponds to the coprime fractional representation.
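The construction of S̄_2 and relation (1) of Lemma 11 can be sketched as follows. The minimal second-order SISO plant below is an illustrative assumption (not from the paper); its coefficients a_i, b_i in (21) are read off the characteristic polynomial and Markov parameters, and both models are driven by the same input sequence.

```python
# Sketch of the nonminimal model S_2 for an assumed minimal SISO plant.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.4], [-0.2, 0.5]])
B = np.array([[1.0], [0.0]])
Cp = np.array([[0.0, 1.0]])       # C' in the text; (C', A) is observable here

# For an observable SISO plant, n_bar = n = 2 and Eq. (21) reads
# y(t) = a1 y(t-1) + a2 y(t-2) + b1 u(t-1) + b2 u(t-2), with
a1, a2 = float(np.trace(A)), float(-np.linalg.det(A))   # characteristic polynomial
b1 = (Cp @ B).item()                                    # Markov parameter h1
b2 = (Cp @ A @ B).item() - a1 * b1                      # h2 - a1*h1

# S_2 with X(t) = (y(t), y(t-1), u(t-1)), cf. Eq. (27) with n_bar = 2:
Ab = np.array([[a1, a2, b2], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
Bb = np.array([b1, 0.0, 1.0])

u = rng.standard_normal(40)
x, X = np.zeros(2), np.zeros(3)   # both models started at rest
err = 0.0
for t in range(40):
    err = max(err, abs((Cp @ x).item() - X[0]))   # y(t) of S vs first entry of X(t)
    x = A @ x + B[:, 0] * u[t]
    X = Ab @ X + Bb * u[t]
print(err)   # ~0: identical (u, y) relation, Lemma 11 (1)
```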
Remark 13. Note that the LQ control theory allows nonminimal realizing models. In fact Assumption 1, which is the standard condition for LQ control, does not require controllability and observability. Note also that the condition of Lemma 11 (3) just corresponds to Assumption 1, and it guarantees stability of all uncontrollable modes and all unobservable modes of S̄. Therefore, the standard LQ control theory is applicable to S̄ regardless of its nonminimality. In addition, Lemma 11 (2) guarantees that a stabilizing state feedback of S̄ stabilizes the original system S as well. For this reason there exists a time-invariant feedback gain Ḡ* such that u(t) = Ḡ*X̄(t) is the LQ control of S̄ and (Ā + B̄Ḡ*) is stable. This state feedback of S̄ also means a dynamical output feedback of S. By definition this control minimizes V = lim_{L→∞} ⟨z, z⟩ provided that X̄(0) is given at t = 0. We refer to this optimal control as the "LQ output-feedback control".

Consider the transform of S̄ given by u(t) = ḠX̄(t) + ũ(t), where Ḡ is a time-invariant gain. Let S̄_FB be the transformed (closed-loop) system. Given X̄(0) and ũ(0) we can define responses z̄^a and z̄^b of S̄_FB in the same manner as Section 2.1. We can immediately apply Theorem 3 to this case.

Theorem 14. Suppose Assumptions 1 and 9. Suppose that X̄(0) is given at t = 0. Then the input u(t) = ḠX̄(t) is the LQ output-feedback control if and only if

lim_{L→∞} ⟨z̄^a, z̄^b⟩ = 0

for any X̄(0) and ũ(0). In addition, x(t) → 0 and X̄(t) → 0 if this orthogonality condition holds.

If X̄(0) is given at t = 0, the orthogonalization (14)–(16) is applicable to S̄_FB by replacing x(t), G, n and z with X̄(t), Ḡ, n̄ and z̄, respectively. This process also means optimization of the output-feedback transfer function. Data on A, B, C′ (or a_i, b_i) and x(t) are unnecessary for this calculation. The discussions of stability and convergence shown in Section 3.2 are valid for this orthogonalization as mentioned in Remark 13. Both S and S̄ are stabilized by this orthogonalization regardless of the nonminimality of S̄.

4.5. Relation to the standard LQ control

Let G* and Ḡ* be the standard LQ gain and the LQ output-feedback gain, respectively. If x(t) is unknown, an estimate x̂(t) of the state is substituted for x(t) in G*x(t) in the standard approach with a state observer. Similarly, if X̄(0) is unknown, an estimate X̂(t) must be substituted for X̄(t) in Ḡ*X̄(t) for the moment. Regardless of the nonuniqueness of S̄ (Remark 12), these controls are simply related as follows (the proof is given in Appendix E).

Theorem 15. Suppose Assumptions 1 and 9. Then the LQ output-feedback control is equivalent to the combination of the standard LQ control with dead-beat state observation; namely,

Ḡ*X̄(t) = G*x(t)

holds after X̄(t) is known.

If X̄_j(0) is unknown, it becomes known at t = n̄ (j = 1) or t = n̄ − 1 (j = 2). The LQ control is practically possible after this. Namely, there exists a difference between S̄_1 and S̄_2 in the dead-beat characteristic.

4.6. Example
Consider that S is

x(t + 1) = [1.2  0; 1  0] x(t) + [0.2; 0] u(t),

y(t) = 1.5 x_2(t),

z(t) = [(√2/3) y(t); u(t)].

This is an unstable system with eigenvalues 1.2 and 0. Then the input–output model (21) is

y(t) = 1.2 y(t − 1) + 0.3 u(t − 2),

where n̄ = 2. The nonminimal model S̄_2 is

X̄(t) = (y(t)  y(t − 1)  u(t − 1))^T,

Ā = [1.2  0  0.3; 1  0  0; 0  0  0],   B̄ = (0  0  1)^T,   (28)

C̄′ = (1  0  0),   C̄ = C″C̄′ = (√2/3  0  0),   D̄ = 1,

where the subscript j = 2 is omitted. We can solve the algebraic Riccati equation of S̄ exactly. The nonnegative solution and the optimal gain are

P̄* = [74/9  0  2; 0  0  0; 2  0  1/2],   (29)

Ḡ* = (−1.6  0  −0.4).
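As a cross-check of the closed-form solution above, the Riccati equation of this S̄ can be iterated numerically to its fixed point (the fixed-point iteration is our verification device, not part of the paper).

```python
# Numerical cross-check of the example's Riccati solution and gain.
import numpy as np

Ab = np.array([[1.2, 0.0, 0.3], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
Bb = np.array([[0.0], [0.0], [1.0]])
Cb = np.array([[np.sqrt(2.0) / 3.0, 0.0, 0.0]])
D = np.array([[1.0]])

P = np.zeros((3, 3))
for _ in range(300):
    M = D.T @ D + Bb.T @ P @ Bb
    P = Ab.T @ P @ Ab + Cb.T @ Cb - Ab.T @ P @ Bb @ np.linalg.solve(M, Bb.T @ P @ Ab)
G = -np.linalg.solve(D.T @ D + Bb.T @ P @ Bb, Bb.T @ P @ Ab)

print(np.round(P, 4))   # [[8.2222, 0, 2], [0, 0, 0], [2, 0, 0.5]], i.e. [[74/9, 0, 2], ...]
print(np.round(G, 4))   # [[-1.6, 0, -0.4]]
print(np.abs(np.linalg.eigvals(Ab + Bb @ G)))   # closed-loop magnitudes {0.8, 0, 0}
```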
Then (Ā + B̄Ḡ*) is stable, and its eigenvalues are 0.8, 0 and 0.

Numerical simulation is performed on this nonminimal model S̄ on the supposition that the system parameters and x(t) are unknown. The orthogonalization algorithm (14)–(16) is applied to improve the feedback u(t) = ḠX̄(t) = g_1 y(t) + g_2 y(t − 1) + g_3 u(t − 1). Fig. 1a illustrates the change of the gain where Ḡ_0 = (0  0  0) and L_k = 2k. Then the gain Ḡ_k converges rapidly to the above Ḡ*. For example, Ḡ_5 = (−1.58239  0  −0.39560) and Ḡ_10 = (−1.59985  0  −0.39996). Though the initial system is unstable, (Ā + B̄Ḡ_k) is stabilized for k ≥ 3. We can see the stabilizing property more clearly by considering an extremely unstable initial feedback such that Ḡ_0 is a randomly chosen large gain, say Ḡ_0 = (50  40  −30), with L_k = 2k. Fig. 1b illustrates this case, where Ḡ_0 is omitted because its elements are too large. Then (Ā + B̄Ḡ_k) is stabilized for k ≥ 2.

Fig. 1. Change of the gain Ḡ_k.

If state feedback of the original system S is possible, the standard LQ control is u(t) = −2x_1(t). Though the signal flow of the LQ output-feedback control is more complicated than that of the standard LQ control, both transfer functions from ũ to y agree with 0.3ẑ⁻²/(1 − 0.8ẑ⁻¹), where ẑ⁻¹ denotes the unit delay.

5. Concluding remarks

By numerical simulation the orthogonalization gives regulators whose performance is exactly the same as that given by the previous LQ design methods. On the other hand, the results differ for real systems with nonlinearity and unmodeled dynamics. An attractive feature of the proposed approach is that the orthogonalization of real responses is directly combined with optimization and stabilization of a real system. The basic algorithm, however, has the defect that the gain improvement is largely affected by uncertain behavior of real responses caused by noises and nonlinearities such as static friction. Recently, practical experimental results have been obtained for the state feedback of real systems by introducing statistical data processing, which will be reported elsewhere. Other possibilities of orthogonalization, such as the continuous-time LQ control and the H∞ control, are mentioned in [10].
Appendix A. Proof of Corollary 4

We can write z^a(t) and z^b(t) in terms of the system matrices as

z^a(t) = [C; DG] (A + BG)^t x^a(0),   (A.1)

z^b(t) = [0; D] ũ^b(0)  (t = 0);   z^b(t) = [C; DG] (A + BG)^{t−1} B ũ^b(0)  (t = 1, 2, ..., L).   (A.2)

Substitute these relations into the inner product Eq. (4). Then we have

⟨z^a, z^b⟩ = x^a(0)^T { Σ_{t=1}^{L} (A^T + G^T B^T)^t (C^T C + G^T D^T D G)(A + BG)^{t−1} B + G^T D^T D } ũ^b(0).   (A.3)

Define a matrix P ≥ 0 by

P = (A^T + G^T B^T) P (A + BG) + C^T C + G^T D^T D G.   (A.4)

We can rewrite this recurrent equation into an infinite series provided that (A + BG) is stable. Since lim_{L→∞} ⟨z^a, z^b⟩ contains this infinite series as in Eq. (A.3), the orthogonality condition of the theorem is equivalent to

(A^T + G^T B^T) P B + G^T D^T D = 0.   (A.5)

It is easy to see that Eq. (A.5) is equivalent to Eq. (9). In addition, Eq. (A.4) is equivalent to Eq. (8), as shown by removing G from Eqs. (9) and (A.4).

Appendix B. Proof of Theorem 6

The sensitivity matrix Σ^ab has the series expansion given in the brackets { } of Eq. (A.3). Note that Eq. (13) is just the dual system of Eqs. (2) and (3). It follows that E{x̃(t + 1|t) ỹ(t|t − 1)^T} has the identical series expansion. This equivalence holds regardless of the value of G. Therefore, Theorem 6 immediately follows.

Appendix C. Derivation of Eqs. (17) and (18)

We obtain

lim_{L→∞} Σ^aa = (A^T + G^T B^T) P (A + BG) + C^T C + G^T D^T D G,   (C.6)

lim_{L→∞} Σ^ab = (A^T + G^T B^T) P B + G^T D^T D,   (C.7)

lim_{L→∞} Σ^bb = B^T P B + D^T D,   (C.8)

in the same manner as Eq. (A.5). Consider the case of P = P_k and G = G_k. Then Eqs. (A.4) and (C.6) imply Eq. (17), and moreover Eqs. (C.7), (C.8) and (14) imply Eq. (18).

Appendix D. Proof of Lemma 11

Since n_o is the observability index of the observable subsystem, the signals y(t − 1), ..., y(t − n̄), u(t − 1), ..., u(t − n̄) uniquely determine the observable part of x(t − n̄) under Assumption 9. It follows that the signals uniquely determine y(t) as in Eq. (21). Then it is easy to see relation (1) of the lemma by comparing the signals of S̄ and Eq. (21). As mentioned in Remark 5, the relation z(t) → 0 implies x(t) → 0 under Assumption 1. Since X̄(t) → 0 means z(t) → 0, relation (2) of the lemma immediately follows. The proof of (3) is as follows. From the stabilizability of Assumption 1, there exists a sequence of u(t) such that both u(t) → 0 and x(t) → 0. Therefore, this input leads to X̄(t) → 0 as well. It means the stabilizability of (Ā, B̄). Similarly, from Assumption 1, z(t) → 0 implies x(t) → 0 and u(t) → 0. It follows that y(t) → 0 and therefore X̄(t) → 0. It means the detectability of (C̄, Ā). The relation on D̄ immediately follows from the definition.

Appendix E. Proof of Theorem 15

Let x(t) = x_o(t) + x_ō(t) be the decomposition of the state into the observable factor and the other. Note that X̄(t) uniquely determines x_o(t) under Assumption 9 (refer to Appendix D). Therefore, there exists a nonsquare matrix Γ such that x_o(t) = ΓX̄(t). Since G* is 0 for x_ō(t), we can neglect the unobservable factor in the following. Replace x(t) with x_o(t) in Eq. (2) and Eq. (19) and substitute x_o(t) = ΓX̄(t) into these
equations. Similarly, substitute Eq. (24) into x_o(t + 1) = ΓX̄(t + 1). Comparing these system equations of S and S̄, we know that ΓĀ = AΓ, ΓB̄ = B and C̄ = CΓ, though Γ is not invertible. Multiply both sides of Eq. (8) by Γ^T and Γ from the left- and right-hand sides, respectively. Multiply both sides of Eq. (9) by Γ from the right-hand side. Then these equations agree with the Riccati equation and the gain equation of S̄ by substituting the above relations and D̄ = D, where their solution is the pair Γ^T P* Γ and G*Γ. It means that G*Γ is equal to the LQ output-feedback gain Ḡ*. We obtain the equation of the theorem from Ḡ*X̄(t) = G*ΓX̄(t) = G*x(t).

References

[1] B.D.O. Anderson, Second-order convergent algorithm for the steady-state Riccati equation, Int. J. Control 28 (2) (1978) 295–306.
[2] K. Furuta, M. Wongsaisuwan, Closed-form solution to discrete-time LQ optimal control and disturbance attenuation, Systems Control Lett. 20 (1993) 427–437.
[3] K. Furuta, M. Wongsaisuwan, Discrete-time LQG dynamic controller design using plant Markov parameters, Automatica 38 (9) (1995) 1325–1332.
[4] G.A. Hewer, An iterative technique for the computation of the steady state gains for the discrete optimal regulator, IEEE Trans. Automat. Control AC-16 (1971) 382–384.
[5] Y. Kawamura, Inner products of some responses and iterative optimization, 9th SICE Symp. on DST, 1986, pp. 59–62 (English transl.).
[6] Y. Kawamura, A basic algorithm of constructing the optimal regulator from input–output data, Trans. SICE 24 (11) (1988) 1216–1218 (in Japanese).
[7] Y. Kawamura, Duality on orthogonality conditions for discrete-time optimal control and optimal estimation, Trans. SICE 24 (12) (1988) 1360–1367 (in Japanese).
[8] Y. Kawamura, Unimodality of performance functions related to the LQ and the LQG learning optimization problem, Trans. SICE 29 (5) (1993) 555–563 (in Japanese).
[9] Y. Kawamura, Consideration on a fast learning scheme for the LQ optimal regulator, Trans. SICE 29 (7) (1993) 767–775 (in Japanese).
[10] Y. Kawamura, Direct synthesis of LQ regulator from inner product of response signals, 11th IFAC Symp. on System Identification, SYSID'97, 1997, pp. 1717–1722.
[11] G. Kreisselmeier, Adaptive observers with exponential rate of convergence, IEEE Trans. Automat. Control AC-22 (1977) 2–8.
[12] K.S. Narendra, D.N. Streeter, An adaptive procedure for controlling undefined linear processes, IEEE Trans. Automat. Control AC-9 (1964) 545–548.
[13] A. Sakata, Y. Kawamura et al., Optimization of the linear feedback system by means of real data, Kansai Chapter Joint Conf. Electr. Engineering Record G2-7, 1985 (in Japanese).