Proceedings of the 18th World Congress The International Federation of Automatic Control Milano (Italy) August 28 - September 2, 2011
Dual Adaptive Control Properties R. Tenno Aalto University School of Electrical Engineering, P.O. Box 5500, Aalto, Finland
Abstract: The dual adaptive controls derived in a semi-explicit form from the Bellman equation for a linear plant with unknown drifting parameters are analysed numerically in this paper. The probing signal generation mechanism is discussed, and it is shown that the dual controls depend on a certain (probing) function that drives the estimation process initially over the full range of the unknown parameters and then selectively with respect to the model inaccuracy. It improves the model most in the part where the model is most inaccurate (variation peak) if the drifting parameters are wrongly estimated relative to the average steady-state level of the drift. The instantaneous cost is correspondingly increased in the most inaccurate part of the model and indicates well the inaccuracy of the model with respect to the past estimated trajectory.
I. INTRODUCTION
The dual adaptive control, or actively adaptive control, was a recurring focus of research in the early 1960s [2] and 1970s [19, 20, 21, 6] but was later neglected for many years because of huge analytical and computational difficulties. Some progress was made on simple two-step dual control problems [24, 14, 15] and on more complex problems, but the latter were solved only at the expense of huge simplifications [20, 21, 22, 6]; for the latest simplifications, see [3] and the references therein. A small control effect was found in simple problems [16], whereas a promising effect was found in complex problems [3, 19, 20]. This contradiction was resolved later, when it was found that the dual controls are beneficial in the case of constant unknown parameters, where the estimation process can fully stop [24, 23, 3]. The dual controls can also stop the estimation process if the original control problem and the separated control problem are not equivalent, but the dual controls can be designed to fit these two problems and to avoid the stopping issue [18]. In the case of unknown drifting parameters the control effect is small if the drift is a stationary process [17]. Although that covers many practical processes, the drifting parameters case is still interesting because the controls are essentially the same as in the case of constant parameters, while some difficulties of the constant parameters problem do not appear in the drifting parameters problem. The latter problem is analysed in this paper; it explains the probing signal generation mechanism without any auxiliary perturbation or extension of the probability space, as is conventional in the stochastic control theory [5, 11].
This theory has advanced rapidly since the early 1960s, especially in studies of the fully nonlinear and degenerate Bellman equation for controlled diffusion processes. Numerous theoretical [8] and computational [9, 10] problems were solved. For dual control the most interesting results are (i) the Bellman equation in the lattice of measures developed for the degenerate controlled diffusion processes and (ii) the evidence that this fully nonlinear equation can be computed with reasonable accuracy. These theoretical results are published in a series of papers by Krylov, e.g. [9, 10]. For a comprehensive survey of the controlled diffusion processes that are partly related to the dual control problem, see [1] and the references therein. The theory of controlled diffusion processes [8] was partly applied to some interesting dual control problems in [12, 13], but only on the functional level, which did not lead to the computation of controls. In some cases [17, 18] direct application of the Bellman equation leads to the dual controls in a semi-explicit form. Since the early 1960s the speed of computers has increased drastically and good solvers for fully nonlinear PDEs have been developed. The advanced control theory helps to derive the dual controls in a semi-explicit form, and the development of computing means helps to apply high-resolution calculations for computing them from the Bellman equation. Hence, this paper builds on the progress made by many authors in stochastic control theory and in computing theory and practice. This paper aims at the solution of a specific dual control problem discussed next.
II. THE PROBLEM AND SOLUTION
The dual control problem formulated and solved in [17, 18] is simplified and summarized underneath.
A. The Control Problem
The problem is to minimize the average quadratic cost
$$ v^u = M^u \left\{ \int_0^T \left[ (x_t - \lambda_t)^2 + u_t^T K u_t \right] dt + (x_T - \lambda_T)^2 \right\}, \tag{1} $$
with respect to the partially observed stochastic system (2)-(4) that is bilinear with respect to the controls and unobserved drift parameters
$$ d\theta = (a_0 + a_1 \theta_t)\,dt + b\,dW, \quad \theta(0) = \theta_0, \tag{2} $$
$$ x_t = u_t^T \theta_t, \tag{3} $$
$$ d\xi = (A_0 \xi_t + A_1 x_t)\,dt + B\,dV, \quad \xi(0) = 0. \tag{4} $$
In (1)-(4), the following notation applies: $u$ is the vector of controls, $\lambda$ is the tracking target (reference), $K$ is a positive definite matrix, and $x$ is the controlled inertia-free scalar process with unknown drifting parameters $\theta$ (vector); $\theta$ is the unmeasured drift with known dynamics and stochastic properties. The parameters $a_0$ (vector), $a_1$ (matrix) and $b$ (matrix) are given. The initial drift $\theta_0$ is unknown, but its guess estimates, the mean $m_0$ and the covariance $\gamma_0$ (estimation accuracy), are given parameters of a Gaussian distribution. The drift is driven by a Wiener process $W$; it is a non-degenerate stochastic process with a positive definite matrix $b$. The measured process $\xi$ (vector) depends on the controlled process through known dynamics and stochastics, for which the parameters $A_0$ (vector), $A_1$ (matrix) and $B$ (matrix) are given. The measurements are corrupted by a Wiener process $V$, which is independent of $W$; $B$ is the measurement accuracy. Admissible controls are measurable functions (i.e., causal, feedback controls) that support a strong solution [8, 7] to the system (2)-(4). For a refined statement, see [18]. The system (2)-(4) is an application of the linear regression model (3) with unknown time-varying parameters, which is frequently used as a non-stationary plant description.
B. The Separated Control Problem
The original control problem (1)-(4) can be converted to the separated control problem (5)-(7) using the filtering results [7]. The separated control problem is to minimize the quadratic cost (5)
$$ v^u = M^u \left\{ \int_0^T \left[ (u_t^T m_t - \lambda_t)^2 + u_t^T \gamma_t u_t + u_t^T K u_t \right] dt + (u_T^T m_T - \lambda_T)^2 + u_T^T \gamma_T u_T \right\} \tag{5} $$
with respect to the completely observed stochastic system
$$ dm = (a_0 + a_1 m_t)\,dt + \sigma\,dw, \quad m(0) = m_0, \tag{6} $$
$$ \dot{\gamma} = a_1 \gamma_t + \gamma_t a_1^T + b b^T - \sigma \sigma^T, \quad \gamma(0) = \gamma_0, \tag{7} $$
in which the diffusivity matrix $\sigma$ depends on the controls and system state
$$ \sigma = \gamma_t A_1^T u_t (B B^T)^{-1/2}. \tag{8} $$
In (6), $w$ is the innovation process that has the Wiener process properties and is related to the measurements and the estimated model by (9)
$$ dw = (B B^T)^{-1/2} \left[ d\xi - (A_0 \xi_t + A_1 u_t^T m_t)\,dt \right]. \tag{9} $$
In (6)-(7), $m_t$ is the conditional mean of drift and $\gamma_t$ is the estimation covariance (accuracy).
C. Semi-explicit Dual Controls
The separated control problem can be represented as an initial value problem through the Bellman equation (10)
$$ v_t + (a_0 + a_1 m)^T v_m + \inf_u \left\{ \frac{1}{2} \sum_{i,j=1}^n (\sigma\sigma^T)_{i,j} v_{m_i m_j} + \sum_{i,j=1}^n (a_1\gamma + \gamma a_1^T + b b^T - \sigma\sigma^T)_{i,j} v_{\gamma_{i,j}} + (u^T m - \lambda)^2 + u^T(\gamma + K)u \right\} = 0 \tag{10} $$
with the terminal condition
$$ v_T = \inf_{u_T} \left\{ (u_T^T m_T - \lambda_T)^2 + u_T^T \gamma_T u_T \right\}. \tag{11} $$
In (10), $v_t$, $v_m$, $v_{m_i m_j}$ and $v_{\gamma_{i,j}}$ are partial derivatives of the optimal value function $v = \inf_u v^u$.
Formally, the problem (10)-(11) can be solved and the optimal controls (12) derived for any moment $0 \le t < T$
$$ u_t = \{ K + m_t m_t^T + \gamma_t + P_t \}^{-1} m_t \lambda_t \tag{12} $$
and controls (13) derived for the terminal moment
$$ u_T = (m_T m_T^T + \gamma_T)^{-1} m_T \lambda_T. \tag{13} $$
In (12), $P_t$ is the probing function (14), so named in [17, 18] because of its probing features.
$$ P_t = \sum_{i,j=1}^n \left( \gamma_t A_1^T (B B^T)^{-1} A_1 \gamma_t \right)_{i,j} \left( \frac{1}{2} v_{m_i m_j} - v_{\gamma_{i,j}} \right). \tag{14} $$
The optimal controls (12) depend on the probing function (14) via the value function $v$ that satisfies the equation (15)
$$ v_t + (a_0 + a_1 m)^T v_m + \sum_{i,j=1}^n (a_1\gamma + \gamma a_1^T + b b^T)_{i,j} v_{\gamma_{i,j}} + \lambda^2 \left\{ 1 - m^T \left[ K + m m^T + \gamma + P \right]^{-1} m \right\} = 0 \tag{15} $$
with the terminal condition (16)
$$ v(T) = (m_T m_T^T + \gamma_T)^{-1} \gamma_T \lambda^2. \tag{16} $$
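As a check on the step from (10) to (12) and (15), the minimisation can be carried out by hand in the scalar case. The sketch below assumes scalar $u$, $m$, $\gamma$, $A_1$ and $B$, and uses the shorthand $c = A_1^2/B^2$ (introduced here for illustration, not used in the paper), so that $\sigma\sigma^T = c\gamma^2 u^2$ by (8). Collecting the terms of (10) that depend on $u$ gives

$$
\begin{aligned}
\tfrac{1}{2} c\gamma^2 u^2 v_{mm} - c\gamma^2 u^2 v_{\gamma} + (um - \lambda)^2 + u^2(\gamma + K)
 &= u^2 (K + m^2 + \gamma + P) - 2um\lambda + \lambda^2, \\
P &= c\gamma^2 \left( \tfrac{1}{2} v_{mm} - v_{\gamma} \right), \\
u^{*} &= \frac{m\lambda}{K + m^2 + \gamma + P}, \\
\min_u \{\cdot\} &= \lambda^2 \left( 1 - \frac{m^2}{K + m^2 + \gamma + P} \right).
\end{aligned}
$$

The minimiser agrees with the scalar form of (12), and the minimum value agrees with the last term of (15). The sketch assumes that the denominator $K + m^2 + \gamma + P$ is positive, so that the quadratic in $u$ has a unique minimum.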
This type of controls (12), (13) was referred to in [2] as the dual controls, or as actively adaptive controls in [19, 20]. In the further calculation steps the value function is solved numerically from (15)-(16) and saved in a lookup table along with the derivatives required for calculation of the probing function (14); these are then used for prompt calculation of the dual controls (12). These calculations are computationally light if a single-input single-output (SISO) system is analysed.
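The lookup-table step can be sketched as follows. This is a minimal illustration, not the solver used in the paper: it assumes the scalar (SISO) case, in which (14) reduces to $P = (\gamma A_1/B)^2 (\tfrac{1}{2} v_{mm} - v_\gamma)$, and it tabulates the closed-form terminal value function (16) as a stand-in for a numerically solved $v$. The parameter values are those of the benchmark in Section III; the grids, the function names and the use of `numpy.gradient` for the derivatives are choices made here for illustration.

```python
import numpy as np

LAM, K = -0.15, 1e-4      # target and control penalty (benchmark values)
A1, B = 1.0, 0.5          # scalar measurement gain and noise level (benchmark values)

def terminal_value(m, g, lam=LAM):
    """Terminal value function (16) in the scalar case: gamma*lam^2/(m^2+gamma)."""
    return g * lam**2 / (m**2 + g)

def probing_table(v, m_grid, g_grid, a1=A1, b=B):
    """Scalar form of (14) on a table v[i, j] = v(m_i, gamma_j).

    Derivatives are approximated by finite differences (numpy.gradient).
    """
    v_m = np.gradient(v, m_grid, axis=0)
    v_mm = np.gradient(v_m, m_grid, axis=0)
    v_g = np.gradient(v, g_grid, axis=1)
    gam = g_grid[np.newaxis, :]
    return (gam * a1 / b) ** 2 * (0.5 * v_mm - v_g)

def dual_control(m, g, p, lam=LAM, k=K):
    """Scalar form of (12): u = m*lam / (K + m^2 + gamma + P)."""
    return m * lam / (k + m**2 + g + p)

# Hypothetical grids for the lookup table.
m_grid = np.linspace(-2.0, 2.0, 401)
g_grid = np.linspace(0.05, 1.5, 291)
M, G = np.meshgrid(m_grid, g_grid, indexing="ij")

v_table = terminal_value(M, G)                    # stand-in for the solved v
p_table = probing_table(v_table, m_grid, g_grid)  # tabulated probing function

def lookup_control(m, g):
    """Nearest-node lookup of P followed by the control formula."""
    i = np.abs(m_grid - m).argmin()
    j = np.abs(g_grid - g).argmin()
    return dual_control(m, g, p_table[i, j])
```

For this stand-in table the probing function is negative, so `lookup_control(0.5, 0.75)` is larger in magnitude than the same control with `p = 0`. A production table would also carry a time axis and would use interpolation rather than a nearest-node lookup.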
III. NUMERICAL BENCHMARK ANALYSIS
The dual controls can be obtained as a solution of the problem that minimizes the quadratic functional (19)
$$ v^u = M^u \left\{ \int_0^{10} \left[ (x_t + 0.15)^2 + 10^{-4} u_t^2 \right] dt + (x_T + 0.15)^2 \right\} \tag{19} $$
with respect to the partially observed stochastic system
$$ d\theta = (0.05 - 0.5\theta_t)\,dt + 0.05\,dW, \tag{20} $$
$$ x_t = u_t \theta_t, \tag{21} $$
$$ d\xi = -\xi_t\,dt + x_t\,dt + 0.5\,dV, \quad \xi(0) = 0. \tag{22} $$
The initial state $\theta(0) = 0.75$ is assumed unknown up to the low-quality estimate $\theta_0 \sim N(-0.75, 1.5^2)$ available in this problem. Here $N(-0.75, 1.5^2)$ is the Gaussian distribution with a mean value of -0.75 and a standard deviation of 1.5. The probing function is simpler in this SISO example than in the general case
$$ P_t = 2\gamma_t^2 \left( \frac{\partial^2 v}{\partial m^2} - 2\frac{\partial v}{\partial \gamma_t} \right). \tag{23} $$
Also the dual controls (24) are somewhat simpler than (12)
$$ u_t = \frac{m_t \lambda_t}{K + m_t^2 + \gamma_t + P_t}. \tag{24} $$
This benchmark example was analysed numerically in [17, 18] and numerous results can be found there. They characterize the processes in the time domain: the evolution of the value function is analysed, and the dual controls are compared with the certainty equivalent controls and the cautious controls. In this paper our concern is the probing function properties in the spatial domain: how do they affect the dual controls, the estimation quality and the instant losses, and why? These questions can be answered comprehensively if we distinguish between the particular processes: 1) positive drift ($-a_0/a_1 > 0$), 2) negative drift ($-a_0/a_1 < 0$), 3) zero drift ($a_0 = 0$), and 4) random walk ($a_0 = a_1 = 0$).
IV. THE DUAL CONTROL PROPERTIES
The effect of dual controls is small in comparison to the cautious controls. It is small in the case of a drifting plant [17], and as a result some specific difficulties do not arise in this case as they do in the case of constant unknown parameters [23, 18], which is left out of this paper on purpose: the drift is assumed to be a non-degenerate stochastic process. The significance of dual controls is not the point we are going to demonstrate, but rather how they act and what they influence. The dual controls are much more significant in the case of constant unknown parameters, and their effect is in many aspects similar to the drifting plant case but scaled up.
Discussion of the results of the numerical analysis requires numerous figures. In this paper their number is reduced by considering all processes at the same value of variation, say $\gamma = 0.75$, which is a purely presentational choice and not a matter of principle. Also, some figures (left out) are explained in words without data. The most interesting dual control properties are discussed next.
A. The Optimal Cost
The value function solved from the Bellman equation is demonstrated in Fig. 1 in the case of random walk drift.
Fig. 1. The optimal cost of the random walk process. The value function is shown on the z-axis, the conditional mean on the x-axis and the variation on the y-axis.
B. The Probing Function Properties
The dual controls are driven by the probing function (23) shown in Fig. 2. The probing function is shifted along the mean value x-axis to the left or right depending on the steady-state level $-a_0/a_1$ of the conditional mean (6). The shift is opposite (minus) and proportional to the average steady-state level $-a_0/a_1$. For a positive average steady-state level of the drift, $-a_0/a_1 = +0.1$ (referred to as the positive drift), the shift is to the left, and for a negative steady-state level, $-a_0/a_1 < 0$, it is to the right, as shown in Fig. 2. As a result, the dual controls (24) have more gain to "push" the process away from a mean value estimated with the wrong sign (if it is on the left and the average value of the drift (2) is on the right) than from a mean value estimated with the correct sign (if it is on the right). The curve depends on the variance as well (in the figure: $\gamma = 0.75$) and is smaller for a smaller variance. For a small enough variance the probing function itself is small, $P_t \approx 0$, and therefore of no interest. The probing function is not shifted to either side for a process with the zero steady-state level, and it is also not shifted in the case of a random walk process (Fig. 2). In these cases the dual control tries to push the mean value process away from the zero values, which are less favourable for identification, without specific preference for either side. In the case of a random walk, the dual control is less selective around zero and more time-dependent than in the short-term drift case. This is because no benefit for identification can be drawn from the short-term process dynamics; it can only be drawn gradually from the slowly floating random walk level. The probing function depends weakly on the time-to-go except close to the terminal moment; it depends on time more in the case of a random walk process.
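For orientation, the steady-state level quoted above can be worked out for the benchmark drift (20). The short calculation below is a sketch based on the standard moment equations of a linear stochastic differential equation with $a_1 < 0$; the symbols $\bar\theta$ and $\bar D$ for the stationary mean and variance of the drift are introduced here and do not appear in the paper.

$$
\begin{aligned}
\frac{d}{dt} M\theta_t = a_0 + a_1 M\theta_t = 0 \;&\Rightarrow\; \bar\theta = -\frac{a_0}{a_1} = -\frac{0.05}{-0.5} = +0.1, \\
\frac{d}{dt} D_t = 2 a_1 D_t + b^2 = 0 \;&\Rightarrow\; \bar D = \frac{b^2}{-2 a_1} = \frac{0.05^2}{2 \cdot 0.5} = 0.0025 .
\end{aligned}
$$

Under these assumptions the benchmark (20) is the positive drift case, with a stationary standard deviation of 0.05 around the level +0.1.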
[Figure panel titles: Positive drift, Negative drift, Zero drift, Random walk]
The probing function is an increasing function of variation ($\gamma = 0.75$ in Fig. 2). Initially, in a short period ($t < 1$), when the variation is big, the probing function is also big, and its negative part in the denominator of the controls (24) generates a big probing signal (all other terms in the denominator are positive), as we demonstrate next.
C. The Control Properties
The dual controls are most active in the same location as the probing function, where the probing function is large. In this manner, the dual controls are forced by the drift to change asymmetrically with respect to the origin in Fig. 3. Consider the asymmetric bump of the controls as a probing signal. The location of the bump is independent of the target value. Changing the target value $\lambda = -0.15$ to the opposite value $\lambda = 0.15$ results in the mirrored curve (these mirror data are not shown). The controls are almost independent of time, similarly to the lack of this dependence in the probing function itself. The probing part of the controls is symmetric in the case of zero steady-state level (Fig. 3). It is smaller and less dependent on time compared to the random walk process, whose probing function is more selective than in the case of short-term drift.
The mean value and variation processes are relatively regular (Fig. 4) compared to the controlled processes (Fig. 5). The diffusivity $\sigma = 2\gamma_t u_t$ is proportional to the dual controls. This gives a certain irregular ("bang-bang"-like) behaviour to the controlled processes (Fig. 5). However, these processes are still regular because the probing function is too small to reduce the denominator of the controls (24) to zero in the stationary drift case, as is demonstrated in Fig. 5 through 5 realisations of the controlled processes. The mean value and true processes are both controlled by the same dual controls, but different Wiener process trajectories were generated in each realisation. Besides the initial peaks there is no other large peak in Fig. 5, which indicates the relative smoothness of dual controls in the case of non-degenerate stationary drift processes. Notice also that the true process is more affected by noise than the estimated process in the neighbourhood of the control target. This is because the dual controls do not stabilize a drifting process exactly; they eliminate the initial uncertainty and then track the drift with the best attainable accuracy. The control accuracy for the estimated process is always better than for the true process, although both are influenced by the same feedback controls (24), $u_t = u(t, m, \gamma)$. These controls are multiplied in the plant model (3) by the estimated drift, $\hat{x}_t = m_t u_t$, and by the true drift, $x_t = \theta_t u_t$, respectively, which makes the variation of the true process bigger than that of the estimated process, similarly to the bigger unconditional variation $M\theta_t^2 - (M\theta_t)^2$ of the drift and the smaller conditional variation $\gamma_t$ of the estimates.
Fig. 3. The dual controls are activated in the same location as the probing function in Fig. 2. The dual controls are symmetric if the conditional mean has the zero steady-state level and also symmetric in the random walk case.
Fig. 2. The probing function is shifted to the left if the steady-state level of drift is positive, $-a_0/a_1 = +0.1$, and to the right if it is negative, $-a_0/a_1 = -0.1$. The probing function is not shifted if the drift has the zero steady-state level, $a_0 = 0$, or if $a_0 = a_1 = 0$ (random walk).
[Fig. 4 plot: curves $\theta_t$, $m_t$ and $\gamma_t$ against Time, sec (0 to 10); label: Estimated process]
Fig. 4. The simulated (blue) and estimated process for mean (green) and variance (red) shown in 5 realisations.
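A realisation like those in Fig. 4 can be reproduced in outline with an Euler-Maruyama discretisation of the benchmark (20)-(22) and the filter (6)-(7). The sketch below is an illustration under stated assumptions, not the paper's computation: it uses the cautious control, i.e. (24) with $P_t = 0$, because the true probing function requires the numerical solution of (15); the step size, the random seed and the function name `simulate` are also choices made here.

```python
import numpy as np

def simulate(T=10.0, dt=1e-3, seed=0):
    """Euler-Maruyama run of the benchmark plant with its scalar filter.

    Control: cautious control u = m*lam/(K + m^2 + gamma), i.e. (24) with P = 0
    (an assumption; the dual control needs P from the Bellman equation).
    Returns an array with columns (theta, m, gamma) for each time step.
    """
    a0, a1, b = 0.05, -0.5, 0.05      # drift model (20)
    A0, A1, B = -1.0, 1.0, 0.5        # measurement model (22)
    lam, K = -0.15, 1e-4              # target and control penalty (19)
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))

    theta, xi = 0.75, 0.0             # true initial drift and measurement
    m, gam = -0.75, 1.5**2            # initial estimate and its variance
    out = np.empty((n + 1, 3))
    out[0] = theta, m, gam

    for k in range(n):
        u = m * lam / (K + m**2 + gam)
        dW, dV = rng.normal(0.0, np.sqrt(dt), size=2)

        # Plant (20)-(22): the measurement increment uses the true drift.
        dxi = (A0 * xi + A1 * u * theta) * dt + B * dV
        dtheta = (a0 + a1 * theta) * dt + b * dW

        # Filter (6)-(9) in scalar form: sigma = gamma*u*A1/B.
        sigma = gam * u * A1 / B
        dw = (dxi - (A0 * xi + A1 * u * m) * dt) / B
        m_new = m + (a0 + a1 * m) * dt + sigma * dw
        gam_new = gam + (2.0 * a1 * gam + b**2 - sigma**2) * dt

        theta, xi, m, gam = theta + dtheta, xi + dxi, m_new, gam_new
        out[k + 1] = theta, m, gam
    return out

traj = simulate()   # columns: true drift, conditional mean, conditional variance
```

Calling `simulate(seed=s)` for several seeds gives independent realisations of the Wiener processes with the same feedback law, which is the setting described for Figs. 4 and 5.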
[Fig. 5 plot: $x_t$ against Time, sec (0 to 10); labels: Optimal trajectory, Target]
Fig. 5. The trajectories stabilised by the dual control (5 realisations) for the estimated (green) and true (blue) processes.
D. The Estimation Properties
The variance is highest around zero mean, where the least information is available on the unknown drift. Hence, the dual control might be expected to be most active there, but this is not precisely so. The dual control decreases the variance most in the location where the probing function has a large negative value. For positive average drift it decreases the variance most on the left and for negative average drift on the right, as is demonstrated in Fig. 6: obviously, a more negative variation-change-rate $\dot\gamma$ reduces the variation more. For zero average drift the dual control decreases the variance around the zero value. It decreases it more in the case of a random walk process, except very close to zero, where the dual control loses information anyway; at zero the dual controls are unable to maintain the achieved accuracy of the estimates. The dual controls cannot affect the variation at exactly $m = 0$ because this stops the estimation process ($u = 0$ and $\sigma = 0$). However, in our non-degenerate case ($b = 0.05 > 0$) the estimation process soon starts again. These estimation properties are valid irrespective of the control target.
[Figure panel titles: Positive drift, Negative drift, Zero drift, Random walk; curve label: $Mx_t = u_t m_t$]
Fig. 6. The variance-change-rate is an asymmetric function in both cases of the positive and negative average drifts, and a symmetric function in the case of zero average drift; it decreases most in the random walk process.
E. The Cost Function Properties
The integrand of the cost function (5) is referred to as the instant cost; it equals ($\lambda = -0.15$, $K = 10^{-4}$)
$$ V = (u_t m_t - \lambda)^2 + u_t^2 (\gamma_t + K). \tag{25} $$
The instant cost is generally an asymmetric function because of the feedback control (24) applied, $u_t = u(t, m, \gamma)$; only the terminal cost ($T = 10$) is symmetric. The current cost with a penalty for agitation $K > 0$ and variation $\gamma > 0$ has certain horn-like bumps (Fig. 7). The horn can be on the left side or the right side depending on whether the average drift is positive or negative. In the cases of zero average drift and the random walk process there are two horns, one on each side. The horns indicate that the current cost is high in the neighbourhood of zero but does not peak sharply at zero, because the controls are designed ($K > 0$) to avoid a peak at zero. Here again the high cost pushes the estimation process away from the wrongly estimated value (if the sign of the estimate differs from the sign of the average drift; it is neutral if the signs agree). The push is weaker and equally distributed around zero in the case of zero average drift; it is somewhat stronger in the case of a random walk process.
Fig. 7. The instant cost of dual controls is generally asymmetric, but the terminal cost ($t = 10$) is symmetric, and the instant cost is symmetric in the cases of zero drift and random walk.
The discussed phenomenon is in good agreement with the probing function and with the certainty equivalent (CE) controls (26)
$$ u_t = \frac{m_t \lambda}{K + m_t^2}. \tag{26} $$
The horns are similar but much bigger if the CE controls are applied (Fig. 8) instead of the dual controls.
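The size of the CE horns can be illustrated directly from (25) and (26). The sketch below evaluates the instant cost on a grid of conditional means at $\gamma = 0.75$. Since the dual control (24) needs the probing function from the Bellman equation, the cautious control ((24) with $P_t = 0$) is used here as a stand-in; this substitution, the grid and the function names are assumptions of this example.

```python
import numpy as np

LAM, K, GAMMA = -0.15, 1e-4, 0.75   # benchmark target, penalty, variance level

def instant_cost(u, m, gamma=GAMMA, lam=LAM, k=K):
    """Instant cost (25): V = (u*m - lam)^2 + u^2*(gamma + K)."""
    return (u * m - lam) ** 2 + u**2 * (gamma + k)

def ce_control(m, lam=LAM, k=K):
    """Certainty equivalent control (26)."""
    return m * lam / (k + m**2)

def cautious_control(m, gamma=GAMMA, lam=LAM, k=K):
    """Control (24) with the probing function set to zero (stand-in for dual)."""
    return m * lam / (k + m**2 + gamma)

m = np.linspace(-2.0, 2.0, 4001)    # hypothetical grid of conditional means
v_ce = instant_cost(ce_control(m), m)
v_cautious = instant_cost(cautious_control(m), m)

print("max instant cost, CE control:      ", v_ce.max())
print("max instant cost, cautious control:", v_cautious.max())
print("CE cost peaks at |m| =", abs(m[v_ce.argmax()]))
```

Under these assumptions the cautious cost never exceeds $\lambda^2$, because substituting the control into (25) gives $V = \lambda^2 (1 - m^2/(K + m^2 + \gamma))$, whereas the CE cost grows to roughly $\lambda^2 \gamma/(4K)$ near $|m| \approx \sqrt{K}$.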
Fig. 8. The instant cost of CE controls is high for zero average drift.
Certainly the instant cost V is different from the optimal cost (value function) v because it is not integrated in time and not averaged over the future random processes. Compare these functions for positive drift in Figs 7 and 9. Some weak similarities between the instant cost V and the optimal cost v can be reproduced by scaling one of the functions, also for negative and zero drift and for a random walk model.
[Fig. 9 plot: panel Positive drift, $\gamma_t = 0.25$]
Fig. 9. Evolution of the optimal cost (value function) of dual controls.
V. CONCLUSION
The stated dual control problem for a static plant is solved without significant simplifications. The first part of the problem is solved analytically in a semi-explicit form and the remaining part is solved numerically. The probing mechanism of dual controls is then identified in the numerical analysis. It is shown that the identified probing function controls the estimation process through its effect on the control signal. The probing function has the greatest effect initially, and later in the neighbourhood of the average drift if the sign of the estimated process is wrong. Also, in the case of zero average drift the probing function has the greatest effect in the neighbourhood of zero, where the model is most inaccurate. No explanation other than the probing function itself (such as randomization, extension of the probability space, or relaxation of controls) is required to explain the behaviour of dual controls.
REFERENCES
[1] Borkar, V. S. (2005) Controlled diffusion processes. Probability Surveys. Vol 2, pp. 213-244.
[2] Feldbaum, A. A. Dual control theory. I-IV. J. Automation Remote Control. Vol 21. 1960, pp. 874-880, 1033-1039, Vol 22. 1961, pp. 112, 109-121.
[3] Filatov, N. M. and H. Unbehauen (2000) Survey of adaptive dual control methods. IEEE Proc. Control Theory Appl., 147, pp. 118-128.
[4] Filatov, N. M. and H. Unbehauen (2004) Adaptive dual control: Theory and Applications. Lecture Notes in Control and Information Sciences. No 302. 130 p.
[5] Fleming, W. H. and H. Pardoux (1982) Optimal control of partially observed diffusion. SIAM J. on Control and Optimization. Vol 20, No 2, pp. 261-285.
[6] Lindoff B., Holst J. and B. Wittenmark (1999) Analysis of approximations of dual control. International Journal of Adaptive control and Signal Process. Vol 13, No 7, pp. 593-920.
[7] Liptser, R. S. and A. N. Shiryayev (1977) Statistics of Random Processes. I, II. Springer. New York. 427 p., 402 p.
[8] Krylov, N. V. (1980) Controlled diffusion processes. Springer-Verlag. New York. 398 p.
[9] Krylov, N. V. (2005) On the rate of convergence of finite-difference approximations for Bellman equation with Lipschitz coefficients. Applied Mathematics and Optimization. Vol 52, pp. 365-399.
[10] Krylov, N. V. (2007) On the rate of convergence of finite-difference approximations for normalized Bellman equation with Lipschitz coefficients. arXiv:math/0610855v3. 29 p.
[11] Kushner, H. J. (1977) Probability Methods for Approximation in Stochastic Control and for Elliptic Equations. Academic Press. New York-San-Francisco-London. 222 p.
[12] Rishel, R. (1986) An exact formula for a linear quadratic adaptive stochastic optimal control law. SIAM J. on Control and Optimization. Vol 24, No 4, pp. 667-674.
[13] Rishel, R. (1990) A comment on a dual control problem. Proc. 19th IEEE Conference on Decision and Control Including the Symposium on Adaptive Processes, pp. 337-340.
[14] Sternby, J. (1976) A simple dual control problem with an analytical solution. IEEE Transaction on Automatic Control. Vol 21, No 6, pp. 840-844.
[15] Tenno, R. (1983) Two-step control for partially observed stochastic processes. Proceedings of the Estonian Academy of Sciences. Physics. Mathematics. Vol 32, No 1, pp. 11-18 (in Russian).
[16] Tenno, R. (1992) Control using incomplete data. Approximation methods for optimal control. Moscow. Nauka. 183 p (in Russian).
[17] Tenno, R. (2009) Dual adaptive control for linear system with unknown drifting parameters. Proc. European Control Conference. Budapest, pp. 412-417.
[18] Tenno, R. (2010) Dual adaptive control for linear system with unknown constant parameters. International Journal of Control. Online First.
[19] Tse, E. and Y. Bar-Shalom (1973) An actively adaptive control for linear systems with random parameters via the dual control approach. IEEE Trans. Automat Contr. Vol 18. pp. 98-109, 109-117.
[20] Tse, E. and Y. Bar-Shalom (1976) Actively adaptive control for nonlinear stochastic systems. Proc. of the IEEE. Vol 64, No 8, pp. 1172-1181.
[21] Wittenmark, B. (1975a) Stochastic adaptive control method: a survey. International Journal of Control, 21, pp. 705-730.
[22] Wittenmark, B. (1975b) An active suboptimal dual controller for system with stochastic parameters. Auto. Contr. Theory and Appl., pp. 313-19.
[23] Wittenmark, B. (1995) Adaptive dual control methods: An overview. 5th IFAC Symposium on Adaptive Systems in Control and Signal Processing. Budapest, pp. 67-72.
[24] Aström, K. J. and B. Wittenmark (1971) Problems of identification and control. Journal of Mathematical Analysis and Applications. Vol 34. pp. 90-113.