A New Convergent Approximation of the Optimal Recursive Parameter Estimator: Averaged Asymptotic Dynamic Analysis


Proceedings of the 18th World Congress of the International Federation of Automatic Control, Milano (Italy), August 28 - September 2, 2011. DOI: 10.3182/20110828-6-IT-1002.00362

Donald M. Wiberg ∗

∗ University of California Santa Cruz, Santa Cruz, CA 95064 USA (Tel: 831-459-5560; e-mail: [email protected]).

Abstract: A key feature of a new recursive parameter estimator is its fast convergence. Here averaging theory supports this claim of fast convergence for system parameters, which heretofore has been supported only by example and by analogy with optimal nonlinear filtering. Averaging theory is also used to investigate possible computational shortcuts. Finally, it is proven that the likelihood function has a unique maximum for stable, observable, single output systems in state space form when the parameters are asymptotically identifiable and the true values are in the model set. The new parameter estimator must then converge globally with probability one to the true parameter values when the trajectories of the parameter estimates remain in a certain bounded set.

Keywords: parameter estimation, system identification, recursive estimation, asymptotic analysis, identification algorithm

1. INTRODUCTION

A simple new convergent approximation of the optimal parameter estimator is introduced here in continuous time. The new estimator is a simplification of the 3-OM estimator (Wiberg and DeWolf, 1993), such that it is comparable in computational complexity to the state space form of the RPEM (Ljung and Soderstrom, 1983) and to the extended Kalman filter used as a parameter estimator (Soderstrom, 2002). The new estimator is an approximation of the optimal non-linear filter (Kushner, 1964) and so is recursive, with the intent that its discrete version be useful for on-line monitoring, fault detection, and adaptive control. Furthermore, the estimate of parameter variance in the new algorithm is also recursive. Parameter estimates are updated upon receipt of input and output data, and no prefiltering of data is necessary, unlike in recursive subspace methods such as (Mercere and Lovera, 2007) or recursive instrumental variable methods such as (Young and Jakeman, 1980).

The derivation of the new algorithm is based directly on the derivations in (Wiberg and DeWolf, 1993). The simple example given in (Wiberg and DeWolf, 1993) should be referred to before reading this paper, because there is not enough space here to repeat it; after reading that example, the new estimator equations introduced in Section 4 will be much more understandable. The proofs in (Wiberg and DeWolf, 1993) make clear why the new estimator must include the approximations M_j(t) and M_i(t) to third order moments.

This paper is organized as follows. Symbols and model are introduced, followed by the underlying assumptions. The new algorithm is presented and motivated. Theorem 1 proves its global convergence.

Equations for the averaged asymptotic behaviour are then derived. Next come Theorems 2 and 3, concerning the likelihood function to whose maxima the estimator converges. Theorem 4 gives general conditions for the parameter estimates to converge globally to their true values with probability one. An example is given for a first order system with an unknown process parameter and an unknown process noise variance. Conclusions, references, and an appendix containing details of the proofs finish the paper.

2. SYMBOLS AND MODEL

Consider a linear single-input, single-output, time-invariant continuous time model in Ito form (McGarty, 1974):

dx_x(t) = A_x(θ) x_x(t) dt + b_x u(t) dt + G_x dw_x(t)
dy(t) = c_x^T x_x(t) dt + dv_x(t)

with

A_x(θ) = A_x0 + A_x1 θ_1 + A_x2 θ_2 + ... + A_x,np θ_np
Q_x(ρ) = Q_x0 + Q_x1 ρ_1 + Q_x2 ρ_2 + ... + Q_x,nq ρ_nq    (1)

In (1) above, y is the output, which is measured for each time t. If the input u is not measured, then u is stochastic and is as given in (2) below. The state n-vector is x_x, which depends on the system parameter np-vector θ and the variance parameter nq-vector ρ. The measurement noise is the scalar valued Wiener process v_x with zero mean and known variance r_x at unit time. The process noise is the m-vector valued Wiener process w_x with zero mean and m × m variance matrix Q_x(ρ) at unit time. The initial condition x_x(0) has zero mean and n × n variance matrix P_0. The input u, the noises v_x and w_x, the parameters θ and ρ, and the initial condition x_x(0) are mutually uncorrelated. Therefore (1) is the standard state space model for the Kalman filter, in which A_xj, Q_xi, b_x, c_x, and G_x are known for j = 1, ..., np and i = 1, ..., nq, but with non-standard parameterization.
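As an illustration of this model class, the following is a minimal Euler-Maruyama simulation of (1). This sketch is not part of the paper; the function name, the zero initial state, and the step size are illustrative assumptions, and Q_x is assumed positive definite so that it can be Cholesky-factored.

    import numpy as np

    def simulate_model(Ax, bx, Gx, cx, Qx, rx, u, T=10.0, dt=1e-3, seed=0):
        # Euler-Maruyama discretization of (1):
        #   dx = Ax x dt + bx u dt + Gx dw,   dy = cx^T x dt + dv,
        # with Var(dw) = Qx dt and Var(dv) = rx dt.
        rng = np.random.default_rng(seed)
        n, m = Gx.shape
        Lq = np.linalg.cholesky(Qx)        # factor Qx to sample dw
        x = np.zeros(n)                    # illustrative zero initial state
        dys = []
        for k in range(int(T / dt)):
            dw = np.sqrt(dt) * (Lq @ rng.standard_normal(m))
            dv = np.sqrt(rx * dt) * rng.standard_normal()
            dys.append(float(cx @ x) * dt + dv)   # measured increment dy
            x = x + (Ax @ x + bx * u(k * dt)) * dt + Gx @ dw
        return np.array(dys)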


This parameterization is non-standard, and so is motivated and explained as follows. The motivation is to simplify the algorithm derivation from the Kushner-Stratonovich equation for the optimal non-linear filter, which was done in (Wiberg and DeWolf, 1993). Note that c_x is assumed known because all observable systems can be put in such a parameterized form, and this greatly simplifies the algorithm derivation. Furthermore, r_x is assumed known because it can be recursively estimated from the residuals at each step in time, and the algorithm depends only on the ratio Q/r_x. The n-vector b_x and the n × m matrix G_x are assumed known because the parameterization of transfer function zeros will be incorporated into ρ, as will be explained below. The matrices A_x(θ) and Q_x(ρ) are linear in θ and ρ respectively to simplify the derivation in (Wiberg and DeWolf, 1993). However, this is not very restrictive, because transfer function parameters can be put into such an observable state space canonical form. Also, physical parameters and dimensionless parameters can be put in the form of (1) if they enter into their individual equations linearly. Finally, if such physical parameters enter non-linearly, such as ω in sin ω where sin ω enters linearly, they can be re-parameterized as sin ω = θ, etc. If cos ω then enters elsewhere, it can be represented as √(1 − θ²), etc., and the parameter domains adjusted accordingly, as in the example below.

3. ASSUMPTIONS

The assumptions on the model (1) above are:

(1) The input u is either (a) measured, deterministic, known, and bounded for all time t, or (b) equivalent asymptotically to an unknown stochastic u obeying

dx_u(t) = A_u(θ) x_u(t) dt + G_u dw_u(t)
dy(t) = c_u^T x_u(t) dt + dv_u(t)
A_u(θ) = A_u0 + A_u1 θ_1 + A_u2 θ_2 + ... + A_u,np θ_np
Q_u(ρ) = Q_u0 + Q_u1 ρ_1 + Q_u2 ρ_2 + ... + Q_u,nq ρ_nq    (2)

In other words, stochastic u is persistently exciting with a finite order state space representation; or (c) u is a linear combination of (a) and (b) above.

EXAMPLE 1: Let u be a zero mean pseudorandom binary square wave with asymptotically flat spectrum S_u(ω) = q/2π ≠ 0. Let v be a Wiener process with variance r at unit time. For parameters z, α_1, α_2 > 0, the transfer function

y(s) = [(s + z)/(s² + α_1 s + α_2)] u(s) + v(s)    (3)

is asymptotically equivalent to

d[x_1; x_2] = [−α_1, 1; −α_2, 0] [x_1; x_2] dt + [1; z] dw
dy = (1 0) [x_1; x_2] dt + dv    (4)

First suppose that u is unknown and stochastic as in (b) above. The variance at unit time of w is q. But the z parameter does not enter into the linear parameterization of Q as in (1). Therefore redefine

Q(ρ) = q [1; z](1 z) = q [1, z; z, z²] = [q, 0; 0, 0] + [0, q; q, 0] ρ_1 + [0, 0; 0, q] ρ_2 = Q_0 + Q_1 ρ_1 + Q_2 ρ_2    (5)

with ρ_2 = z² and ρ_1 = z = √ρ_2, and the parameter domain D for ρ = (ρ_1, ρ_2) is D = {ρ_2 ∈ [0, +∞) and ρ_1 = √ρ_2}. Then the parameter ρ_1 need not be estimated, and only ρ_2 is estimated using the new estimation algorithm. The original parameter z is then estimated as √ρ_2 if the system (3) is minimum phase; if the system (3) is known to be non-minimum phase, then z is estimated as −√ρ_2. So G = I_2 and is independent of θ, and w is a 2-vector.

Second and finally, suppose that u is known as in (a) above. Then (4) becomes as in (2) above, except that w(t) = u(t) for purposes of asymptotic analysis. In the remainder of this paper, case (b) is assumed, with u(t) generated by w(t), the asymptotic time average of u(t) equal to 0, and Q_u(ρ) − Q_0 a linear function of ρ. Case (c) is covered as a linear combination of these. This finishes the example.

HENCEFORTH, the model considered in this paper for the purposes of asymptotic analysis is

dx(t) = A(θ) x(t) dt + G dw(t)
dy(t) = c^T x(t) dt + dv(t)    (6)

where x^T = (x_x^T x_u^T) and A(θ), G, c, and w are composed of (1) and (2) correspondingly. Call r the variance of v.

The remaining assumptions on the model are:

(2) The system order n is known.

(3) True values (θ_0, ρ_0) of (θ, ρ) exist in the model set, and (θ, ρ) ∈ D_B, where D_B is a closed and bounded set that is known a priori, possibly from physical considerations, and on which (θ, ρ) is identifiable.

(4) (θ, ρ) ∈ D_S, where D_S is a set such that A(θ) is asymptotically stable (i.e., A(θ) has an inverse), (A(θ), c) is observable, and (A(θ), G Q_k) is controllable; a numerical check of this assumption is sketched after this list.

(5) ρ ∈ D_N, where D_N is a set such that Q_k > 0 for all ρ_k > 0, for k = 1, ..., nq.

Additional assumptions on the estimator are:

(6) Define the set D = D_B ∩ D_S ∩ D_N. For all t > 0, and for every parameter estimate initial condition, the trajectories θ̂(t), ρ̂(t) of the parameter estimates remain in D.

(7) The initial parameter estimate variances σ_j(0), π_i(0), for j, i = 1, ..., np, nq, for each estimate θ̂(t), ρ̂(t), are large enough that Assumption 6 above remains true for all t.

(8) The parameter estimate update gains k_j(t) and h_i(t) are bounded by

(k_j^T(t) k_j(t))² ≤ σ_j(t) k_j^T(t) P(t) k_j(t)  and  (h_i^T(t) h_i(t))² ≤ π_i(t) h_i^T(t) P(t) h_i(t)    (7)

for all t > 0, in which P is the solution of the Riccati equation (13).

(9) For k = 0, 1, ..., nq, call the transfer function h_k(s; θ) = Q_k G^T (sI − A(θ))^{−T} c = A*_k(s; θ)/C*(s; θ), where A*_k(s) and C*(s) are polynomials in s. Then assume that

|A*_k(jω; θ)|² = |A*_k(jω; θ_0)|²  for all k = 1, ..., nq and all real ω    (8)

implies θ = θ_0.
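As a practical aid, Assumption 4 can be checked numerically for a candidate (θ, ρ) by eigenvalue and rank tests. This sketch is not from the paper; it simply encodes stability of A(θ), observability of (A(θ), c), and controllability of (A(θ), G Q_k):

    import numpy as np

    def in_D_S(A, c, G, Q_list):
        # Assumption 4: A stable, (A, c) observable, (A, G Qk) controllable.
        n = A.shape[0]
        stable = np.all(np.linalg.eigvals(A).real < 0)
        Ob = np.vstack([c.T @ np.linalg.matrix_power(A, k) for k in range(n)])
        observable = np.linalg.matrix_rank(Ob) == n
        def ctrb_ok(B):
            Co = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
            return np.linalg.matrix_rank(Co) == n
        controllable = all(ctrb_ok(G @ Qk) for Qk in Q_list)
        return stable and observable and controllable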


4. NEW ESTIMATOR

The new estimator obeys

dε(t) = dy(t) − c^T x̂(t) dt    (9)

dx̂(t) = A(θ̂(t)) x̂(t) dt + P(t) c dε(t)/r    (10)

dθ̂_j(t) = k_j^T(t) c dε(t)/r    (11)

dρ̂_i(t) = h_i^T(t) c dε(t)/r    (12)

dP(t)/dt = A(θ̂(t)) P(t) + P(t) A^T(θ̂(t)) + G Q(ρ̂(t)) G^T − P(t) c c^T P(t)/r    (13)

dk_j(t) = [A(θ̂(t)) − P(t) c c^T/r] k_j(t) dt + σ_j(t) A_j x̂ dt + M_j(t) c dε(t)/r    (14)

dh_i(t) = [A(θ̂(t)) − P(t) c c^T/r] h_i(t) dt + M_i(t) c dε(t)/r    (15)

dσ_j(t) = −(c^T k_j(t))² dt    (16)

dπ_i(t) = −(c^T h_i(t))² dt    (17)

dM_j(t)/dt = [A(θ̂(t)) − P(t) c c^T/r] M_j(t) + M_j(t) [A(θ̂(t)) − P(t) c c^T/r]^T + σ_j(t)(A_j + A_j^T)    (18)

dM_i(t)/dt = [A(θ̂(t)) − P(t) c c^T/r] M_i(t) + M_i(t) [A(θ̂(t)) − P(t) c c^T/r]^T + π_i(t) G Q_i G^T    (19)

Above, the M_j(t) are understood to belong to the system parameter gain k_j updates, and the M_i(t) are understood to belong to the noise parameter gain h_i updates. Equations (11,12,16,17) are scalar equations for i = 1, 2, ..., nq and j = 1, 2, ..., np; (9,10,14,15) are vector equations; and (13,18,19) are n × n symmetric matrix equations.

Equations (10-13) and (16,17) start from initial conditions that are the means and variances of the initial states and parameters. All third order moment matrices M start at 0, as do all h and k.

The new estimator (9-19) is obtained from 3-OM in (Wiberg and DeWolf, 1993) by assuming that each parameter θ_j or ρ_i is independent of any other parameter, by estimating variances instead of standard deviations, and by neglecting all variables that asymptotically do not influence convergence. It is asymptotically equivalent to the state space form of the RPEM of (Ljung and Soderstrom, 1983), but differs in that the parameter update gains k_j and h_i depend on the Riccati equation (13). It differs from the extended Kalman filter parameter estimator (Ljung, 1979) in that the third order moments M are estimated in (18, 19) and sensitivity terms are not present in (13).

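Before turning to convergence, a discretized sketch of one update of (9-19) may help fix ideas. The following Euler step is written for the special case of a scalar state with a single system parameter and a single noise parameter (np = nq = 1); it is an illustrative reading of the equations in Python, not the author's MATLAB implementation.

    def estimator_step(s, dy, dt, A_of, A1, G, c, Q_of, Q1, r):
        # s holds the current xh, P, th, rh, k, h, sig, pi, Mj, Mi.
        # All right-hand sides use the previous-step values in s.
        A  = A_of(s['th'])              # A(theta_hat)
        Om = A - s['P'] * c * c / r     # A(theta_hat) - P c c^T / r
        de = dy - c * s['xh'] * dt      # innovation increment (9)
        new = dict(s)
        new['xh'] = s['xh'] + A * s['xh'] * dt + s['P'] * c * de / r         # (10)
        new['th'] = s['th'] + s['k'] * c * de / r                            # (11)
        new['rh'] = s['rh'] + s['h'] * c * de / r                            # (12)
        new['P']  = s['P'] + (2 * A * s['P'] + G * Q_of(s['rh']) * G
                              - (s['P'] * c) ** 2 / r) * dt                  # (13)
        new['k']  = s['k'] + (Om * s['k'] + s['sig'] * A1 * s['xh']) * dt \
                           + s['Mj'] * c * de / r                            # (14)
        new['h']  = s['h'] + Om * s['h'] * dt + s['Mi'] * c * de / r         # (15)
        new['sig'] = s['sig'] - (c * s['k']) ** 2 * dt                       # (16)
        new['pi']  = s['pi'] - (c * s['h']) ** 2 * dt                        # (17)
        new['Mj'] = s['Mj'] + (2 * Om * s['Mj'] + 2 * s['sig'] * A1) * dt    # (18)
        new['Mi'] = s['Mi'] + (2 * Om * s['Mi'] + s['pi'] * G * Q1 * G) * dt # (19)
        return new

Per the initial conditions above, Mj, Mi, k, and h start at 0, while xh and P start at the initial state mean and variance.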
5. CONVERGENCE

Theorem 1. Define E as the expectation operator. The new estimator (9-19) converges to local minima of the negative log likelihood function ℓ(θ, ρ) = lim_{t→+∞} E{(y(t) − ŷ(t, θ, ρ))²/r} with probability one if Assumptions 1-8 above are met.

Proof: Theorem 1 of (Wiberg and DeWolf, 1993) applies to (9-19) because the model (6) is a subset of the model considered there, and Assumptions 1-8 above assure that all other assumptions of that theorem are satisfied.

6. ASYMPTOTIC EQUATIONS

To analyze the new estimator (9-19), its asymptotic averaged equations are derived through a change of variables similar to that used in (Ljung and Soderstrom, 1983) for the RPEM. First, define the state error n-vector x̃ = x − x̂. The change of variables applicable to the new algorithm (9-19) is

μ_j(t) = [1/t][1/σ_j(t)]    (20)

η_i(t) = [1/t][1/π_i(t)]    (21)

f_j(t) = t μ_j(t) k_j(t),  g_i(t) = t η_i(t) h_i(t)    (22)

Define Ω̂(t; θ̂(t), ρ̂(t)) = Ω̂ = A(θ̂(t)) − P(t; θ̂(t), ρ̂(t)) c c^T/r. Use (20-22) in (9-19) to obtain

dx̂(t) = A(θ̂(t)) x̂(t) dt + P(t) c dε(t)/r    (23)

dθ̂_j(t) = (1/t)[f_j^T(t) c dε(t)/r]/μ_j(t)    (24)

dρ̂_i(t) = (1/t)[g_i^T(t) c dε(t)/r]/η_i(t)    (25)

dP(t)/dt = A(θ̂(t)) P(t) + P(t) A^T(θ̂(t)) + G Q(ρ̂(t)) G^T − P(t) c c^T P(t)/r    (26)

df_j(t) = (1/t)[Ω̂(t) f_j(t) + A_j x̂(t)] dt + (1/t) S_j(t) c dε(t)/r + (1/t²) f_j^T(t) c c^T f_j(t) f_j(t) dt/(r μ_j(t))    (27)

dg_i(t) = (1/t)[L_i(t) c dε(t)/r + Ω̂(t) g_i(t) dt + g_i^T(t) c c^T g_i(t) g_i(t) dt/(r t η_i(t))]    (28)

dμ_j(t)/dt = [f_j^T(t) c c^T f_j(t)/r − μ_j(t)]/t    (29)

dη_i(t)/dt = [g_i^T(t) c c^T g_i(t)/r − η_i(t)]/t    (30)

dL_i(t)/dt = Ω̂(t) L_i(t) + L_i(t) Ω̂^T + G Q_i G^T − (1/t) g_i^T(t) c c^T g_i(t) L_i(t)/r    (31)

dS_j(t)/dt = Ω̂(t) S_j(t) + S_j(t) Ω̂^T + A_j + A_j^T − (1/t) g_i^T(t) c c^T g_i(t) S_j(t)/r    (32)

Assumption 6 assures that the cubic terms in (27) and (28) do not give escape in finite time. Equations (23-32) are now in the form to which averaging applies (DeWolf and Wiberg, 1993, theorem 2). Some more definitions need to be made:


lim_{t→+∞} θ̂(t) = θ,   lim_{t→+∞} L_i(t) = Λ_i(θ, ρ)
lim_{t→+∞} ρ̂(t) = ρ,   lim_{t→+∞} P(t) = Π(θ, ρ)
lim_{t→+∞} S_j(t) = Θ_j(θ, ρ)
X = lim_{t→+∞} E(x x^T),   V = lim_{t→+∞} E(x̃ x̃^T)
W = lim_{t→+∞} E(x x̃^T),   J_j = lim_{t→+∞} E(x̃ f_j^T)
F_j = lim_{t→+∞} E(f_j f_j^T),   T_i = lim_{t→+∞} E(x̃ g_i^T)
N_i = lim_{t→+∞} E(g_i g_i^T),   Y_j = lim_{t→+∞} E(x f_j^T)
Z_i = lim_{t→+∞} E(x g_i^T)    (33)

Ω(θ, ρ) = A(θ) − Π(θ, ρ) c c^T/r    (34)

Above, Π(θ, ρ), Λ_i(θ, ρ), and Θ_j(θ, ρ) are the t → ∞ limits of the solutions to (26), (31), and (32), so

0 = A(θ) Π(θ, ρ) + Π(θ, ρ) A^T(θ) + G Q(ρ) G^T − Π(θ, ρ) c c^T Π(θ, ρ)/r    (35)

0 = Ω Λ_i(θ, ρ) + Λ_i(θ, ρ) Ω^T + G Q_i G^T    (36)

0 = Ω Θ_j(θ, ρ) + Θ_j(θ, ρ) Ω^T + A_j + A_j^T    (37)

Next, define exponential time t_e by t = e^{t_e}; denote the averaged trajectories of θ̂(t_e) by θ(t_e) and of ρ̂(t_e) by ρ(t_e), and put a bar over μ and η, as μ̄ and η̄, to denote their averaged trajectories. Then from (22), (23), (29) and (30),

dθ_j/dt_e = −tr[c c^T J_j(θ, ρ)]/(r μ̄_j)
dρ_i/dt_e = −tr[c c^T T_i(θ, ρ)]/(r η̄_i)
dμ̄_j/dt_e = tr[c c^T F_j(θ, ρ)/r] − μ̄_j
dη̄_i/dt_e = tr[c c^T N_i(θ, ρ)/r] − η̄_i    (38)

For each i and j, J, T, F and N can be computed from the matrix Liapunov equations for the steady state variance of (6), (10), (27), and (28) put into matrix form (rows separated by semicolons) as

d col(x, x̃, f_j, g_i) = [A(θ_0), 0, 0, 0; A(θ_0) − A(θ), Ω, 0, 0; A_j, Θ_j c c^T/r − A_j, Ω, 0; 0, Λ_i c c^T/r, 0, Ω] col(x, x̃, f_j, g_i) dt + col(G, G, 0, 0) dw + col(0, −Π c/r, Θ_j c/r, Λ_i c/r) dv    (39)

Define ξ^T = (x^T x̃^T f_j^T g_i^T), so the above equation can be denoted compactly by

dξ = A_ξ ξ dt + G_ξ dw + H_ξ dv    (40)

The variance P_ξ of ξ in steady state of the above compact form of (39) is found from the algebraic Liapunov equation

0 = A_ξ P_ξ + P_ξ A_ξ^T + G_ξ Q(ρ_0) G_ξ^T + H_ξ R H_ξ^T    (41)

Using the definitions (33), the partitions X, W, V, Y_j, Z_i, J_j, T_i, F_j, and N_i of the above algebraic Liapunov equation can be solved for, in that order, from

0 = A(θ_0) X + X A^T(θ_0) + G Q(ρ_0) G^T    (42)

0 = G Q(ρ_0) G^T + A(θ_0) W + X [A(θ_0) − A(θ)]^T + W^T Ω^T(θ, ρ)    (43)

0 = Ω(θ, ρ) V + V Ω^T(θ, ρ) + [A(θ_0) − A(θ)] W + W^T [A(θ_0) − A(θ)]^T + G Q(ρ_0) G^T + Π(θ, ρ) c c^T Π(θ, ρ)/r    (44)

0 = A(θ_0) Y_j − W (A_j − Θ_j(θ, ρ) c c^T/r)^T + X A_j^T + Y_j^T Ω^T(θ, ρ)    (45)

0 = A(θ_0) Z_i + Z_i^T Ω^T(θ, ρ) + W^T c c^T Λ_i(θ, ρ)/r    (46)

0 = Ω(θ, ρ) J_j + J_j^T Ω^T(θ, ρ) + Y_j^T [A(θ_0) − A(θ)]^T + A_j (W − V) + [V − Π(θ, ρ)] c c^T Θ_j(θ, ρ)/r    (47)

0 = Z_i [A(θ_0) − A(θ)]^T + Ω(θ, ρ) T_i + T_i^T Ω^T(θ, ρ) + [V − Π(θ, ρ)] c c^T Λ_i(θ, ρ)/r    (48)

0 = A_j Y_j + [Θ_j(θ, ρ) c c^T/r − A_j] J_j + Ω(θ, ρ) F_j + {A_j Y_j + [Θ_j(θ, ρ) c c^T/r − A_j] J_j + Ω(θ, ρ) F_j}^T + Θ_j(θ, ρ) c c^T Θ_j(θ, ρ)/r    (49)

0 = Ω N_i + Λ_i(θ, ρ) c c^T T_i/r + [Ω N_i + Λ_i(θ, ρ) c c^T T_i/r]^T + Λ_i(θ, ρ) c c^T Λ_i(θ, ρ)/r    (50)

Only the symmetric part of W needs to be kept in the solution. Then note that V(θ_0, ρ_0) = W(θ_0, ρ_0) = Π(θ_0, ρ_0) and J_j(θ_0, ρ_0) = T_i(θ_0, ρ_0) = 0.

The stable solution Π(θ, ρ) of the algebraic Riccati equation (35) can be found symbolically as a function of (θ, ρ) for low dimensional systems using the eigenvector/stable-eigenvalue solution of the associated Hamiltonian matrix. Given Π(θ, ρ), the solutions of (45-50) for J, T, F, and N can be found symbolically as functions of (θ, ρ). Then J, T, F, and N can be substituted into (38), which gives the trajectories of the averaged estimates (θ, ρ).

If (θ, ρ) converge to their true values (θ_0, ρ_0), then V(θ_0, ρ_0) = Π(θ_0, ρ_0), and, from (47), J_j does not depend on Θ_j, i.e. upon the third order moment M_j. Therefore the convergence points of (38) would not depend on M_j. Consequently, satisfactory new algorithm performance may be achieved by omitting some of the innovation updates to the system parameter gains k_j and the corresponding M_j, though convergence is then not guaranteed. Warning: omission of any M_j invalidates the convergence proof in (Wiberg and DeWolf, 1993).

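For readers who prefer a numerical check to a symbolic solution, the steady state variance P_ξ of (41) and its partitions can be computed directly. The following sketch instantiates (34)-(41) for the scalar example of Section 8 (r = 1, G = c = 1, A(θ_0) = −τ_0, A_1 = 1, Q_1 = 1); the function and variable names are illustrative, not from the paper.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def averaged_partitions(tau, rho, tau0, rho0):
        # Scalar instance: stable ARE solution (59) and closed loop (34).
        Pi = -tau + np.sqrt(tau**2 + rho)
        Om = -np.sqrt(tau**2 + rho)
        Th = -1.0 / Om          # (37): 0 = 2 Om Th + 2 A1, with A1 = 1
        Lm = -0.5 / Om          # (36): 0 = 2 Om Lm + Q1, with Q1 = 1
        # Drift and noise maps of (39)-(40) for xi = (x, x~, f_j, g_i).
        Axi = np.array([[-tau0,       0.0,      0.0, 0.0],
                        [-tau0 + tau, Om,       0.0, 0.0],
                        [1.0,         Th - 1.0, Om,  0.0],
                        [0.0,         Lm,       0.0, Om]])
        Gxi = np.array([[1.0], [1.0], [0.0], [0.0]])   # dw enters x and x~
        Hxi = np.array([[0.0], [-Pi], [Th], [Lm]])     # dv enters via gains
        # (41): 0 = Axi P + P Axi^T + Gxi Q(rho0) Gxi^T + Hxi R Hxi^T
        Pxi = solve_continuous_lyapunov(Axi, -(rho0 * Gxi @ Gxi.T + Hxi @ Hxi.T))
        X, W, V = Pxi[0, 0], Pxi[0, 1], Pxi[1, 1]
        J, T = Pxi[1, 2], Pxi[1, 3]
        F, N = Pxi[2, 2], Pxi[3, 3]
        return X, W, V, J, T, F, N

    # At the true parameters, V = W = Pi and J = T = 0, as noted above.

Substituting J, T, F, and N from such a solve into (38) then traces the averaged trajectories numerically.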
7. THE LIKELIHOOD FUNCTION

The likelihood function, with ℓ defined in Theorem 1, can have many local maxima in general. Theorems 2 and 3 below show when these local maxima are unique.

Theorem 2. If the true value θ_0 of the system parameter θ is known and Assumptions 1-8 hold, then the negative log likelihood function ℓ(θ_0, ρ) has a unique local minimum at ℓ(θ_0, ρ_0), the global minimum.

Proof: See Appendix.

Theorem 3. Suppose Assumptions 1-9 hold. Then the negative log likelihood function ℓ(θ, ρ) has a unique local minimum at ℓ(θ_0, ρ_0), the global minimum.

Proof: Consider the discrete time ARMA model A*(q^{−1}) ỹ(t) = C*(q^{−1}) ε*(t), where q^{−1} is the backward shift operator, ε*(t) is a zero mean, unit variance Gaussian random sequence, and A*(q^{−1}) and C*(q^{−1}) are polynomials in q^{−1}. (Astrom and Soderstrom, 1974) show that the likelihood function of the ỹ(t) sequence in the above ARMA model has a unique local maximum if and only if

A*(q^{−1}) Ĉ*(q^{−1}) = Â*(q^{−1}) C*(q^{−1})    (51)

where Â* and Ĉ* are the estimated polynomials. With a few trivial modifications, the proof of (Astrom and Soderstrom, 1974) applies to continuous time. Define the unit variance process e_k(t) and the process z_k(t) to obey

e_k(t) = [A*_k(s; θ)/(√ρ_k C*_k(s; θ))] z_k(t)    (52)

Also define

|A*_k(jω; θ) C*_k(jω; θ)|² = c^T(jωI − A(θ))^{−1} G Q_k G^T(−jωI − A^T(θ))^{−1} c    (53)

Note that the degree of the polynomial A*_k(s; θ) is less than n, whereas the degree of the polynomial C*(s; θ) equals n. The spectrum S_y(ω; θ, ρ) of the output y(t; θ, ρ) from the model (6) is

2π S_y(ω; θ, ρ) = r + c^T(jωI − A(θ))^{−1} G Q(ρ) G^T(−jωI − A^T(θ))^{−1} c = r + Σ_{k=0}^{nq} h_k^T(jω; θ) Q_k h_k(−jω; θ) ρ_k    (54)

In the above, take ρ_0 = 1 and h_k(jω; θ) as defined in Assumption 9. Define the process ỹ_k(t; θ, θ_0, ρ) from its spectrum S_ỹk(ω; θ, θ_0, ρ_k), where for k = 1, ..., nq

S_ỹk(ω; θ, θ_0, ρ_k) = [h_k^T(jω; θ) Q_k h_k(−jω; θ) − h_k^T(jω; θ_0) Q_k h_k(−jω; θ_0)] ρ_k    (55)

From the continuous-time version of (52), the result of (Astrom and Soderstrom, 1974) is that for all ρ_k > 0, θ = θ_0 must be implied by

|A*_k(jω; θ_0) C*_k(jω; θ)|² = |A*_k(jω; θ) C*_k(jω; θ_0)|²    (56)

Because A*_k(s) is of degree less than C*_k(s; θ_0) for all k, this reduces to, for all k = 1, ..., nq,

|A*_k(jω; θ)|² = |A*_k(jω; θ_0)|²    (57)

This is Assumption 9, proving the theorem.

Theorem 4. If Assumptions 1-9 hold, then the new algorithm parameter estimates converge globally with probability one to the true parameter values.

Proof: Follows directly from Theorems 1 and 3.

8. AN EXAMPLE

As an example of finding the trajectories of the averaged parameter estimates, consider the first order linear system

dx = −τ x dt + dw
dy = x dt + dv    (58)

For this example choose R = 1, C = 1, G = 1, Q(ρ) = ρ > 0 and τ > 0. Then ρ is the unknown noise parameter with true value ρ_0, and τ is the unknown system parameter with true value τ_0. Consider the exponential time behaviour of the averaged asymptotic equations (38) for this example (58). From (6), A_0 = 0, A_1 = 1, so the stable solution to (35) is

Π(τ, ρ) = −τ + √(τ² + ρ)  and so  Ω(τ, ρ) = −√(τ² + ρ)    (59)

Solutions for Θ, Λ in (36) and (37) are

Θ(τ, ρ) = 1/√(τ² + ρ)  and  Λ(τ, ρ) = 1/(2√(τ² + ρ))    (60)

To verify Assumption 9, find from (57)

h_1(s; τ) = 1/(s + τ)    (61)

Then condition (8) becomes

1/(ω² + τ²) = 1/(ω² + τ_0²)    (62)

This is true for all real ω if and only if τ² = τ_0². Therefore τ = ±τ_0. If τ may be any real number, then Assumption 9 is not satisfied. But τ ∈ D restricts τ to be a positive number by stability. Therefore Assumptions 1-9 are satisfied if the trajectories (τ(t), ρ(t)) remain in D. In that case, lim_{t→+∞}(τ(t), ρ(t)) = (τ_0, ρ_0) with probability one by Theorem 4.

Now the parameters of this example are put into dimensionless form. Rather than estimate ρ, estimate φ = −Ω(τ, ρ)/τ_0. Also estimate λ = τ/τ_0. From (42), define κ = 2X/τ_0 = ρ_0/τ_0². Then Π/τ_0 = φ − λ, τ_0 Θ = 1/φ, and τ_0 Λ = 1/(2φ). Then (43) becomes

τ_0 W = [ρ_0 + X(τ − τ_0)]/(1 + φ)    (63)

So (44) can be rearranged to

2V/τ_0 = (φ − λ)²/φ + κ[λ²/φ + (1 − λ²)/(1 + φ)]    (64)

Note that (64) has a unique minimum at (τ_0, ρ_0) ∈ D for all κ > 0. The symbolic variables in MATLAB were used to substitute the above values into (42-50) and solve for J, T, F, and N as functions of τ and ρ; the results are too long to be presented here. Having J, T, F, and N then permits expressing the averaged estimate trajectory equations (38) as functions of (τ, ρ). To obtain an approximation of their solution, note that F and N are positive, so the μ̄ and η̄ equations give positive valued solutions that asymptotically decay to μ̄_∞ and η̄_∞, where

μ̄_∞ = F(τ_0, ρ_0)  and  η̄_∞ = N(τ_0, ρ_0)    (65)

Assume that off line simulation has given fairly accurate initial values of σ and π; then the ratio μ̄/η̄ is adequately approximated by the ratio of μ̄_∞ to η̄_∞. So the state plane trajectory of the averaged estimates approximately obeys

dτ/dρ = [N(τ_0, ρ_0)/F(τ_0, ρ_0)][J(τ_0, ρ_0)/T(τ_0, ρ_0)]    (66)

Figs. 1 and 2 plot the state plane for two different combinations of τ_0 and ρ_0. Fig. 1 shows that the system parameter τ converges quickly at first, and then decays as 1/t, the same rate as the noise parameter ρ. This and other examples simulated by the author exhibit this dynamic behaviour. Fig. 1 analytically supports the claim of fast convergence by the new algorithm, because dτ/dρ = (dτ/dt)/(dρ/dt) and the ρ convergence rate is 1/t. The slope dτ/dρ is shown to be very large in Fig. 1.

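The closed forms (59) and (60) are easy to confirm numerically. A small check, assuming SciPy's continuous-time ARE solver and illustrative values of τ and ρ:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    tau, rho = 0.8, 0.5                    # illustrative test values
    A = np.array([[-tau]])
    c = np.array([[1.0]])
    # Filter ARE (35) for the scalar model (58), in the dual form
    # A P + P A^T - P c c^T P / r + G Q G^T = 0 with r = 1, G = 1, Q = rho.
    Pi = solve_continuous_are(A.T, c.T, np.array([[rho]]), np.array([[1.0]]))[0, 0]
    assert np.isclose(Pi, -tau + np.sqrt(tau**2 + rho))          # (59)
    Om = -tau - Pi                                               # (34) with r = 1
    assert np.isclose(Om, -np.sqrt(tau**2 + rho))                # (59)
    Th, Lm = -1.0 / Om, -0.5 / Om
    assert np.isclose(Th, 1.0 / np.sqrt(tau**2 + rho))           # (60)
    assert np.isclose(Lm, 1.0 / (2.0 * np.sqrt(tau**2 + rho)))   # (60)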
Fig. 1. State plane of averaged estimates (α_0 = −5, ρ_0 = 5, N(α_0, ρ_0) = 0.0055, F(α_0, ρ_0) = 0.304) in high noise.

Fig. 2. State plane of averaged estimates (α_0 = −1/5, ρ_0 = 5, N(α_0, ρ_0) = 0.0209, F(α_0, ρ_0) = 1.7045) in high noise.

9. CONCLUSIONS

If the above Assumptions 1-9 hold, then there is no problem of estimates converging to false values at local maxima of the likelihood function that are not the global maximum. If the parameter domain boundaries are properly designed, there should be no problem with convergence.

Assumption 6 at first glance appears to be very restrictive. However, trajectories (θ̂(t), ρ̂(t)) that remain in D for a wide range of (θ_0, ρ_0) and initial conditions can be found quite easily by simulation. The method suggested by (Ljung, 2002) and used in MATLAB for the discrete time RPEM is as follows: the value of any estimate that leaves D at time t* is instead taken to be the value of the estimate at time t* − 1. This can easily be implemented by "if" statements in MATLAB, and is sketched in code below. It is equivalent to restarting the algorithm such that the trajectories either remain within D, or always hit the boundaries of D after continual re-starts. The latter case can almost always be found by off line simulation.

The fast convergence of the new algorithm is useful for applications in failure detection, adaptive control, and on line monitoring. For applications, the new algorithm must first be discretized and analyzed by off line methods. This has been done by the author, and a MATLAB program is available by email request.

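A minimal sketch of that projection rule, written here in Python rather than MATLAB for consistency with the other sketches; the membership predicate for D is an assumed placeholder:

    def keep_in_D(est_new, est_old, in_D):
        # If the update leaves D at time t*, retain the value from t* - 1.
        return est_new if in_D(est_new) else est_old

    # Hypothetical domain for the example of Section 8: tau > 0, rho > 0.
    in_D = lambda p: p['tau'] > 0.0 and p['rho'] > 0.0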

The new algorithm still has many equations even in its simplified form. The computations may be reduced somewhat by omitting some third order updates, but somewhat more computation remains than for the extended Kalman filter used as a parameter estimator. The new algorithm apparently cures the defects of the extended Kalman filter, and can be extended with improvements known for the Kalman filter, such as H infinity, hidden Markov models, etc.


REFERENCES


Astrom, K. J., and T. Soderstrom. Uniqueness of the maximum likelihood estimates of the parameters of an ARMA model. IEEE Trans. Automat. Contr., 19(6), pp. 769-773, 1974.

Brewer, J. W. Kronecker products and matrix calculus in system theory. IEEE Trans. Circuits and Systems, 25, pp. 772-782, 1978.

De Wolf, D. G., and D. M. Wiberg. An ordinary differential equation technique for continuous time parameter estimation. IEEE Trans. Automat. Contr., 38(4), pp. 514-528, 1993.

Ljung, L. System Identification: Theory for the User (2nd ed.). Prentice-Hall, 1999.

Ljung, L., and T. Soderstrom. Theory and Practice of Recursive Identification. MIT Press, 1983.

Ljung, L. Asymptotic behaviour of the extended Kalman filter as a parameter estimator for linear systems. IEEE Trans. Automat. Contr., AC-24(1), pp. 36-50, 1979.

Wiberg, D. M., and D. G. DeWolf. A convergent approximation to the continuous time optimal parameter estimator. IEEE Trans. Automat. Contr., 38(4), pp. 529-545, 1993.

Wiberg, D. M., S.-W. Oh, J.-S. Youn, L. C. Johnson, and S.-K. Hong. A fix-up for the EKF parameter estimator. Proc. IFAC 2008, Seoul, Korea.

Wiberg, D. M., T. D. Powell, and D. Ljungquist. An on-line parameter estimator for quick convergence and time-varying linear systems. IEEE Trans. Automat. Contr., 45(10), pp. 1854-1863, 2000.

Gasser, S. Ph.D. Thesis. School of Engineering, U. Calif. Santa Cruz, to appear, 2010.

Kushner, H. J. On the differential equations satisfied by conditional probability densities of Markov processes, with applications. SIAM J. Control, 2, pp. 106-119, 1964.

Mercere, G., and M. Lovera. Convergence analysis of instrumental variable subspace identification algorithms. Automatica, 43, pp. 1377-1386, 2007.

Young, P. C., and A. J. Jakeman. Refined instrumental variable methods of time-series analysis: Part III, extensions. Int. J. Control, 31, pp. 639-644, 1980.

Appendix A. PROOF OF THEOREM 3

In this Appendix, consider the system (6) with known system parameters θ and unknown process noise parameters ρ. Then A(θ) = A in (6) and Ω(θ, ρ) = Ω(ρ) in (34). Use of a general Jordan form with generalized eigenvectors for either A or Ω(ρ) leads to prohibitively lengthy proofs in the following, so it will be assumed here that A and Ω(ρ) each have a full set of n linearly independent eigenvectors, i.e., that A and Ω(ρ) are cyclic, or non-derogatory. However, this proof can be adjusted to the general case. Also assume Assumptions 1-9. For this entire Appendix, superscript T denotes the complex conjugate transpose. Denote the eigenvalues of A as λ_01, λ_02, ..., λ_0n, its unit length eigenvectors as a_1, a_2, ..., a_n, and its unit length reciprocal eigenvectors as α_1, α_2, ..., α_n. Then the spectral representation of A is

A = λ_01 a_1 α_1^T + λ_02 a_2 α_2^T + ... + λ_0n a_n α_n^T    (A.1)

Lemma 1. From (34), define the closed loop matrix

Ω(ρ) = A − Π(ρ) c c^T/r    (A.2)

Define the eigenvalues of Ω(ρ) as λ_1, λ_2, ..., λ_n, the eigenvectors of Ω(ρ) as b_1(ρ), b_2(ρ), ..., b_n(ρ), and the reciprocal eigenvectors of Ω(ρ) as β_1(ρ), β_2(ρ), ..., β_n(ρ). By definition, b_i^T(ρ) β_j(ρ) = δ_ij, the Kronecker delta. For i = 1, 2, ..., n, then

λ_i = λ_0i − α_i^T Π(ρ) c c^T a_i/r,  Re(λ_i) < 0    (A.3)

b_i = a_i  and  β_i = α_i    (A.4)

Proof: Substitute (A.1) into the definition Ω(ρ) b_i = λ_i b_i. Pre- and post-multiply by α_i^T and a_i to obtain (A.3) and (A.4). Since A is stable, Re(λ_0i) < 0, and since the system is observable, Π is non-negative definite, which then shows Re(λ_i) < 0 in (A.3).

Corollary. The eigenvalues φ_i of Ω(ρ) + A are

φ_i = 2λ_0i − c^T Π(ρ) c/r,  Re(φ_i) < 0    (A.5)

Proof: Use A + A = 2A in the proof of Lemma 1.

Lemma 2. The solution Π(θ, ρ) of the algebraic Riccati equation (35) becomes Π(ρ). Then for a constant k_0 ≥ 0 and a constant nq-vector k whose elements are non-negative,

k_0 + k^T √ρ ≤ c^T Π(ρ) c ≤ k_0 + k^T ρ    (A.6)

where √ρ means the nq-vector whose elements are the positive square roots of the elements of ρ, and Π(ρ) is monotonic strictly increasing and differentiable.

Proof: The algebraic Riccati equation (35), repeated below, has a symmetric and positive definite solution Π(ρ) = Π^T(ρ) > 0 because of the controllability and stability in Assumption 4:

0 = A Π(ρ) + Π(ρ) A^T + G Q(ρ) G^T − Π(ρ) c c^T Π(ρ)/r    (A.7)

In terms of the closed loop system matrix, the above is

0 = Ω Π(ρ) + Π(ρ) Ω^T + G Q(ρ) G^T + Π(ρ) c c^T Π(ρ)/r    (A.8)

Adding (A.7) and (A.8) gives

0 = [A + Ω(ρ)] Π(ρ) + Π(ρ) [A + Ω(ρ)]^T + 2 G Q(ρ) G^T    (A.9)

Solving for Π(ρ), pre- and post-multiplying by c^T and c, and calling ε = c^T Π(ρ) c, gives

ε = 2 ∫_0^∞ c^T e^{(A+Ω(ρ))t} G Q(ρ) G^T e^{(A+Ω(ρ))^T t} c dt    (A.10)

But

c^T e^{(A+Ω(ρ))t} = c^T e^{−Π(ρ) c c^T (t/r)} e^{2At}
= c^T {Σ_{k=0}^∞ (Π(ρ) c c^T)^k (−t/r)^k/k!} e^{2At}
= Σ_{k=0}^∞ ε^k [(−t/r)^k/k!] c^T e^{2At}
= c^T e^{(2A−(ε/r)I)t}    (A.11)

Using this in (A.10) and solving via Kronecker products [Brewer, 1978] gives

ε = 2 tr{√Q(ρ) G^T [Σ_{i=1}^n Σ_{j=1}^n α_i a_i^T c c^T a_j α_j^T/(φ_i + φ_j*)] G √Q(ρ)}    (A.12)

Rearrangement and use of the Corollary (A.5) gives

ε = Σ_{i=1}^n Σ_{j=1}^n (c^T a_i)(c^T a_j)(α_i^T G Q(ρ) G^T α_j)/(ε/r − λ_0i − λ_0j*)    (A.13)

Proof of the lemma, (A.6), follows.

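Lemma 2's monotonicity claim can be probed numerically for the scalar example of Section 8, where ε = c^T Π(ρ) c = −τ + √(τ² + ρ). A small check, assuming SciPy and an illustrative grid (the linear upper envelope follows from concavity of ε in ρ):

    import numpy as np
    from scipy.linalg import solve_continuous_are

    tau = 1.0
    rhos = np.linspace(0.1, 10.0, 50)
    eps = np.array([solve_continuous_are(np.array([[-tau]]), np.array([[1.0]]),
                                         np.array([[rho]]), np.array([[1.0]]))[0, 0]
                    for rho in rhos])
    assert np.all(np.diff(eps) > 0)                   # monotone increasing in rho
    assert np.all(eps <= rhos / (2.0 * tau) + 1e-12)  # linear envelope, cf. (A.6)
    # A sqrt-type lower envelope k * sqrt(rho) <= eps holds on this bounded grid
    # with k = (eps / np.sqrt(rhos)).min(), consistent with (A.6) on D.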
Proof of the Theorem: For (6) with Assumptions 1-8, (44) with θ = θ_0 becomes

0 = Ω(ρ) V + V Ω^T(ρ) + G Q(ρ_0) G^T + Π(ρ) c c^T Π(ρ)/r    (A.14)

Setting ρ = ρ_0 in (A.8) and subtracting from the above:

0 = Ω(ρ)[V(ρ) − Π(ρ_0)] + [V(ρ) − Π(ρ_0)] Ω^T(ρ) + Ω(ρ) Π(ρ_0) + Π(ρ_0) Ω^T(ρ) − Ω(ρ_0) Π(ρ_0) − Π(ρ_0) Ω^T(ρ_0) + Π(ρ) c c^T Π(ρ)/r − Π(ρ_0) c c^T Π(ρ_0)/r    (A.15)

Because Ω(ρ) − Ω(ρ_0) = [Π(ρ_0) − Π(ρ)] c c^T/r,

0 = Ω(ρ)[V(ρ) − Π(ρ_0)] + [V(ρ) − Π(ρ_0)] Ω^T(ρ) + [Π(ρ) − Π(ρ_0)] c c^T [Π(ρ) − Π(ρ_0)]/r    (A.16)

Solving the above for [V(ρ) − Π(ρ_0)] gives

V(ρ) − Π(ρ_0) = ∫_0^∞ e^{Ω(ρ)t} [Π(ρ) − Π(ρ_0)] c c^T [Π(ρ) − Π(ρ_0)] e^{Ω^T(ρ)t} dt/r    (A.17)

Define

ξ(t) = c^T e^{Ω(ρ)t} [Ω(ρ) − Ω(ρ_0)] c    (A.18)

Pre- and post-multiplying (A.17) by c^T and c and using (A.18) gives

ℓ(ρ) − ℓ(ρ_0) = c^T [V(ρ) − Π(ρ_0)] c = ∫_0^∞ ξ²(t) dt    (A.19)

From (A.2 - A.4) and the spectral representation of Ω(ρ), (A.18) becomes

ξ(t) = c^T Σ_{k=1}^n b_k e^{λ_k(ρ)t} [λ_k(ρ) − λ_k(ρ_0)] β_k^T c    (A.20)

Substitute the above into (A.19) and perform the integration to get

ℓ(ρ) − ℓ(ρ_0) = −Σ_{i=1}^n Σ_{k=1}^n {β_i^T [Ω(ρ) − Ω(ρ_0)] b_i}{β_k^T [Ω(ρ) − Ω(ρ_0)] b_k}/(λ_i + λ_k)    (A.21)

Call β_k^T Ω(ρ_0) b_k = a_k. Then use the spectral representation of Ω in the above to get

ℓ(ρ) − ℓ(ρ_0) = −Σ_{i=1}^n Σ_{k=1}^n [a_i − λ_i(ρ)][a_k − λ_k(ρ)]/(λ_i + λ_k)    (A.22)

But the limit as ρ → ρ_0 implies a_k = λ_k(ρ_0). Then the negative log likelihood function can be expressed in terms of the closed loop eigenvalues λ_i as

ℓ(ρ) − ℓ(ρ_0) = −Σ_{i=1}^n Σ_{k=1}^n (λ_i − λ_0i)(λ_k − λ_0k)/(λ_i + λ_k)    (A.23)

Note that this interesting expression for the change in the likelihood function can be expressed solely in terms of the open and closed loop eigenvalues λ_0i and λ_i of the associated Kalman filter. It now remains to prove that the distances δ_i = |λ_i − λ_0i| are monotonically increasing with ρ, which would prove the theorem. To do this, note that from (A.3)

α_i^T Ω(ρ) = λ_i(ρ) α_i^T = λ_0i α_i^T − (α_i^T Π(ρ) c c^T a_i/r) α_i^T    (A.24)

Since λ_0i is an eigenvalue of A, substituting (A.24) into (A.23) gives

ℓ(ρ) − ℓ(ρ_0) = Σ_{i=1}^n Σ_{j=1}^n {α_i^T [Π(ρ) − Π(ρ_0)] c c^T a_i}{α_j^T [Π(ρ) − Π(ρ_0)] c c^T a_j}/{α_i^T Π(ρ) c c^T a_i/r + α_j^T Π(ρ) c c^T a_j/r − λ_0i − λ_0j*}    (A.25)

As in (A.10),

α_i^T Π(ρ) c c^T a_i = 2 ∫_0^∞ α_i^T e^{(A+Ω(ρ))t} G Q(ρ) G^T e^{(A+Ω(ρ))^T t} c c^T a_i dt    (A.26)

From (A.11) and the spectral representation of A, then

e^{(A+Ω(ρ))^T t} c = e^{(2A−(ε/r)I)^T t} c = e^{−εt/r} Σ_{k=1}^n e^{2λ_0k* t} α_k a_k^T c    (A.27)

From (A.3) and (A.4), then

e^{(A+Ω(ρ))t} = Σ_{k=1}^n e^{(2λ_0k − α_k^T Π(ρ) c c^T a_k/r)t} a_k α_k^T    (A.28)

Using (A.27) and (A.28) in (A.26) and integrating, then

α_i^T Π(ρ) c c^T a_i = Σ_{k=1}^n (α_k^T G Q(ρ) G^T α_i)(c^T a_k)(c^T a_i)/{[α_i^T Π(ρ) c c^T a_i + ε]/(2r) − λ_0i − λ_0k*}    (A.29)

Note that c^T a_i cannot be zero because the system (6) is observable. Therefore (A.25) is a Hermitian positive definite quadratic form in the α_i^T Π(ρ) c c^T a_i, as can be seen by induction from the Sylvester criterion. This means that α_i^T Π(ρ) c c^T a_i satisfies the same kind of inequality as ε in Lemma 2. Consequently α_i^T Π(ρ) c c^T a_i is monotone strictly increasing in ρ, satisfying (A.6), as is ε. From (A.29), ℓ(ρ) then has a unique minimum at ℓ(ρ_0), which proves the theorem.

The author thanks Profs. Juan Yuz, Juan Carlos Aguero, and Rolf Johanson, his students Raj Maitra, Safa Gasser, and Matt Reister, and the anonymous reviewers for their help with the paper.
