On robust AML identification algorithms

On robust AML identification algorithms

Pergamon Autoramica, Vol. 30, No. 11, pp. 1775-1778, 1994 Copyright © 1994 EL~vier Science Ltd Printed in Great Britain. All rights reserved 0005-I09...

352KB Sizes 0 Downloads 101 Views

Pergamon

Autoramica, Vol. 30, No. 11, pp. 1775-1778, 1994 Copyright © 1994 EL~vier Science Ltd Printed in Great Britain. All rights reserved 0005-I098194 $%00+ 0.00

0005-1098(94)E0012-7

Brief Paper

On Robust AML Identification Algorithms* VOJISLAV ]'.. FILIPOVIt~t and BRANKO D. KOVA(L~EVI~, Key Words--Identification; recursive algorithms; robustness; convergence. Abstract--Strong consistency results for a class of nonlinear approximate maximum likelihood algorithms for robust system identification are developed, where the system is assumed to be of the ARMAX form. The analysis uses the Martingale results, and strong consistency is shown to hold under a new assumption, representing a generalization of the strictly positive-real condition. Arguments are also given for using Huber's nonlinearity, in order to reduce the influence of outliers in practice.

2. P r o b l e m f o r m u l a t i o n

Let the system under consideration be described by a linear singie-input/single-output ARMAX model A(q-~)y(i) = B(q-~)u(i) + C(q-1)e(i),

(1)

where q-1 is the backward shift operator: q-ly(i) = y(i - 1), and

A(q -I)=I+

~ akq-k;

B(q -I)= ~ bkq-~;

k=l

k=l

e C(q -I) = I + ~. ckq -k,

1. I n t r o d u c t i o n

ESTIMATIONALGORITHMSbased on the Gaussian model have been found to be especially inefficient when the real distribution belongs to the heavy-tailed variety, occasionally giving rise to very large outliers (Barnet and Levis, 1978). Considerable efforts have been oriented towards the design of robust estimation algorithms possessing a low sensitivity to distribution changes, usually valid locally within a prespecified distribution class (Ershov, 1978; Hogg, 1979). The fundamental contribution to the field of robust estimation has been given by Huber (1964), who introduced the concept of min-max robust estimation. Further developments of this idea and applications to different types of problems have led to many valuable achievements (Masreliez and Martin, 1977; Poljak and Tsypkin, 1979; Tsypkin, 1984). However, the reported results in the area of robust dynamic system identification are few. An analysis of robustified Gauss-Newton type recursive prediction error algorithms using the ODE approach is given by Ljung and S0derstrOm (1983). Strong consistency results for a class of robustified stochastic approximation (SA) algorithms using the Martingale theory are developed by Filipovi6 and Kova~evi6 (1987). This analysis suffers from a restriction to white and uniformly bounded system disturbances. An application of robustified SA algorithms for identification of linear dynamic systems with correlated disturbances, and a local convergence analysis based on the ODE approach, is discussed in Kova~vi~ and Filipovid (1988). In this paper strong consistency results for a class of robustified approximate maximum likelihood (AML) algorithms of pseudolinear regression (PLR) type are developed, where the system is assumed to be of the ARMAX form. The analysis uses the Martingale results, and strong consistency is shown to hold under a new condition representing a generalization of the strictly positive-real (SPR) condition. Arguments are also given for using Huber's min-max optimal cost function, in order to eliminate the effects of outliers in practice.

(2)

k=l

are characteristic, control and disturbance polynomials, respectively. Here y(i)eR ~, u(i)eR I and e(i)eR I are the system output, the measurable input and the stochastic disturbance, or noise, respectively. It will be assumed that the sequence {e(i)} is a zero-mean white stochastic process, generating an increasing sequence of sub-sigma algebras {F~}, and that constants m,n and ¢ are known a priori. The problem of recursive identificationof the system (I) can be considered as the task of estimating the vector 0 T = {al,..., a,, b~ ..... b=, ci ..... ce} in real-time, on the basis of current measurements. Formulation of the P L R or extended least squares (ELS) type identification schemes reduces to the choice of the performance criterion (e.g. Ljung and S~lerstr0m, 1983) J,(0) = i-I ~ Fie(k, 6)]; e(i,O) = y(i) - zT(i)0,

(3)

k=!

zT(i) = { - - y ( i -- 1). . . . . - - y ( i -- n), u ( i - 1) . . . . . u ( i - m), v ( i - 1) . . . . . v(i - ¢')}, (4) v(i) = y(i) - Z T ( i ) O ( i ) , where e(.) is the prediction error, v(.) is the residual and 0(i) is the parameter vector estimate at time i. In the AML scheme the cost function is chosen as F(.) = -logp(.), where p(.) is the noise probability density function (pdf). Particularly for the Gaussian noise, F(.) is the quadratic function, and the resulting algorithm minimizing (3) is the standard linear AML. However, the ML method is very sensitive to deviations of the real noise pdf from the assumed one, and in the presence of outliers it ceases to work (Poljak and Tsypkin, 1979). Thus, one should modify the algorithm to make it more robust. A possible approach is to use the cost function F(.) that is quadratic for small arguments, but increases more slowly for large arguments. The recursive minimization of such a criterion can be done by using the approximate Newton-Raphson type method (Tsypkin, 1984)

* Received 9 August 1991; revised 10 April 1992; revised 29 October 1992; revised 24 May 1993; received in final form 30 November 1993. This paper was not presented at any IFAC meeting. This paper was recommended for publication" in revised form by Associate Editor I. Mareels under the direction of Editor P. C. Parks. Corresponding author Professor Branko Kova~vi~. Tel. +381 11 324 8464; Fax +381 11 324 8681. l" Cellulose Industry 'Viskoza', 15300 Loznica, Serbia. *Faculty of Electrical Engineering, University of Belgrade, Bulevar Revolucije 73, Belgrade, Serbia.

0(i) = 0(i - 1) - [iV~J~(6(i - 1))]-'[iV~/j(6(i - 1))];

r(0) = rio.

(5)

Moreover, with large i and by virtue of the approximate truth of the optimality conditions, yielding VJ~_t(O)~-O, one obtains iVsJi(fi(i - 1)) = - Z ( i ) q J ( e ( i , fi(i - 1))); iV~J,(6(i - 1)) = a ~ k=|

1775

Z(k)ZT(k),

(6)

1776

Brief Papers

where qJ(.)=F'(.) and a=E{O'(e(i))}. By introducing F(i) = [iV~i(6(i- 1))] -I and applying the matrix inversion lemma 6(0 = 0(i - 1) + F(i)Z(i)qJ(e(i, O(i - 1))),

6(0) = 60, (7)

I'(i - I)Z(i)ZT(i)F(i- 1) c~-~ + ZT(i)F(i - 1)Z(i) '

F(0) = yl, (8)

F(i) = F(i - 1)

where y is a finite positive constant. The robust recursive A M L is defined by (7), (8) and represents a robust version of the conventional linear recursive PLR or ELS method (Ljung and S6derstrtim, 1983).

3. Convergence analysis The convergence property of the proposed robust A M L can be investigated using the standard Martingale theory (Neveu, 1975). The results are summarized in the theorem stated below. However, first we need to prove the following auxiliary lemma.

Lemma 1. Consider the model (1), (2) and the algorithm (7),

H2: {e(i)} is a sequence of independent and identically distributed random variables with symmetric distribution. H3: The function 0(') is odd and continuous almost everywhere. H4: The function q,(.) is uniformly bounded. H5: There exists the passive operator H A such that Hz = d~2(z/C(q-1)) - az/2, where 0(/) = 0(0 - 0 and d~((gx(i - 1)Z(i)/C(q-1)) = e { ~ ( - e ( i , 0(i - 1)))[ F~_,}. H6: There exists a constant c > l such that lim log~r(i)/hmm{F ~(i)} = 0,

where Ami,{'} denotes the minimal eigenvalue. Then the sequence {O(i)} exhibits strong consistency.

Proof. Introducing Lyapunov's stochastic function V (i) = OT(i)F- ~(i)O(i), we obtain from (7) 1)F-~(i)0(i - 1) + 20T(i -- 1)Z(i)t#(e(i, 0(i - 1))) + ZT(i)F(i)Z(i)~O2(e(i, 0(i - 1))).

V ( i ) = O T ( i --

(8) subject to the assumptions:

Assumption 1. The coefficient a is positive and bounded. Assumption 2. The

factor r(i) = Tr{F-I(i)} satisfies l i m i n f r ( i ) = ~ with probability one (w.p.1), where Tr{.}

denotes the trace. Then the weighting matrix F(i) satisfies

~ ZT(i)F(i)Z(i) < ~ i=l log ~ r(i)

c>l,

A = Z(i)

V(i) = V(i - 1) + 20T(i -- 1)Z(i)~O(e(i, O(i - 1)))

I F - ~ ( i ) I - I F - ~ ( i - 1)1 IF-'(i)I '

(10)

+ a(0T(i -- 1)Z(i)) 2

+ ZT(i)F(i)Z(i)q~2(e(i, O(i - 1))). Let us define the function

dp2(OT(~--~(ql_~Z(i))= E{~[J2(-e(i, 6 ( i - 1 ) ) ) lFi_l},

d

IF ~(i)t = ]7 A~{F t(i)}-< adax{F-2(i)},

e(i, O(i - 1))

(11)

0T(i -- 1)Z(i) ~-e(i).

i--1 d

r(i) = E a,(r-'(i)} -> a ~ d r - ' ( i ) } ,

(12)

(19)

C(q -1)

& {oT(i -- 1)Z(i)]

where &{.} is an eigenvalue, with amax{'} being the maximal eigenvalue and d = n + m + tL Taking into account (11) and (12), one concludes

(20)

Under the hypotheses H1, H2 and H4, one concludes

i=1

r(i) >- IF-l(i)l yd.

(18)

where the prediction error (Goodwin and Sin, 1984)

where I'1 denotes the determinant. In addition, we obtain

)<-k,,

(21)

for some 0 < kl < ~. Using H2, H3, H5 and (21), one obtains

(13)

Thus, under Assumption 1, (10) and (13), one obtains

< V(i - 1) 20T(i -- 1)Z(i) - loft r(i) log ~ r(i) [ /OT(i--1)Z(i)~ a0T(i -

1 ~oaZT(i)F(i)Z(i) a i~

(17)

Using (15) and (16), one obtains

"(9)

Then the Schur's formula allows the determinant of a partitioned matrix to be written as a product of the component determinants, yielding

aZr(i)F(i)Z(i)

Moreover, using the matrix inversion lemma, (8) can be rewritten as F - ' ( i ) = r - ' ( i - 1) + aZ(i)Z'r(i). (16)

r(i) = r(i - 1) + azT(i)Z(i). zT(i)] F '(i)J"

(15)

By applying the trace operation on (16)

w.p.1.

Proof. Let us define the matrix A in the partitioned form as a-'

w.p.1,

1og~ r(i)

× [4~1/

< d c ~, I r - ~ ( i ) l - Ir-~(i - 1)t - a ~--~0 IF-I(i)[ l°g~ IF-~(i)I d ~ flr-R~)l dt

~q-:~

" 5 1)Z(')]

)

+ k, zT(i)F(i)Z(i) logc r(i)

(22)

<--a- JIr-'.0)l t log': t 1 ( c - 1 ) l o g ¢-1 IF I(io)l < %

w.p.1.

(14)

Here the facts that [F-~(i)I>I for some i>-io and IF-I(~)I = ~, w.p.1, owing to Assumption 2, are used. The proof is thus completed.

Moreover. using the definition of the passive operator (Desoer and Vidyasagar. 1975), from H5 it follows that

S(i) = 2 k= ~ O ~T( k - 1 ) Z ( k ) [ ~ b , ( aOT(k

O'~(k 1)Z(k) ~-~-~)

2--1)Z(k)]- + k2 -> O,

(23)

Theorem 1. Consider the model (1), (2) and the algorithm (7), (8) subject to the assumptions of the lemma, and assume further that the following hypotheses are satisfied: HI: All zeros of the polynomial C(q -~) are inside the unit circle.

for some 0

<

k2 < ~- Let us define a quantity

S(i) r ( i ) = R ( i ) + logCr(i),

R(i)

V(i)

log~r(i ).

(24)

1777

Brief Papers Using (22)-(24), concludes

and

since

log r(i - 1) <- log r(i),

one

zT(i)F(i)Z(i) ~ .

(25)

E{T(i)lF,_~}<-T(i-1)+k~

Under (25) and Lemma 1, the Martingale convergence theorem (Neveu, 1975) implies l i m T ( i ) = T * , w.p.1, yielding ~im R ( i ) = R*, w.p.1, where T* and R* are finite constants. Additionally, since ~(i) =

Tr{F-~(i)O(i)or(i)} Iog~ r(i) A~,{F-~(i)} II0(i)11a logc r(i)

of the parameter estimates increment A0(i) = 0(i) - 0(i - 1) is given by

e{ao(i) aT0(i) I e,_,}- i-lB-a(i)Z(i)ZT(i)B-t(i)p(~, p), (28) i

where /3 = E{~bE(e(i))} and B ( i ) = ~,Z(k)ZT(k). Thus, (28) 1

depends on qJ(') only through the scalar p(.). If the aim is to desensitize the algorithm with respect to outliers occurring possibly at time i, the choice of ~(.) should thus be such that p(.) is as small as possible. Huber's min-max optimal solution satisfies p(~bo,p) -< p(~0o, P0) <- p(~b, P0), and it is given by (Huber, 1964)

(26)

from H6 it follows that, limll0(i)ll2=0, w.p.1, which completes the proof.

Remark 1. H1 is a commonly used assumption when the standard Martingale results are applied for convergence analysis. Notice also that A(q -~) has not to be stable. H2 represents a standard noise condition in the robust estimation, while H3 and H4 define a class of nonlinearity ~. Particularly, for the linear ~b, H4 is not fulfilled, and hence the theorem excludes the conventional AML. However, we restrict our attention to bounded and continuous ¢, for .practical reasons. Namely, boundedness insures that no single observation can have an arbitrary large influence, while continuity insures that many rounding or quantization errors will not have major effects (PapantoniKazakos, 1977). Moreover, the noise variance has not to be finite provided ~ is bounded. Finally, it is hoped, and numerical experiments seem to substantiate this hope (Kova~vi~ and Filipovi6, 1988), that the robust estimators approach their asymptotic behavior rather fast provided ~ is bounded. The condition H5 is different from the standard AML assumption and represents a generalization of the SPR condition. Namely, for the zero-mean normal noise with variance 002, i.e. p ( e ) ~ N ( O , tr2), and ~b(z)=z/00 2, one obtains a = = 00-2. Thus, (23) reduces to

q'0(') = -[logpo(')]',

Po = arg min I(p),

(29)

where l(p) is the Fisher's information, and P is a given class of pdf's to which the real unknown noise pdf p belongs. In this case, t~ can be approximated by a ~-E{qJ~(.)}= I(po). Let us illustrate this in the following example.

Example. Assume that the noise pdf is a zero-mean normal, i.e. p , ( e ) - N ( O , 002), while the true noise pdf is contaminated normal pc(e) ~ (1 - e)N(O, tr2) + eh(.), where 0 -< e -< 1 and h(.) is zero-mean symmetric pdf with variance 00~>>002. For the standard AML qJ(z)=qJ,(z)=z/00 2, yielding p(~bl,p,)= 002 and p(qJl, p¢)-~e00 2. Thus, the ratio of the norms of the corresponding matrices in (28) is approximately given by P(qh, Pc)/P(~I,P,)= eo~0/o'~,>> 1. This result is the evidence of decrease in the rate of the standard AML convergence, owing to the presence of outliers generated by the pdf h(.). Particulsrly, if h(.) is of the Cauchy type, then 002= ~, and the standard AML does not work. On the other hand, the choice of ~b0(') in (29) introduces robustness, since P(~o, Pc) <- P(~bo, Po) = l-l(Po) • For the contaminated pdf class concerned, the worst case pdf, P0, is normal with exponential tails, giving rise to ~b0(z) = min {Izl/cr2.,k/o'S.}sgnz, where the tuning constant k has to be chosen so as to give the desired efficiency at the normal model (Huber, 1964). Particularly,

i

z2[llC(q -~) - 1/2] + k 2 > 0

for

R ,I / C ( q -~) - 1/2>0.

k=l

a =

p(e)de/o'~,

~l(z/C(q-1)) = olz/C(q-1);

k

In addition, H5 assumes the function tk~ to be monotone increasing and positive (negative) for positive (negative) arguments, as well as that a is positive and finite. Under H3 and H4, this will be fulfilled provided q~ is monotone increasing and piecewise continuously differentiable (Huber, 1964; Poljak and Tsypkin, 1979). Of course, ~b2 in (19) exists under the same conditions as ~ . H6 is one of the weakest conditions for the consistent parameter estimates (Lai and Wei, 1982; Chen and Ruan, 1987).

Remark 2. The convergence for the linear AML has been proven by Chen and Guo (1986) under a weaker noise condition than H2 ({e(i)} is a Martingale difference sequence instead of the white noise). Furthermore, the linear AML possesses a higher rate of convergence than the robust AML (Chen and Guo, 1986). However, this is the price we paid for robustness (see next example). Pro6eeding from (26), one concludes that the rate of estimates convergence is defined by I10(i)11 =O{[log~r(i)/A~i,{F-~(i)}]~/2} .

~2(z/C(q-1)) = 00~2[~(z/C(q-t))2 + cr22k2(1 - ~)]. Thus, H5, i.e. (23), reduces to the SPR condition. To illustrate the character of the resulting algorithm, we have applied the linear AML and the robust AML (RAML) to the model (1) with A(q -1) = 1 + 0.83q -1 - 0.167q -2,

B(q -1) = 0.167q -I,

C(q -1) = 1 +0.2q -1. Here {u(i)} is taken to be white Gaussian noise N(0, 1), while {e(i)} is contaminated normal, distributed as p~(') with e = 0.1 and o-~ = 1. Moreover, Huber's robust procedure is used, i.e. k = 1.5 in ~0. Table 1 depicts the average mean-square error norm, obtained on the basis of 100 Monte Carlo runs and 1000 iterations, for different outlier statistics h. Obviously, the linear AML is slightly superior for the Gaussian noise. However, in the presence of outliers it leads

(27)

Obviously, this is a function of a, since r(i) and F(i) depend on a. However, it is very difficult to express this dependence explicitly. Moreover, one should judge the convergence rate, and consequently the estimator robustness, through the asymptotic estimation error convariance matrix. Unfortunately, this expression cannot be derived for PLR or ELS, robustified or not (Hannan and Deistler, 1988). Therefore, let us consider the problem from another point of view. Namely, linearizing ~(-) in the vicinity of/9 and assuming that [oT(i)Z(i)]2 <--6 << p(~,p) =/3a -~, the conditional covariance

TABLE 1. PARAMETER ERROR NORM COMPARISON FOR A M L AND ROBUST A M L (RAML) ALGORITHMS (N-NORMAL; L-LAPLACE; C-CAUCHY OUTLIER PDF h)

h(.) Method

N(0, 1)

N(0, 10)

L(0, 1)

C(0, 1)

AML RAML

0.0219 0.0353

0.1618 0.0276

0.1544 0.0317

1.9145 0.0482

1778

Brief Papers

to the biased estimates, while RAML performs quite well. The reason lies not only in the nonlinear transformation of the prediction errors, but also in an adequate way of generating F(i), where a keeps the eigenvalues of F(i) at values high enough to obtain fast convergence and, at the same time, small enough to provide noise immunity.

4. Conclusion Strong consistency results have been established for a class of robustified recursive AML identification algorithms. The proposed robust AML differs from the standard AML by the insertion of a suitable chosen nonlinear transformation of the prediction errors, which has to cut-off the outliers. The analysis uses Martingale results, and strong consistency is shown to hold under a new assumption representing a generalization of the SPR condition. Arguments are also given for using Huber's nonlinearity, in order to reduce the influence of outliers in practice. Acknowledgements--The authors wish to express their sincere thanks to the referees for valuable suggestions which improved the final manuscript. References Barnet, V. D. and T. Levis (1978). Outliers in Statistical Data. John Wiley, NY. Chen, H. F. and L. Guo (1986). Convergence rate of least squares identification and adaptive control for stochastic systems. Int. J. Control, 44, 1459-1476. Chen, H. X. and R. Y. Ruan (1987). Strong consistency and convergence rate of ELS. Cont. Theor. Advanc. Techn., 3, 149-159. Desoer, C. A. and M. Vidyasagar (1975). Feedback Systems: Input-Output Properties. Academic, NY.

Ershov, A. A. (1978). Robust methods of parameter estimation: a survey. Autom. Rein. Control, 8, 66-100. Filipovi6, V. Z. and B. D. Kova~vi6 (1987). Analysis of robust recursive identification algorithms. In Proc. lOth IFAC World Congress, Munich, FRG. Goodwin, G. C. and S. Sin (1984). Adaptive Filtering, Prediction and Control. Prentice-Hall, Englewood Cliffs, NJ. Hannan, E. J. and M. Deistler (1988). The Statistical Theory of Linear Systems. John Wiley, NY. Hogg, R. V. (1979). Statistical robustness: a view of its use in applications. Am. Stat., 69, 108-115. Huber, P. J. (1964). Robust estimation of location parameter. Ann. Math. Stat., 35, 73-101. Kova~vi6, B. D. and V. Z. Filipovi~ (1988), Robust real-time identification of linear systems with correlated noise. Int. J. Control, 48, 993-1010. Lai, T. L. and C. Z. Wei (1982). Least squares estimates in stochastic regression models with applications to identification and control. Ann. Star., 10, 154-166. Ljung, L. and T. S6derstr6m (1983). Theory and Practice of Recursive Identification. MIT Press, CA. Masreliez, C. I. and R. D. Martin (1977). Robust Bayesian estimation for the linear model and robustifying the Kalman filter. IEEE Trans. Aurora. Control, 22, 361-371. Neveu, J. (1975). Discrete Parameter Martingales. NorthHolland, Amsterdam. Papantoni-Kazakos, P. (1977). Robustness in parameter estimation. 1EEE Trans. Inf. Theory, 23, 223-233. Poljak, B. T. and Ya. Z. Tsypkin (1979). Adaptive estimation algorithms: convergence, optimality, stability. Aurora. Rein. Control, 3, 71-81. Tsypkin, Ya. Z. (1984). Foundations of Informational Identification Theory. Nauka, Moscow.