Copyright © IFAC Adaptive Systems in Control and Signal Processing, Glasgow, UK, 1989
ROBUST PARAMETER ESTIMATION PROCEDURE FOR ε-CONTAMINATED PROBABILITY DISTRIBUTION MODELS WITH UNCERTAIN GROSS-ERROR

K. Uosaki and K. Saito
Department of Applied Physics, Osaka University, Suita, Osaka 565, Japan
Abstract. Recently the concept of robustness has attracted considerable attention from control engineers, and robust procedures have been developed for several control problems. One of the most important ideas in this direction is Huber's M-estimator (maximum likelihood type estimator). Huber first identified neighborhoods of stochastic models in terms of the class of ε-contaminated probability distributions. He then derived an estimator that minimizes the maximum degradation of performance possible for an ε-deviation from the assumption. This idea, however, is not applicable without exact knowledge of the gross-error ε. In this paper, therefore, we derive an M-estimator that requires only a priori knowledge of an upper bound on the uncertain gross-error. A numerical example illustrates the usefulness of the proposed robust estimation procedure.

Keywords. Stochastic systems; parameter estimation; robust procedure; error; M-estimator; minimax variance; identification.
INTRODUCTION
Many assumptions commonly made in science and engineering problems are at most approximations to reality, and unfortunately they do not always hold. Recognizing this fact, many researchers have recently turned their attention to the concept of robustness. They have developed robust procedures for the design of control systems, detection systems, filters, etc. (see, for example, Masreliez and Martin (1977), Polyak and Tsypkin (1979, 1980), Dorato (1987), Rougée and coworkers (1987), and Uosaki and Oumi (1988)).

For example, consider the problem of identifying the parameter $\theta = (a_1, a_2, \ldots, a_p)^T$ of a finite autoregressive (AR) model,

$$y_n = a_1 y_{n-1} + a_2 y_{n-2} + \cdots + a_p y_{n-p} + e_n = \varphi_n^T\theta + e_n, \qquad n = 1, 2, \ldots, N, \qquad (1)$$

where $\varphi_n = (y_{n-1}, y_{n-2}, \ldots, y_{n-p})^T$ and $\{e_n\}$ is an observation noise sequence. The estimate of θ can be obtained by the maximum likelihood (ML) approach, the maximum a posteriori probability (MAP) approach, or the least squares (LS) approach (Beck and Arnold (1977)). The first two approaches require exact knowledge of the form of the noise distribution F. In practice, however, we never know the underlying distribution exactly. In such cases, we have to apply these approaches under certain idealized assumptions, and we must take care of the effects of deviations from them.

The LS approach, on the other hand, is applicable without knowledge of the distribution. However, it generally loses efficiency when the noise distribution has heavier tails than the normal. This observation has motivated the development of robust procedures that prevent disasters caused by somewhat larger deviations from the model and also avoid efficiency losses caused by small deviations.

One of the most important works in this direction is Huber's M-estimator (maximum likelihood type estimator) (1972, 1981). Huber first identified the neighborhoods of stochastic models that are supposed to contain the true distribution generating the data as

$$\mathcal{P}_\varepsilon = \{F \mid F = (1-\varepsilon)F_0 + \varepsilon F_1;\ F_0 \text{ is a known symmetric distribution},\ F_1 \text{ is an arbitrary (unknown) symmetric distribution},\ 0<\varepsilon<1 \text{ is a known fixed constant}\}, \qquad (2)$$

which he called an ε-contaminated probability distribution model (gross-error model). He then derived the estimator that minimizes the maximum degradation of performance possible for an ε-deviation from the assumption. Hence the estimator behaves optimally, in this minimax sense, over the whole neighborhood.

The estimator, however, depends on the exact value of the gross-error ε, which we never know. Hence this approach cannot be applied exactly, and this is one of the main objections to it. In this paper, we are concerned with the case where the exact value of the gross-error ε is not available and only its upper bound α (≪ 1) is known. We develop a robust estimation procedure (M-estimator) applicable to this uncertain gross-error situation.
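Data from model (1) with noise from the gross-error class (2) are easy to simulate. The short Python sketch below is illustrative only. It assumes F0 = N(0, 1) and, as one hypothetical choice for the unknown F1, a wider normal N(0, 10²). The helper names `contaminated_noise`, `simulate_ar` and `ls_ar1` are ours, not the paper's. `ls_ar1` computes the ordinary LS estimate of a_1 for p = 1.

```python
import random


def contaminated_noise(rng, eps, sigma0=1.0, sigma1=10.0):
    """One draw from F = (1 - eps) F0 + eps F1 with F0 = N(0, sigma0^2).

    F1 = N(0, sigma1^2) is only an illustrative stand-in: in (2) F1 is an
    arbitrary, unknown symmetric distribution.
    """
    scale = sigma1 if rng.random() < eps else sigma0
    return rng.gauss(0.0, scale)


def simulate_ar(coeffs, n, eps, seed=0):
    """Generate y_1, ..., y_n from model (1) with theta = coeffs = (a_1, ..., a_p)."""
    rng = random.Random(seed)
    p = len(coeffs)
    y = [0.0] * p                      # zero initial values y_{1-p}, ..., y_0
    for _ in range(n):
        phi = y[-1:-p - 1:-1]          # phi_n = (y_{n-1}, ..., y_{n-p})
        y.append(sum(a * v for a, v in zip(coeffs, phi)) + contaminated_noise(rng, eps))
    return y[p:]


def ls_ar1(y):
    """Ordinary least squares estimate of a_1 when p = 1."""
    num = sum(y[k] * y[k - 1] for k in range(1, len(y)))
    den = sum(y[k - 1] ** 2 for k in range(1, len(y)))
    return num / den
```

For instance, `ls_ar1(simulate_ar([0.5], 1000, eps=0.05))` gives the LS estimate in a setting like that of the numerical example.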
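PROBLEM STATEMENT

In identification of the AR process (1), the M-estimator $\hat\theta_N$ is obtained by solving the implicit equation

$$\sum_{n=1}^{N}\varphi_n\,\psi(\varepsilon_n) = 0, \qquad (3)$$

where $\varepsilon_n = y_n - \varphi_n^T\hat\theta_N$. A function $\psi^*$ is chosen to minimize the maximal asymptotic variance of the estimator $\hat\theta_N$ over $\mathcal{P}_\varepsilon$, that is, to satisfy

$$\sup_{F\in\mathcal{P}_\varepsilon} V(\psi^*,F) = \min_{\psi}\,\sup_{F\in\mathcal{P}_\varepsilon} V(\psi,F). \qquad (4)$$

Here, the asymptotic variance $V(\psi,F)$ of an M-estimator defined by some function ψ at a distribution F is given by

$$V(\psi,F) = \frac{\int\psi^2\,dF}{\left(\int\psi'\,dF\right)^2}. \qquad (5)$$

Let $F_\varepsilon^*$ be the least favorable distribution, i.e., the distribution minimizing the Fisher information

$$J(F) = \int\left(\frac{f'}{f}\right)^2 dF \qquad (6)$$

over all $F\in\mathcal{P}_\varepsilon$, where f is the density of F. Then the M-estimator is given by

$$\psi^* = -\frac{f_\varepsilon^{*\prime}}{f_\varepsilon^*}, \qquad (7)$$

where $f_\varepsilon^*$ is the density of $F_\varepsilon^*$. The corresponding robust recursive identification procedure is given by (Polyak and Tsypkin (1979, 1980), Nakamizo (1984), Uosaki and Oumi (1988))

$$\begin{aligned}
\hat\theta_n &= \hat\theta_{n-1} + P_{n-1}\varphi_n\,\psi^*(\varepsilon_n),\\
P_n &= P_{n-1} - \frac{\psi^{*\prime}(\varepsilon_n)\,P_{n-1}\varphi_n\varphi_n^T P_{n-1}}{1 + \psi^{*\prime}(\varepsilon_n)\,\varphi_n^T P_{n-1}\varphi_n},\\
\varepsilon_n &= y_n - \varphi_n^T\hat\theta_{n-1}.
\end{aligned} \qquad (8)$$

Suppose $F_0$ is normal with zero mean and variance $\sigma^2$, so that $\mathcal{P}_\varepsilon$ is a class of ε-contaminated normal distributions. Then the optimal $\psi^*$ is given by

$$\psi^*(v) = \begin{cases} v/\sigma^2, & |v|\le v_0,\\ (v_0/\sigma^2)\,\mathrm{sgn}(v), & |v|>v_0, \end{cases} \qquad (9)$$

where $v_0$ is determined from the gross-error ε by solving the equation

$$\frac{2(1-\varepsilon)}{\sqrt{2\pi}\,\sigma}\left(\int_0^{v_0}\exp\left(-\frac{v^2}{2\sigma^2}\right)dv + \frac{\sigma^2}{v_0}\exp\left(-\frac{v_0^2}{2\sigma^2}\right)\right) = 1. \qquad (10)$$

Because this approach depends on the exact value of the gross-error ε, it is not applicable without exact knowledge of ε. The problem here, then, is to develop a robust procedure that requires only uncertain knowledge of the gross-error, namely its upper bound.

M-ESTIMATOR FOR UNCERTAIN GROSS-ERROR

We assume here that the exact value of ε is not available but that its upper bound α (≪ 1) is known. In this case, the class of ε-contaminated probability distributions is defined by

$$\mathcal{P}'_\varepsilon = \{F \mid F = (1-\varepsilon)F_0 + \varepsilon F_1;\ F_0 \text{ is a known symmetric distribution},\ F_1 \text{ is an arbitrary (unknown) symmetric distribution},\ 0\le\varepsilon\le\alpha\}. \qquad (11)$$

The next theorem gives the M-estimator for the class $\mathcal{P}'_\varepsilon$ of ε-contaminated normal distributions with uncertain gross-error ε.

Theorem 1. If $F_0$ in (11) is normal with mean zero and variance $\sigma^2$, then the M-estimator is given by

$$\psi^*(v) = \begin{cases} v/\sigma^2, & |v|\le v_0^\alpha,\\ (v_0^\alpha/\sigma^2)\,\mathrm{sgn}(v), & |v|>v_0^\alpha, \end{cases} \qquad (12)$$

where $v_0^\alpha$ is the solution of

$$2F_0(v_0) - 1 + \frac{2\sigma^2}{v_0}f_0(v_0) = \frac{1}{1-\alpha}. \qquad (13)$$

(Proof) Let $F^*$ be the least favorable distribution for the class $\mathcal{P}'_\varepsilon$. Then

$$J(F^*) = \min_{F\in\mathcal{P}'_\varepsilon} J(F) = \min_{\varepsilon\in[0,\alpha]}\ \min_{F\in\mathcal{P}_\varepsilon} J(F) = \min_{\varepsilon\in[0,\alpha]} J(F_\varepsilon^*), \qquad (14)$$

where $F_\varepsilon^*$ is the least favorable distribution for the class $\mathcal{P}_\varepsilon$ in (2). If $F_0$ is normal with mean zero and variance $\sigma^2$, then the density of $F_\varepsilon^*$ is given by

$$f_\varepsilon^*(v) = \begin{cases} \dfrac{1-\varepsilon}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{v^2}{2\sigma^2}\right), & |v|\le v_0,\\[2mm] \dfrac{1-\varepsilon}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{2v_0|v| - v_0^2}{2\sigma^2}\right), & |v|>v_0, \end{cases} \qquad (15)$$

with $v_0$ satisfying (10), or equivalently

$$2F_0(v_0) - 1 + \frac{2\sigma^2}{v_0}f_0(v_0) = \frac{1}{1-\varepsilon}. \qquad (16)$$

The corresponding Fisher information is given by

$$J(F_\varepsilon^*) = \frac{2(1-\varepsilon)}{\sqrt{2\pi}\,\sigma^5}\left(\int_0^{v_0} v^2\exp\left(-\frac{v^2}{2\sigma^2}\right)dv + \int_{v_0}^{\infty} v_0^2\exp\left(-\frac{2v_0v - v_0^2}{2\sigma^2}\right)dv\right). \qquad (17)$$

Since, integrating by parts,

$$\int_0^{v_0} v^2\exp\left(-\frac{v^2}{2\sigma^2}\right)dv = -\sigma^2 v_0\exp\left(-\frac{v_0^2}{2\sigma^2}\right) + \sigma^2\int_0^{v_0}\exp\left(-\frac{v^2}{2\sigma^2}\right)dv$$

and

$$\int_{v_0}^{\infty} v_0^2\exp\left(-\frac{2v_0v - v_0^2}{2\sigma^2}\right)dv = \sigma^2 v_0\exp\left(-\frac{v_0^2}{2\sigma^2}\right),$$

the Fisher information $J(F_\varepsilon^*)$ is rewritten as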
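$$J(F_\varepsilon^*) = \frac{2(1-\varepsilon)}{\sqrt{2\pi}\,\sigma^3}\int_0^{v_0}\exp\left(-\frac{v^2}{2\sigma^2}\right)dv = \frac{1}{\sigma^2}\,\frac{\int_0^{v_0}\exp\left(-\frac{v^2}{2\sigma^2}\right)dv}{g(v_0)}, \qquad (18)$$

where

$$g(x) = \int_0^{x}\exp\left(-\frac{v^2}{2\sigma^2}\right)dv + \frac{\sigma^2}{x}\exp\left(-\frac{x^2}{2\sigma^2}\right). \qquad (19)$$

Here, we use the relation

$$\frac{2(1-\varepsilon)\,g(v_0)}{\sqrt{2\pi}\,\sigma} = 1, \qquad (20)$$

which is (10), or equivalently (16), written in terms of g. Since $0\le\varepsilon\le\alpha<1$, we have $1 \le 1/(1-\varepsilon) \le 1/(1-\alpha)$, and hence

$$\frac{\sqrt{2\pi}\,\sigma}{2} \le g(v_0) \le \frac{\sqrt{2\pi}\,\sigma}{2(1-\alpha)}. \qquad (21)$$

It is easy to show that $g(x) > 0$,

$$\lim_{x\to 0} g(x) = \infty, \qquad \lim_{x\to\infty} g(x) = \frac{\sqrt{2\pi}\,\sigma}{2},$$

and

$$g'(x) = -\frac{\sigma^2}{x^2}\exp\left(-\frac{x^2}{2\sigma^2}\right) < 0,$$

that is, g(x) is monotone decreasing. Let $v_0^\alpha$ be the unique solution of the equation

$$g(v_0^\alpha) = \frac{\sqrt{2\pi}\,\sigma}{2(1-\alpha)}. \qquad (22)$$

Then, by (21) and the monotonicity of g,

$$g(v_0^\alpha) = \max_{\varepsilon\in[0,\alpha]} g(v_0), \quad\text{i.e.,}\quad v_0^\alpha = \min_{\varepsilon\in[0,\alpha]} v_0 \qquad (23)$$

(see Fig. 1). The numerator and the denominator of (18) are nonnegative functions of $v_0$; the numerator is monotone increasing and the denominator is monotone decreasing. Hence the Fisher information $J(F_\varepsilon^*)$ is monotone increasing with respect to $v_0$. Thus, the value of $v_0$ minimizing $J(F_\varepsilon^*)$ is $v_0^\alpha$, which corresponds by (22) to the upper bound α of the gross-error. That is,

$$\min_{\varepsilon\in[0,\alpha]} J(F_\varepsilon^*) = J(F_\alpha^*).$$

By (14), $F_\alpha^*$ is therefore least favorable for $\mathcal{P}'_\varepsilon$. Applying (7) with ε = α yields (12), and (13) is (16) with ε replaced by α.

Fig. 1. Relation of gross-error ε and end point $v_0$.

Theorem 1 implies that the M-estimator for the class $\mathcal{P}'_\varepsilon$ of ε-contaminated normal distributions with uncertain gross-error is obtained simply by substituting the upper bound α for ε. The substitution is made in the M-estimator for the class $\mathcal{P}_\varepsilon$ of ε-contaminated normal distributions with known gross-error. The following theorems show that this result also holds for more general classes of ε-contaminated distributions.

Theorem 2. Assume that the distribution $F_0$ of the class $\mathcal{P}'_\varepsilon$ of ε-contaminated distributions with uncertain gross-error satisfies the following conditions:

C1: $F_0$ is three times continuously differentiable.
C2: The density $f_0(v)$ is monotone decreasing for $v>0$.
C3: $-\log f_0(v)$ is convex.
C4: $\displaystyle\lim_{v\to\infty}\frac{f_0^2(v)}{f_0'(v)} = 0. \qquad (24)$

Then the M-estimator for the class $\mathcal{P}'_\varepsilon$ is given by

$$\psi^*(v) = \begin{cases} -\dfrac{f_0'(v)}{f_0(v)}, & |v|\le v_0,\\[2mm] -\dfrac{f_0'(v_0)}{f_0(v_0)}\,\mathrm{sgn}(v), & |v|>v_0, \end{cases} \qquad (25)$$

where $v_0$ is a constant uniquely determined by solving

$$2F_0(v_0) - 1 - \frac{2f_0^2(v_0)}{f_0'(v_0)} = \frac{1}{1-\alpha}. \qquad (26)$$

Theorem 3. Assume C1, C2 and C3, together with condition C4' (27) in place of C4 of Theorem 2. Then the M-estimator for the class $\mathcal{P}'_\varepsilon$ of ε-contaminated distributions with uncertain gross-error is given by (25), with $v_0$ determined by

$$2F_0(v_0) - 1 - \frac{2f_0^2(v_0)}{f_0'(v_0)} = \frac{1}{1-\alpha}. \qquad (28)$$

The Fisher information for the least favorable distribution $F^*$ in this class is given by

$$J(F^*) = \min_{\varepsilon\in[0,\alpha]} J(F_\varepsilon^*) = J(F_\alpha^*), \qquad (29)$$

where $J(F_\varepsilon^*)$ is the Fisher information for the least favorable distribution $F_\varepsilon^*$ in the class $\mathcal{P}_\varepsilon$.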
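The end points in Theorems 1 and 2 can be checked numerically by bisection. The appendix form of g, $g(x) = \int_0^x f_0\,dv - f_0^2(x)/f_0'(x)$, decreases from +∞ to 1/2 under C2 to C4, so the root of $g(v_0) = 1/(2(1-\alpha))$ can be bracketed. For the normal case this g is (19) divided by $\sqrt{2\pi}\,\sigma$. The Python sketch below is a minimal illustration under those assumptions. Each $F_0$ is described by its cdf, its density and its score $-f_0'/f_0$, and all function names are hypothetical. For the normal case, the root should reproduce (13); for α = 0.05 that is Huber's familiar constant of about 1.4. The sketch also shows that (18) increases with $v_0$. For the logistic density $f_0(v) = \theta e^{-\theta|v|}/(1+e^{-\theta|v|})^2$, the root should match the closed form $v_0 = \frac{1}{2\theta}\log\frac{2-\alpha}{\alpha}$.

```python
import math


def normal_model(sigma=1.0):
    """(F0, f0, score) for F0 = N(0, sigma^2); score(v) = -f0'(v)/f0(v) = v / sigma^2."""
    def F0(v):
        return 0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0))))

    def f0(v):
        return math.exp(-v * v / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

    def score(v):
        return v / sigma ** 2

    return F0, f0, score


def logistic_model(theta=1.0):
    """(F0, f0, score) for the logistic density; score(v) = theta * tanh(theta * v / 2)."""
    def F0(v):
        return 1.0 / (1.0 + math.exp(-theta * v))

    def f0(v):
        u = math.exp(-theta * abs(v))
        return theta * u / (1.0 + u) ** 2

    def score(v):
        return theta * math.tanh(theta * v / 2.0)

    return F0, f0, score


def g(x, model):
    """Appendix form: int_0^x f0 dv - f0(x)^2 / f0'(x) = F0(x) - 1/2 + f0(x) / score(x)."""
    F0, f0, score = model
    return F0(x) - 0.5 + f0(x) / score(x)


def solve_v0(bound, model, lo=1e-8, hi=50.0, iters=200):
    """Bisection for g(v0) = 1 / (2 (1 - bound)), i.e. (26) with alpha = bound."""
    target = 1.0 / (2.0 * (1.0 - bound))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid, model) > target:   # g is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def make_psi(v0, model):
    """psi* of (25): the score inside [-v0, v0], held at its end-point value outside."""
    score = model[2]

    def psi(v):
        return math.copysign(score(min(abs(v), v0)), v)

    return psi


def fisher_info_normal(v0, sigma=1.0):
    """J(F*_eps) by (18), with numerator and g both divided by sqrt(2 pi) sigma."""
    model = normal_model(sigma)
    return (model[0](v0) - 0.5) / g(v0, model) / sigma ** 2
```

Because J increases with $v_0$ and $v_0$ decreases as the bound grows, `fisher_info_normal(solve_v0(a, normal_model()))` is smallest at the largest admissible bound. This mirrors the argument in the proof of Theorem 1.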
(Proof) See Appendix.

Remark. An example of a distribution satisfying conditions C1 through C4 (C4') is the logistic distribution

$$f_0(v) = \frac{\theta\exp(-\theta|v|)}{\left(1+\exp(-\theta|v|)\right)^2}, \qquad \theta>0,$$

and the corresponding M-estimator for the class $\mathcal{P}'_\varepsilon$ is

$$\psi^*(v) = \begin{cases} \dfrac{\theta\left(1-\exp(-\theta v)\right)}{1+\exp(-\theta v)}, & |v|\le \dfrac{1}{2\theta}\log\dfrac{2-\alpha}{\alpha},\\[2mm] \dfrac{\theta\left(1-\sqrt{\alpha(2-\alpha)}\right)}{1-\alpha}\,\mathrm{sgn}(v), & |v| > \dfrac{1}{2\theta}\log\dfrac{2-\alpha}{\alpha}. \end{cases}$$

NUMERICAL EXAMPLE

Consider the following first-order AR model,

$$y_n = \theta y_{n-1} + e_n,$$

with θ = 0.5. The noise $e_n$ is independently and identically distributed according to the ε-contaminated normal distribution

$$F(v) = (1-\varepsilon)\Phi(v) + \varepsilon F_1(v),$$

with ε = 0.05 and $F_1(v) = \Phi(v/10)$ (unknown to the experimenter), where Φ is the standard normal cumulative distribution function. Three estimation procedures are applied to estimate θ:

(1) the ordinary LS procedure;
(2) the robust estimation procedure (9) and (10), using the exact value of the gross-error, ε = 0.05;
(3) the robust estimation procedure (12) and (13), with the uncertain gross-error ε ∈ [0, 0.5].

Figure 2 shows the mean squared error over 200 simulation runs for each estimation procedure. It indicates that the proposed robust estimation procedure for the uncertain gross-error case works well compared with the other two procedures.

Fig. 2. Results of parameter estimation: mean squared error versus n (n up to 1000) for the ordinary least squares method, the M-estimator with exact gross-error, and the M-estimator with uncertain gross-error.

CONCLUSIONS

A robust estimation procedure for ε-contaminated probability models has been discussed for the case where the exact value of the gross-error is not available and only its upper bound is known. It is shown that a robust procedure can be obtained simply by substituting the upper bound of the gross-error into the M-estimator for the case of exactly known gross-error. A numerical example illustrates the usefulness of the proposed procedure.

REFERENCES

Beck, J. V. and K. J. Arnold (1977). Parameter Estimation in Engineering and Science, J. Wiley, New York.
Dorato, P. (Ed.) (1987). Robust Control, IEEE Press, New Jersey.
Huber, P. J. (1972). Robust statistics: a review, Ann. Math. Statist., 43, 1041-1067.
Huber, P. J. (1981). Robust Statistics, J. Wiley, New York.
Masreliez, C. J. and R. D. Martin (1977). Robust Bayesian estimation for the linear model and robustifying the Kalman filter, IEEE Trans. Auto. Contr., AC-22, 361-371.
Nakamizo, T. (1984). Robust estimation, J. Soc. Instr. Contr. Engrs., 541-549.
Polyak, B. T. and Ya. Z. Tsypkin (1979). Adaptive estimation algorithms: convergence, optimality, robustness, Auto. Remote Contr., 40, 378-391.
Polyak, B. T. and Ya. Z. Tsypkin (1980). Robust identification, Automatica, 16, 53-63.
Rougée, A., M. Basseville, A. Benveniste and G. Moustakides (1987). Optimum robust detection of changes in the AR part of a multivariate ARMA process, IEEE Trans. Auto. Contr., AC-32, 1116-1120.
Saito, K. (1987). Robust Estimation for ε-Contaminated Probability Distribution Models with Uncertain Gross-Error, M.S. Thesis, Department of Applied Physics, Osaka University.
Uosaki, K. and T. Oumi (1988). Convergence analysis of a robust recursive identification method for autoregressive models, Preprints 8th IFAC Symp. Identification and System Parameter Estimation, Beijing, P.R. China, 459-463.

APPENDIX (Proof of Theorems 2 and 3)

Applying the method of variations to the Fisher information J(F), we see that the least favorable density $f^*(v)$ is a solution of the equation

$$\frac{2f^*(v)f^{*\prime\prime}(v) - f^{*\prime 2}(v)}{f^{*2}(v)} = \lambda^2 + \eta(v), \qquad (A1)$$

where λ is some constant and

$$\eta(v)\ \begin{cases} = 0, & f^*(v) > (1-\varepsilon)f_0(v),\\ \le 0, & f^*(v) = (1-\varepsilon)f_0(v). \end{cases} \qquad (A2)$$

Let $f^*(v) = \exp(-h^*(v))$ and $f_0(v) = \exp(-h_0(v))$. Then, by (A1) and (A2),

$$-2h^{*\prime\prime}(v) + h^{*\prime 2}(v) = \lambda^2 + \eta(v), \qquad -2h_0''(v) + h_0'^2(v) \le \lambda^2 \ \text{ if } f^*(v) = (1-\varepsilon)f_0(v), \qquad (A3)$$

and $h_0(v)$ is monotone increasing for $v>0$ by C2. If $f^*(v) > (1-\varepsilon)f_0(v)$, i.e., η(v) = 0, then

$$f^*(v) = C\exp(-\lambda|v|). \qquad (A4)$$

For $f^*(v)$ to be continuous, the two curves $\lambda|v| - \log C$ and $h_0(v) - \log(1-\varepsilon)$ must be smoothly connected at the end point, say $v_0$:

$$\lambda = h_0'(v_0), \qquad (A5)$$

$$C = (1-\varepsilon)f_0(v_0)\exp(\lambda v_0). \qquad (A6)$$

It is shown (Saito (1987)) that the relation $-2h_0''(v) + h_0'^2(v) \le \lambda^2$ holds only for $|v|\le v_0$. This implies that the least favorable density is given by

$$f^*(v) = \begin{cases} (1-\varepsilon)f_0(v), & |v|\le v_0,\\ C\exp(-\lambda|v|), & |v|>v_0. \end{cases} \qquad (A7)$$

Hence, the M-estimator is given by (25). The relation (26) follows from (A5) to (A7) and the normalization

$$\int_{-\infty}^{\infty} f^*(v)\,dv = 1, \qquad (A8)$$

which give $2F_0(v_0) - 1 - 2f_0^2(v_0)/f_0'(v_0) = 1/(1-\varepsilon)$. Let

$$g(x) = \int_0^{x} f_0(v)\,dv - \frac{f_0^2(x)}{f_0'(x)}, \qquad (A9)$$

so that this relation reads

$$g(v_0) = \frac{1}{2(1-\varepsilon)}. \qquad (A10)$$

It is easy to show that g(x) is positive and monotone decreasing with respect to x. Indeed, $g'(x) = f_0(x)\left(f_0(x)f_0''(x) - f_0'^2(x)\right)/f_0'^2(x) \le 0$ by C3. Moreover, under C4,

$$\lim_{x\to +0} g(x) = \infty, \qquad \lim_{x\to\infty} g(x) = \frac{1}{2}. \qquad (A11)$$

Let $v_0^\alpha$ be the unique solution of the equation

$$g(v_0^\alpha) = \frac{1}{2(1-\alpha)}, \qquad 0<\alpha<1. \qquad (A12)$$

Then $g(v_0^\alpha) = \max_{\varepsilon\in[0,\alpha]} g(v_0)$, and arguments similar to those in the proof of Theorem 1 lead to the conclusion of Theorem 3.
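The numerical example can be mimicked with the recursion (8). The Python sketch below is an illustration under stated assumptions, not a reproduction of Fig. 2:

- It covers only the scalar AR(1) case and takes σ = 1.
- It initialises $\hat\theta_0 = 0$ and $P_0 = 1$; these are hypothetical choices.
- It applies the gain with the freshly updated $P_n$ rather than $P_{n-1}$ as written in (8). This is a common variant that reduces to ordinary recursive least squares when ψ(v) = v and avoids large first steps.

Procedures (1) to (3) correspond to `v0=None`, the root of (16) for ε = 0.05, and the root of (13) for α = 0.5. All function names are ours.

```python
import math
import random


def solve_v0_normal(bound, sigma=1.0, lo=1e-8, hi=50.0, iters=200):
    """Bisection for 2F0(v0) - 1 + 2 sigma^2 f0(v0) / v0 = 1 / (1 - bound), i.e. (13)/(16)."""
    def lhs(v):
        F0 = 0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0))))
        f0 = math.exp(-v * v / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
        return 2.0 * F0 - 1.0 + 2.0 * sigma ** 2 * f0 / v

    target = 1.0 / (1.0 - bound)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:   # lhs decreases in v0, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def recursive_estimate(y, v0=None, sigma=1.0, p0=1.0):
    """Scalar AR(1) version of recursion (8); v0=None gives recursive LS (psi(v) = v).

    Variant: P is updated first and the new P_n is used as the gain.
    """
    theta, P = 0.0, p0
    for n in range(1, len(y)):
        phi = y[n - 1]
        e = y[n] - phi * theta                                 # epsilon_n with theta_{n-1}
        if v0 is None:
            psi, dpsi = e, 1.0
        else:
            psi = max(-v0, min(v0, e)) / sigma ** 2            # psi* of (9)/(12)
            dpsi = (1.0 if abs(e) <= v0 else 0.0) / sigma ** 2  # psi*', zero outside [-v0, v0]
        P -= dpsi * P * phi * phi * P / (1.0 + dpsi * phi * P * phi)
        theta += P * phi * psi
    return theta


def contaminated(rng, eps=0.05):
    """One draw from (1 - eps) N(0, 1) + eps N(0, 10^2), i.e. F1(v) = Phi(v / 10)."""
    return rng.gauss(0.0, 10.0 if rng.random() < eps else 1.0)


def mse(v0, runs=200, n=1000, theta=0.5, eps=0.05, seed=0):
    """Mean squared error of the final estimate over `runs` simulated AR(1) records."""
    rng = random.Random(seed)   # same seed for every method: common random numbers
    total = 0.0
    for _ in range(runs):
        y = [0.0]
        for _ in range(n):
            y.append(theta * y[-1] + contaminated(rng, eps))
        total += (recursive_estimate(y, v0) - theta) ** 2
    return total / runs


if __name__ == "__main__":
    v_exact = solve_v0_normal(0.05)   # procedure (2): exact gross-error
    v_alpha = solve_v0_normal(0.5)    # procedure (3): upper bound alpha = 0.5
    for name, v0 in [("LS", None), ("exact eps", v_exact), ("alpha = 0.5", v_alpha)]:
        print(name, mse(v0, runs=50))
```

Using the same seed for every method means all three procedures see identical noise records. Differences in the printed mean squared errors therefore reflect the estimators rather than sampling luck.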