IFAC [::0[>
Copyright © IFAC Nonlinear Control Systems, SI. Petersburg, Russia, 200 I
Publications www.elsevier.com/locatelifac
NONLlNEAR ESTIMATION OF PARAMETERS IN A STATISTICALLY UNCERTAIN MODEL I.Ya.Kats·,l, S.I.Kumkov.. ·2 , G.A.Timofeeva·,l
• Ural State University of Railway Transport, Kolmogorov str.66, Ekaterinburg, 620034, Russia, G
[email protected] •• Institute of Mathematics and Mechanics UrB of RAS, S.Kovalevskaya str. 16, Ekaterinburg GSP-384, 620219, Russia, e-mail:
[email protected]
Abstract: A problem of esimation of parameters in a statistically uncertain model is discussed. A generalized case is investigated when the measurement error has both a random component with the normal distribution and a statistically uncertain component that obeys only a geometric constraint. An approach to estimation of the set of the most probable states and to analysis of the sample consistency is suggested. Algorithms for direct construction of informational sets are used both for estimation of the admissible values of parameters and for analysis of consistency of the sample. Copyright © 2001IFAC' Keywords: Estimation, linear model, measurement, error, parameter set, minimax. techniques, informational set
observation and control theory under uncertanty conditions (Kurzhanskii, 1977; Chernous'ko and Melikian, 1978; Milanese and Norton, 1996). Various estimation problems under incomplete statistical information on the errors and disturbances were investigated in (Kats and Kurzhanskii, 1975; Kats and Timofeeva, 1994; Kats and Timofeeva, 1995; Koscheev and Kurzhanskii, 1983), and cases with geometrical constraints on the errors were discussed in (Milanese and Norton, 1996; Kumkov et al., 2000; Kumkov, 1998a; Kumkov, 1998b).
INTRODUCTION Estimation problem of parameters for a linear regression, when a measurement error has only a random component with a zero mean value, is solved by finding an equation of the linear regression. To make it, the standard methods of the mathematical statistics are used, and the fiducial domains of the regression parameters have the form of ellipsoids (Afifi and Eisen, 1979). In practice, the measurement error is often statistically uncertain, but only some geometrical constraint on the error value is known. In such a case, the set of admissible parameters consistent with the given sample is constructed in the form of a convex polygon.
In practice, the set of parameters consistent with the given sample can be empty. It can be stipulated both by inexact information about a parameters domain given a priory and by presence of unfiducial measurements (gross errors).
In this case, estimation algorithms are based on the approaches and results of the
So, a formulation of more generalized estimation problem is of a big interest when a measurement error has both a random component with the normal distribution and a statistically
lSupported by RFBR, project no. 99-01-00884 2Supported by RFBR, project no. 00-01-00348
1083
Definition. The set Au of admissible values of the parameter vector (a, b) consistent with the model (1), (2), (4), (6) is called the informational set.
uncertain component that obeys only a geometric constraint. In this problem, the procedures of parameters estimation can be based on the method of constructing the set of the most probable value of parameters (Kats and Timofeeva, 1994; Kats and Timofeeva, 1995). Simultaneously, the algorithms of direct construction of the informational sets can be used to estimate admissible values of the parameters and for analysis of unconsistency conditions of the measurement sample (Kumkov, 1998a; Kumkov, 1998b).
Thus, for a consistent sample Au = {(a, b) : IYi-axi-bl ~ Ui , i = 1,N}. (7) On one hand, the aim of this investigation is to elaborate of algorithms for constructing the set Ap and, on their basis, building the algorithms of consistency analysis of the original sample for the linear model (1),(6). On the other hand, the algorithms of direct construction of the informational set Au for the original sample are suggested. They are used for consistency analysis and for selecting and excluding unconsistent measurements.
1. PROBLEM FORMULATION
The elaborated approach can be used in the more generalized estimation problem when an error is present not only in the function measurement but, also, in the measurement of the argument (Kumkov, 1998a; Kumkov, 1998b):
Consider a problem of estimation of the parameter vector for the linear model Y = ax+b,
(1)
where (a, b) is the parameter vector; Y is the function value; x is the argument.
Yi=a(xi+vi+Si)+b+Ui+ri,
The problem is solved with the given measurement sample of the length N (Xi, Yi),
i E 10
= 1, N.
i=l,N, (8)
where Vi are uncertain errors of the argument with the geometric constraints; Si are random normally distributed errors of the argument.
(2)
Further, the sample (2) is supposed to be ordered on its argument, i.e., for each pair i < j of measurement numbers the inequality Xi < Xj holds.
2. ESTIMATION OF THE SET OF THE MOST PROBABLE VALUES OF PARAMETERS
Assume that the error of each measurement is the sum of two components. The first component Ui is a statistically uncertan value. The second component Ti is a random one. So, each measurement has the following structure
Theorem 1. The set of the most probable values of parameters Ap of the linear model (1)-(5) coinsides with the set of minima of the residual function
Yi
= aXi + b + Ui + Ti,
i
= 1, N.
N
(3) F(a, b)
It is only known that the components Ui can take arbitrary values inside their geometric constraints
var(ri)
= (J';.
fi(z) =
i
= 1, N.
(10)
Theorem 2. If the informational set Au is not empty, then Ap = Au, otherwise, the set of the most probable values of parameters Ap consists of only one point.
(5)
The theorems 1 and 2 follow from the corresponding states in (Kats and Timofeeva, 1994; Kats and Timofeeva, 1995), where a problem of state estimation of a dynamic system was considered under incomplete statistical information about random disturbances.
Simultaneously with the complete model (3), one without random component of the error is considered:
= axi + b + Ui,
min (z - Ui? /(J';
IUil~Ui
= max{O, (lzl- Ui)/(J'it
According to the approach (Kats and Timofeeva, 1994; Kats and Timofeeva, 1995), the set of the most probable values of parameters (a, b) is taken as an estimation domain of the parameter vector. For the model (1)-(5) the set is denoted by Ap = (a p, bp) C R2.
Yi
(9)
where
The components ri are independent normally distributed random values with the given moments
= 0,
fi(Yi - aXi - b),
;=1
(4)
E(Ti)
=L
The minimum of the residual function (9) can be found by means of the standard gradient descent method. Taking into account the forui-of
(6)
1084
In this case it is possible to use (see Section 4) the algorithm of direct construction of the informational set Au. Note, that for the generalized problem (8) where values of the argument are given inexact, the residual function (9) changes, but the gradient method can be used to search its minintum as before.
the function, we obtain the following sequence of values of the parameters: N
= ak + dk L
aH1
x;g;(ak, bk ),
;=1
(ll)
N
bk+ 1 =bk +dk Lg;(ak,b k ),
k=l,N,
;=1
where the function
g;(a, b) = 'V z/i(y; - ax; - b) = 2O';2(y; - ax; - b - Ui ), if Yi - ax; - b > U;, = 0, if Iy; - aXi - bl ~ U;, { 2O';2(y; - ax; - b + U;), ify; -ax; -b < -U;.
3. ANALYSIS OF THE SAMPLE UNCONSISTENCY ON THE BASIS OF GENERALIZED MODEL (12) The sample (2) is called the un consistent with the linear model, if the parameter informational set Au defined by (7) is empty. Let us search the maximal collection of indexes I. E 10 such that the truncated sample
The values of the parameters from the equation of the linear regression constructed on the sample (2) N
(ao,bo )
= argmin L(y; a,b
may be taken as the initial value
Theorem 3. If dk M
=
(L:!1
= d,
ax; - b)2
will be consistent with the linear model (4),(6).
;=1
ao, bo o
To make it, the minimum through the collection Im using the gradient descent method (10) is found on each step of the sample truncation
d E (0, M) and
o';(x; + 1)) -1/2 , then the sequence
minFm(a,b) a,b
(ak' bk ) satisfying the relations (ll) ,(12) converges to the point of minimum of the function (9) , i.e. ,
im+1 aXj - b) ,
;=1
N
;=1
(a2,~)II·
Thus, for d ~ M-I the algorithm (11),(12) converges. Convergence of this procedure can be accelerated by using special methods of the step choosing, for example, the steepest descent method:
= argminF(ak,bk) d>O
= argmax/i(Yi 'El",
a*xi - b*).
(14)
The measurement with the number im+1 is moved off the truncated sample Im+1 = Im/{im+d. To accelerate computations of the minimum (13), it is suitable to take the point of minintum (a;;', b;;') found on the previous step m as the initial point on the (m + 1) step. Thus, the search of the minimum Fm+1 (a, b) is begun from the point (a m +1 ,0, bm +1,o) (a;;', b;;').
~ 2 L(x: + 1)0';2 «a1 - a2? - (b I _ ~)2) 1/2 =
dk
(13)
If mina,b Fm (a , b) > 0, then find the number im+1 of the measurement with the biggest violation of the restriction (4) at the point of the minimum (a*, b*)
N
= 2MII(at,b1 ) -
/i(y; - aXi - b).
=
Convergence of the algorithm (11),(12) follows from the theorem of convergence of the gradient descent method (Polyak, 1983) because the residual function (9) is the sum of differentiable and convex on (a, b) functions. The gradient of the function (9) satisfies the Lipschitz conditions with the constant 2M:
= -2 L(x;, l)T gi(Yi -
a,b iEI",
If mina,bFm(a,b) = Fm(a*,b*) 0, then the search procedure is finished and the truncated sample (x;, y;), i E Im is consistent with the linear model, and (a* , b*) EAu.
lim (ak ' bk) = (a p, bp) E Ap.
k-4OO
V' F(a, b)
= min L
=
On each step the minintum of the residual function decreases: min Fm+1(a , b) a,b
d'VF(ak , bk)) .
< minFm(a,b), a ,b
and after the finite number of steps of the truncated sample constructing we obtain the sample consistent with the model (4), (6). For this sample it becomes possible to construct a non-empty informational set Au of the model parameters.
In the case when mina,b F(a, b) = 0, i.e., the informational set Au is not empty, convergence of the algorithm (11),(12) can be significally accelerated by choosing
dk = F(ak,bk)/II'VF(ak,bk)W.
1085
4. CONSTRUCTION OF INFORMATIONAL SETS
axis, and the parameter b is counted along the ordinate axis, and the apices are counted in the clock-wise order.
Consider the algorithms for direct construction of the informational sets. Let us put the set Hi = [iii, Yi] of uncertainty in correspondence to each measurement of the sample (2)
In the generalized problem (8) with errors both in the function measurements and the argument values, the uncertainty sets are rectangles. The partial informational sets of measurement pairs are polygons with linear frontiers and with number of apices from three upto six. When number of apices is four and more these partial informational sets can be non-convex, and their intersections can be even disconnected sets. But the coordinates of the apices are culculated analytically as before.
ili
= Yi -
Ui, Yi
= Yi + Ui ,
(15)
where Yi, Yi are the lower and upper bounds. The informational set to be found satisfies the following system of double inequalities: Au
= {(a,b): Yi
~
axi + b ~ Yi, i
= I,N}.
(16)
The search of the support function of the set Au is, in its essence, a parametric problem of the linear programming. Such a problem can be solved by the standard methods (Steuer, 1990). But in the case discussed for its solution a special fast algorithm was elaborated. The algorithm is based on application of the partial informational sets (Kumkov, 1998a; Kumkov, 1998b) and uses the intermediate results of their construction in the consistency analysis of the sample. By the partial informational set Pik ... m connected with the collection of measurements of numbers i, k, ... , rn, we call the totality of all admissible points (a, b) such that the corresponding functions pass through the uncertainty sets Hj, j = i, k, ... , m of all measurements of this collection. It means that corresponding collection of the unequalities in (16) is satisfied.
Au=np;j, i=I,N-l; j=i+l,N.
(19)
i,j
If the informational set is not empty, it is a restricted convex polygon. In the plane of parameters a, b the partial informational sets lay each over others in the domain of the non-empty informational sets. In the complete totality of the partial informational sets (17), their sub sequences with a fixed number k = I, ... , N are considered: Pkj ' j
= I,N;
j =j:. k.
(20)
The following corrolary of Theorem 4 holds.
In the considered case of the linear function we shall use the totality of "dual" partial informational sets constructed through all pairs of the uncertainty sets (15) of pairs of measurements {pij, i=I,N-l; j=i+l,N}.
Theorem 4. For a consistent sample the informational set Au is equal to intersection of all dual partial informational sets (17),(18)
(17)
Convenience of using the totality (17) is in the following. For the linear function (1) with the sample ordered on the argument, each partial informational set (a solution of a pair of double inequalities (16)) is a restricted convex polygon with four apices and linear frontiers. The apices of each polygon in the plane of parameters a, b are calculated analitically. For each pair of measurement numbers {i,j}, i < j the apices of the set Pij are calculated by the following relations: al(i,j) = (Yj - Yi)/(Xj - Xi), b1(i,j) = Yi - al(i,j)Xi, a2(i,j) = (Yj - Yi)/(Xj - Xi), ~(i,j) = Yi - a2(i,j)Xi, (18) a3(i,j) = (ilj - Yi)/(Xj - Xi), b3(i,j) = Yi - a3(i,j)Xi, a4(i,j) = (Yj - Yi)/(Xj - Xi), b4(i,j) = Yi - a4(i,j)x;. In the relations (18) it is assumed that in the plane a, b the parameter a is counted along the abscissa
1086
Corrolary 1. For the linear function nonemptiness of the informational set Au takes place if and only if the intersection of at least one subsequence (20) of partial informational sets from (17) for a fixed number k = 1, ... , N is not empty:
Au(k) = nPkj, j = I,N; j =j:. k.
(21)
j
In the collection (17) of partial informational sets each their subsequence of the form (20) (i.e., when one number is fixed but the second one passes through all other values) has a property useful for the numerical procedure of constructing the informational set (19) . Namely, every such subsequence lies whole in some band in the plane of estimated parameters.
5. ANALYSIS OF THE SAMPLE UNCONSISTENCY ON THE BASIS OF INFORMATIONAL SETS Consider now procedures of analysis of the sample unconsistency on the basis of algorithms for direct construction of the informational set. Let the informational set of the original sample (19) be
sets becomes N - L - 1; the numbers Q of the unconsistent measurements are constructively selected by their partial informational sets Pkq, which give empty intersection with the informational set of the maximal consistent subsample.
empty. The following corrolary of Theorem 4 holds. ?orrol~y 2. For the unconsistent sample (the mformatlOnal set is empty) , the intersection (19) of partial informational sets of any subsequence (20) is empty.
For splitting the unconsistent original sample into consistent subsamples, a special fast algoithm was elaborated. It uses mehods of the graph theory (selection of connected branches in a graph); the collection of the dual partial informational sets of the original sample is the input information. The unconsistent measurements Q are moved off the sample. The informational set Amax (22) of the maximal consistent subsample is constructed through numbers of the remained consistent mesurements.
It permits to search measurements unconsistent with the rest part of the original sample by analysis of any suitable subsequence from (20). Application of the algorithms of constructing the informational sets gives an opportunity to carry out this search directly by the form and the relative location of partial informational sets (20) . Here, it is not necessary to make standard analysis of the level of constraints (4) or (5), neither to implement the iterative procedure (11),(12) of the minimum search for the residual function (9) , nor to find the mostly deviated (14) measurement.
The informational set Amax is used for calculation 1, N of the admissible of the tube T; [1'i ,1';] i functions
=
Let us fix some number k and take the corresponding subsequence (20) of partial informational sets. The length of every such sequence is equal to N - 1. A collection of measurements from the original sample (2) of numbers j)'l2, ... ,jm with non-empty intersection (19) of the partial informational sets Pkj, j = ir, l2, ... , jm is called the consistent subsample I(k) = {il,i2, ... ,jm}.
t; =
Ti
=
n
Pkj.
= (a,b)EA max
+ b), (axi + b),
(ax;
maz
k
= 1, N,
(23)
where Ti , Ti are the lower and upper frontiers of the tube for each value X i of the argument. Summary deviation of the unconsistent measurement of the number q (with the argument value Xq) is estimated by its shift Dq relatively the tube (23) :
A collection of measurements with the number sequence j)'l2, ... ,jn of the maximal length and non-empty intersection of the ~artial informational sets Pkj , j E I(k) 15 called the maximal consistent subs ample Imax(k) = jl,l2 , ..·,jn. The following informational set corresponds to the maximal consistent subsample:
Amax
min
(a,b)EAmaz
=
Dq
= min{lyq -
Tql , Iyq - Tql} .
(24)
Systematical deviation Sq stipulated by the random component r q of the error in this unconsistent measurement is estimated by the shift of its uncertainty set Hq (15) relatively the tube (23):
(22)
jE/mu(k)
Sq = min{lyq - Tql, IYq - Tql}·
H the original sample (2) is consistent, then the length of the maximal consistent subs ample is equal to N , and the number of the partial informational sets (20) with non-empty intersection is equal, indeed, to N - 1. H only one unconsistent measurement is present in the sample (2), the length of the maximal consistent subsample is equal to N - 1, and the number of the partial informational sets with non-empty intersection decreases to N - 2. Here, such measurement with a number q is unconsistent whose partial informational set Pkq gives the empty intersection with the informational set (22) of the maximal consistent subsample.
(25)
Lower estimation of the level of real errors in the sample. This estimate is useful in practice, because it characterizes (at least from below) the level of the errors that actually realized in the consistent sample. Let us describe this level by a collection of the summary deviations (25) of measurements from the minimal tube (23) . The minimal tube of the consistent sample is constructed in the following way. The levels of the geometric constraints (4) on the errors are decreasing
Ui(O:)
= o:Ui , i = 1, N
(26)
by
multiplication on the coefficient 0: < 1. Take a decreasing sequence 1 = 00 > ... > O:n-l > O:n > ... > O.
H a few Q = {ql, q2 , ... , qd (L > 2) unconsistent measurements are present, the sample can be split into a number of consistent subsamples of various length. But the mentioned rule remains. Namely, the length of the maximal consistent subsample is N - L; the number of the partial informational
o <
For each new values of the constraints (26) the correspor..ding informational set is calculated. The coefficient is decreased to the value 0* for which
1087
REFERENCES
the informational set degenerates to the limit set that consists of only one point: (27) For the value 0* the tube (23) degenerates to the unique function with the parameters (27) : (28) The function (28) is taken as the support one for calculation of the summary deviations {Dj , i = 1, N} by (24). The collection of these deviations and the collection of the values Oi(O*) = O*Ui, i = 1,N estimate from below the levels of really realized errors of the measurements. The algorithms of direct constructing the informational sets and algorithms of analysis of the sample unconsistency, considered in Sections 4 and 5, were also elaborated for the generalized problem (8) with errors both in measurements of the function and in the argument values. So as in this problem the partial informational sets in the collection (20) can be non-convex, a special procedure of their convexification was made. It permits to use, as before, the fast procedure of intersection of the convex sets in (19) ,(21 ),(22) ,(27) .
CONCLUSIONS In the work, the methods for estimation of the linear model parameters are investigated under assumption of presence of displacements in the measurement errors. The elaborated algorithms permit to analyze both the case of statistically uncertain errors and the more generalized case when the errors are random values but information about their means values is incomplete. The algorithm for construction of the set of the most probable values of parameters was elaborated. On the basis of this algorithm the method for analysis of unconsistency of the original sample and of selection of the unconsistent measurements was also suggested. The algorithms of direct construction of the informational sets were considered. These algorithms are also used for analysis of unconsistency of the sample and for direct selection of the unconsistent measurements. The elaborated procedures based on the algorithms of direct construction of the informational sets are simple and fast in numerical realization. They were applied to solving a number of practical problems of estimation and identification.
Afifi, A.A. and S.P. Eisen (1979) . Statistical Analysis. A Computer Oriented Approach. Acad. Press. New-York. CherIious'ko, F.L. and A.A. Melikian (1978) . Game Problems of Control and Search. Nauka. Moscow. Russia. (in Russian) . Kats, LYa. and A.B. Kurzhanskii (1975) . Minimax estimation for multistage systems. In: Dokl. Akad. Nauk SSSR. Vol. 221, No.3. pp. 95-107. Nauka. Moscow. Russia. (in RUSSian) . Kats, LYa. and G.A. Timofeeva (1994) . Modified discrepancy method in an uncertain problem of estimation. In: AtJtom. and Telemekh .. Vol. 2. pp. 100-109. Nauka. Moscow. Russia. (in RUSSian). Kats, LYa. and G.A. Timofeeva (1995). The most probable states in statistically uncertain dynamic systems. In: Proc. Vol. of the 10th IFAC Workshop Control Application of Optimization. pp. 29-32. Pergamon. London. UK. Koscheev, A.S. and A.B. Kurzhanskii (1983). Adaptive estimation of multistage sistem under uncertainty. In: IztJ. Acad. Nauk SSSR, Tekhn. Kibern .. Vol. 2. pp. 72-93. Nauka. Moscow. Russia. (in Russian). Kumkov, S.L (1998a) . Application of minimax evaluation procedures for expansion of metrological standard on noised measurements processing. In: Proc. Vol. of the Intern. Conf. SIMULATION-98. pp. 162-167. lEE, Publication No.457. York. UK. Kumkov, S.L (1998b). Informational sets in applied problems of evaluiation. In: Proc. Vol. of the IFAC Workshop, Norumooth and Discontinuous Problems of Control and Optimization. pp. 149-154. Pergamon. London. UK. Kumkov, S.L, V.S. Patsko, S.G. Pyatko and A.A. Fedotov (2000) . Four-dimensional informational sets in a problem of aircraft tracking. In: Preprints of the 11th IFAC International Workshop Control Applications of Optimization (V.V. Zakharov, (Ed.). Vol. 2. pp. 126-131. St.-Petersburg State University. St.-Petersburg. Russia. Kurzhanskii, A.B. (1977). Control and Observation Under Uncertainty Conditions. Nauka. Moscow. Russia. (in Russian) . Milanese, M. and Norton, J., (Eds.) (1996) . Bounding Approaches to System Identification. Plenum Press. London. UK. Polyak, B.T. (1983). Introduction to Optimization. Nauka. Moscow. Russia. Steuer, R.E. (1990). Multiple Criteria Optimization: Theory, Computation and Optimization. Wiley. New-York.
1088