A distribution-free criterion for robust identification, with applications in system modelling and image processing

A distribution-free criterion for robust identification, with applications in system modelling and image processing

0005-1098/86 $3.00 + 0.00 Pergamon Press Ltd. ~(~, 1986 lnternationak Federation of Automatic Control Automatica, Vol. 22, No. 1, pp. 105-109, 1986 P...

2MB Sizes 4 Downloads 38 Views

0005-1098/86 $3.00 + 0.00 Pergamon Press Ltd. ~(~, 1986 lnternationak Federation of Automatic Control

Automatica, Vol. 22, No. 1, pp. 105-109, 1986 Printed in Great Britain.

Brief

Paper

A Distribution-free Criterion for Robust Identification, with Applications in System Modelling and Image Processing* A. VENOT,I" L. PRONZATO,~ E. WALTER~ and J.-F. L E B R U C H E C t Key Words--Robustness; parameter estimation; system identification; image processing.

Tsypkin (1980) have proposed to keep a maximum likelihood approach, but to choose the worst possible noise distribution among those allowed by the available information. For example they have shown that when nothing is known on the measurement noise, the worst probability density function is the Laplace distribution, so that the criterion to minimize is the sum of the absolute values of the differences (SAVD). The resulting estimator is far more robust than the classical least-square (LS) estimator. The LS and SAVD estimators belong to a more general class known as M-estimators. By definition, an Mestimator 0u of the parameter vector 0 on the basis of N scalar observations y~ is the solution for 0 of the normal equation

Abstraet--A new distribution-free identification criterion based on sign changes in the residuals is presented. The maximization of this non-differentiable criterion is carried out by a global optimization routine using an adaptive random search strategy which is demonstrated to furnish very satisfactory results. The robustness of the resulting identification procedure is tested by treating three examples. The first one is a simulated example, representative of a situation where many outliers are present. The second one is concerned with the estimation of the parameters of a compartmental model routinely used for the functional study of liver and biliary ducts of patients with Tc-99m-diethyl IDA. The last one deals with the problem of automated registration in digitized medical image processing. The new criterion has already proven to yield original and efficient methods for change detection in dissimilar images. These examples form the basis for a comparison of the new estimator with more classical ones.

,=, ~o o q'(~'(O))

= o,

(1)

with

Introduction

~,(o) = y, - A(0),

WHENEVER outliers have to be feared, or if no reliable information on the noise distribution is available, robust estimators should be preferred to the classical least-square estimators (Launer and Wilkinson, 1979). In this paper a new distribution-free identification criterion is presented. It is compared with more classical approaches and applied to representative examples of problems involving outliers. Section 1 briefly surveys the most common robust estimators. Section 2 introduces the new criterion, called the stochastic sign change (SSC) criterion, as a simple alternative to more classical ones. The SSC criterion function may be multimodal, and its optimization is carried out using a global optimizer advocated in Section 3. Finally in Section 4 three examples are presented. The first two resort to system modelling and the third describes an unusual approach of automated image registration, considered here as a parameter estimation problem in the presence of outliers.

(2)

where J~(O) is the model output and if(e) is a suitably chosen function. Classical ~ps are Huber's: ~ if [el < e~, q/(e) = (e5 sign (~) otherwise,

(3)

and Tuckey's bi-square: ~e(1 - e2/e2) 2 if Is] < e~, ~(e) = ~ [0 otherwise.

(4)

In both cases es has to be chosen according to some heuristic rule. Huber's ~b is said to be non-redescending, contrary to Tuckey's which rejects any data point associated with an error greater than es, such a point being considered as an outlier. Solving (1) for 0 is equivalent to minimizing

1. Classical criteria Jbr robust estimation Maximum likelihood estimators have attractive asymptotic properties (Goodwin and Payne, 1977), but require the knowledge of the statistical properties of the noise perturbating the system. Whenever the noise samples do not satisfy these a priori assumptions, the maximum likelihood approach can yield extremely poor estimates. It is the purpose of robust estimators (Launer and Wilkinson, 1979) to face such situations. Poljak and

j(O) = ~ p(e,i(O))

(5)

i=1

with respect to 0, where

* Received 6 September 1984; revised 29 June 1985. The original version of this paper was presented at the 7th IFAC Symposium on Identification and System Parameter Estimation which was held in York, U.K. during July 1985. The Published Proceedings of this IFAC Meeting may be ordered from Pergamon Press Limited, Headington Hill Hall, Oxford OX3 0BW, U.K. This paper was recommended for publication in revised form by Associate Editor G. C. Goodwin under the direction of Editor P. C. Parks. t F. R. Synth~se et analyse d'images m6dicales, UER CochinPort Royal and INSERM U194 Piti6-Salp~tri~re, Paris, France. :~Laboratoire des Signaux et Syst6mes, CNRS-ESE, Plateau du Moulon, 91190 Gif-sur-Yvette, France.

d

x

~k(e) = ~xxP( )l . . . .

(6)

Using an M-estimator with a redescending ~ ensures a high protection against outliers, but may imply difficulties to find the parameters minimizing the criterion (5). Besides the theoretical problem of the choice of an adequate ~O, practical difficulties may preclude the use of M-estimators. Indeed the computation of the criterion (5) may be quite complex when the number N of data is very high, a situation where protection against outliers is especially required. Moreover the optimization of this criterion can only be carried out with iterative algorithms that require numerous criterion evaluations. 105

106

Brief Paper

Thus any criterion--such as the one to be presented in the next s e c t i o n - - w h i c h results in a drastic simplification of the computations is particularly attractive. This is especially important for the problem of medical imaging to be presented in Section 4, where the results have to be obtained as quickly as possible.

2. S S C criterion

After estimating the parameters by any standard method, it is an advisable practice to test the model for goodness-of-fit by using some distribution-free procedure such as the run test (Draper and Smith, 1981 ). A run is a sequence of residuals el of the same sign which is followed and/or preceded by residuals with a different sign. The number of runs is a random variable whose probability density function has been tabulated (Gibbons, 1971 ), assuming that the N successive residuals are independent realizations of a same random variable. The goodness-of-fit of the model with parameters 0 can be studied a posteriori by checking whether the actual number of runs belongs to the confidence interval for the selected level of significance. The number of runs clearly depends on the value of 0, and the basic idea of the stochastic sign change (SSC) identification criterion is to consider the number of runs (or equivalently the number s(O) of the sign changes in the sequence of residuals) as the criterion to be maximized. Since the numerical value of the residuals is not taken into account, this criterion appears extremely promising regarding the problem of outliers. Moreover, contrary to Tuckey's and Huber's M-estimators, it requires no tuning. Note that the m a x i m u m value achieved for s(O) can be greater than the upper b o u n d of the tabulated confidence interval, but this would only mean that the hypothesis of independent errors has to be rejected, without disqualifying s(O) as a valuable identification criterion. The SSC criterion only takes integer values so that, contrary to more classical estimators, the estimate corresponding to a given set of data is not uniquely defined. There exists a domain D of the parametric space such that any 0 which belongs to D is equally optimal in the sense of the SSC criterion. For that reason the classical notions used to assess the properties of estimators such as bias, consistency and efficiency are not convenient and an extension of these notions would be required. This approach was initially developed for image-processing problems (Venot et al. 1983, 1984a, b) such as the example of image registration presented in Section 4.

where a = [a,,a2 . . . . . a,] r.

(10)

If j(0 +) >j(O), or if 0* is not admissible, 0 + is rejected and Ok+ l = Ok; else 0 + is taken as Ok+ ~. The A.R.S. strategy consists of repeatedly alternating two phases. During the first one (variance selection) a is taken in the set (la, 2a . . . . . 5a I, where l o = 0 .....

0ram,

(1t)

and ia = (~ u a × 0.1.

(12)

With such a choice, ~a is large enough to allow an easy exploration of the whole admissible parameter set, whereas sa is small enough for a precise localization of the optimal parameter vector. In order to allow a comparison to be drawn, all the possible ~as are used lO0/i times, starting from the same initial value of 0. The largest ~as, designed for escaping local minima, are therefore more used than the smaller ones. During a second phase, ~a, the most successful a in terms of the criterion value reached during the variance selection phases is used for one hundred random trials starting from the best 0 obtained so far. A variance selection phase then resumes, unless the decision to stop is taken. In the next section, this strategy is applied to the optimization of all the criteria to be considered.

4. E x a m p l e s

Three examples are now presented. The first two concern system modelling and are useful for comparing the SSC criterion with the classical LS and SAVD criteria. The third example is devoted to the automated registration of scintigraphic images in nuclear medicine. Additional details about the actual performances of the SSC, LS and SAVD criteria in this field can be found in Venot et al., (1984b). Example 1

Two hundred data points have been first generated according to the equation y~ = 1000exp(-0.012i) + b ~ , i e [0, 199],

3. O p t i m i z a t i o n

The SSC criterion may be multimodal, which suggests the use of a global optimizer. A m o n g the numerous methods which have been proposed for global optimization (Dixon and Szego, 1975, 1978), m a n y have to be discarded either because s(O) only takes integer values - - and thus is not differentiable with respect to 0 or because they are limited to problems with less than three parameters. This is why the adaptive random search strategy A.R.S. (Bekey and Masri, 1983) has been selected. We shall now briefly describe our implementation of the A.R.S. strategy. For more details on the structure and tunings of the algorithm, as well as for a comparative study of its performances with those of other global optimizers on numerous examples, see Pronzato et al., (1984). The user is supposed to specify the criterion j to be minimized (here j(0) = - s(O)) and the admissible range for each parameter Oi (i = 1 . . . . . v) 0imin ~ 0 i <~ 0/max.

(7)

No other information has to be supplied. The routine chooses the initial point 0 ° at the centre of the admissible domain. At each iteration k, a random displacement vector A0 is generated 0 + = Ok + A0,

where bi is the ith realization of a white noise, l {0, 20). Afterwards 35 outliers have been introduced by forcing some data to zero: yi=0,

i~ [10,291 u ~50,64~.

li(0) = 01 e x p ( - 0 2 0 .

(9)

(15)

Table 1 gives the parameter estimates and the relative errors on these estimates for the LS, SAVD and SSC criteria. Figure 1 presents the data and best fits in the sense of the LS, SAVD and SSC criteria.

TABLE 1. LS, S A V D AND S S C ESTIMATES AND RELATIVE ERRORS FOR THE SIMULATED DATA OF EXAMPLE ].

True values LS estimates SAVD estimates

~ ( a ) = diag [a 2 , a 2 . . . . . a 2, ],

(14)

The resulting data have then been fitted with the model

(8)

where A0 follows a multinormal distribution with zero mean and covariance ~ satisfying

t13)

SSC estimates

0~

02

1000 450 55.0% 964 3.6 % 1000 0.0 %

0.0120 0.0063 47.5% 0.0117 2.5 % 0.0120 o.o %

Brief Paper

F~6.1. Data(...)andbestfits(

)inthesenseoftheLS, SAVD and SSC criteria.

Example 2 Venot et al. (1979) have demonstrated the clinical interest of performing a quantitative analysis of patient hepatobiliary tract function through the modelling of the pharmacokinetics of Tc99m-diethyl IDA. A cardiac radioactive decay curve is recorded using a scintillation camera, under which the patient must stay motionless for 45 min. Any movement may generate outliers, as illustrated by Fig. 2 (white arrow). These data have been fitted with a model defined by Ji(O) = 01 exp(-Ozti) + 03 exp(-O4ti),

(16)

where ti = 8i,

iE [1,40],

tl = 320 + 16(i - 40),

iE [41,185].

107

(17)

The best fits in the sense of the LS, SAVD and SSC criteria are presented on Fig, 2. Contrary to Example 1, here the outliers weakly differ from significant data, which explains the correct performance of non-redescending M-estimators such as LS and SAVD. This example illustrates a difficulty of the SSC approach: the fit is poor during the initial part of the curve. There is almost no sign change in this zone. These data are ignored since the SSC criterion does not take into account the value of the residuals. It has to be noted that one might meet similar difficulties with redescending M-estimators. The number of significant data points in the fast transient is here insufficient when compared with that in the slow transient. Using the SSC criterion in similar situations would require more appropriate experiment designs. Example 3 The detection of changes between two digitized images acquired under varying experimental conditions first requires a regis-

tration step including geometrical and grey map transformations of one of the images in order to correct for the differences in data collection. Subtraction images can then be generated to visualize the changes. The registration step can be performed by optimizing a similarity measure between the images with respect to the registration parameters (X-Y translational shifts, angle of rotation, additive and multiplicative factors of the grey map transformation). Three particular similarity measures are representative of those used in the literature (Svedlow, McGillem and Anuta, 1978): the correlation coefficient, which corresponds to LS, the correlation function and SAVD. Image registration can be viewed as an identification procedure. The registration parameters and similarity measure can respectively be considered as the model parameters and the identification criterion. In this field the outliers are the changes to be detected between the two images. We thus have a particularly unusual modelling situation where the only interest of accurately estimating the parameters is to permit a correct visualization of the outliers (i.e. the changes). The problem of estimating the registration parameters is therefore a problem of robust estimation in the presence of severe outliers. The use of the classical similarity measures often leads to a misregistration and, in the context of medical imaging, Venot et al. (1984a, b) have demonstrated the superiority of the SSC criterion over them. An example corresponding to two bone gamma-ray scintigraphic images obtained in controlled conditions is given in Fig. 3. After acquisition of Image 3a, Image 3b was acquired according to the following procedure. The patient was moved, the grey scale modified and artificial changes (to be visualized) were induced by setting absorbent materials on the patient. Image 3c is the registered version of Image 3b obtained by maximizing the SSC criterion with respect to the five registration parameters. The SSC criterion is here computed by scanning the subtraction image line by line. Image 3d is obtained by subtracting Image 3c from Image

108

(ai

Brief Paper

........ ~

............. i ~ a ~

J~ ....

~

F](~.2. Data(...)andbestfits(

.....~'~

~a~

~a

)inthesenseoftheLS, SAVD and SSC criteria.

3a and dramatically points out the induced changes. For comparison purposes, Fig. 3e gives the unworkable image obtained by directly subtracting Image 3b from Image 3a, without any preliminary registration step. The global optimizer was implemented in FORTRAN on a DEC LSII1-03 minicomputer directly connected with the scintillation camera, and attached to a FPS AP120B array processor used at each step for the computation of the SSC criterion. The typical time for the registration procedure is 30s.

Conclusion Optimizing the SSC criterion for parameter estimation is the a priori counterpart of using the run test a posteriori for model validation. This distribution-free approach is particularly well suited when numerous data are potentially contaminated by wild out[iers. Similarly to redescending M-estimators, using the SSC criterion efficiently requires a proper initial parameter vector (which can be provided by first minimizing the SAVD criterion). The SSC criterion exhibits two main advantages over those corresponding to M-estimators. First no internal tuning is required. Second its evaluation is extremely simple, which is of particular importance for routine uses such as those required by medical imaging applications. Another similar criterion, named deterministic sign change (DSC) criterion, has proven useful in the case of noise-free image registration, with applications in digitized X-ray subtraction angiography (Venot and Leclerc, 1984). Re]erences Bekey, G. A. and S. F. Masri (1983). Random search techniques for optimization of nonlinear systems with many parameters. Moth. Comput. Simulation, 25, 210-213.

Dixon, L. C. W. and G. P. Szego (Eds.) (1975). Towards Global Optimization, Vol. I. North Holland, Amsterdam. Dixon, L. C. W. and G. P. Szego (Eds.) (1978). Towards Global Optimization, Vol. 11. North Holland, Amsterdam. Draper, N. R. and H. Smith (1981). Applied Regression Analysis, 2nd edn. Wiley, New York. Gibbons, J. D. (1971). Non Parametric Statistical lnlerence. McGraw-Hill, New York. Goodwin, G. C. and R. L. Payne (1977). Dynamic System Identification: Experiment Design and Data Analysis. Academic Press, New York. Launer, R. L. and G. N. Wilkinson (Eds.) (19791. Robustness in Statistics. Academic Press, New York. Poljak, B. T. and J. Z. Tsypkin (1980). Robust identification. Automatica, 16, 53-63. Pronzato, L., E. Walter, A. Venot and J.-F. Lebruchec (1984). A general-purpose global optimizer: implementation and applications. Math. Comput. Simulation, 26, 412 422. Svedlow, M., C. D. McGillem and P. E. Anuta (1978). Image registration: similarity measure and preprocessing method comparisons. IEEE Trans. Aerosp. Elect. Syst., AES-14, 141 149. Venot, A., J.-L. Golmard, J.-F. Lebruchec, L. Pronzato, E. Walter, G. Frija and J.-C. Roucayrol (1984a). Digital methods for change detection in medical images. In F. Deconinck (Ed.), lnjormation Processing in Medical lmaging, pp. 1 16. Martinus Nijhof, The Hague. Venot, A., S. Granjouan, J.-L. Steimer, A. Mallet and J.-C. Roucayrol (1979). Improvement of dynamic cholescintigraphy through mathematical modelling of Tc-99m-diethyl IDA pharmacokinetics. In R. Di Paola and E. Khan (Eds.), InJormation Processing in Medical Imaging, Vol. 88, pp. 459-468. [NSERM, Paris.

Brief Paper

109

FIG. 3. Application of the SSC criterion to the registration of scintigraphic dissimilar images. (a) Original image. (b) Modified version of Image 3a. (c) Registered version of Image 3b. (d) Subtraction image a c. (e) Subtraction image a-b. Venot, A., J.-F. Lebruchec, J.-L. Golmard and J.-C. Roucayrol (1983). An automated method for the normalization of scintigraphic images. J. Nucl. Med., 24, 529-531. Venot, A., J.-F. Lebruchec and J.-C. Roucayrol (1984b). A new class of similarity measures for robust image registration. Computer Vision, Graphics and Image Processing, 28, 176-184.

Venot, A. and V. Leclerc (1984). Automated correction of patient motion and grey values prior to subtraction in digitized angiography. IEEE Trans. Med. lmag., M1-3, 179-186. Walter, E. (1982). ldentifiability of State Space Models, Lecture Notes in Biomathematics, Vol. 46. Springer-Verlag, Berlin.