Integrity-directed sequential state estimation: Assessing high reliability requirements via safe confidence intervals ☆

Olivier Bilenne *
Multitel ASBL, Data Fusion, Parc Initialis, Avenue N. Copernic 1, B-7000 Mons, Belgium

Received 1 November 2004; received in revised form 1 December 2005; accepted 2 December 2005. Available online 28 February 2006.
Abstract

This study deals with the problem of dynamic state estimation of continuous-time systems from discrete-time measurements in the context of high-integrity applications. The objective of integrity-directed estimation is to provide confidence intervals for the state with an extremely low risk of error. We suppose that the process noise can be modelled by Gaussian sequences, and that the measurement noise is Gaussian in the normal operating mode of the sensors. The evolution of the posterior probability distribution of the system's state is deduced from recursive linear MMSE estimation. The estimation scheme presented here is equivalent to the Kalman filter, with the difference that the data are not processed immediately, but collected in sets for a slightly delayed, grouped processing. This strategy is particularly suitable for fault detection, because the estimator naturally takes into account the cross-correlations of close-in-time measurements and the decisions can be based on more data. Next, we introduce dynamic tools for detecting faults and sensor failures. A full Bayesian modelling of the sensors leads to the derivation of a dynamic multiple-model estimator performing the linear MMSE state estimation under various fault hypotheses. This estimator provides estimates of the posterior density function of the state, on which safe confidence intervals certifying very high integrity levels can be fixed. In practice, the exponential growth of the complexity of the multiple-model estimator requires the simplification of the posterior mixture distributions. A new method for limiting the complexity of the posterior distributions in an integrity-oriented context is presented. Unlike the known mixture simplification strategies (GPB, IMM), the present method quantifies and minimizes the loss in integrity caused by the simplification of the distributions. Finally, the estimator is tested on a typical rail navigation problem.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Integrity; Dynamic estimation; Fault detection; Safe navigation systems; Rail navigation; Robust estimation; Bayesian framework; Dynamic multiple-model estimator; Gaussian mixture; Kullback–Leibler distance; Overbounding; Markov chains; Odometry; Kalman filter
1. Introduction

The present study is motivated by the increasing demand for automation of transport systems in today's society. Various levels of application are concerned, ranging from decision support to automatic traffic control.
☆ A short version of this study was presented at the Seventh International Conference on Information Fusion (FUSION 2004) in Stockholm, Sweden [1].
* Tel.: +32 65 374738; fax: +32 65 374729.
E-mail address: [email protected]
URL: http://www.multitel.be/
The decision and control processes of most navigation systems, such as those in use in current railway systems, rely on the estimation of the positions and speeds of the mobiles, carried out for instance with onboard sensors. In the framework of automatic traffic control, it is vital to ensure the safety of the state estimation procedure, in the sense that the estimation algorithms must provide a region of space where the mobile lies, and this with an extremely low risk of error. This study deals with the problem of dynamic state estimation of continuous-time systems from discrete-time measurements in the context of applications requiring very high levels of integrity. The problem of integrity-directed estimation is seldom considered in the
literature. It is well known that the linear minimum mean square error (MMSE) estimator is optimal in the context of linear dynamics and Gaussian variables. Unfortunately, real sensors are often biased and may provide erroneous measurements (outliers), causing important biases in the linear estimation. As a consequence, the reliability of the estimation may drop considerably. This issue could have dramatic consequences in applications where the integrity of the estimates is of first importance. Unlike linear estimators, robust estimators present a reduced sensitivity to erroneous measurements, and tend to be more powerful in practical applications.

The objectives of integrity-directed estimation differ from those of robust estimation. The goal of integrity-directed estimation is to provide extremely safe confidence intervals for the actual value of the system's state, rather than an accurate estimate of the state. Robust estimation, on the other hand, aims at deriving estimators that perform well not only in 'normal conditions', but also in the least favourable conditions of noise. The performance criterion in robust estimation is generally a statistical accuracy criterion, such as the minimum mean square error criterion or derivatives.

The robust estimation literature is quite vast, although few papers are really integrity-directed. We owe the work of reference in robust estimation to Huber, who established the basics of static robust estimation in [2]. Huber's approach to the problem of seeking robust estimators relies on the minimax strategy. The so-called optimal estimator is the one with the best performance under the least favourable noise distribution, the performance criterion being the minimum asymptotic covariance of the estimation error when the amount of measurements grows to infinity. In dynamic estimation however, the asymptotic performance criterion makes little sense because the variables to be estimated are randomly changing over time. Moreover, an accuracy criterion based on the expected variance of the estimation error is highly insufficient in the context of applications demanding very high levels of confidence on the state estimates. For instance, safe navigation systems according to the European Norm Standards (e.g. EN50126) require Safety Integrity Levels (SIL) of only $10^{-7}$ (SIL3) or even $10^{-9}$ (SIL4) errors per hour.

Extending the ideas of Huber to the robust dynamic estimation problem, Mangoubi studied in [3] the stability and the performance of dynamic estimators with respect to unknown perturbations of the model's dynamics and the measurement noise. Mangoubi derives the dynamic state estimators that minimize the impact of those perturbations on the estimate and provides statistics on the estimation error. Schick and Mitter presented in [4] a multiple-model dynamic state estimator (similar to the estimator developed here) which is robust in the presence of rare and isolated outliers and ensures the minimum estimation error variance. The context of this study is different since in safety applications the integrity of the state estimation prevails over its precision. The objective is no longer to minimize a cost function related to the accuracy of the estimates but to provide safe confidence intervals.
The present article is inspired by the work of Ober, who proposed in [5] a complete solution to the static integrity-directed estimation problem for navigation systems. Ober's approach consists, roughly, of using fault detection techniques to isolate the faulty sensors, providing a nonlinear estimation from the unbiased measurements only, and associating with the state estimate a safe confidence interval which certifies a very high level of integrity. This study represents a continuation of [1], which aimed at showing how those static robust estimation techniques could be adapted to the dynamic problem. In addition, Ober proposed a solution for the robust dynamic estimation problem which consisted of a robust version of the Kalman filter (in the sense of Huber) based on the results of Kovacevic [6]. The robust Kalman filter considers the prior estimates and the innovations as the inputs of weighted least-squares estimation problems, and uses Huber's M-estimator to deduce the posterior estimates in an automated way. Unfortunately, the robust Kalman filter does not provide confidence intervals for the estimates, nor does it certify the integrity of those estimates.

Unlike most approaches to the state estimation problem, which simply provide an estimate of the state that is optimal according to some precision criterion, we try to estimate the full probability distribution of the state. Thus this study is concerned with probability density function estimation more than typical state estimation. Recent work in dynamic estimation [7] presents the sequential Monte Carlo (SMC) methods and particle filters as an efficient solution to the dynamic problem of fault detection and nonlinear density estimation. Those methods derive an approximation of the posterior distribution of interest by simulating the evolution of samples of the density function [8]. The SMC methods are known to reveal the main lobes and local extrema of the posterior density functions. However, the relative precision in the tails of the posterior densities depends directly on the quality of the sampling. In our opinion, the number of particles and the computation volume needed to achieve the integrity level SIL4 with SMC methods are unmanageable. The density estimation method presented in this article is based on the decomposition of the density function into Gaussian components rather than particles.

In this study, we suppose that the process noise of the system and the measurement noises of the sensors in their normal operating mode follow Gaussian distributions. Gaussian distributions, which are fully defined by their first two moments, are intensively used in the engineering literature. The choice of Gaussian distributions is motivated by the central limit theorem on the one hand, and the recently developed techniques of overbounding on the other hand. Overbounding aims at replacing the actual distributions of the variables by safe Gaussian pseudo-distributions that still guarantee the final integrity of the estimation process [9]. We use the recursive linear MMSE estimator, also called the Kalman filter, as a basic Gaussian density estimation tool. In Section 2, we explain the discretization process of the continuous-time system, and show how the equations of
the Kalman filter can be adapted to offer a degree of freedom in the frequency of estimation, deal with cross-correlations between non-simultaneous measurements and increase the flexibility of the fault detection devices. The concept of integrity is introduced in Section 3, where the overbounding techniques, although out of the scope of this article, are briefly presented as well. Sections 4 and 5 extend the density estimation problem to the case of faulty sensors, and lead to the derivation of a multiple-model estimator [10]. The multiple-model estimator operates under various fault hypotheses, and requires a full (dynamic) modelling of the stochastic properties of the variables in presence. Section 6 discusses the performance of practical implementations of the multiple-model estimator in terms of integrity and proposes a sub-optimal method adapted to the integrity certification issue. Finally, the integrity-directed estimator is tested in Section 7 on a typical rail navigation problem.

2. Dynamic density estimation under the Gaussian hypothesis

This section presents the problem of dynamic estimation from discrete-time observations in the particular context of Gaussian variables. We consider continuous-time systems of known linear dynamics. The problem consists of deriving, in real time, the posterior distribution of the state of the system from the discrete-time measurements. The dynamics of the system is described by the equation

$\dot{x}(t) = F(t)\,x(t) + G(t)\,u(t) + \omega(t),$  (1)
where x is the n-dimensional state, F is the state matrix, and vector u is a known control signal. The process noise is modelled by the random process $\omega$. The discrete-time measurement model is

$y[t] = H[t]\,x(t) + J[t]\,u(t) + \upsilon[t],$  (2)
where brackets are used to identify the functions that are only defined for discrete values of time. The discrete-time sequences of observation vectors and observation matrices are denoted by $y[\cdot]$ and $H[\cdot]$ respectively. The measurement noise is modelled by the random sequence $\upsilon[\cdot]$. The impact of the control signal on the state and measurement vectors is described by the gain matrices $G(\cdot)$ and $J[\cdot]$. In this study, we make no assumption on the observation rhythm and allow the measurements to be collected asynchronously. We use the continuous-time dynamics of the system in its discretized form, resulting from the partition of the time-line into finite intervals and the integration of the state and perturbation processes over those time intervals. In this section, we consider that the perturbation sequences resulting from the discretization process are sequences of Gaussian variables. It is well known that the posterior distribution of the state of a system with linear dynamics, exposed to Gaussian process noise and observed by sensors with Gaussian error distributions, is also Gaussian. The Wiener estimator (or optimal estimator), which provides the linear MMSE estimate, is the common tool for estimating Gaussian variables.
Indeed, the MMSE estimate of a random variable corresponds to its mean, the covariance matrix of the estimation error (which is also estimated by the Wiener estimator) corresponds to the covariance matrix of the distribution of the variable, and the mean and the covariance matrix are all the parameters needed to identify a Gaussian distribution. Since the Wiener methods only require the knowledge of the second-order statistics of the estimated and observed variables, they apply to any linear estimation problem. When a linear estimator is applied to time signals, the problem becomes intractable as the dimensions of the covariance matrices of the stochastic processes in presence grow endlessly. The introduction of a state model describing the state propagation and observation processes makes it possible to compute and update the second-order moments recursively. The Kalman estimator provides a practical solution to the dynamic linear MMSE estimation problem by processing the data recursively and exploiting the dynamic modelling of the random processes [11].

In the context of linear MMSE estimation, it is generally essential to synchronize the bounds of the time intervals of the discretization process with the times at which the measurements are collected, in order to meet the assumption of uncorrelatedness between the measurement noise and the process noise (see e.g. [10]). This involves processing the data as they are observed and providing a new estimate of the state at each new measurement. If there exist correlations between the biases of non-simultaneous measurements, however, this method does not apply without increasing the dimensionality of the (augmented) state-space by the addition of dynamic models of the measurement noises [12]. An alternative strategy, which would be unsatisfying in a safety context, consists of processing the correlated measurements together by making the approximation that they occur simultaneously. The objective of this section is to show that these solutions can be avoided and the constraint on the time intervals lifted easily with simple corrections to the equations of the estimation algorithm.

We decide to perform the estimation asynchronously. The particularity of desynchronized estimation is that the estimation times $t_1, t_2, \ldots, t_k, \ldots$, that is the times at which the estimator is requested to provide an estimate of the distribution of the state, are chosen arbitrarily by the user. We make a distinction between the estimation times, which delimit the steps of the estimation algorithm, and the observation times, at which the data are collected. We denote by $x_k$ the random variable representing the n-dimensional state of the system at estimation time $t_k$. Step k ($k = 1, 2, \ldots$) is defined as the part of the algorithm taking place in the time interval $]t_{k-1}, t_k]$. The objective of step k is to lead to the estimation of $x_k$ from all the measurements prior to $t_k$. During step k, a set of $n_k$ observations symbolized by random variables $y^k_1, y^k_2, \ldots, y^k_{n_k}$ is collected at the observation times $o^k_1, o^k_2, \ldots, o^k_{n_k}$, strictly included in the time interval $]t_{k-1}, t_k]$. We represent the state of the system at observation times $o^k_i$ by random variables $x^k_i$ ($i = 1, 2, \ldots, n_k$).
We carry out a first discretization of the time-line according to the estimation and measurement times, so that we keep expressions of what is estimated ($x_k$) and what is actually observed ($x^k_i$). The process of discretization of the continuous-time state-space model is done by solving within every time interval the first-order non-homogeneous linear differential equation depicted in Eq. (1). The dynamics of the discretized model is linear and can be written under the following discrete-time state equations: for all $k \ge 1$,

$x^k_1 = F^k_1\, x_{k-1} + G^k_1\, u^k_1 + \omega^k_1,$
$x^k_i = F^k_i\, x^k_{i-1} + G^k_i\, u^k_i + \omega^k_i \quad (i = 2, \ldots, n_k),$  (3)
$x_k = F^k_{n_k+1}\, x^k_{n_k} + G^k_{n_k+1}\, u^k_{n_k+1} + \omega^k_{n_k+1},$

$y^k_i = H^k_i\, x^k_i + J^k_i\, u^k_i + \upsilon^k_i \quad (i = 1, \ldots, n_k),$  (4)
where $F^k_i$ is the full-rank transition matrix, $H^k_i$ the observation matrix, $u^k_i$ a known control signal, random vector $\omega^k_i$ is the process noise with covariance matrix $\Omega^k_i$, and random vector $\upsilon^k_i$ is the observation or measurement noise with covariance matrix $\Upsilon^k_i$. Appendix A shows how the sequences of variables introduced in Eqs. (3) and (4) can be derived from the initial state-space model described by Eqs. (1) and (2). The (discretized) process noise $\omega^k_i$ and the measurement noise $\upsilon^k_i$ are modelled by sequences of random variables. In this section, we suppose all those sequences of random variables to be Gaussian, or that they can be safely substituted by Gaussian sequences by the technique of overbounding, presented in Section 3.2. It is assumed that the process noise is a white Gaussian random sequence (this is the case, for instance, when the continuous-time noise process $\omega(t)$ is white and Gaussian). Because the measurement errors of various sensors may have a common cause, we consider the possibility of correlations between the measurement noises of close-in-time measurements.

Eqs. (3) and (4) handle random variables symbolizing the state at the observation times. From the user's perspective, only the variables related to the estimation times are of interest. Therefore, we have decided to free ourselves from the state variables corresponding to the observation times by gathering all the relevant variables of a same step k into composite vectors. In this work, we consider those composite vectors as the actual variables of the system. We denote by $z_k$ the composite vector containing all the measurements of step k, and consider all the measurements of step k as 'shifted observations' of the same variable $x_k$, symbolizing the state at time $t_k$. We call $\upsilon_k$ the composite vector of all the measurement noise variables of step k, $\omega_k$ the composite vector of all the process noise variables of step k, and $u_k$ the composite vector of the commands. By ignoring the observation times, and solving the propagation model (3) for variables $x_k$, the discretized state equations of the system can be rewritten in the following form:

$x_k = F_k\, x_{k-1} + G_k\, u_k + W_k\, \omega_k,$  (5)
$z_k = H_k\, x_k - \Theta_k\, \omega_k + \tilde{J}_k\, u_k + \upsilon_k.$  (6)
Indeed, appropriate substitutions of the variables $x^k_i$ and $x_k$ within Eqs. (3) lead to

$x_k = F^k_{n_k+1} F^k_{n_k} \cdots F^k_1\, x_{k-1} + \sum_{j=1}^{n_k} F^k_{n_k+1} F^k_{n_k} \cdots F^k_{j+1} (G^k_j u^k_j + \omega^k_j) + G^k_{n_k+1} u^k_{n_k+1} + \omega^k_{n_k+1},$  (7)

$x^k_i = {F^k_{i+1}}^{-1} {F^k_{i+2}}^{-1} \cdots {F^k_{n_k+1}}^{-1}\, x_k - \sum_{j=i+1}^{n_k+1} {F^k_{i+1}}^{-1} {F^k_{i+2}}^{-1} \cdots {F^k_j}^{-1} (G^k_j u^k_j + \omega^k_j)$  (8)

for $i = 1, \ldots, n_k$. Eq. (7) univocally defines the block matrices $F_k$, $G_k$ and $W_k$ and the composite vectors $u_k$ and $\omega_k$ introduced in Eq. (5). Using Eq. (4), we have, for $i = 1, \ldots, n_k$,

$y^k_i = H^k_i {F^k_{i+1}}^{-1} \cdots {F^k_{n_k+1}}^{-1}\, x_k - \sum_{j=i+1}^{n_k+1} H^k_i {F^k_{i+1}}^{-1} \cdots {F^k_j}^{-1} (G^k_j u^k_j + \omega^k_j) + J^k_i u^k_i + \upsilon^k_i.$  (9)
Eq. (6) is in fact the block matrix representation of Eq. (9), summarising all the observations $y^k_1, \ldots, y^k_{n_k}$ in a composite vector $z_k$. The composite vectors $z_k$ and $\upsilon_k$ and the block matrices $\Upsilon_k$, $H_k$, $\Theta_k$ and $\tilde{J}_k$ are defined implicitly through Eqs. (6) and (9). Full definitions of the composite vectors $u_k$, $\omega_k$, $z_k$, $\upsilon_k$, the covariance matrices $\Omega_k$, $\Upsilon_k$ and the block matrices $F_k$, $H_k$, $G_k$, $\tilde{J}_k$, $W_k$, $\Theta_k$ are listed in Table 1. The observation matrix $H_k$ propagates $x_k$ from estimation time $t_k$ back to the observation times of step k. Matrix $\Theta_k$ can be seen as a rectification matrix which compensates for the surplus of process noise posterior to the observations and contained in variable $\omega_k$. One feature of those equations is the conditional dependency between the observations $z_k$ and the process noise $\omega_k$ given the state $x_k$.

Table 1
Definition of the composite vectors $z_k$, $\upsilon_k$, $\omega_k$, $u_k$ and block matrices $\Upsilon_k$, $\Omega_k$, $F_k$, $G_k$, $W_k$, $H_k$, $\Theta_k$ and $\tilde{J}_k$ corresponding to model (3), (4) and appearing in Eqs. (5) and (6)

$z_k = \mathrm{vect}\{y^k_1, y^k_2, \ldots, y^k_{n_k}\}$
$\upsilon_k = \mathrm{vect}\{\upsilon^k_1, \upsilon^k_2, \ldots, \upsilon^k_{n_k}\}$
$\Upsilon_k = E\{\upsilon_k \upsilon_k^T\}$
$\omega_k = \mathrm{vect}\{\omega^k_1, \omega^k_2, \ldots, \omega^k_{n_k+1}\}$
$\Omega_k = \mathrm{diag}\{\Omega^k_1, \Omega^k_2, \ldots, \Omega^k_{n_k+1}\}$, where $\Omega^k_i = E\{\omega^k_i {\omega^k_i}^T\}$ ($i = 1, \ldots, n_k+1$)
$u_k = \mathrm{vect}\{u^k_1, u^k_2, \ldots, u^k_{n_k+1}\}$
$F_k = F^k_{n_k+1} F^k_{n_k} \cdots F^k_1$
$G_k = [\, F^k_{n_k+1} \cdots F^k_2 G^k_1 \;\; F^k_{n_k+1} \cdots F^k_3 G^k_2 \;\; \cdots \;\; F^k_{n_k+1} G^k_{n_k} \;\; G^k_{n_k+1} \,]$
$W_k = [\, F^k_{n_k+1} \cdots F^k_2 \;\; F^k_{n_k+1} \cdots F^k_3 \;\; \cdots \;\; F^k_{n_k+1} \;\; I \,]$
$H_k = \mathrm{vect}\{ H^k_1 {F^k_2}^{-1} \cdots {F^k_{n_k+1}}^{-1},\; H^k_2 {F^k_3}^{-1} \cdots {F^k_{n_k+1}}^{-1},\; \ldots,\; H^k_{n_k} {F^k_{n_k+1}}^{-1} \}$
$\Theta_k$: block matrix of size $n_k \times (n_k+1)$ with block elements $(\Theta_k)_{i,j} = 0$ if $j \le i$, and $(\Theta_k)_{i,j} = H^k_i {F^k_{i+1}}^{-1} \cdots {F^k_j}^{-1}$ if $j > i$ ($i = 1, \ldots, n_k$ and $j = 1, \ldots, n_k+1$)
$J_k = \mathrm{diag}\{J^k_1, J^k_2, \ldots, J^k_{n_k}\}$
$N_k = \mathrm{diag}\{G^k_1, G^k_2, \ldots, G^k_{n_k}, G^k_{n_k+1}\}$
$\tilde{J}_k = J_k - \Theta_k N_k$
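To make the discretization step concrete, the following sketch shows how the sub-interval transition matrices of Eqs. (3) and the composite matrices $F_k$ and $W_k$ of Table 1 might be computed numerically. It is a minimal illustration only: it assumes a time-invariant state matrix F and white continuous-time process noise of spectral density Qc (the paper treats the general time-varying case in Appendix A), and the function names and the choice of Van Loan's method are ours, not the author's implementation.

```python
import numpy as np
from scipy.linalg import expm

def discretize(F, Qc, dt):
    """Transition matrix F_i^k = e^{F dt} and process-noise covariance
    Omega_i^k over one sub-interval, via Van Loan's method."""
    n = F.shape[0]
    M = np.zeros((2 * n, 2 * n))
    M[:n, :n] = -F
    M[:n, n:] = Qc
    M[n:, n:] = F.T
    E = expm(M * dt)
    Fd = E[n:, n:].T              # e^{F dt}
    Qd = Fd @ E[:n, n:]           # integral of e^{F s} Qc e^{F^T s} ds
    return Fd, Qd

def assemble_Fk_Wk(F, Qc, intervals):
    """Composite F_k and W_k of Table 1 from the n_k + 1 sub-interval
    lengths of step k (the last interval ends at estimation time t_k)."""
    n = F.shape[0]
    Fds = [discretize(F, Qc, dt)[0] for dt in intervals]
    Fk = np.eye(n)
    for Fd in Fds:                # F_k = F^k_{n_k+1} ... F^k_1
        Fk = Fd @ Fk
    blocks = []
    for i in range(len(Fds)):     # block i multiplies omega^k_{i+1}
        B = np.eye(n)
        for Fd in Fds[i + 1:]:
            B = Fd @ B
        blocks.append(B)
    return Fk, np.hstack(blocks)  # last block of W_k is the identity
```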
Moreover, we suppose the uncorrelatedness between the process and measurement noise sequences, and the whiteness of those sequences of composite vectors, that is

$E\{\omega_i \omega_j^T\} = 0 \quad \text{and} \quad E\{\upsilon_i \upsilon_j^T\} = 0 \quad \text{if } i \ne j,$  (10)
$E\{\omega_i \upsilon_j^T\} = 0 \quad \text{for all } i, j.$  (11)
The conditions of Eq. (10) are met in particular when the process noise is a white noise sequence, and the measurement noise correlations only relate to the measurements of a same step. In this section, we assume that the process noise and the measurement noise are sequences of Gaussian variables.

Recursive estimation is based on the concepts of prediction and innovation. The procedure followed at the kth step of the recursion can be divided into three phases. Firstly, the algorithm predicts a prior (or a priori) estimate $\hat{x}_{k|k-1}$ of $x_k$ by propagating through the dynamic model the last estimate computed at step $k-1$:

$\hat{x}_{k|k-1} = F_k\, \hat{x}_{k-1|k-1} + G_k\, u_k.$  (12)
We call $e_{k|k-1}$ the estimation error on the prior estimate at step k, and $P_{k|k-1}$ the covariance matrix of this error:

$e_{k|k-1} = x_k - \hat{x}_{k|k-1},$  (13)
$P_{k|k-1} = E\{e_{k|k-1}\, e_{k|k-1}^T\}.$  (14)
Next, the prediction is confronted with the new observation $z_k$. The resulting information is usually called the innovation. The innovation $\gamma_k$ at step k is defined as the difference between the observation $z_k$ collected at step k and the predicted observation $\hat{z}_{k|k-1}$. We call $\Gamma_k$ the covariance matrix of the innovations at step k:

$\hat{z}_{k|k-1} = H_k\, \hat{x}_{k|k-1} + \tilde{J}_k\, u_k,$  (15)
$\gamma_k = z_k - \hat{z}_{k|k-1},$  (16)
$\Gamma_k = E\{\gamma_k \gamma_k^T\}.$  (17)
Finally, the posterior (or a posteriori) estimate $\hat{x}_{k|k}$ is obtained by updating the prediction $\hat{x}_{k|k-1}$ according to the Orthogonality Principle

$E\{e_{k|k}\, \gamma_k^T\} = 0,$  (18)

which states that the estimation error ($e_{k|k}$) of the optimal estimator is orthogonal to the data used for the estimation ($\gamma_k$) [11]:

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + E\{e_{k|k-1}\, \gamma_k^T\}\, E\{\gamma_k \gamma_k^T\}^{-1}\, \gamma_k.$  (19)
Similarly, we call $e_{k|k}$ the estimation error at step k and $P_{k|k}$ its covariance matrix:

$e_{k|k} = x_k - \hat{x}_{k|k},$  (20)
$P_{k|k} = E\{e_{k|k}\, e_{k|k}^T\}.$  (21)
Appendix B shows that Eqs. (12)–(17), (19)–(21) and hypotheses (10) and (11) lead to the recursive algorithm summarized in Table 2. The algorithm can easily be generalized to time-correlated process and measurement noise, provided that the dynamics of those noise sequences can be modelled, by considering additional state variables for the noise sequences.
Table 2
Recursive linear MMSE estimator

Initialization:  $k \leftarrow 0$;  $\hat{x}_{0|0} = \hat{x}_0$;  $\hat{P}_{0|0} = \hat{P}_0$

Repeat:  $k \leftarrow k + 1$

Prediction:
  $\hat{x}_{k|k-1} = F_k\, \hat{x}_{k-1|k-1} + G_k\, u_k$
  $P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + W_k \Omega_k W_k^T$
  $\hat{z}_{k|k-1} = H_k\, \hat{x}_{k|k-1} + \tilde{J}_k\, u_k$

Innovation:
  $\gamma_k = z_k - \hat{z}_{k|k-1}$
  $\Gamma_k = H_k F_k P_{k-1|k-1} F_k^T H_k^T + \Upsilon_k + (H_k W_k - \Theta_k)\, \Omega_k\, (H_k W_k - \Theta_k)^T$

Correction:
  $K_k = (P_{k|k-1} H_k^T - W_k \Omega_k \Theta_k^T)\, \Gamma_k^{-1}$
  $\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\, \gamma_k$
  $P_{k|k} = (I - K_k H_k)\, P_{k|k-1} + K_k \Theta_k \Omega_k W_k^T$

End of loop
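For illustration, the sketch below transcribes one pass of the loop of Table 2 into numpy. The function name is hypothetical and the composite block matrices are taken as given (they would be assembled as in Table 1); this is a direct reading of the equations above, not a production implementation.

```python
import numpy as np

def desync_kalman_step(x_prev, P_prev, z, u,
                       F, G, W, H, Jt, Theta, Omega, Upsilon):
    """One step of the desynchronized Kalman recursion of Table 2."""
    # Prediction
    x_pred = F @ x_prev + G @ u
    P_pred = F @ P_prev @ F.T + W @ Omega @ W.T
    z_pred = H @ x_pred + Jt @ u
    # Innovation; the extra term accounts for the conditional dependency
    # between the observations z_k and the process noise omega_k
    gamma = z - z_pred
    HWT = H @ W - Theta
    Gamma = H @ F @ P_prev @ F.T @ H.T + Upsilon + HWT @ Omega @ HWT.T
    # Correction
    K = (P_pred @ H.T - W @ Omega @ Theta.T) @ np.linalg.inv(Gamma)
    x_post = x_pred + K @ gamma
    P_post = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred \
             + K @ Theta @ Omega @ W.T
    return x_post, P_post, gamma, Gamma
```

Note that the two terms involving $\Theta_k$ are the only departures from the classical Kalman update: setting Theta to zero recovers the standard filter.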
One can see that this algorithm is very close to the basic equations of the 'classical' Kalman filter, except for a few additional terms arising from the dependency between the observation vectors and the process noise. We called this estimation scheme desynchronized Kalman estimation because of its ability to deal asynchronously with the measurements. This estimator presents a degree of freedom in the choice of the estimation times. In a sense, it can be seen as a generalization of the Kalman algorithm, bridging the gap between the static Wiener estimator (all the data processed at once) and the basic recursive formulation of the Kalman estimator (hypothesis of orthogonality between the process and measurement noise sequences). This suggests choosing, as far as possible, the time intervals of the discretization according to the correlations of the data, so that correlated measurements can be processed in a same step of the recursion, without the need to introduce dynamic models of those correlations, as required by the basic Kalman estimator. In Sections 4 and 5, we consider the presence of faulty sensors and introduce a fault detection module. Due to its ability to deal easily with the cross-correlations between non-simultaneous measurements, the desynchronized Kalman estimator offers the possibility of deriving better estimates of the measurement errors (by following a procedure such as the one described in Section 5.1) and providing more reliable decisions based on large sets of measurements.

3. Integrity

This section introduces the concept of integrity and describes the basic procedure of integrity certification used in this study. Next, we briefly present the technique of overbounding, which is currently employed in satellite navigation systems to guarantee the integrity of static estimation algorithms.

3.1. Integrity certification

Integrity is a quality measure which is related to the safe use of the system. It measures the probability of misleading
information. Misleading information (MI) is said to occur when the actual state lies outside the multi-dimensional confidence interval associated with the state estimate. The integrity of a system corresponds to the volume of the density situated in the (possibly multi-dimensional) tail of the distribution, outside the confidence interval. The confidence interval can be determined from the probability density function of the state according to the precision and continuity requirements of the problem. It must be safe, and thus requires a conservative modelling of the system and robustness to the uncertain behaviours of the sensors. Integrity represents a capital concept in fields where estimation mistakes cannot be tolerated, such as rail and air navigation. The integrity of the estimator is compromised whenever the system provides misleading information at a rate greater than the maximum tolerated MI rate. The process consisting in fixing the confidence interval on the density function of the state is called integrity certification (IC). In [5], Ober suggests placing the confidence interval in the state space in such a way that the actual state has the highest posterior probability of lying inside the interval, for a confidence interval of minimum size. Finding the optimal confidence interval for a given distribution is not a trivial problem in the multidimensional case.
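As a toy one-dimensional illustration of integrity certification (the multidimensional placement problem, as just noted, is harder), the sketch below grows a symmetric interval around a chosen centre until the probability mass left outside a scalar Gaussian-mixture posterior falls under the tolerated MI risk. The interval shape and growth factor are arbitrary assumptions made for the example only.

```python
from math import erf, sqrt

def interval_risk(components, center, half_width):
    """Probability mass outside [center - hw, center + hw] under a
    scalar Gaussian mixture given as (weight, mean, std) tuples."""
    risk = 0.0
    for w, mu, sigma in components:
        lo = 0.5 * (1 + erf((center - half_width - mu) / (sigma * sqrt(2))))
        hi = 0.5 * (1 - erf((center + half_width - mu) / (sigma * sqrt(2))))
        risk += w * (lo + hi)
    return risk

def certify_interval(components, center, mi_risk, hw=1.0):
    """Widen the half-width until the misleading-information risk is met."""
    while interval_risk(components, center, hw) > mi_risk:
        hw *= 1.5
    return hw
```

For instance, mi_risk = 1e-9 would be in the spirit of the SIL4 figure quoted in the introduction; since the norm expresses the budget per hour, a per-estimate risk would in practice be derived from the estimation rate.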
3.2. The overbounding technique

The technique of overbounding consists of using, instead of the real distributions in presence, Gaussian distributions that still guarantee the performance assessment. In the context of this study, the overbounding technique is concerned with state estimation from measurements corrupted by non-Gaussian noise. It aims at replacing the noise distributions (which are non-Gaussian and may have some degree of uncertainty) by pessimistic Gaussian distributions, performing the density estimation process on the pseudo-densities, fixing a confidence interval on the resulting distribution, and certifying that the confidence interval is also valid for the actual distributions. The interest of the technique lies in the fact that, under certain conditions, it is possible to find such Gaussian pseudo-distributions and to perform safely a blind estimation on those pseudo-distributions. The overbounding technique was introduced recently in the GPS augmentation programs, such as the Local Area Augmentation System (LAAS), to link the range-domain and position-domain errors [5]. The main overbounding techniques are listed in [13]. Those techniques propose different definitions of the overbounding distributions (tail-overbounding, PDF-overbounding, CDF-overbounding). The technique called cumulative density function (CDF) overbounding is particularly suitable for the problem of state estimation from measurements affected by non-Gaussian noise. In CDF-overbounding, the cumulative distribution function of the overbound must always be shifted towards its tail, relative to the actual cumulative distribution function. DeCleene introduced the first bounding strategy, now used in the LAAS algorithms. In [9], he proved that the convolution operation preserves the CDF-overbounding property in the case of symmetric, zero-mean, unimodal distributions. In other words, he showed that it is possible, from two symmetric, zero-mean, unimodal distributions, to find CDF-overbounding distributions such that the convolution of the two pseudo-distributions overbounds the convolution of the two real distributions. Unfortunately, convolution does not preserve CDF overbounds when the actual distributions are either asymmetric, shifted-median or multimodal. To make up for this restriction, Rife proposed the strategy of paired overbounding, which is an alternative way to use CDF-overbounding by associating with the actual distribution not a single overbound, but a pair of overbounding distributions [13]. As the present study is confined to the Gaussian setting, overbounding is not directly used in the test section of this study. Nevertheless, the overbounding techniques are still under development and represent a promising methodology for extending our results to non-Gaussian noise distributions.
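The following sketch illustrates one plausible numerical reading of the CDF condition for a symmetric, zero-mean actual distribution: left of the median the Gaussian overbound's CDF must dominate the actual CDF, and right of the median it must lie below it. The precise definitions are those of [9,13]; this grid check is only a sanity test, with hypothetical function names.

```python
import numpy as np
from math import erf, sqrt

def gauss_cdf(x, sigma):
    return 0.5 * (1.0 + erf(x / (sigma * sqrt(2.0))))

def cdf_overbounds(actual_cdf, sigma_ob, xs):
    """Grid check that N(0, sigma_ob^2) CDF-overbounds a symmetric
    zero-mean distribution given by its CDF values on the grid xs."""
    ob = np.array([gauss_cdf(x, sigma_ob) for x in xs])
    bad_left = (xs <= 0) & (ob < actual_cdf)    # overbound must dominate
    bad_right = (xs >= 0) & (ob > actual_cdf)   # overbound must lie below
    return not (bad_left.any() or bad_right.any())
```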
4. Robust integrity-directed estimation

The Kalman estimator of Section 2 is optimal when the sensors have Gaussian behaviours. Unfortunately, when a sensor failure occurs, the linear MMSE estimator stops making sense, and a nonlinear estimator is needed. In this section we present a nonlinear recursive state estimator which relies on the Bayesian approach and the assumption that the system can be completely modelled as a Bayesian network. This involves assigning a priori probabilities to all the possible events that are likely to take place during the estimation process. In particular, the sensors are given a priori probabilities of presenting anomalies. We consider, at each step of the recursion, different hypotheses concerning the working modes of the sensors: normal operation (Gaussian distribution), permanent bias and failure. To each hypothesis corresponds one optimal way of using the information provided by the sensor: acceptance, correction or rejection of the measurement.

We call single hypotheses the fault hypotheses related to a given step of the recursive estimation algorithm. The single hypotheses at step k are represented by random variables $h_k$, which take their values $h^k_i$ in the discrete hypothesis space $\mathcal{H}_k = \{h^k_i\}_{i=0}^{N_k}$, where $N_k$ is the number of possible single hypotheses at step k ($k \ge 0$). We have decided to group the various types of fault hypotheses in families. The no-fault hypothesis ($h^k_0$) supposes all the sensors to work perfectly at step k and to provide measurements affected by Gaussian noise. The single fault hypothesis occurs when an isolated outlier appears among a series of valid measurements. The sensor failure hypothesis corresponds to the full failure of a sensor; in that case the sensor provides erroneous measurements containing very little information on the actual state, and the misleading sensor must imperatively be dismissed from the state estimation process. The hypothesis of a permanently biased sensor considers that all the measurements of a sensor are shifted by a constant (bias) from the expected value of the measurement. Ideally, the bias should be detected and estimated, and the measurement used accordingly by the estimator. Finally, we mention the complex hypotheses, which are mixed combinations of all the faults listed above. We suppose that we know the exhaustive list of all the faults that may occur at each step of the estimation algorithm. Moreover, we consider that we can provide realistic and reliable modellings of the inferences between those hypotheses and the observation variables. In a Bayesian context, this involves fixing beforehand prior probabilities for all the hypotheses, and possibly transition probabilities if the hypotheses are modelled by a full stochastic process such as a Markov chain. An example of modelling of the fault hypotheses can be found in Section 7.

We call hypothesis sequence of order k the combination of the k successive single hypotheses related to the first k steps of the algorithm. We denote by $H_k$ the random variable standing for the hypothesis sequences of step k. Variable $H_k$ takes its values $H^k_i$ in the hypothesis sequence space

$\mathcal{H}^k = \mathcal{H}_0 \times \mathcal{H}_1 \times \cdots \times \mathcal{H}_k = \{H^k_i\}_{i=1}^{P_k},$  (22)

where the cardinality $P_k = \prod_{j=1}^{k} N_j$ of $\mathcal{H}^k$ grows exponentially with time. The Bayesian approach to state estimation relies on the supposition that the posterior probability of each hypothesis (which is an up-to-date measure of the likelihood of the hypothesis based on the observations) can be estimated from the observation of the measurements. In a Bayesian context, the density function of the state at the end of the kth step of the recursion can be seen as the weighted sum of the conditional density functions under all the possible hypothesis sequences. The Total Probability theorem states that
$p(x_k\,|\,z_k, \ldots, z_1) = \sum_{i=1}^{P_k} P(H_k = H^k_i\,|\,z_1, \ldots, z_k)\; p(x_k\,|\,z_1, \ldots, z_k, H_k = H^k_i),$  (23)

where the capital letter P denotes the (conditional) probability of a discrete variable, while p denotes the (conditional) probability density function (PDF) of a continuous variable. Eq. (23) is the basis of the dynamic multiple-model estimator [10], which relies on the combination of several models of the system, symbolized here by the various possible hypothesis sequences. The estimated distribution of the state is obtained by combining the conditional distributions under all the possible fault sequence models. Because we have decided, under each hypothesis, to reject from the estimation process all the measurements that are supposed to be faulty, the conditional estimation is done from measurements supposedly affected with Gaussian measurement noise only. Hence, the conditional posterior distributions $(x_k\,|\,z_1, \ldots, z_k, H_k)$ are Gaussian as well. On the other hand, the posterior distribution $(x_k\,|\,z_1, \ldots, z_k)$ is no longer Gaussian and not necessarily symmetric nor unimodal. The posterior distribution of the state is actually a Gaussian mixture.¹

The Kalman filter propagates only the first two moments of the density function of the state. Therefore, handling non-normal state distributions should be avoided if possible, under penalty of losing track of the shapes of those distributions throughout the recursion of the Kalman filter. In the frame of this work, we have chosen to deal with all the Gaussian components of the mixture distribution separately and to propagate them independently through parallel Kalman filters, each equipped with its own fault detection module performing under the linear Gaussian setting. We obtain the actual distribution of the state by marginalisation over the hypothesis sequences, as indicated by Eq. (23). The finite Gaussian mixture has the property of being defined by a limited number of parameters, namely the mean and covariance matrix characterizing the Gaussian distribution of each component of the mixture, and the respective weights (or probabilities) of the components. The posterior distribution of the state at the end of each step of the algorithm can be represented by a table summarizing all the parameters of the Gaussian mixture. Fig. 1 depicts the evolution of the table over time. Each line of the table contains the three-tuple corresponding to one component of the Gaussian mixture. The $N_k$ three-tuples $(\pi^k_i, \mu^k_i, \Sigma^k_i)$ of the kth column represent the actual values of the parameters of the posterior distribution at the end of step k, where $\pi^k_i$ stands for the probability $P(H_k = H^k_i\,|\,z_1, \ldots, z_{k-1}, z_k)$, and the Gaussian density function $p(x_k\,|\,z_1, \ldots, z_k, H_k = H^k_i)$ is denoted by $\mathcal{N}(\mu^k_i, \Sigma^k_i)$. Because the number of hypothesis sequences grows exponentially with the recursion of the Kalman filter, some hypotheses must imperatively be rejected or grouped together to prevent the order of the Gaussian mixture from growing to infinity. This problem is discussed in Section 6, where we propose and evaluate various strategies for limiting the multiplication of the hypothesis sequences over time.
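In code, the mixture table of Fig. 1 can be pictured as a list of weighted Gaussian components, each remembering the hypothesis sequence it derives from; the density of Eq. (23) is then a plain weighted sum. This is an illustrative data structure only, not the paper's implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Component:
    weight: float       # pi: posterior probability of the hypothesis sequence
    mean: np.ndarray    # mu: conditional posterior mean of the state
    cov: np.ndarray     # Sigma: conditional posterior covariance
    history: tuple      # hypothesis sequence H^k_i it derives from

def mixture_pdf(components, x):
    """Posterior density of Eq. (23): weighted sum of the conditional
    Gaussian densities over all hypothesis sequences."""
    n = x.shape[0]
    total = 0.0
    for c in components:
        d = x - c.mean
        quad = d @ np.linalg.solve(c.cov, d)
        norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(c.cov))
        total += c.weight * np.exp(-0.5 * quad) / norm
    return total
```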
4.1. Robustness to uncertain modelling and integrity certification

The estimation of the posterior probabilities is the crucial part of the algorithm.
¹ An N-component Gaussian mixture U of density function $f_U$ is defined as a convex combination of N Gaussian distributions:

$f_U(x) = \sum_{i=1}^{N} \lambda_i\, \phi(x; \mu_i, \Sigma_i), \quad \text{with} \quad \sum_{i=1}^{N} \lambda_i = 1 \ \text{and}\ \lambda_i \ge 0 \ \text{for}\ i = 1, \ldots, N,$  (24)

where the Gaussian distributions $\phi(x; \mu_i, \Sigma_i)$ are the components of the mixture and the $\lambda_i$ are the mixing weights. The mixture is completely defined by the N three-tuples $(\mu_i, \Sigma_i, \lambda_i)$ containing the parameters and weights of the Gaussian components. We denote the Gaussian mixture U by the set $\{(\mu_i, \Sigma_i, \lambda_i)\}_{i=1}^{N}$ of its N three-tuples.
Fig. 1. Evolution of the posterior distribution.
We will see in Section 5 that the estimation of those probabilities is straightforward if the properties of the sensors are well known in the normal operating mode as well as in the case of a failure. In real applications unfortunately, the behaviour of a faulty sensor is very hard to predict, and the measurement noise models of the valid sensors themselves may be affected by uncertainty parameters (see for instance Huber's $\epsilon$-contaminated neighbourhood [2]). To be fully applicable in everyday problems, the integrity-directed estimation algorithm must be robust to uncertain modelling. The common approach to the problem of robust estimation is called minimax. The philosophy of this approach is to consider the worst-case noise conditions so that good performance is ensured for every possible realization of the uncertainty variables. In the case of the density estimator of Eq. (23), the worst-case approach would consist in assigning safe bounds to the conditional posterior probabilities. This is not a trivial task, considering that those posterior probabilities are normalized and their sum must always be equal to 1. This implies that the posterior probabilities cannot all be overestimated at the same time. However, if we have a look at the estimation algorithm depicted in Table 2, we see that the observation of any additional measurement always leads to a contraction of the covariance matrix $P_{k|k-1}$ of the state, which is reduced by the semi-positive definite term

$\Delta_k = (P_{k|k-1} H_k^T - W_k \Omega_k \Theta_k^T)\, \Gamma_k^{-1}\, (P_{k|k-1} H_k^T - W_k \Omega_k \Theta_k^T)^T$

derived in Eq. (B.17). Indeed, Bayes' rule states that the observation of any measurement results in the multiplication of the prior density function of the state by another Gaussian density, and the sharpening of the slope of the distribution in every (observed) direction of the state-space. This property gives sense to the minimax approach in the context of integrity-directed estimation, provided that it is always possible to find a finite neighbourhood of the posterior estimate such that the density of the state after the observation is nowhere greater than the prior density outside that neighbourhood. The property implies that, if the probability that the measurement is erroneous is overestimated, the estimated density of the state will be above the actual density everywhere outside the neighbourhood. This also means that fixing a confidence interval on the pessimistic density will be conservative, as long as the interval bounds are not too close to the posterior estimate, as the confidence interval will overestimate the risk of misleading information. The problem discussed in this section is similar
to the problem of overbounding, as they both aim at certifying the conservativeness of the estimation process when a pessimistic modelling of the system is used (pessimistic measurement or process noise distributions for the overbounding problem, and pessimistic failure probabilities for the mixed density estimator of Eq. (23)). We leave to future research the problem of robust integrity-directed estimation in the presence of modelling uncertainties.

5. Fault detection tools and posterior probabilities of the fault hypotheses

In this section we try to determine the posterior probabilities of the fault hypotheses and the possible biases of the sensors from the observed measurements. In Section 5.1, we derive a conditional MMSE estimator which estimates the measurement noise, and present it as a useful tool for estimating the biases of the sensors. In Section 5.2, we evaluate the posterior probability of each hypothesis sequence needed in the multiple-model estimator of Eq. (23).

5.1. Estimation of the measurement noise

The corrupted measurements must be uncovered by the state estimator. One interesting way of checking for faults or biases consists of estimating the measurement noise $\upsilon_k$ of the sensors. The measurement noise estimation can be seen as a direct procedure for estimating the biases of the sensors. This proves to be particularly useful if we consider the hypothesis of permanently biased sensors. Ober [5] suggests performing the linear MMSE estimation of the measurement noise $\upsilon_k$. The measurement errors can be estimated recursively by means of a prediction/correction process. The optimal estimation theory states that

$\hat{\upsilon}_{k|k} = \hat{\upsilon}_{k|k-1} + E\{(\upsilon_k - \hat{\upsilon}_{k|k-1})\, \gamma_k^T\}\, E\{\gamma_k \gamma_k^T\}^{-1}\, \gamma_k,$  (25)
where $\hat{\upsilon}_{k|k-1}$ is the prior estimate of the measurement noise before the observation of the measurements of step k, and $\hat{\upsilon}_{k|k}$ is the posterior estimate after step k. In case the sequence of composite vectors $\upsilon_k$ is non-time-correlated, the measurement noise $\upsilon_k$ can be estimated directly from the innovations and Eq. (25) reduces to $\hat{\upsilon}_{k|k} = \Upsilon_k \Gamma_k^{-1} \gamma_k$. The covariance matrix of $\hat{\upsilon}_{k|k}$ is then given by $\Upsilon_k \Gamma_k^{-1} \Upsilon_k$. In the Gaussian setting, the estimation error $(\upsilon_k - \hat{\upsilon}_{k|k})$ follows a centered normal distribution with covariance matrix $\Upsilon_k (\Upsilon_k^{-1} - \Gamma_k^{-1}) \Upsilon_k$. Normalized versions of the posterior estimate of the measurement noise were used, for instance, in the fault detection algorithms of [1,5]. It is interesting to note that the algorithm of Table 2, implemented according to the correlations between close-in-time measurements, has the advantage over the basic Kalman estimator that it decorrelates the data in a simple and straightforward way through Eqs. (19) and (25), by directly expressing these correlations in the covariance matrix $\Gamma_k$.
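A minimal sketch of this fault detection tool, for the non-time-correlated case just described: it forms the posterior noise estimate, the covariance of its error, and a normalized (Mahalanobis-type) statistic. The thresholding policy is left open here, and the statistic is one plausible choice; the actual detection algorithms are those of [1,5].

```python
import numpy as np

def measurement_noise_test(gamma, Gamma, Upsilon):
    """Posterior estimate of the measurement noise (reduced Eq. (25))
    and a normalized residual for fault screening."""
    v_hat = Upsilon @ np.linalg.solve(Gamma, gamma)
    # covariance of the estimation error (v - v_hat)
    P_err = Upsilon @ (np.linalg.inv(Upsilon) - np.linalg.inv(Gamma)) @ Upsilon
    # normalized statistic; large values flag suspect measurements
    stat = v_hat @ np.linalg.solve(P_err, v_hat)
    return v_hat, stat
```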
5.2. Posterior probabilities of the fault hypotheses

In this section we show how the posterior probabilities of the hypothesis sequences can be estimated from the innovations of the Kalman estimator (a similar procedure was followed e.g. in [10]). In the present study, we assume that the dynamics of the hypotheses is modelled by the stochastic process defined by an initial probability distribution $P(h_0)$ and a transition probability law of the kind²

$P(h_k\,|\,z_1, \ldots, z_{k-1}, H_{k-1}), \quad k \ge 1.$  (26)

Bayes' rule leads to an expression for the posterior probability of an hypothesis sequence as a function of its prior probability and the conditional likelihood of the innovation:

$P(H_k\,|\,z_1, \ldots, z_{k-1}, z_k) = \frac{1}{R_k}\, p(z_k\,|\,z_1, \ldots, z_{k-1}, H_k)\, P(H_k\,|\,z_1, \ldots, z_{k-1}),$  (27)

where the factor $p(z_k\,|\,z_1, \ldots, z_{k-1}, H_k) = p(\gamma_k\,|\,z_1, \ldots, z_{k-1}, H_k)$ represents the likelihood of the measurement vector (or equivalently the innovation) at step k under a given hypothesis sequence, while $P(H_k\,|\,z_1, \ldots, z_{k-1})$ is the prior probability distribution of an hypothesis sequence, and the denominator $R_k = p(z_k\,|\,z_1, \ldots, z_{k-1})$ is a normalizing factor common to each hypothesis. The latter expression can be written as a function of the single hypothesis of step k and the hypothesis sequence of lower order it derives from. Indeed, we have

$P(H_k\,|\,z_1, \ldots, z_{k-1}) = P(h_k\,|\,z_1, \ldots, z_{k-1}, H_{k-1})\, P(H_{k-1}\,|\,z_1, \ldots, z_{k-1}),$  (28)
where the first factor is the probability law of the single hypotheses introduced in Eq. (26), and $P(H_{k-1}\,|\,z_1, \ldots, z_{k-1})$ is the posterior distribution of the hypothesis sequences of the previous step. A recursive method for computing the posterior probabilities of the hypothesis sequences is directly suggested by Eqs. (27) and (28). The posterior probability of an hypothesis sequence does not depend on the strategy used to discretize the time-line and group the measurements. Hence, the least computationally complex grouping strategy, which considers the smallest sets of measurements that allow all the cross-correlations between the measurements to be modelled, is recommended.

² The conditioning on $z_1, \ldots, z_{k-1}$ is relevant when the possibility is considered that sensors may have permanent biases. In that case, the state estimation must be coupled with a process of bias estimation. The sensors in question require the introduction of additional variables symbolizing their biases and, possibly, a modelling of the evolution of those biases over time.
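In outline, the recursion suggested by Eqs. (27) and (28) takes only a few lines; the array shapes below are hypothetical conventions for the example: S posterior sequence weights are expanded by the $N_k$ single hypotheses of step k, reweighted by the innovation likelihoods, and renormalized (the final division implements the factor $1/R_k$).

```python
import numpy as np

def update_hypothesis_posteriors(prior_seq, trans, likelihoods):
    """Recursive posterior probabilities of the hypothesis sequences.
    prior_seq  : P(H_{k-1} | z_1..z_{k-1}), shape (S,)
    trans      : transition law of Eq. (26), shape (S, N_k)
    likelihoods: p(gamma_k | z_1..z_{k-1}, H_k), shape (S, N_k)
    Returns the normalized posteriors over the S * N_k extended sequences."""
    prior_k = prior_seq[:, None] * trans          # Eq. (28)
    post = likelihoods * prior_k                  # numerator of Eq. (27)
    return (post / post.sum()).ravel()            # divide by R_k
```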
Under each hypothesis sequence, the Gaussian assumption ensures the normality of the distribution of the innovations corresponding to the hypothetically valid measurements. The density function of the valid measurements is completely defined by its mean and covariance matrix, and is computed by the recursive linear MMSE estimator. Hence, the evaluation of Eq. (27) is trivial when all the sensors are supposed to work perfectly. On the other hand, when an hypothesis sequence assumes that one of the measurements observed at step k is provided by a faulty sensor, the posterior probability assessment is more complicated because the distribution of the measurement noise of faulty sensors is often hard to predict. State estimation in the presence of modelling uncertainties is the object of the theory of robust estimation. The literature proposes various ways to model probability distributions containing uncertainty parameters. For instance in [4], distributions similar to Huber's $\epsilon$-contaminated neighbourhood are used [2]. In the general case however, finding a conservative modelling of the behaviour of faulty sensors is a difficult task, involving issues of robustness and integrity. The problem of dealing with modelling uncertainties is closely linked to the discussion of Section 4.1.

6. Limitation of the number of elements of the Gaussian mixture

In this section, we see how the number of components forming the Gaussian mixture of the state distribution can be limited. For obvious reasons of precision and integrity, the simplification of the Gaussian mixtures cannot be done arbitrarily. We have considered a variety of more or less successful techniques. A first idea consists of replacing, at the end of each step of the algorithm, the Gaussian mixture with the closest single normal distribution. This amounts to reducing the Gaussian-mixture table of Fig. 1 to a single entry before moving to the next step. A popular measure of distance for probability distributions is the relative entropy between these distributions [14]. The relative entropy was first introduced by Kullback and Leibler as a generalization of the concept of quantity of information [15], and is often referred to as the Kullback–Leibler distance. If the density functions $f_1$ and $f_2$ correspond respectively to two hypotheses $H_1$ and $H_2$, the relative entropy $I(f_1\|f_2)$ of distribution $f_1$ with respect to $f_2$ is given by

$I(f_1\|f_2) = \int f_1(x)\, \ln\frac{f_1(x)}{f_2(x)}\, \mathrm{d}x,$  (29)

and can be seen as the mean information for discrimination in favor of $H_1$ against $H_2$ per observation under $H_1$.
The Kullback–Leibler distance between two normal distributions is always finite and computable analytically. In [16], Peña and Guttman propose to use the normal distribution whose first two moments correspond to the mean and variance of the posterior mixture. They justify their choice by proving that the single normal distribution resulting from the so-called moment matching or collapsing-by-moments procedure is the distribution with the minimum Kullback–Leibler distance to the original posterior mixture, that is the distribution with the maximum mean log-likelihood. The moment matching technique is overly simplified in the sense that it generally does not preserve the essential characteristics of the multilobe profile of the posterior mixture distribution. The moment matching procedure was also used in [1], where its unclear and inaccurate performance called into question the validity of the fault detection procedure and did not allow the final integrity of the estimation process to be certified. It is thus necessary to keep a richer estimate of the Gaussian mixtures.

More elaborate methods consist of replacing the actual Gaussian mixture by another 'lighter' Gaussian mixture made of fewer components. The generalized pseudo-Bayesian (GPB) and the interacting multiple model (IMM) algorithms [10] simplify the Gaussian mixture according to the mode history analogies of the components, regardless of their relative distances. Subsets of components are merged together if the corresponding hypothesis sequences are similar, the hypothesis history being limited to the last few single hypotheses. Unfortunately, the latter methods provide little information on the loss of accuracy caused by the mixture simplifications. The mixture simplification procedure presented in this study seems more efficient for integrity-directed applications as it relies on the criterion of minimal Kullback–Leibler distance between the initial and simplified Gaussian mixtures.

The problem of approximating a given density with a Gaussian mixture is recurrent in the kernel-based unsupervised learning literature. The most popular method for learning mixture models derives from Dempster's numerically well-behaved Expectation-Maximization (EM) algorithm, which is a much used iterative algorithm for maximum log-likelihood estimation [17]. The EM algorithm is shown to converge to a locally optimal solution but is highly dependent on the choice of the initial mixture of the recursion [18]. The Kullback–Leibler distance between two Gaussian mixtures is a non-convex function of the parameters of the components, with multiple local minima. The parameters of the minimum distance Gaussian mixture cannot be found analytically. Iterative loss-minimizing algorithms such as the EM algorithm constitute an efficient and common approach to complex minimization problems. In the context of this study, we decided to use a slightly different approach. Although we did not definitively reject the possibility of using them, we have chosen to avoid iterative algorithms such as the EM algorithm because they distort all the components of the initial mixture, and lose track of their respective identities, that is the hypothesis sequences on which they rely, which are essential for the estimation of the posterior probabilities of the hypothesis sequences.
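The Kullback–Leibler distance between two n-dimensional normal distributions, which the selection procedure below evaluates repeatedly, has a standard closed form; a direct transcription (with a hypothetical function name) reads:

```python
import numpy as np

def kl_gaussians(mu0, S0, mu1, S1):
    """Kullback-Leibler distance I(f0 || f1) between the normal
    distributions N(mu0, S0) and N(mu1, S1), in closed form."""
    n = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - n
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))
```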
To the EM algorithm we preferred a suboptimal but more direct procedure, based on the simple selection of the most relevant components and the elimination of the unimportant and redundant ones. After the selection phase, the selected components are adjusted to remain as close as possible to the actual Gaussian mixture. We have tested several criteria for selecting the least informative components of the mixture and various ways of adjusting the remaining components. A first idea was to remove the components with insignificant mixing weights, and to distribute the rejected components amongst the remaining components, proportionally to their respective weights. For this purpose, we first chose to impose a lower bound on the mixing weights. In other tests, we imposed a threshold for the maximum number of components in the mixture. In both cases unfortunately, the method did not give satisfying results, as we observed a duplication of the most relevant components. After several steps of the recursive estimation algorithm, the Gaussian mixture resulting from the successive approximation processes was made of a collection of components with similar shapes and affected with equal weights. The equipartition of the mixing weights pointed out the inefficiency of the method. Our conclusion was that the components with similar shapes would imperatively have to be fused together.

Our second approach thus consists of getting rid of the redundant components of the mixture. At each step of the algorithm, the dissimilarities between all the possible pairs of three-tuples are estimated, listed, and ordered by decreasing likeness. The least informative pairs of three-tuples are fused two by two into single Gaussians by the 'moment matching' technique, which provides the closest Gaussian distribution in terms of Kullback–Leibler distance. The single Gaussian resulting from the fusion of the pair of components i and j is weighted by the total weight of the pair, and its mean and covariance matrix are those of the virtual second-order mixture distribution $U_{i,j} = \{(\mu^k_i, \Sigma^k_i, \pi^k_i/(\pi^k_i+\pi^k_j)),\ (\mu^k_j, \Sigma^k_j, \pi^k_j/(\pi^k_i+\pi^k_j))\}$ made of the two components in question weighted in the same proportion. Similarly, the hypothesis sequences corresponding to the two components are merged and generate a fuzzy combination of the sequences denoted by a set of pairs (hypothesis sequence, weight). For example, the fusion at the end of step k of three-tuples i and j gives

$\left.\begin{array}{l} (\mu^k_i, \Sigma^k_i, \pi^k_i),\ \{(h, w^h_i)\}_{h\in\mathcal{H}^k} \\ (\mu^k_j, \Sigma^k_j, \pi^k_j),\ \{(h, w^h_j)\}_{h\in\mathcal{H}^k} \end{array}\right\} \longrightarrow (\mu^k_{i,j}, \Sigma^k_{i,j}, \pi^k_{i,j}),\ \{(h, w^h_{i,j})\}_{h\in\mathcal{H}^k},$  (30)

where

$\pi^k_{i,j} = \pi_i + \pi_j,$  (31)
$\mu^k_{i,j} = (\pi_i \mu_i + \pi_j \mu_j)/\pi_{i,j},$  (32)
$\Sigma^k_{i,j} = \frac{\pi_i}{\pi_{i,j}}\,[\Sigma_i + (\mu_i - \mu_{i,j})(\mu_i - \mu_{i,j})^T] + \frac{\pi_j}{\pi_{i,j}}\,[\Sigma_j + (\mu_j - \mu_{i,j})(\mu_j - \mu_{i,j})^T],$  (33)
$w^h_{i,j} = (\pi_i w^h_i + \pi_j w^h_j)/\pi_{i,j}, \quad h \in \mathcal{H}^k.$  (34)
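Eqs. (31)–(33) translate directly into a small moment-matching routine (the merged weight-set update of Eq. (34) is a convex combination of the two weight sets and is omitted here for brevity):

```python
import numpy as np

def merge_pair(pi_i, mu_i, S_i, pi_j, mu_j, S_j):
    """Moment-matching fusion of two mixture components, Eqs. (31)-(33)."""
    pi_ij = pi_i + pi_j
    mu_ij = (pi_i * mu_i + pi_j * mu_j) / pi_ij
    di, dj = mu_i - mu_ij, mu_j - mu_ij
    S_ij = (pi_i * (S_i + np.outer(di, di))
            + pi_j * (S_j + np.outer(dj, dj))) / pi_ij
    return pi_ij, mu_ij, S_ij
```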
The Kullback–Leibler distance between two distributions is non-symmetric with respect to those distributions. To select the pairs of components that need to be replaced by a single Gaussian, we naturally headed for another (symmetric) discrimination measure called the divergence [14]. The Kullback–Leibler divergence $J(f_1, f_2)$ of two distributions of density functions $f_1$ and $f_2$ is defined by

$J(f_1, f_2) = I(f_1\|f_2) + I(f_2\|f_1).$  (35)

It would be interesting to consider the impact on the initial Gaussian mixture of the fusion of pairs of components. As for the Kullback–Leibler distance, there is no known analytical expression of the divergence of a Gaussian mixture. However, it is possible to find an upper bound to the divergence between the initial Gaussian mixture and the new Gaussian mixture obtained after replacement of a selection of pairs of components. Suppose that, after a given step of the recursive estimation algorithm, M pairs are turned into single Gaussians. Let $f_{i_m}$ and $f_{j_m}$ be the density functions of the two components of pair m, $f_{U_m}$ the density function of the fictitious Gaussian mixture made of those two components, and $f_{i_m,j_m}$ the density function of the new component resulting from their fusion. We have the following upper bound to the divergence between the distributions $f_U$ and $f_{U'}$ of the initial and simplified Gaussian mixtures:

$J(f_U, f_{U'}) \le \sum_{m=1}^{M} (\pi_{i_m} + \pi_{j_m})\, J(f_{i_m,j_m}, f_{U_m}).$  (36)

This result can be derived easily from the log-sum inequality,³ which is a direct consequence of the convexity of entropy and Jensen's inequality [19,20]. Moreover, we have the following inequalities:

$I(f_{U_m}\|f_{i_m,j_m}) \le I(f_{U_m}\|f_{j_m}),$  (39)
$I(f_{U_m}\|f_{j_m}) \le I(f_{i_m}\|f_{j_m}),$  (40)
$I(f_{i_m,j_m}\|f_{U_m}) \le \frac{\pi_{i_m}}{\pi_{i_m,j_m}}\, I(f_{i_m,j_m}\|f_{i_m}) + \frac{\pi_{j_m}}{\pi_{i_m,j_m}}\, I(f_{i_m,j_m}\|f_{j_m}),$  (41)
where Eq. (39) is derived from the definition of the moment matching function fim ;jm , Eq. (40) is due to the property of monotonicity of function J ðfjm þ kðfim fjm Þ; fjm Þ in k for k P 0, and Eq. (41) is a direct consequence of the convexity of the logarithm.
3
The log-sum inequality [19] claims that if ai P 0 and bi P 0 (i = 1, . . . , N 0 ), ! P 0 N N0 N0 X X ai ai ai ln P ai ln Pi¼1 . ð37Þ N0 b i i¼1 i¼1 i¼1 bi If we replace terms ai and bi of Eq. (37) with the weighted components of the N 0 -component mixtures fU and fU0 respectively, we find
ð42Þ
where we define the analytically computable expression BðUm Þ ¼ pim ;jm minfIðfim kfjm Þ; Iðfjm kfim Þg
ð35Þ
It would be interesting to consider the impact on the initial Gaussian mixture of the fusion of pairs of components. As for the Kullback–Leibler distance, there is no known analytical expression of the divergence of a Gaussian mixture. However, it is possible to find an upper bound to the divergence between the initial Gaussian mixture and the new Gaussian mixture obtained after replacement of a selection of pairs of components. Suppose that, after a given step of the recursive estimation algorithm, M pairs are turned into single Gaussians. Let fim and fjm be the density functions of the two components of pair m, fUm the density function of the fictitious Gaussian mixture made of those two components, and fim ;jm the density function of the new component resulting from their fusion. We have the following upper bound to the divergence between the distributions fU and fU0 of the initial and simplified Gaussian mixtures:
fU ðxÞ ln
Eqs. (36), (39) and (40) lead to the following upper bound:
þ pim Iðfim ;jm kfim Þ þ pjm Iðfim ;jm kfjm Þ.
ð43Þ
From Eq. (42), we derived a likeness criterion for the pairs of Gaussians. Our strategy is to remove one (and only one) set of pairs of components such that the sum of their weighted divergences is smaller than threshold Jmax: MergefUm gM m¼1 :
M X
BðUm Þ 6 J max .
ð44Þ
m¼1
Next, the procedure is repeated until such a set of pairs of components no longer can be found in the mixture. The main property of this strategy is that it ensures that the divergence between the actual Gaussian mixture and the simplified Gaussian mixture does not exceed threshold Jmax after each simplification. We have J ðfU ; fU0 Þ 6 J max .
ð45Þ
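As a sketch of this selection strategy, the following Python fragment (reusing fuse_components from the sketch above) evaluates the bound B(U_m) of Eq. (43) from the closed-form Kullback–Leibler distance between Gaussians, and merges pairs greedily while the cumulative bound of Eq. (44) stays below Jmax. It is a greedy one-pair-at-a-time variant of the set selection described in the text, under the assumption that components are stored as (mean, covariance, weight) tuples:

```python
import numpy as np
from itertools import combinations

def kl_gauss(mu0, S0, mu1, S1):
    """Kullback-Leibler distance I(f0 || f1) between two Gaussians."""
    d = mu0.size
    S1inv = np.linalg.inv(S1)
    diff = (mu1 - mu0).reshape(-1, 1)
    return 0.5 * (np.trace(S1inv @ S0) + float(diff.T @ S1inv @ diff) - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def bound_B(ci, cj):
    """Analytically computable bound B(U_m) of Eq. (43) for a pair of
    components ci = (mu, Sigma, p)."""
    (mi, Si, pi), (mj, Sj, pj) = ci, cj
    m, S, p = fuse_components(mi, Si, pi, mj, Sj, pj)   # moment matching
    return (p * min(kl_gauss(mi, Si, mj, Sj), kl_gauss(mj, Sj, mi, Si))
            + pi * kl_gauss(m, S, mi, Si) + pj * kl_gauss(m, S, mj, Sj))

def simplify(components, J_max):
    """Greedy mixture simplification under the budget of Eq. (44):
    repeatedly fuse the cheapest pair while the total spent divergence
    bound stays below J_max."""
    spent = 0.0
    while len(components) > 1:
        (i, j), B = min(((pair, bound_B(components[pair[0]], components[pair[1]]))
                         for pair in combinations(range(len(components)), 2)),
                        key=lambda t: t[1])
        if spent + B > J_max:
            break
        spent += B
        merged = fuse_components(*components[i], *components[j])
        components = [c for k, c in enumerate(components) if k not in (i, j)]
        components.append(merged)
    return components
```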
In Section 6.1, we show the impact of the latter property in the context of a process of integrity certification of the state estimation. Indeed, the upper bound Jmax suggests a conservative procedure for fixing a confidence interval that respects the integrity constraints on the approximate Gaussian mixture, regardless of the modifications undergone by the Gaussian mixture.

6.1. Loss of accuracy of the Gaussian mixture in terms of additional risk of misleading information

The aim of this section is to show how the influence of one step of the simplification procedure of the Gaussian mixture on the integrity of the whole estimation process can be evaluated. We try to determine to what extent the simplified Gaussian mixture can be trusted. We choose, as an inaccuracy measure of the new Gaussian mixture, the maximum risk ε′ under $f_U$ caused by a confidence interval respecting the integrity level 1 − ε under $f_{U'}$. We adopt a worst-case approach. The positivity of the divergence measure $d_{U,U'}(x) = [f_U(x) - f_{U'}(x)] \ln(f_U(x)/f_{U'}(x))$ allows us to reduce Eq. (45) to the pessimistic constraint

$$\int_A d_{U,U'}(x)\, dx \le J_{\max}, \tag{46}$$
where we denote by A ⊂ X the region of the state space situated outside the confidence interval. One can show that the distributions U′ presenting the smallest divergence from the distribution U inside A for given values of ε and ε′ respect the conditions $f_{U'}(x) = (\epsilon/\epsilon') f_U(x)$ for all x ∈ A, and⁴

$$\int_A d_{U,U'}(x)\, dx = (\epsilon' - \epsilon) \ln(\epsilon'/\epsilon). \tag{47}$$

As a consequence, for any confidence interval, the distributions presenting a divergence with the initial distribution $f_U$ not greater than Jmax can only provide risks less than or equal to the risk ε′ which verifies (ε′ − ε) ln(ε′/ε) = Jmax. We deduce the following upper bound ε′ for the alert rate:

$$\epsilon' = \epsilon\,(1 + g(J_{\max}/\epsilon)), \tag{48}$$

where the function g(x), representing the additional relative risk, is defined as the positive root of the equation y ln(y + 1) = x. Another interpretation of the ratio ε′/ε is that an integrity of 1 − ε on the initial Gaussian mixture is automatically ensured if the confidence interval fixed on the simplified Gaussian mixture respects an apparent error risk less than ε (ε/ε′). We see in Fig. 2 that this bound is only useful when the threshold Jmax is of the same order of magnitude as the alert risk ε or smaller. If we impose the constraint Jmax ≪ ε, then ε′ ≃ ε, and the simplified Gaussian mixture U′ can be considered as an approximation of the actual mixture U in agreement with the integrity requirements of the problem. This may sound particularly restrictive in practical applications demanding extremely high integrity levels, such as safe navigation systems, where the expectations in terms of integrity reach the target risk level SIL4, tolerating 10⁻⁹ error/hour. In those applications a very low threshold Jmax must be chosen.

Even for very small values of the threshold, this result is not completely satisfying. The recursive structure of the Kalman estimator is such that the estimation of the state distribution at step k + 1 is based on the previous estimate at step k. Consequently, the approximation errors at step k corrupt (to a decreasing extent) the distributions of the state at future times (t_{k+1}, t_{k+2}, ...) as well. To preserve the integrity of the whole estimation process, the influence of the simplification of the Gaussian mixtures on future estimates must be monitored and quantified. A complete description of the signature of some errors on the future estimates of a Kalman filter can be found in [21].

⁴ For any density functions $f_U$ and $f_{U'}$ providing the respective risks ε′ and ε, it is always possible to find a scalar λ and two functions $g_U$ and $u$ defined on A such that

$$\int_A g_U(x)\, dx = 1, \quad \int_A u(x)\, dx = 0, \quad f_U(x) = \epsilon' g_U(x) \quad \text{and} \quad f_{U'}(x) = \epsilon [g_U(x) + \lambda u(x)]. \tag{49}$$

The divergence of $f_U$ and $f_{U'}$ over A is given by the expression

$$\int_A d_{U,U'}(x)\, dx = (\epsilon' - \epsilon) \ln\frac{\epsilon'}{\epsilon} + \epsilon' I(g_U \| g_U + \lambda u) + \epsilon I(g_U + \lambda u \| g_U), \tag{50}$$

which is minimal when λ = 0, that is, when $f_{U'}$ is a scaled version of $f_U$ on A, independently of the function u.

[Fig. 2. Pseudo-risk ε′: log[(ε′ − ε)/ε] plotted against log(Jmax/ε).]
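To illustrate Eq. (48), here is a small Python sketch (ours) that computes g(x) by bisection and evaluates the additional risk for the numerical values used later in Section 7:

```python
import math

def g(x):
    """Positive root of y*ln(1+y) = x, the additional relative risk of Eq. (48)."""
    lo, hi = 0.0, 1.0
    while hi * math.log1p(hi) < x:      # enlarge the bracket if needed
        hi *= 2.0
    for _ in range(200):                # bisection
        mid = 0.5 * (lo + hi)
        if mid * math.log1p(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Values used in Section 7: J_max = 1e-13 with an integrity target of 1 - 1e-11.
eps, J_max = 1e-11, 1e-13
eps_prime = eps * (1.0 + g(J_max / eps))
print(eps_prime - eps)   # additional risk, of the order of 1e-12
```

With Jmax/ε = 10⁻², the additional relative risk g is about 0.1, consistent with the behaviour shown in Fig. 2.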
The article leaves open the question of certifying the safety of the Gaussian mixture simplifications for the overall system.

7. An application: Safe rail navigation

In this section, we consider the problem of real-time estimation of the speed of a train from the measurements of various onboard speed sensors. The classical ambiguous situations of rail navigation include, inter alia, isolated failures, permanently biased sensors, slipping wheels at sudden acceleration or braking, unavailability of the GPS receivers inside tunnels, etc. Our tests were carried out on a database recorded in Italy during an Arezzo–Pontassieve journey. The train speed estimation was based on the data collected by three devices: a wheel sensor and two Doppler radars. The measurements given by the sensors were the projections, in the direction of motion of the train, of the three-dimensional speed vectors, restricting the problem to a scalar unknown. We assumed that when no failure occurred, the devices provided measurements normally distributed around the actual speed of the train, with pessimistic standard deviations of σ_WS = 1 m/s, σ_DR1 = 0.5 m/s, and σ_DR2 = 2 m/s respectively. In the case of a sensor failure, we made the pessimistic supposition that the prior distributions of the measurements were relatively compact, uniform and centred around the collected values. The data of each sensor are provided asynchronously at a frequency of 20 Hz.

Furthermore, we assigned to each sensor heuristic a priori probabilities of not being in its normal operating mode. The probability of each sensor being in a state of failure during a step (0.8 s) of the recursion was arbitrarily set to 0.01 failure/step, corresponding to an availability rate of 99%. For simplicity, we did not consider the hypothesis of slightly and permanently biased sensors. We took, as fault hypothesis space, the set of all possible combinations of faulty sensors during a step of the recursive algorithm. The probabilities of failure of a sensor were assumed to depend on the past behaviour of the sensor. Indeed, when the failure of a sensor occurs, the chances are high that the sensor remains in a state of failure during several steps of the recursion. Therefore, the fault hypotheses were modelled as the variables of a one-step-memory Markov chain, and the probability law of Eq. (26) reduced to P(h_k | h_{k-1}), k ≥ 1. The probability of failure at step k under the hypothesis of a failure at step k − 1 was set to 0.9. This value corresponds to an expected length of 10 steps (8 s) for the failures. Table 3 depicts the state equations of the stationary one-step-memory Markov chain. We denote by F_k the binary random variable that symbolizes the operating mode of a sensor at step k. The value of F_k is 0 if the sensor is in its normal operating mode during the kth step, and 1 in the case of a failure. The scalars p_1 and p_{1|1} are respectively the a priori probability of failure and the conditional probability of two consecutive failures. The missing parameters of the probability law can be deduced from the stationarity equation, which ensures that the a priori probabilities remain unchanged over time.

Table 3
Dynamic modelling of the normal/failure mode switch of a sensor: stationary one-step-memory Markov chain

  P(F_k = 1) = p_1                                                        (Availability)
  P(F_0 = 1) = p_1                                                        (Initialization)
  P(F_k = 1 | F_{k-1} = 1) = p_{1|1}                                      (Dynamics)
  Σ_{i ∈ {0,1}} P(F_k = 1 | F_{k-1} = i) P(F_{k-1} = i) = P(F_k = 1)      (Stationarity)
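As an illustration of Table 3, the following short Python sketch (ours) solves the stationarity equation for the missing transition probability p(1|0) and recovers the expected failure length, using the numerical values given above (p_1 = 0.01 failure/step, p_{1|1} = 0.9):

```python
def failure_chain(p1, p1_given_1):
    """Deduce the missing parameter of the stationary one-step-memory
    Markov chain of Table 3 from the stationarity equation
    p1 = p(1|1)*p1 + p(1|0)*(1 - p1)."""
    p1_given_0 = p1 * (1.0 - p1_given_1) / (1.0 - p1)
    expected_failure_length = 1.0 / (1.0 - p1_given_1)   # in recursion steps
    return p1_given_0, expected_failure_length

print(failure_chain(0.01, 0.9))   # (~1.01e-3, 10.0 steps, i.e. 8 s at 0.8 s/step)
```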
7.1. Modelling of the system's dynamics

The dynamics of the train was modelled by a constant-acceleration model [10]. The speed x_1 and acceleration x_2 of the train form the two-dimensional system state. The state equations are

$$\dot{x}_1(t) = x_2(t), \tag{51}$$

$$\dot{x}_2(t) = q(t), \tag{52}$$

where the stochastic process q is an additive white noise causing a standard deviation σ_q = 0.04 m s⁻² per second. The perturbation q represents the jerk endured by the train. By discretizing Eqs. (51) and (52) according to the procedure described in Section 2, and using the results (A.7) of Appendix A, we obtain discrete-time state equations which fit Eqs. (5) and (6). The choice of a constant-acceleration model is quite popular in the rail navigation literature, and proves to be particularly suitable for this speed estimation problem.

The inefficiency of the linear MMSE estimator in the presence of failures is shown on the same database in [1]. That study also compares the performance of the multiple-model estimator presented in Section 4 when used with various estimation frequencies and in different situations. In the present work, we focus on the evolution over time of the structure of the Gaussian mixture of the integrity-directed multiple-model estimator of Eq. (23).

7.2. Evolution of the Gaussian mixture

Fig. 3 depicts the evolution over time of the speed measurements of the wheel sensor and the radars on a four-minute section of the journey. The right panel is a local enlargement of the rectangular box inserted in the left panel, containing all the data of the time interval [130 s, 133 s]. The dots, squares and circles represent the measurements of the wheel sensor, the first radar, and the second radar respectively. The database did not provide the evolution over time of the actual speed of the train. Nevertheless, one can see that the poorly performing second radar provides biased measurements when the train accelerates or decelerates.

The multiple-model estimator was used at an estimation frequency of 1.25 Hz to estimate the speed of the train. The method of Eq. (44) was used to limit the order of the Gaussian mixtures. The strategy was tested with various values of the threshold Jmax. Table 4 shows statistics (mean, median, 90th and 95th percentiles) on the number of components remaining after simplification of the Gaussian mixture at the end of each step of the recursion, as a function of Jmax.
[Fig. 3. Measurements of the speed sensors (time, s, vs. speed, km/h). Left: four-minute section of the journey; right: enlargement of the interval [130 s, 133 s]. Legend: wheel sensor, radar 1, radar 2.]
Table 4
Statistics on the number of components of the Gaussian mixture

Jmax       1     10⁻¹    10⁻³    10⁻⁵    10⁻⁷    10⁻⁹    10⁻¹¹    10⁻¹³    10⁻¹⁵
Mean       1     2.04    2.87    4.17    5.74    8.85    12.55    16.95    21.07
Median     1     2       2       3       4       6       11       14       18
90th p.    1     3       5       9       11      20      20       30       38
95th p.    1     3       6       11      15      27      27       36       44
[Fig. 4. State estimation and components of the Gaussian mixture (speed, km/h, vs. time, s; top: state estimate with alert limits; bottom: order of the Gaussian mixture).]
We see that the complexity of the Gaussian mixtures increases with smaller values of Jmax. Our results reveal a trade-off between the loss in integrity and the complexity of the Gaussian mixtures, which determines the computation times.

Fig. 4 shows the performance of the estimator for Jmax = 10⁻¹³. The thick continuous lines represent the alert limits of the confidence interval corresponding to an integrity level of 1 − 10⁻¹¹, in agreement with the condition Jmax ≪ ε. The crosses replace the measurements of the sensors that were considered faulty by the most likely hypothesis of the current step, that is, the hypothesis with the highest posterior probability. One sees that the estimator detects the failures of the second radar and 'rejects' most biased measurements. At the bottom of Fig. 4 is represented the evolution of the order of the Gaussian mixture. We can see that the number of components of the mixture and the size of the confidence interval vary strongly according to the general shape of the speed curve and the quality of the available measurements, which illustrates the reasoning of the estimator. The loss in precision of the estimated risk was observed to remain below 1.0 × 10⁻¹² at each step of the algorithm. Next, the same tests were carried out using the IMM algorithm. The best bounds on the precision loss of the estimated risk that we were able to find for the IMM using Eqs. (45) and (48) reached peaks larger than 1.0 × 10⁻², which was highly insufficient.

8. Conclusion and future work

In this study, we tackled the problem of dynamic state estimation of continuous-time systems from discrete-time measurements in the context of high-integrity applications. We presented an integrity-directed multiple-model estimator which relies on the following modelling hypotheses: the measurement noises are supposed to follow Gaussian distributions in the normal operating mode of the sensors (or they can be safely replaced by overbounding Gaussian distributions), and the probability laws of the sensor-failure hypotheses and the stochastic properties of the measurement noises in the case of a failure are considered known as well. The estimator was derived from Bayes' rule and from a desynchronized version of the Kalman estimator that deals naturally with the correlations between close-in-time measurements and allows the user to choose the estimation times arbitrarily. This estimator derives the posterior density function of the state, rather than single estimates. We explained how the integrity of the estimation process could be certified by fixing, on the estimated density function, the alert limits of a safe confidence interval. Next, we presented the recursive MMSE estimation of the measurement noise as a practical way of estimating the biases of the sensors. We also studied how the number of components of the Gaussian mixture of the state distribution estimated by the multiple-model estimator could be reduced to prevent the complexity of the algorithm from growing to infinity. We presented a strategy of simplification of the Gaussian mixture based on a criterion of limited divergence between the actual and simplified distributions. We then showed how the impact of the Gaussian mixture simplification on the integrity of the posterior state estimate could be quantified and limited.

The integrity-directed multiple-model estimator was tested on a rail navigation problem. The tests were performed on a database of real measurements from various speed sensors. The sensor failures were modelled by a one-step-memory Markov chain. The simplification procedure of the Gaussian mixtures was tested with various threshold values. The results revealed the trade-off between the integrity losses and the complexity of the resulting mixture distributions. The estimation method presented in this study has the advantage over particle filters and Monte Carlo methods that it makes it possible to monitor the accuracy in the tails of the posterior distributions and to introduce confidence intervals certifying extremely high integrity levels.
Many questions remain open regarding the application of robust dynamic estimation techniques to problems demanding very high levels of safety. Firstly, the noise distribution of the sensors is often hard to predict in the case of a failure. As a consequence, the likelihood of the hypothesis of failure of a sensor cannot be measured easily. The common, robust approach to the problem of uncertain modelling consists of considering the 'worst-case model'. Unfortunately, the determination of a worst-case noise distribution of the sensors is never an easy task. Moreover, even if we manage to determine the most pessimistic noise distribution according to the observations, that is, the realistic noise distribution maximizing the posterior probability of a failure, the integrity of the final 'pessimistic' posterior distribution of the state will only be certified if the pessimistic distribution is a valid overbound of the actual distribution and allows the confidence interval to be fixed safely. Finally, several ideas of this study rely on the hypothesis that an overbounding technique can be used safely, namely, the use of Gaussian modellings of the measurement noise in the normal operating mode of the sensors, and the linearization of the system's dynamics together with the introduction of a Gaussian process noise. The overbounding techniques, currently in application in the field of satellite positioning, represent a promising way to guarantee the integrity of the estimation process in a nonlinear and non-Gaussian framework.

Acknowledgements

This work was made possible with the support of Multitel ASBL. The data are property of ALSTOM Transport Belgium. Sincere acknowledgements to Multitel, the Data Fusion Group at Multitel, Andreea I. Carnu, Rocío Moraga, and the editors and reviewers for their valuable contributions.

Appendix A. Discretization of the continuous-time state-space model

The discrete-time model described by Eq. (3) is a discretization of the continuous-time model of Eq. (1) if, for all k and for i = 2, ..., n_k, the variables of the two models verify

$$\begin{aligned}
F_{k_1} &= f(t_{k-1}, o_{k_1}), & G_{k_1} u_{k_1} &= g(t_{k-1}, o_{k_1}), & \xi_{k_1} &= h(t_{k-1}, o_{k_1}), & x_{k_1} &= x(o_{k_1}),\\
F_{k_i} &= f(o_{k_{i-1}}, o_{k_i}), & G_{k_i} u_{k_i} &= g(o_{k_{i-1}}, o_{k_i}), & \xi_{k_i} &= h(o_{k_{i-1}}, o_{k_i}), & x_{k_i} &= x(o_{k_i}),\\
F_{k_{n_k}+1} &= f(o_{k_{n_k}}, t_k), & G_{k_{n_k}+1} u_{k_{n_k}+1} &= g(o_{k_{n_k}}, t_k), & \xi_{k_{n_k}+1} &= h(o_{k_{n_k}}, t_k), & x_k &= x(t_k),
\end{aligned} \tag{A.1}$$

and the covariance of the discretized process noise is given by

$$\Xi_{k_1} = i(t_{k-1}, o_{k_1}), \quad \Xi_{k_i} = i(o_{k_{i-1}}, o_{k_i}), \quad \Xi_{k_{n_k}+1} = i(o_{k_{n_k}}, t_k), \tag{A.2}$$

where the functions f, g, h and i are defined by

$$f(s_1, s_2) = \exp\left(\int_{s_1}^{s_2} F(s)\, ds\right), \tag{A.3}$$

$$g(s_1, s_2) = \int_{s_1}^{s_2} f(s, s_2)\, G(s) u(s)\, ds, \tag{A.4}$$

$$h(s_1, s_2) = \int_{s_1}^{s_2} f(s, s_2)\, \xi(s)\, ds, \tag{A.5}$$

$$i(s_1, s_2) = \int_{s_1}^{s_2} \int_{s_1}^{s_2} f(s, s_2)\, E\{\xi(s)[\xi(\lambda)]^{\mathrm{T}}\}\, [f(\lambda, s_2)]^{\mathrm{T}}\, ds\, d\lambda. \tag{A.6}$$

In the application of Section 7, for instance, the above functions reduce to

$$f(s_1, s_2) = \begin{pmatrix} 1 & s_2 - s_1 \\ 0 & 1 \end{pmatrix}, \quad h(s_1, s_2) = \begin{pmatrix} \int_{s_1}^{s_2} (s_2 - s)\, q(s)\, ds \\ \int_{s_1}^{s_2} q(s)\, ds \end{pmatrix}, \quad g(s_1, s_2) = 0,$$

$$i(s_1, s_2) = \begin{pmatrix} \frac{1}{3}(s_2 - s_1)^3 & \frac{1}{2}(s_2 - s_1)^2 \\ \frac{1}{2}(s_2 - s_1)^2 & s_2 - s_1 \end{pmatrix} \sigma_q^2. \tag{A.7}$$

Moreover, the following relations between the measurement models (1) and (3) must hold for all k and for i = 1, ..., n_k:

$$y_{k_i} = y[o_{k_i}], \quad H_{k_i} = H[o_{k_i}], \quad J_{k_i} u_{k_i} = J[o_{k_i}] u(o_{k_i}), \quad \upsilon_{k_i} = \upsilon[o_{k_i}]. \tag{A.8}$$
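To make Eq. (A.7) concrete, a minimal Python sketch (ours) builds the discrete transition matrix f(s1, s2) and process-noise covariance i(s1, s2) of the constant-acceleration model:

```python
import numpy as np

def discretize_ca(s1, s2, sigma_q=0.04):
    """Transition matrix f(s1, s2) and process-noise covariance i(s1, s2)
    of the constant-acceleration model, Eq. (A.7)."""
    dt = s2 - s1
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    Q = sigma_q**2 * np.array([[dt**3 / 3.0, dt**2 / 2.0],
                               [dt**2 / 2.0, dt]])
    return F, Q

F, Q = discretize_ca(0.0, 0.8)   # one 0.8 s step of the recursion in Section 7
```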
The introduction of a common control vector $u_{k_i}$ for the state and measurement equations is not compulsory, but it may lead to simpler expressions for the control inputs in some practical cases, for instance when the control signal is constant over a sampling period.

Appendix B. Derivation of the equations of the desynchronized recursive linear MMSE estimator

This section is dedicated to the derivation of the equations of the recursive linear MMSE estimator depicted in Table 2. We follow a well-known procedure that derives the basic equations of the Kalman filter from the principle of orthogonality between the estimation error and the data, with the difference that the process and measurement noise sequences are no longer orthogonal. The orthogonality principle and the theory of optimum estimation allow the state estimates to be computed sequentially [11]. As already stated in Eq. (19), we have

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \gamma_k, \tag{B.1}$$

where $\hat{x}_{k|k-1}$ and $\hat{x}_{k|k}$ are the prior and posterior state estimates and

$$K_k = E\{e_{k|k-1} \gamma_k^{\mathrm{T}}\}\, E\{\gamma_k \gamma_k^{\mathrm{T}}\}^{-1} \tag{B.2}$$

is the gain of the filter. From Eqs. (5), (12), (13) and (6), (15), (16) respectively, we derive new expressions for the prior estimation error $e_{k|k-1}$ and the innovation vector $\gamma_k$:

$$e_{k|k-1} = F_k (x_{k-1} - \hat{x}_{k-1|k-1}) + W_k \xi_k, \tag{B.3}$$

$$\gamma_k = H_k e_{k|k-1} - \bar{H}_k \xi_k + \upsilon_k. \tag{B.4}$$

Eqs. (13), (B.1) and (B.4) allow us to rewrite the definition (20) of the posterior estimation error as

$$e_{k|k} = e_{k|k-1} - K_k \gamma_k \tag{B.5}$$

$$\phantom{e_{k|k}} = (I - K_k H_k) e_{k|k-1} + K_k \bar{H}_k \xi_k - K_k \upsilon_k. \tag{B.6}$$

The next two equations are consequences of the whiteness of the noises expressed in Eqs. (10) and (11):

$$E\{e_{k|k-1} \upsilon_k^{\mathrm{T}}\} = 0, \tag{B.7}$$

$$E\{e_{k-1|k-1} \xi_k^{\mathrm{T}}\} = 0. \tag{B.8}$$

From Eqs. (B.3) and (B.4), one has

$$e_{k|k-1} \gamma_k^{\mathrm{T}} = e_{k|k-1} e_{k|k-1}^{\mathrm{T}} H_k^{\mathrm{T}} - (F_k e_{k-1|k-1} + W_k \xi_k)\, \xi_k^{\mathrm{T}} \bar{H}_k^{\mathrm{T}} + e_{k|k-1} \upsilon_k^{\mathrm{T}}. \tag{B.9}$$

Using Eqs. (B.7)–(B.9), we have

$$E\{e_{k|k-1} \gamma_k^{\mathrm{T}}\} = P_{k|k-1} H_k^{\mathrm{T}} - W_k \Xi_k^{\mathrm{T}} \bar{H}_k^{\mathrm{T}}. \tag{B.10}$$

By inserting Eqs. (17) and (B.10) into Eq. (B.2), we find

$$K_k = (P_{k|k-1} H_k^{\mathrm{T}} - W_k \Xi_k^{\mathrm{T}} \bar{H}_k^{\mathrm{T}})\, \Gamma_k^{-1}. \tag{B.11}$$

Eqs. (B.3) and (B.8) and definition (14) lead to the expression for the prior error covariance matrix:

$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^{\mathrm{T}} + W_k \Xi_k W_k^{\mathrm{T}}. \tag{B.12}$$

According to Eqs. (B.5) and (B.6), we have, after observation of the innovation $\gamma_k$,

$$e_{k|k} e_{k|k}^{\mathrm{T}} = e_{k|k} e_{k|k-1}^{\mathrm{T}} - e_{k|k} \gamma_k^{\mathrm{T}} K_k^{\mathrm{T}} \tag{B.13}$$

$$\phantom{e_{k|k} e_{k|k}^{\mathrm{T}}} = (I - K_k H_k) e_{k|k-1} e_{k|k-1}^{\mathrm{T}} + K_k \bar{H}_k \xi_k e_{k|k-1}^{\mathrm{T}} - K_k \upsilon_k e_{k|k-1}^{\mathrm{T}} - e_{k|k} \gamma_k^{\mathrm{T}} K_k^{\mathrm{T}}. \tag{B.14}$$

Using Eqs. (B.3) and (B.8), we have

$$E\{e_{k|k-1} \xi_k^{\mathrm{T}}\} = W_k \Xi_k. \tag{B.15}$$

Eqs. (18), (B.7), (B.14) and (B.15) allow us to compute the posterior error covariance matrix:

$$P_{k|k} = (I - K_k H_k) P_{k|k-1} + K_k \bar{H}_k \Xi_k W_k^{\mathrm{T}} \tag{B.16}$$

$$\phantom{P_{k|k}} = P_{k|k-1} - (P_{k|k-1} H_k^{\mathrm{T}} - W_k \Xi_k^{\mathrm{T}} \bar{H}_k^{\mathrm{T}})\, \Gamma_k^{-1}\, (P_{k|k-1} H_k^{\mathrm{T}} - W_k \Xi_k^{\mathrm{T}} \bar{H}_k^{\mathrm{T}})^{\mathrm{T}}. \tag{B.17}$$

Finally, an expression for the covariance matrix of the innovations can be derived by inserting expression (B.4) into definition (17) and using the results of Eqs. (11), (B.7), (B.12) and (B.15):

$$\Gamma_k = H_k P_{k|k-1} H_k^{\mathrm{T}} + \Upsilon_k + \bar{H}_k \Xi_k \bar{H}_k^{\mathrm{T}} - H_k W_k \Xi_k \bar{H}_k^{\mathrm{T}} - \bar{H}_k \Xi_k W_k^{\mathrm{T}} H_k^{\mathrm{T}} \tag{B.18}$$

$$\phantom{\Gamma_k} = H_k F_k P_{k-1|k-1} F_k^{\mathrm{T}} H_k^{\mathrm{T}} + \Upsilon_k + (H_k W_k - \bar{H}_k)\, \Xi_k\, (H_k W_k - \bar{H}_k)^{\mathrm{T}}. \tag{B.19}$$
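For illustration, a minimal Python sketch (ours) assembling Eqs. (B.11), (B.12), (B.17) and (B.19) into one correction step; it assumes the standard array shapes for F, W, Ξ, H, H̄ and Υ, takes the innovation as y_k − H_k x̂_{k|k−1}, and omits control inputs:

```python
import numpy as np

def desync_update(x_prev, P_prev, y, F, W, Xi, H, Hbar, Upsilon):
    """One step of the desynchronized linear MMSE estimator of Appendix B,
    with process and measurement noises correlated through xi_k."""
    x_pred = F @ x_prev                                   # prior estimate
    P_pred = F @ P_prev @ F.T + W @ Xi @ W.T              # (B.12)
    D = H @ W - Hbar
    Gamma = H @ F @ P_prev @ F.T @ H.T + Upsilon + D @ Xi @ D.T   # (B.19)
    C = P_pred @ H.T - W @ Xi.T @ Hbar.T                  # E{e gamma^T}, (B.10)
    K = C @ np.linalg.inv(Gamma)                          # filter gain, (B.11)
    x_post = x_pred + K @ (y - H @ x_pred)                # (B.1)
    P_post = P_pred - K @ C.T                             # (B.17)
    return x_post, P_post
```

Note that for H̄_k = 0 the cross-terms vanish and the step reduces to the classical Kalman filter update.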
References

[1] O. Bilenne, Fault detection by desynchronized Kalman filtering, introduction to robust estimation, in: Proceedings of the Seventh International Conference on Information Fusion, Stockholm, Sweden, 2004, pp. 99–106.
[2] P.J. Huber, Robust Statistics, John Wiley, New York, USA, 1981.
[3] R.S. Mangoubi, Robust Estimation and Failure Detection, A Concise Treatment, Springer-Verlag, Berlin, Germany, 1998.
[4] I.C. Schick, S.K. Mitter, Robust recursive estimation in the presence of heavy-tailed observation noise, Ann. Statist. 22 (1) (1994) 1045–1080.
[5] P.B. Ober, Integrity prediction and monitoring of navigation systems, Ph.D. Thesis, Technische Universiteit Delft, The Netherlands, 2003.
[6] B.D. Kovacevic, Z.M. Durovic, An adaptive robustizing approach to Kalman filtering, Control Comput. 22 (1) (1994) 7–11.
[7] V. Kadirkamanathan, P. Li, T. Kirubarajan, Sequential Monte Carlo filtering vs. the IMM estimator for fault detection and isolation in nonlinear systems, Proc. SPIE Int. Soc. Opt. Eng. 4389 (2001) 63–274.
[8] C. Andrieu, A. Doucet, S.S. Singh, V.B. Tadic, Particle methods for change detection, system identification, and control, Proc. IEEE 92 (3) (2004) 423–438.
[9] B. Decleene, Defining pseudorange integrity, in: Proceedings of the ION-GPS, Salt Lake City, Utah, 2000.
[10] Y. Bar-Shalom, X. Rong Li, T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software, John Wiley and Sons, 2001.
[11] D.G. Manolakis, V.K. Ingle, S.M. Kogon, Statistical and Adaptive Signal Processing, McGraw-Hill, Boston, MA, 2000.
[12] F. Gustafsson, Adaptive Filtering and Change Detection, John Wiley and Sons, New York, 2000.
[13] J. Rife, S. Pullen, B. Pervan, P. Enge, Paired overbounding and application to GPS augmentation, in: Proceedings of the IEEE Position, Navigation and Location Symposium, Monterey, CA, 2004, pp. 439–446.
[14] S. Kullback, Information Theory and Statistics, John Wiley and Sons, New York, 1959.
[15] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Statist. 22 (1951) 79–86.
[16] D. Peña, I. Guttman, Optimal collapsing of mixture distributions in robust recursive estimation, Commun. Statist. Theory Methods 18 (3) (1989) 817–833.
[17] G. McLachlan, D. Peel, Finite Mixture Models, John Wiley and Sons, New York, 2000.
[18] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion), J. Roy. Statist. Soc. Ser. B 39 (1977) 1–38.
[19] T. Cover, J. Thomas, Elements of Information Theory, John Wiley and Sons, New York, 1991.
[20] Y. Singer, M.K. Warmuth, Batch and on-line parameter estimation of Gaussian mixtures based on the joint entropy, in: Advances in Neural Information Processing Systems, vol. 11, MIT Press, Cambridge, MA, 1998, pp. 578–584.
[21] M. Basseville, I. Nikiforov, Detection of Abrupt Changes: Theory and Application, Information and System Science Series, Prentice-Hall, Englewood Cliffs, NJ, 1993.