Copyright © IFAC Identification and System Parameter Estimation 1982, Washington D.C., USA, 1982
A COMPUTATIONALLY EFFICIENT GLR ALGORITHM FOR DETECTING AND ESTIMATING STATE JUMPS IN LINEAR SYSTEMS*

H. L. Stalford

Practical Sciences, Inc., 40 Long Ridge Road, Carlisle, Massachusetts 01741, USA
Abstract. A generalized likelihood ratio (GLR) algorithm which is computationally efficient is developed for detecting and estimating state jumps in discrete linear stochastic systems. A jump error state variable concept is used to decompose the failure signature matrix into a new failure signature matrix which is independent of the jump time and another matrix which depends only on the jump time. The product of the latter matrix with the jump vector provides a constant jump error state variable. The nondependency of the new failure signature matrix on the jump time and the transformation of the jump error state to a constant provide for the development of a GLR algorithm with considerably reduced computational and storage requirements. The new algorithm uses the recursive least squares approach together with an efficient search technique for locating the maximum of the log likelihood ratio. It avoids much of the updating and storage of large matrices for past observation times.

*Research supported by the U.S. Naval Research Office under NONR N00014-80-C-0775.

Keywords. Stochastic systems; discrete time systems; nonlinear filtering; state jump estimation; Kalman filters; generalized likelihood ratio algorithm.

INTRODUCTION AND PROBLEM STATEMENT
Let x(k) be the n-dimensional state of a discrete linear stochastic system. Consider a Kalman-Bucy filter whose state estimate x̂(k|k) of x(k) undergoes jumps according to

x̂⁺(k|k) = x̂(k|k) - Δx_q δ_qk    (1)

where Δx_q is an independent n-dimensional random variable which is normally distributed with mean Δx̄_q and covariance ΔP_q, and δ_qk is the Kronecker delta. Here and below we let "+" denote the quantity immediately after the impulse jump. Eq. (1) describes a jump in the filter estimate. Let Δx(k) denote the error that is propagated in the filter estimate x̂(k|k) due to the impulse Δx_q at time q.

Let Φ(k,k-1) and H(k) denote the state transition matrix and the measurement matrix, respectively, of the linear system. We denote the m-dimensional measurement vector by z(k). The measurement noise covariance matrix is denoted by R(k), which is bounded positive definite. We let P(k|k), K(k) and V(k) denote the state covariance, the Kalman gain and the predicted measurement residual covariance, respectively, of the above Kalman-Bucy filter; note that these quantities are not changed in the filter by the impulse change in the estimated state. Let x̂₂(k|k) denote the filter's state estimate of x(k) for the case that the impulse jump Δx_q is zero. This estimate satisfies the well-known recursive equation

x̂₂(k|k) = K(k) z(k) + ΔΦ(k,k-1) x̂₂(k-1|k-1)    (2)

where

ΔΦ(k,k-1) = [I - K(k) H(k)] Φ(k,k-1)    (3)

In view of Eqs. (1) and (2), the estimate x̂(k|k) satisfies

x̂(k|k) = K(k) z(k) - Δx_q δ_qk + ΔΦ(k,k-1) x̂(k-1|k-1)    (4)

The estimates x̂(k|k) and x̂₂(k|k) are identical for k < q. Define the propagated error

Δx(k) = x̂₂(k|k) - x̂(k|k)    (5)

where Δx(q) = Δx_q. Using Eqs. (2)-(5) we observe that the error Δx(k) satisfies the recursive equation
Δx(k) = ΔΦ(k,k-1) Δx(k-1),  k > q    (6)
The measurement equation for Δx is obtained by expanding the (a posteriori measurement) residual

Δz(k) = z(k) - H(k) x̂(k|k)    (7)

Using Eq. (5) the expansion results in

Δz(k) = H(k) Δx(k) + Δν(k)    (8)

where Δν(k) is the a posteriori measurement residual for the case of no jump (i.e., Δx_q = 0):

Δν(k) = z(k) - H(k) x̂₂(k|k)    (9)

Recall that this latter residual Δν(k) is a zero-mean white Gaussian sequence with covariance matrix ΔR(k) given by

ΔR(k) = R(k) V⁻¹(k) R(k)    (10)
We refer to the error Δx(k) as the Δ-state variable. The linear equations for the Δ-state system are given by Eqs. (6) and (8). Let Δx̂(k|k) denote the optimal estimate of Δx(k) as given by a Kalman-Bucy Δ-state filter; the initial conditions are Δx̄_q and ΔP_q. We denote the Δ-state covariance by ΔP(k|k). Define

x̃(k|k) = x̂(k|k) + Δx̂(k|k)    (11)

Using Eq. (5) we note that x̃(k|k) is as close to the estimate x̂₂(k|k) as Δx̂(k|k) is to the error Δx(k). The quantity x̃(k|k) is the optimal estimate of the state x(k) for the case when a Kalman-Bucy filter's estimate undergoes impulse jumps described by Eq. (1). Consider the two residuals x̂₂(k|k) - x(k) and Δx̂(k|k) - Δx(k). Since they are uncorrelated at time q, it is easily shown by induction that they are uncorrelated for all k > q. It follows from Eqs. (5) and (11) that the covariance P̃(k|k) of x̃(k|k) is given by

P̃(k|k) = P(k|k) + ΔP(k|k)    (12)
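To make the Δ-state system concrete, the following minimal numpy sketch (all names are illustrative, not from the paper; time-invariant matrices are assumed for brevity) implements Eqs. (3), (6) and (10):

    import numpy as np

    def delta_phi(Phi, H, K):
        # Eq. (3): closed-loop transition matrix of the Delta-state
        return (np.eye(Phi.shape[0]) - K @ H) @ Phi

    def propagate_jump_error(dPhi, dx):
        # Eq. (6): the error injected by the jump propagates homogeneously, k > q
        return dPhi @ dx

    def delta_measurement_cov(R, V):
        # Eq. (10): covariance of the a posteriori residual Delta-nu(k)
        return R @ np.linalg.inv(V) @ R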
Let ΔK(k) denote the Kalman gain for the Δ-state filter. The optimal estimate Δx̂(k|k) satisfies the well-known recursive equation

Δx̂(k|k) = ΔK(k) Δz(k) + [I - ΔK(k) H(k)] ΔΦ(k,k-1) Δx̂(k-1|k-1)    (13)

In view of Eqs. (4), (7), (11) and (13), the recursive equation for x̃(k|k) is given by

x̃(k|k) = K̃(k) z(k) + Φ̃(k,k-1) x̃(k-1|k-1)    (14)

where K̃(k) and Φ̃(k,k-1) are given by

K̃(k) = K(k) + ΔK(k) [I - H(k) K(k)]    (15)

Φ̃(k,k-1) = [I - K̃(k) H(k)] Φ(k,k-1)    (16)

We are now ready to address the problem under investigation. Consider a discrete linear stochastic system whose state undergoes impulse jumps

x⁺(k) = x(k) + Δx_q δ_qk    (17)

Let x̂(q|q) denote the optimal estimate of x(q) before the jump of Eq. (17). Let P(q|q) denote its covariance. Consider a Kalman-Bucy filter with initial covariance P(q|q) and initial state estimate x̂₂⁺(q|q) where

x̂₂⁺(q|q) = x̂(q|q) + Δx_q

We refer to this filter as the "jump is known" filter. We denote its estimate by x̂₂(k|k) for k > q. Now consider another Kalman-Bucy filter with the same initial covariance P(q|q) but with the different initial state estimate x̂⁺(q|q) where

x̂⁺(q|q) = x̂₂⁺(q|q) - Δx_q    (18)

We refer to this filter as the "jump is unknown" filter. We denote its estimate by x̂(k|k) for k > q. Here, as before, we let Δx(k) denote the error that is propagated in the filter due to the error Δx_q. Eq. (1) describes a jump in the filter estimate whereas Eq. (17) describes a jump in the state. We observe from comparing Eq. (18) with Eq. (1) that an impulse jump in the state can be approached and solved as an impulse jump in the filter estimate. Eqs. (11) and (12) show that the optimal estimate x̃ of the state x can be computed by means of a two-stage Kalman-Bucy filtering system. The first stage provides x̂ and the second computes Δx̂.

In the above development of the Δ-state filter, the jump time q is assumed known. Below, q is unknown and we apply the generalized likelihood ratio (GLR) approach to detect and estimate the jump time. In this paper we address the following discrete linear stochastic system with unknown state jumps at unknown times. (The unknown jump time q is a member of a sequence of unknown jump times q_i, i = 1,2,....) The argument (k+1,k) of Φ and the argument (k) of Γ are omitted:

x(k+1) = Φ [x(k) + Δx_q δ_qk] + Γ w(k)    (19)

z(k) = H(k) x(k) + ν(k)    (20)
where w is an r-dimensional process noise vector. The noise sequences w(k) and ν(k) are zero-mean, independent, white Gaussian sequences with covariances Q(k)δ_kj and R(k)δ_kj, respectively. The initial state x(0) is normally distributed with mean x̄(0) and covariance P(0). We assume that the system is observable.
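As a hedged illustration of the system (19)-(20), the following numpy sketch simulates a single jump Δx_q at time q (the function name, the constant-matrix assumption and the seeding are ours, not the paper's):

    import numpy as np

    def simulate(Phi, Gam, H, Q, R, x0, dx_q, q, J, seed=0):
        rng = np.random.default_rng(seed)
        m, r = H.shape[0], Gam.shape[1]
        x, xs, zs = x0.astype(float), [], []
        for k in range(J):
            zs.append(H @ x + rng.multivariate_normal(np.zeros(m), R))  # Eq. (20)
            xs.append(x)
            jump = dx_q if k == q else 0.0
            w = rng.multivariate_normal(np.zeros(r), Q)
            x = Phi @ (x + jump) + Gam @ w                              # Eq. (19)
        return np.array(xs), np.array(zs)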
The problem is to develop a computationally efficient algorithm (using the above two-stage filtering process) for optimally estimating the state x, each state jump Δx_q and each jump time q. As pointed out above, Eqs. (11) and (12) constitute a two-stage filtering process for obtaining the optimal estimates, given that the jump time is known. The first Kalman-Bucy filter processes the measurements as if there were no jump. The residuals of that filter are the measurements for the second Kalman-Bucy filter. The Kalman gain of the first filter is used to define the transition matrix of the state to be estimated by the second filter. Also, the predicted measurement residual covariance of the first filter is used in defining the measurement noise covariance for the measurements of the second filter. Such a two-stage filtering system was first discussed in [1] for estimating a constant bias vector which has zero correlation with the state vector; the extension to the non-zero correlation case is given in [2]. The Friedland separate-bias estimation technique, [1] and [3], is extended in [4] in deriving a three-stage unaugmented filtering technique for estimating piecewise constant controls in discrete linear stochastic systems. The separate-bias estimation technique has been used in deriving algorithms for detecting and estimating failures in linear systems [5]-[7].
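A minimal sketch of one cycle of this two-stage system, assuming numpy, a known jump time, and standard-form gain computations (the paper does not spell these out; all names are illustrative):

    import numpy as np

    def two_stage_step(x, P, dx, dP, z, Phi, H, Qw, R):
        n = len(x)
        # Stage 1: ordinary Kalman-Bucy filter, run as if no jump occurred
        x_p, P_p = Phi @ x, Phi @ P @ Phi.T + Qw       # Qw = Gam Q Gam^T
        V = H @ P_p @ H.T + R                          # predicted residual covariance
        K = P_p @ H.T @ np.linalg.inv(V)
        x = x_p + K @ (z - H @ x_p)
        P = (np.eye(n) - K @ H) @ P_p
        # Its a posteriori residual feeds the second stage, Eqs. (7) and (10)
        dz = z - H @ x
        dR = R @ np.linalg.inv(V) @ R
        # Stage 2: Delta-state filter, Eqs. (3) and (13); no process noise
        dPhi = (np.eye(n) - K @ H) @ Phi
        dx_p, dP_p = dPhi @ dx, dPhi @ dP @ dPhi.T
        dK = dP_p @ H.T @ np.linalg.inv(H @ dP_p @ H.T + dR)
        dx = dx_p + dK @ (dz - H @ dx_p)
        dP = (np.eye(n) - dK @ H) @ dP_p
        # Eqs. (11)-(12): combined optimal estimate and covariance
        return x + dx, P + dP, x, P, dx, dP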
COMPUTATIONAL BURDEN OF ESTIMATING THE JUMP TIME

One of the most attractive and promising methods for detecting and estimating jumps in linear stochastic systems is the GLR method, [8]-[9]. The GLR method processes the residuals from a Kalman-Bucy filter (e.g., the residual Δz(k)) and computes the maximum likelihood estimates (MLE) of the jump time and the jump magnitude. Using these estimates it evaluates the log-likelihood ratio for jump versus no jump. A jump is declared if the evaluated ratio is larger than a set threshold. The implementation of the GLR requires a linearly growing bank of filters in order to compute the MLE of the jump time, [10]. The search for the jump time is usually restricted to a moving window of size M, i.e., to the M most recent past data points. A bank of M filters is needed for the moving-window case. The GLR estimate of the jump time is the time that maximizes the log likelihood ratio (LLR) scalar function. We let ℓ(k;i) denote this function, where k is the current time and i belongs to the moving window. This function is given by

ℓ(k;i) = Δx̂_iᵀ(k|k) ΔP_i⁻¹(k|k) Δx̂_i(k|k)    (21)

where Δx̂_i(k|k) and ΔP_i(k|k) are the output at time k of a Δ-state filter under the assumption that the jump took place at time i. Executing M such filters at each point k of a data set of J points requires a large number of multiplications. For example, let M=50, n=10, m=3 and J=500. The number of multiplications required to compute the predicted measurement residual covariances ΔV, the Kalman gains ΔK and the updates of the state covariances ΔP over the moving window and over the entire data set is greater than 4Mn²mJ/2, which is 15 million. The number needed to propagate the estimated state and its covariance between observations for the filters of the moving window over the entire data set is greater than Mn²(n+1)J/2, which is about 14 million. The above two computational requirements are reduced to about two million by (1) employing a particular decomposition of the failure signature matrix (defined below), (2) using the recursive least squares approach, [11], to define the LLR function, and (3) instituting the search procedure of the Appendix, which uses only N evaluations of the LLR function in locating its maximum, where N is given by

N = 1 + 2 Ln(M-1)/Ln(2)    (22)

Here, Ln is the natural log function. For M=50 we have N=12.
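For reference, Eq. (22) is trivial to evaluate (the rounding to an integer count is our assumption; the paper states only the formula):

    import math

    def n_evaluations(M):
        # Eq. (22): LLR evaluations used by the search procedure of the Appendix
        return round(1 + 2 * math.log(M - 1) / math.log(2))

    # n_evaluations(50) -> 12, n_evaluations(1025) -> 21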
DECOMPOSITION OF FAILURE SIGNATURE MATRIX

We transform the time-varying Δ-state Δx into a constant Δ-state vector Δy by defining an n×n matrix Ψ(k), all k, as

Ψ(k) = ΔΦ(k,k-1) Ψ(k-1), k > 0;  Ψ(0) = I    (23)

The matrix Ψ(k) is positive definite for all k; it has inverse Ψ⁻¹(k). The Δ-state equation (6) can be rewritten as

Δx(k) = Ψ(k) Ψ⁻¹(k-1) Δx(k-1),  k > q    (24)

We define a new Δ-state variable Δy as

Δy(k) = Ψ⁻¹(k) Δx(k)    (25)

The transformation of the Δ-state from Δx to Δy results in the constant Δ-state equation

Δy(k) = Δy(k-1),  k > q    (26)

In view of Eq. (25), the measurement Eq. (8) becomes

Δz(k) = ΔH(k) Δy(k) + Δν(k)    (27)

where the observation matrix is

ΔH(k) = H(k) Ψ(k)    (28)

Equations (26) and (27) define the linear Δ-system for the Δ-state Δy. Eq. (25) gives the transformation back to the Δx state.
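The constancy of Δy is easy to verify numerically. A numpy sketch with randomly generated time-varying gains and measurement matrices (illustrative stand-ins only):

    import numpy as np

    rng = np.random.default_rng(1)
    n, m, T, q = 3, 2, 8, 3
    Phi = np.eye(n) + 0.3 * rng.standard_normal((n, n))

    Psi, dx = np.eye(n), None                       # Eq. (23): Psi(0) = I
    dx_q = rng.standard_normal(n)
    for k in range(1, T):
        K = 0.1 * rng.standard_normal((n, m))       # stand-in filter gain K(k)
        H = rng.standard_normal((m, n))             # stand-in measurement matrix H(k)
        dPhi = (np.eye(n) - K @ H) @ Phi            # Eq. (3)
        Psi = dPhi @ Psi                            # Eq. (23)
        if k == q:
            dx = dx_q                               # Delta x(q) = Delta x_q
        elif k > q:
            dx = dPhi @ dx                          # Eq. (6)
        if k >= q:
            dy = np.linalg.solve(Psi, dx)           # Eq. (25)
            dH = H @ Psi                            # Eq. (28): transformed observation matrix
            print(k, dy)                            # same vector for every k >= q, Eq. (26)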
Eq. (8) appears in [9] and [10] in the a priori measurement form

γ(k) = G(k;p) Δx_p + γ₂(k),  k ≥ p    (29)

where p = q+1, Δx_p = Φ(p,q) Δx_q, G(k;p) is the failure signature matrix, and the residuals γ(k) and γ₂(k) are the predicted measurement residuals

γ(k) = z(k) - H(k) x̂(k|k-1)    (30)

γ₂(k) = z(k) - H(k) x̂₂(k|k-1)    (31)

Eqs. (30) and (31) compare with Eqs. (7) and (9). In our terminology G(k;p) is given by

G(k;p) = H(k) Φ(k,k-1) ΔΦ(k-1,p) [I - K(p) H(p)]    (32)

where ΔΦ(k-1,p) is the product ΔΦ(k-1,k-2) ··· ΔΦ(p+1,p).
The matrix G(k;p) depends both on the current time and on the jump time q = p-1. We decompose it into a product of two matrices. One matrix is independent of the jump time and the other depends only on the jump time. Define the matrices G_i(k), i = 1,2, as

G₁(k) = H(k) Φ(k,k-1) Ψ(k-1)    (33)

G₂(p) = Ψ⁻¹(p) [I - K(p) H(p)]    (34)

We have the following decomposition for G(k;p):

G(k;p) = G₁(k) G₂(p)    (35)

Equation (29) becomes

γ(k) = G₁(k) G₂(p) Δx_p + γ₂(k)    (36)

But note that

G₂(p) Δx_p = Δy(q)    (37)

Therefore, Eq. (29) can be rewritten as

γ(k) = G₁(k) Δy(q) + γ₂(k)    (38)

Eq. (38) is similar to Eq. (27). One is formulated in terms of the a priori (predicted) measurement residuals and the other in terms of the a posteriori measurement residuals. The transformation given by Eq. (25) leads to the decomposition in Eq. (35). Note the similarity between Eqs. (28) and (33). Using Eqs. (25)-(28) we observe that the failure signature matrix which we have decomposed in Eq. (27) is G*(k;q). That is,

Δz(k) = G*(k;q) Δx_q + Δν(k)    (39)

where

G*(k;q) = H(k) Ψ(k) Ψ⁻¹(q)    (40)

which compares with Eq. (32).
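The decomposition (35) can likewise be checked numerically (numpy sketch, random time-varying matrices; illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    n, m, T, p, k = 3, 2, 10, 4, 9
    I = np.eye(n)
    Phi  = [I + 0.2 * rng.standard_normal((n, n)) for _ in range(T)]  # Phi(j,j-1)
    H    = [rng.standard_normal((m, n)) for _ in range(T)]
    K    = [0.1 * rng.standard_normal((n, m)) for _ in range(T)]
    dPhi = [(I - K[j] @ H[j]) @ Phi[j] for j in range(T)]             # Eq. (3)

    Psi = [I]                                                         # Eq. (23)
    for j in range(1, T):
        Psi.append(dPhi[j] @ Psi[j - 1])

    prod = I                        # dPhi(k-1,p) = dPhi(k-1,k-2) ... dPhi(p+1,p)
    for j in range(p + 1, k):
        prod = dPhi[j] @ prod
    G  = H[k] @ Phi[k] @ prod @ (I - K[p] @ H[p])                     # Eq. (32)
    G1 = H[k] @ Phi[k] @ Psi[k - 1]                                   # Eq. (33)
    G2 = np.linalg.inv(Psi[p]) @ (I - K[p] @ H[p])                    # Eq. (34)
    print(np.allclose(G, G1 @ G2))                                    # Eq. (35): True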
Consider the computational savings reaped by the decomposition (27). First, we have a constant state equation, Eq. (26). Second, we do not compute the matrix product Ψ(k) Ψ⁻¹(i), as one must in Eq. (40), for each i in the moving window. Consequently, we save the 14 million multiplications of the above example (M=50, n=10, m=3 and J=500), since the propagation equations are identities for constant state equations without process noise. However, we do have to compute Ψ(k) for each k in the data set. This requires n³J multiplications, which is 0.5 million for the example. We show below that we can also save the 15 million of the above example.
COMPUTATIONALLY EFFICIENT GLR TECHNIQUE

We present a technique for estimating the jump time. This technique is computationally more efficient than operating M filters. We now set up the equations for the recursive least squares approach, [11]. The following matrices C(k;0) and d(k;0) are computed recursively at each time k:

C(k;0) = C(k-1;0) + ΔHᵀ(k) ΔR⁻¹(k) ΔH(k)    (41)

d(k;0) = d(k-1;0) + ΔHᵀ(k) ΔR⁻¹(k) Δz(k)    (42)

where Δz, ΔR and ΔH are given by Eqs. (7), (10) and (28). The initial conditions C(0;0) and d(0;0) are zero matrices. The matrices C(k;0) and d(k;0) are stored at each time k.

It is unnecessary to check for a jump at each time k. It is frugal to examine the consistency between the residuals Δz(k) and their statistics hypothesized on no jump. The covariance, conditioned on no jump, of the residual Δz(k) is ΔR(k). One may use the consistency test of [12], that of comparing mean square values of the residuals with ΔR(k). We only check for a jump if the consistency test fails.

Assume that the consistency test fails at time k. We apply the search procedure described in the Appendix for locating the maximum of the LLR function. That procedure requires at most N evaluations of the LLR function, where N is given by Eq. (22). Let i be a time for which Eq. (21) is evaluated in the search procedure. Define

C(k;i) = C(k;0) - C(i;0) + ΔP₁⁻¹(i)    (43)

d(k;i) = d(k;0) - d(i;0) + ΔP₁⁻¹(i) Δȳ_i    (44)

where

Δȳ_i = Ψ⁻¹(i) Δx̄_i    (45)

ΔP₁(i) = Ψ⁻¹(i) ΔP_i Ψ⁻ᵀ(i)    (46)
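A sketch of the recursions (41)-(46) follows (numpy; the per-step inputs ΔH(k), ΔR(k) and Δz(k) come from Eqs. (28), (10) and (7); class and method names are ours):

    import numpy as np

    class InformationAccumulator:
        # Running pair (C(k;0), d(k;0)) of Eqs. (41)-(42), stored at each k.
        def __init__(self, n):
            self.C, self.d = np.zeros((n, n)), np.zeros(n)
            self.history = [(self.C, self.d)]

        def update(self, dH, dR, dz):
            Ri = np.linalg.inv(dR)
            self.C = self.C + dH.T @ Ri @ dH           # Eq. (41)
            self.d = self.d + dH.T @ Ri @ dz           # Eq. (42)
            self.history.append((self.C, self.d))

        def hypothesis(self, i, dP1_inv, dy_bar):
            # Eqs. (43)-(44): information pair for a jump hypothesized at time i
            Ci, di = self.history[i]
            return self.C - Ci + dP1_inv, self.d - di + dP1_inv @ dy_bar

    def prior_in_y(Psi_i, dx_bar, dP):
        # Eqs. (45)-(46): jump prior mapped into Delta-y coordinates
        Pinv = np.linalg.inv(Psi_i)
        return Pinv @ dx_bar, Pinv @ dP @ Pinv.T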
The quantities Δx̄_i and ΔP_i refer to the initial conditions Δx̄_q and ΔP_q of the jump. If these are constant over the data set then Δx̄_q and ΔP_q appear in Eqs. (45) and (46) rather than the time-varying quantities. The estimate Δŷ_i(k|k) that would be obtained by operating the Δy Δ-state filter (starting at time i) is identical to the estimate Δy(k;i) which is a solution of the equation

C(k;i) Δy(k;i) = d(k;i)    (47)

The value of Eq. (21) is identical to

ℓ(k;i) = dᵀ(k;i) Δy(k;i)    (48)

Note that the matrix C(k;i) is the information matrix and is the inverse of ΔP_i(k|k). Let q̂ be the i that maximizes Eq. (48) during the employment of the search procedure. A jump is declared at time k for the jump time q̂ provided

ℓ(k;q̂) > c    (49)

where the value c is chosen to provide a reasonable tradeoff between false and missed alarms. It is only when Eq. (49) holds that the inverse C⁻¹(k;q̂) is computed; it is never computed otherwise. If Eq. (49) holds we make the definitions

ΔP(k|k) = Ψ(k) C⁻¹(k;q̂) Ψᵀ(k)    (50)

Δx̂(k|k) = Ψ(k) Δy(k;q̂)    (51)
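Continuing the sketch, Eqs. (47)-(51) amount to one linear solve per hypothesized jump time and a single matrix inverse only on detection (acc is the accumulator above; dP1_inv and dy_bar are here taken constant over the data set, as discussed above):

    import numpy as np

    def llr(acc, i, dP1_inv, dy_bar):
        C, d = acc.hypothesis(i, dP1_inv, dy_bar)
        dy = np.linalg.solve(C, d)                 # Eq. (47): no explicit inverse
        return d @ dy, dy, C                       # Eq. (48)

    def detect(acc, search_times, dP1_inv, dy_bar, c, Psi_k):
        scored = [llr(acc, i, dP1_inv, dy_bar) + (i,) for i in search_times]
        ell, dy, C, q_hat = max(scored, key=lambda t: t[0])
        if ell <= c:                               # Eq. (49) fails: no jump declared
            return None
        dP = Psi_k @ np.linalg.inv(C) @ Psi_k.T    # Eq. (50): inverse computed only now
        dx = Psi_k @ dy                            # Eq. (51)
        return q_hat, dx, dP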
The optimal estimate of the state x(k) is given by

x̃(k|k) = x̂(k|k) + Ψ(k) Δy(k;q̂)    (52)

with covariance

P̃(k|k) = P(k|k) + ΔP(k|k)    (53)

At this point k we can either (1) compute the optimal estimates using (52) and (53) and reinitialize the x-filter using P⁺(k|k) = P̃(k|k) and x̂⁺(k|k) = x̃(k|k), or (2) initialize a Δy-filter with the initial conditions Δŷ(k|k) = Δy(k;q̂) and ΔP_y(k|k) = C⁻¹(k;q̂) and then operate the Δy-filter over a few data points before reinitializing the x-filter. Option (2) allows additional time for a false alarm to be recognized before the irreversible process of reinitializing the x-filter is carried out.

We examine the computations needed to execute the above technique. Consider the example: M=50, n=10, m=3 and J=500. Also assume that the consistency test fails 20 times and that the threshold in Eq. (49) is exceeded 10 times. Eqs. (41) and (42) are computed 500 times.
The number of multiplications required to compute Eqs. (41) and (42) over the data set is less than 0.3 million. Eqs. (43)-(48) are computed fewer than 20N times. Recall that N=12 for M=50. The requirement for Eqs. (47) and (48) is less than 0.1 million for the data set. In many applications the initial condition Δx̄_i is zero and ΔP_i is taken as infinity. For such cases no multiplications are required in computing Eqs. (43) and (44). For the case that ΔP_i⁻¹ and Δx̄_i are nonzero, Eqs. (43)-(46) require at most 2n²(n+1) multiplications; this results in a requirement of about 0.5 million multiplications over the data set. If, in addition, we had to compute the inverse ΔP_i⁻¹ from ΔP_i there would be an additional requirement of 0.4 million. The requirement for computing C⁻¹(k;q̂) is insignificant since it is computed only 10 times. In summary, fewer than 1.5 million multiplications are required for the entire data set, even in the extreme case that ΔP_i⁻¹ must be computed at each evaluation of the LLR function. This is considerably less than the 15 million needed to update ΔP_i(k|k) and compute ΔV_i(k) and ΔK_i(k) for each i in the moving window of M Kalman-Bucy filters over the entire data set.

APPENDIX

The following procedure provides a means of estimating the jump time by evaluating Eq. (21) or Eq. (48) no more than N times, where N is given by Eq. (22). We take advantage of an interesting property of the log likelihood ratio (LLR) function. We say that a scalar function is a "G" type function if it has a global maximum, is monotonically increasing on the left side of the maximum, and is monotonically decreasing on the right side. The LLR function is a "G" type function which is superimposed with noise.

Define the function K(n) = (M-1)/2ⁿ for all integers n. Let Δt be the time interval between time points. Let a(1) and c(1) be the left and right endpoints, respectively, of the moving window of size M, and let b(1) be the midpoint. (Here, and in what follows, when one of the a, b, c's falls in between two time points it is immaterial whether it is set to the larger or the smaller of the two times.) We evaluate Eq. (48) at each of the points a(1), b(1) and c(1). In general, for n = 2,3,..., define b(n) to be the maximizing one of the three points a(n-1), b(n-1) and c(n-1). Also, define a(n) = b(n) - K(n)Δt and c(n) = b(n) + K(n)Δt, and evaluate Eq. (48) at a(n) and c(n). (If any of the points a(n) and c(n), n = 2,3,..., fall outside the moving window, the value of Eq. (48) there is defined to be minus infinity.) Let j be the smallest integer such that K(j) ≤ 1; the procedure terminates after iteration j.
For a small window such as M=21 the procedure evaluates Eq. (48) at fifty percent of the closest surrounding twenty points. But note that N=21 for M=1025.
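The procedure can be sketched as follows (integer time points, Δt = 1; f is ℓ(k;·) of Eq. (48); the termination rule K(j) ≤ 1 and the rounding of midpoints are our reading of the procedure):

    def g_search(f, left, M):
        # Locate the maximizer of a noisy "G"-type function over the window
        # [left, left+M-1] using roughly 1 + 2*Ln(M-1)/Ln(2) evaluations, Eq. (22).
        right, cache = left + M - 1, {}
        def ev(t):
            if t < left or t > right:
                return float("-inf")            # outside the moving window
            if t not in cache:
                cache[t] = f(t)
            return cache[t]
        b = (left + right) // 2                 # b(1); a(1), c(1) are the endpoints
        pts = [left, b, right]
        for t in pts:
            ev(t)
        spacing = (M - 1) / 2                   # K(n) = (M-1)/2**n
        while spacing > 1:                      # iterate until K(j) <= 1
            b = max(pts, key=ev)                # maximizing point so far
            spacing /= 2
            pts = [round(b - spacing), b, round(b + spacing)]
            ev(pts[0]); ev(pts[2])
        return max(cache, key=cache.get)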
REFERENCES

[1] Friedland, B. (1969). Treatment of bias in recursive filtering. IEEE Trans. Autom. Control, AC-14, 4, 359-367.

[2] Ignagni, M. B. (1981). An alternate derivation and extension of Friedland's two-stage Kalman estimator. IEEE Trans. Autom. Control, AC-26, 3, 746-750.

[3] Stalford, H. L. (1982). A Friedland-like filtering technique for estimating piecewise constant controls in discrete linear stochastic systems. Proc. 1982 American Control Conference, Arlington, Virginia.

[4] Friedland, B. (1969). Treatment of bias in recursive filtering. IEEE Trans. Autom. Control, AC-14, 4, 359-367.

[5] Friedland, B. and S. M. Grabousky. (1980). On detecting sudden changes (failures) of biases in linear dynamic systems. Proc. 19th IEEE Conf. on Decision & Control, 1, 1043.

[6] Chang, C. B. and K. P. Dunn. (1979). On GLR detection and estimation of unexpected inputs in linear discrete systems. IEEE Trans. Autom. Control, AC-24, 3, 499-501.

[7] Caglayan, A. K. (1980). Simultaneous failure detection and estimation in linear systems. Proc. 19th IEEE Conf. on Decision & Control, 3, 1038-1041.

[8] McAulay, R. J. and E. Denlinger. (1973). A decision-directed adaptive tracker. IEEE Trans. Aerospace and Electronic Systems, AES-9, 229-236.

[9] Willsky, A. S. and H. L. Jones. (1976). A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems. IEEE Trans. Autom. Control, AC-21, 108-112.

[10] Jazwinski, A. H. (1970). Stochastic Processes and Filtering Theory. Academic Press, New York, pp. 205-207.

[11] Jazwinski, A. H. (1969). Adaptive filtering. Automatica, 5, 475-485.