Analytica Chimica Acta, 162 (1984) 253--262
Elsevier Science Publishers B.V., Amsterdam -- Printed in The Netherlands
A KALMAN FILTER FOR CALIBRATION, EVALUATION OF UNKNOWN SAMPLES AND QUALITY CONTROL IN DRIFTING SYSTEMS
Part 2. Optimal Designs
P. C. THIJSSEN
University of Nijmegen, Faculty of Sciences, Department of Analytical Chemistry, Toernooiveld, 6525 ED Nijmegen (The Netherlands), and University of Amsterdam, Laboratory for Analytical Chemistry, Amsterdam (The Netherlands)
(Received 27th March 1984)
SUMMARY

It is well known that the optimal design for linear calibration is found at the extremes of the concentration range. For the evaluation of unknown samples, the precision is maximal at the mean value of the calibration standards. If the static model is replaced by one that assumes stochastic behaviour of the parameters, extended to drift, only minor differences from these generalizations are obtained. The intelligent analyzer described here integrates on-line procedures for calibration, evaluation, quality control and optimization.
In order to evaluate unknown samples, it is usually necessary to calibrate an analytical instrument. Little attention has been paid to the arrangement of the calibration standards, the usual practice being to distribute the experimental design uniformly. Only if a particular model can be assumed from prior information can results on suitable designs be obtained. For a linear calibration graph, the optimal design comprises replicate standards at two points, which are chosen such that they occur at the extremes of the limited concentration range [1]. For a quadratic graph, the centre of the working range should also be included [2]. More generally, when a polynomial model is used, the optimized calibration pattern is found by taking as many uniformly distributed points as there are parameters [3]. This is analogous to the practice when no model can be assumed from prior information. An extensive review on optimization of regression designs is available [4].

In the analytical literature, various criteria have been investigated for the optimization procedure. Aarons [2] maximized the final entropy (i.e., the determinant of the covariance matrix), which is directly related to the precision of the estimated parameters. Identical results are obtained by using the imprecision function of an evaluated sample as criterion. Eckschlager [5] introduced this function in information theory in order to derive some practical relations, such as the effect of the sensitivity, concentration range
© 1984 Elsevier Science Publishers B.V.
or noise variance. For the evaluation of unknown analytes, a maximal amount of information is located at the mean value of the preliminary calibration standards, which is of importance for the choice of the concentration range. Recently, a recursive formulation was suggested for the optimal design computation in least-squares problems with information theory [6]. A similar formulation may be derived for the Kalman filter based on a state-space model. Here, the information yield is divided into two parts: a loss of information caused by the prediction and a gain of information as a result of the filtering. For optimal performance of the Kalman filter, the information yield has to be maximized in the measurement sequence. When the theory proposed in this paper was applied to a calibration problem, i.e., a drifting system involving dynamic and system noise [7], only small differences were obtained compared with the traditional results for a linear calibration graph. Although only the calibration problem is investigated, the theoretical framework presented should combine rules-of-thumb on optimization into organized knowledge for a variety of analytical problems, such as sampling and process control.

THEORY

Problem statement
A linear dynamic system is described by the stochastic difference equations

x(k) = F(k)x(k - 1) + w(k - 1)    (1)

z(k) = h^t(k)x(k) + v(k)    (2)
where z(k) is the measurement and x(k) is the n-dimensional state vector. The n-dimensional system noise vector w(k - 1) and the scalar measurement noise v(k) are defined as gaussian white-noise sequences with zero means and covariance matrices Q(k - 1) and R(k), respectively. The state x(k) is normally distributed and independent of the noise, initialized by x(0) with mean x̂(0/0) and covariance P(0/0). Moreover, the n × n transition matrix F(k) and the 1 × n measurement vector h^t(k) have to be known deterministically.

When the state model (Eqns. 1, 2) and the noise covariances Q(k - 1), R(k) are assumed to be known, the filtered estimates x̂(k/k) may be determined by the Kalman filter

x̂(k/k - 1) = F(k)x̂(k - 1/k - 1)    (3)

P(k/k - 1) = F(k)P(k - 1/k - 1)F^t(k) + Q(k - 1)    (4)

x̂(k/k) = x̂(k/k - 1) + k(k){z(k) - h^t(k)x̂(k/k - 1)}    (5)

P(k/k) = P(k/k - 1) - k(k)h^t(k)P(k/k - 1)    (6)

k(k) = P(k/k - 1)h(k){h^t(k)P(k/k - 1)h(k) + R(k)}^-1    (7)
where x̂(k/k - 1) is the estimate of x(k) given the measurements z(1), z(2), ..., z(k - 1); k(k) is the n × 1 filter gain vector. P(k/k - 1) and P(k/k) are the error covariance matrices of the difference between x(k) and x̂(k/k - 1), and between x(k) and x̂(k/k), respectively, providing statistical measures of the uncertainty in the predicted or estimated state. The covariance matrix P(k/k) is affected by the experimental design, in particular the transition matrix F(k), measurement vector h(k) and the noise covariances Q(k - 1) and R(k). The problem of interest is how to select optimal elements in these terms in order to acquire the best performance of the Kalman filter.

Information theory
For a continuous variable x with a multidimensional probability density p(x), the entropy is defined as

H = -∫ p(x) ld{p(x)} dx    (8)
where dx stands for the volume element. Assuming linear state dynamics and gaussian distributions for the noise terms and the state, the resulting probability density function for the Kalman filter is also gaussian with mean x̂(k/k) and covariance matrix P(k/k) [8]. As a result, the entropy of the n-dimensional normal density is

H(k) = 1/2 ld{(2πe)^n |P(k/k)|}    (9)
The information yield I(k) = H(k - 1) - H(k) is defined as a change in entropy, obtained by processing the next measurement in the sequence. The total information yield Itot(k) = H(0) - H(k), acquired from the start (k = 0) up to the measurement k, is the summation of all individual information yields, or Itot(k) = Σ(i=1..k) I(i). Straightforward formulation produces for the information yield
I(k) = 1/2 ld{|P(k - 1/k - 1)| / |P(k/k)|}
     = 1/2 ld{|P(k - 1/k - 1)| / |F(k)P(k - 1/k - 1)F^t(k) + Q(k - 1)|} + 1/2 ld{h^t(k)P(k/k - 1)h(k)/R(k) + 1}
     = Ipre(k) + Ifil(k)    (10)
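Equations 3--7 and the split of Eqn. 10 translate directly into code. The following is a minimal NumPy sketch for a scalar measurement; the function names and the test values are mine, not from the paper:

```python
import numpy as np

def predict(x_est, P_est, F, Q):
    """Prediction step (Eqns. 3, 4): propagate estimate and covariance."""
    return F @ x_est, F @ P_est @ F.T + Q

def filter_step(x_pred, P_pred, h, z, R):
    """Filtering step (Eqns. 5-7) for a scalar measurement z."""
    gain = P_pred @ h / (h @ P_pred @ h + R)      # Eqn. 7
    x_est = x_pred + gain * (z - h @ x_pred)      # Eqn. 5
    P_est = P_pred - np.outer(gain, h) @ P_pred   # Eqn. 6
    return x_est, P_est

def information_yield(P_prev, P_pred, h, R):
    """Eqn. 10: I(k) = Ipre(k) + Ifil(k); ld is the base-2 logarithm."""
    _, ld_prev = np.linalg.slogdet(P_prev)
    _, ld_pred = np.linalg.slogdet(P_pred)
    I_pre = 0.5 * (ld_prev - ld_pred) / np.log(2.0)   # loss from prediction
    I_fil = 0.5 * np.log2(h @ P_pred @ h / R + 1.0)   # gain from filtering
    return I_pre + I_fil
```

By the matrix determinant lemma, |P(k/k)| = |P(k/k - 1)| R/(h^t P h + R), so the sum Ipre + Ifil computed here equals the direct definition 1/2 ld{|P(k - 1/k - 1)|/|P(k/k)|} without evaluating the filtered determinant.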
The information yield of the Kalman filter separates into two parts: one from the prediction and another resulting from the filtering equations. The system noise covariance Q(k - 1) has a direct bearing on the magnitude of the covariance matrix. Less obvious is the effect of the dynamics, i.e., the transition matrix F(k), on covariance behaviour. Usually the covariance matrix, or the uncertainty in the state, is increased by the prediction equations (Eqns. 3, 4). The corresponding loss of information is represented by Ipre(k). If the state is not measured, the covariance matrix increases to
infinity. Consequently, the total information yield Itot(k) and also the information loss Ipre(k) will tend to zero. The filter equations (Eqns. 5--7) reduce the uncertainty gradually by means of the information Ifil(k) supplied by the new measurement. This term weights the covariance matrix P(k/k - 1) with the measurement vector h(k) and noise variance R(k) and should be positive. It may be noted that this part does not involve the calculation of determinants. If the system noise covariances are not zero, the covariance matrix reaches a steady state where an equilibrium is achieved between the information loss Ipre(k) and the information gain Ifil(k). For every measurement processed, less information I(k) is gained and the total information Itot(k) will converge to a fixed level. Finally, no more information can be obtained by the Kalman filter.

The optimal design problem is essentially a sequence of maximized experimental choices with the information yield as a criterion. As a result, the total information yield will also be maximized [6]. For optimal performance, the elements of the noise covariances Q(k - 1) and R(k) should be minimized in order to obtain maximal information Itot(k). If R(k) equals zero, an infinite information yield I(k) is acquired. For a zero system noise covariance matrix, the total information yield Itot(k) which can be obtained in the sequence is infinite. In statistical terms, this means that the Kalman filter knows the state x(k) exactly. More complicated are the effects of the elements in the transition matrix F(k) and measurement vector h(k), which depend strongly on the problem investigated. Questions of observability and stability of the model involved have to be considered first, to guarantee the existence and uniqueness of an optimal solution. It should be noted that the model structure used embodies in principle all intuitive rules on optimization in estimation problems.
As an example, this will be illustrated for the drifting calibration system.
A drifting calibration system

For a linear calibration graph with stochastic drifting parameters, a discrete linear dynamic system (Eqns. 1, 2) has been derived [7]

( a(k) )   ( 1 0 1 0 ) ( a(k - 1) )   ( w1(k - 1) )
( b(k) ) = ( 0 1 0 1 ) ( b(k - 1) ) + ( w2(k - 1) )    (11)
( α(k) )   ( 0 0 1 0 ) ( α(k - 1) )   ( w3(k - 1) )
( β(k) )   ( 0 0 0 1 ) ( β(k - 1) )   ( w4(k - 1) )

z(k) = (c, 1, 0, 0) (a(k), b(k), α(k), β(k))^t + v(k)

The state of the system is determined by measuring standards with known concentrations, which are processed by the Kalman filter. The predicted state
is employed for the evaluation of unknown analytes. After each measurement, a quality-control algorithm decides whether to calibrate again or to process the next unknown. The criterion used is based on a preselected precision of the expected results for the evaluation. The decision to recalibrate is followed by selecting which one of the available concentration standards gives the best performance of the Kalman filter.

If the noise covariances are assumed to be constant, the concentration c is the only variable that can be manipulated experimentally. This variable is involved in the filtering part but not in the prediction part of the information yield. Only the information yield Ifil(k) has to be maximized for an optimal selection, because Ipre(k) is constant. If the dual logarithm is omitted and R(k) is assumed to be constant over the entire graph, the decision on which concentration should be used to calibrate the system is given by

maximize h^t(k)P(k/k - 1)h(k)/R(k) + 1    (12)
or

maximize c²p11/R + 2cp12/R + p22/R + 1

where pij is the (i, j)th element of the matrix P. This expression has a minimum at c = -p12/p11. Because of the limited concentration range, the optimal position in the calibration graph is always found at one of the extremes. Which of the extremes is chosen depends on the covariance matrix P(k/k - 1), which is influenced by the elements of the system noise covariance Q(k - 1). If the matrix Q(k - 1) equals the zero matrix, the highest and lowest concentrations will be repeated. The effect of the concentration range on the covariance matrix is difficult to show. However, if the optimal positions are found at the extremes of the working range, there may still be better concentrations outside this range. Consequently, the concentration range of the calibration standards should be maximized. In summary: Ifil(k) is increased if the noise variance R(k) decreases; Ifil(k) is increased if the concentration range increases; and Ifil(k) is maximal at one of the extremes of the working range.

For the imprecision of the evaluated unknowns ĉ(k) = [z(k) - b̂(k)]/â(k), it is possible to derive the corresponding information yield Iun(k)

Iun(k) = 1/2 ld{var(c0)/var[ĉ(k)]} = 1/2 ld{var(c0)â(k)² / {h^t(k)P(k/k - 1)h(k) + R(k)}}    (13)
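The selection rule of Eqn. 12 reduces to comparing a quadratic in c over the available standards. A hedged sketch, assuming the state ordering [a, b, α, β] of Eqn. 11 so that p11, p12, p22 are the slope/intercept (co)variances (the function name is mine):

```python
import numpy as np

def best_standard(P_pred, standards, R):
    """Pick the calibration concentration maximizing the filtering
    information of Eqn. 12, i.e. c^2*p11 + 2c*p12 + p22 (dividing by the
    constant R does not change the ranking but is kept for fidelity)."""
    p11, p12, p22 = P_pred[0, 0], P_pred[0, 1], P_pred[1, 1]
    return max(standards, key=lambda c: (c * c * p11 + 2.0 * c * p12 + p22) / R)
```

Because the quadratic opens upward with its minimum at c = -p12/p11, the maximum over a bounded set of standards always falls on one of the extremes, as stated in the text.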
with var(c0) equal to the variance of the evaluation before the measurement is processed. If nothing is known in advance, var(c0) should be set to 1 initially. If the evaluation of the sample is based on more measurements, var(c0) is the precomputed variance.

From Eqn. 13 it follows that: Iun(k) increases with increasing sensitivity â(k); Iun(k) increases with decreasing noise variance R(k); Iun(k) increases if the concentration range is increased; and Iun(k) has a maximum somewhere within the concentration range. The maximum is located at the mean value of the concentration standards used, if system noise and drift are not involved. Further, the resulting variance becomes smaller for each additional measurement used for the evaluation of an unknown sample. This means that the total information concerning the sample increases. Because of the dynamics and system noise involved, the covariance matrix in the sequence increases and the information acquired becomes smaller. However, the information gained should be at least zero. All these results differ only slightly from the traditional conclusions on optimal designs in calibration problems. For higher-order calibration graphs, similar results can be obtained. In the Appendix, some equations are given for a linear calibration graph without drift and system noise.

RESULTS
The applicability of the theory presented here is demonstrated on the same simulated example as employed earlier [7]. A Kalman filter was used to process the calibration measurements. The unknowns were evaluated by means of the predicted state of the filter. The quality-control algorithm was implemented to decide when to re-calibrate and thereafter to select the concentration standard giving the best performance, by means of the optimization algorithm. For the optimization procedure, the system may choose freely from the available concentration standards 1, 2, 3, 4 and 5.

Figure 1 shows the situation where the samples are used either for calibration or for evaluation by means of the quality-control algorithm. The critical imprecision Ncrit = 5% was used. Figure 1(a) depicts the measurements including drift and noise, and Fig. 1(b) shows the concentrations. The measurements and the concentrations are the data used by the Kalman filter to estimate and predict the state of the drifting system (Fig. 1c). The performance of the quality-control algorithm in deciding whether to take an unknown sample or to calibrate again is shown in Fig. 1(d). When the maximal imprecision Nmax exceeds the preset limit Ncrit = 5%, re-calibration is initiated. The system starts with a minimal number of four calibrations with two different concentrations, which are chosen suitably for observability.

Figure 2 shows some results of the optimization procedure. Figure 2(a) depicts the information function for the decision on which concentration gives the best performance. When the system is observable (k ≥ 4), a minimum is found somewhere in the concentration range. Therefore the best concentration is always found at one of the extremes; this is confirmed by Fig. 1(b). When the system is not observable (k < 4), there is no minimum.
However, the system still decides to start by calibrating twice with the highest concentration available, followed by taking the lowest concentration standard twice. Thereafter, the lowest and highest concentrations are alternated for re-calibration. Figure 2(b) shows the information yield found for the evaluated constituents in the calibration graph. Somewhere in the concentration range, a maximum is found, as it should be. Figure 2(c) depicts
the resulting information yield of the integrated system of calibration, evaluation, quality control and optimization. A positive information value is found when the system re-calibrates. The gain in information is negative when the system evaluates unknown samples. In addition, the total information yield Itot(k) acquired by the system is given; this is forced to remain above a fixed level by the control procedure applied. The effect of various system noise covariances Q33 and Q44 is shown in Fig. 2(d) for both the normal and the optimal selection procedures. For the normal procedure, the system re-calibrates twice, initially with the most and least concentrated available standards. When the optimal selection procedure is used, there is a small profit of 0--15% on the number of required calibrations. For different system noise covariances, the savings are similar.

Conclusions

A method is presented for on-line optimization of the experimental design in a Kalman filter, based on information theory. The information yield comprises a loss caused by the prediction in the sequence and a gain arising from the filtering of the new measurement. For optimal performance of the Kalman filter, the information yield has to be maximized in the measurement sequence. In general, the noise covariances should be minimized. More complicated are the effects of the transition matrix and the measurement vector, which depend greatly on the problem investigated. For a drifting calibration system, only small differences are obtained in comparison with the traditional results for the linear graph. If the optimization algorithm is integrated in the procedures of calibration, evaluation and quality control, then there is a small profit on the number of calibrations required.
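The integrated procedure of calibration, evaluation, quality control and optimization can be illustrated with a small simulation. This is only a sketch under assumptions of my own: the noise levels, initial state, random seed, mid-range evaluation point and variable names are not values from the paper, and the imprecision check follows the variance term of Eqn. 13:

```python
import numpy as np

rng = np.random.default_rng(1)
N_CRIT = 0.05                                  # critical relative imprecision (5%)
STANDARDS = [1.0, 2.0, 3.0, 4.0, 5.0]
F = np.array([[1., 0., 1., 0.], [0., 1., 0., 1.],
              [0., 0., 1., 0.], [0., 0., 0., 1.]])   # Eqn. 11, state [a, b, alpha, beta]
Q = np.diag([0.0, 0.0, 1e-6, 1e-6])            # system noise on the drift terms (assumed)
R = 1e-4                                       # measurement noise variance (assumed)

x_true = np.array([1.0, 0.5, 1e-3, 1e-3])      # simulated drifting truth
x_est = np.array([1.0, 0.5, 0.0, 0.0])
P = np.diag([1.0, 1.0, 1e-2, 1e-2])

n_cal = 0
for k in range(1, 101):
    # prediction (Eqns. 3, 4) for the filter and the simulated truth
    x_est, P = F @ x_est, F @ P @ F.T + Q
    x_true = F @ x_true + np.sqrt(np.diag(Q)) * rng.standard_normal(4)

    # expected imprecision of an unknown evaluated at mid-range (c = 3, assumed)
    h_mid = np.array([3.0, 1.0, 0.0, 0.0])
    sd_c = np.sqrt((h_mid @ P @ h_mid + R) / x_est[0] ** 2)
    if sd_c / 3.0 > N_CRIT or k <= 4:          # k <= 4: forced initial calibrations
        # re-calibrate with the standard maximizing Eqn. 12, then filter (Eqns. 5-7)
        c = max(STANDARDS, key=lambda s: s * s * P[0, 0] + 2 * s * P[0, 1] + P[1, 1])
        h = np.array([c, 1.0, 0.0, 0.0])
        z = h @ x_true + rng.normal(0.0, np.sqrt(R))
        gain = P @ h / (h @ P @ h + R)
        x_est = x_est + gain * (z - h @ x_est)
        P = P - np.outer(gain, h) @ P
        n_cal += 1
```

With these assumed noise levels, the loop re-calibrates frequently at the start (while the drift terms are still poorly known) and only occasionally afterwards, mirroring the behaviour described for the quality-control algorithm.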
This work was supported by the Netherlands Research Organisation Z.W.O. Special thanks to Prof. Drs. G. Kateman and Dr. H. C. Smit for evaluating the manuscript.

APPENDIX

Results for a linear calibration graph y_i = a c_i + b; i = 1, 2, ..., k, without drift and system noise. If constant noise variances R_i = R are assumed and the mean value c̄ = (1/k)Σ(i=1..k) c_i is defined for k calibrations, the resulting entropy H of the experimental design is described by

H = 1/2 ld{[(2πe)²/k][R²/Σ(i=1..k)(c_i - c̄)²]}

The information yield, Ifil (Ipre = 0), for a concentration c̃ to be chosen for calibration is

Ifil = 1/2 ld{[(c̃ - c̄)²/Σ(i=1..k)(c_i - c̄)²] + 1/k + 1}

The information yield, Iun, for the evaluation of an unknown sample ĉ = (z_i - b̂)/â with var(c0) = 1 is

Iun = -1/2 ld{(R/â²){[(c̃ - c̄)²/Σ(i=1..k)(c_i - c̄)²] + 1/k + 1}}

It may be noted that Ifil has a minimum at c̃ = c̄ and Iun has a maximum at c̃ = c̄. Both equations hold only for k ≥ 2.
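The appendix expressions can be checked numerically. A minimal sketch, assuming ld is the base-2 logarithm (function and argument names are mine):

```python
import numpy as np

def I_fil(c_new, c_used, k):
    """Appendix Ifil: information yield of adding standard c_new to k
    existing calibrations c_used (no drift or system noise; Ipre = 0)."""
    cbar = np.mean(c_used)
    ssq = np.sum((np.asarray(c_used) - cbar) ** 2)
    return 0.5 * np.log2((c_new - cbar) ** 2 / ssq + 1.0 / k + 1.0)

def I_un(c_hat, c_used, k, R, a):
    """Appendix Iun: information yield for evaluating an unknown at c_hat,
    with var(c0) = 1, sensitivity a and noise variance R."""
    cbar = np.mean(c_used)
    ssq = np.sum((np.asarray(c_used) - cbar) ** 2)
    return -0.5 * np.log2((R / a ** 2) * ((c_hat - cbar) ** 2 / ssq + 1.0 / k + 1.0))
```

Evaluating these on a two-point design confirms the stated behaviour: Ifil is minimal (and Iun maximal) at the mean of the standards used, and Iun grows with the sensitivity a.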
REFERENCES

1 C. K. Shukla, Technometrics, 14 (1972) 547.
2 L. Aarons, Analyst (London), 106 (1981) 1249.
3 V. V. Fedorov, Theory of Optimal Experiments, Academic Press, New York, 1972.
4 D. M. Steinberg and W. G. Hunter, Technometrics, 26 (1984) 71.
5 G. Eckschlager, Collect. Czech. Chem. Commun., 37 (1972) 137.
6 P. C. Thijssen, G. Kateman and H. C. Smit, Anal. Chim. Acta, 157 (1984) 99.
7 P. C. Thijssen, S. M. Wolfram, G. Kateman and H. C. Smit, Anal. Chim. Acta, 156 (1984) 87.
8 A. Gelb (Ed.), Applied Optimal Estimation, MIT Press, Cambridge, MA, 1974.