Copyright © IFAC Algorithms and Architectures for Real-Time Control, Palma de Mallorca, Spain, 2000
PROBABILITY ESTIMATION ALGORITHMS FOR SELF-VALIDATING SENSORS

A. W. Moran, Dr. P. O'Reilly and Prof. G. W. Irwin

Intelligent Systems and Control Group, School of Electrical and Electronic Engineering, The Queen's University of Belfast, Ashby Building, Stranmillis Road, Belfast, UK, BT9 5AH
Abstract: Three alternative approaches to probability estimation for use in a self-validating sensor are investigated. The three methods are stochastic approximation (SA), the reduced bias variant of this approach due to Naim and Kam (NK), and a method based on the Bayesian Self-Organising Map (BSOM). Simulation studies show that the BSOM-based method gives superior results when compared to the NK algorithm. It has also been demonstrated that the BSOM method is more computationally efficient and requires storage space for fewer variables. Copyright © 2000 IFAC

Keywords: Probability, Sensors, Stochastic approximation, Gaussian distributions.
1. INTRODUCTION
The use of smart instruments in industry has increased over recent years as engineers have taken advantage of the added features that they offer over more conventional "dumb" alternatives. These devices are able to apply corrections to the raw measurement in order to provide such features as outputs in engineering units and compensation for thermal drift (de Sa, 1988). This has generated research interest in instruments that can not only compensate for some undesired physical property, such as non-linearity, but that can also detect and, more importantly, counteract internal faults. Such instruments have been termed "self-validating" (Henry and Clarke, 1993), in that they are able to provide an indication of the validity of, or confidence in, the measured value and also allow an indication of the health of the instrument to be generated. Such an instrument must therefore be able to extract more than just the process measurement from the sensor output. Various self-validating instruments have been developed, including a thermocouple (Yang and Clarke, 1997), a DOx sensor (Clarke and Fraher, 1996) and a coriolis mass-flow meter (Henry, 1996), (Henry, 2000).

One self-validating approach was described by Yung and Clarke (1989) and also implemented on a thermocouple system (Moran et al., 2000). This involves developing a parametric model of the sensor output during fault-free operation. This model is then used as an inverse filter to generate an innovations sequence, which should be white noise (Ljung, 1987) if the model constitutes an accurate description of the sensor output. Any change to the statistics of the innovations sequence can be related to the occurrence of sensor faults (Upadhyaya, 1985).

Previous work by the present authors has demonstrated how a change to the variance of such an innovations sequence can be detected, both in simulation (O'Reilly, 1998) and using data from a practical temperature system (Moran et al., 2000). The detection method involved is a Likelihood Ratio test for two hypotheses: the null hypothesis, H_0, where there has been no change, and the test hypothesis, H_1, where a change has occurred. The probabilities of H_1 and H_0 were found using on-line stochastic approximation (SA). Naim and Kam (1994) have shown that probability estimates produced by stochastic approximation will only be unbiased if the decision on the alternative hypotheses is error free, i.e. if the probabilities of false alarm and missed detection are zero. This situation is unlikely to occur in practice, and the errors that do occur in the decision process may then lead to biased probability estimates.

The trend towards instruments with higher levels of sophistication means that sensors are constructed with an increasing amount of local processing power. However, the computing power of any one instrument will still be limited, so the best use must be made of the available resources. It is therefore desirable that the on-board algorithms required for self-validation should be efficient both computationally and in their storage requirements.

The aim of this work is to devise an improved probability estimation algorithm suitable for on-line implementation in the self-validating sensor test-bed described in (Moran et al., 2000).

Naim and Kam's (1994) method for generating reduced bias estimates relies on producing an estimate of the bias and then using this to correct the probability estimates. This paper will confirm that the technique described does indeed lead to estimates with reduced bias, but that similar, if not superior, performance can be achieved by means of a very simple and computationally efficient method based on the Bayesian Self-Organising Map (BSOM).

The decision theory background is outlined in Section 2, which describes the requirement for accurate probability estimates. Section 3 presents the reduced bias stochastic approximation approach of Naim and Kam (referred to here as 'NK') together with an approach based on a Bayesian Self-Organising Map (referred to as 'BSOM'). A simple illustrative example is described in Section 4, which also contains the main comparative results for the three approaches SA, NK and BSOM. The implementation aspects for a real-time self-validating sensor are discussed in Section 5, while Section 6 contains conclusions and future work.

2. DEFINING THE PROBLEM

The decision process described for detecting a change to the innovations variance used a Likelihood Ratio test together with dual hypothesis testing (Moran et al., 2000), (O'Reilly, 1998). The null hypothesis, H_0, assumed that the variance had not changed; the test hypothesis, H_1, that a change had occurred due to a sensor fault. The probabilities that a given data set, d, asserts, or refutes, each of these two hypotheses can be calculated. The decision, as to which of the two hypotheses is more likely, can be expressed as the Likelihood Ratio, L:

\[ L = \frac{p(d \mid H_1)}{p(d \mid H_0)} \qquad (1) \]

The decision is now one of determining whether this Likelihood Ratio is above or below a threshold value, T:

if L > T, then H_1 is accepted;
if L ≤ T, then H_0 is accepted.    (2)

If the threshold, T, in (2) is calculated with reference to P_H0 and P_H1 (the probabilities of H_0 and H_1), then the Bayes risk can be minimised:

\[ T = \frac{(C_{10} - C_{00})\,P_{H_0}}{(C_{01} - C_{11})\,P_{H_1}} \qquad (3) \]

If the costs of making a correct decision (C_00 and C_11) are zero and the costs of making an incorrect decision (C_10 and C_01) are equal, the Likelihood Ratio test can be simplified to:

if L > P_H0/P_H1, then H_1 is accepted;
if L ≤ P_H0/P_H1, then H_0 is accepted.    (4)

To use (4) it is necessary to know the probabilities of the two hypotheses. Although the exact values may not be known in practice, they can be estimated on-line by a recursive stochastic approximation (SA) method (Naim and Kam, 1994). Thus:

\[ \hat{P}_{H_1}^{(k)} = \hat{P}_{H_1}^{(k-1)} + \frac{1}{k}\left(u(k) - \hat{P}_{H_1}^{(k-1)}\right), \qquad \hat{P}_{H_0}^{(k)} = 1 - \hat{P}_{H_1}^{(k)} \qquad (5) \]

Here P̂_H1^(k) is the estimate of P_H1 at time-step k and u(k) is the binary value of the decision at time-step k (i.e. u(k) = 1 if H_1 is accepted and u(k) = 0 if H_0 is accepted). As can be appreciated, there is a need for accurate estimation of P_H1 and P_H0, and (5) will only give unbiased estimates if the decision process is error free, such that the probabilities of false alarm and missed detection are zero. Since this is unlikely to occur in practice, other methods will be required to provide the probability estimates.
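As a point of reference for the algorithms that follow, a minimal C sketch of one iteration of this plain SA scheme, combining the test (4) with the update (5), is given below. It assumes Gaussian likelihoods with known means mu0 and mu1 and a common standard deviation sigma, as in the simulations of Section 4; the function names (gauss_pdf, sa_update) are illustrative and not taken from any published implementation.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Gaussian pdf value at x for mean mu and standard deviation sigma. */
    static double gauss_pdf(double x, double mu, double sigma)
    {
        double z = (x - mu) / sigma;
        return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * M_PI));
    }

    /* One iteration of the SA estimator of equations (4) and (5).
     * d    : current scalar datum
     * p_h1 : current estimate of P(H1), updated in place
     * k    : time-step, k >= 1
     * Returns the binary decision u(k).                             */
    static int sa_update(double d, double *p_h1, long k,
                         double mu0, double mu1, double sigma)
    {
        double L = gauss_pdf(d, mu1, sigma) / gauss_pdf(d, mu0, sigma);
        double T = (1.0 - *p_h1) / *p_h1;   /* threshold P_H0 / P_H1 */
        int u = (L > T) ? 1 : 0;            /* decision, eq. (4)     */
        *p_h1 += (u - *p_h1) / (double)k;   /* SA update, eq. (5)    */
        return u;
    }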
The two methods for generating these probability estimates will now be described.

3. THE ALGORITHMS
3.1 Reduced Bias Estimates

Naim and Kam (1994) were concerned with distributed detection, whereby the decisions of a number of local sensors are transmitted to a central data fusion centre. The decision process for each sensor used an on-line stochastic approximation method for determining the probabilities of two hypotheses. This method of probability estimation is employed here.

The Naim and Kam (NK) algorithm for reduced bias estimation is as follows (a C sketch of one iteration is given after the steps):

Step 1: Set the initial conditions, i.e. randomly select a value for P̂(H1) within the range 0 to 1.

Step 2: Calculate the decision threshold, T:

\[ T = \frac{1 - \hat{P}(H_1)^{(k-1)}}{\hat{P}(H_1)^{(k-1)}} \qquad (6) \]

Step 3: Calculate the Likelihood Ratio of the two pdfs for the scalar input value d:

\[ L = \frac{p(d \mid H_1)}{p(d \mid H_0)} \qquad (7) \]

Step 4: Determine the binary decision u(k):

\[ u(k) = \begin{cases} 1, & L \ge T \\ 0, & L < T \end{cases} \qquad (8) \]

Step 5: Approximate the probabilities of false alarm and missed detection, P_F and P_M respectively.

Step 6: Update the probability estimate according to:

\[ \hat{P}(H_1)^{(k)} = \hat{P}(H_1)^{(k-1)} + \frac{1}{k}\left(u(k) - B(H_1)^{(k-1)} - \hat{P}(H_1)^{(k-1)}\right) \qquad (9) \]

Step 7: Approximate the bias in P̂(H1) by:

\[ B(H_1)^{(k)} = \left(1 - \hat{P}(H_1)^{(k)}\right)P_F^{(k)} - \hat{P}(H_1)^{(k)}P_M^{(k)} \qquad (10) \]

Step 8: Set k = k + 1 and go to Step 2.
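The following C sketch implements one pass through Steps 2-8 under the Gaussian assumptions used in this paper. Two simplifications are made here for illustration and are not prescribed by Naim and Kam (1994): for equal-variance Gaussian pdfs the likelihood-ratio test of Steps 3-4 reduces to a single threshold dstar on the data axis, and the tail areas P_F and P_M are obtained from the C library function erfc(). The names nk_update and q_tail are illustrative; *bias holds B(H1) and should be initialised to zero before the first iteration.

    #include <math.h>

    /* Standard normal tail probability Q(z) = P(Z > z). */
    static double q_tail(double z)
    {
        return 0.5 * erfc(z / sqrt(2.0));
    }

    /* One iteration of the NK reduced-bias estimator, Steps 2-8,
     * for two Gaussian classes with known means mu0 < mu1 and a
     * common standard deviation sigma.                             */
    static int nk_update(double d, double *p_h1, double *bias, long k,
                         double mu0, double mu1, double sigma)
    {
        /* Step 2: decision threshold, eq. (6). */
        double T = (1.0 - *p_h1) / *p_h1;

        /* Steps 3-4: for equal variances, L >= T is equivalent to
         * d >= dstar, so the decision is taken on the data axis.   */
        double dstar = 0.5 * (mu0 + mu1)
                     + sigma * sigma * log(T) / (mu1 - mu0);
        int u = (d >= dstar) ? 1 : 0;               /* eq. (8) */

        /* Step 5: false-alarm and missed-detection probabilities
         * as Gaussian tail areas either side of dstar.             */
        double pf = q_tail((dstar - mu0) / sigma);
        double pm = q_tail((mu1 - dstar) / sigma);

        /* Step 6: reduced-bias update using the previous bias, eq. (9). */
        *p_h1 += (u - *bias - *p_h1) / (double)k;

        /* Step 7: new bias approximation from the new estimate, eq. (10). */
        *bias = (1.0 - *p_h1) * pf - *p_h1 * pm;

        return u;
    }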
3.2 Bayesian Self Organising Map

Yin and Allinson (1997) described how an arbitrary probability density function can be approximated using a mixture of Gaussian kernels. Each kernel has three parameters that can be adjusted during training: the mean, the variance and the prior. These are all adjusted at each time-step by way of an adaptive gain, α, that is set to decrease with time.

In the application described here, the means and variances of the source data are known. It is then only necessary to adjust the priors of the two kernels. Note that in this application the priors are equivalent to the probabilities. Since the problem to be solved here is that of estimating two probabilities, the method described by Yin and Allinson (1997) simplifies even further, as only two kernels are involved.

The BSOM-based method is as follows (a C sketch of one iteration is given after the steps):

Step 1: Set the initial conditions. Since the probabilities are unknown, randomly select a value for P(H1) (≡ w1) within the range 0 to 1.

Step 2: Calculate the weighted output of each of the two kernels:

\[ P_i(x) = w_i \, p_i(x \mid \mu_i, \sigma_i), \quad i = 1, 2 \qquad (11) \]

Step 3: Calculate the sum of the weighted kernel outputs:

\[ P(x) = \sum_{i=1}^{2} w_i \, p_i(x \mid \mu_i, \sigma_i) \qquad (12) \]

Step 4: Update the weights:

\[ w_i^{(k)} = w_i^{(k-1)} + \alpha^{(k-1)}\left[\frac{P_i^{(k-1)}(x)}{P(x)} - w_i^{(k-1)}\right] \qquad (13) \]

Step 5: Adjust the adaptive gain:

\[ \alpha^{(k)} = \frac{10}{1000 + k} \qquad (14) \]

Go to Step 2.
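A corresponding C sketch of one iteration of Steps 2-5 is given below. Only the two priors are adapted: w1 is the estimate of P(H1) and the second prior is held at 1 − w1 so that the mixture remains normalised. The gain schedule follows (14) as reconstructed above, and the names gauss_pdf and bsom_update are illustrative.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    static double gauss_pdf(double x, double mu, double sigma)
    {
        double z = (x - mu) / sigma;
        return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * M_PI));
    }

    /* One iteration of the two-kernel BSOM prior update, Steps 2-5.
     * w1 is the prior of the H1 kernel (mean mu1) and is the current
     * estimate of P(H1); it is updated in place.                     */
    static void bsom_update(double x, double *w1, long k,
                            double mu0, double mu1, double sigma)
    {
        /* Steps 2-3: weighted kernel outputs and their sum, eqs (11)-(12). */
        double P1 = *w1 * gauss_pdf(x, mu1, sigma);
        double P2 = (1.0 - *w1) * gauss_pdf(x, mu0, sigma);
        double P  = P1 + P2;

        /* Step 5: decreasing adaptive gain, eq. (14). */
        double alpha = 10.0 / (1000.0 + (double)k);

        /* Step 4: move the prior towards the posterior P1/P, eq. (13). */
        *w1 += alpha * (P1 / P - *w1);
    }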
4. ALGORITHM PERFORMANCE

A change to the variance of the innovations sequence could be indicative of a change to the time constant (Yung and Clarke, 1989), (Moran et al., 2000). For the three cases described below, the innovations variance data was therefore simulated as the mixed output from two Gaussian noise sources with equal variances. The means were chosen to give a reasonable overlap of the two probability density functions, and for each case different values of P(H1) and P(H0) were chosen.

As an illustration of the performance of the three algorithms, Figures 1, 2 and 3 below show the development of the estimate of P(H1) for a true value of 0.7. For each algorithm the means and standard deviations of the noise sources are identical: μ0 = 0.0, μ1 = 4.0, σ0 = σ1 = 1.58.
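The mixed data can be generated as follows; this is a sketch of one plausible scheme (Box-Muller sampling with the C library rand()), not a description of the original simulation code, and randn and mixture_sample are illustrative names.

    #include <stdlib.h>
    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Zero-mean, unit-variance Gaussian sample via Box-Muller. */
    static double randn(void)
    {
        double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
    }

    /* One datum from the two-source mixture: with probability p_h1
     * the sample is drawn from N(mu1, sigma^2), otherwise from
     * N(mu0, sigma^2), e.g. p_h1 = 0.7, mu0 = 0.0, mu1 = 4.0 and
     * sigma = 1.58 for the runs shown in Figures 1-3.             */
    static double mixture_sample(double p_h1, double mu0, double mu1,
                                 double sigma)
    {
        double mu = (rand() < p_h1 * RAND_MAX) ? mu1 : mu0;
        return mu + sigma * randn();
    }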
Fig. 1. Estimate of P(H1) using the SA algorithm.

Fig. 2. Estimate of P(H1) using the NK algorithm for reduced bias estimation.

Fig. 3. Estimate of P(H1) using the BSOM algorithm.

At first glance the performances appear very similar. The SA algorithm clearly shows a biased estimate of the probability, the final value being approximately 0.73. The NK algorithm does give a more accurate estimate; however, a bias is still evident after 10,000 data points. The BSOM probability estimates are "noisy" because the probability estimates, in fact the priors, are updated directly from the data, which is in effect a noise source. What is evident, on closer inspection, is that the estimate from the NK algorithm converges to the true value, 0.7, fairly slowly. By comparison, the BSOM estimate converges within 1,000 data points.

These figures are for only one run; a clearer indication of the performance of the algorithms can be seen from the results of Monte Carlo simulations of 200 runs, each of 10,000 data points. In Tables 1, 2 and 3:

P(H1): mean - the average of the final values of P̂(H1) over the 200 runs.

P(H1): SD - the standard deviation of the final values of P̂(H1) over the 200 runs.

Bias (est) - the estimated biases for the SA and NK algorithms, as given by equations 11 and 16 in (Naim and Kam, 1994). Note that there is no equivalent estimate of the bias for the BSOM algorithm.

Bias (actual) - P(H1) − P̂(H1), the mean of the final values of the bias, B(H1), over the 200 runs.

Case One: P(H1) = 0.3, μ0 = 0.0, μ1 = 4.0, σ0 = σ1 = 1.58

Table 1. True P(H1) = 0.3

                   SA        NK        BSOM
    P(H1): mean    0.2570    0.3109    0.2987
    P(H1): SD      0.0411    0.0199    0.0101
    Bias (est)     0.0352    0.1271    -
    Bias (actual)  0.0430    -0.0109   0.0013

Here the SA approach did indeed give a biased estimate of P(H1), while the NK method reduced the bias. However, the bias from the BSOM algorithm was appreciably lower again, and the standard deviation of its probability estimate was approximately halved.

Case Two: P(H1) = 0.5, μ0 = 0.0, μ1 = 4.0, σ0 = σ1 = 1.58

Table 2. True P(H1) = 0.5

                   SA         NK         BSOM
    P(H1): mean    0.5005     0.5004     0.4991
    P(H1): SD      0.0134     0.0103     0.0116
    Bias (est)     0.0000     -0.0009    -
    Bias (actual)  -5.37e-4   -3.79e-4   9.48e-4

In this example the SA and NK algorithms both produced very good results. However, this is a special case: with equal probabilities for the two distributions and the means at 0 and 4, the probabilities of false alarm and missed detection were equal, which results in the bias actually being zero. The BSOM results were not quite so good, but the standard deviation of its probability estimate was comparable to the previous case.

Case Three: P(H1) = 0.7, μ0 = 0.0, μ1 = 3.0, σ0 = σ1 = 1.58

Table 3. True P(H1) = 0.7

                   SA        NK        BSOM
    P(H1): mean    0.7129    0.6953    0.6989
    P(H1): SD      0.0133    0.0104    0.0101
    Bias (est)     -0.0145   -0.0667   -
    Bias (actual)  -0.0129   0.0047    0.0011

Once again the performance of the SA algorithm was bettered by that of the NK algorithm, as would be expected. The results for the BSOM algorithm were better still.

In general it was found that the BSOM approach was the most consistent of the three methods, with very similar standard deviations for the estimate of P(H1) in all three cases.
5. DISCUSSION

The NK algorithm in (9), (10) is fairly complex. The recursive part comprises seven distinct stages. To calculate the Likelihood Ratio, L, the outputs of two distributions, assumed to be Gaussian, must be calculated at each iteration. In order to find the false alarm and missed detection probabilities, P_F and P_M respectively, it is necessary to carry out an integration to find the area under the probability density function curve. Since this is a non-analytical process, some other method, such as the error function or a look-up table, must be used. This adds computational complexity to the algorithm if the error function is to be coded, or an additional storage requirement should a look-up table be used.

For the BSOM algorithm, however, the recursive part comprises only four steps. This is much simpler than the NK method and the computational load is, consequently, much lighter. In fact, the only thing the two algorithms have in common is the need to calculate the outputs of two Gaussian distributions at each iteration. However, in the BSOM case this is a major part of the algorithm and not just the calculation of a subsidiary value, as it is for the NK algorithm.

It is worth pointing out that the increased complexity of the NK algorithm also gives rise to an increased number of program variables for which storage space would be required. This is especially true if the error function were not coded but replaced by a look-up table. For an embedded system, the efficient use of storage space is essential as it is often limited.

As an indication of the computational complexity of the algorithms, a simple test was carried out. The test comprised timing the execution of each algorithm over a Monte Carlo simulation of 200 runs, each run comprising 10,000 iterations. The algorithms were each encoded as 'C' programs and run as DOS programs under Windows 95™.

Table 4. Algorithm timings

                           SA      NK       BSOM
    Execution time (ms)    9.79    49.75    15.39

The actual execution times are not particularly important; it is the relative difference between the results that is of interest. Although these results are only a rough indication of algorithm performance, they do show that the BSOM algorithm is approximately 3.2 times faster than the NK algorithm. The result for the SA algorithm is included as an indication of its simplicity.
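A timing harness of the following form could produce figures of the kind shown in Table 4. This is a sketch only, since the original test code is not reproduced in the paper; it uses the standard C clock() function, and time_estimator is an illustrative name.

    #include <time.h>

    /* Time a number of Monte Carlo runs of a given estimator and
     * return the total elapsed time in milliseconds, e.g. 200 runs
     * of 10,000 iterations each, as described above.                */
    static double time_estimator(void (*run)(long iters),
                                 long runs, long iters)
    {
        clock_t t0 = clock();
        for (long r = 0; r < runs; r++)
            run(iters);
        return (double)(clock() - t0) * 1000.0 / CLOCKS_PER_SEC;
    }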
6. CONCLUSIONS

It has been shown that accurate probability estimates can be found by a method based on the Bayesian Self-Organising Map. The accuracy of the method has been compared to that of the Reduced Bias Estimate algorithm described by Naim and Kam (1994), and has been shown to be superior. It has also been demonstrated that the method is more computationally efficient and requires storage space for fewer variables. Further work will be to extend the simulation work shown here, in an effort to implement the method as part of the test-bed system described in (Moran et al., 2000).

7. ACKNOWLEDGEMENT

The first author would like to thank the Department of Education for Northern Ireland (DENI) for financial support.

8. REFERENCES

Clarke, D. W. and P. M. A. Fraher (1996). Model-based validation of a DOx sensor. Control Engineering Practice 4(9), 1313-1320.

de Sa, D. (1988). The evolution of the intelligent measurement. Measurement and Control 21, 142-144.

Henry, M. P. (1996). Programmable hardware architectures for sensor validation. Control Engineering Practice 4(10), 1339-1354.

Henry, M. P. (2000). A self-validating digital coriolis mass-flow meter (1): Overview. Control Engineering Practice.

Henry, M. P. and D. W. Clarke (1993). The self-validating sensor: Rationale, definitions and examples. Control Engineering Practice 1(4), 585-610.

Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, London.

Moran, A. W., P. G. O'Reilly and G. W. Irwin (2000). A case study in on-line intelligent sensing. In: Proc. of the American Control Conference.

Naim, A. and M. Kam (1994). On-line estimation of probabilities for distributed Bayesian detection. Automatica 30(4), 633-642.

O'Reilly, P. G. (1998). Local sensor fault detection using Bayesian decision theory. In: UKACC International Conference on Control. pp. 247-251.
Upadhyaya, B. R. (1985). Sensor failure detection and estimation. Nuclear Safety 26(1), 32-43.

Yang, J. C.-Y. and D. W. Clarke (1997). A self-validating thermocouple. IEEE Transactions on Control Systems Technology 5(2), 239-253.

Yin, H. and N. M. Allinson (1997). Bayesian learning for self-organising maps. Electronics Letters 33(4), 304-305.

Yung, S. K. and D. W. Clarke (1989). Local sensor validation. Measurement and Control 22, 132-141.