Fault detection and isolation in an unsupervised learning environment



Pattern Recognition Letters 15 (1994) 235-242 North-Holland

March 1994

PATREC 1177

Arvind Srinivasan and Celal Batur *

Department of Mechanical Engineering, University of Akron, Akron, OH 44325-3903, USA

Received 27 November 1992; revised 1 September 1993

Abstract

Srinivasan, A. and C. Batur, Fault detection and isolation in an unsupervised learning environment, Pattern Recognition Letters 15 (1994) 235-242. A new approach to detect and isolate faults in dynamic systems is proposed. System parameters are identified by suitable recursive identification techniques. Faults in the system are detected when changes in the estimated parameters occur. The convergence of the identified parameters is determined by performing three different statistical tests on the residuals in a moving window of fixed length. The converged system parameters are classified using decision directed clustering which operates in an unsupervised learning environment. The main advantage of the method is that the algorithm does not require the relationship between the actual physical parameters and the estimated system parameters. Also, the clustering algorithm does not require a complete knowledge of all possible faults and their effects on the identified system parameters. Finally, the algorithm is used to detect faults in a position control system.

1. Introduction

The existing Fault Detection and Isolation (FDI) techniques are summarized in Willsky (1976), Isermann (1984) and Frank (1990). In this paper, we consider FDI algorithms that make use of the input-output representation of systems (Isermann, 1984). These algorithms estimate the system parameters and the resulting estimates are used to determine the changes in the actual physical parameters. These changes are used to detect and classify faults. This approach has the following limitations:

(a) Establishing a relationship between the estimated system parameters and the physical parameters may be difficult due to non-uniqueness of these relations.

(b) Automating the decision based on the changes in the physical parameters is difficult, even with the use of an expert system. In systems where such automation is possible, the diagnostic procedure may not work with a new combination of physical parameters.

Some of these limitations can be overcome by directly using the estimated parameters for FDI. This technique has been attempted by Vachtsevanos et al. (1990) using fuzzy logic. The major motivation of our paper is to propose an FDI algorithm which is based on the changes in the estimated process parameters. In particular, we introduce an algorithm that can detect new faults or operating conditions.

* Corresponding author.
0167-8655/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0167-8655(93)E0040-U


Volume 15, Number 3

PATTERN RECOGNITION LETTERS

2. Proposed algorithms

The proposed algorithm contains three major steps: a parameter estimator, a transition zone detector and a classifier. The basic idea is to use the estimated parameters directly to perform FDI. In the next three sections, the individual steps are described.

2.1. Parameter estimator

The following observable model structure is assumed to represent the system on which FDI is to be performed:

$$y(t) = A(z^{-1})\,y(t) + B(z^{-l-1})\,u(t) + C^{-1}(z^{-1})\,e(t) \qquad (1)$$

where $u$ and $y$ are the process input and output, $z^{-1}$ is the delay operator, $l$ is the dead time index of the process and $e(t)$ is white noise. $A$, $B$, $C$ are polynomials in $z^{-1}$ describing the dynamics of the process and the noise. Different identification methods can be used depending on the $C$ polynomial. If $C(z^{-1}) = 1$, the recursive least squares algorithm can be used; when this is not true, other techniques such as generalized least squares, extended least squares, instrumental variables and maximum likelihood can be used. These techniques are extensively described in Ljung and Soderstrom (1983) and Ljung (1987).
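For the case $C(z^{-1}) = 1$, a recursive least squares estimator can be sketched as below. This is a minimal illustration rather than the authors' implementation: the function names, the second-order regressor, the initial value of P, the noise level and the random binary excitation are all assumptions made for the example.

```python
import numpy as np


def rls_update(theta, P, phi, y):
    """One recursive least squares step without forgetting.

    theta: current estimate (n,), P: (n, n) matrix, phi: regressor (n,),
    y: newly measured output (scalar).
    """
    denom = 1.0 + phi @ P @ phi            # scalar normalisation term
    error = y - phi @ theta                # one-step prediction error
    theta_new = theta + (P @ phi) * error / denom
    P_new = P - np.outer(P @ phi, phi @ P) / denom
    return theta_new, P_new, error


def simulate_and_identify(true_theta, n_samples=500, noise_std=0.01, seed=0):
    """Simulate a second-order model and identify it on-line.

    The model y(k) = a1*y(k-1) + a2*y(k-2) + b1*u(k-1) + b2*u(k-2) + e(k)
    and all numerical values are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    true_theta = np.asarray(true_theta, dtype=float)
    u = rng.choice([-1.0, 1.0], size=n_samples)   # random binary excitation
    y = np.zeros(n_samples)
    theta = np.zeros(4)
    P = 1000.0 * np.eye(4)                        # large initial P (assumed)
    for k in range(2, n_samples):
        phi = np.array([y[k - 1], y[k - 2], u[k - 1], u[k - 2]])
        y[k] = phi @ true_theta + noise_std * rng.standard_normal()
        theta, P, _ = rls_update(theta, P, phi, y[k])
    return theta, P
```

Calling `simulate_and_identify((1.2, -0.5, 0.3, 0.1))` returns the final estimate and the final matrix P; the trace of P shrinks as data accumulate, which is the property the convergence check in the next section relies on.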

2.2. Transition zone identifier

There are two instances in which the identified parameters will go through a transition zone: first, during start-up and second, during the occurrence of faults. The estimated parameters in the transition zone do not model the actual system due to lack of convergence. This transition zone must be found and should not be used for fault decisions, though such a zone might signify a change in the system status. The residuals in a moving window of fixed length and the trace of the covariance matrix are used to detect the transition period. The residual is the estimate of the equation error $e(t)$ in (1) and, in the case of least squares, it is given as

$$\varepsilon(t) = y(t) - A(z^{-1})\,y(t) - B(z^{-l-1})\,u(t). \qquad (2)$$

When the system estimates are in the transition period, the residual has the following characteristics:

(1) The residual is not white as assumed initially, resulting in a non-zero auto-correlation.

(2) The residual may have non-zero mean, causing bias in the estimate.

(3) The sum of squares of residuals in the moving window may increase in magnitude.

The beginning and the ending of the transition zone are identified by devising suitable statistical tests to detect the above characteristics. Similar tests are also described in Basseville and Benveniste (1986). The whiteness of the residual can be tested through an auto-correlation test (Soderstrom and Stoica, 1989), which is based on the fact that the estimate of the covariance function, as defined below, is zero except at $\tau = 0$:

$$r_{\varepsilon\varepsilon}(\tau) = \frac{1}{N}\sum_{t=1}^{N-\tau} \varepsilon(t+\tau)\,\varepsilon(t), \quad \tau \ge 0, \quad N = \text{window length}. \qquad (3)$$

The residual in the moving window is considered white with a confidence level of 95% when the following whiteness test is satisfied:

$$\frac{|r_{\varepsilon\varepsilon}(\tau)|}{r_{\varepsilon\varepsilon}(0)}\,\sqrt{N} \le 1.96. \qquad (4)$$
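The whiteness test can be sketched in a few lines. The function names are hypothetical, and the choice of which lags to check is left to the caller because the text does not specify it; checking many lags at once raises the chance that a genuinely white window is rejected.

```python
import numpy as np


def autocovariance(residuals, lag):
    """Estimate r(tau) = (1/N) * sum_t e(t + tau) * e(t) over one window."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    return float(np.dot(e[lag:], e[:n - lag]) / n)


def whiteness_statistic(residuals, lag):
    """Return sqrt(N) * |r(lag)| / r(0) for the window."""
    e = np.asarray(residuals, dtype=float)
    r0 = autocovariance(e, 0)
    if r0 == 0.0:
        raise ValueError("window contains only zeros; statistic undefined")
    return float(np.sqrt(e.size) * abs(autocovariance(e, lag)) / r0)


def is_white(residuals, lags=(1,), z=1.96):
    """True when every requested lag passes the 95% whiteness test."""
    return all(whiteness_statistic(residuals, lag) <= z for lag in lags)
```

A constant residual window, for instance, is strongly correlated at lag 1 and fails the test, which is the behaviour expected while the estimates are still in a transition zone.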

The mean level of the residual can be tested under the hypothesis that the residual has zero mean. With a confidence level of 95%, this test takes the following form:

$$\frac{|\bar{\varepsilon}|}{\sqrt{s/N}} \le 1.96 \qquad (5)$$

where $s$ is the sample variance and $\bar{\varepsilon}$ is the sample mean of the residuals within the moving window. During the transition period, the sum of squares of residuals $r_{\varepsilon\varepsilon}(0)$ increases. A user-specified threshold $\sigma_T$ is used to detect the transition zone, as indicated by the following condition:

$$r_{\varepsilon\varepsilon}(0) < \sigma_T. \qquad (6)$$

The trace of the inverse of the covariance matrix, $P$, is proportional to the gain of the recursive least squares algorithm (see Eq. (19)). This trace will decrease with every added data point, resulting in a reduced gain. Thus a threshold $\sigma_P$ on the trace of matrix $P$ can be used to judge the convergence characteristics. Note that we are not necessarily checking if the estimates are actually converging to the true values. What is checked here is whether the estimates are no longer changing significantly. This discussion leads to the following criterion for determining the end of the transition zone:

$$\operatorname{trace}(P) < \sigma_P. \qquad (7)$$
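The remaining checks can be combined into one convergence decision, sketched below. The function names are hypothetical, the mean test is written as the usual large-sample z-test (an assumption about its exact form), and the thresholds passed in are illustrative values rather than recommended settings.

```python
import numpy as np


def mean_is_zero(residuals, z=1.96):
    """Zero-mean test: |mean| / sqrt(variance / N) <= z (assumed z-test form)."""
    e = np.asarray(residuals, dtype=float)
    s = e.var(ddof=1)                      # sample variance of the window
    if s == 0.0:
        return bool(e.mean() == 0.0)
    return bool(abs(e.mean()) / np.sqrt(s / e.size) <= z)


def mean_square(residuals):
    """r(0): mean of the squared residuals in the window."""
    e = np.asarray(residuals, dtype=float)
    return float(np.mean(e ** 2))


def estimates_converged(residuals, P, sigma_T, sigma_P):
    """End of transition zone: r(0) below sigma_T and trace(P) below sigma_P."""
    return bool(mean_square(residuals) < sigma_T and np.trace(P) < sigma_P)
```

Only when `estimates_converged` returns `True` would the current parameter estimates be handed on to the classifier; otherwise they are treated as belonging to a transition zone.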

The gain of the identification algorithm may decrease rapidly, after which the estimates no longer vary significantly. This makes the identification algorithm sluggish in responding to sudden changes in the system dynamics. To overcome this problem, either covariance resetting or exponential forgetting can be used (Astrom and Wittenmark, 1984). With exponential forgetting, it is difficult to detect the transition zone because forgetting tends to worsen the statistical properties of the residual. Covariance resetting avoids this problem, but rules are needed to determine the times at which the covariance matrix has to be reset. In this application, the covariance matrix is reset when a transition region is found using the method described above. This allows the identification algorithm to 'wake up' and follow the changes in system dynamics much more rapidly.
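One way to express the resetting rule is shown below. This is a sketch under stated assumptions: the function name, the reset magnitude and the decision to reset only at the moment a transition zone is first entered (rather than on every sample inside it) are illustrative choices that the text does not prescribe.

```python
import numpy as np


def reset_on_transition(P, in_transition, was_in_transition, reset_scale=1000.0):
    """Re-initialise P when a transition zone has just been entered.

    reset_scale is a hypothetical value; a large diagonal P restores a high
    estimator gain so that the estimates can follow the new dynamics.
    """
    if in_transition and not was_in_transition:
        return reset_scale * np.eye(P.shape[0])
    return P
```

Because the reset makes the trace of P large again, the trace criterion keeps the estimates out of the classifier until enough new data have been processed.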

2.3. Fault classifier


learned through a set of representative examples. However, the classifier using this learning scheme cannot recognize new patterns, which in our application domain, represent a new fault or a different operating point. In unsupervised learning, properties of each cluster are determined as the process proceeds. Therefore new faults or new operating conditions can be detected on-line. One of the classification algorithms that operates in an unsupervised environment is the decision directed clustering.

2.3.1. Decision directed clustering

In this algorithm (Young and Calvert, 1974), each fault or pattern is described by the mean of the cluster $m_k$, the covariance matrix $R_k$ and the number of patterns in the cluster $n_k$. The total number of clusters or faults is denoted by $K_c$. The classification process starts after having resolved $i$ patterns into $K_c(i)$ clusters with $m_k(i)$, $R_k(i)$ and $n_k(i)$ as the parameters for the $k$th cluster. When a new pattern $x_i$ is observed, the following parameter is calculated:

$$Q_k(x_i) = [x_i - m_k(i)]^{T}\,R_k^{-1}(i)\,[x_i - m_k(i)]. \qquad (8)$$

This is essentially the square of the weighted distance between $x_i$ and the mean of the $k$th cluster. We define $Q_j$ as the minimum value of $Q_k$:

$$Q_j(x_i) = \min_k Q_k(x_i), \quad k = 1, 2, \ldots, K_c(i). \qquad (9)$$

If $Q_j$ is less than a predetermined threshold $A$, i.e.,

$$Q_j(x_i) < A \qquad (10)$$

then it is decided that $x_i$ belongs to the $j$th cluster and the parameters corresponding to this particular cluster are updated as

$$n_j(i+1) = n_j(i) + 1,$$

$$m_j(i+1) = m_j(i) + \frac{1}{n_j(i)+1}\,[x_i - m_j(i)],$$

$$R_j(i+1) = R_j(i) + \frac{1}{n_j(i)+1}\left[\frac{n_j(i)}{n_j(i)+1}\,(x_i - m_j(i))(x_i - m_j(i))^{T} - R_j(i)\right]. \qquad (11)$$


All other estimates remain unchanged. If, for a prefixed threshold $B > A$,

$$Q_j(x_i) \ge B \qquad (12)$$

then a new cluster is generated which consists of $x_i$ only. The mean vector is simply $x_i$ and a predetermined covariance matrix is assigned to the new cluster. The number of clusters is increased by one. If $x_i$ falls into the guard zone, i.e.,

$$A \le Q_j(x_i) < B \qquad (13)$$

then this pattern is temporarily stored for later processing. This guards against generating unnecessary clusters.
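The clustering rules of this section can be sketched as a small class. The class and attribute names are hypothetical, class labels are returned 1-based, and patterns that fall into the guard zone are simply kept in a list because the text does not say how they are processed later.

```python
import numpy as np


class DecisionDirectedClusterer:
    """Sketch of decision directed clustering with thresholds A < B."""

    def __init__(self, A, B, initial_cov):
        if not B > A:
            raise ValueError("B must be larger than A")
        self.A, self.B = A, B
        self.initial_cov = np.asarray(initial_cov, dtype=float)
        self.means, self.covs, self.counts = [], [], []
        self.guard_zone = []               # patterns stored for later processing

    def distance(self, x, k):
        """Weighted squared distance between x and the mean of cluster k."""
        d = x - self.means[k]
        return float(d @ np.linalg.solve(self.covs[k], d))

    def classify(self, x):
        """Return a 1-based class label, or None for a guard-zone pattern."""
        x = np.asarray(x, dtype=float)
        if not self.means:
            return self._new_cluster(x)
        q = [self.distance(x, k) for k in range(len(self.means))]
        j = int(np.argmin(q))
        if q[j] < self.A:                  # close enough: update cluster j
            self._update(j, x)
            return j + 1
        if q[j] >= self.B:                 # far from every cluster: new class
            return self._new_cluster(x)
        self.guard_zone.append(x)          # in between: defer the decision
        return None

    def _new_cluster(self, x):
        self.means.append(x.copy())
        self.covs.append(self.initial_cov.copy())
        self.counts.append(1)
        return len(self.means)

    def _update(self, j, x):
        n = self.counts[j]
        d = x - self.means[j]              # deviation from the old mean
        self.covs[j] = self.covs[j] + ((n / (n + 1)) * np.outer(d, d) - self.covs[j]) / (n + 1)
        self.means[j] = self.means[j] + d / (n + 1)
        self.counts[j] = n + 1
```

The first pattern always opens class 1, since no cluster exists yet; later patterns either refine an existing class, open a new one, or wait in the guard zone.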

3. Discussion

The proposed algorithm can be tailored to the problem by manipulating the covariance resetting criterion, the window length $N$, the threshold values $\sigma_T$ and $\sigma_P$ and, finally, the region parameters $A$ and $B$. The proposed algorithm may indicate a change in operating conditions even if there is none (false alarm), or it may miss a change altogether (missing fault). The false alarm and missing fault rates depend strongly on the parameter estimation scheme and the effective window length. For example, if the window length is large, the algorithm may not respond to faults fast enough due to the strong effect of the past data. On the other hand, windows with small lengths will increase the variance of the identified parameters, therefore delaying the transfer of the identified parameters to the clustering algorithm. The statistical properties and the convergence rate of the identification algorithm have similar effects on the false and missing alarm rates. If the residuals have non-zero mean, the estimates are biased and therefore not reliable to detect faults. Unbiased but high variance estimators will also cause delayed response because the procedure may not send the estimates to the clustering algorithm. The alarm rates and the time to decision properties of the proposed algorithm are difficult to establish analytically. This is due to the fact that these properties are determined by the convergence rate of the recursive identification algorithm, which depends strongly on the noise characteristic (Soderstrom et al., 1978).

The decision region parameter $A$ is related to the confidence interval for the estimated process parameters. Following Ljung (1987), this interval can be written as

$$\Pr\{(\hat{\theta} - \theta_0)^{T}\,\hat{\lambda}\,P^{-1}\,(\hat{\theta} - \theta_0) > \chi^2(2n)\} = \alpha \qquad (14)$$

where $\hat{\theta}$ and $\theta_0$ are the estimated and the true parameters, $\hat{\lambda}$ is an unbiased estimate of the noise variance and $\alpha$ is the confidence level of the $\chi^2$ distribution. Following Eq. (14), we can find soft bounds such that the probability of the true values falling within the ellipsoid

$$(\hat{\theta} - \theta_0)^{T}\,\hat{\lambda}\,P^{-1}\,(\hat{\theta} - \theta_0) = \chi^2(2n)$$

will be greater than $(1 - \alpha)$. In Figure 1, we illustrate a two-dimensional case where each ellipsoid shows the soft bounds for $\theta_0 = [a \; b]^{T}$ that corresponds to a particular fault. If faults are such that these ellipsoids intersect, as in the case of I and J, no value of the threshold parameter $A$ will be able to separate these faults. However, if there is no intersection, then the value of $A$ should be selected such that the clustering algorithm will create a new fault if and only if the estimate falls outside the range of the ellipsoid. These ellipsoids strongly depend on the confidence factor and, furthermore, they cannot be constructed a priori. Therefore it is decided to tune $A$ to the problem at hand.
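As a rough numerical illustration of how a chi-square bound maps to a threshold, consider a quadratic form $Q$ that is assumed, for this sketch only, to follow a chi-square distribution with two degrees of freedom. The symbols $Q$ and $c$ are introduced here for illustration and do not appear in the derivation above.

```latex
\Pr\{Q > c\} = e^{-c/2} = \alpha
\quad\Longrightarrow\quad
c = -2\ln\alpha ,
\qquad
\alpha = 0.05 \;\Rightarrow\; c = -2\ln(0.05) \approx 5.99 .
```

Under that assumption, a value of $A$ near 6 could serve as a starting point before tuning. For other degrees of freedom the quantile is usually read from chi-square tables.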

Figure 1. Scattering of estimated parameters for a two-dimensional case. (The ellipsoids mark the regions in which the true value is expected to fall with $(1-\alpha)$% confidence for Fault 1, Fault 2, Fault I and Fault J.)

4. Fault detection and isolation in position control system

In order to demonstrate the algorithm, we present an example of fault detection in a DC motor based servo system. The schematic diagram of the DC motor used to control the position of a shaft is given in Figure 2. Ignoring the armature inductance $L_a$, the resulting transfer function between the armature voltage and the position of the output shaft can be obtained as

$$\frac{\psi_o(s)}{V_s(s)} = \frac{K_1}{s(\tau_m s + 1)} \qquad (15)$$

where $K_1$ and $\tau_m$ are related to the physical parameters such as the armature resistance $R_a$, the back EMF constant, the inertia $J$ and the viscous friction coefficient $F$. $\psi_o$ is the position of the output shaft and $V_s$ is the applied armature voltage. The block diagram of the position control system is given in Figure 2. The resulting closed loop transfer function between $\psi_o$ and $\psi_i$ is

$$\frac{\psi_o(s)}{\psi_i(s)} = \frac{K_1 K_3}{\tau_m s^2 + (1 + K_1 K_2)\,s + K_1 K_3}. \qquad (16)$$
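To see how a physical change moves the closed loop dynamics in (16), the denominator can be compared with the standard second-order form $s^2 + 2\zeta\omega_n s + \omega_n^2$. The sketch below computes the natural frequency and damping ratio. The function name and the numerical values used with it are hypothetical and are not the values used in the simulations.

```python
import math


def closed_loop_characteristics(K1, K2, K3, tau_m):
    """Natural frequency and damping ratio of the closed loop in (16).

    Dividing the denominator by tau_m gives
    s^2 + ((1 + K1*K2) / tau_m) * s + K1*K3 / tau_m.
    """
    wn = math.sqrt(K1 * K3 / tau_m)
    zeta = (1.0 + K1 * K2) / (2.0 * math.sqrt(tau_m * K1 * K3))
    return wn, zeta


# Hypothetical gains: nominal loop versus loss of speed feedback (K2 = 0).
wn_nominal, zeta_nominal = closed_loop_characteristics(10.0, 0.5, 1.0, 0.5)
wn_fault, zeta_fault = closed_loop_characteristics(10.0, 0.0, 1.0, 0.5)
```

With these illustrative numbers the nominal loop is overdamped (damping ratio above 1, hence no overshoot), while removing the speed feedback leaves the natural frequency unchanged and drops the damping ratio well below 1. A change of this kind shifts the coefficients of the identified discrete-time model.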

The numerical values for $K_1$, $K_2$, $K_3$ are chosen such that the system response has certain desirable characteristics, including no overshoot. This system is simulated for various faults. Sampled input-output data is obtained every 0.02 seconds. Since the transfer function in (16) is of second order, the sampled input-output system can be described by a second-order discrete-time model

$$\psi_o(k) = a_1\,\psi_o(k-1) + a_2\,\psi_o(k-2) + b_1\,\psi_i(k-1) + b_2\,\psi_i(k-2) + e(k) \qquad (17)$$

where $a_1$, $a_2$, $b_1$, $b_2$ are the model parameters to be identified and $e(k)$ is the equation error. In order to obtain consistent estimates of the parameters, a persistently exciting input signal is needed. This is achieved by using a pseudo-random binary signal (PRBS), so that the input $\psi_i$ is perturbed slightly about the operating point with a variance of 0.2. Different operating conditions (including faulty situations) are simulated by using appropriate values for the physical parameters, and these in turn change the values of the system parameters. These are identified using a recursive least squares algorithm. The model structure is assumed to have the form given in (17). Defining the following quantities

$$\theta^{T} = [a_1 \;\; a_2 \;\; b_1 \;\; b_2],$$

$$\varphi^{T}(k) = [\psi_o(k-1) \;\; \psi_o(k-2) \;\; \psi_i(k-1) \;\; \psi_i(k-2)],$$

the least squares estimate can be written as

$$\hat{\theta}(k+1) = \hat{\theta}(k) + \frac{P(k)\,\varphi(k+1)\,[y(k+1) - \varphi^{T}(k+1)\,\hat{\theta}(k)]}{\varphi^{T}(k+1)\,P(k)\,\varphi(k+1) + 1} \qquad (18)$$

$$P(k+1) = P(k) - \frac{P(k)\,\varphi(k+1)\,\varphi^{T}(k+1)\,P(k)}{\varphi^{T}(k+1)\,P(k)\,\varphi(k+1) + 1} \qquad (19)$$

where $P$ is the inverse of the covariance matrix. A window length of 100 samples is used for the purpose of transition zone detection. This window length corresponds to about 6 time constants of the nominal system. The detection time is about 2 seconds (100 samples at 0.02 seconds each), which corresponds to the length of the moving window. Hence, the selected window length has sufficient data to observe the changes in the system and also the time

to detect faults is satisfactory. Although three tests are proposed to determine the convergence of estimated parameters, we found that if the model is unreliable, the error variance is a more robust test than the mean and the covariance tests. We therefore used only the sum of squares of residuals test. We choose a threshold of 0.01 on the sum of squares of residuals. This number is obtained through simulations as a good trade-off between the false alarm and missing fault rates. Once the sum of squares of residuals in the window falls below the threshold $\sigma_T$ and the trace of the covariance falls below the threshold $\sigma_P$, the estimated parameters are sent to the classifier for fault classification.

Figure 2. Armature controlled DC motor and the block diagram of the control system.

Figure 3. Sum of squares of residuals and decision from the classifier for fault (a). (Horizontal axis: No. of samples; the threshold and the transition region are marked.)

Figure 4. Sum of squares of residuals and decision from the classifier for fault (b). (Horizontal axis: No. of samples; the transition region is marked.)

Initially, the clustering algorithm has no knowledge of any class. Hence, when it receives the first set of inputs, it classifies it as a new class and class number 1 is assigned. However, apart from giving the class number, the classifier does not give any additional information. So, it is necessary to find out the actual physical fault and store the information with the class number. This is needed whenever a new fault or operating condition is encountered. However, it is not necessary to have knowledge of all possible faults

while designing the FDI system; rather, new faults are determined on-line. In order to verify the proposed algorithm, the following faults are created:

(a) no speed feedback ($K_2 = 0$),

(b) reduced armature resistance ($R_a$ = 3.2 Ω to 1.6 Ω),

(c) increased friction in the bearing ($F$ = 0.001 to 0.005).

Figure 3 shows the variations of the sum of squares of residuals in the moving window and also the corresponding output from the classifier. The initial transients are discarded and, as a result, the sum of squares of residuals is below the threshold at the beginning of the graph. Also, the clustering algorithm classifies the present status as class 1 (nominal system). At point 'a' in Figure 3, a fault is created by making $K_2 = 0$, which corresponds to no speed feedback. As seen in Figure 3, the residuals go above the threshold, showing the possibility of a status change, and hence the estimated parameters are not sent to the classifier. At point 'b' of Figure 3, the estimated parameters meet the convergence criteria. The clustering algorithm now creates a new class 2. The advantage of unsupervised learning can be seen in the ability of the system to classify new unencountered situations. Two more faulty situations, (b) and (c), are also simulated and are successfully determined as class 3 and class 4, as shown in Figures 4 and 5.

Figure 5. Sum of squares of residuals and decision from the classifier for fault (c). (Horizontal axis: No. of samples; the threshold, the nominal system and the increased friction condition are marked.)

5. Conclusion

Fault diagnosis is essential to maintain a high level of performance in control systems. In this paper, a new approach to detect and isolate faults using the input-output model is proposed. System parameters are estimated using suitable identification methods and the converged estimates are classified by decision directed clustering, which operates in an unsupervised learning environment. This technique eliminates the need for relations between the estimated system parameters and the actual physical parameters. Also, the clustering algorithm isolates the fault by making use of the converged estimated parameters. The convergence of estimated parameters is determined by the three statistical tests and the trace of the covariance matrix. The learning ability of the classifier enables the algorithm to detect and isolate new faults. The proposed algorithm is successfully applied to classify faults in a position control system.

References

Astrom, K.J. and B. Wittenmark (1984). Computer Controlled Systems. Prentice-Hall, Englewood Cliffs, NJ.
Basseville and Benveniste, Eds. (1986). Lecture Notes in Control and Information Sciences 77. Springer, Berlin.
Frank, P.M. (1990). Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy - A survey and some new results. Automatica 26 (3), 459-474.
Isermann, R. (1984). Process fault detection based on modeling and estimation methods - A survey. Automatica 20 (4), 387-404.
Lippmann, R.P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, April, 4-22.
Ljung, L. (1987). System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ.
Ljung, L. and T. Soderstrom (1983). Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA.
Soderstrom, T., L. Ljung and I. Gustavson (1978). A theoretical analysis of recursive identification methods. Automatica 14, 231-244.
Soderstrom, T. and P. Stoica (1989). System Identification. Prentice-Hall, Englewood Cliffs, NJ.
Vachtsevanos, G., H. Kang, I. Kim and J. Cheng (1990). Managing ignorance and uncertainty in system fault detection and identification. Proc. IEEE Symposium on Intelligent Control, 558-563.
Willsky, A.S. (1976). A survey of design methods for failure detection in dynamic systems. Automatica 12, 601-611.
Young, T.Y. and T.W. Calvert (1974). Classification, Estimation and Pattern Recognition. Elsevier, Amsterdam.