Copyright © IFAC Advanced Control of Chemical Processes, Pisa, Italy, 2000
ON THE STATISTICAL PROCESS CONTROL OF MULTIVARIATE AUTOCORRELATED PROCESSES

E. Kaskavelis*, E. B. Martin*, A. J. Morris* and P. Jonathan**

*Centre for Process Analytics and Control Technology, Merz Court, University of Newcastle, NE1 7RU, UK
**Statistical Services, Shell Global Solutions, Cheshire Innovation Park, P.O. Box 1, Chester, UK

Abstract: This paper examines how the dynamic information present in a multivariate set of data can be used to improve the performance of a scheme for monitoring the multivariate mean. Optimum performance of such a scheme requires the use of the covariance/correlation matrix of the time-lagged variables. In the general case, this requires variables to be lagged with respect to one another. A Statistical Process Control scheme is described in which the Mahalanobis distance is monitored based on optimally lagged variables. The benefits of the method are demonstrated using Monte Carlo simulations.

Keywords: Multivariate; Statistical process control; Autocorrelation; Time lag
1. INTRODUCTION

Statistical techniques for monitoring the mean of univariate and multivariate processes are well described in the literature and have been successfully applied in a number of industrial applications and process simulations. Extensions of such techniques to cases where autocorrelation is present in the data have also been examined (e.g. Harris and Ross (1991); Montgomery and Mastrangelo (1991)).

Applying multivariate statistical projection techniques (Principal Component Analysis, Projection to Latent Structures) for the off-line analysis of industrial data, and for the implementation of on-line monitoring schemes, requires the original data matrix to be first time lagged to accommodate the time delays in the process, thereby resulting in an optimum representation of the covariance matrix. One approach is to identify a key input or output variable as the reference variable, and then calculate the cross-correlations between this variable and the others. If all the maximum cross-correlations occur at lag 0, then there are no time delays present and no time lagging is needed. However, if a maximum cross-correlation occurs at a lag other than zero, then the variables need to be lagged to ensure the covariance matrix has the minimum determinant.

Wachs and Lewin (1999) proposed the use of a matrix of 'quality' variables that were assumed to be aligned in time (i.e. the maximum cross-correlations occurred at lag zero). Each process variable was then time aligned with this 'quality' matrix. An iterative search was carried out using different lags until the lag that minimised the determinant of the covariance matrix was found. The procedure was repeated sequentially for each individual variable, thus identifying the optimum lag of each new variable.

Implementing the above technique off-line requires the elimination of some samples from the original data matrix to obtain a time-aligned matrix. The modified data matrix forms the nominal data set, from which a process representation can then be developed. If a new data set is also to be analysed off-line, then lagging the variables according to the a-priori calculated time delays is straightforward.
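The reference-variable lag search described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `optimal_lags` and its interface are our own.

```python
import numpy as np

def optimal_lags(X, ref, max_lag):
    """For each column of X, find the lag t in [0, max_lag] at which its
    cross-correlation with the reference column `ref` is largest.

    Returns an integer vector of optimal lags (the vector of time delays)."""
    n, p = X.shape
    x_ref = X[:, ref]
    lags = np.zeros(p, dtype=int)
    for i in range(p):
        best_corr, best_lag = -np.inf, 0
        for t in range(max_lag + 1):
            # correlate X_ref(s) with X_i(s + t), for s = 1 .. n - t
            r = np.corrcoef(x_ref[:n - t], X[t:, i])[0, 1]
            if r > best_corr:
                best_corr, best_lag = r, t
        lags[i] = best_lag
    return lags
```

Time-aligning the nominal data then amounts to shifting each column i forward by lags[i] and discarding the first max(lags) rows.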
Mathematically, let X (n×p) be a matrix of n samples and p variables, and assume that X_c, c ∈ [1, …, p], is the reference variable. A vector of time delays T_opt can be constructed according to:

τ_opt,i = argmax_t corr( X_c(·), X_i(·+t) ),   i = 1, …, p    (1)

This vector is used to optimally time lag the variables. The maximum element of the vector, τ_max = max(T_opt), defines the number of samples that must be available before the data can be fully time shifted. Furthermore, for any given time point T ≤ τ_max, a new vector can be defined by which the available data are to be lagged:

τ_T,i = argmax_{t=0,…,T} corr( X_c(·), X_i(·+t) ),   i = 1, …, p    (2)

However, the on-line implementation of such a scheme requires a different approach. Assume that on-line monitoring starts at time zero, when no previous samples are available. In order to lag a variable according to its predetermined optimal delay, a number of samples equal to the optimum time lag must first become available. This is undesirable when the maximum time delay is large, since the process would initially operate without any monitoring up to the point of the maximum time delay. This can result in failure to detect process faults or mean shifts occurring during this period. There is therefore a need for a methodology that enables operators to commence process monitoring from process start-up.

A partial lag approach is proposed to address this problem. It uses the information from time point zero until data corresponding to the optimal lag become available. Fig. 1 shows the cross-correlation function of two variables where the maximum cross-correlation occurs at a lag τ ≠ 0. This example is used to describe how an SPC scheme based on the partial lag concept works. Suppose at time point 1 the cross-correlation at lag 1 is greater than the cross-correlation at lag 0; the variable is then lagged by one sampling unit. Subsequently, at time point 2, the data are lagged by two if the cross-correlation at this lag is greater than that at both lags 0 and 1; if not, the previous lag is retained.

Fig. 1. Cross-correlation function of two variables when the optimum occurs at τ = 6

This approach is used until samples corresponding to the optimum lag, τ = 6 in Fig. 1, become available. After this point the variable can be lagged using this maximum time delay, and thereafter the data matrix corresponding to the maximum cross-correlation always forms the basis of the analysis.

The concept of partial lag is illustrated below using Monte Carlo simulations.

2. THE PROCESS

The process used to investigate the partial lag approach consists of a simple bivariate system of two auto- and cross-correlated stationary variables:

X_t = αX_{t−1} + e_{xt}    (3)

Y_{t+τopt} = βY_{t+τopt−1} + e_{y,t+τopt}    (4)

where e = [e_{xt}, e_{y,t+τopt}]' ~ N(0, V) and

V = [ 1  δ ]
    [ δ  1 ]

The above process corresponds to two AR(1) processes that are cross-correlated in time, since cov(e_{xt}, e_{y,t+τopt}) = δ ≠ 0. The maximum cross-correlation between X and Y occurs at τ_opt = argmax_t corr(X_t, Y_{t+t}), where t is the time delay of Y with respect to X at any given time point. Assuming, without loss of generality, that the error variances σ²ₓ and σ²ᵧ are equal to unity, the mean, variance and covariance of X_t and Y_t are given by:

E(X_t) = 0 and Var(X_t) = 1/(1−α²)    (5)

E(Y_t) = 0 and Var(Y_t) = 1/(1−β²)    (6)

cov(X_t, Y_{t+τopt}) = αβ E(X_{t−1} Y_{t+τopt−1}) + δ = δ/(1−αβ)    (7)

Therefore the covariance matrix Σ_opt of the variables X and Y with maximum cross-correlation at lag τ_opt is:

Σ_opt = [ 1/(1−α²)    δ/(1−αβ) ]
        [ δ/(1−αβ)    1/(1−β²) ]    (8)

However, the covariance of X and Y is in general a function of the lag between them. Where the lag is optimal (τ_opt), Equation (8) holds. When the lag between the variables is suboptimal (i.e. smaller than the optimum by τ_opt − τ), then, since X_t = α^(τopt−τ) X_{t−(τopt−τ)} plus innovations uncorrelated with Y_{t+τ}, the covariance between the two variables decays geometrically:

cov(X_t, Y_{t+τ}) = δ α^(τopt−τ) / (1−αβ)    (9)

The general form of the covariance matrix Σ_τ therefore becomes:

Σ_τ = [ 1/(1−α²)                δα^(τopt−τ)/(1−αβ) ]
      [ δα^(τopt−τ)/(1−αβ)      1/(1−β²) ]    (10)
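The process of Equations (3)-(4) and the stationary cross-covariance of Equation (7) can be checked with a small simulation. The sketch below is ours (the function name `simulate_process` is hypothetical) and uses a burn-in period to reach stationarity.

```python
import numpy as np

def simulate_process(alpha, beta, delta, tau_opt, n, rng, burn=500):
    """Simulate Equations (3)-(4) and return X_t paired with Y_{t+tau_opt}.

    The innovation of Y at time s is correlated (delta) with the innovation
    of X at time s - tau_opt, so the cross-correlation peaks at lag tau_opt."""
    V = np.array([[1.0, delta], [delta, 1.0]])      # innovation covariance of Eq. (4)
    m = n + tau_opt + burn
    e = rng.multivariate_normal([0.0, 0.0], V, size=m)
    x = np.zeros(m)
    y = np.zeros(m)
    for t in range(1, m):
        x[t] = alpha * x[t - 1] + e[t, 0]           # Equation (3)
        if t >= tau_opt:
            y[t] = beta * y[t - 1] + e[t - tau_opt, 1]   # Equation (4)
    # discard burn-in and align Y_{t+tau_opt} with X_t
    return x[burn:burn + n], y[burn + tau_opt:burn + tau_opt + n]
```

With α = 0.7, β = 0.8 and δ = 0.8, the sample mean of the aligned product X_t·Y_{t+τopt} should approach δ/(1−αβ) ≈ 1.82, and the sample variance of X should approach 1/(1−α²) ≈ 1.96.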
It is clear from Equation (10) that Σ_opt is a limiting case of Σ_τ. At τ = τ_opt the data are optimally lagged, and the covariance matrix is that Σ_τ which has minimum determinant, since α ∈ (0,1). At the start of the process, τ = 0, no time lag can be implemented and the covariance of the variables is at its minimum.

3. THE EFFECT OF AUTOCORRELATION

Before applying the partial lag approach to the simulation, the impact of autocorrelation on the detection abilities of an SPC scheme is examined. The aim here is to focus on the effect of autocorrelation on a defined process and not to carry out a comprehensive review. Thus only a few simulated results are presented to illustrate the univariate case; the generalisation to more than one variable is straightforward.

In the white noise case, α = 0, the standard deviation of X is equal to unity and thus the control chart limits are set to 2 or 3. From Equations (5) and (6), the variance of the autoregressive process depends on the value of α and is greater than unity when α ≠ 0. Therefore, using limits based on the white noise case is inappropriate and results in an increased number of false alarms, since the process variance is not accounted for. Fig. 2 shows the on-target average run length (ARL0) for a Shewhart chart as a function of α using the limits based on the white noise case. It is clear that correct variance scaling of autocorrelated data is important to avoid increased false alarm rates.

Fig. 2. On-target ARL for a Shewhart chart for different values of α using white noise limits

However, even after scaling the X variable by its variance, different values of the autocorrelation coefficient α will still have an effect on the on-target and off-target ARLs of an SPC scheme set up to monitor the mean. This can be illustrated by considering the Mahalanobis distance of the X variable, E(Mah) = E(X²)/σ²ₓ, which is distributed as χ² with one degree of freedom. The confidence limits are given by the standard χ² distribution.

Since the aim of the work is to assess the performance of multivariate SPC schemes with respect to event detection and false alarm rates, a second test data set was generated. This new set has the same time series structure as that described in Equations (3) and (4), but the mean of the X variable has been shifted. The equations are now:

X_t = αX_{t−1} + e_{xt} + ξ    (11)

Y_{t+τopt} = βY_{t+τopt−1} + e_{y,t+τopt}    (12)

where e ~ N(0, V).

In the cases examined, the value of ξ is a multiple of the standard deviation of X. To obtain a shift of k process standard deviations in the mean of X, the value of ξ to be added at each time point is:

ξ = k(1−α)/√(1−α²)    (13)

Fig. 3 shows the ratio of the on-target ARL for different values of α, ARLα, to the on-target ARL0 for three different limits (95%, 99%, 99.73%). These limits correspond to white noise ARL0 values of 20, 100 and 370 respectively. It is clear that the ARLα for the autocorrelated data increases with increasing autocorrelation and as the confidence limits become narrower. For high values of α, the ARLα increases by a factor of 2 or 3 with respect to the ARL0 for the white noise case.

Fig. 3. Ratio ARLα/ARL0 for the univariate case ('*' 95%, 'o' 99% and '∆' 99.73%)

The higher values of ARLα can be explained by considering the distribution of false alarms for the white noise and the autocorrelated cases. Since the data follow a multivariate normal distribution, the number of points lying outside a specified limit should be the same for any value of α, to within sampling variation. On-target ARLs are based on the first sample detected to be outside the limits. For a higher value of α, e.g. α > 0.6, the first false alarm will on average occur later than the first false alarm for the white noise case. However, increased autocorrelation leads to an increased probability of successive false alarms, whereas for weak autocorrelation the probability of successive false alarms is low.

4. MULTIVARIATE SPC IMPLEMENTATION

The aim of this section is to illustrate how the technique of partial lagging can help improve the detection abilities of a scheme for monitoring the mean of a multivariate process. A number of lagging schemes are considered using the simulated process described in Section 2.

No time lag: In this case the cross-correlation information at each time point is not utilised and the data are not lagged.

Partial lag: Both the training and test data sets are lagged as samples become available. Once the maximum time delay is reached, the optimum lag between the two variables can be applied, and subsequent samples are lagged by τ_opt.

Optimal lag: In this case, instead of adopting the partial procedure, the Y variable is lagged only once the τ_opt-th sample becomes available.

The general formula for E(Mah) when τ samples are available, assuming X̄ = 0 and Ȳ = 0, is:

E(Mah) = E{ [X_t, Y_{t+τ}] Σ_τ⁻¹ [X_t, Y_{t+τ}]' }    (14)

From the matrix multiplication, Equation (14) becomes:

E(Mah) = [ Var(Y_{t+τ})E(X_t²) − 2cov(X_t, Y_{t+τ})E(X_t Y_{t+τ}) + Var(X_t)E(Y_{t+τ}²) ] / [ Var(X_t)Var(Y_{t+τ}) − cov(X_t, Y_{t+τ})² ]    (15)

This form represents the partial lag case. Cases 1 and 3 can be regarded as limiting forms of the partial lag case, with τ = 0 and τ = τ_opt respectively.

Consider the case where ξ ≠ 0. Substituting Equations (5) to (10) into Equation (15), E(Mah) is given by:

E(Mah) = 2 + ξ²(1+α)(1−αβ)² / { (1−α)[ (1−αβ)² − (1−α²)(1−β²)δ²α^(2(τopt−τ)) ] }    (16)

It is clear that the Mahalanobis distance changes as a function of time through the parameter τ. Two cases are considered based on Equation (16):

No mean shift in the test data set, ξ = 0: in this case E(Mah) is equal to 2 (the degrees of freedom of the χ² distribution) irrespective of the values of τ and δ.

Mean shift in one of the variables in the test data set, ξ ≠ 0: in this case E(Mah) changes according to Equation (16) for different values of τ. Fig. 4 shows the theoretically calculated distances from Equation (16). The parameter values used were α = 0.7, β = 0.8, δ = 0.8, τ_opt = 6 and k ∈ [0, 1, 2], i.e. shifts of 0, 1 and 2 standard deviations in the mean of variable X.

Fig. 4. E(Mah) distance with respect to lag τ ('−' no shift, '+' 1 std shift, '*' 2 std shift)
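Equation (16) can be evaluated directly to reproduce the behaviour plotted in Fig. 4. The function below is our sketch, using the Fig. 4 parameter values as defaults; it is valid for 0 ≤ τ ≤ τ_opt.

```python
import math

def expected_mah(tau, k, alpha=0.7, beta=0.8, delta=0.8, tau_opt=6):
    """Theoretical E(Mah) of Equation (16) for a k-sigma shift in the mean
    of X when the applied lag is tau (0 <= tau <= tau_opt)."""
    xi = k * (1 - alpha) / math.sqrt(1 - alpha ** 2)      # Equation (13)
    num = xi ** 2 * (1 + alpha) * (1 - alpha * beta) ** 2
    den = (1 - alpha) * ((1 - alpha * beta) ** 2
          - (1 - alpha ** 2) * (1 - beta ** 2)
            * delta ** 2 * alpha ** (2 * (tau_opt - tau)))
    return 2 + num / den

# With no shift (k = 0) the distance is 2 at every lag; with a shift it
# grows monotonically as tau approaches tau_opt, as in Fig. 4.
```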
The straight line E(Mah) = 2 represents the on-target case (k = 0), whilst the other two lines represent mean shifts of 1 and 2 standard deviations. The larger values of the Mahalanobis distance associated with the optimal lag imply that the best detection performance for the off-target case is achieved when the data are lagged according to τ_opt. The drawback is that at least 6 samples must be available before this scheme can start operating. The partial lag scheme, in contrast, can operate from the first sample point, thereby increasing the chance of detecting a possible change in the process earlier. In Fig. 4, values for lags greater than 6 are also given to illustrate the danger of excessive lagging: the Mahalanobis distance decreases, since the covariance of X and Y then changes according to Equation (10), but as a function of β rather than α.
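A minimal sketch of how the partial lag scheme could operate on-line follows, assuming the covariance matrix Σ_τ of Equation (10) is known. Since for this process the cross-correlation increases monotonically up to τ_opt, the best available lag at time T is simply min(T, τ_opt). The function names are ours, not the authors'.

```python
import numpy as np

def sigma_tau(tau, alpha=0.7, beta=0.8, delta=0.8, tau_opt=6):
    """Covariance matrix of (X_t, Y_{t+tau}) from Equation (10)."""
    c = delta * alpha ** (tau_opt - tau) / (1 - alpha * beta)
    return np.array([[1 / (1 - alpha ** 2), c],
                     [c, 1 / (1 - beta ** 2)]])

def partial_lag_monitor(x, y, tau_opt, limit):
    """Flag Mahalanobis distances of partially lagged pairs against `limit`.

    At time T the pair (X_{T-tau}, Y_T) is scored with the matching
    Sigma_tau, where tau = min(T, tau_opt) grows as samples arrive."""
    alarms = []
    for T in range(1, len(y)):
        tau = min(T, tau_opt)
        z = np.array([x[T - tau], y[T]])
        mah = float(z @ np.linalg.inv(sigma_tau(tau)) @ z)
        alarms.append(mah > limit)
    return alarms
```

Monitoring thus starts at the very first sample, and the applied lag converges to τ_opt once enough samples are available.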
5. ADJUSTING THE χ² LIMITS

One approach to addressing the issues discussed in the previous section is to adjust the limits to obtain a desired on-target ARL. This adjustment is carried out using the concept of a correction factor (CF):

CF = (Real limit − p) / (χ²_{a,p} − p)    (17)

where p is the degrees of freedom and a is the significance level. The 'Real limit' is the value at which the limit should be set so that the on-target ARL equals the desired value, e.g. 370 for a 99.73% confidence limit. The correction factors were calculated for a set of "known" processes, in this case two independent AR(1) representations with the same value of the AR coefficient. These are then used for any arbitrary multivariate process that is "similar" to the known processes. By mapping the ACF of the Mahalanobis distance of an arbitrary process onto those of the known processes, this "similarity" can be quantified in terms of the minimum squared difference between the ACFs. The approach is illustrated in Fig. 5 using the process described in Section 4. The most "similar" ACFs are those of two AR(1) processes with AR parameter equal to 0.5 and 0.6, corresponding to correction factors of 0.91 and 0.86 for the 95% limit respectively. Interpolating between these two values, a CF of 0.89 is obtained.

Fig. 5. Mapping of the ACF of the Mahalanobis distance of an arbitrary process onto those of "known" processes

6. SIMULATION STUDY

In this section, the simulated results for the on-target and off-target ARLs for the no lag, partial lag and optimal lag scenarios are given. Simulations were carried out for different values of the autoregressive parameters and mean shifts. A total of 100,000 independent realisations were run for each combination of detection limit, autocorrelation coefficient and mean shift. The simulations were initiated using values of X and Y lying within the respective limits; this avoids immediate out-of-control signals due to initial out-of-control values. Whenever an out-of-control signal was detected, a new realisation was initiated. The Mahalanobis distance for each of the cases was compared using the 95%, 99% and 99.73% limits of a χ² distribution. In this paper only the results for the 99.73% limit are shown, since this corresponds to the widely used on-target ARL of 370 for the white noise case; some comments about the results for the other two limits are also made. Four pairs of [α, β], given in the first column of Table 1, were used to describe four levels of autocorrelation, with values ranging from 0 (white noise) to 0.9 (strong autocorrelation).

Table 1 Correction factors for χ² limits

[α, β]       95%     99%     99.73%
[0 0]        1       1       1
[0.2 0.3]    0.97    0.99    1
[0.5 0.6]    0.89    0.96    0.99
[0.8 0.9]    0.52    0.79    0.88

Table 2 ARLs for the 99.73% limit

Mean                None     Weak       Mild       Strong
shift               [0 0]    [0.2 0.3]  [0.5 0.6]  [0.8 0.9]
0      NoLag        371.1    374.1      363.8      375.5
       PartLag      369.2    372.6      363.9      381.4
       OptLag       371.1    374.6      367.8      388.7
0.5    NoLag        203.1    205.6      208.0      226.3
       PartLag      100.8    149.9      104.3      110.5
       OptLag       103.2    108.0      113.6      155.1
1      NoLag        67.6     94.1       69.9       75.9
       PartLag      20.5     44.7       22.1       25.5
       OptLag       22.0     46.9       23.6       27.4
1.5    NoLag        23.4     43.6       25.3       30.4
       PartLag      8.2      19.7       8.9        10.6
       OptLag       9.5      20.4       10.1       11.5
2      NoLag        9.4      10.9       14.5       23.9
       PartLag      5.1      11.7       5.7        6.8
       OptLag       6.9      12.1       7.1        7.6

The on-target ARLs were adjusted to 370 using the procedure described in Section 5. The correction factors for the four processes are shown in Table 1. The ARLs for the on-target case and for shifts of 0.5, 1, 1.5 and 2 standard deviations in the X variable are summarised in Table 2. The four plots in Fig. 6 compare the three lagging scenarios for the different levels of autocorrelation. The results for the on-target ARLs in the white noise case are consistent with theoretical expectation; for example, a 99.73% confidence limit corresponds to one false alarm approximately every 370 samples. The correction factors used to adjust the limits were found to be sufficiently accurate. It can be argued, though, that for the case [0.8, 0.9] the factors are slightly moderate (e.g. on-target ARL of 380), whilst for [0.5, 0.6] they are slightly high (e.g. on-target ARL of 360). Such deviations can be expected, since the matching of the ACFs can never be exact, and interpolation between the known values of the correction factors always introduces discrepancies.

Fig. 6. ARLs for different mean shifts, 99.73% limit ('x' No Lag, 'o' Partial Lag, '*' Optimal Lag)
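The run-length estimation used in the study can be illustrated in miniature. The sketch below, which is our simplified illustration and not the authors' full bivariate simulator, estimates the on-target ARL of a univariate Shewhart chart on white noise; a 3-sigma limit should give roughly 1/0.0027 ≈ 370.

```python
import random
import statistics

def on_target_arl(limit, n_runs, rng):
    """Monte Carlo on-target ARL: for each realisation, count samples of
    i.i.d. N(0,1) data until the first point falls beyond +/- limit, then
    average the run lengths over all realisations."""
    lengths = []
    for _ in range(n_runs):
        t = 0
        while True:
            t += 1
            if abs(rng.gauss(0.0, 1.0)) > limit:
                lengths.append(t)
                break
    return statistics.fmean(lengths)

# A 3-sigma limit on white noise gives an ARL0 near 370, consistent with
# the 99.73% chi-square limit used in Table 2.
```

Correction factors as in Equation (17) would then be tuned so that this estimate matches the desired 370 for the autocorrelated cases as well.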
Both the partial and the optimal lag outperform the no-lag case for the 99.73% limit (Fig. 6). Partial lagging appears to give slightly better results than optimal lagging. However, in this example the maximum time delay was 6 and, it is conjectured, the results for partial lagging will not be much better than those of optimal lagging, since the period of "waiting" for the optimal lag is not very large compared with the ARLs for detecting the shifts. It was also observed that for large mean shifts (>1.5 std) there are occasions where the no-lag case gives faster detection than the other two methods, particularly when the 95% and 99% control limits are used. This implies that there is then no need to "wait" for the appropriate number of samples to become available, since a relatively large mean shift can be rapidly detected even where the cross-correlation between the variables is weak.

The benefit of the partial lag scheme is to be expected, since it operates from the first sampling point and, by using the gradually increasing correlation between the two variables, achieves faster detection. This is especially true for medium to large mean shifts. There is thus an incentive to apply partial lagging when the autocorrelation in the data is relatively strong and the time delays between the variables are large. It is also of interest to note that for autocorrelation coefficients of up to 0.6, there are no significant differences in the results with respect to the white noise case.

7. CONCLUSIONS AND FUTURE WORK

A modification of the standard approach for the time lagging of variables that are auto- and cross-correlated in time has been proposed. Lagging the variables in a partial manner, as samples become available, enables the implementation of SPC and MSPC schemes from the first sample. This is of particular relevance in processes where frequent and long start-up periods are the norm. By making use of the increase in the cross-correlation as the lag between the variables approaches its optimal value, the detection of mean shifts can be enhanced. The benefits of the partial lag approach for autocorrelated data were demonstrated using Monte Carlo simulations.

Future work includes the presentation of results from a dynamic simulation of a chemical plant. In addition, a globally optimum way of lagging the variables can be adopted: instead of using a single variable as the reference, all possible combinations of lagged variables are examined to identify the combination of lags that gives the optimum representation of the data set with respect to the covariance matrix.

ACKNOWLEDGEMENTS

E. Kaskavelis acknowledges financial support from Statistical Services, Shell Global Solutions, Cheshire Innovation Park, Chester, UK and the EU ESPRIT project PERFECT, No. 28870.

REFERENCES

Harris, T. J. and W. H. Ross (1991). Statistical Process Control Procedures for Correlated Observations. The Canadian Journal of Chemical Engineering, 69, 48-57.

Montgomery, D. C. and C. M. Mastrangelo (1991). Some Statistical Process Control Procedures for Autocorrelated Data and Discussion. Journal of Quality Technology, 23(3), 179-204.

Wachs, A. and D. R. Lewin (1999). Improved PCA Methods for Process Disturbance and Failure Identification. AIChE Journal, 45(8), 1688-1700.