Chemometrics and Intelligent Laboratory Systems 41 Ž1998. 73–81
Adaptive batch monitoring using hierarchical PCA Stefan Rannar ¨ a
a,)
, John F. MacGregor a , Svante Wold
b
Department of Chemical Engineering, McMaster UniÕersity, Hamilton, ON, Canada b Department of Organic Chemistry, Umea UniÕersity, Umea, Sweden
Abstract A new approach to monitoring batch processes using the process variable trajectories is presented. It was developed to overcome the need in the approach of Nomikos and MacGregor wP. Nomikos, J.F. MacGregor, Monitoring of batch processes using multi-way principal components analysis, Am. Inst. Chem. Eng. J. 40 Ž1994. 1361–1375; P. Nomikos, J.F. MacGregor, Multivariate SPC charts for batch processes, Technometrics 37 Ž1995. 41–59; P. Nomikos, J.F. MacGregor, Multi-way partial least squares in monitoring batch processes, Chemometrics Intell. Lab. Syst. 30 Ž1995. 97–108x for estimating or filling in the unknown part of the process variable trajectory deviations from the current time until the end of the batch. The approach is based on a recursive multi-block Žhierarchical. PCArPLS method which processes the data in a sequential and adaptive manner. The rate of adaptation is easily controlled with a parameter which controls the weighting of past data in an exponential manner. The algorithm is evaluated on industrial batch polymerization process data and is compared to the multi-way PCArPLS approaches of Nomikos and MacGregor. The approach may have significant benefits when monitoring multi-stage batch processes where the latent variable structure can change at several points during the batch. q 1998 Elsevier Science B.V. All rights reserved. Keywords: Batch monitoring; Variable trajectories; Hierarchical PCA; Adaptive PCA; Multi-way PCA
1. Introduction Many chemical, pharmaceutical and biochemical products are produced in batch reactors, and many other manufacturing processes are batch in nature Ži.e., they have fixed start and end points.. In most of these processes product quality variables are only measured after the end of each batch, often hours later in a quality control laboratory. This makes it difficult to achieve control over product quality or to monitor the progress of these batch processes. However, online process computers routinely collect data on the
)
Corresponding author.
time history of many process variables such as temperatures, pressures and flowrates throughout the duration of each batch run. Nomikos and MacGregor w1–3x and Kourti et. al. w4x have presented very powerful process analysis, monitoring and diagnostic procedures which utilize these process variable trajectory data. These procedures, based on multi-way PCA ŽMPCA. and multi-way PLS ŽMPLS. methods w5x, are now being widely adopted by the batch chemical industry. The methods have proven to be very powerful for analysing historical data from past production, and diagnosing operating problems. They have also proven to be very effective for the on-line monitoring of new batches. However, one characteristic of
0169-7439r98r$19.00 q 1998 Elsevier Science B.V. All rights reserved. PII S 0 1 6 9 - 7 4 3 9 Ž 9 8 . 0 0 0 2 4 - 0
74
S. Rannar et al.r Chemometrics and Intelligent Laboratory Systems 41 (1998) 73–81 ¨
the on-line monitoring procedure is that the same MPCA or MPLS model developed from the completed batch runs is also used for the on-line monitoring. The implication of this is that when monitoring a new batch, at any point during the batch, data on the deviations of the variables from their average trajectories for the remainder of the batch is not yet available. Therefore, the MPCA model, which assumes the complete history of the batch data, cannot be used directly. Nomikos and MacGregor w2x proposed several different approaches to handle this problem, most of them involving estimating the missing data on trajectory deviations from the current time until the end of the batch. These approaches have worked quite well in practice. However, it would be nice to have MPCArMPLS monitoring approaches which do not depend upon having to fill in these missing data. In this paper we present a modification of the monitoring method which does not require estimates of the data for the uncompleted portion of the batch. The approach is based on a recursive hierarchical or multi-block PCA or PLS algorithm which processes the data in a sequential manner. For monitoring future batches, one need only store the loading matrices for the local model at each point in time and the score matrix from the last time step. This adaptive approach often requires fewer latent variable dimensions and appears to work as well as the existing methods. It may offer potential advantages in situations where the batch has several stages and the latent variable structure changes from stage to stage.
ingredient fed to the batch w4x. In more complicated cases Dynamic Time Warping methods from the speech recognition literature can be used to align the batches w6x. Typical data collected from industrial batch processes takes the form of a three dimensional array. Within each batch run j s 1, . . . , J variables are measured at each of k s 1,2, . . . , K time intervals throughout the batch. Similar data will exist on a number of batches i s 1,2, . . . , N in the historical database. ŽThere is no need for the time intervals to be equally spaced as long as the intervals chosen are consistent from batch to batch.. This leads to the data array illustrated in Fig. 1. As also illustrated in Fig. 1, the array can be unfolded to get a short and wide matrix that can be analyzed by multivariate projection methods like PCA. By subtracting the mean from each column in this unfolded form, one is removing the average trajectory from each of the variables and hence PCA looks at the variance about these average trajectories. However, one problem arising when using this PCA model to monitor an new batch is that only data up to the present time is available and nothing is known about the deviations of the variables from their nominal trajectories for the remainder of the batch. At a specific time, k ), only k ) = J observations are available, whereas the PCA or PLS model needs the vector to be of full length, K = J, in order to calcu-
2. Batch monitoring using MPCA In a typical batch process, a number of variables are measured at certain time intervals over the total duration of the batch process in order to detect any problem with the current batch as early as possible. In this study we will assume that the duration of the batches is fixed. In some processes the duration may vary from batch to batch. This can introduce some difficulties with the multivariate approaches. However, this can often be accommodated by simply replacing the time variable with another process variable which varies monotonically throughout the batch such as conversion or the cumulative amount of a key
Fig. 1. Overview of the MPCA method. The three dimensional Žbatches=variables=observations. array is unfolded to give the matrix to be analyzed with ordinary PCA.
S. Rannar et al.r Chemometrics and Intelligent Laboratory Systems 41 (1998) 73–81 ¨
late the score vectors for the present batch. Nomikos and MacGregor suggested three different approaches to solve this. The first was to assume that the process would continue as a normal batch from the present time until the end of the batch. This means putting zeros in the vector for all remaining trajectory deviations. The second approach filled in the future deviations with the currently observed deviation. The third suggestion used missing data handling features of PCA or PLS algorithms to provide a prediction of the scores for the remaining time periods Ž k s k ) q 1, . . . , N . by projection of the available data Ž k s 1, . . . , k ). onto the plane of the PCA or PLS model. The control limits on the resulting monitoring charts depend heavily upon the approach used to fill in the missing data. All these methods have been seen to work quite well in practice Žsee Nomikos and MacGregor w2x, for more discussion.. However, it would be more satisfying if one would not have to make such assumptions when monitoring an ongoing batch. In the following sections of this paper it is shown that by using a version of hierarchical PCA or PLS, this can be accomplished. Only a PCA version is presented, but its extension to include PLS is straightforward and will not be discussed further in this paper.
3. Hierarchical PCA To better understand the adaptive hierarchical algorithm in Section 4 one must be familiar with the ordinary hierarchical PCA algorithm. In the latter algorithm the X-block is divided into a number of blocks. It is important that the these blocks are constructed in such a way that the variables in each block are connected to each other in a natural way. For instance, in an industrial process variables measured at different stages of the process can be blocked so that all variables from one stage of the process are put in one block and all variables from a second stage are put in another block. This will ensure that the model will be relevant and that the interpretations will make sense, since improved interpretation is the major advantage of hierarchical methods. The hierarchical PCA algorithm presented by Wold et. al. w7x is shown in Fig. 2. The latent vectors t a are computed sequentially Ž a s 1, . . . , A.. The
75
Fig. 2. Overview of hierarchical PCA. The score vectors Ž r . from all the individual blocks are organized in a consensus matrix R from which the overall score vector t and weight vector w are calculated.
first step of the algorithm is to select an arbitrary starting score vector t a for a s 1, usually one of the columns from the X-block. This vector is then normalized to have length one and then multiplied by each individual X-block to give the block loading vectors pa k . t a s t ar5 t a 5 pa k s X aX k t ar
Ž
t Xa t a
.
Ž 1. Ž 2.
Here the subscript a indicates the latent variable dimension being considered Ž a s 1, . . . , A. and the subscript k indicates the data block. Each X a k-block is then multiplied by its loading vector to calculate a set of new block score vectors r a k , one for each block k. r a k s d k X a k pa k
Ž 3.
d k is a block weighting parameter usually set equal to 1rŽnumber of variables in the block., but can be set to other values if one wish to give greater importance to some block. The former default weighting results in equal weight being placed on each block. The block score vectors, r a k , from all the blocks are now combined in the consensus matrix R a and used to calculate the block weights wa and the new score vector t a . wa s RXa t ar Ž t Xa t a . t a s R a wa
Ž 4. Ž 5.
After normalizing the new t a score vector to length one, the convergence of the t a is checked. If
76
S. Rannar et al.r Chemometrics and Intelligent Laboratory Systems 41 (1998) 73–81 ¨
the difference between this new t a vector and the previous t a vector is larger than some preset criteria, one starts the next iteration using this new t a vector in Eqs. Ž1. – Ž5.. When converged, each block is deflated according to the following equation. XŽ aq1. k s X a k y t a pXa k
Ž 6.
The iteration is then repeated for the Ž a q 1.-th latent vector.
4. Adaptive batch monitoring In this batch oriented hierarchical PCA algorithm, the blocks are the time slices from the three dimensional batch data matrix, giving K blocks each with N batches and J variables. What makes the use of this algorithm different from using ordinary hierarchical PCA algorithm is that it only looks at one time slice at a time rather than all of the blocks at once. This will give separate score vectors t a k for each individual time slice X a k , Fig. 3. The R a k matrix at each stage consists of two columns, one containing the previous overall score vector t aŽ ky1. which summarizes the recent prior history, and the score vector r a k from the current time slice, which summarizes the new information coming available.
The algorithm proceeds as follows. The initial step is to calculate a one component PCA model for the first time slice, giving the first block’s t a k score vector and pa k loading vector. The hierarchical part of the algorithm starts with the second time slice and continues for the rest of the batch duration Ž k s 2, . . . , K .. The first step in the building of the model for time k is to use the t aŽ ky1. score vector from the previous time slice model as a starting vector for t a k to give the block scores for the present X a k-block. pa k s X aX k t a k
Then the score vector r a k is calculated and normalized to have length one. r a k s X a k pa k
Ž 8.
r a k s d k r a kr5 r a k 5
Ž 9.
As in the hierarchical PCA algorithm, the d k is a weighting factor. In this adaptive MPCA algorithm this factor plays the same role as the exponential weighting factor in an EWMA model. It controls the amount of weight given to the new information Ž r a k . at time k vs. the recent history Ž t aŽ ky 1. .. The weighted score vector r a k is then placed together with the score vector t aŽ ky1. in the R a k matrix and multiplied by the t a k vector Žin the first iteration this will be t aŽ ky1. . to give the weights, wa k , for the two vectors in the R a k-matrix. wa k s RXa k t a k
Fig. 3. Overview of the adaptive MPCA method. For each time slice X k , the corresponding score vector r k is calculated, weighted using the weighting factor d, and placed in the R matrix together with the previous overall score vector t ky 1 . This matrix is then used to calculate the new overall score vector t k .
Ž 7.
Ž 10 .
Since the vector t aŽ ky1. holds the weighted information from all previous time slices, the algorithm can be seen as an exponentially weighted PCA model. A high value of d k will make the model adapt very fast while a low value puts more weight on the previous history of the batch. The value of d k depends on the type of process to be monitored. A stable one-stage batch process can have a low value while a multistage process might require a high value of d k to make the model adapt fast to the next stage of the process. Setting d k to a number very much larger than unity will be equivalent to calculating separate PCA models for each time slice since the hierarchical approach then uses no memory of any previous history of the batches. It will therefore follow all small variations in the training set.
S. Rannar et al.r Chemometrics and Intelligent Laboratory Systems 41 (1998) 73–81 ¨
The new score vector t a k is then calculated and, as in the ordinary hierarchical PCA algorithm, normalized to have length one. t a k s R a k wa k t a k s t a kr5 t a k 5
Ž 11 . Ž 12 .
At this stage the convergence of the t a k vector is checked and if the difference between the old and the new score vectors is small one proceeds to the deflating. Otherwise, one iterates Eqs. Ž7. – Ž12. until the preset convergence criteria is reached. After convergence each X a k-block is deflated by the converged t a k score vector and the block loading vector pa k according to Eq. Ž13.. These deflated blocks can then be used to calculate the next PCA dimension. XŽ aq1. k s X a k y t a k pXa k
Ž 13 .
Starting with a s 1, the converged latent vector t a k is calculated for each time step k s 1,2, . . . , K. The whole iterative process is then repeated to obtain the converged latent vectors for dimensions a s 2, . . . , A using the residuals from the previous dimension ŽEq. Ž13... Once the adaptive model has been built it can be used to monitor the progress of new batches. To apply the on-line algorithm to future batches one must store the following elements of the model: pa k , wa k , and d k for a s 1, . . . , A and k s 1, . . . , K. As a new batch evolves one obtains observations sequentially as new row vectors xX1 , xX2 , . . . , xXK . At time period k the hierarchical prediction algorithm is as follows. For a s 1,2, . . . , A compute r a k s xXa k pa k t a k s t aŽ ky1. ,d k r a k wa k xXŽ aq1. k s xXa k y t ak pXa k The prediction error is given by A
ek s x k y
Ý t a k pa k ss1
5. Application to industrial batch polymerization data To investigate the performance of the adaptive MPCA the same industrial data set as in the paper by
77
Nomikos and MacGregor w2x was used. This data set was supplied by DuPont and is a polymerization reaction process that can be divided into several stages. An initial solvent vaporization phase, which lasts approximately 1 h, is followed by the main polymerization phase lasting about 1 h. The variables measured throughout the total duration of the batch were temperatures Žvariables 1, 2, 3, 6 and 7., pressures Žvariables 4, 8 and 9. and flowrates Žvariables 5 and 10.. In total, 10 variables were recorded at 100 different time intervals for 55 good and bad batches. From the 55 batches Nomikos and MacGregor selected a subset of 36 good batches using the MPCA method. Good batches were those with both acceptable product quality and with normal variable trajectories. These 36 batches will be used to train the models in following sections in order to find representative models for process monitoring and to calculate control limits. More detailed information about the data can be found in Nomikos and MacGregor w2x and in thesis of Nomikos w8x. The variables in the DuPont data were centered about their average trajectories and scaled to have unit variance prior to the analysis. The data was analyzed with adaptive MPCA and the results were compared with the results from the previous analysis using the MPCA method. Unless otherwise specified the weight of the r score vectors Ž d k . in the R matrix in the adaptive MPCA will be set equal to one, thereby giving equal weight to past and current data. Cross validation when using the regular MPCA method indicated that approximately three components were needed to describe well the data. The cumulative overall sum of squares explained by the models were 56.8% Ž38.5, 11.7 and 6.6%. for MPCA and 62.4% Ž44.9, and 17.5%. for the adaptive MPCA. Since the adaptive MPCA creates local models at each time interval it is not surprising that it explains more of the data than MPCA and according to the behavior of the sum of squares ŽSS. explained by these local models, it seems that only two adaptive components are needed to explain the same amount as three components from MPCA. In Fig. 4, the sum of squares explained by each latent variable dimension from the analysis with MPCA is shown, and in Fig. 5 the corresponding results from the adaptive MPCA are shown. The results from both methods show that in the first part of the process, only the first principal
78
S. Rannar et al.r Chemometrics and Intelligent Laboratory Systems 41 (1998) 73–81 ¨
Fig. 4. Sum of squares explained by individual MPCA components at each time interval.
component is needed to explain the variation. Around observation 50 the importance of the first component decreases radically and between observations 50 and 60, the model cannot describe the variation very well at all. However, after observation 60, the influence of the second and third components increases for the regular MPCA model while the sum of squares explained by first component continues to decrease. Comparing these results with the results in Fig. 5 one finds that, after around observation 60, the sum of squares explained by the first component does not decrease for the adaptive hierarchical MPCA, but instead remains fairly constant. If one decreases the value of the weighting factor d k giving a slower adapting model one can see in Fig. 6 that after time 60 the importance of the first component continues to
Fig. 5. Sum of squares explained at each time interval by individual components from adaptive MPCA with ds1.
Fig. 6. Sum of squares explained at each time interval by individual components from adaptive MPCA with ds 0.33.
fall and that of the second component starts to increase similar to the results from the MPCA analysis. This is what can be expected since lowering the value of the weighting factor will make the adaptive MPCA
Fig. 7. Comparison of loading vectors over time period 35–40. Ža. Average loadings for the first MPCA component. Žb. Average loadings for the first adaptive MPCA component.
S. Rannar et al.r Chemometrics and Intelligent Laboratory Systems 41 (1998) 73–81 ¨
79
and 2 is shown together with the average loading for the first component of the adaptive MPCA. The loadings of the first component of adaptive MPCA seems to be a combination of the first two components from the old MPCA method which would explain why fewer components are needed in the adaptive version.
6. On-line monitoring of new batches The MPCA and adaptive hierarchical MPCA models generated above are now used to monitor the developing behavior of a new batch. This new batch is the same as the one monitored in Nomikos and MacGregor w2x that yielded poor final product quality. In the monitoring phase two statistics with complementary information were used, the squared pre-
Fig. 8. Comparison of loading vectors over time period 90–95. Ža. Average loadings for a combination of the first and second MPCA components. Žb. Average loadings for the first adaptive MPCA component.
model slower to adapt and therefore more similar to the fixed MPCA solution. To better understand what is actually happening in the adaptive MPCA one can investigate the block loadings from the two different approaches. The block loadings in the first part where only component one is of importance can be illustrated by showing the average loadings over blocks 35–40 ŽFig. 7a and b.. The loadings pattern for component 1 is very similar for the two different methods, only the scale is different. This difference in scale is due to different normalization procedures within the two algorithms. If one takes a look at the end of the process one can find what is causing the changes in the explained sum of squares. In Fig. 8a and b the corresponding comparison is shown for the average loadings but now taken over time period 90–95. For MPCA the combined loadings from components 1
Fig. 9. Monitoring of bad batch using adaptive MPCA with ds1. Ža. T 2 -statistic; Žb. squared prediction error ŽSPE..
S. Rannar et al.r Chemometrics and Intelligent Laboratory Systems 41 (1998) 73–81 ¨
80
calculated model together with the selected method of filling in the missing data until the end of the batch Žsee Nomikos and MacGregor w2x.. In the adaptive MPCA, since no missing data replacement is necessary, the control limits can be calculated directly using the residuals and scores from the model building stage. Results from monitoring the new batch using the adaptive hierarchical MPCA approach are shown in Fig. 9 for a weighting factor of d s 1.0 Žfast updating.. The results are very comparable to those obtained for the regular MPCA approach. A very clear signal of abnormal behavior is detected around time period 57 in the SPE statistic, and slightly later in the T 2 statistic. The reader is referred to Nomikos and MacGregor w2x for comparisons and for a discussion of the reasons for the abnormal behavior. Fig. 10 shows the adaptive hierarchical MPCA results using d s 0.33 Žslow updating..
7. Results and conclusions Fig. 10. Monitoring of bad batch using adaptive PCA with ds 0.33. Ža. T 2 -statistic; Žb. squared prediction error ŽSPE..
diction error ŽSPE. and the Hotelling’s statistics ŽTA2 . based on the first A latent variables. J
Ý e k2 j
SPE k s
Ž 15 .
js1 A
Ta2k s
Ý as1
t a2k
Ž 16 .
s a2 k
Where the e k j ’s are the prediction errors of the variables at each new time k, and s a k is the standard deviation of the t a k score vector from the reference distribution Žthe training set.. The distribution of T 2 is related to the F distribution w9x as TA2k s
AŽ N 2 y 1. N Ž N y A.
FŽ A , NyA.
Ž 17 .
In MPCA the standard deviation and the control limits for the two statistics are based on the distribution of the score values and errors calculated by a subsequent monitoring of the training set using the
In this paper a new adaptive MPCA algorithm based on hierarchical PCA has been presented for the purpose of monitoring batch processes. This approach is shown to give monitoring results similar to those of the regular MPCA approach on the investigated data set. An advantage of this adaptive hierarchical version is that in the monitoring phase, there is no need to assume anything about the future deviations of the ongoing batch from the nominal trajectories. In MPCA one has to assume a behavior of the future of the batch in order to fill in the data vector from the current time period until the end of the batch. The adaptive MPCA only works with the present and past history of the batch thereby avoiding this fillingin procedure. A second advantage of this approach is the ability of the model to adapt to different stages of the batch process making it very suitable for monitoring multistage batch processes. In the two stage industrial example presented in this paper the adaptive MPCA only needed two components compared to the three components used by MPCA to give the same results. If one has a process with even more stages the adaptive MPCA could be advantageous since it will reduce the number of dimensions needed
S. Rannar et al.r Chemometrics and Intelligent Laboratory Systems 41 (1998) 73–81 ¨
and thereby simplifying the presentation of the monitoring results.
References w1x P. Nomikos, J.F. MacGregor, Monitoring of batch processes using multi-way principal components analysis, Am. Inst. Chem. Eng. J. 40 Ž1994. 1361–1375. w2x P. Nomikos, J.F. MacGregor, Multivariate SPC charts for batch processes, Technometrics 37 Ž1995. 41–59. w3x P. Nomikos, J.F. MacGregor, Multi-way partial least squares in monitoring batch processes, Chemometrics Intell. Lab. Syst. 30 Ž1995. 97–108. w4x T. Kourti, P. Nomikos, J.F. MacGregor, Analysis, monitoring
w5x
w6x
w7x
w8x w9 x
81
and fault diagnosis of batch processes using multi-block, multi-way PLS, J. Proc. Control 5 Ž4. Ž1995. 277–284. S. Wold, P. Geladi, K. Esbensen, J. Ohman, Multi-way principle component analysis and LS analysis, J. Chemometrics 1 Ž1987. 41–56. A. Kassidas, J.F. MacGregor, P.A.Taylor, Synchronization of Batch Trajectories using Dynamic Time Warping, Am. Inst. Chem. Eng. J., In press Ž1998.. S. Wold, S. Hellberg, T. Lundstedt, M. Sjostrom, PLS Modeling with Latent Variables in Two or More Dimensions, PLS Meeting, Frankfurt, September 1987. P. Nomikos, SPC of Batch Processes, PhD thesis, Chem. Eng. Dept., McMaster University, Hamilton, ON, Canada, 1995. N.D. Tracy, J.C. Young, R.L. Mason, Multivariate control charts for individual observations, J. Quality Technol. 24 Ž1992. 88–95.