Copyright IQ IFAC Dynamics and Control of Process Systems. Corfu, Greece. 1998
COMPARISON OF METHODS FOR HANDLING UNEQUAL LENGTH BATCHES
S.G. Rothwell, E.B. Martin and A.J. Morris
Centre for Process Analysis, Chemometrics and Control University of Newcastle, Newcastle-upon-Tyne, NE1 TRU UK Tel :- +44 (0) 191 2225382; Fax :- +44 (0) 191 2225292 E-mail :
[email protected];
[email protected];
[email protected]
Abstract: It is generally accepted that multivariate statistical quality predictor models can be implemented on batch processes. However, most approaches presented in the literature assume that all batches produced are of the same length. This is very unusual in practice. This paper contrasts and compares three different methods of addressing this situation with application to Multi-way Partial Least Squares (MPLS) for predicting final batch quality. A simulated batch bioprocess operating with uneven batch lengths is used to illustrate and compare the techniques. Copyright © 19981FAC Keywords : Batch processes; Multi-way Partial Least Squares; Uneven Length Batches.
I. INTRODUCTION
performance in differentiating between nominal and faulty batch production. The two other approaches considered were the use of a surrogate variable and dynamic time warping (Gollmer and Posten, 1995).
Published methods for predicting final batch quality using Multi-way Partial Least Squares (MPLS) are only applicable to batches of equal length. In the literature, batches described as nominal for the purpose of building an MPLS model are all of equal length. When there is a large historical database available, it may be possible to select batches that are all of equal length. In most industrial situations frequent operational changes mean that the historical database for any particular nominal operating region is seldom large enough to select only those batches that are equal in length in order to build the MPLS model. However, batches that are longer or shorter than the selected length could provide valuable information on the nominal operating region and may be as important in defining the nominal operating space as batches that are of the desired or most commonly occurring length. In the majority of applications of MPLS, it is assumed that the chosen batch length is taken to be that of the shortest batch and in the case of any batches longer than this, the data is ignored. In effect all batches are 'chopped' to the minimum nominal batch length. The aim of this work is to suggest other methods of dealing with uneven length batches and to compare their
2.
PARTIAL LEAST SQUARES (PLS) AND MULTI-WAY PLS
Projection to latent structures (PLS) has been discussed at length in the literature (Geladi and Kowalski, 1986; Hoskuldsson, 1988). It is a projection based regression method strongly related to Principal Component Analysis (Wo Id et al., 1987a) whereby directions of greatest variability are found in the X-block data matrix (process data) which are most predictive of the Y -block matrix (quality data). The principal component axes defined as being the directions of greatest variability in the data are in effect rotated slightly so they define variability that affects quality, hence any variation in the X-block that does not affect Y-block values is not modelled. PLS can be formulated by considering two principal component analysis (PCA) decompositions, one on the process data matrix, X and one on the quality matrix, Y:
67
axes system to the mean of the data and prevents data of large magnitude but small relative variance from 'swamping' important variation in variables of a smaller scale. Alternative scaling methods have been proposed in the literature, such as scaling by an estimate of standard deviation of noise on the variable, and weighting the variables according to knowledge of the process, however mean-centring and unit variance scaling perform adequately for the datasets under consideration .
k
X=Tpf +E=
L t,p,f +E = L uq .. +F i_ I
k
Y=UQ f +F
T
i~1
These are called the outer relations on the X and Y block respectively. A linear regression is then performed between the scores on the X and Y blocks, giving the inner relation :
After rescaling, the PLS algorithm is then applied in the usual way, considering each variable at a particular time point as a separate variable. One score element is now obtained for each batch in each latent variable direction, while the loading vectors are of length JK in each direction. The Y-block matrix usually only contains quality variable measurements on each batch at the final time point, hence no unfolding is necessary.
where b is the regression coefficient, analogous to that in mUltiple linear regression (MLR). PLS succeeds where MLR fails because of it's ability to handle highly correlated data through the PCA decomposition of the outer relations and also it ' s ability to handle data sets with less samples than variables, both conditions which result in illconditioning of the MLR matrix . Both leave-one-out and block cross-validation were used with a minimum PRESS criterion to aid the choice of number of latent variables to retain in the MPLS model.
3.
Cut to minimum length :- Observations on all process variables for all batches are taken from the start of the batch up until the batch termination time, T , of the shortest batch. Hence all information after time T for all batches longer than T will be lost. This is not necessarily a disadvantage. Any event within the batch that causes a deviation from the nominal trajectory may have to occur fairly early in the batch to affect the quality of the batch and it may also be argued that by the time all batches reach time T, the main events within the batch process have occurred and the nature of the batch is more or less defined . This becomes more of an issue when the time between the shortest length batch and the longest length batch becomes ' large ', or the process in question is very sensitive to small changes. These factors are process dependent and, whilst not explicitly stated, this method is thought to be used in many industrial applications cited in the literature.
Batch data usually takes the form of several variables , each measured at pre-defined time intervals throughout the duration of the batch. To build a model from several batches representing nominal operating conditions, a three way matrix (batches x variables x time) needs to be considered. This is most easily done by partitioning the matrix into twodimensional slices and unfolding the matrix by the third variable.
B aKhc,:s
T (l)
METIlODS FOR MANIPULATING UNEVEN LENGTII BATCHES
T(2 )
Dynamic Time Warping (D1W) of batches to common length :- Dynamic time warping (DTW) is a method which attempts to match features in a test pattern or profile to a reference profile in order to determine a measure of fit between them. It has previously been used as a method of pattern matching in speech recognition. A recent application has been to the classification of faults in a continuous chemical process (Kassidas et al. , 1997). DTW distorts the test profile by stretching or compressing it to the length of the reference profile. The original use of DTW in isolated word speech recognition is restricted to being a distance measure , whereas this paper seeks to construct a predicted process variable profile of fixed length in time by retaining information on the optimal path taken to obtain the
Fig. I . Multi-way unfolding of three-dimensional nominal batch dataset. Several methods of unfolding can be used depending on what type of variation is required to be modelled . Bearing in mind the aim of comparing variation between batches over time or a substitute indicator variable , the unfolding method used is that shown in Figure I and is used in many applications in the literature (Wold et al., 1987b; Nomikos and MacGregor, 1995). After unfolding the three-way Xblock batch data matrix, the new unfolded X-block and the Y -block are usually mean-centred and variance scaled column wise . This prevents the fitting of the first component from the origin of the original
68
minimum distance measure. The DTW algorithm is essentially the same for all cases, however different global and local constraints can be specified to limit how the minimum distance and optimal path are calculated. This paper uses the conditions specified below (Gollmer and Posten, 1995) but variations on the specified constraints produced very similar results. (Note that the use of DTW here is on one process variable at a time. )
slope intensity constraint, P = I, which state that the only way to reach point (i ,j) on the warping path is through the points (i-l,j-2), (i-I ,j-I) or (i-2,j-I) ; and the path must proceed with the same number of steps in the i direction as in the j direction. A symmetric weighting function is used, giving equal weight to steps in both the i and j directions. This can be expressed as: w(k) = (i(k) - i(k - I» + (j(k) - j(k - I»
The DTW method attempts to map a test pattern, t = {t(i)}, i= 1,...,1 ; and a reference pattern, r = {r(j)}, j= 1,.. . ,J ; via a non-linear function w(k) = liCk) , j(k)].
i(O) = j(O) = 0 The Dynamic Programming problem is :-
k = I, ...,K, which matches together points on each pattern such that the accumulated distances, D* from the first to last points is a minimum. This can be achieved by Dynamic Programming :
D(I , I) = 2d(l ,I) D(i -I ,j - 2) + 2d(i,j -I) + d(i,j)] D(i,j) = min D(i -I,j -1) + 2d(i,j) [ D(i - 2,j-l) + 2d(i -I , j) +d(i, j)
D(t, r, wO» = d(t, r, wO» · w(l) D(t, r, w(k» =
D(t, r, w(k - 1) min w (k _ I )[ + d(t,
] .
r, w(k» . w(k)
,k
= 2,3,
D *(t,r) = D(I,J) / (I + J)
.. . K
Once the minimum accumulated distance is found, the path can be traced backwards finding which preceding point gave the optimal solution at each point, thus constructing the optimal path. The optimal path states which observations of the test pattern map to which observations of the reference pattern. Interpolating the warping path by the original reference pattern time axis gives a predicted test pattern that is of the same length as the reference pattern. Repeating this procedure between the corresponding process variable in each batch as a test pattern and the batch chosen to represent the desired length batch as the reference profile, for each process variable in each batch in turn results in predictions of all batches that are now of equal length.
D *(t, r) = D(t, r, w(K» where d(t,r ,oo(k» is a distance measure between a point on the test pattern and a point on the reference pattern, e.g. the quadratic distance: d( t , r , w(k» = (t(i)-r(j» '
and w(k) is a non-negative weighting function for the distance d(t(i), r(j), w(k» , constrained to prevent unfeasible mappings of points from one pattern to the other. Constraints are usually placed on the first and last points (endpoint constraints), on the local transition allowed from the preceding point (local slope constraints) and on the allowable search region for the entire mapping (global slope constraints), however in this example the global constraint is defined by the combination of the endpoint and local slope constraints. The end point constraints are chosen to be a set of points from the beginning and end of the profiles which are allowed to be the first and last points on the optimum path, here called the degrees of freedom of the warping path. Due to the nature of the simulated data, zero degrees of freedom were selected for both the beginning and end of the path since these points are reasonably well synchronised in time . However, up to ten degrees of freedom at the beginning of the warping path produced very similar results . These conditions can be expressed as:
Change of axis to indicator variable : - Instead of taking each new observation in a batch at the next step in time, the observations can be taken relative to another process variable that possesses similar properties in time, i.e. it is smooth, continuous, monotonic and spans the range of all other process variables within the batch dataset. In the batch fermenter simulation, several variables satisfy these conditions, however substrate concentration (S) was chosen as the most suitable because it also defined the termination condition of the batch (see below). A measurement on all process variables is then taken at equal intervals on this variable. This method could prove extremely useful in industrial processes where process variables are recorded only when a change occurs rather than at set time intervals.
w(l) = [l,l] w ( K) = [I , l]
The local slope constraints were taken as the Sakoe and Chiba constraints (Sakoe and Chi ba, 1978) with
69
4.
EXAMPLE - BATCH BIO FERMENTER SIMULATION.
model X-block matrix for method 3 compared to the other methods and is hence computationally more economic.
The bioprocess simulation used in this example is that of a batch fermenter programmed in the GPROMS software package library. developed by the Centre for Process Systems Engineering. Imperial College of Science, Technology and Medicine , UK . The process and quality variables monitored are recorded every 0.5 hours. The batch terminates when substrate concentration (S) falls below a threshold value of 1x 10. 9 The process variables are not recorded at the time point at which this occurs. Five process variables are recorded ; non-viable biomass concentration (X_non vi), specific growth rate (mu) , substrate concentration (S) , viable cell biomass concentration (X) and oxygen uptake rate (OUR) ; together with one quality variable, product concentration (P) , which is also available on-line from the simulation. Although with existing sensor technology several of these variables would not be available on-line in an industrial fermenter they will be used to demonstrate the described techniques. Fifty nominal batches (batch numbers I-50), ten batches of low initial biomass yield (51-60) and ten batches of low initial product yield (61-70) were computed by Monte-Carlo simulation within predefined ranges on the maximum specific growth rate, initial biomass yield , initial product yield, and the initial biomass (inoculum). Thirty nominal batches (batch numbers 1-30) were used to build the MPLS models based upon each of the three methods described previously. The remaining twenty nominal batches (31-50) and ten batches of each low yield were used as unseen validation batches to test the performance of each MPLS model.
°r~ 71 o
50 Time
50 Time
100
1 1':!O] ~I L ! :I\~Ii i i i i i - ~II 50
100
0
Time
':!O] o
100
50
100
Time
...........
50 Time
100
0
50 Time
100
Fig. 2. Nominal batch profiles before being processed by each method.
°t~l ~ I 0
50 Time
100
01~~
d
0~5: 0
0
100
50 Time
::1<] ~ 1 ':10] ~ 0
':F 0
50 Time
~ 50 Time
100
0
50
100
Time
1
100
t' 0
II1II 50 Time
100
Fig. 3. Nominal batch profiles processed by method I - cutting to minimum batch length.
5. QUALITATIVE DISCUSSION OF DATA AFfER PRE-TREA TMENT FOR LENGTH ADJUSTMENT
Figures 2, 3, 4 and 5 show the batch process variable profiles for the untreated nominal batches, and as treated by methods 1,2 and 3 respectively . Figure 3 shows the variation present in all the original untreated batches, but only up to time 73 .5 hours. Figure 4 shows that this variation is very much reduced by using the DTW method and the profiles now show much less variation from the nominal batch for each process variable . Figure 5 shows that the process variables are very much linearised by switching scale to an interval of one unit steps in substrate concentration while retaining a large proportion of the between batch variation at each sampling point; a feature which may be process dependent. It is also noted that the time variable is now transformed by the new scale, showing a low sampling rate over the start of the batch, increasing to a higher more constant rate from around halfway, to the end of the batch.
To perform pre-treatment of the data to implement method 1, the minimum length batch was found to be 73.5 hours, giving all batch process variable datasets dimension (148 x 5). Using method 2, two nominal batches of the median length of 83 .5 hours were averaged and used as the reference process variable profiles, resulting in all batch process variable datasets being of dimension (168 x 5). Using the median length caused as many batches to be stretched as were compressed . Using method 3, one sample is taken on each process variable everyone unit of substrate concentration decreasing from 100 to 5 units, giving each batch a process variable dataset of dimension (96 x 5). It is noted that there are still five process variables because substrate concentration has been replaced by time sampled on the new indicator variable scale. This shows a significant reduction in the size of the data used in the MPLS
70
0:E71 o
::r o
':F 0
50 Time
100
~
50 Time
:s Time
~ 1 ':10]
50 Time
was not considered an advantageous trade-off against the increase in complexity of the model by adding these components. Perhaps the most interesting result of Table I is the low degree of X-block variation explained by method 2, even for seven LV s, relative to the other methods . This will be considered later.
100
0
:IOYBJ
1 100
0
50 Time
1
+
38
j 50 Time
Final Balch Quality 40r---~--~--~----~--~--------.
100
+ + + + +
+
+
+
+ +..... + + ++ ++ + + ++ + + + + .....+ + ++ + +
36
1 100
+
34+ P final
+
32
+ + +
+
+
+
+ +
+ +
• • • • + ". •• •
30
Fig. 4 . Nominal batch profiles processed by method 2 - DTW.
Ilk
xX'S< Xx ":
28
1~loom==-= •. ~bW=X=-=·.b=~z=~~I~~__~__~____~~
26~
o
0:L 21 o
50
100
0
100
0
IlndlCator vanable sample no.
50
~t=x, dtw=o, int=+
2 3
70
'bOOt>
10
100
50
°°
°
100
ilndlCator vanable sample no.
.15
o/c X-block varIance explained
% Y-block
3 7 2
99.8 58.2 99 .3
97.2 100.0 97 .2
X
X L -_ _~_ _--'-_ _~~_ _~_ _--'-_ _ _ _'------'CJX
o
10
20
30 40 Batch number
50
60
70
Fig. 7. Prediction error as a percentage of final quality for each batch, by each method (positive value indicates larger prediction than actual value) .
RESULTS OF MPLS ANALYSIS.
No . LVs chosen
x
°
°
·10
Table I. Number of latent variables suggested by leave-one-out cross-validation and 'ft explained variance information . Method
60
Prediction Error (%) 15 r---~---r--~----~---r----r---.
Fig. 5. Nominal batch profiles processed by method 3 - substituting an indicator variable (S) for the time scale.
6.
50
!Indicator .... anable sarJ1)le no.
!IndICator van able sample no
50
40
Fig. 6. Actual final batch quality for all batches.
':ISI ':[ 21 :IOYBJ -I o
30
Batch number
!IndICator vanable sample no
~1:;Z1 50
20
100
ItndlCator .... anable sample 00.
o
10
Prediction Error (%) magnitude only
14
variance explained
Fu1=x, dtw=o, Int=+
I
0° ° ° ex
12 10
°
8
Q)X
°
6
° The choice of three latent variables (LVs) for method 1 and seven LV s for method 2 were made by leaveone-out cross-validation on the model building batches, however for method 3, two LVs were chosen instead of the seven L Vs suggested by leave-one-out cross-validation, as the small decrease in the Prediction Error Sum of Squares (PRESS) obtainable
X
° °
X
X
2
o
o
10
20
30
40
50
60
70
Batch number
Fig. 8. Absolute prediction error as a percentage of final quality for each batch, by each method .
71
Figure 6 shows the actual quality values for all batches and Figures 7 and 8 show the performance of the three methods in predicting the final quality measurements. Batch numbers are as referred to in Section 3. From Figures 7 and 8, it can be seen that all three methods give accurate predictions for the model building batches, indicated by the percent error being close to zero and generally less accurate predictions for the unseen validation batches, as is expected. Predictions for the validation batches of low yields (numbers 51-70) are in fact extrapolations of the model, since no information was present in the model building batches of this type of batch behaviour. In practice it would be important for a model of this type to be reliable in regions slightly outside the scope of the batches used to build the model, increasing it's robustness. While method 2 (DTW) gives the most accurate predictions of all three methods for the model building batches, it gives the least accurate predictions for all 3 types of validation batches, unacceptably large errors occurring for the low product yield batches (numbers 61-70) which are consistently overestimated (see Figure 7). Method I gives reasonably good performance for all batches except the low product yield, where it consistently underestimates the final quality by between five and fourteen percent. Method 3 gives consistently good predictions, the maximum error being less than five percent with no apparent bias towards over or underestimating the quality.
than possibly being excluded by using batches cut to minimum length, especially when the difference between maximum and minimum batch lengths is large. Similar results have been achieved when the comparisons were applied to a Batch Polymerisation process, using conversion as the indicator variable and predicting three quality variables in the MPLS application (MIMO system). Current work includes extending the methods to on-line monitoring and prediction schemes.
8. ACKNOWLEDGEMENTS The authors would like to thank Dr. J. Glassey for her help regarding selection of the data, and gratefully acknowledge the support of CPACC and the BBSRC for support of Stuart Rothwell.
9. REFERENCES Geladi, P. and Kowalski , B.R. (1986), Partial least squares regression: a tutorial, Analytica Chimica Acta 185, pp. 1017. Gollmer, K. and Posten, C. (1995), Detection of distorted pattern using Dynamic Time Warping algorithm and application for supervision of bioprocesses, Preprints of the IFAC workshop on on-line fault detection and supervision in the chemical process industries, Newcastle, June 1213 th Hoskuldsson, A. (1988), PLS regression methods, 1. Chemometrics 2, pp. 211-228 . Kassidas, A., MacGregor, 1.F. and Taylor, P.A. (1997), Fault Diagnosis in continuous dynamic processes using speech recognition methods, Preprints of IFAC ADCHEM International Symposium on Advanced Control of Chemical Processes, Banff, Canada, June 9-11 th Nomikos, P. and MacGregor, J.F. (1995), Multivariate SPC charts for monitoring batch processes, Technometrics 37, no.!. (1978), Dynamic Sakoe, Hand Chi ba, S. programming algorithm optimization for spoken word recogmtlOn , IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP26 (I), pp. 43-49. Wold, S. , Esbensen, K. and Geladi, P. (1987a), Principal component analysis, Chemometrics and Intelligent Laboratory Systems 2, pp. 37-52. Wold, S., Geladi, P., Esbensen, K. and Ohman, J. (l987b), Multi-way principal components and PLS analysis, 1. Chemometrics 1, pp. 41-56.
7. CONCLUSIONS By switching from monitoring variables on a time scale to using an indicator variable from the available process variable set, an improvement over existing methods of cutting batches to minimum length has resulted , in terms of reduced prediction error of final quality via MPLS analysis and a more generalised model in terms of the type of variance contained in the model. A second method involving a DTW algorithm was also presented and evaluated against the other two methods. This method , while providing adequate performance, proved less successful than the other methods in the test cases. In contrast the indicator variable method is process specific and requires a process variable to be available that has similar properties to time , such as being continuous, monotonic and spans the entire range of all other process variables being used in the multivariate statistical model. Such a variable may not exist in every industrial process, in which case one of the other described methods could be used with confidence. Other advantages to using the indicator variable method are reduced size of the MPLS Xblock matrix, leading to decreased computational cost. Variation towards the end of the batch that may be important would be included in the model, rather
72