8th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes (SAFEPROCESS) August 29-31, 2012. Mexico City, Mexico
Online batch fault diagnosis with Least Squares Support Vector Machines ⋆

P. Van den Kerkhof, J. Vanlaer, G. Gins, J.F.M. Van Impe

BioTeC, Department of Chemical Engineering, Katholieke Universiteit Leuven, W. de Croylaan 46, B-3001 Leuven, Belgium (e-mail: {pieter.vandenkerkhof, jef.vanlaer, geert.gins, jan.vanimpe}@cit.kuleuven.be).

Abstract: A new fault identification method for batch processes based on Least Squares Support Vector Machines (LS-SVMs; Suykens et al. [2002]) is proposed. Fault detection and fault diagnosis of batch processes are difficult issues due to the dynamic nature of these processes. Principal Component Analysis (PCA)-based techniques have become popular for data-driven fault detection. While improvements have been made in handling dynamics and non-linearities, correct diagnosis of the process disturbance remains a difficult issue. In this work, a new data-driven diagnosis technique is developed using an LS-SVM-based statistical classifier. When a fault is detected, a small window of pretreated data is sent to the classifier to identify the fault. The proposed approach is validated on data generated with an expanded version of the Pensim simulator [Birol et al., 2002]. The simulated data contains faults from six different classes. The obtained results provide a proof of concept of the proposed technique and demonstrate the importance of appropriate data pretreatment.

Keywords: batch control; fault detection; fault identification; data processing; statistical process control.

1. INTRODUCTION

In comparison to continuous processes, batch processes have a lower capital cost and a higher flexibility to produce multiple products or grades. Therefore, batch processes play an important role in the chemical and biochemical industries for the production of high added value products (e.g. pharmaceuticals, food products, polymers, semiconductors).
A batch process can be prone to a number of process disturbances such as impurities in the raw materials, fouling of heat exchangers, sensor failures, plugged pipes, etc. The dynamic nature of batch processes presents a challenging problem for fault detection and diagnosis. Today's process plants have large historical databases at their disposal, containing frequent measurements from online sensors on hundreds of variables. Statistical Process Monitoring (SPM) aims to exploit these existing databases for process monitoring, fault detection and fault diagnosis, and therefore has a tremendous potential for industrial applications.

⋆ Work supported in part by Projects OT/09/25/TBA and PVF/10/002 (OPTEC Optimization in Engineering Center) of the Research Council of the K.U.Leuven, Project KP/09/005 (SCORES4CHEM) of the Industrial Research Council of the K.U.Leuven, and the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian Federal Science Policy Office. P. Van den Kerkhof and J. Vanlaer are funded by a Ph.D. grant of the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen). J. Van Impe holds the chair Safety Engineering sponsored by the Belgian chemistry and life sciences federation essenscia. The authors assume scientific responsibility.
Most recent research within the field of SPM has been devoted to fault detection and identification using techniques based on Principal Component Analysis (PCA). While progress has been made in improving fault detection performance by including process dynamics (e.g., batch dynamic PCA [Chen and Liu, 2002], auto-regressive PCA [Choi et al., 2008]) or non-linear extensions of PCA (e.g., kernel PCA [Lee et al., 2004]), correct diagnosis of the process disturbance remains a difficult issue. Examining contribution plots, which chart the contribution of each variable to the out-of-control statistic, is by far the most popular approach for finding the cause of an alarm signal [Westerhuis et al., 2000]. The generation of contribution plots requires no prior knowledge about process disturbances. However, process knowledge is necessary to interpret the contribution pattern and find the actual cause. Cho and Kim [2004] proposed a Fisher Discriminant Analysis (FDA)-based classifier which provides a more direct diagnosis. The classifier is trained on historical faulty data and assigns the cause of a detected fault to the class it most resembles. The drawback of this method is the need for a significant amount of historical faulty data, as FDA requires a number of past fault batches greater than the dimensionality of the fault data. For example, Cho and Kim [2004] needed 700 faulty training batches per class in their case study. As process plants are monitored and controlled to achieve satisfactory product quality and prevent process faults, the number of faulty batches available is limited. Therefore, in most practical cases, pseudo-batches have to be generated to account for the data insufficiency [Cho and
Kim, 2005]. The limited availability of faulty batches is an important consideration for the design of a data-driven fault diagnosis scheme. Cho [2007] extended the linear FDA approach to non-linear problems by employing kernel FDA and reduced the need for pseudo-batch generation. Recently, Support Vector Machines (SVMs) were utilized as a learning algorithm for fault classification of continuous processes [Yélamos et al., 2009]. SVMs are based on the statistical learning theory developed by Vapnik [1998] and have been shown to exhibit a high generalization performance, especially when the number of training samples is small [Abe, 2005]. This is an important advantage, as the availability of faulty data is a common bottleneck in developing data-driven diagnosis techniques. In this paper, the application of Least Squares SVMs (LS-SVMs; [Suykens et al., 2002]) to data-driven fault diagnosis of batch processes is explored. As a case study, data of an expanded version of the Pensim simulator developed by Birol et al. [2002] is used. First, the basics of PCA and its application to fault detection in batch processing are summarized in Section 2. Next, LS-SVMs are briefly introduced with a focus on multi-class problems in Section 3. The proposed LS-SVM-based methodology for batch fault diagnosis is explained in Section 4. Section 5 describes the case study on which the fault diagnosis method is validated, followed by a discussion of the obtained results in Section 6. Finally, conclusions and future research directions are provided in Section 7.

2. PCA-BASED FAULT DETECTION

2.1 PCA for batch data

Industrial data is typically heavily correlated as the measured variables are connected through physical laws, mass balances, redundancy of sensors, etc. PCA reduces the number of measured variables to a smaller number of uncorrelated variables or scores by exploiting these correlations [Jolliffe, 1986].
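As a minimal sketch of this dimensionality reduction applied to batch data (the variable-wise unfolding used here is described in the remainder of this section; all sizes and data are hypothetical, and the SVD route is an illustration rather than the paper's implementation):

```python
import numpy as np

# Hypothetical sizes: I batches, J variables, K time points.
I, J, K = 20, 6, 50
rng = np.random.default_rng(0)
batches = rng.standard_normal((I, K, J))       # simulated batch data

# Variable-wise unfolding: the K x J measurements of every batch are
# placed under each other, giving an IK x J matrix X.
X = batches.reshape(I * K, J)

# Normalize each column before applying PCA.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via the singular value decomposition, retaining R components.
R = 3
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:R].T                                   # J x R loading matrix
T = X @ P                                      # IK x R score matrix
E = X - T @ P.T                                # IK x J residual matrix

# X is recovered exactly from scores, loadings and residuals.
assert np.allclose(X, T @ P.T + E)
```

The loadings P span the model plane; dropping E yields the rank-R approximation of the data.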
While PCA is applicable to two-dimensional matrices only, a batch data set is inherently three-dimensional as it contains I batches of which J variables are measured at K different time points. Nomikos and MacGregor [1994] solved this issue by first unfolding the I × J × K batch data array to a two-dimensional matrix. In this paper, the data is normalized around the mean trajectory to zero mean and unit variance and subsequently unfolded using the variable-wise unfolding method proposed by Wold et al. [1998]: the K × J measurements of each batch are placed under each other to obtain an IK × J data matrix X. After unfolding, each column of the variable-wise unfolded data matrix X is normalized before applying PCA. The PCA model approximates X with a lower-dimensional matrix T containing R scores (R ≤ J) for each row of X. The scores matrix is found by projecting X on a loading matrix P:

X = T P^T + E_X,    (1)

where E_X represents the residuals. The sizes of the matrices T, P, and E_X in Eq. 1 are IK × R, J × R, and IK × J respectively. The R columns of P correspond to the principal components. The first principal component is
the direction of maximum variance in the data; subsequent components explain gradually less variance. When applying PCA to process data, large variances are assumed to represent important process dynamics, while smaller variances represent noise. The number of principal components R to include has to be decided by the user.

2.2 Fault detection statistics

In Statistical Process Monitoring, abnormal behavior is detected by comparing measured process data against a reference dataset obtained under Normal Operating Conditions (NOC). Each new 1 × J measurement vector x_k is projected on P to obtain its 1 × R score vector t_k and 1 × J residual vector e_k. The current score vector and residuals are compared to the NOC data by computing two scalar fault detection statistics. The Hotelling's T² statistic monitors the scores and checks whether a new observation projects onto the model plane defined by the loading matrix P within the limits determined by the NOC data. The Squared Prediction Error (SPE) statistic monitors the residuals to detect the occurrence of any abnormal events that cause new observations to move away from the model plane. Upper control limits are established for both statistics based on the reference data set [Nomikos and MacGregor, 1994].

2.3 Fault detection versus fault identification

Fault detection statistics only indicate whether process behavior is normal in comparison to the NOC reference data. When a fault is detected, they provide no information about the cause of the out-of-control signal. In this work, an LS-SVM classifier is trained on historical data of past faulty batches to provide online fault diagnosis.

3. INTRODUCTION TO LS-SVMS

The concept of LS-SVMs and their extension to multi-class problems are discussed in Sections 3.1 and 3.2 respectively. For a detailed treatment of LS-SVMs, the interested reader is referred to the book of Suykens et al. [2002].
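Before turning to LS-SVMs, the two detection statistics of Section 2.2 can be sketched as follows. The NOC model, data, and dimensions are hypothetical placeholders, and the control-limit computation of Nomikos and MacGregor [1994] is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 6))     # hypothetical scaled NOC data (IK x J)
R = 3
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:R].T                           # loadings of the NOC PCA model
lam = (X @ P).var(axis=0, ddof=1)      # NOC variance of each score

def t2_spe(x_k):
    """Hotelling's T2 and SPE for one scaled 1 x J measurement vector."""
    t_k = x_k @ P                      # 1 x R score vector
    e_k = x_k - t_k @ P.T              # 1 x J residual vector
    return np.sum(t_k**2 / lam), np.sum(e_k**2)

# Each new measurement is compared against upper control limits
# estimated from the NOC reference set (not shown here).
t2, spe = t2_spe(rng.standard_normal(6))
```

T² grows when the scores leave the NOC region on the model plane; SPE grows when the observation moves away from the plane itself.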
3.1 LS-SVM basics

Consider a dataset consisting of N samples of M-dimensional input data x_i (i = 1 . . . N) belonging to two classes (Fig. 1). Each sample is labeled with a scalar y_i ∈ {+1, −1} for the positive and the negative class respectively. In their simplest form, LS-SVMs train a linear decision function or hyperplane

y = w^T x + b    (2)

to separate the input data, where w is an M-dimensional vector and b a scalar bias term. For a new data point x_k, Eq. 2 is evaluated and x_k is labeled according to the sign of y_k. An infinite number of separating hyperplanes exists. LS-SVMs seek the hyperplane that maximizes the margin between the two classes (Fig. 1b). This maximum-margin principle leads to a higher generalization performance, i.e. an increased correct classification rate on unseen data points. The four samples lying on the margin in Fig. 1b are called support vectors.
Fig. 1. Illustration of linearly separable data separated by a hyperplane H with (a) a small margin and (b) a maximized margin. In the latter case, an increased generalization performance can be expected.

LS-SVMs can be extended to data that is not linearly separable by allowing a small amount of classification error. These so-called soft-margin LS-SVMs have a regularization parameter which reflects the trade-off between maximization of the margin and minimization of the classification error. LS-SVMs are also readily extended to non-linear classifiers by employing the kernel trick. According to Mercer's theorem, kernel functions exist which correspond to a dot product in a higher-dimensional space. Rewriting the LS-SVM algorithm in terms of dot products and substituting them by a kernel function K(x_n, x_m) transforms Eq. 2 to

y = sum_{n=1}^{N} α_n y_n K(x_n, x) + b,    (3)

where N is the number of training data points and K(x_n, x) is the kernel function of the n-th vector of the training set and the unseen vector x [Abe, 2005]. The α_n's are found by solving a set of linear equations. Substituting a non-linear kernel function yields a non-linear LS-SVM classifier. In this work, the simple linear kernel and the popular Radial Basis Function (RBF) kernel are studied:

K(x_n, x_m) = <x_n, x_m>    (linear kernel)
K(x_n, x_m) = exp(−||x_n − x_m||² / (2σ²))    (RBF kernel)

where σ is a parameter called the kernel width.

3.2 Multi-class LS-SVMs

A standard LS-SVM is a binary classifier, whereas fault diagnosis is a multi-class problem. Several techniques for decomposing a multi-class problem into a series of binary classification problems (binarization) exist. Two widely used techniques are the One-versus-One (OvO) and the One-versus-All (OvA) approach. The OvO approach involves training a binary classifier for each pair of fault classes. In analogy with Eq. 2, the decision function D_ij(x) for class i against class j is given by

D_ij(x) = w_ij^T x + b_ij.    (4)

The input vector x is classified by voting. The decision function for class i thus becomes

D_i(x) = sum_{j=1, j≠i}^{C_f} sign(D_ij(x)),    (5)

where C_f is the number of fault classes. x is assigned to the class for which D_i(x) is at a maximum. If no single maximum exists, x is unclassifiable.
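The LS-SVM training step (solving a linear system for the α_n and b of Eq. 3) and the OvO voting rule of Eq. 5 can be sketched as follows. This is a simplified illustration on toy data with hypothetical classes and parameter values, not the LS-SVMlab implementation used later in the paper:

```python
import numpy as np
from itertools import combinations

def rbf(A, B, sigma=1.0):
    """RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def train_lssvm(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM linear system for alpha and b; return D(x) of Eq. 3."""
    N = len(y)
    Omega = np.outer(y, y) * rbf(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma   # regularized kernel matrix
    sol = np.linalg.solve(A, np.r_[0.0, np.ones(N)])
    b, alpha = sol[0], sol[1:]
    return lambda x: (alpha * y) @ rbf(X, np.atleast_2d(x), sigma)[:, 0] + b

def ovo_classify(x, classifiers, n_classes):
    """Voting rule of Eq. 5: each pairwise decision votes for one class."""
    votes = np.zeros(n_classes)
    for (i, j), D in classifiers.items():
        s = np.sign(D(x))
        votes[i] += s                       # D_ij > 0 votes for class i,
        votes[j] -= s                       # D_ij < 0 votes for class j
    return int(np.argmax(votes))

# Toy demo: three well-separated 2-D "fault classes".
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
Xs = np.vstack([c + 0.3 * rng.standard_normal((10, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 10)

clfs = {}
for i, j in combinations(range(3), 2):
    mask = (labels == i) | (labels == j)
    y = np.where(labels[mask] == i, 1.0, -1.0)
    clfs[(i, j)] = train_lssvm(Xs[mask], y)

print(ovo_classify([4.0, 0.0], clfs, 3))    # → 1
```

Unlike standard SVMs, the equality constraints of the LS-SVM formulation reduce training to one linear solve, at the cost of losing sparsity in the α_n.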
The OvA method entails the training of C_f classifiers to discriminate between each fault class and the remaining training data. If D_i(x) > 0 for class i only, then x is assigned to class i. If the decision function is positive for more than one class or zero for all classes, then x is unclassifiable. The number of binary classifiers to be trained in this approach (C_f) is lower than in the OvO approach, in which C_f(C_f − 1)/2 binary classifiers have to be trained. However, each classifier is trained on more data, and the unclassifiable regions are larger for the OvA approach. The multi-class LS-SVMs are trained using the LS-SVMlab toolbox [Suykens et al., 2002] for MATLAB, available online at http://www.esat.kuleuven.be/sista/lssvmlab. LS-SVMlab employs a two-step optimization procedure to find the optimal parameters (kernel and regularization parameters) by minimizing the cross-validation error. Coupled Simulated Annealing determines suitable parameters, which are subsequently fine-tuned with a simplex method.

4. PROPOSED FAULT DIAGNOSIS METHOD

The proposed diagnosis methodology entails two phases: (i) an offline model building phase and (ii) an online diagnosis phase. A general scheme of the new method is presented in Fig. 2.

4.1 Offline model building

The first step of the offline model building phase consists of scanning past faulty batches by applying the NOC PCA model and corresponding fault detection statistics. By combining process knowledge and operator experience, the cause of each detected fault is studied. Based on this study, fault classes are defined and the detected faults are assigned to one of these fault classes. The definition of fault classes is an important step. A large number of fault classes reduces classification performance due to a decreasing amount of training batches per class and an increasing similarity between different classes.
On the other hand, coarsely defined classes are unhelpful for straightforward corrective actions. Therefore, in practice, the number of classes is a trade-off between classifier performance and the practical use of the diagnosis results. The goal of the second step is to obtain a uniform characteristic fault pattern for each fault class. Contrary to FDA-based methods, the classifier in the proposed method is not trained on the data of each entire faulty batch. To obtain a characteristic pattern for each fault class, it is important to focus on the data obtained at the onset of the faulty period, as samples obtained at later time points might be influenced by different corrective actions depending on the operator. Moreover, multiple faults can occur in a single faulty training batch. Different choices of the data window are possible. A first option is to include only the 1 × J measurement vector at the time of detection. The advantage of this approach is that as soon as a new fault is detected, the current measurement vector can be passed to the classifier to obtain the diagnosis result. However, using only the J measurements at detection sometimes provides insufficient discrimination between similar faults. Also, when applying the classifier
Table 1. Variables of the Pensim simulator.

  Time                      Feed rate
  Dissolved Oxygen (DO)     Agitator power
  Fermentation volume       Feed temperature
  Dissolved CO2             Coolant flow rate
  pH                        Base flow
  Reaction temperature      Acid flow
Table 2. Simulated fault classes and their magnitude and starting time ranges.

  Fault type                 Magnitude                Start time
  Feed concentration step    ±[1%, 10%]               0 h
  Coolant temperature step   ±[1%, 10%]               0 h
  Agitator power drop        [−30%, −5%]              20 h − 380 h
  Aeration rate drop         [−90%, −70%]             20 h − 380 h
  Feed rate drift            ±[0.15%/h, 0.35%/h]      70 h − 380 h
  DO sensor drift            ±[0.50%/h, 0.75%/h]      20 h − 380 h
Fig. 2. Scheme of the proposed batch diagnosis method.

at subsequent faulty samples, classification performance degrades as the characteristic pattern changes when a fault propagates through the system. A second option is to add the measurements at N − 1 later time points in a 1 × JN vector. In this case, diagnosis of a new batch is only possible when all these measurements are available, but classification performance might increase. A missing data estimator can be added to overcome the latter problem. The effect of an increasing time window is shown in Section 6. Appropriate scaling increases similarity within each fault class and avoids overloading the classifier with irrelevant features. For example, normalizing around the mean trajectory reduces the difference between faults of the same class occurring at different time points. As the choice of a suitable scaling method depends on process knowledge, other scaling methods are demonstrated in Section 6 after the case study has been introduced in Section 5. After collecting the data and converting it into a suitable form for classification, the LS-SVM classifier is trained. In this work, both the OvO and the OvA technique for decomposing the multi-class identification problem into a series of binary classification problems are investigated.

4.2 Online fault diagnosis

During the online or application phase, the current batch is monitored using the existing fault detection system. If a fault is detected, the data of the current batch undergoes the same pretreatment steps as the faulty reference data. The pretreated data are subsequently passed to the trained classifier, which assigns a fault class to the current disturbance. Based on this information, operators take corrective actions or abort the process if the batch cannot be salvaged. Monitoring the fault detection statistics and successive fault classifications reveals whether the operator actions bring the process back within normal operating conditions.
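The pretreatment pipeline described above (mean-trajectory scaling, a data window starting at detection, and the per-sample normalization investigated in Section 6) can be sketched as follows. Sizes, NOC statistics, and the detection index are hypothetical, and the auto-scaling step of Section 5 is folded into a single scaling operation here:

```python
import numpy as np

J, K, W = 6, 100, 5                     # hypothetical sizes; W-sample window
rng = np.random.default_rng(3)

# Mean trajectory and standard deviation per time point from NOC batches.
noc = rng.standard_normal((50, K, J))
mu, sd = noc.mean(axis=0), noc.std(axis=0)

def fault_pattern(batch, k_detect):
    """Build the 1 x JW classifier input from a K x J faulty batch."""
    scaled = (batch - mu) / sd                 # scale around mean trajectory
    window = scaled[k_detect:k_detect + W]     # W samples from detection on
    a = np.abs(window)                         # drop the sign of the fault
    a /= a.sum(axis=1, keepdims=True)          # reduce magnitude influence
    return a.reshape(1, -1)                    # concatenate to 1 x JW vector

x = fault_pattern(rng.standard_normal((K, J)), k_detect=40)
assert x.shape == (1, J * W)
```

The same function is applied to the historical faulty batches during training and to a newly detected fault online, so both see identically pretreated patterns.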
If the diagnosis result was satisfactory, the data can be added to the reference set and the classifier retrained.

5. CASE STUDY

As a case study, data of a simulated industrial-scale biochemical process for penicillin fermentation is obtained using an extended version of the Pensim simulator of Birol
et al. [2002]. The production process involves two phases: a batch phase and a fed-batch phase. Initially, the bioreactor is operated in batch mode. When the substrate concentration has decreased to 0.3 g/L (after about 43 hours), the fed-batch phase is started, and additional substrate is continuously fed into the reactor. After adding 25 L of substrate (after approximately 460 hours), the fermentation is stopped. During the fermentation, 11 sensors record various flows, temperatures, and pH. The time signal is added as an extra variable. The variables are listed in Table 1. The measured signals are aligned and resampled to a length of 101 samples for the batch phase and 501 samples for the fed-batch phase, using the indicator variables proposed by Birol et al. [2002]. A set of 200 NOC batches with varying initial conditions is simulated as a reference for the PCA fault detection model. The data is first scaled around the mean trajectory to zero mean and unit variance, then unfolded variable-wise and finally auto-scaled. A separate PCA model is constructed for each phase. The number of principal components is determined with an adjusted Wold criterion with a threshold of 0.90 on the fraction of explained variance. For the batch phase, 3 principal components are selected and 4 for the fed-batch phase, explaining 60% and 64% of the total variance respectively. For training and validation purposes of the LS-SVM classifier, a number of faulty batches are generated. Each of the generated faulty batches contains one of the six faults described in Table 2. The starting time and magnitude of each fault are randomly chosen from a uniform distribution with the bounds stated in Table 2. For each fault class, 4 faulty training batches and 100 faulty validation batches are generated, resulting in a total set of 624 batches. For faults whose magnitude can be positive or negative (e.g. drifts on the DO sensor), an equal number of batches of each sign is simulated.
6. VALIDATION AND RESULTS

In this section, the proposed method is validated on the generated faulty batches following the steps of Fig. 2. The first step depicted in the scheme entails the definition of fault classes, i.e. determining the scope of the classification problem. In this case study, the fault classes are given in the first column of Table 2. Note that fault magnitude,
Table 3. Correct classification rates (mean µ and standard deviation σ) of each disturbance for One-versus-One and One-versus-All.

One-versus-One            Raw            Scaled         Normalized     Data window 5   Data window 10
                          µ      σ       µ      σ       µ      σ       µ      σ        µ      σ
Feed concentration step   100.0% 0.0%    100.0% 0.0%    100.0% 0.0%    100.0% 0.0%     100.0% 0.0%
Coolant temperature step  100.0% 0.0%    98.0%  0.0%    98.0%  0.0%    100.0% 0.0%     100.0% 0.0%
Agitator power drop       65.9%  2.4%    91.4%  0.8%    100.0% 0.0%    100.0% 0.0%     100.0% 0.0%
Feed rate drift           93.4%  11.6%   99.0%  0.0%    100.0% 0.0%    100.0% 0.0%     100.0% 0.0%
Aeration rate drop        38.4%  7.2%    44.4%  0.7%    74.0%  0.9%    89.8%  3.6%     96.5%  2.2%
DO sensor drift           23.5%  4.9%    44.2%  0.7%    64.4%  10.6%   81.2%  5.3%     94.1%  1.7%
Global                    70.2%  2.7%    79.5%  0.2%    89.4%  1.8%    95.2%  0.4%     98.4%  0.2%

One-versus-All            Raw            Scaled         Normalized     Data window 5   Data window 10
                          µ      σ       µ      σ       µ      σ       µ      σ        µ      σ
Feed concentration step   99.0%  1.3%    98.0%  0.4%    100.0% 0.0%    100.0% 0.0%     100.0% 0.0%
Coolant temperature step  99.8%  0.5%    100.0% 0.0%    97.0%  0.0%    99.6%  0.8%     99.9%  0.3%
Agitator power drop       75.5%  0.5%    84.2%  0.5%    100.0% 0.0%    100.0% 0.0%     100.0% 0.0%
Feed rate drift           27.7%  24.7%   71.8%  0.5%    95.6%  2.5%    97.1%  3.4%     99.4%  1.2%
Aeration rate drop        50.3%  1.6%    65.9%  1.2%    55.3%  5.3%    93.3%  2.4%     93.7%  2.4%
DO sensor drift           9.6%   1.4%    0.0%   0.0%    27.4%  7.0%    79.7%  1.5%     93.5%  1.2%
Global                    60.3%  3.6%    70.0%  0.3%    79.2%  1.5%    94.9%  0.7%     97.8%  0.5%
sign and occurrence time are of no interest. The scope of the classifier is solely the determination of the fault type. Step two involves the collection and pretreatment of the training data. For this purpose, the faulty batches are monitored with the developed NOC PCA model and fault detection statistics. An alarm signal is raised when a statistic is above its control limit at three or more consecutive time points. All faults were detected within 10 sample points of the onset of the fault. For each faulty batch, the moment of detection is recorded and the data prior to the alarm signal of the fastest statistic is discarded. After the collection and pretreatment steps, the classifier is trained on 4 batches per fault class and validated on 100 batches per fault class. For each multi-class technique (OvO and OvA), a linear and an RBF-kernel soft-margin LS-SVM are identified. The probabilistic optimization routine yields slightly different values of the regularization parameter and/or kernel parameter for each binary classifier and, hence, slightly different results. Therefore, the cycle of training and subsequent validation is repeated 100 times, and the mean values and standard deviations of the correct classification rates are computed. The results obtained with the OvO and OvA approach using a linear kernel are summarized in Table 3. Five combinations of scaling and data window width are investigated. The first combination (Raw) applies no scaling and uses a data window of one sample, which comes down to supplying only the 1 × J vector of raw measurements at the moment of detection to the classifier. The second combination (Scaled) employs the scaling of the fault detection system explained in Section 5, i.e. normalization around the mean trajectory followed by auto-scaling of the variable-wise unfolded data. It supplies the classifier with the scaled measurement vector at the time of detection.
The third combination (Normalized) takes the absolute value of each element of the scaled vector and divides it by the sum of the absolute values over all elements before passing the vector to the classifier. The fourth combination (Data window 5) widens the data window of the previous method to 5 samples by concatenating the scaled and normalized measurement vectors at the moment of detection and the 4 subsequent time points into a 1 × 5J vector. The fifth and last combination (Data window 10) employs a data window of 10 samples using the scaled and normalized measurement vectors. It is important to study the performance of each fault class separately instead of only considering the global performance, as the pretreatment methods can influence the individual performances in different ways. Table 3 lists the correct classification rates of each fault class for the linear case. For brevity, the results obtained using an RBF kernel are omitted, but the individual correct classification rates show the same trends. As can be seen from the table, using the raw data yields the lowest global correct classification rate. As a consequence of the transient nature of batch processes, the data pattern depends on the moment of detection. This adds uninformative variability to the fault patterns and hence lowers the classification performance. In contrast to the other fault classes, faults on the feed concentration and coolant temperature have a high correct classification rate. These faults occur at a fixed time point (see Table 2), which explains their already high performance. The next section of Table 3 lists the classification rates when using scaled measurement vectors as classifier input. Because the data is scaled around the mean trajectory, the influence of the fault detection time is reduced substantially. This leads to a more uniform fault pattern pre-
Table 4. Global correct classification rates (mean µ and standard deviation σ).

                                 Raw            Scaled         Normalized     Data window 5   Data window 10
                                 µ      σ       µ      σ       µ      σ       µ      σ        µ      σ
Linear kernel, One-versus-One    70.2%  2.7%    79.5%  0.2%    89.4%  1.8%    95.2%  0.4%     98.4%  0.2%
RBF kernel, One-versus-One       46.8%  4.5%    83.3%  2.0%    88.8%  1.4%    94.8%  0.7%     97.3%  1.1%
Linear kernel, One-versus-All    60.3%  3.6%    70.0%  0.3%    79.2%  1.5%    94.9%  0.7%     97.8%  0.5%
RBF kernel, One-versus-All       40.1%  4.2%    75.5%  7.5%    85.6%  2.1%    93.4%  1.2%     96.2%  1.6%
sented to the classifier and improved performance, which is reflected in the significant increases in correct classification rates compared to using the raw data. Further improvement is possible by taking the absolute value of each element and dividing it by the sum of the absolute values over all elements before passing the vector to the classifier. Normalizing has a double effect: (i) it eliminates the difference between negative and positive fault magnitudes, and (ii) it reduces the influence of the absolute magnitude on the fault pattern. After these steps, the correct classification rates of faults on the agitator power and feed rate are close to 100%. The performance on faults of the aeration rate and DO sensor, on the other hand, remains unsatisfactory. Both faults influence the DO variable, which leads to a partial overlap of the characteristic signatures. However, by observing the time evolution of the DO measurements, it is possible to discriminate between both faults, as illustrated in Fig. 3. By extending the data window to 5 or 10 time points, the correct classification rates in Table 3 exhibit a significant increase for aeration and DO sensor faults. While data pretreatment has a large influence on the classifier's performance, as evidenced by the previous results, the choice between the OvO and OvA technique is of lesser importance. Overall, the OvO approach has a higher correct classification rate, but when the data is pretreated appropriately, the difference becomes negligible. Table 4 lists the global rates for both kernels to compare the performance of the RBF kernel to the linear kernel. The results indicate a higher spread on the classification rates when using an RBF kernel, as an extra parameter (the kernel width) must be optimized. Using an RBF kernel improves classification performance for the scaled data, and also for the normalized data in the OvA technique.
In general, class boundaries are more complex when separating one class from all other classes than when separating two classes. The RBF kernel introduces increased flexibility, which explains why it leads to a higher performance improvement for the OvA technique. When the data is pretreated appropriately, the performance difference between the linear and the RBF kernel is negligible.

7. CONCLUSIONS AND FUTURE WORK

A new batch fault diagnosis method based on PCA and LS-SVMs was developed and validated on data of a simulated fed-batch fermentation process. The importance of an appropriate data scaling method and data window to achieve satisfactory correct classification rates was demonstrated. If the data is pretreated appropriately, the choice of kernel function and binarization technique (OvO or OvA) is of lesser importance. The proof of concept provided in this work opens the door to validation on industrial data. Future work consists of managing the occurrence of false alarms and studying the influence of the amount of training data available.

REFERENCES

S. Abe. Support Vector Machines for Pattern Classification. Springer-Verlag London Limited, 2005.
Fig. 3. Mean trajectory of the scaled and normalized DO measurement during a step decrease in aeration rate and during DO sensor drift.

G. Birol, C. Ündey, and A. Çinar. A modular simulation package for fed-batch fermentation: penicillin production. Computers & Chemical Engineering, 26:1553–1565, 2002.
J. Chen and K.-C. Liu. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chemical Engineering Science, 57:63–75, 2002.
H.-W. Cho. Nonlinear feature extraction and classification of multivariate process data in kernel feature space. Expert Systems with Applications, 32:534–542, 2007.
H.-W. Cho and K.-J. Kim. Fault diagnosis of batch processes using discriminant model. International Journal of Production Research, 42(3):597–612, 2004.
H.-W. Cho and K.-J. Kim. Diagnosing batch processes with insufficient fault data: generation of pseudo batches. International Journal of Production Research, 43(14):2997–3009, 2005.
S.W. Choi, J. Morris, and I.-B. Lee. Dynamic model-based batch process monitoring. Chemical Engineering Science, 63:622–636, 2008.
I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
J.-M. Lee, C.K. Yoo, S.W. Choi, P.A. Vanrolleghem, and I.-B. Lee. Nonlinear process monitoring using kernel principal component analysis. Chemical Engineering Science, 59:223–234, 2004.
P. Nomikos and J.F. MacGregor. Monitoring batch processes using multiway principal component analysis. AIChE Journal, 40(8):1361–1375, 1994.
J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific, Singapore, 2002.
V.N. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998.
J.A. Westerhuis, S.P. Gurden, and A.K. Smilde. Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems, 51(1):95–114, 2000.
S. Wold, N. Kettaneh, H. Fridén, and A. Holmberg.
Modelling and diagnosis of batch processes and analogous kinetic experiments. Chemometrics and Intelligent Laboratory Systems, 44:331–340, 1998.
I. Yélamos, G. Escudero, M. Graells, and L. Puigjaner. Performance assessment of a novel fault diagnosis system based on support vector machines. Computers & Chemical Engineering, 33:244–255, 2009.