Real-time batch process supervision by integrated knowledge-based systems and multivariate statistical methods


Engineering Applications of Artificial Intelligence 16 (2003) 555–566

Cenk Ündey¹, Eric Tatara, Ali Çınar*

Department of Chemical and Environmental Engineering, Illinois Institute of Technology, 10 W 33rd St, Chicago, IL 60616, USA

Received 29 April 2003; received in revised form 29 April 2003; accepted 1 September 2003

Abstract

Real-time supervision of batch operations during the progress of a batch run offers many advantages over end-of-batch quality control. Process monitoring, quality estimation, and fault diagnosis activities are automated and supervised by embedding them into a real-time knowledge-based system (RTKBS). Interpretation of multivariate charts is also automated through a generic rule-base for efficient alarm handling and fault diagnosis. Multivariate statistical techniques such as multiway partial least squares (MPLS) provide a powerful modeling, monitoring, and supervision framework. Online process monitoring techniques are developed and extended to include predictions of end-of-batch quality measurements during the progress of a batch run. The integrated RTKBS and the implementation of MPLS-based process monitoring and quality control are illustrated using a fed-batch penicillin production benchmark process simulator.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Knowledge-based systems; Statistical process monitoring; Multiway partial least squares; Quality prediction; Real-time supervision

*Corresponding author. Fax: +1-312-567-8874. E-mail address: [email protected] (A. Çınar).
¹Present address: Amgen, Inc., West Greenwich, RI 02818, USA.
0952-1976/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.engappai.2003.09.003

1. Introduction

Monitoring of batch operations is crucial in many pharmaceutical and specialty chemicals production processes. Batch processes are characterized by prescribed processing of raw materials for a finite duration to convert them to products. A high degree of reproducibility is desired to obtain successful batches that consistently yield products conforming to specifications. Real-time process monitoring, fault diagnosis, and process control enhance product quality and yield. Traditionally, final product quality is assessed by a number of quality measurements made in quality analysis laboratories on product samples collected at the end of a batch run. Deviations in process variables during the progress of the batch can provide information about product properties and the likely quality of the final product well before the completion of the batch. Multivariate statistical techniques for process monitoring and fault diagnosis have been very effective in achieving these goals of process supervision. Automating the implementation of these multivariate statistical tools and the interpretation of the statistical information provides a valuable decision-making asset for process engineers and plant personnel.

Integrating low-level operations in process supervision, such as filtering data and adjusting PID control settings, with high-level qualitative decisions, such as implementing different operational policies and fault diagnosis, in a unified environment has many advantages. The experience of process operators and engineers is crucial for the success of process supervision and should be incorporated in an automated supervisory system. Real-time knowledge-based systems (RTKBS) provide an environment in which such high-level automated process supervision can be achieved. KBS developments for the chemical process industries have mostly been rule-based expert systems built on extensive search algorithms and qualitative search models, but they lack statistical inference (Petti et al., 1990; Ramesh et al., 1988; Stephanopoulos, 1990; Venkatasubramanian and Rich, 1988; Venkatasubramanian et al., 2003).

Research on integrating knowledge-based systems (KBS) with statistical process monitoring (SPM) for process supervision and fault detection and diagnosis (FDD) has progressed during the last decade. Norvilas et al. (2000) proposed an intelligent SPM framework by interfacing KBS and multivariate (MV) techniques and demonstrated its performance with simulation studies. This system was also extended to sensor validation (Tatara and Cinar, 2002). Integrated use of multivariate SPM (MSPM) techniques and RTKBS for real-time online monitoring and FDD of fermentation processes was recently proposed (Undey et al., 2000; Glassey et al., 2000; Leung and Romagnoli, 2002). Industrial applications of the integrated RTKBS and MSPM techniques have also been reported (Albert and Kinley, 2001). Most of the recent applications were developed using Gensym's G2 software (Gensym Corporation, 2001). Successful development and implementation of RTKBS using G2 for supervision of industrial fermentation processes have been reported (Alford et al., 1999a,b).

MV statistical techniques such as principal component analysis (PCA) and partial least squares (PLS) have been found to be very suitable for online SPM and FDD of batch processes. There is a growing interest in the use of MV techniques in batch process modeling, monitoring, and FDD (Glassey et al., 2000; Undey et al., 2000; Albert and Kinley, 2001; Lennox et al., 2001; Cinar et al., 2003). Integrating MSPM tools with an RTKBS is synergistic: it enables automated interpretation of MV charts during abnormal situations and relates this information to process knowledge.

MSPM tools based on PCA and PLS usually rely on rearranging data from a three-dimensional array (batch × variable × time) into a two-dimensional array. Usually the rearrangement is made preserving the batch direction. While this is a good choice for many reasons, it necessitates estimation of the unrealized portion of variable trajectories to the end of the run for online monitoring.
A different online MSPM framework has been suggested in which the process data array is arranged by preserving the variable direction, to eliminate the arbitrariness introduced by the various assumptions used for trajectory estimation (Wold et al., 1998; Guay, 2000). Enhancements to this MSPM framework with online quality prediction using multiway PLS (MPLS) and its integration with a real-time supervisory intelligent KBS are reported in this study. The methodology is illustrated using simulated data from fed-batch penicillin fermentation. Interpretation of multivariate charts is automated via a generic statistical rule-base for reporting process upsets to plant engineers and operators.

The structure of the RTKBS and the integration of MSPM techniques with an RTKBS are presented in Section 2. Details of MPLS model development for both online process monitoring and fault diagnosis, construction of multivariate charts, and the modification of MPLS modeling for online quality prediction are discussed in Section 3. Finally, the use of the integrated intelligent system is illustrated via case studies in Section 4 for simulated fed-batch penicillin fermentation data.

2. Integration of MSPM techniques with RTKBS

The integrated intelligent system is developed and deployed using G2 software (Gensym Corporation, 2001) in this study. G2 is a graphical knowledge base development environment for creating intelligent real-time applications. It has a hybrid KBS paradigm with classes and objects for knowledge representation and rules for inferencing. Applications developed in the G2 environment are called knowledge bases (KBs) and contain workspaces where object code is organized. Workspaces are arranged in a hierarchical structure so that the source code can be easily managed. Workspaces contain all the rules, variables, and objects that constitute a KB. The G2 programming language is very similar in structure to the English language, allowing for rapid software development.

Graphical representation of data in G2 is performed through several different types of real-time charts that are customizable by the user. The displays may be placed on any workspace, and the shape, position, and colors of the display may be modified as needed by the user. Customizable graphical user interface (GUI) controls such as buttons and text entry boxes are available in G2.

G2 provides an excellent platform for the development of supervisory KBSs. However, complex numerical calculations such as process simulations or the matrix manipulations needed for analysis cannot be efficiently implemented in G2. One may take advantage of more sophisticated programs for numerical analysis and simulation by linking those programs to G2. G2 provides several options for networking and passing data to and from external programs. The G2 standard interface (GSI) bridge allows the KB to access remote procedures written in C. All of the routines for statistical analysis were initially developed in Matlab 6.1 (Mathworks, 2001a). The Mathworks provides a Matlab Compiler (Mathworks, 2001b) that converts standard Matlab functions into ANSI C files.
The functions contained in the Matlab-compiled C files are callable by custom GSI bridge code that communicates with G2 via TCP/IP. The bridge executable does not need to run on the same machine as G2. This allows the KBS to run on one machine while the bridge code is executed on a remote machine via the Internet.

The RTKBS developed has a top layer module that provides a GUI and manages the flow of information between sub-modules (Fig. 1). A graphical pull-down menu provides access to the functions of the various modules as shown in Fig. 2. The Data Acquisition menu allows the user to select the source of the data from real-time monitoring hardware, dynamic simulation, or previously stored data files. In addition, data collection may be started, paused, and stopped at any time. Various statistical monitoring techniques can be selected via the Monitoring menu. RTKBS findings about process faults and warnings regarding variable trends are communicated to the user via the Messages menu. Finally, all real-time charts and message boards can be accessed from the Displays menu.

Fig. 1. Modularized integration of the intelligent process supervision system. [Block diagram grouped into High Level G2 Modules and External Modules. Blocks shown: User Interface; Combined Intelligent Rule Inference Engine; Multivariate Statistics Rule-base; Process Heuristics Rule-base; Fault Detection; Fault Diagnosis; Quality Prediction; Multivariate Charts; Trend Analysis; MPLS Models (MPLSV, MPLSB); Process Variable Trajectories Database; Product Quality Database; Data Acquisition Module.]

Process data are acquired using the Data Acquisition module, which passes the current set of process observations to the top level module. Process data may be displayed on trend charts for each process input and output. After process data are received by the top level module, they are passed to the appropriate MPLS module for analysis as necessary. The MPLS module is further divided into two separate modules that preserve the variable and batch directions, respectively (MPLSV and MPLSB), for calculating real-time monitoring statistics as well as end-of-batch product quality predictions. The MPLSV module is called at each time step by the MPLS module, while the MPLSB module is called only at the appropriate pre-scheduled intervals based on the batch progress for each phase. The multivariate statistics calculated by the MPLS modules are returned to the top level module for display on trend charts as well as storage.
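The calling pattern described above can be sketched as follows. This is a minimal illustration and not the G2 implementation: the callback names `run_mplsv` and `run_mplsb`, the 10%/20% schedule, and the progress values are hypothetical placeholders chosen for the example.

```python
from typing import Callable, List, Sequence, Tuple


def dispatch(progress: Sequence[float],
             schedule: Sequence[float],
             run_mplsv: Callable[[int], None],
             run_mplsb: Callable[[int, float], None]) -> None:
    """Call MPLSV at every time step and MPLSB once per scheduled progress point.

    progress : percent completion of the current phase at each time step
    schedule : ascending progress points (percent) at which MPLSB is due
    """
    next_idx = 0  # index of the next scheduled point not yet served
    for k, pct in enumerate(progress):
        run_mplsv(k)  # monitoring statistics are updated at each time step
        # Serve every scheduled point reached since the previous step.
        while next_idx < len(schedule) and pct >= schedule[next_idx]:
            run_mplsb(k, schedule[next_idx])
            next_idx += 1


# Hypothetical usage: record which module is called at which step.
calls: List[Tuple[str, int, float]] = []
dispatch(progress=[4.0, 9.5, 12.0, 19.0, 21.0],
         schedule=[10.0, 20.0],
         run_mplsv=lambda k: calls.append(("MPLSV", k, -1.0)),
         run_mplsb=lambda k, p: calls.append(("MPLSB", k, p)))
```

In this sketch the quality-prediction callback fires at the first time step whose progress has reached a scheduled point, so no scheduled point is skipped when the progress value jumps past it between samples.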

The statistics returned by the MPLS modules are analyzed automatically by the KBS. However, it is sometimes useful for the operator to observe trend charts of the statistics. Fig. 2 shows multivariate process monitoring charts for the squared prediction error (SPE) and T² statistics. Each statistic is plotted as a time series versus the percent completion of the phase along with its upper control limits (UCL). On the screen, the 95% UCL is green and the 99% UCL is red, to enhance the user's awareness of the severity of the limits (in Fig. 2 they are labeled 95% and 99% next to the limit lines). The T² limits are fixed for the duration of the phase; the SPE limits, however, vary with time.

Variable contribution bar charts are displayed directly below the SPE and T² charts. Each vertical bar represents the variable contribution to the corresponding statistic at the current time. The height of each bar changes with time due to normal fluctuations in the batch or process faults. The chart of variable contributions to T² displays a 99% UCL for discriminating which variables are responsible for an out-of-control condition. The chart of variable contributions to SPE displays a 99% UCL as well. A progress bar at the bottom of the KBS window (Fig. 2) indicates how the progress of the current batch compares to regular operation. A pair of brackets follows the bar, indicating the minimum and maximum limits for the current batch process. Other real-time statistical data such as score charts are available to the user at any time via the Displays menu.


Fig. 2. RTKBS fault and warning messages (upper right) at the time of the out-of-control signal for the agitator power step case. Upper left: both T² and SPE are represented as time series along with their control limits. Variable contributions are represented by vertical bars, with their limits represented by a line. Lower left: the predicted end-of-batch quality at the time of the fault is indicated with (□); the mean value is represented by the center line and the confidence limits are represented by the upper and lower lines.

The inference engine provides automated real-time diagnosis and advising capabilities via rules composed of process-independent and process-dependent information. Process-independent information relates to statistical inference, such as when a particular statistic exceeds its limit. This information is not specific to the process because it is based solely on data-driven methods. Process-specific information relates to the parameters of the physical system, such as the fact that the feed flow rate affects the volume of the bioreactor. Process-specific rules are used primarily to supplement information contained in the process-independent rules. A KBS consisting of only one type of rule is less robust than a KBS containing both types. Process-specific rules must be written by a domain expert, whereas process-independent rules contain information pertaining to the statistical tests and their limits. Interfacing statistical techniques with process-independent rules results in a system that can be used to monitor any process, unlike a system composed primarily of process-specific rules. Furthermore, such a system provides excellent diagnosis assistance to plant personnel who may be knowledgeable about plant operations but are not experts in MSPM techniques.

The process-independent rule-base consists of several modularized rule bases that include information for diagnosing statistics including SPE, T², scores, and their respective contributions. Furthermore, the rules are organized according to the severity of the condition that a limit violation represents. SPE and T² limit violations are treated as the highest severity level (process faults), while score limit violations are treated as less severe (warnings). This discrimination is necessary because, while only the SPE and T² statistics indicate an out-of-control condition, trends in the score charts can alert the operator to a pending fault. When an out-of-control situation is detected by monitoring the SPE or T² statistics, the KBS checks the contribution charts of the appropriate statistic to determine the cause of the fault.
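The severity logic described above can be sketched in Python as follows. This is an illustrative simplification under stated assumptions, not the G2 rule-base: the function name `diagnose`, the message wording, the variable names in the usage, and the use of a single upper limit per statistic are all hypothetical.

```python
from typing import List, Sequence


def diagnose(stat_name: str, value: float, limit_99: float,
             contributions: Sequence[float],
             contribution_limits: Sequence[float],
             variable_names: Sequence[str]) -> List[str]:
    """Return messages for one monitored statistic at the current time step.

    Assumption: each statistic is compared against one upper limit only.
    """
    # SPE and T2 violations are faults; score violations are warnings.
    category = "Process Faults" if stat_name in ("SPE", "T2") else "Process Warnings"
    if value <= limit_99:
        return []  # in control: nothing to report
    messages = []
    # Out of control: report each variable whose contribution exceeds its limit.
    for name, c, c_lim in zip(variable_names, contributions, contribution_limits):
        if c > c_lim:
            messages.append(f"{category}: {stat_name} limit exceeded; "
                            f"contribution of '{name}' is above its limit")
    return messages
```

For example, `diagnose("T2", 12.3, 9.1, [0.2, 3.4, 0.1], [1.0, 1.5, 1.0], ["aeration rate", "agitator power", "substrate feed rate"])` returns a single "Process Faults" message naming the second variable, because only its contribution exceeds its limit.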

The following is a sample rule for checking the variable contributions to T²:

    for n = 1 to no.var do
        if (t2.con[n] > t2.con.99[n]) call analyze.contributions(n);
    end;

The variable contributions that exceed their statistical limits are passed to the analyze.contributions() function, which further analyzes each variable contribution to determine whether the process variable is higher or lower than its expected value. A list that contains the names of the process variables provides a symbolic representation of each process variable. A text message is produced to identify the responsible variable, the affected variable(s), and the relative magnitude of their deviations (high or low). The text messages are reported by the rule-base in the "Process Faults" category on the "KBS Observation Messages" workspace, as shown on the right-hand side of Fig. 2. Similarly, when a score exceeds its limit, the variable contributions to that score are checked. The variable score contributions that exceed their statistical limits for each score are reported by the rule-base in the "Process Warnings" category on the "KBS Observation Messages" workspace.

3. Statistical monitoring of process performance

Batch process measurements made on J variables at K time intervals for I batches with acceptable product quality form a three-way array $\underline{\mathbf{X}}$ of size $(I \times J \times K)$, as depicted in Fig. 3. Underlined boldface uppercase symbols represent three-dimensional arrays; boldface uppercase symbols represent two-dimensional arrays used in computation. Product quality measured at the end of a batch on M variables forms a matrix $\mathbf{Y}$ of size $(I \times M)$. Multivariate statistical modeling techniques such as PCA and PLS are suitable for explaining variations in the process space ($\mathbf{X}$) and the product space ($\mathbf{Y}$) and provide a foundation for a process monitoring and fault diagnosis framework. PCA is an effective multivariate statistical technique for explaining process variability when only process variables are available. PLS is a biased regression technique that relates quality variables to process variables by extracting the information in the process variables that is most predictive of product quality (Kourti and MacGregor, 1996). PCA and PLS techniques have been extended to multiway PCA (MPCA) and multiway PLS (MPLS) to account for the three-way data array decomposition of batch processes (Wold et al., 1987; Nomikos and MacGregor, 1994, 1995).

Fig. 3. Three-way array formation and unfolding of batch process data: (a) by preserving variable direction, (b) by preserving batch direction. [Schematic: the three-way array X (I batches × J variables × K time intervals) is unfolded into X (IK × J) in panel (a) and into X (I × KJ) in panel (b).]
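The two unfoldings in Fig. 3 can be sketched with nested lists as follows. This is a minimal illustration, assuming the array is indexed as `X[i][j][k]` (batch, variable, time); the function names and the row/column ordering conventions stated in the docstrings are assumptions made for the example.

```python
from typing import List

Array3D = List[List[List[float]]]  # indexed as X[i][j][k]: batch, variable, time


def unfold_variable_wise(X: Array3D) -> List[List[float]]:
    """Preserve the variable direction: (I x J x K) -> (IK x J).

    Rows are ordered batch by batch, and by time within each batch.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [[X[i][j][k] for j in range(J)] for i in range(I) for k in range(K)]


def unfold_batch_wise(X: Array3D) -> List[List[float]]:
    """Preserve the batch direction: (I x J x K) -> (I x KJ).

    Each row holds one batch; columns are ordered time slice by time slice,
    with the J variables side by side within each slice.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [[X[i][j][k] for k in range(K) for j in range(J)] for i in range(I)]


# Hypothetical toy array: I = 2 batches, J = 3 variables, K = 2 time points.
X = [[[100 * i + 10 * j + k for k in range(2)] for j in range(3)] for i in range(2)]
Xv = unfold_variable_wise(X)  # 4 rows (IK) by 3 columns (J)
Xb = unfold_batch_wise(X)     # 2 rows (I) by 6 columns (KJ)
```

With variable-wise unfolding a new observation at time k is simply one new row of length J, which is why no future values need to be estimated; with batch-wise unfolding a row is complete only when the batch has finished.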

An online MSPM framework can be established by unfolding the three-way data array into two-dimensional matrices by preserving the variable direction (Wold et al., 1998; Guay, 2000). In this MSPM framework (Fig. 3a), it is not necessary to estimate the unobserved future portions of variable trajectories, as is needed in most batch-wise unfoldings (Fig. 3b). MPCA or MPLS models can be developed and used for online monitoring. A methodology has been proposed that develops an MPLS model between the process variable matrix unfolded in the variable direction and the local time stamp (or a variable that also indicates the batch progress) for use in the alignment of trajectories (Wold et al., 1998). MPLS-based monitoring with this type of three-way data unfolding therefore provides solutions to both the data alignment and the future value estimation problems. This technique is also coupled with the conventional MPLS technique that preserves the batch direction when online/offline quality prediction and end-of-batch monitoring are the aims. The MPLS technique that preserves the variable direction is called MPLSV (Fig. 4a), and the conventional technique that preserves the batch direction is called MPLSB (Fig. 4b). Model formulation for MPLSV is outlined in Section 3.1; model formulation for MPLSB is discussed in Section 3.2.

Fig. 4. MPLS modeling by using different unfolding approaches. (a) Model blocks for predicting progress of the batch (MPLSV), (b) model blocks for predicting product quality (MPLSB).

3.1. MPLSV

When the process measurements array $\underline{\mathbf{X}}$ is unfolded, a two-dimensional $\mathbf{X}$ is formed after rearrangement of data slices by preserving either the variable direction (Fig. 3a) to obtain an $IK \times J$ matrix or the batch direction (Fig. 3b) to obtain an $I \times KJ$ matrix (Henrion, 1994). Batch evolution can be monitored by developing a predictive MPLSV model between the autoscaled $\mathbf{X}$ $(IK \times J)$ and an autoscaled time stamp vector (or an indicator variable vector resembling the time stamp vector) $\mathbf{z}$ $(IK \times 1)$ (Fig. 3a). In this case, MPLSV decomposes $\mathbf{X}$ and $\mathbf{z}$ into a combination of scores matrix $\mathbf{T}$ $(IK \times R)$, loadings matrix $\mathbf{P}$ $(J \times R)$ and vector $\mathbf{q}$ $(R \times 1)$, and weight matrix $\mathbf{W}$ $(J \times R)$ with R latent variables (model dimensions):

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E}, \qquad \mathbf{z} = \mathbf{T}\mathbf{q} + \mathbf{f}, \qquad (1)$$

where $\mathbf{E}$ and $\mathbf{f}$ are the residuals matrix and vector, respectively. Equalization and alignment of trajectories are required if batches in this reference set ($\mathbf{X}$) are of different lengths. Data alignment using an indicator variable can be performed in different ways. If there exists an indicator variable (IV) against whose percent completion the other process variables can be measured, variable trajectories in the reference set are resampled by linear interpolation with respect to this indicator variable. Otherwise, a similar alignment is performed by using the predicted local time vector, which behaves as an IV (since it is a model of the process variable trajectories). The predicted IV (Eq. (1)) can be used as a maturity indicator of the batch. If it is smaller than the observed value, then the batch is progressing more slowly than the reference batches used in MPLSV modeling.

During the progress of a new batch, a vector $\mathbf{x}_{\mathrm{new}}$ of size $(1 \times J)$ becomes available at each sampling time k. After applying the scaling of the reference set to the vector of new observations, scores can be predicted for time instant k by using the MPLSV model parameters:

$$\hat{\mathbf{t}}_k = \mathbf{x}_{\mathrm{new}} \mathbf{W} (\mathbf{P}^{T}\mathbf{W})^{-1}. \qquad (2)$$

New residuals and batch progress predictions (on the IV) can be calculated as

$$\mathbf{e}_k = \mathbf{x}_{\mathrm{new}} - \hat{\mathbf{t}}_k \mathbf{P}^{T}, \qquad (3)$$

$$z_{\mathrm{pred},k} = \hat{\mathbf{t}}_k \mathbf{q}. \qquad (4)$$

$z_{\mathrm{pred}}$ in Eq. (4) can be calculated for each batch in the reference set, and control limits (plus or minus three standard deviations of $z_{\mathrm{pred}}$ for each time interval k) can be constructed to monitor batch maturity.

The ultimate objective of MPLSV model development is its use in online real-time process monitoring and fault diagnosis. Various MV charts are constructed to achieve this goal. One contribution of this study is the incorporation of these charts into an intelligent process monitoring system for automated interpretation.
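A minimal sketch of Eqs. (2)-(4) for one new observation is given below. It assumes a single latent variable (R = 1), so that W and P reduce to vectors and $(\mathbf{P}^{T}\mathbf{W})^{-1}$ reduces to a scalar; the function names and the inclusion of the sum of squared residuals (the quantity tracked by the SPE chart) are choices made for this illustration, and the model parameters are taken as given rather than fitted.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def monitor_step(x_new: Sequence[float], w: Sequence[float],
                 p: Sequence[float], q: float) -> Tuple[float, List[float], float, float]:
    """Evaluate Eqs. (2)-(4) for one scaled observation with R = 1.

    With R = 1, W and P are J-vectors and (P^T W)^-1 is the scalar 1 / (p . w).
    Returns (score, residuals, predicted indicator value, squared prediction error).
    """
    t_hat = dot(x_new, w) / dot(p, w)                   # Eq. (2)
    e = [xj - t_hat * pj for xj, pj in zip(x_new, p)]   # Eq. (3)
    z_pred = t_hat * q                                  # Eq. (4)
    spe = sum(ej * ej for ej in e)                      # sum of squared residuals
    return t_hat, e, z_pred, spe


def maturity_in_control(z_pred: float, z_mean: float, z_std: float) -> bool:
    """Check z_pred against mean +/- 3 standard deviations at this time interval.

    z_mean and z_std are assumed to come from the reference batches.
    """
    return abs(z_pred - z_mean) <= 3.0 * z_std
```

A call such as `monitor_step([1.0, 2.0], [0.6, 0.8], [0.5, 0.5], 2.0)` returns the score, the residual vector, the predicted indicator value, and the squared prediction error for that sample; for more than one latent variable the scalar division would be replaced by a matrix inverse.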


3.1.1. MSPM charts

The score plots of latent variables are used to detect any departure from the in-control region defined by the confidence limits calculated from the reference set. The upper and lower control limits (UCL, LCL) for new independent t-scores under the assumption of normality are defined as

$$\pm\, t_{n-1,\alpha/2}\; s_{\mathrm{ref}}\, (1 + 1/n)^{1/2}, \qquad (5)$$

where $t_{n-1,\alpha/2}$ is the critical value of Student's t-distribution with $n - 1$ degrees of freedom at significance level $\alpha/2$; $n$ and $s_{\mathrm{ref}}$ are the number of observations and the estimated standard deviation, respectively, of the t-score sample at a given time interval k.

T² charts detect small shifts and deviations from normal operation defined by the model. T² and the corresponding statistical limits (UCL only) are also calculated by using the mean-centered score matrix. T² values for each sampling time k follow an F-distribution:

$$T_k^2 = (\hat{\mathbf{t}}_k - \bar{\mathbf{t}}_k)^{T}\, \mathbf{S}_k^{-1}\, (\hat{\mathbf{t}}_k - \bar{\mathbf{t}}_k)\, \frac{I(I-R)}{R(I^2-1)} \sim F(R,\, I-R), \qquad (6)$$

where $\hat{\mathbf{t}}_k$ is the predicted score vector of the new batch calculated using Eq. (2), I the number of batches in the reference set, and R the number of latent variables retained in the model. Predicted score vectors of reference batches are gathered to form the reference scores matrix $\mathbf{T}$. The average score traces $\bar{\mathbf{t}}_k$ and the covariance matrix of mean-centered reference scores at each time interval, $\mathbf{S}_k$, are calculated from $\mathbf{T}$.

The squared prediction error (SPE) chart shows large variations and deviations from normal operation that are not defined by the model. SPE values calculated for each time interval k using the residuals vector in Eq. (3) are well approximated by the chi-squared ($\chi^2$) distribution

$$\mathrm{SPE}_k = \sum_{j=1}^{J} e_{jk}^2 \sim g\,\chi^2_h, \qquad (7)$$

where g is a constant and h is the effective degrees of freedom.

Contribution plots are used for fault diagnosis. T² and SPE charts and score plots produce an out-of-control signal when a fault occurs, but they do not provide any information about the cause. Contribution plots for T², SPE, and scores identify the variable(s) responsible for indicating the deviation from normal operation. Identification of these process variables aids plant personnel in inferring equipment failures or other source causes in order to diagnose the fault. The inference activity can be automated by using a KBS. Contributions to SPE can be calculated by

$$C_{\mathrm{SPE},ijk} = e_{ijk}^2, \qquad (8)$$

where $C_{\mathrm{SPE},ijk}$ is the contribution of batch i to the SPE value for process variable j at time k (Miller et al., 1998; Westerhuis et al., 2000). Variable contributions to T² and mean-centered scores are calculated similarly (Nomikos, 1996). We modified the variable contributions formulation for MPLS as

$$C_{T^2,jk} = \sum_{r=1}^{R} S_{rr}^{-1}\, t_{rk}\, x_{\mathrm{new},jk}\, W^{*}_{j,r}, \qquad (9)$$

where the matrix $\mathbf{W}^{*}$ of size $(J \times R)$ is defined as $\mathbf{W}^{*} = \mathbf{W}(\mathbf{P}^{T}\mathbf{W})^{-1}$.

Control limits for contribution plots have been suggested to facilitate identification of the variables that inflate the SPM statistics (Westerhuis et al., 2000). Control limits for variable contributions to SPE (Eq. (8)) follow the $\chi^2$ distribution and can be calculated as defined in Eq. (7). Limits for contributions to T² are computed using a jackknife procedure, in which each reference batch is left out once and variable contributions are calculated for the batch that is left out. The mean and variance of these contributions are then calculated from the I batches for each jth variable at the kth time period to compute the UCL. The UCL for contributions is the mean of the variable contributions at each time interval plus three times the corresponding standard deviation (Westerhuis et al., 2000).

Charting variable deviations from average trajectories at each time instant can also be used as a diagnostic tool (Wold et al., 1998). The same jackknife procedure is used to construct upper and lower control limits for these deviations. They provide information on how variable trajectories deviate from the mean trajectories, but their univariate nature hinders effective diagnosis, especially in the case of drift-type disturbances.

3.2. Quality prediction

The MPLSV modeling technique lacks the capability to predict end-of-batch quality online in real time. A two-step approach is proposed to provide online quality prediction. After reference batch data are aligned using the IV technique, the progress of a batch run is determined according to percent increments on local batch time (or another IV), so that batches in the reference set are partitioned based on these increments, which are chosen arbitrarily, such as 10% or 20% of $z_{\mathrm{pred}}$ (Fig. 5a). Each partition of $\mathbf{X}$ $(IK \times J)$ is rearranged and inserted into matrix $\mathbf{X}$ $(I \times KJ)$ (Fig. 5b). Whenever a partition is rearranged, i.e. some percent of the batch is completed, another MPLSB model is developed between these partial data and the final product quality matrix $\mathbf{Y}$. This gives an opportunity to predict end-of-batch quality at the percent progress points reflected by the partitions. The number of quality predictions will be equal to the number of partitions.

Fig. 5. Integrated quality prediction: (a) partitioning of process measurements space and (b) restructuring for online quality prediction framework. [Schematic: in (a), X (IK × J) and z are partitioned by percent progress within Phase 1 and Phase 2; in (b), the partitions are rearranged into X (I × KJ) and related to Y (I × M).]

Each MPLSB model decomposes $\mathbf{X}$ and $\mathbf{Y}$ into a combination of score matrices $\mathbf{T}$ $(I \times R)$ and $\mathbf{U}$ $(I \times R)$, loading matrices $\mathbf{P}$ $(KJ \times R)$ and $\mathbf{Q}$ $(M \times R)$, and weight matrix $\mathbf{W}$ $(KJ \times R)$, as shown in Fig. 4b:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E}, \qquad \mathbf{Y} = \mathbf{T}\mathbf{Q}^{T} + \mathbf{F}, \qquad (10)$$

where $\mathbf{E}$ and $\mathbf{F}$ are the residuals matrices. When a prescheduled portion of the batch is complete (e.g. 10%, 20%, etc.), the corresponding local MPLSB model is used to predict scores and product quality at that point of the process as

$$\hat{\mathbf{t}} = \mathbf{x}_{\mathrm{new}} \mathbf{W} (\mathbf{P}^{T}\mathbf{W})^{-1}, \qquad \hat{\mathbf{y}} = \hat{\mathbf{t}}\,\mathbf{Q}^{T}, \qquad (11)$$

where $\mathbf{x}_{\mathrm{new}}$ is the unfolded and scaled partial data and $\hat{\mathbf{y}}$ is the $(1 \times M)$ vector of predicted end-of-batch quality measurements. Confidence intervals at significance level $\alpha$ are also suggested for these predicted quality values (Nomikos and MacGregor, 1995):

$$\pm\, t_{I-R-1,\alpha/2}\, (\mathrm{MSE})^{1/2}\, \bigl(1 + \hat{\mathbf{t}}\,(\mathbf{T}^{T}\mathbf{T})^{-1}\,\hat{\mathbf{t}}^{T}\bigr)^{1/2}, \qquad (12)$$

where $t_{I-R-1,\alpha/2}$ is the critical value of the Studentized variable with $I - R - 1$ degrees of freedom at significance level $\alpha/2$; $\mathbf{T}$ and MSE are the scores matrix and mean-squared error from the MPLSB model, and $\hat{\mathbf{t}}$ is the estimated scores vector of the new batch.
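The selection of a local MPLSB model and the point prediction of Eq. (11) can be sketched as follows. This is an illustration under simplifying assumptions: each local model has a single latent variable (R = 1), the model parameters are made-up numbers rather than fitted values, and the names `LocalMPLSB`, `select_model`, and `predict_quality` are hypothetical. The confidence interval of Eq. (12) is omitted because it additionally needs a t-distribution critical value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class LocalMPLSB:
    """Parameters of one local MPLSB model with a single latent variable (R = 1)."""
    w: List[float]  # weights, one per column of the unfolded partial data
    p: List[float]  # X-loadings, same length as w
    q: List[float]  # Y-loadings, one per quality variable (length M)


def select_model(models: Dict[float, LocalMPLSB], progress: float) -> Optional[float]:
    """Return the largest scheduled progress point already reached, if any."""
    reached = [pt for pt in models if pt <= progress]
    return max(reached) if reached else None


def predict_quality(model: LocalMPLSB, x_new: Sequence[float]) -> List[float]:
    """Eq. (11) for R = 1: t_hat = (x . w) / (p . w), y_hat = t_hat * q."""
    pw = sum(pi * wi for pi, wi in zip(model.p, model.w))
    t_hat = sum(xi * wi for xi, wi in zip(x_new, model.w)) / pw
    return [t_hat * qm for qm in model.q]


# Hypothetical local models at 10% and 20% progress, with M = 2 quality variables.
models = {
    10.0: LocalMPLSB(w=[1.0, 0.0], p=[1.0, 0.0], q=[2.0, -1.0]),
    20.0: LocalMPLSB(w=[0.5, 0.5, 0.5, 0.5], p=[0.5, 0.5, 0.5, 0.5], q=[1.0, 3.0]),
}
key = select_model(models, 14.0)           # the 10% model is the latest one reached
y_hat = predict_quality(models[key], [0.5, 0.2])
```

Note that the length of `x_new` grows with batch progress, since each later local model is built on a longer unfolded row of partial data.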

site www.chee.iit.edu/~control/software.html. A descriptive flow chart of the process is depicted in Fig. 6. The model has five input variables (1–4 and 14), nine process variables (5–13), and five quality variables (Table 1). Feedback controllers keep pH and temperature near their desired values. Penicillin cultivation process has two operational phases. It has actually four physiological phases: lag, exponential cell growth, stationary, and cell death. The first two phases are conducted as batch operation, while the following two phases are conducted as fed-batch operation. Consequently, we grouped the four phases of fermentation into two operational phases. In the first operational phase, fermentation is carried out in batch mode to promote biomass growth resulting in high cell densities. The second phase is a fed-batch operation. When the initial amount of glucose is consumed by the growing cells, additional glucose is fed during the fed-batch operation until the end of the run. In a batch fermentation process that lasts several days, some microorganisms may have different generation times. Slight changes in operating conditions during critical periods may have a significant influence on growth and differentiation of microorganisms, and impact final product quality and yield. 4.2. Data generation, pretreatment and modeling

4. KBS implementation 4.1. Fed-batch penicillin fermentation process Fed-batch penicillin fermentation is used as a case study. Process data are generated using a detailed mathematical model and a simulator named PenSim (Birol et al., 2002). The simulator is available at the web

To simulate the physical uncertainty present in each batch due to variable metabolic responses, very small perturbations are introduced into the parametric space and the input variables used in the simulator while generating the batch data set. A reference data set of 60 batches is simulated under nominal conditions with small perturbations using PenSim, resulting in unequal batch lengths. Data are divided into two parts corresponding to batch (phase 1) and fed-batch (phase 2) operation, respectively. Separate indicator variables are selected for each phase. The decrease in culture volume (variable 9) is found to be a good candidate in phase 1. A derived variable called 'percent substrate fed' is calculated from the substrate feed rate (variable 3) and used as the indicator variable in phase 2. It is assumed that phase 2 is completed when 25 l of substrate is added to the bioreactor. Reference batch data are re-sampled by linear interpolation at each 1 percent completion of volume decrease in phase 1 and at each 0.2 percent of total substrate added in phase 2. Data alignment is thus achieved, yielding an equal number of data points in each batch, such that the data lengths are $K_1 = 101$ and $K_2 = 501$, respectively.

Fig. 6. PenSim flowchart: fed-batch penicillin cultivation simulator (Birol et al., 2002), available at www.chee.iit.edu/~control/software.html. Flowchart labels: REACTOR, ACID, BASE, SUBSTRATE, AIR, COOLING-WATER, HOT-WATER, PUMP-1 to PUMP-5, FC, pH, T, DO.

Table 1
Input (1–4), process (5–14), and product variables ($y_{1-5}$) of simulated fed-batch penicillin fermentation

Variable no.   Definition
1              Aeration rate
2              Agitator power input
3              Substrate feed rate
4              Substrate feed temperature
5              Substrate concentration
6              Oxygen saturation (%)
7              Biomass concentration
8              Penicillin concentration
9              Culture volume
10             Carbon dioxide concentration
11             Hydrogen ion concentration (pH)
12             Temperature in the bioreactor
13             Generated heat
14             Cooling water flow rate
15             Amount of substrate fed (computed)
y1             Final penicillin concentration
y2             Overall productivity
y3             Yield of penicillin on biomass
y4             Yield of penicillin on substrate
y5             Amount of penicillin produced

MPLS model development includes two stages. In the first stage, an MPLSV model is developed between the autoscaled process variables matrix and the corresponding autoscaled IV vector as discussed in Section 3. This model is used for online SPM and fault diagnosis by the RTKBS. Multivariate chart limits are also constructed for use in the RTKBS. In the MPLSV model for each phase, five latent variables are retained after cross-validation.

The second stage involves developing predictive MPLSB models between the matrix of available data partitions and the end-of-batch quality measurements matrix as discussed in Section 3.2. To develop the first MPLSB model, data are collected in 50% increments of phase 1, resulting in two data partitions $X_{1,1}$ and $X_{1,2}$, and for every 20% increase in phase 2 evolution, resulting in five data partitions ($X_{2,n}$, $n = 1, \ldots, 5$), as shown in Fig. 5. MPLSB modeling is performed between the quality matrix $Y$ and the rearranged $X$, which is augmented each time a new data partition becomes available at the pre-scheduled increment points. As more data become available, local MPLSB models are developed for predicting end-of-batch quality with the process measurement data collected up to that point. In this application, seven local models are developed, and the explained variance in the quality block $Y$ increases with each local model (Table 2).

Table 2
Explained variance on X and Y blocks in MPLSB modeling with partial data

Model no.   Batch progress (%)   X block (%)        Y block (%)
            Phase 1              Phase 1
1           50                   71.57              32.68
2           100                  64.44              38.42
            Phase 2              Phase 1+phase 2
3           20                   58.14              66.38
4           40                   54.21              84.20
5           60                   51.97              92.56
6           80                   51.91              96.06
7           100                  49.94              97.94

4.3. Monitoring of abnormal batch runs

The first case illustrates the capability to detect and diagnose a step disturbance. A 20% step decrease in the agitator power is introduced at the 33.6% completion point in phase 2. The KBS immediately identifies the low agitator power correctly and reports the fault, as shown in the upper right portion of Fig. 2. The KBS also reports that the dissolved oxygen concentration is too low due to the agitator power failure (upper right part of Fig. 2). Inspection of the SPE and $T^2$ statistics (upper left part of Fig. 2) reveals the out-of-control condition, and the variable contribution plots indicate both the cause (variable 2) and the effect (variable 6) of the fault. At the time of the fault, the quality variables appear to be proceeding normally (lower left part of Fig. 2). When the batch is finished, a report is produced by the KBS providing details such as the batch duration, the values of the quality variables at the time of batch termination, and the list of faults and affected variables (Fig. 7). Product quality predictions based on the seven models (lower left part of Fig. 8) indicate that, because of the disturbance, all quality variables move significantly outside their acceptable ranges as the batch progresses. The confidence intervals are not displayed because the model is no longer valid, as indicated by the inflated SPE.

End of Batch Report, 22.2.2003 21:5:42
Batch Number PEN-7
Total Batch Time: 301.0 hours
Productivity Measures
Max Penicillin Conc. = 2.159 g/L at 86.0% of Phase 2
Final Penicillin Conc. = 2.125 g/L
Total Penicillin Produced = 239.262 g
During the batch operation 1 faults occured
Fault 1
The following variables caused Fault 1: Low AGITATOR.POWER
The following variables were affected by Fault 1: Low O2.CONCENTRATION

Fig. 7. RTKBS end-of-batch report for the agitator power step fault case.

Fig. 8. RTKBS fault and warning messages (upper right) at the end of the batch for the agitator power step case. Upper left: both the $T^2$ and SPE are represented as time series along with their control limits. Variable contributions are represented by vertical bars with their limits represented by a line. Lower left: the predicted end-of-batch quality and the actual final product quality are indicated with distinct marker symbols; the mean value is represented by the center line and the confidence limits are represented by the upper and lower lines.

The ability of the KBS to detect drift faults is also demonstrated by introducing a downward drift of 0.042% per sampling instant in the feed flow rate, starting at 33.6% completion in phase 2. Due to the small magnitude of the drift, the effects on the fermenter are not immediately evident. The first indication of a developing problem is reported by the KBS in the form of a warning shown in the upper right portion of Fig. 9, the $T^2$ statistic detects the process fault (upper left part of Fig. 9), and the quality prediction at the time when the fault is detected shows that the predicted product quality is deviating slightly from the mean, although not substantially (lower left part of Fig. 9).

Fig. 9. RTKBS fault and warning messages (upper right) at the time of the out-of-control signal for the substrate feed rate drift case. Upper left: both the $T^2$ and SPE are represented as time series along with their control limits. Variable contributions are represented by vertical bars with their limits represented by a line. Lower left: the predicted end-of-batch quality at the time of the fault is indicated with a marker symbol; the mean value is represented by the center line and the confidence limits are represented by the upper and lower lines.
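Both fault onsets above are stated in percent completion of phase 2, which relies on the indicator-variable alignment of Section 4.2. The following minimal sketch shows how such an alignment and the two fault types could be reproduced on a sampled input profile. It is an illustration under stated assumptions, not the procedure used in the paper, where the faults are introduced in the PenSim simulation itself. The function names are hypothetical, the rectangle-rule integration of the feed rate is an assumption, and the drift is interpreted here as a linear ramp of 0.042% of the nominal value per sample.

```python
def percent_substrate_fed(feed_rate, dt, total_volume=25.0):
    """Percent of total_volume (litres) fed up to each sample.

    Rectangle-rule integration of the feed rate is an assumption of this
    sketch; the first sample corresponds to 0% fed.
    """
    fed, out = 0.0, [0.0]
    for f in feed_rate[:-1]:
        fed += f * dt
        out.append(100.0 * fed / total_volume)
    return out


def resample_on_indicator(indicator, values, grid):
    """Linearly interpolate values onto grid levels of a non-decreasing indicator.

    grid must be sorted in ascending order.
    """
    out, j = [], 0
    for g in grid:
        # Advance to the segment [indicator[j], indicator[j + 1]] that holds g.
        while j < len(indicator) - 2 and indicator[j + 1] < g:
            j += 1
        x0, x1 = indicator[j], indicator[j + 1]
        w = 0.0 if x1 == x0 else (g - x0) / (x1 - x0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def step_fault(signal, onset, fraction):
    """Scale every sample from index onset onward by (1 - fraction)."""
    return [v * (1.0 - fraction) if k >= onset else v
            for k, v in enumerate(signal)]


def drift_fault(signal, onset, rate_per_sample):
    """Apply a linear downward ramp of rate_per_sample (relative) from onset."""
    return [v * (1.0 - rate_per_sample * (k - onset)) if k >= onset else v
            for k, v in enumerate(signal)]


# Phase 2 grid: every 0.2% of substrate fed, giving K2 = 501 points.
PHASE2_GRID = [k / 5 for k in range(501)]

# 33.6% completion falls on grid index 33.6 / 0.2 = 168.
FAULT_ONSET = round(33.6 / 0.2)
```

With these helpers, a 20% step decrease corresponds to `step_fault(profile, FAULT_ONSET, 0.20)` and the drift case to `drift_fault(profile, FAULT_ONSET, 0.00042)`, where `profile` is an input variable already resampled onto `PHASE2_GRID`.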

5. Conclusions

The integration of various process operation supervision tasks using an RTKBS enhances the efficiency of process supervision. Multivariate statistical methods for SPM and FDD provide refined information to the RTKBS and improve the accuracy and speed of inferences. Through the use of a hierarchical rule structure, contributions can be effectively used to determine the variables responsible for causing process deviations. The system gives the user access to model-based monitoring tools and knowledge-based diagnosis techniques integrated in a real-time environment with a simple, yet functional user interface. Consequently, process faults may be detected more rapidly and consistently by the integrated KBS than by using statistical or knowledge-based techniques exclusively. Interfacing statistical techniques with process-independent rules results in a system that can be used to monitor any process, in contrast to a system composed primarily of process-specific rules. Furthermore, such a system provides excellent diagnosis assistance to plant personnel who may be knowledgeable about plant operations, though not experts in MSPM techniques.

References

Albert, S., Kinley, R.D., 2001. Multivariate statistical monitoring of batch processes: an industrial case study of fermentation supervision. Trends in Biotechnology 19, 53–62.
Alford, J., Cairney, C., Higgs, R., Honsowetz, M., Huynh, V., Jines, A., Keates, D., Skelton, C., 1999a. Real rewards from artificial intelligence. InTech, April, pp. 52–55.
Alford, J., Cairney, C., Higgs, R., Honsowetz, M., Huynh, V., Jines, A., Keates, D., Skelton, C., 1999b. Online expert-system applications use in fermentation plants. InTech, July, pp. 50–54.
Birol, G., Undey, C., Cinar, A., 2002. A modular simulation package for fed-batch fermentation: penicillin production. Computers and Chemical Engineering 26, 1553–1565.
Cinar, A., Parulekar, S., Undey, C., Birol, G., 2003. Batch Fermentation: Modeling, Monitoring and Control. Marcel Dekker, New York, NY.
Gensym Corporation, 2001. G2 Reference Manual. Cambridge, MA.
Glassey, J., Montague, G., Mohan, P., 2000. Issues in the development of an industrial bioprocess advisory system. Trends in Biotechnology 18, 136–141.
Guay, M., 2000. Personal communication.
Henrion, R., 1994. N-way principal component analysis. Theory, algorithms and applications. Chemometrics and Intelligent Laboratory Systems 25, 1–23.
Kourti, T., MacGregor, J.F., 1996. Multivariate SPC methods for process and product monitoring. Journal of Quality Technology 28, 409–428.
Lennox, B., Montague, G.A., Hidden, H.G., Kornfeld, G., Goulding, P.R., 2001. Process monitoring of an industrial fed-batch fermentation. Biotechnology and Bioengineering 74, 125–135.
Leung, D., Romagnoli, J., 2002. An integration mechanism for multivariate knowledge-based fault diagnosis. Journal of Process Control 12, 15–26.
Mathworks, 2001a. Matlab, Version 6.1 (2001) www.mathworks.com, The MathWorks, Inc., Natick, MA.
Mathworks, 2001b. Matlab Compiler, Version 2.2 (2001) User's Guide, The MathWorks, Inc., Natick, MA.
Miller, P., Swanson, R.E., Heckler, C.F., 1998. Contribution plots: the missing link in multivariate quality control. International Journal of Applied Mathematics and Computer Science 8, 775–792.
Nomikos, P., 1996. Detection and diagnosis of abnormal batch operations based on multi-way principal components analysis. ISA Transactions 35, 259–266.
Nomikos, P., MacGregor, J.F., 1994. Monitoring batch processes using multiway principal component analysis. AIChE Journal 40, 1361–1375.
Nomikos, P., MacGregor, J.F., 1995. Multi-way partial least squares in monitoring batch processes. Chemometrics and Intelligent Laboratory Systems 30, 97–108.
Norvilas, A., Negiz, A., DeCicco, J., Cinar, A., 2000. Intelligent process monitoring by interfacing knowledge-based systems and multivariate statistical monitoring. Journal of Process Control 10, 341–350.
Petti, T.F., Klein, J., Dhurjati, P.S., 1990. Diagnostic model processor: using deep knowledge for process fault diagnosis. AIChE Journal 36 (4), 565–575.
Ramesh, T.S., Shum, S.K., Davis, J.K., 1988. A structured framework for efficient problem solving in diagnostic expert systems. Computers and Chemical Engineering 12 (9/10), 891–902.
Stephanopoulos, G., 1990. Artificial intelligence in process engineering: current state and future trends. Computers and Chemical Engineering 14 (11), 1259–1270.
Tatara, E., Cinar, A., 2002. An intelligent system for multivariate statistical process monitoring and diagnosis. ISA Transactions 41, 255–270.
Undey, C., Tatara, E., Williams, B.A., Birol, G., Cinar, A., 2000. A hybrid supervisory knowledge-based system for monitoring penicillin fermentation. Proceedings of the American Control Conference, Vol. 6, Chicago, IL, pp. 3944–3948.
Venkatasubramanian, V., Rich, S.H., 1988. An object-oriented two-tier architecture for integrating compiled and deep-level knowledge for process diagnosis. Computers and Chemical Engineering 12 (9/10), 903–921.
Venkatasubramanian, V., Raghunathan, R., Kavuri, S.N., 2003. A review of process fault detection and diagnosis. Part II: Qualitative models and search strategies. Computers and Chemical Engineering 27, 313–326.
Westerhuis, J.A., Gurden, S.P., Smilde, A.K., 2000. Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems 51, 95–114.
Wold, S., Geladi, P., Esbensen, K., Ohman, J., 1987. Multi-way principal component and PLS analysis. Journal of Chemometrics 1, 41–56.
Wold, S., Kettaneh, N., Friden, H., Holmberg, A., 1998. Modelling and diagnostics of batch processes and analogous kinetic experiments. Chemometrics and Intelligent Laboratory Systems 44, 331–340.