Phase division and process monitoring for multiphase batch processes with transitions

Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83 Contents lists available at ScienceDirect Chemometrics and Intelligent Laboratory S...

Download PDF

2MB Sizes 2 Downloads 93 Views

Report

PDF Reader
Full Text

Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

Contents lists available at ScienceDirect

Chemometrics and Intelligent Laboratory Systems journal homepage: www.elsevier.com/locate/chemolab

Phase division and process monitoring for multiphase batch processes with transitions Xiaochu Tang a, Yuan Li b,⁎, Zhi Xie a a b

College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China College of Information Engineering, Shenyang university of Chemical Technology, Shenyang, Liaoning Province, 110142 China

a r t i c l e

i n f o

Article history: Received 27 February 2015 Received in revised form 10 April 2015 Accepted 11 April 2015 Available online 20 April 2015 Keywords: Multiphase batch process Phase division Process monitoring Transition

a b s t r a c t In this paper, a new repeatability factor is introduced to achieve phase division for multiphase batch processes. Then a two-step feature vector selection based kernel variable correlation analysis (TSFVS-KVCA) method is proposed for batch processes with transitions monitoring. The TSFVS-KVCA method not only considers the within-phase nonlinear variable correlation but also between-phase one by extracting the common bases and the speciﬁc bases between two neighboring phases. The TSFVS method ﬁrst selects feature vectors from each steady phase as within-phase subsets. Then between-phase subsets are selected from the within-phase subsets of two neighboring phases. These selected feature vectors have well capacity of description on original data so that they can substitute for original data in a feature space F. Thus the TSFVS method reduces the computational complexity and the instability of high-dimensional kernel in subsequent KVCA method. Moreover, each withinphase subset can be used as sub-bases in a feature space F, which simplify the objective function of KVCA method. In this way, the common bases can be extracted in an easier way. Based on the common bases, two neighboring steady phases can be separated into the common subspace and the speciﬁc subspace, in which nonlinear process monitoring method is carried out. Furthermore, an online monitoring method for transitions is proposed based on a just-in-time model. The model can be described by the common bases of two neighboring phases and speciﬁc bases of the dominant phase at current time interval. The dominant phase can be dynamically determined according to the correlation between current transition sample and neighboring phases. The results of simulation demonstrate effectiveness and superiority of the proposed method. © 2015 Elsevier B.V. All rights reserved.

1. Introduction Batch processes have been a core type of production in industrial process [1–5]. In order to ensure the safety of operation environment and meet the requirement of product quality, it is essential for batch processes to carry out the process monitoring. Multivariate statistical analysis, as an effective tool, has attracted increasing attention on applying this type of methods to batch processes monitoring. Multi-way principal component (MPCA) and multi-way partial least squares (MPLS) methods have laid a foundation for introducing multivariate statistical analysis method to batch processes monitoring [6–8]. After that, some researches and improvements for batch processes monitoring have been developed [9–18]. However, it is noted that there exits an important information on multiplicity in batch processes, which can not be mined by traditional MPCA and MPLS method. Therefore, process monitoring for multiphase batch processes should be well developed. ⁎ Corresponding author at: College of Information Engineering, Shenyang University of Chemical Technology, Shenyang, Liaoning Province, 110042, China. Tel./fax: + 86 02489383358. E-mail address: [email protected] (Y. Li).

http://dx.doi.org/10.1016/j.chemolab.2015.04.007 0169-7439/© 2015 Elsevier B.V. All rights reserved.

There are different process characteristics for different phases in multiphase batch process. So, different phase may exhibit different underlying behaviors. To improve the performance of process modeling, some multiphase statistical modeling and monitoring methods have been developed. Lu et al proposed a sub-PCA modeling and monitoring method, in which phase are divided by clustering weight loading of each time slice and within-phase model is built [19]. This algorithm was further developed, in which a new improvement is that separating phase based on PLS parameter matrix corresponding to each time slice [20]. Other methods were proposed by Camacho and Pico [21–23]. However, these methods utilized a strict hard-partition and neglected the transitions between two neighboring phases. This may compromise the accuracy of isolated sub-model during the transition period and it can cause a high false alarm rate in transitions. It illustrates that transition should not be ignored in process modeling and monitoring for multiphase batch processes. Therefore, it is necessary to investigate identiﬁcation, modeling and monitoring for transitions. Zhao et al. ﬁrstly proposed a soft transition algorithm to deal with hard phase division [24]. In this method, class radius and kernel radius are deﬁned to give an ambiguous phase boundary. The transition regions are separated and the corresponding model is expressed as a weight form of two neighboring phase models. Although this

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

method has made some improvements for the performance of transition monitoring, the subjective parameters selection and the calculation of the membership degrees based on Euclidean distances may affect transition identiﬁcation and modeling accuracy. Yao and Gao identiﬁed the transitions automatically without the speciﬁed parameters, which obtained better monitoring results [25]. However, the above methods for transition modeling based on a ﬁxed prior relationship of transition with two neighboring phases. This kind of predeﬁned and determinate transition model may be difﬁcult to characterize transition pattern enough. Also, above mentioned researches on multiphase modeling only focused on how to isolate each individual phase and neglected the interrelationship between phases. Recently, Zhao et al. proposed a new method for transition identiﬁcation and modeling based on a reconstruction idea, overcoming the shortcomings of above methods [26]. Each transition sample is reconstructed based on the extracted common bases and speciﬁc bases between two neighboring phases. The transition can be found by tracking the change of contribution coefﬁcients of target phase model. The transition behaviors can be detected by checking the residual of model. However, there exit some shortcomings for this method. First, the phase center was deﬁned before modeling so that modeling still based on the speciﬁc number of phase and the hard partition. Furthermore, the transition identiﬁcation is not complete since the only start of each transition can be found. In addition, the common bases and speciﬁc bases are extracted based on the multiset variable correlation analysis (MsVCA) [27]. A two-step extraction may lead to excessive extraction so that the bases obtained may be not optimal. Zhang et al. developed the variable correlation analysis (VCA) method based on the objective function on covariance information and introduce VCA method to the kernel space for multimode process monitoring [28]. However, high-dimensional kernel matrix can bring to the difﬁculties in obtaining the eigenvalues and eigenvectors. Besides, transition information is not considered and a detailed illustration on statistics is not given. Recently, Ge et al proposed a new phase division method using a repeatability factor based on Euclidean distance and transition information is considered [29]. However, their work mainly focused on quality prediction, process monitoring has not been considered. In this paper, new indexes are proposed to achieve phase division based on a new repeatability factor. For batch processes, repeatability between batches is an important characteristic. This characteristic can be used to achieve phase division since the repeatability in the steady phase is higher than in the transition. Therefore, the key is to ﬁnd an index to reﬂect the repeatability information in the real sense so that phase division can be well achieved. This paper makes a further understanding on repeatability from a geometrical view. On the one hand, repeatability between two batches may depend on the similarity of space location corresponding data points. On the other hand, it also depends on the similarity of point attribution to the cluster. Based on this understanding, an improved repeatability factor, which based on the diffusion distance, is used to describe the repeatability between batches for phase division. Compared with the conventional repeatability factor, which only measure the similarity between data points, the improved repeatability factor can further reﬂect the intrinsic geometry of the dataset. This intrinsic geometry provides the underlying information on the local density distribution. The bigger density distribution is, the bigger repeatability is. Moreover, a pair of points in the same cluster has bigger density distribution than those in different cluster. Therefore, the improved repeatability factor can reﬂect the similarity of attribute to the cluster beside the similarity between two points. Based on the improved repeatability, the repeatability characteristic between batches can be described in the true sense, which improves the performance of phase division. Based on the results of phase division, modeling and monitoring for multi-phase batch are developed. For multi-phase batch processes, modeling can include the steady phase modeling and the transition

73

modeling. Due to different process characteristics between steady phases and transitions, different modeling schemes should be adopted. For steady phases, the TSFVS-KVCA method is proposed for nonlinear process monitoring. The ﬁrst step feature vector selection is performed on each steady phase, and these selected feature vectors construct a subset of original phase data, which is called as the within-phase subset. Then, the second step feature vector selection is carried out on a dataset constructed by within-phase subsets of two neighboring phases and between-phase subset is obtained. Each within-phase subset or each between-phase subset forms the basis vectors, which has the well capacity of generalization on original data in feature space. Furthermore, a two-step feature selection can effectively reduce the size of original dataset avoiding the computational complexity and the instability of decomposition for high dimensional kernel. In addition, the TSFVS-KVCA method solves the excessive extraction problems and provides new sub-basis vectors to substitute for the traditional linear combination. It shows that the combination of TSFVS with KVCA is suitable and meaningful. Based on a series of subsets, the common bases and speciﬁc bases are extracted. The steady phase is separated into different subspaces. Then, the statistics derivation in different subspaces is given for nonlinear process monitoring. For steady phases, process characteristic keeps similar, so a determinate model is suitable. However, for transitions, the process pattern is gradually changeable. Regularly, the process pattern of a transition sample may be quite alike the nearer phase. Because of dynamics and uncertainty, constructing a determinate model to describe the transition pattern is not reliable. Accordingly, it would be more desirable that modeling for transitions based on a dynamic model without the prior information. Although there exits the switch of transition pattern, it is certain that some inherent characteristics still remains unchanged. The extracted bases based on the TSFVS-KVCA method just can reﬂect the similarity and the dissimilarity between two neighboring phases. They not only achieve the modeling for steady phase, but also provide important information for transition sample analysis and modeling. The transition pattern can be described by the common bases between two neighboring phases and the speciﬁc bases in the starting and target phase, respectively. The change of transition is mainly reﬂected by the switch of speciﬁc bases, since the common bases between the two neighboring phases keeps unchanged during the transition. Although the transition covers two types of phase-speciﬁc characteristics, the fact is that the only one plays a dominant role to describe the transition pattern. It is why the transition pattern is more similar to the start phase at the beginning, whereas it behaves more similar to the target phase at the end. The dominant role of speciﬁc bases is switching from the start phase to the target phase as the transition progresses. The time interval at which the dominant role of speciﬁc bases switches from one phase to the other phase is called the switch point and the phase of which speciﬁc bases plays a dominant role is called the dominant phase. The transition pattern can be well described based on the common bases and the speciﬁc bases of dominant phase at current time interval since the variation caused by the speciﬁc bases of non-dominant is equal to the one that a disturbance within the normal limit caused. Therefore, the dynamic modeling for the transition pattern may mainly depend on how to choose the right speciﬁc bases to explain the transition pattern at each time interval. The dominant phase is determined based on which phase the nearest neighbor of the current transition sample is from. After the dominant phase is determined, a just-in-time model for transition sample is constructed based on the common bases and the speciﬁc bases of dominant phase for transition monitoring at the current time interval. This paper can be composed as follows. In Section 2, the descriptions of proposed methods including phase division and process monitoring method are given. Then, the effectiveness and superiority of proposed method are demonstrated by applying it to the penicillin fermentation process. Conclusions are drawn in the last section.

74

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

division algorithm. An improved repeatability factor is deﬁned as follow:

2. Methodology 2.1. Diffusion distance In this paper, diffusion distance is used in the improved repeatability factor. The detailed description on diffusion distance is given in this n o section [30]. Suppose Xd ¼ xd1 ; xd2 ; ⋯; xdNd is a dataset with Nd sample points, x di is a sample point in Xd, i = 1, 2, ⋯, Nd. From the geometrical view, dataset Xd can be regarded as a weighted graph G and each sample point x di can be regarded as a node in G. The diffusion distance D 2t (x di , x dj ) between x di and x dj can be deﬁned as follows:

2 Dt

2 d d d d xi ; x j ¼ pt xi ; −pt x j ;

1=q

2 pt xdi ; sd −pt xdj ; sd ¼ q sd sd ∈Xd X

ð1Þ Where pt(x di ,⋅) is the transition probability of going from node x di to other nodes in t steps. Here, transition probability in one step is used in diffusion distance calculation. The corresponding transition probability from node x di to x dj is given as follows: kβ xdi ; xdj d d p xi ; x j ¼ deg xdi

ð2Þ

0

1 d d 2 x −x j B i C kβ ¼ exp@− A c

In Eq. (2), calculated by

qﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ 2 e−Dt ðxi ;x j Þ

is the degree of node

ð3Þ

x di

in graph G, which is

X d d d deg xi ¼ kβ x i ; s

ð4Þ

sd ∈Xd

In addition, in Eq. (1), q(Δ) is the unique stationary distribution [32], calculated by deg xd d q x ¼X d deg s sd ∈Xd

ð6Þ

Where i, j = 1, 2, ⋯, I, k = 1, 2, ⋯, K, D2t is diffusion distance, xi is the i-th sample point of the k-th time slice. The irf based on the diffusion distance between two batches to reﬂect the repeatability. If two batches have quite similar process behaviors, it shows that there is high repeatability between them and a high value of irf will be obtained. Otherwise, the value of irf will be low. Repeatability of each time slice can be represented by corresponding repeatability matrix, which is given as follow: IR Fk ¼ fir f k ði; jÞg; i; j ∈ I

ð7Þ

Based on the Eq. (7), the differential repeatability matrix between consecutive time samples can be obtained by DIR Fk ¼ IR Fk −IR Fk−1 ¼ fdirf ði; jÞg; i; j ∈ I

ð8Þ

To measure the repeatability at each time slice, the degree of repeatability (DR) of each time slice is calculated by DRk ¼

X

IR Fk ¼

I X I X i

Where kβ is the Gaussian kernel with the width c [31]. It can be given as follows:

deg(x di )

ir f k ði; jÞ ¼

ir f k ði; jÞ

ð9Þ

j

In batch processes, the steady phase has a higher repeatability than the transition period, because higher uncertainties may exit in the transitions. So the obtained values of DR in the steady phases are usually higher than the DR values in the transitions. To identify the switch of phase, differential degree of repeatability (DDR) between adjacent time slices is also calculated by DDRk ¼

X

DIR Fk ¼

I X I X i

dir f k ði; jÞ

ð10Þ

j

Based on the differential degree of repeatability DDR, the switch between the steady phase and transition can be detected. The value of DDR will have an obvious change, when the switch between the steady phase and the transition happens. In the present work, the improved repeatability factor based on the diffusion distance can reﬂect a real repeatability in each time slice. It is because that the pseudo repeatability are avoided compared with the repeatability factor based on the Euclidean distance.

ð5Þ

The diffusion distance not only reﬂects on space location information between data points, but also reﬂects connectivity information between them. The connectivity represents the number of paths between two points. Therefore, the diffusion distance can reﬂect the local density distribution. The bigger local density is, the smaller diffusion distance is. And two points are more likely to be in the same class. Therefore, the diffusion distance between two points not only depends on the space location similarity but also depends on the attribution similarity. 2.2. Phase division based on the improved repeatability factor In our proposed method, a new improved repeatability factor is introduced to achieve the phase identiﬁcation for multiphase batch process. Assume batch process data is collected from I batches. In each batch, J process variables are measured at k = 1, 2, ⋯, K time instances. e ðI JK Þ. Each The batch-wise unfolded matrix can be denoted as X time slice matrix X(I × J) can be regard as the basic unit for phase

2.3. TSFVS-KVCA method After the phase division, the batch process data that is batch-wise unfolded are ﬁrst normalized. Then, the normalized matrix is variable-wise rearranged. Suppose that M steady phases are identiﬁed during the batch duration and each phase data can be denoted as Xm ðINm J Þ ¼ h iT m m , which Nm is the number of samples in each steady xm 1 ; x2 ; ⋯; xINm phase, m = 1, 2, ⋯, M, and each sample vector x m i , i = 1, 2, ⋯, INm. Consider a mapping function ϕ, which maps the data from the original space to the iT h m , where it is assumed that feature space as ϕ Xm ¼ ϕ xm 1 ; ⋯; ϕ xIN m IN m m ∑i¼1 ϕ xi ¼ 0. For each phase data, the kernel matrix Km is a INm × INm m m matrix, which can be deﬁned by [Km]ij = K m ij = 〈ϕ(xi ), ϕ(xj )〉, 1 ≤ i ≤ INm, 1 ≤ j ≤ INm. From the kernel matrix, we can see that the size of kernel matrix depends on the number of samples in the observed data. For multiphase batch data, the size of matrix variable-wise is large. Hence, when kernel methods applied to the multiphase batch process to deal with nonlinear problems, they usually encounter difﬁculties in

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

the three aspects: memory usage, computational complexity and computational instability [33]. The aim of TSFVS method is ﬁnding several feature vectors, which the number of feature vectors is smaller than the number of original data, as an approximation of a basis in F for each phase and two neighboring phase, respectively. The corresponding basis can capture the geometrical structure of the original data in the feature space F [34], so that the mapping of original data can be represented by the mapping of selected feature vectors in the feature space F. For the TSFVS method, the ﬁrst step is that to select the feature vectors for each steady phase data Xm(INm × J). Let Lm(m = 1, ⋯, M) be the number of feature vectors in each steady phase, where Lm ≤ INm. For each steady phase, the selected feature vectors can be denoted as xm s,l(l = 1, ⋯, Lm), and Lm is the number of selected feature vectors for each phase. The within-phase subset can be denoted asXm S ¼ h i m m xm s;1 ; ⋯; xs;Lm . We can estimate the mapping of any sample vector xi by a linear combination of the mapping of Xm S , which is formulated as follows: ^ x m ¼ ϕ Xm T α m ϕ i S i

ð11Þ

iT h m is the mapping matrix of ¼ ϕ xs;1 ; ⋯; ϕ xm Where ϕ Xm S s;Lm selected feature vectors in the feature space F, which selected feature vectors deﬁne a subspace in the feature space F and corresponding subh iT m m m space for each phase can be denoted as F m is S , and α i ¼ α i;1 ; ⋯; α i;Lm the coefﬁcient vector. m For a given x m i , the purpose is ﬁnd α i so that the estimated m ^ mapping ϕ x is as close as possible to the real mapping ϕ(xm i ). We i

can minimize the following equation:

m δi

m T m T m m T m m ϕ xi −ϕ XS α i ϕ xi −ϕ XS α i ¼ m 2 ϕ x i

ð12Þ

m Putting the derivative of δm i to zero obtains the coefﬁcient α i :

T −1 m m m m m ϕ XS ϕ x i α i ¼ ϕ XS ϕ XS

ð13Þ

T m −1 (ϕ(X m exists if the selected feature vectors are linearly S ) ϕ(X S )) independent in F. Substituting Eq. (13) in Eq. (12), we can obtain:

m

δi ¼ min m αi

T Km KmKm 1− Si mSS Si K ii

@ min m XS

X

xm ∈Xm i

!1 mT m m KSi KSS KSi A 1− Km ii

Eq. (15) is equivalent to: X Km T Km Km Si SS Si Km ii xm ∈Xm

max m XS

ð16Þ

i

m m Deﬁne the global and the local ﬁtness Jm S and JSi for a given set XS by

T m m 1 X m m Km K K J ; J ¼ Si mSS Si INm xm ∈Xm Si Si K ii

m

JS ¼

ð17Þ

i

The feature vectors selection is an iterative process: at the ﬁrst iteration, the sample that gives the maximum global ﬁtness is selected. Except the ﬁrst iteration, the sample that having the minimum local ﬁtness is used to construct the new feature vectors set. The iteration stops when the matrix K m SS is no longer invertible. It also stops when the global ﬁtness J m S reaches a given value, or when a predeﬁned number of feature vectors have been reached. After the ﬁrst step feature vectors selection, the within-phase subset m Xm S for each steady phase is obtained. The within-phase subset XS , +1 Xm of two neighboring phase are combined to construct a new S h iT e m;mþ1 ¼ Xm T Xmþ1 T . Then the second step feature vectors data set X S S S e m;mþ1 . The procedure is the selection is performed on each data set X S same as the above ﬁrst step FVS and the corresponding between+1 . The phase subset is obtained, which can be denoted as Xm,m S within-phase subset and between-phase subset provide well inputs for further process monitoring based on the KVCA method, which avoid the spectral decomposition of high dimensional kernels, reduce the memory usage and the computational complexity, as well as increase the numerical stability. In both the MsVCA method proposed by Zhao et al and the KVCA method proposed by Zhang et al, any sub-basis in each dataset pm j (j = 1, 2, ⋯, J) can be given based on the linear combination of observation data or its mapping. A linear coefﬁcient should be considered in the common bases extraction. Alternatively, based on the proposed TSFVS-KVCA method, the within-phase subset just can form a sub-basis in the feature space, which can simplify the object function in the common basis vectors extraction. The common basis vector pg between two neighboring phases is introduced to measure the between-phase interrelations. It should approximate to two neighboring phase sub-bases and even can substitute these sub-bases. The objective function is given as follows:

max ð14Þ

m þ1 X

T 2 m T pg ϕ X S

m¼m

ð18Þ

T

s:t: pg pg ¼ 1

Where K m SS is the kernel square matrix composed by the dot products of selected feature vectors. It can be deﬁned by [K m SS ] pq = m m m m (Km SS ) pq = 〈ϕ(x s,p ), ϕ(x s,q )〉, 1 ≤ p ≤ L m , 1 ≤ q ≤ L m , which x s,p , x s,q are the selected feature vectors. K m Si is the vector of dot products between x m i and the selected feature vectors, which can be deﬁned m m m m by [K m Si ]li =(K Si )li = 〈ϕ(x s,l ), ϕ(x i )〉. K ii is the dot products of sample m xi . For the feature vectors selection, the selected feature set X m S should satisfy to minimize the Eq. (14) over all samples x m i . It can be given as follows: 0

75

ð15Þ

Based on a Lagrange operator, the objective function can be given as follows: þ1 m T 2 X m T T F pg ; λg ¼ pg ϕ XS −λg pg pg −1

ð19Þ

m¼m

Where λg is constant scalar. The partial derivative of F(pg, λg) with respect to pg can be calculated and obtained expression is set to zero, we have

m þ1 X

T T T m m ϕ XS pg ϕ XS ¼ λg pg

m

ð20Þ

76

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

Both sides multiply pg T , the following expressions can be obtained m þ1 X

T 2 m T pg ϕ XS ¼ λg

ð21Þ

Where β = [β1, ⋯, βH]T. To ﬁnd solutions of Eq. (30), we only solve the eigenvalue problem as follows: e ¼λ β Kβ g

ð31Þ

m

From Eq. (31), the eigenvectors β1, ⋯ βH with eigenvalues λg,1 ≥ ⋯ ≥ λg,H can be obtained. The ﬁrst p eigenvectors are retained and the common basis vector can be obtained:

Right multiply pg, we further obtain m þ1 X

T m m pg ¼ λg pg ϕ XS ϕ XS

ð22Þ

m¼m

pg;k ¼

The above Eq. (21) can be rewritten into vectors form as follows: m þ1 X

Lm T X m m ϕ xs;l ϕ xs;l

m¼m

l¼1

!! pg ¼ λg pg

ð23Þ

Unfolding Eq. (23), it can obtain H T X ϕ xs; j ϕ xs; j pg ¼ λg pg

ð24Þ

j¼1

Where H is the num of all the feature samples in two neighboring phases and xs,j is a feature sample from two neighboring phases. Both sides of Eq. (24) can be multiplied by ϕ(xs,k), we can formulate that * + H D E X D E ¼ λg ϕ xs;k ; pg ϕ xs; j ϕ xs; j ; pg ϕ xs;k ;

ð25Þ

j¼1

Due to the common basis vector pg between two neighboring phases must lie in the span of two input subset, there exits coefﬁcients βi(i = 1, ⋯, H) such that pg ¼

H X

βi ϕ xs;i

ð26Þ

i¼1

Combining Eqs. (25) and (26), we have H X

βi

i¼1

H D ED E X ϕ xs;k ; ϕ xs; j ϕ xs; j ; ϕ xs;i j¼1

¼ λg

H X

ð27Þ

D E βi ϕ xs;k ; ϕ xs;i

We deﬁne a H × H matrix K by [K]ij = Kij = 〈ϕ(xs,i), ϕ(xs,j)〉 [35,36]. In this paper, the Gaussian kernel is used, which can be deﬁned as K i j ¼ 2 kx −x k exp − s;i c s; j , which c is the width of kernel function [31,37]. Before the kernel matrix K is applied to the consequent step, the mean centering can be carried out as follows [35]: ð28Þ

Where 2

1H ¼

1 14 ⋮ H 1

3

⋯ 1 ⋱ ⋮ 5 ⋯ 1

ð29Þ

The Eq. (27) can be rewritten as:

e 2 β ¼ λ Kβ e K g

βi;k ϕ xs;i

ð32Þ

i¼1

The common bases Pg(J × p) = [pg,1, ⋯, pg,p] can be obtained from Eq. (32). Where k = 1, 2, ⋯, p, which p is the number of retained common bases. After the common bases Pg between two neighboring phases is obtained, each phase can be separated into two different subspace and modeling and monitoring can be further carried out. 2.4. Process modeling and monitoring based on the TSFVS-KVCA method Based on the TSFVS-KVCA method, different modeling methods are adapted for steady phases and transitions monitoring. In each steady, the process characteristics show similar so that a ﬁxed modeling can be carried out. However, there is high uncertainty in the transition. So a just-in-time model is adopted. 2.4.1. Modeling and monitoring for steady phases After the TSFVS-KVCA extraction, the common bases Pg between two neighboring phases are obtained. Each steady phase dataset in F m S can be separated into two different parts as follows: m m m ϕ XS ¼ ϕc XS þ ϕs XS

ð33Þ

Where ϕc(Xm S ) contains the common subspace information and ϕs(Xm S ) contains speciﬁc subspace information for each phase. They can be obtained by m m T ϕ c XS ¼ ϕ XS P g P g T m m m m I−Pg Pg ϕs XS ¼ ϕ XS −ϕc XS ¼ ϕ XS

ð34Þ

The scores Tm c of each phase data in the direction of common bases Pg can be obtained by projecting ϕ(Xm S ) onto Pg, it can be given as follows:

i¼1

e ¼ K−1 K−K1 þ 1 K1 K H H H H

H X

ð30Þ

m m T c ¼ ϕ X S Pg

ð35Þ

In each speciﬁc subspace, the further decomposition is carried out based principal component analysis (PCA) method such that the systematic information and noise information both are obtained. Here the corresponding PCA loading vector is called as the speciﬁc bases, denoted as P m s . The following calculations are carried out: m m m T s ¼ ϕ XS P s ^ Xm ¼ T m Pm T ϕ s S s s m m ^ Xm E s ¼ ϕ XS −ϕ S s

ð36Þ

m m ^ Where T m s is the scores in the direction of speciﬁc bases P s , ϕs X S m is the estimation of ϕs(X m S ), and Es is the residual in the speciﬁc subspace.

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

For a test sample x during online process monitoring, the phase is identiﬁed according to the time and the common score can be calculated as follow: H D E D E X tc ¼ pg ; ϕðxÞ ¼ βi ϕ xs;i ; ϕðxÞ

We can further obtain: m

Due to the

must lie in the span

ϕs x m s;1

Lm X

m

ð42Þ

ð37Þ

h i m Where ω ¼ ω m 1 ; ⋯; ω Lm , which can be obtained by solving the

; ⋯; ϕs xm s;Lm , there exits

eigenvalue problem. It is noted that the critical problem is how to calculate the inner product between ϕs(⋅), since the data in the original space corresponding to ϕs(⋅) is unavailable. Therefore, the formulation on inner product between ϕs(⋅) can be given as follow:

the following linear combination:

ps ¼

m

K s ω ¼ λs ω

i¼1

pm s

77

D

m m ωi ϕs xs;i

ð38Þ

i¼1

m Where xs,i is the feature sample from each phase. Based on PCA, we have:

E T m m m m m m ϕs xs;i ; ϕs xs; j ϕ xs; j −ϕc xs; j ¼ ϕ xs;i −ϕc xs;i T T T T m m m m m m m m ¼ ϕ xs;i ϕ xs; j −ϕ xs;i ϕc xs; j −ϕc xs;i ϕ xs; j þ ϕc xs;i ϕc xs; j Lm Lm p p X X X X m m m m m T T T T tc;i;l pg;l tc; j;k pg;k − tc;i;k pg;k tc; j;l pg;l ¼ K SS i j − l¼1

þ

T m m m m ps ¼ λs ps ϕs XS ϕs XS

ð39Þ

p X

k¼1

T m tc;i;k pg;k

k¼1

p X

l¼1

k¼1

T m tc; j;k pg;k

k¼1

p X m m m ¼ K SS i j − t c;i;l t c; j;l l¼1

According to the formulation on how to obtain pg, it is not difﬁcult to obtain: Lm X

m

ωi

i¼1

¼

Lm D X

ED E m m m m ϕs xs;k ; ϕs xs; j ϕs xs; j ; ϕs xs;i

j¼1 Lm X

m λs

m

ωi

D

ð40Þ

E m m ϕs xs;k ; ϕs xs;i

i¼1

m

Lm X j¼1

ð44Þ

i¼1

m m m m Deﬁne an Lm × Lm matrix K m s by [K s ]ij = (Ks )ij = 〈ϕs(xs,i), ϕs(xs,j)〉. The Eq. (40) can be expressed as

ωi

After the ω is obtained, the speciﬁc score of each phase can be calculated as follows: Lm D E X D E m m m m t s ¼ ps ; ϕs ðxÞ ¼ ωi ϕs x s;i ; ϕs ðxÞ

i¼1

Lm X

ð43Þ

m

Ks

ji

m kj

Ks

m

¼ λs

Lm X

m m ωi K s ki

ð41Þ

The residual can be calculated by Lm p 2 X X m m m2 m2 ^ xm t s; j − t s; j es ¼ ϕs xs −ϕ s s ¼ j¼1

i¼1

TSFVS-KVCA Steady phase 1

Test samples

Model

Unfolded variable-wise training dataset

SP 1

Ps1 DF1=1

Transition 1 Steady phase 2

T1

.. . Steady phase M

.. .

Pg12 DF2=1

SP 2

Ps2

.. .

j¼1

.. . .. .

SP M

Fig. 1. Flowchart of process monitoring based on the proposed TSFVS-KVCA method.

ð45Þ

78

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

Subsequently, the monitoring statistics are calculated as follows: m 2

m T

m

−1 m bc;i

T c;i

¼ bc;i O c;i

m 2 T s;i m SPE s;i

m T m −1 m ¼ t s;i Os;i t s;i m T m ¼ es;i es;i

ð46Þ

m Where Om c,i and Os,i are the covariance matrices of scores in common subspace and speciﬁc subspace, respectively. The conﬁdence limit for m 2 Tp,α with α as the signiﬁcance factor [38]:

m 2

T p;α ∼

pðN−1Þ F N−p p;N−p;α

ð47Þ

Where p is the number of retained basis vectors and N is the number of samples in the model. The control limit of squared prediction error (SPE) in the steady phase can be calculated based on an approximate distribution [8]: m

m

2

SPE ∼g χ hm ;α

ð48Þ

Where g m = v m/2bm and hm = 2(bm)2/vm, in which bm is the mean of all the statistics SPE in the phase m, and vm is the corresponding variance. 2.4.2. Modeling and monitoring for transitions In the transition, the process pattern over batches are commonly are irregular due to the dynamic and complex transition behavior. Thus, it is generally unfeasible that modeling a determinate statistical model with reference transition data describes all the possible transition patterns. During online monitoring, a new transition pattern different from training patterns may be detected as a fault pattern. To avoid the inadaptability of a determinate model, a just-in-time model is proposed for transition monitoring. The transition pattern is gradually changeable from the starting phase to the target phase. At the beginning, the transition patterns are similar to the start phase ones, while the transition patterns are similar to the target phase ones as the transition runs. Naturally, the transition pattern can be described by model of the start phase or target phase. Different from the previous modeling for transitions based on a weight of models of two neighboring phase, in this paper, a just-in-time model is determined in a new ways, based on a

Table 1 Selected process variables for process modeling. No.

Process Variables

1 2 3 4 5 6 7 8 9 10

Aeration rate Agitator power Substrate feed rate Substrate feed temperature Dissolved oxygen concentration Culture volume Carbon dioxide concentration PH Temperature Generated heat

new understanding for the switch of the transition pattern. During the transition period, each transition pattern covers information on the m+1 directions of Pg, Pm , which are the common bases and s and Ps speciﬁc bases in the start phase Xm and the target phase Xm + 1. The between-phase common bases stay invariable, so the changes of transition pattern mainly reﬂect on the changes of phase-speciﬁc bases. Although there exit two types of phase-speciﬁc bases, the only one type of phase-speciﬁc bases play a dominant role in describing the transition pattern. The corresponding phase is deﬁned as the dominant phase in this paper. Compared with the bases of dominant phase, the bases of adjacent phase can be used to describe the information from a process disturbance. That is, when the transition pattern is more similar to the start phase, the start phase is dominant phase at current transition pattern, and the effect that the target phase plays is equal to the disturbance, which the variation caused by the target phase is under the normal control limit. Thus, the transition pattern can be described by the common bases and the speciﬁc bases of dominant phase. Yet the effect of speciﬁc bases is changeable as the transition progresses. When the transition sample is nearer to the target phase, the variation caused by the target phase will beyond the normal control limit and the switch of dominant phase occurs. Therefore, the key to construct the just-in-time model at each transition pattern is that determine which phases is the dominant phase at current transition pattern. That is to say, the current transition pattern is more similar to which phases. Then, the corresponding speciﬁc bases are chosen to construct the just-in-time model together with the common bases between two neighboring phases.

Fig. 2. Flow diagram of penicillin fermentation process.

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

79

0.5

Table 2 Illustration on introduced faults.

0.48 Fault no.

Fault description

Occurrence time (h)

Fault type

1 2 3 4

10% substrate feed rate 15% substrate feed rate 0.2% substrate feed rate -10% agitation power

90–400 40–200 100–400 60–400

step step slope step

RF

0.46 0.44 0.45 0.42

0.4 0.35

0.4

DF

m

DF

mþ1

0 0.38

0

50

0.04

100

10 150

20

200 Time(h)

30

40

250

50

300

350

400

350

400

0.05

0.03 0 0.02

DRF

In this paper, a discriminant factor (DF) is introduced as the guidance for online dominant phase identiﬁcation. Firstly, the k-nearest neighbor of each transition sample is found based on the diffusion distance. If the k-nearest neighbor samples are all from representative subset in the start phase, the discriminant factor (DF) of start phase is equal to 1, yet the target phase one is equal to 0. It illustrates that the current transition pattern is similar to the start phase. So the start phase is the dominant phase at the current transition pattern and the corresponding the speciﬁc bases with the common bases is used to construct the just-in-time model. Suppose the number of nearest neighbor for each transition sample is K. numm and numm + 1 are the number of samples among all the nearest neighbor samples from the start phase Xm and target phase Xm + 1, respectively. The DF of start phase and target phase, denoted as DF m and DF m + 1, respectively, are calculated by

-0.05 0.01

0

10

20

30

40

50

0 -0.01 -0.02

0

50

100

150

200 Time(h)

250

300

m

¼ num =K mþ1

¼ num

ð49Þ

=K

For the current transition sample, the corresponding just-in-time model Pt can be determined based on the DF as follows 8h i m m mþ1 > < Pg ; Ps ; if D F ¼ 1; DF ¼0 Pt ¼ h i m mþ1 > : Pg ; Pmþ1 ; if D F ¼ 0; DF ¼1 s

ð50Þ

It is noted that the value of DF is not continuous. There are two possible values of DF, 0 or 1. It is because the k-nearest neighbor can not include samples from two different steady phases at the same

Fig. 4. Phase division results based on Ge’s method.

time. For batch processes, samples in the same steady phase may have high similarity, yet ones in the different steady phase, even two neighboring phase, have less similarity than in the same steady phase. So when a transition pattern is similar to a certain steady phase, k-nearest neighbor must be selected from this phase. It has the priority to the other neighboring phase as soon as a suitable representative subset is selected. The representative subset is constructed by the sample in the nearest several time slices to the transition. The control limit can be determined according to the Eqs. (47) and (48). The ﬂowchart of process monitoring based on proposed TSFVS-KVCA can be given in Fig. 1. 30

0.5

Tc2

20 0.4

10

0.5

DR

0.3

0 0.2

0

0

50

100

0.05

150

30

200 Time (h)

40

250

50

300

50

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

20

350

400

Ts2

0 20

0.1

0

10

0

0.05

-0.05 20

10 30

40

50

0

SPE

DDR

0

5

0 -0.05

0

50

100

150

200 Time (h)

250

300

350

Fig. 3. Phase division results based on the proposed method.

400

Fig. 5. Process monitoring results based on the proposed TSFVS-KVCA method for the normal test batch.

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

40

10000

30

1000

T2

T2

80

20

100 10

10 0

0

0

50

100

150

200 Time (h)

250

300

350

400

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

100

SPE

SPE

50

1000

20

10

10 0

0

0

0

50

100

150

200 Time (h)

250

300

350

400

Fig. 8. Process monitoring results based on the sub-PCA method for fault 1.

Fig. 6. Process monitoring results based on the sub-PCA method for the normal test batch.

3. Simulations In this section, proposed method, including multiple phase division and process monitoring with transition, is applied to the penicillin fermentation process. Penicillin fermentation has been a well known multiphase batch process and the ﬂow diagram is shown in Fig. 2. Penicillin fermentation process has an important value no matter in academic or industry so that a series of problems on this production process has also been paid attention. In this paper, we focus on how to deal with nonlinear and multiphase problems so that an effective process monitoring for penicillin production process can achieve. The process simulator has been developed for the research on batch process monitoring and quality prediction. The simulator is open up based on a mechanistic model and can obtain from the website: http://www.chbe.iit.ebu/ ~cinar/ [39]. Two important variables PH and temperature are controlled based on the proportional, integral, and derivative (PID) controllers. In contrast, the substrate of glucose is continuously fed under open-loop control. Each batch contains 400h and the sampling interval is 1h. Assume all batches have the same length. The selected process variables are given in Table 1. In the simulation, 15 normal batches are

generated from the simulator with slight variations under normal operation conditions as the training dataset. One normal batch and four fault batches are used to test the performance of process monitoring based on the TSFVS-KVCA method. An illustration on introduced fault is given in Table 2. From the Table 2, we can see that different fault types, different process variables and different fault occurrence time are considered when faults are introduced. The aim is that we can demonstrate the superior performance of proposed TSFVS-KVCA process monitoring method in a more effective and more comprehensive way. First, multiple phase division is carried out based on the indexes DR and DDR. The corresponding results of phase division are shown in Fig. 3. The DR reﬂects the global degree of all the between-batches repeatability in each time slice. The DDR reﬂects the global differential degree of all the between-batches repeatability between two neighboring time slices. The repeatability between batches can be further understood their similarity. Due to the characteristic of multiphase batch process, which there are high similarities batch-to-batch and time slice-to-time slice in each steady phase, yet not in the transition, therefore, the DR in the steady phase should trend to be stable or smaller ﬂuctuations and is higher than DR in the transition. For DDR, the value 10000

10000 1000

Tc2

Tc2

1000 100

100 10

10

0

0

50

100

150

200 Time (h)

250

300

350

300

300

200

200

0 0

50

100

150

200 Time (h)

250

300

350

400

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

50

100

150

200 Time (h)

250

300

350

400

15

15

10

10

SPE

SPE

50

100

100 0

0

400

Ts2

Ts2

0

5 0

5 0

0

50

100

150

200 Time (h)

250

300

350

400

Fig. 7. Process monitoring results based on the proposed TSFVS-KVCA method for fault 1.

0

Fig. 9. Process monitoring results based on the proposed TSFVS-KVCA method for fault 2.

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83 10000

control limit switches from the start phase to the target phase as the transition switches. From Fig. 5, we can see that all the statistics are under the control limit. It illustrates the results of process monitoring consists with the actual situation. To compare with the performance of process monitoring, the sub-PCA method is also applied to monitor all the test batches. The results of process monitoring based on the subPCA method for the normal batch are given in Fig. 6. In Fig. 6, several T2 statistic values exceed the control limit about 30th hour. So it gives false alarm for the normal batch monitoring. The false alarm occurrence is due to the hard partition of sub-PCA method neglecting the transition so that no suitable model is provided for modeling and monitoring. However, no false alarm occurs in proposed TSFVS-KCA process monitoring. It demonstrates that the model based on the proposed method can effectively capture the process dynamic information no matter in the steady phase or transition. Therefore, the proposed process monitoring is more effective and reliable. Subsequently, the fault batches are introduced to further demonstrate the superior performance of process monitoring based on proposed method. For the ﬁrst fault, a 10% step increase is introduced to substrate feed rate from the 90th hour until the 400th hour. Process monitoring results based on the proposed method are given in Fig. 7. The comparison based on the sub-PCA method are given is Fig. 8. In Fig. 8, the statistic T 2 goes out of the control limit at the103th hour and moves around the control limit until the 127th hour, which process monitoring based on the statistic T2 has a 13 hours delay and 9.97% missing alarm rate. In addition, there are several false alarm points at the beginning of batch, which lead to a 8.99% false alarm rate. For SPE in Fig. 8, the ﬁrst alarm point is the 97 hour with a 7 hour delay and 3.86% missing alarm rate. However, the process monitoring for fault1, the proposed method give a better result in Fig. 7, which all the statistics can ﬁnd fault in time with no delay and false detection. For the second fault, a 15% step increase is introduced to substrate feed rate at the transition period, which is from the 40th hour to 200th hour. The aim of introducing fault2 is that measure the performance of fault monitoring at the transition. On the other hand, although the fault 2 ends at the 200th hour, the process underlying characteristic has been deteriorated so that it cannot return the normal state. Process monitoring results based on the proposed method and the Sub-PCA method are given in Figs. 9 and 10, respectively. From the Fig. 9, we can see that all the statistics are sensitive to the fault 2 so that it can report it in time with no delay. Furthermore, all the statistics based on the proposed method can give the effective monitoring until the batch ends since the statistics still keep outside control limit after the 200th hour. In contrast, the statistics based on sub-PCA method give the alarm with a 4 hour delay. It is noted that both statistics in Fig. 10 gradually fall after the 200th hour and even under the control limit, which lead to the high missing alarm rate. The details on the performance index of process monitoring are listed in Table 3. From the comparison between Figs. 9 and 10, we can see that the proposed method shows superior monitoring performance. For fault 3, a 0.2% slope increase is imposed on the substrate feed rate from the 100th hour to the 400th hour. As relative to the type of step

T2

1000 100 10 0

0

50

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

10000

SPE

1000 100 10 0

81

Fig. 10. Process monitoring results based on the sub-PCA method for fault 2.

should trend to be zero or smaller absolute value in the steady phase. Instead, there are bigger ﬂuctuations in the transition. From Fig. 3, it is not difﬁcult to see that the process includes two steady phases and one transition. It consists with the real process knowledge. The cut-off value for DR is 0.44 and for DDR is 0.005. The cut-off value is selected based on whether the relative bigger change occurs continuously. Based on the selected cut-off value, we obtain the steady phase1 from sample 1 to 29, the transition from sample 30 to 45, and the steady phase 2 from sample 46 to 400. Both indexes give consist results of phase division. To illustrate the performance of phase division, the simulation results of a comparison method proposed by Ge et al. are given in Fig. 4. In Ge’s method, the repeatability factor (RF) and the differential repeatability factor (DRF) are used for phase division. From Fig. 4, we can see that RF has a fall trend at the beginning of phase. So it provides the information that the process is switching from the steady phase to the transition. However, according to the process knowledge, it is impossible that the process enters the transition in such an early time interval. For DRF, there are several relative larger ﬂuctuations in earlier time interval. It is difﬁcult to determine the cutoff value for phase division. In addition, the RF and DRF cannot give consist division result. Those of problems are due to the pseudo-repeatability information are included in index RF and DRF. It degrades the performance of phase division. Therefore, Compared to Ge’s method, the phase division method based on the improved repeatability factor give a more accurate and more reliable result. After the phase division is achieved, the process monitoring based on the proposed TSFVS-KVCA is performed. The number of feature vectors for within-phase subset X1S, X2S and between-phase subset X12 S is 48, 175 and 175. Process monitoring results based on proposed TSFVS-KVCA for the normal test batch are given in Fig. 5. There are three statistics for process monitoring, including T2c that represents the information in the common space, T2s and SPE that represents the score and residual information in the speciﬁc space, respectively. The

Table 3 The performance indexes of process monitoring based on the sub-PCA method and the proposed method. Method

Sub-PCA

Statistics

T2

Proposed method

PerIndex

FA

MA

DT

FA

MA

DT

FA

MA

DT

FA

MA

DT

FA

MA

DT

Nor.test Fault1 Fault2 Fault3 Fault4

2 8.99 0 7.07 0

/ 9.97 14.96 2.99 18.77

/ 13 4 10 0

0 0 2.56 0 0

/ 3.86 13.02 2.66 37.24

/ 7 4 8 0

0 0 0 0 0

/ 0 0 0 0

/ 0 0 0 0

0 0 0 0 0

/ 0 0 0 0

/ 0 0 0 0

0 0 0 0 0

/ 0 0 0 0

/ 0 0 0 0

T2c

SPE

FA (%): false alarm rate; MA (%): missing alarm rate. DT (h): delay time; Nor.test: normal test batch; PreIndex: performance index.

Ts2

SPE

82

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

40

10000

Tc2

Tc2

1000 100

20

10 0

0

50

100

150

200 Time (h)

250

300

350

0

400

80

200

60

Ts2

Ts2

300

100

0

50

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

40 20

0

0

50

100

150

200 Time (h)

250

300

350

400

0

8

10

6

SPE

SPE

15

5 0

4 2

0

50

100

150

200 Time (h)

250

300

350

400

0

Fig. 11. Process monitoring results based on the proposed TSFVS-KVCA method for fault 3. Fig. 13. Process monitoring results based on the proposed TSFVS-KVCA method for fault 4.

fault, the proposed method still gives the satisfying results. Process monitoring results based on the proposed method and sub-PCA method are shown in Figs. 11 and 12, respectively. In Fig. 12, the ﬁrst alarm of T2 statistic is the 110th hour with a 10 hours delay. And there are some false alarms and missing alarms in monitoring with a 7.07% false alarm rate and 2.99% missing alarm rate, which can be listed in Table 3. The SPE statistic has an earlier alarm than the T2 statistic, but it has 8 hours delay and 2.66% missing alarm rate. Compared with the sub-PCA method, the statistics based on the proposed method make a fast respond once the fault occurs and there are no false or missing alarms. It can further demonstrate that the proposed method not only has the superior monitoring performance on the type of step fault, but also on the type of slope fault. Finally, the fault 4 is introduced by making a 10% step decrease on agitation power from the 60th hour to the end. Different from the above introduced faults, the fault 4 is generated from the abnormal change of another variable, which is agitation power. The Fig. 13 gives the results of process monitoring based on the proposed method. From the Fig. 13, we can see that all the statistics have a signiﬁcant rise once the fault is introduced and they keep the effective results until the batch ends. There are no delays and false detections. The

monitoring results based on the sub-PCA method are given in Fig. 14. Although there are no delays in both statistics, the ﬂuctuations of statistics around the control limit bring high missing alarm rate, especially in SPE statistic. The simulations above mentioned demonstrate that the proposed method has the superior monitoring performance on different fault types. 4. Conclusions In this paper, phase division and process monitoring for multiphase batch processes have been proposed. The proposed indexes DR and DDR can reﬂect the real repeatability of each time slice, which avoids the inaccurate division caused by pseudo repeatability information. Accordingly, these indexes have been successfully applied to improve the performance of phase division. Based on the characteristics of multiphase batch process, this paper proposed the TSFVS-KVCA method for process monitoring. This method solved several important problems. On the one hand, it has applied the kernel method to the multiphase batch processes. It is more important that a series of within-phase subsets and between-phase subsets are selected to substitute for original 10000 1000

1000

T2

T2

10000

100

100 10

10 0

0 0

50

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

0

50

100

150

200 Time (h)

250

300

350

400

100

10000

SPE

SPE

1000 100

10

10 0

0

50

100

150

200 Time (h)

250

300

350

Fig. 12. Process monitoring results based on the sub-PCA method for fault 3.

400

Fig. 14. Process monitoring results based on the sub-PCA method for fault 4.

X. Tang et al. / Chemometrics and Intelligent Laboratory Systems 145 (2015) 72–83

larger samples avoiding the computational complexity and the spectral decomposition instability of high dimensional kernel. On the other hand, the common and speciﬁc nonlinear correlations between two neighboring phases are considered and corresponding statistics in each subspace have been concluded. It is more worth to be highlighted that process monitoring for transitions has been proposed based on a new idea that transition pattern can be described by the common bases of two neighboring and the speciﬁc bases of dominant phase. Finally, the proposed method was applied to the Penicillin fermentation process comparing with the Sub-PCA method. A normal test batch and different types of fault have been introduced for process monitoring. The proposed method exhibited no delays and no false detections, which demonstrated its effectiveness and superiority. Conﬂict of interest The authors declared that they have no conﬂicts of interest to this work. Acknowledgements The authors would like to acknowledge the National Natural Science Foundation of China under Grant 61174119, 61034006, 60774070, Liaoning Province Foundation under Grant 2009R47, education department research project of Liaoning Province L2012139, and Liaoning Province doctoral start funds 20131089. References [1] K.A. Kosanovich, K.S. Dahl, M.J. Piovoso, Improved process understanding using multiway principal component analysis, Ind. Eng. Chem. Res. 35 (1996) 138–146. [2] C. Undey, I. Boz, E. Oztemel, A. Çinar, Statistical monitoring of multistage batch processes, Proc. AIChE Ann. Meet, 1999. [3] C. Ündey, E. Tatara, B.A. Williams, G. Birol, A. Çinar, A hybrid supervisory knowledgebased system for monitoring penicillin fermentation, Proc. Am. Contr. Conf. 3944–3948 (2000). [4] J.A. Westerhuis, P.M.J. Coenegracht, Multivariate modeling of the pharmaceutical two-step process of wet granulation and tableting with multiblock partial least squares, J. Chemometr. 11 (1997) 379–392. [5] B.M. Wise, N.B. Gallagher, S.W. Butler, D.D. White, G.G. Barna, A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process, J. Chemom. 13 (1999) 379–396. [6] P. Nomikos, Detection and diagnosis of abnormal batch operations based on multiway principal component analysis, ISA Trans. 35 (1996) 259–266. [7] P. Nomikos, J.F. MacGregor, Monitoring batch processes using multiway principal component analysis, Am. Inst. Chem. Eng. J. 40 (1994) 1361–1375. [8] P. Nomikos, J.F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1995) 41–59. [9] J.A. Westerhuis, T. Kourti, J.F. MacGregor, Comparing alternative approaches for multivariate statistical analysis of batch process data, J. Chemom. 13 (1999) 297–413. [10] E.N.M. van Sprang, H.-J. Ramaker, J.A. Westerhuis, S.P. Gurden, A.K. Smide, Critical evaluation of approaches for on-line batch process monitoring, Chem. Eng. Sci. 57 (2002) 3979–3991. [11] J. Chen, K.-C. Liu, On-line batch process monitoring using dynamic PCA and dynamic PLS models, Chem. Eng. Sci. 57 (2002) 63–75.

83

[12] C. Undey, S. Ertunc, A. Cinar, Online batch/fed-batch process performance monitoring, quality prediction, and variable-contribution analysis for diagnosis, Ind. Eng. Chem. Res. 42 (2003) 4645–4658. [13] T. Kourti, Multivariate dynamic data modeling for analysis and statistical process control of batch processes. Start-ups and grade transitions, J. Chemom. 17 (2003) 93–109. [14] H. Albazzaz, X.Z. Wang, Statistical process control charts for batch operations based on independent component analysis, Ind. Eng. Chem. Res. 43 (2004) 6731–6741. [15] N. He, S.Q. Wang, L. Xie, An improved adaptive multi-way principal component analysis for monitoring streptomycin fermentation process, Chin. J. Chem. Eng. 12 (2004) 96–101. [16] J.M. Lee, C.K. Yoo, I.B. Lee, Online batch process monitoring using a consecutively updated multi-way principal analysis model, J. Biotechnol. 108 (2004) 61–77. [17] J.M. Lee, C.K. Yoo, I.B. Lee, Fault detection of batch processes using multiway kernel principal component analysis, Comput. Chem. Eng. 28 (2004) 1837–1847. [18] J. Yu, S.J. Qin, Multiway Gaussian mixture model based multiphase batch process monitoring, Ind. Eng. Chem. Res. 48 (2009) 8585–8594. [19] N.Y. Lu, F.R. Gao, F.L. Wang, A sub-PCA modeling and online monitoring strategy for batch processes, AIChE J 50 (2004) 255–259. [20] N.Y. Lu, F.R. Gao, Stage-based process analysis and quality prediction for batch processes, Ind. Eng. Chem. Res. 44 (2005) 3547–3555. [21] J. Camacho, J. Pico, Online monitoring of batch processes using multiphase principal component analysis, J. Process Control 16 (2006) 1021–1035. [22] J. Camacho, J. Pico, Multi-phase principal component analysis for batch processes modeling, Chemom. Intell. Lab. Syst. 81 (2006) 127–136. [23] J. Camacho, J. Pico, A. Ferrer, Multi-phase analysis framework for handing batch process data, J. Chemom. 22 (2008) 632–643. [24] C.H. Zhao, F.L. Wang, N.Y. Lu, M.X. Jia, Stage-based soft transition multiple PCA modeling, J. Process Control 17 (2007) 728–741. [25] Y. Yao, F.R. Gao, Phase and transition based batch process modeling and online monitoring, J. Process Control 19 (2009) 816–826. [26] C.H. Zhao, F.R. Gao, Between-phase-based statistical analysis and modeling for transition monitoring in multiphase batch processes, AIChE J 00 (2011) 1–15. [27] C.H. Zhao, F.R. Gao, D.P. Niu, F.L. Wang, A two-step basis vector extraction strategy for multiset variable correlation analysis, Chemom. Intell. Lab. Syst. 107 (2011) 147–154. [28] Y.W. Zhang, J.Y. An, Z.M. Li, H. Wang, Modeling and monitoring for handling nonlinear dynamic processes, Inf. Sci. 235 (2013) 97–105. [29] Z.Q. Ge, L.P. Zhao, Y. Yao, Z.H. Song, F.R. Gao, Utilizing transition information in online quality prediction of multiphase batch processes, J. Process Control 22 (2012) 599–611. [30] Y.X. Huang, X.F. Zha, Jay Lee, C.L. Liu, Discriminant diffusion maps analysis: A robust manifold learner for dimensionality reduction and its applications in machine condition monitoring and fault diagnosis, Mech. Syst. Signal Process. 34 (2013) 277–297. [31] N. Christianini, J. Shawe, An introduction to support vector machines and other kernel-based learning methods, Cambridge university press, UK, 2000. [32] S. Lafon, A.B. Lee, Diffusion maps and coarse-graining: a uniﬁed framework for dimensionality reduction, graph partitioning, and data set parameterization, IEEE Trans. Pattern Anal. Mach. Intell. 28 (8) (2006) 1393–1430. [33] J.Z. Wang, Geometric structure of high-dimensional data and dimensionality reduction, Higher Education Press, 2012. [34] G. Baudat, F. Anouar, Feature vector selection and projection using kernels, Neurocomputing 55 (2003) 21–38. [35] B. Scholkopf, A.J. Smola, K. Muller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (5) (1998) 1299–1399. [36] S. Romdhani, S. Gong, A. Psarrou, A multi-view nonlinear active shape model using kernel PCA, Proceedings of BMVC, Nottingham, UK, 1999. 483–492. [37] S. Mika, B. Scholkopf, A.J. Smola, K.-R. Muller, M. Scholz, G. Ratsch, Kernel PCA and de-noising in feature spaces, Adv. Neural Inf. Process. Syst. 11 (1999) 536–542. [38] J.-M. Lee, C.K. Yoo, S.W. Choi, Peter A. Vanrolleghem, I.-B. Lee, Nonlinear process monitoring using kernel principal component analysis, Chem. Eng. Sci. 59 (2004) 223–234. [39] G. Birol, C. Undey, A. Cinar, A modular simulation package for fed-batch fermentation: penicillin production, Comput. Chem. Eng. 26 (2002) 1553–1565.

Phase division and process monitoring for multiphase batch processes with transitions

Phase division and process monitoring for multiphase batch processes with transitions

Recommend Documents