Fault detection of batch processes based on multivariate functional kernel principal component analysis Huangang Wang, Ma Yao PII: DOI: Reference:
S0169-7439(15)00243-9 doi: 10.1016/j.chemolab.2015.09.018 CHEMOM 3108
To appear in:
Chemometrics and Intelligent Laboratory Systems
Received date: Revised date: Accepted date:
11 November 2014 28 September 2015 29 September 2015
Please cite this article as: Huangang Wang, Ma Yao, Fault detection of batch processes based on multivariate functional kernel principal component analysis, Chemometrics and Intelligent Laboratory Systems (2015), doi: 10.1016/j.chemolab.2015.09.018
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
RI
Huangang Wang∗, Ma Yao
PT
Fault detection of batch processes based on multivariate functional kernel principal component analysis
NU
SC
Institute of Control Theory and Technology, Department of Automation, Tsinghua University, Beijing 100084, China
MA
Abstract
In some batch processes, the variables’ trajectories often show obviously functional nature and can be considered as smooth functions rather than just
D
vectors, then the collected three-way data array essentially can be trans-
TE
formed into a two-way function matrix. For this purpose, the approach of
AC CE P
functional data analysis (FDA) is introduced to express the variables’ functions in this article, which can highlight the subtle differences in the shape of variables’ trajectories between normal batches and faulty ones, and can also easily handle the problems of irregular three-way data array, such as unequal batch length, different sampling rates, missing data, etc. Based on the function matrix, a novel monitoring method called multivariate functional kernel principal component analysis (MFKPCA) is proposed for fault detection of batch processes, which directly eigen-decomposes the two-way function matrix in the function space and uses the kernel trick to handle the nonlinear correlations. Different from the traditional PCA method, MFKPCA deCorresponding author. Tel.: 86-10-62781993; Fax: 86-10-62786911. Email addresses:
[email protected] (Huangang Wang),
[email protected] (Ma Yao) ∗
Preprint submitted to Chemometrics and Intelligent Laboratory SystemsSeptember 28, 2015
ACCEPTED MANUSCRIPT
PT
signs three complementary control charts for batch process monitoring based on three statistics including Hotelling’s T 2 , squared prediction error (SPE)
RI
and mean squared error (MSE). Finally, some simulations and an industrial
SC
semiconductor etch process are used to illustrate the validity of the proposed monitoring method.
NU
Keywords: Batch process monitoring, Fault detection, Multivariate functional kernel principal component analysis, Functional data analysis
MA
1. Introduction
Batch process monitoring is particularly important to ensure the high
D
quality of the final products, such as pharmaceuticals, semiconductors, poly-
TE
mers and bio-chemicals, and hence has attracted many researchers’ attention. Among various monitoring methods, data-driven statistical process monitor-
AC CE P
ing (SPM) methods have been developed rapidly due to their flexibility [1]. The characteristic of batch process dataset is the special structure of the three-way data array (batches × variables × time), where each variable trajectory varying with time in a finite batch duration usually shows interesting nonlinear dynamic feature. This makes the batch process monitoring quite different from the traditional continuous process monitoring. For the regular three-way array Y(I × J × K) (i.e. each variable in each batch has the same sampling number K), parallel factor analysis (PARAFAC) and Tucker models can be used to perform the trilinear decomposition of the batch data [2–5]. Some traditional multivariate SPM methods have already been extended to the case of batch process monitoring based on the multiway processing approaches, such as multiway principal component analysis (MPCA) [6–8], mul2
ACCEPTED MANUSCRIPT
PT
tiway partial least squares (MPLS) [6, 9], multiway independent component analysis (MICA) [10], etc. Essentially, these multiway monitoring methods
RI
are equivalent to first unfolding the three-way array along some direction
SC
and then building the traditional multivariate SPM models based on the unfolded two-way data matrix. In order to handle the nonlinear correlations
NU
between different variables and/or the nonlinear correlations between different sampling times, the kernel trick has been introduced and some nonlinear
MA
monitoring methods have been proposed, such as multiway kernel PCA/ICA (MKPCA/MKICA), etc. [11–14]
However, these monitoring methods require that the collected batch dataset
D
should be a regular three-way array, which implies that each batch should
TE
have the same sampling duration and each variable should have the same sampling rate. While, in many actual batch processes, these constrains of-
AC CE P
ten cannot be satisfied due to the variability in the chemical composition of the raw material, stochastic disturbances, varying environmental conditions, etc.[15, 16] Some observations may be lost due to the malfunction of the sensors and/or the connectors. As a consequence, some data preprocessing procedures are needed to synchronize the raw batch data, such as using indicator variable technique (IVT) [17], dynamic time warping (DTW) [15, 18], and relaxed-greedy time warping (RGTW) [19]. Once the regular three-way array is obtained, the methods mentioned above can be performed to monitor the batch processes. From another viewpoint, Chen and Liu [20] used the concept of orthonormal function approximation to express each variable trajectory, and built the PCA model (called FSPCA in brief) based on the coefficient matrix derived
3
ACCEPTED MANUSCRIPT
PT
from the orthogonal function. FSPCA can handle the problem of varying length in batch runs, and reduce the dimension of each batch sample since
RI
the number of orthogonal functions is often much smaller than that of sam-
SC
pling times [20]. In fact, their processing idea is quite similar to the idea of functional data analysis (FDA) [21, 22], which has already attracted much
NU
attention in different research areas, such as statistics, economics, bioinformatics, chemometrics, etc. And many interesting results have been pub-
MA
lished, such as function data modeling and analysis [21–24], function data classification and clustering [25–29], novelty detection or outlier detection [30–32], etc. Compared with the multivariate data analysis (MDA) based
D
on the finite dimensional vectors, FDA emphasizes the smoothness of the
TE
trajectories and is very suitable for analyzing the non-stationary time series with possibly unequally spaced observations [33], which just coincides
AC CE P
with the situation of irregular three-way array in batch process monitoring. Therefore, the theory of FDA is introduced into batch process monitoring in this article which further extends the rationale of FSPCA. In summary, for batch process monitoring, using FDA can highlight the subtle differences in the variable trajectories between normal batches and faulty ones by removing the random noise, and hence can improve the fault detection performance. Based on the correspondence between the observations and the sampling times, FDA can uniformly handle the irregular three-way dataset, including the problems of unequal batch length, different sampling rates of variables, and missing data. Besides, in some particular situations, utilizing the derivatives or other transformations of the functions could further amplify the difference between normal batches and faulty ones.
4
ACCEPTED MANUSCRIPT
PT
For example, Rossi and Villa [26] used the second derivatives of the spectrometric data to improve the classification accuracy. Without the constraint
RI
of the orthonormal basis functions, some other flexible basis functions, e.g.
SC
B-splines, can be used to describe the variable trajectories. And then we will no longer just focus on the coefficient vectors of the orthonormal func-
NU
tions, but will make full use of the entire function expressions to build the monitoring model.
MA
Moreover, the correlation between different monitoring variables may be nonlinear, so the kernel trick is utilized to build the nonlinear PCA model for multivariate functional data, and the corresponding method is called multi-
D
variable functional kernel principal component analysis (MFKPCA). Since
TE
the selected function model for each variable only relies on the collected normal batch samples, the fitting errors may remain some variation information
AC CE P
about variable trajectories between normal batches and faulty ones. FSPCA ignores this information and only uses the T 2 and SPE statistics to detect the faults. In MFKPCA, we further utilize the statistic of mean squared error (MSE), which measures the difference between the fitting functions and the original observations, to complement those two statistics for fault detection. The rest of this article is organized as follows. First, based on the introduction of functional data representation, the rationale of the MFKPCA method is elaborated in Section 2. By defining the T 2 , SPE and MSE statistics, the monitoring procedures of MFKPCA-based batch process monitoring method are listed in Section 3. In Section 4, different methods are compared via several simulations in order to illustrate the potential applications of MFKPCA. And after that, Section 5 utilizes an industrial semiconductor
5
ACCEPTED MANUSCRIPT
6 B> >@ CA ? &
$% "# !
'()*(+,-. Z[\
8
5
NU
6
/*0-
=:< ;
YY Y
9 P O MQ N ಹ W U RV SX T KL 8 6 '()*(+,-. Z]\
MA
7
42 3 1
!
SC
$% "#
YY Y
RI
&
I DH J E FG
PT
5
ಹ
Figure 1: Illustration of functional data representation of batch process data: (a) irregular
D
three-way array, and (b) the corresponding function matrix.
TE
etch process to further verify the actual monitoring performance of the pro-
AC CE P
posed method. Finally, some conclusions are drawn in Section 6. 2. Multivariate functional kernel principal component analysis 2.1. Functional data representation In batch processes, the k-th observation of variable j in batch run i is denoted by yi,j,k , and its corresponding sampling time is denoted by ti,j,k , where i = 1, ..., I, j = 1, ..., J, k = 1, ..., Ki,j . Ki,j may vary with both batch index i and variable index j just as Fig. 1(a) shows. Accordingly, the sampling sequence of variable j in batch run i corresponds to a Ki,j -dimensional column vector yi,j = [yi,j,1, ..., yi,j,Ki,j ]T , and the vector of sampling times ti,j = [ti,j,1, ..., ti,j,Ki,j ]T . Here, the observations yi,j,k satisfy the following model, yi,j,k = xi,j (ti,j,k ) + εi,j,k
6
(1)
ACCEPTED MANUSCRIPT
PT
where the function xi,j (t) lies in the Hilbert space L2 (T ) of squared integrable functions for a compact domain T . The measurement errors εi,j,k are
RI
independently and normally distributed with mean E[εi,j,k ] = 0 and variance
SC
var[εi,j,k ] = σj2 . Based on Eq. (1), the i-th batch sample is transformed into a J-dimensional vector-valued function xi (t) = [xi,1 (t), ..., xi,J (t)]T (called
NU
multivariate functional data), and the entire training dataset becomes an Fig. 1(b)),
· · · x1,J (t) .. .. . . · · · xI,J (t)
MA
I × J-dimensional function matrix (see x (t) 1,1 .. X(t) = . xI,1 (t)
(2)
D
In order to get the functions in Eq. (2), the basis function system can be
TE
used, such as the Fourier basis functions, B-splines, wavelets, etc. [21] Sup-
then
AC CE P
pose that the basis used for variable j is denoted by {ϕj,d(t), d = 1, ..., Dj },
xi,j (t) =
Dj X
ci,j,d ϕj,d(t) = cTi,j ϕj (t)
(3)
d=1
where the coefficient vector ci,j = [ci,j,1, ..., ci,j,Dj ]T , and the vector of basis functions ϕj (t) = [ϕj,1(t), ..., ϕj,Dj (t)]T . The least squares can be used to estimate the coefficient vector ci,j , min ci,j
K i,j P
[yi,j,k − xi,j (ti,j,k )]2
(4)
k=1 T
⇒ min (yi,j − Ψi,j ci,j ) (yi,j − Ψi,j ci,j ) ci,j
where the matrix
Ψi,j
=
··· .. .
ϕj,Dj (ti,j,1) .. . ϕj,1 (ti,j,Ki,j ) · · · ϕj,Dj (ti,j,Ki,j ) ϕj,1 (ti,j,1) .. .
Ki,j ×Dj
7
(5)
ACCEPTED MANUSCRIPT
(6)
RI
ˆ ci,j = (ΨTi,j Ψi,j )−1 ΨTi,j yi,j
PT
Solving the least squares problem (4) gets
dxi,j (t) dn−1 xi,j (t) dn xi,j (t) + · · · + wn−1 + dt dtn−1 dtn
(7)
MA
L (xi,j (t)) = w0 + w1
NU
is a linear differential operator defined as
SC
For controlling the smoothness of the functions, a roughness penalty R pen(xi,j (t)) = T [L (xi,j (t))]2 dt is often added in Eq. (4) [21], where L(·)
Then the optimization problem becomes [21] min
[yi,j,k − xi,j (ti,j,k )]2 + η × pen(xi,j (t))
k=1
D
ci,j
K i,j P
T
ci,j
R
T
L (ϕj (t)) L(ϕj (t))T dt is a Dj × Dj symmetric matrix and can
AC CE P
where Rj =
TE
⇒ min (yi,j − Ψi,j ci,j ) (yi,j − Ψi,j ci,j ) +
(8)
ηcTi,j Rj ci,j
be calculated by a numerical quadrature scheme. The smoothing parameter η is a trade-off between the error of fitting and the smoothness of functions, which can be chosen by using the generalized cross-validation approach [21, 34]. Then the estimated coefficient vector is ˆ ci,j = (ΨTi,j Ψi,j + ηRj )−1 ΨTi,j yi,j
(9)
And the corresponding fitting function is xˆi,j (t) = ˆ cTi,j ϕj (t)
(10)
2.2. Functional data centering and scaling In actual batch processes, different variables may have different physical units and their data may have different scales, so before building the 8
ACCEPTED MANUSCRIPT
PT
MFKPCA model the batch data should be centered and scaled. For the
I
(11)
SC
1X µ ˆj (t)= xˆi,j (t) I i=1
RI
fitting functions xˆi,j (t), the mean function of variable j is
(12)
MA
NU
and the scaling parameter can be calculated as v u I u1 X 1 Z ˆ δj = t [ˆ xi,j (t) − µ ˆ j (t)]2 dt I i=1 C(T ) T
where C(T ) represents the length of the domain T . Then the fitting functions can be normalized as follows,
(13)
TE
D
x˜i,j (t) = (ˆ xi,j (t) − µ ˆ j (t)) /δˆj = ˜ cTi,j ϕj (t)
P where ˜ ci,j = ˆ ci,j − I1 Im=1 ˆ cm,j /δˆj . Accordingly, the i-th batch sample
AC CE P
is denoted by x ˜i (t) = [˜ xi,1 (t), ..., x˜i,J (t)]T , and the entire training dataset ˜ is denoted by X(t) = [˜ x1 (t), ..., x ˜I (t)]T . Similarly, the original observations
yi,j,k can also be processed as y˜i,j,k = (yi,j,k − µ ˆj (ti,j,k )) /δˆj
(14)
in order to be compared with the sampling values x˜i,j (ti,j,k ) of the fitting functions in the similar manner. 2.3. Algorithm of MFKPCA In traditional PCA, the connection between the covariance matrix and the kernel matrix (or the inner-product matrix) is the foundation of the KPCA algorithm [35]. For batch processes whose samples are vector-valued functions (or multivariate functional data), this connection still remains. Considering 9
ACCEPTED MANUSCRIPT
PT
the case of using the linear kernel firstly, the covariance function matrix of ˜ the normalized training dataset X(t) is I
(15)
SC
RI
1X ˜ t) = 1 X(s) ˜ T X(t) ˜ V(s, = x ˜i (s)˜ xi (t)T I I i=1
which is a J × J matrix changing with the time variables s, t ∈ T . The
NU
corresponding eigenvalue problem is defined as [21, 29] Z ˜ t)ξp (t)dt = λp ξp (s) V(s, T
MA
where the eigenvalues λp are non-increasing, i.e.
(16)
λ1 ≥ λ2 ≥ · · · , and
D
the eigenfunctions ξp (t) = [ξp,1(t), ..., ξp,J (t)]T satisfy that hξp (t), ξq (t)i = R ξ (t)T ξq (t)dt = 1 if p = q and 0 otherwise. T p
TE
Similar to the PCA algorithm [35], each eigenfunction lies in the subspace
spanned by the training dataset, which implies that there is an I-dimensional
AC CE P
vector αp = [αp,1 , ..., αp,I ]T ∈ RI satisfying the following equation, ˜ T αp = ξp (t) = X(t)
I X
αp,i x ˜i (t)
(17)
i=1
Then the eigen-problem (16) becomes R ˜ t)ξp (t)dt = λp ξp (s) V(s, T R ˜ T ˜ X(t) ˜ T dtαp = Iλp X(s) ˜ T αp ⇒ X(s) X(t) T
(18)
˜ T Gα ˜ p = Iλp X(s) ˜ T αp ⇒ X(s)
where the inner-product matrix R ˜ = ˜ X(t) ˜ T dt G X(t) T h˜ x1 (t), x ˜1 (t)i · · · h˜ x1 (t), x ˜I (t)i .. .. .. = . . . h˜ xI (t)), x ˜1 (t)i · · · h˜ xI (t), x ˜I (t)i 10
(19)
I×I
ACCEPTED MANUSCRIPT
PT
J R J P P ˜ i,m = h˜ Its elements G xi (t), x ˜m (t)i = x˜ (t)˜ xm,j (t)dt = ˜ cTi,j R0j ˜ cm,j , T i,j j=1 j=1 R i, m = 1, ..., I, where R0j = T ϕj (t)ϕj (t)T dt (j = 1, ..., J) is a Dj × Dj
RI
symmetric matrix and can be calculated by a numerical quadrature scheme.
SC
Since the R0j is constant for different batches, it can be calculated and stored in advance for reducing the redundant calculation. Since the last row in
NU
Eq. (18) must hold for ∀s ∈ T , it implies that
(20)
MA
˜ p = Iλp αp Gα
i.e. the vector αp is proportional to the eigenvector of the inner-product
D
matrix. Considering the constraint hξp (t), ξp (t)i = 1, the L2 -norm of αp
TE
must satisfy ||αp ||2 = 1/(Iλp ).
As a consequence, we can use the decomposition of the inner-product
AC CE P
matrix to build the PCA model for multivariate functional data (or vectorvalued functions). By introducing the kernel trick, each batch sample (vectorvalued function) is mapped from the original space Ω to the feature space ℑ by the nonlinear transformation function φ(·) : x ˜i (t) 7→ φ(˜ xi (t)), and the inner product of two batch samples in the feature space can be expressed by the kernel function k (˜ xi (t), x ˜m (t)) = hφ(˜ xi (t)), φ(˜ xm (t))i (i, m = 1, ..., I). For example, the widely used radial basis kernel has the following form, ! k (˜ xi (t), x ˜m (t)) = exp −||˜ xi (t) − x ˜m (t)||2 /S 2
(21)
where the kernel width parameter S is prespecified by the user, and the
11
ACCEPTED MANUSCRIPT
PT
squared L2 -norm is J R P
(˜ xi,j (t) − x˜m,j (t))2 dt j=1 R J P = (˜ ci,j − ˜ cm,j )T T ϕj (t)ϕj (t)T dt (˜ ci,j − ˜ cm,j )
=
j=1
RI
j=1 J P
T
(22)
SC
||˜ xi (t) − x ˜m (t)||2 =
(˜ ci,j − ˜ cm,j )T R0j (˜ ci,j − ˜ cm,j )
NU
Based on the kernel function, the corresponding kernel matrix can be
D
MA
calculated conveniently as follows: k (˜ x1 (t), x ˜1 (t)) · · · k (˜ x1 (t), x ˜I (t)) .. .. .. K= . . . k (˜ xI (t), x ˜1 (t)) · · · k (˜ xI (t), x ˜I (t))
(23) I×I
TE
Centering the batch samples φ(˜ xi (t)) in the feature space ℑ gets I
AC CE P
1X ˜ xi (t)) = φ(˜ φ(˜ xi (t)) − φ(˜ xu (t)) I u=1
and the corresponding centered kernel function is D E ˜ ˜ ˜ k (˜ xi (t), x ˜m (t)) = φ(˜ xi (t)), φ(˜ xm (t)) I I P P = k (˜ xi (t), x ˜m (t)) − I1 k (˜ xi (t), x ˜u (t)) − 1I k (˜ xv (t), x ˜m (t)) + I12
I P
u=1
v=1
(24)
(25)
k (˜ xu (t), x ˜v (t))
u,v=1
Then the kernel matrix becomes [35] ˜ = K − 1I×I K − K1I×I + 1I×I K1I×I K (26) 1 ··· 1 .. . 1 .. . where 1I×I = I . . . . Furthermore, in order to carry out the 1 ··· 1 I×I variance scaling of the kernel functions, each kernel element can be divided 12
ACCEPTED MANUSCRIPT
PT
. . ˜ ˜ ← (I − 1) K ˜ trace K ˜ , which by the constant trace K (I − 1), i.e. K
RI
˜ xi (t)) in the feature space is divided by implies that each batch sample φ(˜ r . ˜ trace K (I − 1). For brevity, the notations after variance scaling re-
SC
main the same. Often, the variance scaling can adjust the numerical precision and the magnitude of the SPE statistic. Analogous to Eq. (20), the kernel
NU
˜ is decomposed as matrix K
(27)
MA
˜ p = Iλp αp Kα
where ||αp ||2 = 1/(Iλp ) and λ1 ≥ λ2 ≥ · · · . Then the eigenfunction (also
D
denoted by ξp (t) ∈ ℑ) in the feature space satisfies
TE
ξp (t) =
I X
˜ xi (t)) αp,i φ(˜
(28)
i=1
AC CE P
which is a linear combination of all training samples in the feature space. Based on the eigenfunctions ξp (t), the feature space is divided into a principal component subspace and a residual subspace, and subsequently several statistics and their control charts will be designed for fault detection of batch processes.
3. MFKPCA for fault detection of batch processes 3.1. Control charts For clarity, here the raw sampling sequence of an arbitrary batch sample is denoted by ynew,j = [ynew,j,1, ..., ynew,j,Knew,j ]T and it is denoted by y ˜new,j = [˜ ynew,j,1, ..., y˜new,j,Knew,j ]T after centering and scaling, where j = 1, ..., J. The vector of the corresponding sampling times tnew,j = [tnew,j,1, ..., tnew,j,Knew,j ]T . Using the same approach to express the functions (Eq. (10)) and normalizing 13
ACCEPTED MANUSCRIPT
PT
them according to Eq. (13), this batch sample is represented as a vector˜ xnew (t)) onto the eigenfunctions ξp (t) are φ(˜
RI
valued function x ˜new (t) = [˜ xnew,1 (t), ..., x˜new,J (t)]T . Then the projections of
(29)
NU
SC
D E D E I ˜ xnew (t)), ξp (t) = P αp,i φ(˜ ˜ xnew (t)), φ(˜ ˜ xi (t)) γnew,p = φ(˜ i=1 I I P P 1 = αp,i k (˜ xnew (t), x ˜i (t)) − I k (˜ xnew (t), x ˜m (t)) m=1 i=1 # I I P P 1 1 −I k (˜ xm (t), x ˜i (t)) + I 2 k (˜ xm (t), x ˜m′ (t)) m,m′ =1
MA
m=1
If the first P principal components (PCs) are retained, the score vector γnew = [γnew,1, ..., γnew,P ]T . Like the traditional KPCA model, Hotelling’s
D
T 2 statistic is used to measure the variation of the batch sample (the fitting
TE
function) within the MFKPCA model [36, 37],
AC CE P
T T 2 = γnew Λ−1 γnew
(30)
where Λ−1 is the diagonal matrix of the inverse of the first P eigenvalues. Assuming that the score vectors follow a multivariate normal distribution, the control limit can be obtained from the F -distribution [7], Tα2 ∼
P (I 2 − 1) FP, I−P, α I(I − P )
(31)
where I is the number of the training samples, P is the number of PCs to retain, and α is the significant level. The squared prediction error (SPE) statistic is used to measure the goodness of fit of the batch sample (the fitting function) to the MFKPCA model, which is defined as ˜ xnew (t)) − φ˜P (˜ SP E = ||φ(˜ xnew (t))||2 14
(32)
ACCEPTED MANUSCRIPT
PT
P P where φ˜P (˜ xnew (t)) = γnew,pξp (t) is the reconstructed batch sample with p=1
the first P PCs in the feature space. Based on the fact that hξp (t), ξq (t)i = 1
RI
if p = q and 0 otherwise, the SPE statistic can be calculated as
SC
˜ xnew (t)) − φ˜P (˜ SP E = ||φ(˜ xnew (t))||2 D E P ˜ xnew (t)), φ(˜ ˜ xnew (t)) − P γ 2 = φ(˜ new,p p=1
NU
= k (˜ xnew (t), x ˜new (t)) − I P
2 I
k (˜ xnew (t), x ˜i (t))
i=1 P P
k (˜ xi (t), x ˜m (t)) −
MA
+ I12
I P
i,m=1
p=1
(33)
2 γnew,p
Assuming that the reconstructed errors follow a multivariate normal distri-
D
bution, its control limit can be obtained from a χ2 -distribution [7, 38],
TE
SP Eα ∼ gSP E · χ2hSP E , α
(34)
AC CE P
where gSP E = vSP E / (2 · mSP E ), hSP E = 2 · m2SP E /vSP E , mSP E and vSP E are the estimated mean and variance of the SPE statistic from the training batch samples respectively.
Different from the traditional KPCA model, in MFKPCA, there is also some variation information between the original observations y ˜new,j and the fitting functions x˜new,j (t) that could be utilized for detecting faulty batches. On one hand, if the fitting function reflects the true shape of the variable trajectory, the fitting errors correspond to the measurement noise or system noise. The increase of the measurement noise means that the system is influenced by some instable disturbances. On the other hand, the fitting model for each variable trajectory usually just relies on the training dataset whose samples are mostly in-control, so whether the fitting model is fit for the unknown faulty batches or not needs further consideration. It relies on 15
ACCEPTED MANUSCRIPT
PT
the complexity of the fitting model. In some situations, the fitting errors may contain some useful variation information about the variable trajectories for
RI
distinguishing faulty batches from normal ones, then just using the T 2 and
SC
SPE statistics may not be sufficient. Therefore, the mean squared error (MSE) statistic is used to further measure the fitting errors between the
NU
fitting functions and the original observations,
(35)
MA
Knew,j J X 1X 1 MSE = [˜ xnew,j (tnew,j,k ) − y˜new,j,k ]2 J j=1 Knew,j k=1
Here we use the mean of the fitting errors because of the varying number Knew,j of the sampling times. The control limit of MSE can also be estimated
TE
D
from a χ2 -distribution similarly [7, 38], (36)
AC CE P
MSEα ∼ gM SE · χ2hM SE , α
where gM SE = vM SE / (2 · mM SE ), hM SE = 2 · m2M SE /vM SE , mM SE and vM SE are the mean and variance of the MSE statistic estimated from the training dataset.
As a consequence, in MFKPCA, three statistics (T 2 , SPE and MSE) are used to monitor the batch processes and detect the unknown faulty batches. The experimental results below will show that these three statistics complement each other and are all useful for fault detection in general. In addition, in order to reduce the number of control charts, these statistics can be further synthesized together into one monitoring index by using the joint probability density function [39] or the linear combination [40]. Nevertheless, in this article we mainly use them separately to analyze and compare their individual performance in detail. 16
ACCEPTED MANUSCRIPT
PT
3.2. Monitoring procedures For batch process monitoring, the historical dataset is used to build the
RI
MFKPCA model and to estimate the control limits of three statistics, based
SC
on which a new batch sample is predicted as a faulty one or not. Here, we mainly focus on the problem of end-of-batch fault detection, and the
(i) Off-line modeling procedures
NU
corresponding monitoring procedures are summarized as follows.
MA
(1) Express the batch samples in the training dataset as the vectorvalued functions x ˆi (t) (Eq. (10)), i = 1, ..., I.
D
(2) Calculate the mean functions µ ˆ j (t) (Eq. (11), j = 1, ..., J) and the
TE
scaling parameters δˆj (Eq. (12), j = 1, ..., J), based on which the functions and the observations are centered and scaled by using
AC CE P
Eqs. (13) and (14) respectively.
(3) Calculate the kernel matrix K (Eq. (23)) and carry out the mean ˜ centering in the feature space to get the centered kernel matrix K
(Eq. (26)).
˜ p = Iλp αp , where ||αp ||2 = (4) Solve the eigenvalue problem Kα 1/(Iλp ) and λ1 ≥ λ2 ≥ · · · , to retain the first P principal components, and then calculate the score vector γi for each batch (Eq. (29)). (5) Calculate three statistics’ values Ti2 , SP Ei and MSEi (i = 1, ..., I) for all training batch samples (Eqs. (30), (33) and (35)), and estimate their corresponding control limits (Eqs. (31), (34) and (36)) for monitoring. 17
ACCEPTED MANUSCRIPT
PT
(ii) Fault detection procedures (1) After getting a new whole batch sample ynew,j , j = 1, ..., J, express
RI
it as a vector-valued function x ˆnew (t) (Eq. (10)) in the similar
SC
manner.
(2) Carry out the functional data centering and scaling to get the
NU
vector-valued function x ˜new (t) (Eq. (13)).
(3) Calculate the kernel function k (˜ xnew (t), x ˜i (t)) between the new
MA
batch and each training batch, and calculate the score vector γnew (Eq. (29)).
D
2 (4) Based on Eqs. (30), (33) and (35), calculate three statistics Tnew ,
TE
SP Enew and MSEnew of the new batch.
AC CE P
2 (5) Monitor whether Tnew , SP Enew or MSEnew exceeds its control
limit.
4. Simulated case studies This section uses several simulation examples to illustrate the potential applications of the MFKPCA method. Here, each batch sample is constructed as follows, yi,1,k = ai × ti,1,k + εi,1,k yi,2,k = ai × t2i,2,k + εi,2,k
(37)
yi,3,k = bi × [4 sin(ti,3,k ) + 0.5 sin(ω0 ti,3,k )] + εi,3,k where each batch has three monitoring variables, i.e. j = 1, 2, 3, whose trajectories are linear, quadratic and sine-shaped respectively. The yi,j,k ’s 18
ACCEPTED MANUSCRIPT
PT
are observations, where i = 1, ..., I, j = 1, 2, 3 and k = 1, ..., Ki,j . The sampling times ti,j,k ∈ [0, ti,max ] with sampling interval 0.01, and the ti,max ’s
RI
are randomly sampled from the uniform distribution on the interval [0.9, 1.1].
SC
The εi,j,k ’s are independently and identically distributed (i.i.d.) Gaussian noise with mean 0 and variance σε2 , i.e. εi,j,k ∼N (0, σε2 ). The coefficient
NU
vectors [ai , bi ]T and the parameter ω0 are adjustable to obtain several different cases.
MA
Since the batch length is changing, the traditional MPCA [7] and MKPCA [11] methods couldn’t be used directly without preprocessing the irregular three-way array. Here, the DTW approach developed by Kassidas et al. [15]
D
is utilized to equalize the batch length and synchronize the variable trajecto-
TE
ries. In our experiments, the width of the band global constraint is chosen as 15, and an arbitrary batch sample that has 100 sampling times in the train-
AC CE P
ing dataset is chosen as the initial referenced batch. Then twenty iterations are repeated to train the DTW model, i.e. to get the final referenced batch trajectory and the weight matrix. For the FSPCA [20] and MFKPCA methods, the B-splines are used here to construct the functions for all variables’ trajectories. The parameters are chosen via the generalized cross-validation approach, which uses a twicediscounted mean squared error measure as the parameter selection criterion [21, 34]. The corresponding results are as follows. Variable 1 uses 3 order 2 B-splines and chooses the integrated squared functions themselves as the roughness with the smoothing parameter η = 10−2 . Variable 2 uses 5 order 4 B-splines and chooses the integrated squared second derivatives as the roughness with η = 10−2 . And variable 3 uses 15 order 6 B-splines and
19
ACCEPTED MANUSCRIPT
PT
3
14 3
2.5 2
10 Variable 3
1.5 1
1
SC
1.5
RI
2 Variable 2
Variable 1
12
2.5
0.5
0.5
−0.5 0.5 Time
1
0
0.5 Time
1
6 4 2 0
0
0.5 Time
1
MA
0
NU
0 0
8
Figure 2: Original observations (gray points) and fitting functions (blue dashed lines) of
D
ten batches in case A.
chooses the integrated squared fourth derivatives as the roughness with η =
TE
10−7 . Fig. 2 shows three variables’ sampling points (gray points) of ten
AC CE P
batches and their corresponding fitting functions (blue dashed lines) in case A, and the results show that the fitting curves can fit the original trajectories well. In each case, 800 normal batches are used as the training dataset to build the models, and 200 normal batches and 200 faulty ones form the testing dataset to test their monitoring performance. 4.1. Case A: variations in variable trajectories The coefficient vectors [ai , bi ]T of both normal batches and faulty batches follow the bivariate Gaussian distribution with mean vector [2.5, 2.5]T and covariance matrix diag {2.5, 2.5}. Their measurement errors εi,j,k are all i.i.d. Gaussian noise with mean 0 and variance 0.04, i.e. σε = 0.2. The main difference between normal batches and faulty ones is in the shape of the trajectory of variable 3. In normal batches, the parameter ω0 = 10 (see 20
ACCEPTED MANUSCRIPT
100
80 T2
40
40
20
20 50
100
150
200 Samples
250
300
350
0
400
300
1
250
0.8
200 150
0.6 0.4
0
50
100
150
200 Samples
250
300
350
0.2
400
50
100
0
50
150
200 Samples
250
300
350
400
100
150
200 Samples
250
300
350
400
NU
100
0
RI
0
SPE
SPE
0
60
SC
T2
60
PT
100 Testing: normal Testing: faulty Control limit
80
(a)
(b)
T2
40
T2
20
10
50
100
150
200 Samples
250
SPE
10
5
50
100
150
200 Samples
250
AC CE P
0
350
TE
15
0
300
400
300
350
20
0
0
50
100
150
200 Samples
250
300
350
400
0
50
100
150
200 Samples
250
300
350
400
0
50
100
150
200 Samples
250
300
350
400
0.2
0.1
0
0.5 MSE
0
D
0
SPE
MA
30
0.4 0.3 0.2
400
(d)
(c)
Figure 3: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with P = 10, (b) MKPCA with P = 10 and S = 103 , (c) FSPCA with P = 7, and (d) MFKPCA with P = 5 and S = 103 , in case A.
Eq. (37)), while ω0 = 11 in faulty batches. The observations of ten normal batches are presented as the gray points in Fig. 2, and their fitting functions are presented as the blue dashed lines. Based on the collected dataset, four methods including MPCA, MKPCA, FSPCA and MFKPCA are used to monitor the batch process respectively. For getting good monitoring performance, all these methods need to specify some parameters. MPCA and FSPCA need to select the number of PCs P 21
ACCEPTED MANUSCRIPT
PT
to retain, and MKPCA and MFKPCA need to select both the number of PCs P and the kernel parameter. Here, the radial basis kernel (Eq. (21))
RI
is mainly used due to its flexibility and hence the kernel parameter S needs
SC
to be specified. In this article, we compare the monitoring performance
NU
of four methods with different specified parameters. The number of PCs P ∈ {2, 3, ..., 15} and the kernel parameter S ∈ 10−3 , 10−2 , ..., 103 , then
the best monitoring results of them are shown in Fig. 3. For adjusting the
MA
magnitude of the SPE statistic, each value of the kernel function is scaled by ˜ /(I − 1) [11]. dividing it by the constant trace K
The control charts clearly show that the fault detection performance of
D
FSPCA and MFKPCA is superior to that of MPCA and MKPCA, and
TE
MFKPCA obtains the similar results with FSPCA in this case. Based on analyzing the dataset, the only difference between normal batches and faulty
AC CE P
ones relies on the shape of the trajectory of the third variable, so the random noise may flood this variation information. However, using FDA to construct the functions can reduce the bad influence of the noise to some extent. Besides, in MPCA and MKPCA, using the DTW approach to synchronize the trajectories may distort the fault patterns and reduce the difference between normal and faulty batches [41]. It is mainly because all the batch trajectories are synchronized according to the referenced batch trajectory which is trained just based on the normal batches. After synchronization, all the batches including the faulty batches will tend to have the similar trajectory shape with the referenced batch. Therefore, in this case study, the MPCA and MKPCA methods couldn’t detect out most of the faulty batches. On the contrary, the FSPCA and MFKPCA methods often do not have such
22
ACCEPTED MANUSCRIPT
60
20
Testing: normal Testing: faulty Control limit
0
0
50
100
T2
40
50
100
150
200 Samples
250
300
350
400
0.04 0.02 0
0.8 1.5 MSE
0.4 0.2 0
1 0.5 0
0
50
100
150
200 Samples
250
300
350
200 Samples
250
300
350
400
50
0
50
100
150
200 Samples
250
300
350
400
100
150
200 Samples
250
300
350
400
(b)
MA
(a)
400
0
NU
SPE
0.6
SC
0
SPE
0
150
RI
0.06
20
PT
T2
40
Figure 4: Control charts with 95% confidence limits (black dashed lines) by using (a)
D
FSPCA with P = 10, and (b) MFKPCA with P = 5 and S = 103 , in case B.
TE
distortion problem by using the FDA approach, so their control charts show much better detecting results and most of the faulty batches can be detected
AC CE P
correctly. Since there are only linear correlations, the proposed MFKPCA method presents the similar performance to FSPCA. In fact, the MFKPCA method tends to choose a large kernel width parameter S in order to mimic a linear monitoring model. Since the random noise doesn’t contain any variation information, the MSE statistic doesn’t work in this case. 4.2. Case B: variations in Gaussian noise In this case, the coefficient vectors [ai , bi ]T in all batches still follow the bivariate Gaussian distribution with mean vector [2.5, 2.5]T and covariance matrix diag {2.5, 2.5}, and the parameter ω0 = 10 for both normal batches and faulty ones. The standard deviation σε of Gaussian noise in normal batches is still equal to 0.2, however σε in faulty batches increases to 0.3 in this case. The control charts for FSPCA and MFKPCA are shown in 23
ACCEPTED MANUSCRIPT
4.5
PT
Normal batches Faulty batches 4
RI
3
SC
b
i
3.5
2.5
1.5 0.5
1
1.5
2
NU
2
2.5
3
3.5
4
4.5
5
a
i
MA
Figure 5: Distribution of the coefficient vectors [ai , bi ]T in case C.
Fig. 4. FSPCA performs not well and can’t detect most of the faults, since
D
the variation information between normal and faulty batches is filtered out
TE
after fitting the functions. However, based on the MSE statistic, MFKPCA can still accurately detect most of the faulty batches. Thus the MSE statistic
AC CE P
may complement the T 2 and SPE statistics well for fault detection. In summary, the whole information about the batch process in MFKPCA is divided into three parts: the T 2 statistic is used to measure the variation of the fitting function within the MFKPCA model, the SPE statistic is used to measure the goodness of fit of the fitting function to the MFKPCA model, and the MSE statistic is used to measure the fitting errors between the fitting function and the original observations. These three statistics complement each other in batch process monitoring. Here it is a relatively special case where the measurement noise itself contains some useful variation information.
24
ACCEPTED MANUSCRIPT
20 Testing: normal Testing: faulty Control limit T2
15
T2
40
20
5
50
100
150
200 Samples
250
300
350
0
400
300
50
1 SPE
SPE
200
0.95
100
0.9
0
50
100
150
200 Samples
250
300
350
0.85
400
100
0
50
150
200 Samples
250
300
350
400
100
150
200 Samples
250
300
350
400
NU
0
0
1.05
RI
0
SC
0
10
PT
60
(a)
(b)
15
T2
10
10 5
0
50
100
150
200 Samples
250
2
1 0.5
0
50
100
150
200 Samples
250
AC CE P
0
350
TE
SPE
1.5
300
400
D
0
300
350
5 0
0
50
100
150
200 Samples
250
300
350
400
0
50
100
150
200 Samples
250
300
350
400
0
50
100
150
200 Samples
250
300
350
400
1.5 1 0.5 0
0.2 MSE
T2
15
SPE
MA
20
0.1
0 400
(d)
(c)
Figure 6: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with P = 10, (b) MKPCA with P = 10 and S = 10, (c) FSPCA with P = 7, and (d) MFKPCA with P = 5 and S = 1, in case C.
4.3. Case C: nonlinear correlations In all batches, the parameter ω0 is specified as 10, and the standard deviation σε of Gaussian noise is specified as 0.2. However, different from cases A and B, the coefficient vectors [ai , bi ]T are distributed as Fig. 5, where the blue points represent 1000 normal batches and the red asterisks represent 200 faulty batches. It can be seen that the coefficient vectors of normal batches are nonlinearly distributed, which leads to the nonlinear correlations be25
ACCEPTED MANUSCRIPT
PT
tween some monitoring variables. Fig. 6 shows the control charts of MPCA, MKPCA, FSPCA and MFKPCA. The nonlinear correlation makes the non-
RI
linear monitoring methods achieve better monitoring results than the linear
SC
methods, and the testing accuracies of MKPCA and MFKPCA are much better than those of MPCA and FSPCA respectively. Overall, the proposed
NU
MFKPCA method achieves the highest testing accuracy of about 90%, which is much higher than the MKPCA method whose testing accuracy is about
MA
70%. This phenomenon is similar to case A that the random noise and the DTW approach may influence and distort the difference in variable trajectories between normal and faulty batches. Therefore, the MFKPCA method
TE
D
based on FDA is more suitable to describe the batch dataset. 5. Industrial application case study
AC CE P
This section uses an industrial case study to further verify the actual performance of MFKPCA in batch process monitoring. The dataset is collected from a semiconductor etch process performed on the Lam 9600 plasma etch tool, which is to etch the TiN/Al-0.5% Cu/TiN/oxide stack with an inductively coupled BCl3 /Cl2 plasma [37]. It mainly includes two important steps: main etch of the Al layer terminating at the Al endpoint (step 4) and overetch for the underlying TiN and oxide layer (step 5), and the whole dataset contains three kinds of monitoring signals: machine state variables (MSV), optical emission spectroscopy (OES) and radio-frequency monitors (RFM) [37]. Here, 19 MSVs listed in Table 1 are used to do the experiments, and the whole dataset contains 107 normal batches and 20 faulty batches (not include the batches with a lot of missing values). In our experiments, 87 nor26
ACCEPTED MANUSCRIPT
PT
Table 1: Machine state variables in the semiconductor etch process [37]. Variable
No.
Variable
1
BCl3 flow
11
RF power
2
Cl2 flow
12
RF impedance
3
RF bottom power
13
TCP tuner
4
RFB reflected power
14
TCP phase error
5
Endpoint A detection
15
6
Helium pressure
16
7
Chamber pressure
8
RF tuner
9
RF load
10
Phase error
SC
TCP impedance TCP top power
TCP reflected power
NU
17
RI
No.
TCP load
19
Vat valve
MA
18
mal batches are randomly chosen as the training dataset, and the remaining
D
20 normal batches and 20 faulty batches form the testing dataset.
TE
We construct the functions of the variable trajectories by using the parameters presented in Table 2. The trajectories of variables 5 (Endpoint A detec-
AC CE P
tion), 9 (RF load), 13 (TCP tuner) and 18 (TCP load) show obviously fluctuant features and functional nature, and hence use more B-splines than other variables. Besides, these four variables all use the integrated squared second derivatives as the roughness. The original sampling points (gray points) and their fitting functions (blue dashed lines) of the normal batches are shown in Fig. 7. For convenience, the domain is defined as T = [0, 100] here. Other variables use the constant basis to fit their trajectories. Based on the fitting functions, FSPCA and MFKPCA are respectively used to build the monitoring models on the training dataset, and their monitoring performance is tested on the testing dataset. For the traditional MPCA and MKPCA methods, the DTW approach is also used to synchronize the batch trajectories. Here, the width of band global constraint is chosen as 15, and one batch that 27
ACCEPTED MANUSCRIPT
PT
Table 2: Parameter selection in functional data representation for each variable. Variable
Order
Interior knots
5
Endpoint A detection
4
[6,6,6,9,9,9,20,30,40,40,40,48,50,58,61,68,68,68,80,90,99]
9
RF load
4
[25,50,55,60,65,70,85]
13
TCP tuner
4
[6,15,25,42,42,42,50,55,63,63,63,68,80,90,99]
10−3
18
TCP load
4
[6,6,6,15,25,42,42,42,50,55,61,61,61,68,80,90,99]
10−4
SC
Other variables
RI
No.
The constant basis
λ 10−2 1
NU
has 100 sampling times in the training dataset is chosen as the referenced batch. Then the DTW model is built based on 20 iterations. Similar to the
MA
above simulations, we also use different parameters to build different models,
D
i.e. the number of PCs P ∈ {5, 10, 15, 20, 25, 30} and the kernel parameter S ∈ 10−3 , 10−2 , ..., 103 , and then compare their best monitoring results.
TE
The control charts in a single experiment are shown in Fig. 8 (note their different vertical scales), where the blue points represent the normal batches
AC CE P
in the testing dataset and the red asterisks represent the faulty ones. The results show that the MFKPCA method gets the best monitoring result whose detecting accuracy of the whole testing dataset is 87.5%, while the highest accuracy among all other methods is just 82.5%. It’s worth noting that, in Fig. 8(d), the MSE value of fault 12 is much larger than others. This corresponds to a particular situation where the models of the fitting functions may not fit the unknown faulty batches. Consider the contribution to the MSE value for each variable in the whole batch, CM SE,new,j =
1 Knew,j
Knew,j
X
[˜ xnew,j (tnew,j,k ) − y˜new,j,k]2
(38)
k=1
where j = 1, 2, ..., 19. In the corresponding contribution plot (Fig. 9(a)), the contribution for variable 1 (BCl3 flow) dominates the whole MSE value. This is connected with the trajectories of variable 1 that are shown in Fig. 9(b). 28
9200
3000
9150
2500
9100
9000
1000
8950
500
8900
0
0
20
40
60
80
4
2.1
0
20
40
60
80
100
60
80
100
Time
4
x 10
2.9
x 10
2.85
TCP load
MA
2.05 2 1.95
20
TE
0
40
60
80
2.8
2.75 2.7
D
1.9 1.85
8850
100
NU
Time
TCP tuner
9050
RI
1500
SC
2000
PT
3500
RF load
Endpoint A detection
ACCEPTED MANUSCRIPT
100
Time
2.65
0
20
40 Time
AC CE P
Figure 7: Original observations (gray points) and fitting functions (blue dashed lines) of variables 5 (Endpoint A detection), 9 (RF load), 13 (TCP tuner) and 18 (TCP load).
The trajectories of normal batches are almost constant and hence we use the constant basis to construct their fitting functions. This is reasonable based on the prior knowledge since we just have the information about the training dataset before we collect any other different batches. However, in fault 12, the trajectory of variable 1 (the red dashed line in Fig. 9(b)) varies a lot in the batch duration and its sampling value changes from about 725 (in step 4) to about 775 (in step 5). When the constant basis is used, the fitting error for variable 1 is very large, so is the MSE value. Therefore, using the MSE statistic here can easily detect this fault. This also illustrates that in MFKPCA introducing the MSE statistic for batch process monitor29
ACCEPTED MANUSCRIPT
20 Testing: normal Testing: faulty Control limit
15 T2
40
10
0
5
0
5
10
15
20 Samples
25
30
35
0
40
6
5
10
5
2
SPE
10
SPE
10
4
0
10
10
3
−2
0
5
10
15
20 Samples
25
30
35
10
40
10
0
5
15
20 Samples
25
30
35
40
10
15
20 Samples
25
30
35
40
NU
10
0
4
10
RI
20
SC
T2
60
PT
80
(a)
(b)
T2
400
T2
40
20
5
10
15
20 Samples
25
3
2
10
1
5
10
15
20 Samples
25
AC CE P
0
35
TE
SPE
10
10
30
40
30
35
200
0
0
5
10
15
20 Samples
25
30
35
40
0
5
10
15
20 Samples
25
30
35
40
30
35
40
2
10
0
10
−2
10
5
10
Fault 12
MSE
0
D
0
SPE
MA
60
0
10 40
0
5
10
15
20 Samples
25
(d)
(c)
Figure 8: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with P = 5, (b) MKPCA with P = 5 and S = 103 , (c) FSPCA with P = 10, and (d) MFKPCA with P = 15 and S = 103 , in the semiconductor etch process.
ing is reasonable, since the fitting errors may contain some useful variation information in variable trajectories. Considering the randomness in a single experiment, we repeat the above experiment 100 times and the average detecting results on the testing dataset including the average accuracy, false alarm rate and missing alarm rate are listed in Table 3. For each monitoring method, the results of using one single statistic (such as T 2 , SPE or MSE) and the results of using all statistics 30
ACCEPTED MANUSCRIPT
5
PT
x 10
RI
3
2
1
0
0
5
10 Variable No.
770
20
760
MA
BCl3 flow
15
NU
(a)
780
750 740
TE
0
D
730 720
SC
Contribution to MSE
4
20
40
60
80
100
Time
(b)
AC CE P
Figure 9: Illustration of fault 12: (a) the contribution to the MSE value of fault 12 for each variable, and (b) the trajectories of variable 1, where the blue lines represent the normal batches and the red dashed line represents fault 12.
simultaneously (denoted by “Combined” in Tables 3 and 4, which means that if any statistic exceeds its corresponding control limit, the testing batch is labeled as a fault) are all listed respectively. The average testing accuracy of MFKPCA is about 81.8%, which is higher than all the other methods and is about 3% higher than the suboptimal FSPCA method. Compared with the traditional MPCA and MKPCA methods, MFKPCA has a similar missing alarm rate, but has a lower false alarm rate, i.e. less normal batches are falsely predicted as faults in MFKPCA. On the contrary, the false alarm rate of MFKPCA is quite similar to that of FSPCA (their difference is less than
31
ACCEPTED MANUSCRIPT
the semiconductor etch process.
PT
Table 3: Average detecting results on the testing dataset by using different methods in Statistic
Accuracy
False alarm rate
Missing alarm rate
MPCA
T2
0.627
0.000
0.747
SPE
0.769
0.458
Combined
0.769
0.458
T2
0.532
0.000
MFKPCA
SC
0.004 0.934
SPE
0.742
0.513
0.004
Combined
0.742
0.513
0.004
T2
0.550
0.000
0.900
SPE
0.788
0.353
0.072
Combined
0.788
0.353
0.072
0.629
0.045
0.699
0.828
0.340
0.005
NU
FSPCA
0.004
MSE
MA
MKPCA
RI
Method
0.581
0.069
0.770
Combined
0.818
0.360
0.005
T2
TE
D
SPE
1%), while MFKPCA’s missing alarm rate is about 6% lower than FSPCA.
AC CE P
It’s worth noting that the “Combined” index is often more sensitive to the faults than the individual statistics. However, accordingly it also becomes easier to mislabel some normal batches as faults. For example, in Table 3, the “Combined” index in MFKPCA has a larger false alarm rate than the SPE statistic. This causes the entire testing accuracy of “Combined” (81.8%) be less than that of SPE (82.8%). Table 4 further presents the monitoring results of FSPCA and MFKPCA on the 20 faulty batches, and the best result for each fault by using the “Combined” index is marked in bold. The results show that using the proposed MFKPCA method could indeed improve the fault detection performance. In addition, as Fig. 9 shows, the MSE statistic in MFKPCA is still very sensitive to fault 12 and can always detect it out correctly. In conclusion, the proposed MFKPCA-based monitoring method
32
ACCEPTED MANUSCRIPT
conductor etch process.
MFKPCA
Fault
T2
SPE
Combined
T2
1
0
1.000
1.000
0.020
2
0
0.970
0.970
3
0
1.000
1.000
SC
FSPCA
PT
Table 4: Monitoring results of the 20 faults by using FSPCA and MFKPCA in the semi-
MSE
Combined
0
1.000
RI
SPE
1.000
0
1.000
0
1.000
0
1.000
0
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0
1.000
1.000
0
1.000
0
1.000
6
0
0.950
0.950
7
0
1.000
1.000
8
0
0.720
0.720
0.950 1.000
11
0
0.830
12
0
0.150
13
0
1.000
14
0
15
0
16
0
17
MA
0 1.000
0
1.000
0
1.000
1.000
1.000
1.000
1.000
0
1.000
0
1.000
0.950
0
0.900
0.010
0.9000
1.000
1.000
1.000
1.000
1.000
0.830
0
1.000
0
1.000
0.150
0
1.000
1.000
1.000
D
9 10
NU
4 5
1.000
1.000
0
1.000
1.000
0
1.000
0
1.000
1.000
1.000
0
1.000
0
1.000
1.000
1.000
1.000
1.000
0
1.000
0
1.000
1.000
0
1.000
0
1.000
18
0
1.000
1.000
1.000
1.000
0
1.000
19
0
1.000
1.000
0.010
1.000
0
1.000
20
0
1.000
1.000
0
1.000
0.600
1.000
Average
0.100
0.929
0.929
0.302
0.995
0.231
0.995
AC CE P
TE
1.000
1.000
is more reliable for batch process monitoring. 6. Conclusions In the present article, the multivariate functional kernel principal component analysis (MFKPCA) method has been proposed for batch process monitoring by combining the functional data analysis (FDA) approach with the nonlinear principal component analysis approach. Different from the tra-
33
ACCEPTED MANUSCRIPT
PT
ditional batch process monitoring methods such as PARAFAC, Tucker models and multiway monitoring methods, MFKPCA treats each batch sample
RI
as a J-dimensional vector-valued function (or multivariate functional data),
SC
and describes the whole three-way data array (may be regular or irregular) as an I × J-dimensional funtion matrix based on FDA. Generally speaking,
NU
using the FDA approach can highlight the subtle differences in the variables’ trajectories between normal batches and faulty ones, and can easily handle
MA
the problems of unequal batch length and missing data. The kernel trick is introduced to handle the nonlinear correlations between different monitoring variables and/or sampling times in the batch process. Since the fitting errors
D
may contain some useful variation information, three complementary statis-
TE
tics, including Hotelling’s T 2 , SPE and MSE, have been used to construct the control charts for batch process monitoring. Some simulation examples
AC CE P
and an industrial application case study have been used to demonstrate the performance of the proposed MFKPCA method, and the results show that it can indeed improve the monitoring performance of the semiconductor etch process.
Acknowledgements
This work was supported by the National Key Technologies Research and Development Program of China (No.2011AA060203) and the No.2 Important National Science and Technology Specific Projects of China (No.2011ZX02504).
34
ACCEPTED MANUSCRIPT
PT
References [1] Z. Q. Ge, Z. H. Song, F. R. Gao, Review of recent research on data-based
RI
process monitoring, Ind. Eng. Chem. Res. 52 (2013) 3543–3562.
SC
[2] A. K. Smilde, D. A. Doornbos, Three-way methods for the calibration
J. Chemom. 5 (1991) 345–360.
NU
of chromatographic systems: comparing PARAFAC and three-way PLS,
Syst. 38 (1997) 149–171.
MA
[3] R. Bro, PARAFAC. Tutorial and applications, Chemom. Intell. Lab.
D
[4] J. A. Westerhuis, T. Kourti, J. F. MacGregor, Comparing alternative
TE
approaches for multivariate statistical analysis of batch process data, J. Chemom. 13 (1999) 397–413.
AC CE P
[5] D. J. Louwerse, A. K. Smilde, Multivariate statistical process control of batch processes based on three-way models, Chem. Eng. Sci. 55 (2000) 1225–1235.
¨ [6] S. Wold, P. Geladi, K. Esbensen, J. Ohman,
Multi-way principal
components- and PLS-analysis, J. Chemom. 1 (1987) 41–56. [7] P. Nomikos, J. F. MacGregor, Monitoring batch processes using multiway principal component analysis, AIChE J. 40 (1994) 1361–1375. [8] P. Nomikos, J. F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1995) 41–59. [9] P. Nomikos, J. F. MacGregor, Multi-way partial least squares in monitoring batch processes, Chemom. Intell. Lab. Syst. 30 (1995) 97–108. 35
ACCEPTED MANUSCRIPT
PT
[10] C. K. Yoo, J. M. Lee, P. A. Vanrolleghem, I. B. Lee, On-line monitoring of batch processes using multiway independent component analysis,
RI
Chemom. Intell. Lab. Syst. 71 (2004) 151–163.
SC
[11] J. M. Lee, C. Yoo, I. B. Lee, Fault detection of batch processes using multiway kernel principal component analysis, Comput. Chem. Eng. 28
NU
(2004) 1837–1847.
[12] Y. W. Zhang, S. J. Qin, Fault detection of nonlinear processes using
MA
multiway kernel independent component analysis, Ind. Eng. Chem. Res. 46 (2007) 7780–7787.
D
[13] X. M. Tian, X. L. Zhang, X. G. Deng, S. Chen, Multiway kernel inde-
TE
pendent component analysis based on feature samples for batch process
AC CE P
monitoring, Neurocomputing 72 (2009) 1584–1596. [14] C. H. Zhao, F. R. Gao, F. L. Wang, Nonlinear batch process monitoring using phase-based kernel-independent component analysis-principal component analysis (KICA-PCA), Ind. Eng. Chem. Res. 48 (2009) 9163– 9174.
[15] A. Kassidas, J. F. MacGregor, P. A. Taylor, Synchronization of batch trajectories using dynamic time warping, AIChE J. 44 (1998) 864–875. [16] J. M. Gonzal´ez-Mart´ınez, R. Vitale, O. E. de Noord, A. Ferrer, Effect of synchronization on bilinear batch process modeling, Ind. Eng. Chem. Res. 53 (2014) 4339–4351. ¨ [17] C. Undey, A. Cinar, Statistical monitoring of multistage, multiphase batch processes, IEEE Contr. Syst. Mag. 22 (2002) 40–52. 36
ACCEPTED MANUSCRIPT
PT
[18] H. Sakoe, S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Trans. Acoust. Speech Signal Process.
RI
26 (1978) 43–49.
SC
[19] J. M. Gonzal´ez-Mart´ınez, A. Ferrer, J. A. Westerhuis, Real-time synchronization of batch trajectories for on-line multivariate statistical pro-
NU
cess control using dynamic time warping, Chemom. Intell. Lab. Syst. 105 (2011) 195–206.
MA
[20] J. H. Chen, J. L. Liu, Derivation of function space analysis based PCA control charts for batch process monitoring, Chem. Eng. Sci. 56 (2001)
D
3289–3304.
TE
[21] J. O. Ramsay, B. W. Silverman, Functional Data Analysis, 2nd ed.,
AC CE P
Springer, New York, 2005. [22] F. Ferraty, P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice, Springer, New York, 2006. [23] F. Yao, H. G. M¨ uller, J. L. Wang, Functional data analysis for sparse longitudinal data, J. Am. Stat. Assoc. 100 (2005) 577–590. [24] A. Cuevas, A partial overview of the theory of statistics with functional data, J. Stat. Plan. Infer. 147 (2014) 1–23. [25] G. Biau, F. Bunea, M. H. Wegkamp, Functional classification in Hilbert spaces, IEEE Trans. Inf. Theory 51 (2005) 2163–2172. [26] F. Rossi, N. Villa, Support vector machine for functional data classification, Neurocomputing 69 (2006) 730–742. 37
ACCEPTED MANUSCRIPT
PT
[27] A. Delaigle, P. Hall, Achieving near perfect classification for functional data, J. R. Stat. Soc. Ser. B: Stat. Methodol. 74 (2012) 267–286.
RI
[28] G. M. James, C. A. Sugar, Clustering for sparsely sampled functional
SC
data, J. Am. Stat. Assoc. 98 (2003) 397–408.
NU
[29] J. Jacques, C. Preda, Model-based clustering for multivariate functional data, Comput. Stat. Data Anal. 71 (2014) 92–106.
MA
[30] J. M. Torres, P. J. G. Nieto, L. Alejano, A. N. Reyes, Detection of outliers in gas emissions from urban areas using functional data analysis,
D
J. Hazard. Mater. 186 (2011) 144–149.
TE
[31] G. Yu, C. L. Zou, Z. J. Wang, Outlier detection in functional observations with applications to profile monitoring, Technometrics 54 (2012)
AC CE P
308–318.
[32] D. A. Clifton, L. Clifton, S. Hugueny, D. Wong, L. Tarassenko, An extreme function theory for novelty detection, IEEE J. Sel. Top. Signal Process. 7 (2013) 28–37. [33] J. O. Ramsay, C. J. Dalzell, Some tools for functional data analysis, J. R. Stat. Soc. Ser. B: Methodol. 53 (1991) 539–572. [34] G. H. Golub, M. Health, G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21 (1979) 215–223. [35] B. Sch¨olkopf, A. Smola, K. R. M¨ uller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (1998) 1299–1319. 38
ACCEPTED MANUSCRIPT
PT
[36] H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol. 24 (1933) 417–441.
RI
[37] B. M. Wise, N. B. Gallagher, S. W. Butler, D. D. White, G. G. Barna,
SC
A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for
NU
fault detection in a semiconductor etch process, J. Chemom. 13 (1999) 379–396.
MA
[38] G. E. P. Box, Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the
D
one-way classification, Ann. Math. Stat. 25 (1954) 290–302.
TE
[39] Q. Chen, U. Kruger, M. Meronk, A. Y. T. Leung, Synthesis of T 2 and Q
AC CE P
statistics for process monitoring, Contr. Eng. Pract. 12 (2004) 745–755. [40] H. H. Yue, S. J. Qin, Reconstruction-based fault identification using a combined index, Ind. Eng. Chem. Res. 40 (2001) 4403–4414. [41] Y. Yao, F. R. Gao, A survey on multistage/multiphase statistical modeling methods for batch processes, Annu. Rev. Contr. 33 (2009) 172–183.
39
ACCEPTED MANUSCRIPT Figures Captions
PT
Figure 1: Illustration of functional data representation of batch process data: (a) irregular three-way array, and (b) the corresponding function matrix.
RI
Figure 2: Original observations (gray points) and fitting functions (blue dashed lines) of ten
SC
batches in case A.
Figure 3: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with
NU
P = 10 , (b) MKPCA with P = 10 and S = 103 , (c) FSPCA with P = 7 , and (d) 3
MFKPCA with P = 5 and S = 10 , in case A.
MA
Figure 4: Control charts with 95% confidence limits (black dashed lines) by using (a) FSPCA 3
with P = 10 , and (b) MFKPCA with P = 5 and S = 10 , in case B. T
[ ai , bi ]
D
Figure 5: Distribution of the coefficient vectors
in case C.
TE
Figure 6: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with
AC CE P
P = 10 , (b) MKPCA with P = 10 and S = 10 , (c) FSPCA with P = 7 , and (d) MFKPCA with P = 5 and S = 1 , in case C.
Figure 7: Original observations (gray points) and fitting functions (blue dashed lines) of variables 5 (Endpoint A detection), 9 (RF load), 13 (TCP tuner) and 18 (TCP load).
Figure 8: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with
P = 5 , (b) MKPCA with P = 5 and S = 103 , (c) FSPCA with P = 10 , and (d) 3
MFKPCA with P = 15 and S = 10 , in the semiconductor etch process. Figure 9: Illustration of fault 12: (a) the contribution to the MSE value of fault 12 for each variable, and (b) the trajectories of variable 1, where the blue lines represent the normal batches and the red dashed line represents fault 12.
ACCEPTED MANUSCRIPT Highlights
TE
D
MA
NU
SC
RI
PT
Traditional KPCA method is combined with the approach of functional data analysis. The proposed MFKPCA method is used for batch process monitoring. Three complementary statistics (T2, SPE and MSE) are used for fault detection.
AC CE P
l l l