Fault detection of batch processes based on multivariate functional kernel principal component analysis

Fault detection of batch processes based on multivariate functional kernel principal component analysis

    Fault detection of batch processes based on multivariate functional kernel principal component analysis Huangang Wang, Ma Yao PII: DO...

3MB Sizes 0 Downloads 97 Views

    Fault detection of batch processes based on multivariate functional kernel principal component analysis Huangang Wang, Ma Yao PII: DOI: Reference:

S0169-7439(15)00243-9 doi: 10.1016/j.chemolab.2015.09.018 CHEMOM 3108

To appear in:

Chemometrics and Intelligent Laboratory Systems

Received date: Revised date: Accepted date:

11 November 2014 28 September 2015 29 September 2015

Please cite this article as: Huangang Wang, Ma Yao, Fault detection of batch processes based on multivariate functional kernel principal component analysis, Chemometrics and Intelligent Laboratory Systems (2015), doi: 10.1016/j.chemolab.2015.09.018

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

RI

Huangang Wang∗, Ma Yao

PT

Fault detection of batch processes based on multivariate functional kernel principal component analysis

NU

SC

Institute of Control Theory and Technology, Department of Automation, Tsinghua University, Beijing 100084, China

MA

Abstract

In some batch processes, the variables’ trajectories often show obviously functional nature and can be considered as smooth functions rather than just

D

vectors, then the collected three-way data array essentially can be trans-

TE

formed into a two-way function matrix. For this purpose, the approach of

AC CE P

functional data analysis (FDA) is introduced to express the variables’ functions in this article, which can highlight the subtle differences in the shape of variables’ trajectories between normal batches and faulty ones, and can also easily handle the problems of irregular three-way data array, such as unequal batch length, different sampling rates, missing data, etc. Based on the function matrix, a novel monitoring method called multivariate functional kernel principal component analysis (MFKPCA) is proposed for fault detection of batch processes, which directly eigen-decomposes the two-way function matrix in the function space and uses the kernel trick to handle the nonlinear correlations. Different from the traditional PCA method, MFKPCA deCorresponding author. Tel.: 86-10-62781993; Fax: 86-10-62786911. Email addresses: [email protected] (Huangang Wang), [email protected] (Ma Yao) ∗

Preprint submitted to Chemometrics and Intelligent Laboratory SystemsSeptember 28, 2015

ACCEPTED MANUSCRIPT

PT

signs three complementary control charts for batch process monitoring based on three statistics including Hotelling’s T 2 , squared prediction error (SPE)

RI

and mean squared error (MSE). Finally, some simulations and an industrial

SC

semiconductor etch process are used to illustrate the validity of the proposed monitoring method.

NU

Keywords: Batch process monitoring, Fault detection, Multivariate functional kernel principal component analysis, Functional data analysis

MA

1. Introduction

Batch process monitoring is particularly important to ensure the high

D

quality of the final products, such as pharmaceuticals, semiconductors, poly-

TE

mers and bio-chemicals, and hence has attracted many researchers’ attention. Among various monitoring methods, data-driven statistical process monitor-

AC CE P

ing (SPM) methods have been developed rapidly due to their flexibility [1]. The characteristic of batch process dataset is the special structure of the three-way data array (batches × variables × time), where each variable trajectory varying with time in a finite batch duration usually shows interesting nonlinear dynamic feature. This makes the batch process monitoring quite different from the traditional continuous process monitoring. For the regular three-way array Y(I × J × K) (i.e. each variable in each batch has the same sampling number K), parallel factor analysis (PARAFAC) and Tucker models can be used to perform the trilinear decomposition of the batch data [2–5]. Some traditional multivariate SPM methods have already been extended to the case of batch process monitoring based on the multiway processing approaches, such as multiway principal component analysis (MPCA) [6–8], mul2

ACCEPTED MANUSCRIPT

PT

tiway partial least squares (MPLS) [6, 9], multiway independent component analysis (MICA) [10], etc. Essentially, these multiway monitoring methods

RI

are equivalent to first unfolding the three-way array along some direction

SC

and then building the traditional multivariate SPM models based on the unfolded two-way data matrix. In order to handle the nonlinear correlations

NU

between different variables and/or the nonlinear correlations between different sampling times, the kernel trick has been introduced and some nonlinear

MA

monitoring methods have been proposed, such as multiway kernel PCA/ICA (MKPCA/MKICA), etc. [11–14]

However, these monitoring methods require that the collected batch dataset

D

should be a regular three-way array, which implies that each batch should

TE

have the same sampling duration and each variable should have the same sampling rate. While, in many actual batch processes, these constrains of-

AC CE P

ten cannot be satisfied due to the variability in the chemical composition of the raw material, stochastic disturbances, varying environmental conditions, etc.[15, 16] Some observations may be lost due to the malfunction of the sensors and/or the connectors. As a consequence, some data preprocessing procedures are needed to synchronize the raw batch data, such as using indicator variable technique (IVT) [17], dynamic time warping (DTW) [15, 18], and relaxed-greedy time warping (RGTW) [19]. Once the regular three-way array is obtained, the methods mentioned above can be performed to monitor the batch processes. From another viewpoint, Chen and Liu [20] used the concept of orthonormal function approximation to express each variable trajectory, and built the PCA model (called FSPCA in brief) based on the coefficient matrix derived

3

ACCEPTED MANUSCRIPT

PT

from the orthogonal function. FSPCA can handle the problem of varying length in batch runs, and reduce the dimension of each batch sample since

RI

the number of orthogonal functions is often much smaller than that of sam-

SC

pling times [20]. In fact, their processing idea is quite similar to the idea of functional data analysis (FDA) [21, 22], which has already attracted much

NU

attention in different research areas, such as statistics, economics, bioinformatics, chemometrics, etc. And many interesting results have been pub-

MA

lished, such as function data modeling and analysis [21–24], function data classification and clustering [25–29], novelty detection or outlier detection [30–32], etc. Compared with the multivariate data analysis (MDA) based

D

on the finite dimensional vectors, FDA emphasizes the smoothness of the

TE

trajectories and is very suitable for analyzing the non-stationary time series with possibly unequally spaced observations [33], which just coincides

AC CE P

with the situation of irregular three-way array in batch process monitoring. Therefore, the theory of FDA is introduced into batch process monitoring in this article which further extends the rationale of FSPCA. In summary, for batch process monitoring, using FDA can highlight the subtle differences in the variable trajectories between normal batches and faulty ones by removing the random noise, and hence can improve the fault detection performance. Based on the correspondence between the observations and the sampling times, FDA can uniformly handle the irregular three-way dataset, including the problems of unequal batch length, different sampling rates of variables, and missing data. Besides, in some particular situations, utilizing the derivatives or other transformations of the functions could further amplify the difference between normal batches and faulty ones.

4

ACCEPTED MANUSCRIPT

PT

For example, Rossi and Villa [26] used the second derivatives of the spectrometric data to improve the classification accuracy. Without the constraint

RI

of the orthonormal basis functions, some other flexible basis functions, e.g.

SC

B-splines, can be used to describe the variable trajectories. And then we will no longer just focus on the coefficient vectors of the orthonormal func-

NU

tions, but will make full use of the entire function expressions to build the monitoring model.

MA

Moreover, the correlation between different monitoring variables may be nonlinear, so the kernel trick is utilized to build the nonlinear PCA model for multivariate functional data, and the corresponding method is called multi-

D

variable functional kernel principal component analysis (MFKPCA). Since

TE

the selected function model for each variable only relies on the collected normal batch samples, the fitting errors may remain some variation information

AC CE P

about variable trajectories between normal batches and faulty ones. FSPCA ignores this information and only uses the T 2 and SPE statistics to detect the faults. In MFKPCA, we further utilize the statistic of mean squared error (MSE), which measures the difference between the fitting functions and the original observations, to complement those two statistics for fault detection. The rest of this article is organized as follows. First, based on the introduction of functional data representation, the rationale of the MFKPCA method is elaborated in Section 2. By defining the T 2 , SPE and MSE statistics, the monitoring procedures of MFKPCA-based batch process monitoring method are listed in Section 3. In Section 4, different methods are compared via several simulations in order to illustrate the potential applications of MFKPCA. And after that, Section 5 utilizes an industrial semiconductor

5

ACCEPTED MANUSCRIPT

6 B> >@ CA ? &

$% "# !

'()*(+,-. Z[\

8

5

NU

6

/*0-

=:< ;

YY Y

9 P O MQ N ಹ W U RV SX T KL 8 6 '()*(+,-. Z]\

MA

7

42 3 1

!

SC

$% "#

YY Y

RI

&

I DH J E FG

PT

5



Figure 1: Illustration of functional data representation of batch process data: (a) irregular

D

three-way array, and (b) the corresponding function matrix.

TE

etch process to further verify the actual monitoring performance of the pro-

AC CE P

posed method. Finally, some conclusions are drawn in Section 6. 2. Multivariate functional kernel principal component analysis 2.1. Functional data representation In batch processes, the k-th observation of variable j in batch run i is denoted by yi,j,k , and its corresponding sampling time is denoted by ti,j,k , where i = 1, ..., I, j = 1, ..., J, k = 1, ..., Ki,j . Ki,j may vary with both batch index i and variable index j just as Fig. 1(a) shows. Accordingly, the sampling sequence of variable j in batch run i corresponds to a Ki,j -dimensional column vector yi,j = [yi,j,1, ..., yi,j,Ki,j ]T , and the vector of sampling times ti,j = [ti,j,1, ..., ti,j,Ki,j ]T . Here, the observations yi,j,k satisfy the following model, yi,j,k = xi,j (ti,j,k ) + εi,j,k

6

(1)

ACCEPTED MANUSCRIPT

PT

where the function xi,j (t) lies in the Hilbert space L2 (T ) of squared integrable functions for a compact domain T . The measurement errors εi,j,k are

RI

independently and normally distributed with mean E[εi,j,k ] = 0 and variance

SC

var[εi,j,k ] = σj2 . Based on Eq. (1), the i-th batch sample is transformed into a J-dimensional vector-valued function xi (t) = [xi,1 (t), ..., xi,J (t)]T (called

NU

multivariate functional data), and the entire training dataset becomes an Fig. 1(b)), 

· · · x1,J (t)  ..  .. .  .  · · · xI,J (t)

MA

I × J-dimensional function matrix (see  x (t)  1,1  .. X(t) =  .  xI,1 (t)

(2)

D

In order to get the functions in Eq. (2), the basis function system can be

TE

used, such as the Fourier basis functions, B-splines, wavelets, etc. [21] Sup-

then

AC CE P

pose that the basis used for variable j is denoted by {ϕj,d(t), d = 1, ..., Dj },

xi,j (t) =

Dj X

ci,j,d ϕj,d(t) = cTi,j ϕj (t)

(3)

d=1

where the coefficient vector ci,j = [ci,j,1, ..., ci,j,Dj ]T , and the vector of basis functions ϕj (t) = [ϕj,1(t), ..., ϕj,Dj (t)]T . The least squares can be used to estimate the coefficient vector ci,j , min ci,j

K i,j P

[yi,j,k − xi,j (ti,j,k )]2

(4)

k=1 T

⇒ min (yi,j − Ψi,j ci,j ) (yi,j − Ψi,j ci,j ) ci,j

where the matrix

Ψi,j



  = 

··· .. .



ϕj,Dj (ti,j,1)   ..  .  ϕj,1 (ti,j,Ki,j ) · · · ϕj,Dj (ti,j,Ki,j ) ϕj,1 (ti,j,1) .. .

Ki,j ×Dj

7

(5)

ACCEPTED MANUSCRIPT

(6)

RI

ˆ ci,j = (ΨTi,j Ψi,j )−1 ΨTi,j yi,j

PT

Solving the least squares problem (4) gets

dxi,j (t) dn−1 xi,j (t) dn xi,j (t) + · · · + wn−1 + dt dtn−1 dtn

(7)

MA

L (xi,j (t)) = w0 + w1

NU

is a linear differential operator defined as

SC

For controlling the smoothness of the functions, a roughness penalty R pen(xi,j (t)) = T [L (xi,j (t))]2 dt is often added in Eq. (4) [21], where L(·)

Then the optimization problem becomes [21] min

[yi,j,k − xi,j (ti,j,k )]2 + η × pen(xi,j (t))

k=1

D

ci,j

K i,j P

T

ci,j

R

T

L (ϕj (t)) L(ϕj (t))T dt is a Dj × Dj symmetric matrix and can

AC CE P

where Rj =

TE

⇒ min (yi,j − Ψi,j ci,j ) (yi,j − Ψi,j ci,j ) +

(8)

ηcTi,j Rj ci,j

be calculated by a numerical quadrature scheme. The smoothing parameter η is a trade-off between the error of fitting and the smoothness of functions, which can be chosen by using the generalized cross-validation approach [21, 34]. Then the estimated coefficient vector is ˆ ci,j = (ΨTi,j Ψi,j + ηRj )−1 ΨTi,j yi,j

(9)

And the corresponding fitting function is xˆi,j (t) = ˆ cTi,j ϕj (t)

(10)

2.2. Functional data centering and scaling In actual batch processes, different variables may have different physical units and their data may have different scales, so before building the 8

ACCEPTED MANUSCRIPT

PT

MFKPCA model the batch data should be centered and scaled. For the

I

(11)

SC

1X µ ˆj (t)= xˆi,j (t) I i=1

RI

fitting functions xˆi,j (t), the mean function of variable j is

(12)

MA

NU

and the scaling parameter can be calculated as v u I u1 X 1 Z ˆ δj = t [ˆ xi,j (t) − µ ˆ j (t)]2 dt I i=1 C(T ) T

where C(T ) represents the length of the domain T . Then the fitting functions can be normalized as follows,

(13)

TE

D

x˜i,j (t) = (ˆ xi,j (t) − µ ˆ j (t)) /δˆj = ˜ cTi,j ϕj (t)

  P where ˜ ci,j = ˆ ci,j − I1 Im=1 ˆ cm,j /δˆj . Accordingly, the i-th batch sample

AC CE P

is denoted by x ˜i (t) = [˜ xi,1 (t), ..., x˜i,J (t)]T , and the entire training dataset ˜ is denoted by X(t) = [˜ x1 (t), ..., x ˜I (t)]T . Similarly, the original observations

yi,j,k can also be processed as y˜i,j,k = (yi,j,k − µ ˆj (ti,j,k )) /δˆj

(14)

in order to be compared with the sampling values x˜i,j (ti,j,k ) of the fitting functions in the similar manner. 2.3. Algorithm of MFKPCA In traditional PCA, the connection between the covariance matrix and the kernel matrix (or the inner-product matrix) is the foundation of the KPCA algorithm [35]. For batch processes whose samples are vector-valued functions (or multivariate functional data), this connection still remains. Considering 9

ACCEPTED MANUSCRIPT

PT

the case of using the linear kernel firstly, the covariance function matrix of ˜ the normalized training dataset X(t) is I

(15)

SC

RI

1X ˜ t) = 1 X(s) ˜ T X(t) ˜ V(s, = x ˜i (s)˜ xi (t)T I I i=1

which is a J × J matrix changing with the time variables s, t ∈ T . The

NU

corresponding eigenvalue problem is defined as [21, 29] Z ˜ t)ξp (t)dt = λp ξp (s) V(s, T

MA

where the eigenvalues λp are non-increasing, i.e.

(16)

λ1 ≥ λ2 ≥ · · · , and

D

the eigenfunctions ξp (t) = [ξp,1(t), ..., ξp,J (t)]T satisfy that hξp (t), ξq (t)i = R ξ (t)T ξq (t)dt = 1 if p = q and 0 otherwise. T p

TE

Similar to the PCA algorithm [35], each eigenfunction lies in the subspace

spanned by the training dataset, which implies that there is an I-dimensional

AC CE P

vector αp = [αp,1 , ..., αp,I ]T ∈ RI satisfying the following equation, ˜ T αp = ξp (t) = X(t)

I X

αp,i x ˜i (t)

(17)

i=1

Then the eigen-problem (16) becomes R ˜ t)ξp (t)dt = λp ξp (s) V(s, T R ˜ T ˜ X(t) ˜ T dtαp = Iλp X(s) ˜ T αp ⇒ X(s) X(t) T

(18)

˜ T Gα ˜ p = Iλp X(s) ˜ T αp ⇒ X(s)

where the inner-product matrix R ˜ = ˜ X(t) ˜ T dt G X(t) T  h˜ x1 (t), x ˜1 (t)i · · · h˜ x1 (t), x ˜I (t)i   .. .. .. = . . .  h˜ xI (t)), x ˜1 (t)i · · · h˜ xI (t), x ˜I (t)i 10

    

(19)

I×I

ACCEPTED MANUSCRIPT

PT

J R J P P ˜ i,m = h˜ Its elements G xi (t), x ˜m (t)i = x˜ (t)˜ xm,j (t)dt = ˜ cTi,j R0j ˜ cm,j , T i,j j=1 j=1 R i, m = 1, ..., I, where R0j = T ϕj (t)ϕj (t)T dt (j = 1, ..., J) is a Dj × Dj

RI

symmetric matrix and can be calculated by a numerical quadrature scheme.

SC

Since the R0j is constant for different batches, it can be calculated and stored in advance for reducing the redundant calculation. Since the last row in

NU

Eq. (18) must hold for ∀s ∈ T , it implies that

(20)

MA

˜ p = Iλp αp Gα

i.e. the vector αp is proportional to the eigenvector of the inner-product

D

matrix. Considering the constraint hξp (t), ξp (t)i = 1, the L2 -norm of αp

TE

must satisfy ||αp ||2 = 1/(Iλp ).

As a consequence, we can use the decomposition of the inner-product

AC CE P

matrix to build the PCA model for multivariate functional data (or vectorvalued functions). By introducing the kernel trick, each batch sample (vectorvalued function) is mapped from the original space Ω to the feature space ℑ by the nonlinear transformation function φ(·) : x ˜i (t) 7→ φ(˜ xi (t)), and the inner product of two batch samples in the feature space can be expressed by the kernel function k (˜ xi (t), x ˜m (t)) = hφ(˜ xi (t)), φ(˜ xm (t))i (i, m = 1, ..., I). For example, the widely used radial basis kernel has the following form, !  k (˜ xi (t), x ˜m (t)) = exp −||˜ xi (t) − x ˜m (t)||2 /S 2

(21)

where the kernel width parameter S is prespecified by the user, and the

11

ACCEPTED MANUSCRIPT

PT

squared L2 -norm is J R P

(˜ xi,j (t) − x˜m,j (t))2 dt j=1 R  J P = (˜ ci,j − ˜ cm,j )T T ϕj (t)ϕj (t)T dt (˜ ci,j − ˜ cm,j )

=

j=1

RI

j=1 J P

T

(22)

SC

||˜ xi (t) − x ˜m (t)||2 =

(˜ ci,j − ˜ cm,j )T R0j (˜ ci,j − ˜ cm,j )

NU

Based on the kernel function, the corresponding kernel matrix can be

D

MA

calculated conveniently as follows:  k (˜ x1 (t), x ˜1 (t)) · · · k (˜ x1 (t), x ˜I (t))   .. .. .. K= . . .  k (˜ xI (t), x ˜1 (t)) · · · k (˜ xI (t), x ˜I (t))

    

(23) I×I

TE

Centering the batch samples φ(˜ xi (t)) in the feature space ℑ gets I

AC CE P

1X ˜ xi (t)) = φ(˜ φ(˜ xi (t)) − φ(˜ xu (t)) I u=1

and the corresponding centered kernel function is D E ˜ ˜ ˜ k (˜ xi (t), x ˜m (t)) = φ(˜ xi (t)), φ(˜ xm (t)) I I P P = k (˜ xi (t), x ˜m (t)) − I1 k (˜ xi (t), x ˜u (t)) − 1I k (˜ xv (t), x ˜m (t)) + I12

I P

u=1

v=1

(24)

(25)

k (˜ xu (t), x ˜v (t))

u,v=1

Then the kernel matrix becomes [35] ˜ = K − 1I×I K − K1I×I + 1I×I K1I×I K (26)   1 ··· 1   ..  . 1  .. . where 1I×I = I  . . .  . Furthermore, in order to carry out the   1 ··· 1 I×I variance scaling of the kernel functions, each kernel element can be divided 12

ACCEPTED MANUSCRIPT

PT

 . .   ˜ ˜ ← (I − 1) K ˜ trace K ˜ , which by the constant trace K (I − 1), i.e. K

RI

˜ xi (t)) in the feature space is divided by implies that each batch sample φ(˜ r  . ˜ trace K (I − 1). For brevity, the notations after variance scaling re-

SC

main the same. Often, the variance scaling can adjust the numerical precision and the magnitude of the SPE statistic. Analogous to Eq. (20), the kernel

NU

˜ is decomposed as matrix K

(27)

MA

˜ p = Iλp αp Kα

where ||αp ||2 = 1/(Iλp ) and λ1 ≥ λ2 ≥ · · · . Then the eigenfunction (also

D

denoted by ξp (t) ∈ ℑ) in the feature space satisfies

TE

ξp (t) =

I X

˜ xi (t)) αp,i φ(˜

(28)

i=1

AC CE P

which is a linear combination of all training samples in the feature space. Based on the eigenfunctions ξp (t), the feature space is divided into a principal component subspace and a residual subspace, and subsequently several statistics and their control charts will be designed for fault detection of batch processes.

3. MFKPCA for fault detection of batch processes 3.1. Control charts For clarity, here the raw sampling sequence of an arbitrary batch sample is denoted by ynew,j = [ynew,j,1, ..., ynew,j,Knew,j ]T and it is denoted by y ˜new,j = [˜ ynew,j,1, ..., y˜new,j,Knew,j ]T after centering and scaling, where j = 1, ..., J. The vector of the corresponding sampling times tnew,j = [tnew,j,1, ..., tnew,j,Knew,j ]T . Using the same approach to express the functions (Eq. (10)) and normalizing 13

ACCEPTED MANUSCRIPT

PT

them according to Eq. (13), this batch sample is represented as a vector˜ xnew (t)) onto the eigenfunctions ξp (t) are φ(˜

RI

valued function x ˜new (t) = [˜ xnew,1 (t), ..., x˜new,J (t)]T . Then the projections of

(29)

NU

SC

D E D E I ˜ xnew (t)), ξp (t) = P αp,i φ(˜ ˜ xnew (t)), φ(˜ ˜ xi (t)) γnew,p = φ(˜ i=1  I I P P 1 = αp,i k (˜ xnew (t), x ˜i (t)) − I k (˜ xnew (t), x ˜m (t)) m=1 i=1 # I I P P 1 1 −I k (˜ xm (t), x ˜i (t)) + I 2 k (˜ xm (t), x ˜m′ (t)) m,m′ =1

MA

m=1

If the first P principal components (PCs) are retained, the score vector γnew = [γnew,1, ..., γnew,P ]T . Like the traditional KPCA model, Hotelling’s

D

T 2 statistic is used to measure the variation of the batch sample (the fitting

TE

function) within the MFKPCA model [36, 37],

AC CE P

T T 2 = γnew Λ−1 γnew

(30)

where Λ−1 is the diagonal matrix of the inverse of the first P eigenvalues. Assuming that the score vectors follow a multivariate normal distribution, the control limit can be obtained from the F -distribution [7], Tα2 ∼

P (I 2 − 1) FP, I−P, α I(I − P )

(31)

where I is the number of the training samples, P is the number of PCs to retain, and α is the significant level. The squared prediction error (SPE) statistic is used to measure the goodness of fit of the batch sample (the fitting function) to the MFKPCA model, which is defined as ˜ xnew (t)) − φ˜P (˜ SP E = ||φ(˜ xnew (t))||2 14

(32)

ACCEPTED MANUSCRIPT

PT

P P where φ˜P (˜ xnew (t)) = γnew,pξp (t) is the reconstructed batch sample with p=1

the first P PCs in the feature space. Based on the fact that hξp (t), ξq (t)i = 1

RI

if p = q and 0 otherwise, the SPE statistic can be calculated as

SC

˜ xnew (t)) − φ˜P (˜ SP E = ||φ(˜ xnew (t))||2 D E P ˜ xnew (t)), φ(˜ ˜ xnew (t)) − P γ 2 = φ(˜ new,p p=1

NU

= k (˜ xnew (t), x ˜new (t)) − I P

2 I

k (˜ xnew (t), x ˜i (t))

i=1 P P

k (˜ xi (t), x ˜m (t)) −

MA

+ I12

I P

i,m=1

p=1

(33)

2 γnew,p

Assuming that the reconstructed errors follow a multivariate normal distri-

D

bution, its control limit can be obtained from a χ2 -distribution [7, 38],

TE

SP Eα ∼ gSP E · χ2hSP E , α

(34)

AC CE P

where gSP E = vSP E / (2 · mSP E ), hSP E = 2 · m2SP E /vSP E , mSP E and vSP E are the estimated mean and variance of the SPE statistic from the training batch samples respectively.

Different from the traditional KPCA model, in MFKPCA, there is also some variation information between the original observations y ˜new,j and the fitting functions x˜new,j (t) that could be utilized for detecting faulty batches. On one hand, if the fitting function reflects the true shape of the variable trajectory, the fitting errors correspond to the measurement noise or system noise. The increase of the measurement noise means that the system is influenced by some instable disturbances. On the other hand, the fitting model for each variable trajectory usually just relies on the training dataset whose samples are mostly in-control, so whether the fitting model is fit for the unknown faulty batches or not needs further consideration. It relies on 15

ACCEPTED MANUSCRIPT

PT

the complexity of the fitting model. In some situations, the fitting errors may contain some useful variation information about the variable trajectories for

RI

distinguishing faulty batches from normal ones, then just using the T 2 and

SC

SPE statistics may not be sufficient. Therefore, the mean squared error (MSE) statistic is used to further measure the fitting errors between the

NU

fitting functions and the original observations,

(35)

MA

Knew,j J X 1X 1 MSE = [˜ xnew,j (tnew,j,k ) − y˜new,j,k ]2 J j=1 Knew,j k=1

Here we use the mean of the fitting errors because of the varying number Knew,j of the sampling times. The control limit of MSE can also be estimated

TE

D

from a χ2 -distribution similarly [7, 38], (36)

AC CE P

MSEα ∼ gM SE · χ2hM SE , α

where gM SE = vM SE / (2 · mM SE ), hM SE = 2 · m2M SE /vM SE , mM SE and vM SE are the mean and variance of the MSE statistic estimated from the training dataset.

As a consequence, in MFKPCA, three statistics (T 2 , SPE and MSE) are used to monitor the batch processes and detect the unknown faulty batches. The experimental results below will show that these three statistics complement each other and are all useful for fault detection in general. In addition, in order to reduce the number of control charts, these statistics can be further synthesized together into one monitoring index by using the joint probability density function [39] or the linear combination [40]. Nevertheless, in this article we mainly use them separately to analyze and compare their individual performance in detail. 16

ACCEPTED MANUSCRIPT

PT

3.2. Monitoring procedures For batch process monitoring, the historical dataset is used to build the

RI

MFKPCA model and to estimate the control limits of three statistics, based

SC

on which a new batch sample is predicted as a faulty one or not. Here, we mainly focus on the problem of end-of-batch fault detection, and the

(i) Off-line modeling procedures

NU

corresponding monitoring procedures are summarized as follows.

MA

(1) Express the batch samples in the training dataset as the vectorvalued functions x ˆi (t) (Eq. (10)), i = 1, ..., I.

D

(2) Calculate the mean functions µ ˆ j (t) (Eq. (11), j = 1, ..., J) and the

TE

scaling parameters δˆj (Eq. (12), j = 1, ..., J), based on which the functions and the observations are centered and scaled by using

AC CE P

Eqs. (13) and (14) respectively.

(3) Calculate the kernel matrix K (Eq. (23)) and carry out the mean ˜ centering in the feature space to get the centered kernel matrix K

(Eq. (26)).

˜ p = Iλp αp , where ||αp ||2 = (4) Solve the eigenvalue problem Kα 1/(Iλp ) and λ1 ≥ λ2 ≥ · · · , to retain the first P principal components, and then calculate the score vector γi for each batch (Eq. (29)). (5) Calculate three statistics’ values Ti2 , SP Ei and MSEi (i = 1, ..., I) for all training batch samples (Eqs. (30), (33) and (35)), and estimate their corresponding control limits (Eqs. (31), (34) and (36)) for monitoring. 17

ACCEPTED MANUSCRIPT

PT

(ii) Fault detection procedures (1) After getting a new whole batch sample ynew,j , j = 1, ..., J, express

RI

it as a vector-valued function x ˆnew (t) (Eq. (10)) in the similar

SC

manner.

(2) Carry out the functional data centering and scaling to get the

NU

vector-valued function x ˜new (t) (Eq. (13)).

(3) Calculate the kernel function k (˜ xnew (t), x ˜i (t)) between the new

MA

batch and each training batch, and calculate the score vector γnew (Eq. (29)).

D

2 (4) Based on Eqs. (30), (33) and (35), calculate three statistics Tnew ,

TE

SP Enew and MSEnew of the new batch.

AC CE P

2 (5) Monitor whether Tnew , SP Enew or MSEnew exceeds its control

limit.

4. Simulated case studies This section uses several simulation examples to illustrate the potential applications of the MFKPCA method. Here, each batch sample is constructed as follows, yi,1,k = ai × ti,1,k + εi,1,k yi,2,k = ai × t2i,2,k + εi,2,k

(37)

yi,3,k = bi × [4 sin(ti,3,k ) + 0.5 sin(ω0 ti,3,k )] + εi,3,k where each batch has three monitoring variables, i.e. j = 1, 2, 3, whose trajectories are linear, quadratic and sine-shaped respectively. The yi,j,k ’s 18

ACCEPTED MANUSCRIPT

PT

are observations, where i = 1, ..., I, j = 1, 2, 3 and k = 1, ..., Ki,j . The sampling times ti,j,k ∈ [0, ti,max ] with sampling interval 0.01, and the ti,max ’s

RI

are randomly sampled from the uniform distribution on the interval [0.9, 1.1].

SC

The εi,j,k ’s are independently and identically distributed (i.i.d.) Gaussian noise with mean 0 and variance σε2 , i.e. εi,j,k ∼N (0, σε2 ). The coefficient

NU

vectors [ai , bi ]T and the parameter ω0 are adjustable to obtain several different cases.

MA

Since the batch length is changing, the traditional MPCA [7] and MKPCA [11] methods couldn’t be used directly without preprocessing the irregular three-way array. Here, the DTW approach developed by Kassidas et al. [15]

D

is utilized to equalize the batch length and synchronize the variable trajecto-

TE

ries. In our experiments, the width of the band global constraint is chosen as 15, and an arbitrary batch sample that has 100 sampling times in the train-

AC CE P

ing dataset is chosen as the initial referenced batch. Then twenty iterations are repeated to train the DTW model, i.e. to get the final referenced batch trajectory and the weight matrix. For the FSPCA [20] and MFKPCA methods, the B-splines are used here to construct the functions for all variables’ trajectories. The parameters are chosen via the generalized cross-validation approach, which uses a twicediscounted mean squared error measure as the parameter selection criterion [21, 34]. The corresponding results are as follows. Variable 1 uses 3 order 2 B-splines and chooses the integrated squared functions themselves as the roughness with the smoothing parameter η = 10−2 . Variable 2 uses 5 order 4 B-splines and chooses the integrated squared second derivatives as the roughness with η = 10−2 . And variable 3 uses 15 order 6 B-splines and

19

ACCEPTED MANUSCRIPT

PT

3

14 3

2.5 2

10 Variable 3

1.5 1

1

SC

1.5

RI

2 Variable 2

Variable 1

12

2.5

0.5

0.5

−0.5 0.5 Time

1

0

0.5 Time

1

6 4 2 0

0

0.5 Time

1

MA

0

NU

0 0

8

Figure 2: Original observations (gray points) and fitting functions (blue dashed lines) of

D

ten batches in case A.

chooses the integrated squared fourth derivatives as the roughness with η =

TE

10−7 . Fig. 2 shows three variables’ sampling points (gray points) of ten

AC CE P

batches and their corresponding fitting functions (blue dashed lines) in case A, and the results show that the fitting curves can fit the original trajectories well. In each case, 800 normal batches are used as the training dataset to build the models, and 200 normal batches and 200 faulty ones form the testing dataset to test their monitoring performance. 4.1. Case A: variations in variable trajectories The coefficient vectors [ai , bi ]T of both normal batches and faulty batches follow the bivariate Gaussian distribution with mean vector [2.5, 2.5]T and covariance matrix diag {2.5, 2.5}. Their measurement errors εi,j,k are all i.i.d. Gaussian noise with mean 0 and variance 0.04, i.e. σε = 0.2. The main difference between normal batches and faulty ones is in the shape of the trajectory of variable 3. In normal batches, the parameter ω0 = 10 (see 20

ACCEPTED MANUSCRIPT

100

80 T2

40

40

20

20 50

100

150

200 Samples

250

300

350

0

400

300

1

250

0.8

200 150

0.6 0.4

0

50

100

150

200 Samples

250

300

350

0.2

400

50

100

0

50

150

200 Samples

250

300

350

400

100

150

200 Samples

250

300

350

400

NU

100

0

RI

0

SPE

SPE

0

60

SC

T2

60

PT

100 Testing: normal Testing: faulty Control limit

80

(a)

(b)

T2

40

T2

20

10

50

100

150

200 Samples

250

SPE

10

5

50

100

150

200 Samples

250

AC CE P

0

350

TE

15

0

300

400

300

350

20

0

0

50

100

150

200 Samples

250

300

350

400

0

50

100

150

200 Samples

250

300

350

400

0

50

100

150

200 Samples

250

300

350

400

0.2

0.1

0

0.5 MSE

0

D

0

SPE

MA

30

0.4 0.3 0.2

400

(d)

(c)

Figure 3: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with P = 10, (b) MKPCA with P = 10 and S = 103 , (c) FSPCA with P = 7, and (d) MFKPCA with P = 5 and S = 103 , in case A.

Eq. (37)), while ω0 = 11 in faulty batches. The observations of ten normal batches are presented as the gray points in Fig. 2, and their fitting functions are presented as the blue dashed lines. Based on the collected dataset, four methods including MPCA, MKPCA, FSPCA and MFKPCA are used to monitor the batch process respectively. For getting good monitoring performance, all these methods need to specify some parameters. MPCA and FSPCA need to select the number of PCs P 21

ACCEPTED MANUSCRIPT

PT

to retain, and MKPCA and MFKPCA need to select both the number of PCs P and the kernel parameter. Here, the radial basis kernel (Eq. (21))

RI

is mainly used due to its flexibility and hence the kernel parameter S needs

SC

to be specified. In this article, we compare the monitoring performance

NU

of four methods with different specified parameters. The number of PCs  P ∈ {2, 3, ..., 15} and the kernel parameter S ∈ 10−3 , 10−2 , ..., 103 , then

the best monitoring results of them are shown in Fig. 3. For adjusting the

MA

magnitude of the SPE statistic, each value of the kernel function is scaled by   ˜ /(I − 1) [11]. dividing it by the constant trace K

The control charts clearly show that the fault detection performance of

D

FSPCA and MFKPCA is superior to that of MPCA and MKPCA, and

TE

MFKPCA obtains the similar results with FSPCA in this case. Based on analyzing the dataset, the only difference between normal batches and faulty

AC CE P

ones relies on the shape of the trajectory of the third variable, so the random noise may flood this variation information. However, using FDA to construct the functions can reduce the bad influence of the noise to some extent. Besides, in MPCA and MKPCA, using the DTW approach to synchronize the trajectories may distort the fault patterns and reduce the difference between normal and faulty batches [41]. It is mainly because all the batch trajectories are synchronized according to the referenced batch trajectory which is trained just based on the normal batches. After synchronization, all the batches including the faulty batches will tend to have the similar trajectory shape with the referenced batch. Therefore, in this case study, the MPCA and MKPCA methods couldn’t detect out most of the faulty batches. On the contrary, the FSPCA and MFKPCA methods often do not have such

22

ACCEPTED MANUSCRIPT

60

20

Testing: normal Testing: faulty Control limit

0

0

50

100

T2

40

50

100

150

200 Samples

250

300

350

400

0.04 0.02 0

0.8 1.5 MSE

0.4 0.2 0

1 0.5 0

0

50

100

150

200 Samples

250

300

350

200 Samples

250

300

350

400

50

0

50

100

150

200 Samples

250

300

350

400

100

150

200 Samples

250

300

350

400

(b)

MA

(a)

400

0

NU

SPE

0.6

SC

0

SPE

0

150

RI

0.06

20

PT

T2

40

Figure 4: Control charts with 95% confidence limits (black dashed lines) by using (a)

D

FSPCA with P = 10, and (b) MFKPCA with P = 5 and S = 103 , in case B.

TE

distortion problem by using the FDA approach, so their control charts show much better detecting results and most of the faulty batches can be detected

AC CE P

correctly. Since there are only linear correlations, the proposed MFKPCA method presents the similar performance to FSPCA. In fact, the MFKPCA method tends to choose a large kernel width parameter S in order to mimic a linear monitoring model. Since the random noise doesn’t contain any variation information, the MSE statistic doesn’t work in this case. 4.2. Case B: variations in Gaussian noise In this case, the coefficient vectors [ai , bi ]T in all batches still follow the bivariate Gaussian distribution with mean vector [2.5, 2.5]T and covariance matrix diag {2.5, 2.5}, and the parameter ω0 = 10 for both normal batches and faulty ones. The standard deviation σε of Gaussian noise in normal batches is still equal to 0.2, however σε in faulty batches increases to 0.3 in this case. The control charts for FSPCA and MFKPCA are shown in 23

ACCEPTED MANUSCRIPT

4.5

PT

Normal batches Faulty batches 4

RI

3

SC

b

i

3.5

2.5

1.5 0.5

1

1.5

2

NU

2

2.5

3

3.5

4

4.5

5

a

i

MA

Figure 5: Distribution of the coefficient vectors [ai , bi ]T in case C.

Fig. 4. FSPCA performs not well and can’t detect most of the faults, since

D

the variation information between normal and faulty batches is filtered out

TE

after fitting the functions. However, based on the MSE statistic, MFKPCA can still accurately detect most of the faulty batches. Thus the MSE statistic

AC CE P

may complement the T 2 and SPE statistics well for fault detection. In summary, the whole information about the batch process in MFKPCA is divided into three parts: the T 2 statistic is used to measure the variation of the fitting function within the MFKPCA model, the SPE statistic is used to measure the goodness of fit of the fitting function to the MFKPCA model, and the MSE statistic is used to measure the fitting errors between the fitting function and the original observations. These three statistics complement each other in batch process monitoring. Here it is a relatively special case where the measurement noise itself contains some useful variation information.

24

ACCEPTED MANUSCRIPT

20 Testing: normal Testing: faulty Control limit T2

15

T2

40

20

5

50

100

150

200 Samples

250

300

350

0

400

300

50

1 SPE

SPE

200

0.95

100

0.9

0

50

100

150

200 Samples

250

300

350

0.85

400

100

0

50

150

200 Samples

250

300

350

400

100

150

200 Samples

250

300

350

400

NU

0

0

1.05

RI

0

SC

0

10

PT

60

(a)

(b)

15

T2

10

10 5

0

50

100

150

200 Samples

250

2

1 0.5

0

50

100

150

200 Samples

250

AC CE P

0

350

TE

SPE

1.5

300

400

D

0

300

350

5 0

0

50

100

150

200 Samples

250

300

350

400

0

50

100

150

200 Samples

250

300

350

400

0

50

100

150

200 Samples

250

300

350

400

1.5 1 0.5 0

0.2 MSE

T2

15

SPE

MA

20

0.1

0 400

(d)

(c)

Figure 6: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with P = 10, (b) MKPCA with P = 10 and S = 10, (c) FSPCA with P = 7, and (d) MFKPCA with P = 5 and S = 1, in case C.

4.3. Case C: nonlinear correlations In all batches, the parameter ω0 is specified as 10, and the standard deviation σε of Gaussian noise is specified as 0.2. However, different from cases A and B, the coefficient vectors [ai , bi ]T are distributed as Fig. 5, where the blue points represent 1000 normal batches and the red asterisks represent 200 faulty batches. It can be seen that the coefficient vectors of normal batches are nonlinearly distributed, which leads to the nonlinear correlations be25

ACCEPTED MANUSCRIPT

PT

tween some monitoring variables. Fig. 6 shows the control charts of MPCA, MKPCA, FSPCA and MFKPCA. The nonlinear correlation makes the non-

RI

linear monitoring methods achieve better monitoring results than the linear

SC

methods, and the testing accuracies of MKPCA and MFKPCA are much better than those of MPCA and FSPCA respectively. Overall, the proposed

NU

MFKPCA method achieves the highest testing accuracy of about 90%, which is much higher than the MKPCA method whose testing accuracy is about

MA

70%. This phenomenon is similar to case A that the random noise and the DTW approach may influence and distort the difference in variable trajectories between normal and faulty batches. Therefore, the MFKPCA method

TE

D

based on FDA is more suitable to describe the batch dataset. 5. Industrial application case study

AC CE P

This section uses an industrial case study to further verify the actual performance of MFKPCA in batch process monitoring. The dataset is collected from a semiconductor etch process performed on the Lam 9600 plasma etch tool, which is to etch the TiN/Al-0.5% Cu/TiN/oxide stack with an inductively coupled BCl3 /Cl2 plasma [37]. It mainly includes two important steps: main etch of the Al layer terminating at the Al endpoint (step 4) and overetch for the underlying TiN and oxide layer (step 5), and the whole dataset contains three kinds of monitoring signals: machine state variables (MSV), optical emission spectroscopy (OES) and radio-frequency monitors (RFM) [37]. Here, 19 MSVs listed in Table 1 are used to do the experiments, and the whole dataset contains 107 normal batches and 20 faulty batches (not include the batches with a lot of missing values). In our experiments, 87 nor26

ACCEPTED MANUSCRIPT

PT

Table 1: Machine state variables in the semiconductor etch process [37]. Variable

No.

Variable

1

BCl3 flow

11

RF power

2

Cl2 flow

12

RF impedance

3

RF bottom power

13

TCP tuner

4

RFB reflected power

14

TCP phase error

5

Endpoint A detection

15

6

Helium pressure

16

7

Chamber pressure

8

RF tuner

9

RF load

10

Phase error

SC

TCP impedance TCP top power

TCP reflected power

NU

17

RI

No.

TCP load

19

Vat valve

MA

18

mal batches are randomly chosen as the training dataset, and the remaining

D

20 normal batches and 20 faulty batches form the testing dataset.

TE

We construct the functions of the variable trajectories by using the parameters presented in Table 2. The trajectories of variables 5 (Endpoint A detec-

AC CE P

tion), 9 (RF load), 13 (TCP tuner) and 18 (TCP load) show obviously fluctuant features and functional nature, and hence use more B-splines than other variables. Besides, these four variables all use the integrated squared second derivatives as the roughness. The original sampling points (gray points) and their fitting functions (blue dashed lines) of the normal batches are shown in Fig. 7. For convenience, the domain is defined as T = [0, 100] here. Other variables use the constant basis to fit their trajectories. Based on the fitting functions, FSPCA and MFKPCA are respectively used to build the monitoring models on the training dataset, and their monitoring performance is tested on the testing dataset. For the traditional MPCA and MKPCA methods, the DTW approach is also used to synchronize the batch trajectories. Here, the width of band global constraint is chosen as 15, and one batch that 27

ACCEPTED MANUSCRIPT

PT

Table 2: Parameter selection in functional data representation for each variable. Variable

Order

Interior knots

5

Endpoint A detection

4

[6,6,6,9,9,9,20,30,40,40,40,48,50,58,61,68,68,68,80,90,99]

9

RF load

4

[25,50,55,60,65,70,85]

13

TCP tuner

4

[6,15,25,42,42,42,50,55,63,63,63,68,80,90,99]

10−3

18

TCP load

4

[6,6,6,15,25,42,42,42,50,55,61,61,61,68,80,90,99]

10−4

SC

Other variables

RI

No.

The constant basis

λ 10−2 1

NU

has 100 sampling times in the training dataset is chosen as the referenced batch. Then the DTW model is built based on 20 iterations. Similar to the

MA

above simulations, we also use different parameters to build different models,

D

i.e. the number of PCs P ∈ {5, 10, 15, 20, 25, 30} and the kernel parameter  S ∈ 10−3 , 10−2 , ..., 103 , and then compare their best monitoring results.

TE

The control charts in a single experiment are shown in Fig. 8 (note their different vertical scales), where the blue points represent the normal batches

AC CE P

in the testing dataset and the red asterisks represent the faulty ones. The results show that the MFKPCA method gets the best monitoring result whose detecting accuracy of the whole testing dataset is 87.5%, while the highest accuracy among all other methods is just 82.5%. It’s worth noting that, in Fig. 8(d), the MSE value of fault 12 is much larger than others. This corresponds to a particular situation where the models of the fitting functions may not fit the unknown faulty batches. Consider the contribution to the MSE value for each variable in the whole batch, CM SE,new,j =

1 Knew,j

Knew,j

X

[˜ xnew,j (tnew,j,k ) − y˜new,j,k]2

(38)

k=1

where j = 1, 2, ..., 19. In the corresponding contribution plot (Fig. 9(a)), the contribution for variable 1 (BCl3 flow) dominates the whole MSE value. This is connected with the trajectories of variable 1 that are shown in Fig. 9(b). 28

9200

3000

9150

2500

9100

9000

1000

8950

500

8900

0

0

20

40

60

80

4

2.1

0

20

40

60

80

100

60

80

100

Time

4

x 10

2.9

x 10

2.85

TCP load

MA

2.05 2 1.95

20

TE

0

40

60

80

2.8

2.75 2.7

D

1.9 1.85

8850

100

NU

Time

TCP tuner

9050

RI

1500

SC

2000

PT

3500

RF load

Endpoint A detection

ACCEPTED MANUSCRIPT

100

Time

2.65

0

20

40 Time

AC CE P

Figure 7: Original observations (gray points) and fitting functions (blue dashed lines) of variables 5 (Endpoint A detection), 9 (RF load), 13 (TCP tuner) and 18 (TCP load).

The trajectories of normal batches are almost constant and hence we use the constant basis to construct their fitting functions. This is reasonable based on the prior knowledge since we just have the information about the training dataset before we collect any other different batches. However, in fault 12, the trajectory of variable 1 (the red dashed line in Fig. 9(b)) varies a lot in the batch duration and its sampling value changes from about 725 (in step 4) to about 775 (in step 5). When the constant basis is used, the fitting error for variable 1 is very large, so is the MSE value. Therefore, using the MSE statistic here can easily detect this fault. This also illustrates that in MFKPCA introducing the MSE statistic for batch process monitor29

ACCEPTED MANUSCRIPT

20 Testing: normal Testing: faulty Control limit

15 T2

40

10

0

5

0

5

10

15

20 Samples

25

30

35

0

40

6

5

10

5

2

SPE

10

SPE

10

4

0

10

10

3

−2

0

5

10

15

20 Samples

25

30

35

10

40

10

0

5

15

20 Samples

25

30

35

40

10

15

20 Samples

25

30

35

40

NU

10

0

4

10

RI

20

SC

T2

60

PT

80

(a)

(b)

T2

400

T2

40

20

5

10

15

20 Samples

25

3

2

10

1

5

10

15

20 Samples

25

AC CE P

0

35

TE

SPE

10

10

30

40

30

35

200

0

0

5

10

15

20 Samples

25

30

35

40

0

5

10

15

20 Samples

25

30

35

40

30

35

40

2

10

0

10

−2

10

5

10

Fault 12

MSE

0

D

0

SPE

MA

60

0

10 40

0

5

10

15

20 Samples

25

(d)

(c)

Figure 8: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with P = 5, (b) MKPCA with P = 5 and S = 103 , (c) FSPCA with P = 10, and (d) MFKPCA with P = 15 and S = 103 , in the semiconductor etch process.

ing is reasonable, since the fitting errors may contain some useful variation information in variable trajectories. Considering the randomness in a single experiment, we repeat the above experiment 100 times and the average detecting results on the testing dataset including the average accuracy, false alarm rate and missing alarm rate are listed in Table 3. For each monitoring method, the results of using one single statistic (such as T 2 , SPE or MSE) and the results of using all statistics 30

ACCEPTED MANUSCRIPT

5

PT

x 10

RI

3

2

1

0

0

5

10 Variable No.

770

20

760

MA

BCl3 flow

15

NU

(a)

780

750 740

TE

0

D

730 720

SC

Contribution to MSE

4

20

40

60

80

100

Time

(b)

AC CE P

Figure 9: Illustration of fault 12: (a) the contribution to the MSE value of fault 12 for each variable, and (b) the trajectories of variable 1, where the blue lines represent the normal batches and the red dashed line represents fault 12.

simultaneously (denoted by “Combined” in Tables 3 and 4, which means that if any statistic exceeds its corresponding control limit, the testing batch is labeled as a fault) are all listed respectively. The average testing accuracy of MFKPCA is about 81.8%, which is higher than all the other methods and is about 3% higher than the suboptimal FSPCA method. Compared with the traditional MPCA and MKPCA methods, MFKPCA has a similar missing alarm rate, but has a lower false alarm rate, i.e. less normal batches are falsely predicted as faults in MFKPCA. On the contrary, the false alarm rate of MFKPCA is quite similar to that of FSPCA (their difference is less than

31

ACCEPTED MANUSCRIPT

the semiconductor etch process.

PT

Table 3: Average detecting results on the testing dataset by using different methods in Statistic

Accuracy

False alarm rate

Missing alarm rate

MPCA

T2

0.627

0.000

0.747

SPE

0.769

0.458

Combined

0.769

0.458

T2

0.532

0.000

MFKPCA

SC

0.004 0.934

SPE

0.742

0.513

0.004

Combined

0.742

0.513

0.004

T2

0.550

0.000

0.900

SPE

0.788

0.353

0.072

Combined

0.788

0.353

0.072

0.629

0.045

0.699

0.828

0.340

0.005

NU

FSPCA

0.004

MSE

MA

MKPCA

RI

Method

0.581

0.069

0.770

Combined

0.818

0.360

0.005

T2

TE

D

SPE

1%), while MFKPCA’s missing alarm rate is about 6% lower than FSPCA.

AC CE P

It’s worth noting that the “Combined” index is often more sensitive to the faults than the individual statistics. However, accordingly it also becomes easier to mislabel some normal batches as faults. For example, in Table 3, the “Combined” index in MFKPCA has a larger false alarm rate than the SPE statistic. This causes the entire testing accuracy of “Combined” (81.8%) be less than that of SPE (82.8%). Table 4 further presents the monitoring results of FSPCA and MFKPCA on the 20 faulty batches, and the best result for each fault by using the “Combined” index is marked in bold. The results show that using the proposed MFKPCA method could indeed improve the fault detection performance. In addition, as Fig. 9 shows, the MSE statistic in MFKPCA is still very sensitive to fault 12 and can always detect it out correctly. In conclusion, the proposed MFKPCA-based monitoring method

32

ACCEPTED MANUSCRIPT

conductor etch process.

MFKPCA

Fault

T2

SPE

Combined

T2

1

0

1.000

1.000

0.020

2

0

0.970

0.970

3

0

1.000

1.000

SC

FSPCA

PT

Table 4: Monitoring results of the 20 faults by using FSPCA and MFKPCA in the semi-

MSE

Combined

0

1.000

RI

SPE

1.000

0

1.000

0

1.000

0

1.000

0

1.000

1.000

1.000

1.000

1.000

1.000

1.000

1.000

0

1.000

1.000

0

1.000

0

1.000

6

0

0.950

0.950

7

0

1.000

1.000

8

0

0.720

0.720

0.950 1.000

11

0

0.830

12

0

0.150

13

0

1.000

14

0

15

0

16

0

17

MA

0 1.000

0

1.000

0

1.000

1.000

1.000

1.000

1.000

0

1.000

0

1.000

0.950

0

0.900

0.010

0.9000

1.000

1.000

1.000

1.000

1.000

0.830

0

1.000

0

1.000

0.150

0

1.000

1.000

1.000

D

9 10

NU

4 5

1.000

1.000

0

1.000

1.000

0

1.000

0

1.000

1.000

1.000

0

1.000

0

1.000

1.000

1.000

1.000

1.000

0

1.000

0

1.000

1.000

0

1.000

0

1.000

18

0

1.000

1.000

1.000

1.000

0

1.000

19

0

1.000

1.000

0.010

1.000

0

1.000

20

0

1.000

1.000

0

1.000

0.600

1.000

Average

0.100

0.929

0.929

0.302

0.995

0.231

0.995

AC CE P

TE

1.000

1.000

is more reliable for batch process monitoring. 6. Conclusions In the present article, the multivariate functional kernel principal component analysis (MFKPCA) method has been proposed for batch process monitoring by combining the functional data analysis (FDA) approach with the nonlinear principal component analysis approach. Different from the tra-

33

ACCEPTED MANUSCRIPT

PT

ditional batch process monitoring methods such as PARAFAC, Tucker models and multiway monitoring methods, MFKPCA treats each batch sample

RI

as a J-dimensional vector-valued function (or multivariate functional data),

SC

and describes the whole three-way data array (may be regular or irregular) as an I × J-dimensional funtion matrix based on FDA. Generally speaking,

NU

using the FDA approach can highlight the subtle differences in the variables’ trajectories between normal batches and faulty ones, and can easily handle

MA

the problems of unequal batch length and missing data. The kernel trick is introduced to handle the nonlinear correlations between different monitoring variables and/or sampling times in the batch process. Since the fitting errors

D

may contain some useful variation information, three complementary statis-

TE

tics, including Hotelling’s T 2 , SPE and MSE, have been used to construct the control charts for batch process monitoring. Some simulation examples

AC CE P

and an industrial application case study have been used to demonstrate the performance of the proposed MFKPCA method, and the results show that it can indeed improve the monitoring performance of the semiconductor etch process.

Acknowledgements

This work was supported by the National Key Technologies Research and Development Program of China (No.2011AA060203) and the No.2 Important National Science and Technology Specific Projects of China (No.2011ZX02504).

34

ACCEPTED MANUSCRIPT

PT

References [1] Z. Q. Ge, Z. H. Song, F. R. Gao, Review of recent research on data-based

RI

process monitoring, Ind. Eng. Chem. Res. 52 (2013) 3543–3562.

SC

[2] A. K. Smilde, D. A. Doornbos, Three-way methods for the calibration

J. Chemom. 5 (1991) 345–360.

NU

of chromatographic systems: comparing PARAFAC and three-way PLS,

Syst. 38 (1997) 149–171.

MA

[3] R. Bro, PARAFAC. Tutorial and applications, Chemom. Intell. Lab.

D

[4] J. A. Westerhuis, T. Kourti, J. F. MacGregor, Comparing alternative

TE

approaches for multivariate statistical analysis of batch process data, J. Chemom. 13 (1999) 397–413.

AC CE P

[5] D. J. Louwerse, A. K. Smilde, Multivariate statistical process control of batch processes based on three-way models, Chem. Eng. Sci. 55 (2000) 1225–1235.

¨ [6] S. Wold, P. Geladi, K. Esbensen, J. Ohman,

Multi-way principal

components- and PLS-analysis, J. Chemom. 1 (1987) 41–56. [7] P. Nomikos, J. F. MacGregor, Monitoring batch processes using multiway principal component analysis, AIChE J. 40 (1994) 1361–1375. [8] P. Nomikos, J. F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1995) 41–59. [9] P. Nomikos, J. F. MacGregor, Multi-way partial least squares in monitoring batch processes, Chemom. Intell. Lab. Syst. 30 (1995) 97–108. 35

ACCEPTED MANUSCRIPT

PT

[10] C. K. Yoo, J. M. Lee, P. A. Vanrolleghem, I. B. Lee, On-line monitoring of batch processes using multiway independent component analysis,

RI

Chemom. Intell. Lab. Syst. 71 (2004) 151–163.

SC

[11] J. M. Lee, C. Yoo, I. B. Lee, Fault detection of batch processes using multiway kernel principal component analysis, Comput. Chem. Eng. 28

NU

(2004) 1837–1847.

[12] Y. W. Zhang, S. J. Qin, Fault detection of nonlinear processes using

MA

multiway kernel independent component analysis, Ind. Eng. Chem. Res. 46 (2007) 7780–7787.

D

[13] X. M. Tian, X. L. Zhang, X. G. Deng, S. Chen, Multiway kernel inde-

TE

pendent component analysis based on feature samples for batch process

AC CE P

monitoring, Neurocomputing 72 (2009) 1584–1596. [14] C. H. Zhao, F. R. Gao, F. L. Wang, Nonlinear batch process monitoring using phase-based kernel-independent component analysis-principal component analysis (KICA-PCA), Ind. Eng. Chem. Res. 48 (2009) 9163– 9174.

[15] A. Kassidas, J. F. MacGregor, P. A. Taylor, Synchronization of batch trajectories using dynamic time warping, AIChE J. 44 (1998) 864–875. [16] J. M. Gonzal´ez-Mart´ınez, R. Vitale, O. E. de Noord, A. Ferrer, Effect of synchronization on bilinear batch process modeling, Ind. Eng. Chem. Res. 53 (2014) 4339–4351. ¨ [17] C. Undey, A. Cinar, Statistical monitoring of multistage, multiphase batch processes, IEEE Contr. Syst. Mag. 22 (2002) 40–52. 36

ACCEPTED MANUSCRIPT

PT

[18] H. Sakoe, S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Trans. Acoust. Speech Signal Process.

RI

26 (1978) 43–49.

SC

[19] J. M. Gonzal´ez-Mart´ınez, A. Ferrer, J. A. Westerhuis, Real-time synchronization of batch trajectories for on-line multivariate statistical pro-

NU

cess control using dynamic time warping, Chemom. Intell. Lab. Syst. 105 (2011) 195–206.

MA

[20] J. H. Chen, J. L. Liu, Derivation of function space analysis based PCA control charts for batch process monitoring, Chem. Eng. Sci. 56 (2001)

D

3289–3304.

TE

[21] J. O. Ramsay, B. W. Silverman, Functional Data Analysis, 2nd ed.,

AC CE P

Springer, New York, 2005. [22] F. Ferraty, P. Vieu, Nonparametric Functional Data Analysis: Theory and Practice, Springer, New York, 2006. [23] F. Yao, H. G. M¨ uller, J. L. Wang, Functional data analysis for sparse longitudinal data, J. Am. Stat. Assoc. 100 (2005) 577–590. [24] A. Cuevas, A partial overview of the theory of statistics with functional data, J. Stat. Plan. Infer. 147 (2014) 1–23. [25] G. Biau, F. Bunea, M. H. Wegkamp, Functional classification in Hilbert spaces, IEEE Trans. Inf. Theory 51 (2005) 2163–2172. [26] F. Rossi, N. Villa, Support vector machine for functional data classification, Neurocomputing 69 (2006) 730–742. 37

ACCEPTED MANUSCRIPT

PT

[27] A. Delaigle, P. Hall, Achieving near perfect classification for functional data, J. R. Stat. Soc. Ser. B: Stat. Methodol. 74 (2012) 267–286.

RI

[28] G. M. James, C. A. Sugar, Clustering for sparsely sampled functional

SC

data, J. Am. Stat. Assoc. 98 (2003) 397–408.

NU

[29] J. Jacques, C. Preda, Model-based clustering for multivariate functional data, Comput. Stat. Data Anal. 71 (2014) 92–106.

MA

[30] J. M. Torres, P. J. G. Nieto, L. Alejano, A. N. Reyes, Detection of outliers in gas emissions from urban areas using functional data analysis,

D

J. Hazard. Mater. 186 (2011) 144–149.

TE

[31] G. Yu, C. L. Zou, Z. J. Wang, Outlier detection in functional observations with applications to profile monitoring, Technometrics 54 (2012)

AC CE P

308–318.

[32] D. A. Clifton, L. Clifton, S. Hugueny, D. Wong, L. Tarassenko, An extreme function theory for novelty detection, IEEE J. Sel. Top. Signal Process. 7 (2013) 28–37. [33] J. O. Ramsay, C. J. Dalzell, Some tools for functional data analysis, J. R. Stat. Soc. Ser. B: Methodol. 53 (1991) 539–572. [34] G. H. Golub, M. Health, G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21 (1979) 215–223. [35] B. Sch¨olkopf, A. Smola, K. R. M¨ uller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (1998) 1299–1319. 38

ACCEPTED MANUSCRIPT

PT

[36] H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol. 24 (1933) 417–441.

RI

[37] B. M. Wise, N. B. Gallagher, S. W. Butler, D. D. White, G. G. Barna,

SC

A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for

NU

fault detection in a semiconductor etch process, J. Chemom. 13 (1999) 379–396.

MA

[38] G. E. P. Box, Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the

D

one-way classification, Ann. Math. Stat. 25 (1954) 290–302.

TE

[39] Q. Chen, U. Kruger, M. Meronk, A. Y. T. Leung, Synthesis of T 2 and Q

AC CE P

statistics for process monitoring, Contr. Eng. Pract. 12 (2004) 745–755. [40] H. H. Yue, S. J. Qin, Reconstruction-based fault identification using a combined index, Ind. Eng. Chem. Res. 40 (2001) 4403–4414. [41] Y. Yao, F. R. Gao, A survey on multistage/multiphase statistical modeling methods for batch processes, Annu. Rev. Contr. 33 (2009) 172–183.

39

ACCEPTED MANUSCRIPT Figures Captions

PT

Figure 1: Illustration of functional data representation of batch process data: (a) irregular three-way array, and (b) the corresponding function matrix.

RI

Figure 2: Original observations (gray points) and fitting functions (blue dashed lines) of ten

SC

batches in case A.

Figure 3: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with

NU

P = 10 , (b) MKPCA with P = 10 and S = 103 , (c) FSPCA with P = 7 , and (d) 3

MFKPCA with P = 5 and S = 10 , in case A.

MA

Figure 4: Control charts with 95% confidence limits (black dashed lines) by using (a) FSPCA 3

with P = 10 , and (b) MFKPCA with P = 5 and S = 10 , in case B. T

[ ai , bi ]

D

Figure 5: Distribution of the coefficient vectors

in case C.

TE

Figure 6: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with

AC CE P

P = 10 , (b) MKPCA with P = 10 and S = 10 , (c) FSPCA with P = 7 , and (d) MFKPCA with P = 5 and S = 1 , in case C.

Figure 7: Original observations (gray points) and fitting functions (blue dashed lines) of variables 5 (Endpoint A detection), 9 (RF load), 13 (TCP tuner) and 18 (TCP load).

Figure 8: Control charts with 95% confidence limits (black dashed lines) by using (a) MPCA with

P = 5 , (b) MKPCA with P = 5 and S = 103 , (c) FSPCA with P = 10 , and (d) 3

MFKPCA with P = 15 and S = 10 , in the semiconductor etch process. Figure 9: Illustration of fault 12: (a) the contribution to the MSE value of fault 12 for each variable, and (b) the trajectories of variable 1, where the blue lines represent the normal batches and the red dashed line represents fault 12.

ACCEPTED MANUSCRIPT Highlights

TE

D

MA

NU

SC

RI

PT

Traditional KPCA method is combined with the approach of functional data analysis. The proposed MFKPCA method is used for batch process monitoring. Three complementary statistics (T2, SPE and MSE) are used for fault detection.

AC CE P

l l l