Correlation-based spectral clustering for flexible process monitoring


Journal of Process Control 21 (2011) 1438–1448


Koichi Fujiwara*, Manabu Kano, Shinji Hasebe

Department of Chemical Engineering, Kyoto University, Nishikyo-ku, Kyoto 615-8510, Japan

Article history: Received 24 December 2010; Received in revised form 22 June 2011; Accepted 22 June 2011; Available online 30 August 2011

Keywords: Soft-sensor; Statistical process monitoring; Spectral clustering; Batch process; Nearest correlation method; Fault detection; Estimation

Abstract

The individuality of production devices should be taken into account when statistical models are designed for parallelized devices. In the present work, a new clustering method, referred to as NC-spectral clustering, is proposed for discriminating the individuality of production devices. The key idea is to classify samples according to differences in the correlation among measured variables, since the individuality of production devices is expressed by this correlation. In the proposed NC-spectral clustering, the nearest correlation (NC) method and spectral clustering are integrated: the NC method generates a weighted graph that expresses the correlation-based similarities between samples, and the constructed graph is partitioned by spectral clustering. A new statistical process monitoring method and a new soft-sensor design method are proposed on the basis of NC-spectral clustering. The usefulness of the proposed methods is demonstrated through a numerical example and a case study of parallelized batch processes.

1. Introduction

In order to operate manufacturing processes efficiently, it is crucial to estimate product quality from operating conditions and to detect faults or malfunctions before they lead to undesirable operation. The first function is realized by developing a soft-sensor, which is a mathematical model that represents the relationship between operating conditions and product quality. The second function is realized by a process monitoring system. Although the most desirable approach to realizing these functions would be the use of precise first-principle models, modeling a complex industrial process is very difficult and time-consuming. Another option is to construct process models from operation data, i.e., a statistical modeling approach. In the last two decades, such statistical models have been widely accepted for process control and monitoring. In the steel industry, for example, many companies have built integrated databases to store the huge amounts of operation data and product property data obtained from all factories. These data are used to construct statistical models for quality prediction and quality design [1,2].

Soft-sensors have been widely used in various processes. Partial least squares (PLS) and artificial neural networks (ANN) have been accepted as useful techniques for soft-sensor design [3–7]. More recently, the application of subspace identification (SSID) to

* Corresponding author. Tel.: +81 774 93 5350; fax: +81 774 93 5155. E-mail address: [email protected] (K. Fujiwara). doi:10.1016/j.jprocont.2011.06.023. © 2011 Elsevier Ltd. All rights reserved.


soft-sensor design has been reported [8,9]. In the Japanese chemical industry, more than 400 soft-sensors have been implemented; 75% of them are used in distillation processes and 25% in reaction processes including polymerization. Multiple linear regression is dominant, PLS is the second most common, and ANN is used in very few cases [10].

Multivariate statistical process control (MSPC) has been used for process monitoring. The conventional MSPC uses the T² and Q statistics derived through principal component analysis (PCA) [11–13]. Another method, based on independent component analysis (ICA), has been investigated to improve fault detection performance [14].

Even if a good statistical model can be developed successfully, its performance deteriorates as process characteristics change. In chemical processes, process characteristics are changed by catalyst deactivation, scale adhesion, and so on. In semiconductor manufacturing processes, periodic cleaning of equipment changes the process characteristics drastically. Therefore, maintenance of statistical models is very important for keeping up their performance. Although the models should be updated as process characteristics change, repeated manual reconstruction should be avoided due to the heavy workload [10]. To construct soft-sensors that can cope with changes in process characteristics and to update statistical models automatically, recursive PLS [15] and Just-In-Time (JIT) modeling [16–18] have been proposed. Recursive PLS updates a PLS model recursively, and JIT modeling builds a local model from neighbor samples around the query only when an estimate is required. In addition, several JIT-based process monitoring methods have been proposed [19,20].


A new JIT modeling method, referred to as Correlation-based JIT (CoJIT) modeling, has been proposed [21]. Since changes in process characteristics appear as differences in the correlation among variables, CoJIT modeling builds a local model from samples whose correlation can properly describe the query.

In addition, the individuality of production devices should be taken into account. In semiconductor processes, for example, tens of parallelized production devices are used, and they have different characteristics even if their catalog specifications are the same. In such a case, a statistical model developed for one device is not always applicable to another device, and it is very laborious to construct a statistical model for each device. Therefore, a practical modeling method that can cope with the individuality of production devices should be developed. If there are some devices whose characteristics are similar to each other, the same statistical model may be applicable to them. That is, to cope with the individuality of production devices and to reduce the number of statistical models, it is useful to cluster operation data of parallelized processes into a few classes according to their characteristics and to construct a statistical model for each class.

A well-known clustering method is the k-means method. Although it can cluster samples on the basis of distance, it does not take into account the correlation among variables. In order to cope with nonlinearity, it can be integrated with kernel PCA (KPCA) [22]. In the KPCA-based k-means method, KPCA is used to capture nonlinearity among variables and its scores are clustered by the k-means method. In addition, manifold learning methods, such as Isomap and locally linear embedding (LLE), have been proposed for finding and visualizing low-dimensional nonlinear structures in high-dimensional data [23,24]. However, these methods do not always give clear boundaries between clusters.
On the other hand, a method that divides a process operation region into multiple small polyhedral regions and builds a local model in each region has been proposed. In such a method, a process is expressed by a combination of local models; a piecewise affine (PWA) model is an example [25]. However, it is difficult for the PWA model to express the correlation among variables appropriately, since the operation region is divided into polyhedrons.

A new method for clustering samples according to their correlation therefore has to be developed. Since the correlation among variables can be expressed as affine subspaces, samples that are on the same affine subspace should be classified into the same class. To detect such samples, the correlation among all possible triples of samples can be evaluated. The correlation coefficient between samples x_j and x_k when x_i is taken as the origin is written as

C^{{i}}_{j,k} = (x_j − x_i)^T (x_k − x_i) / (‖x_j − x_i‖ ‖x_k − x_i‖)   (1)

When C^{{i}}_{j,k} is close to ±1, it is highly possible that samples x_i, x_j and x_k are on the same affine subspace. Therefore, the objective function of the clustering method is as follows:

max_{{K_l}} J = Σ_{l=1}^{P} Σ_{i,j,k ∈ K_l, i≠j≠k} |C^{{i}}_{j,k}|   (2)

where K_l (l = 1, . . ., P) is the index of classes. Since the maximization problem of Eq. (2) is hard to solve, a heuristic clustering algorithm is proposed in the present work. The proposed method, referred to as NC-spectral clustering, consists of the nearest correlation (NC) method that can detect samples


whose correlation is similar to the query [26], and spectral clustering, which can partition a weighted graph [27,28]. In addition, a new statistical process monitoring method and a new soft-sensor design method based on NC-spectral clustering are developed. The proposed methods are validated through a case study of parallelized batch processes.

2. Spectral clustering

Spectral clustering [27,28] is a clustering method based on graph theory. It can partition a weighted graph, whose weights express affinities between nodes, into subgraphs by cutting some of its arcs. Although several spectral clustering algorithms have been proposed, the Max–Min Cut (Mcut) method [27] is described in this section.

Given a weighted graph G and its affinity matrix (adjacency matrix) W, G is partitioned into two subgraphs A and B. The affinity between A and B is defined as

cut(A, B) ≡ W(A, B)   (3)

W(A, B) = Σ_{u∈A, v∈B} W_{u,v}   (4)

W(A) ≡ W(A, A)   (5)

where u and v denote nodes of subgraphs A and B, respectively. That is, the affinity between subgraphs cut(A, B) is the sum of the weights of the arcs between the subgraphs. The Mcut method searches for subgraphs A and B that minimize cut(A, B) and maximize W(A) and W(B) simultaneously. The objective function of the Mcut method is as follows:

min J = cut(A, B)/W(A) + cut(A, B)/W(B)   (6)

Since the indexes of nodes are interchangeable, the affinity matrix W can be written as

W = [ W_A      W_{A,B} ]
    [ W_{B,A}  W_B     ]   (7)

where W_A and W_B are the affinity matrices within subgraphs A and B, respectively, and W_{A,B} and W_{B,A} are the affinity matrices between A and B. The affinities are described by using vectors x = [1, . . ., 1, 0, . . ., 0]^T and y = [0, . . ., 0, 1, . . ., 1]^T that express the partition into subgraphs:

W(A, B) = x^T (D − W) x = y^T (D − W) y   (8)

W(A) = x^T W x   (9)

W(B) = y^T W y   (10)

where D = diag(We) and e = [1, . . ., 1]^T. Using these equations, Eq. (6) can be written as

min_{x,y} J = x^T (D − W) x / (x^T W x) + y^T (D − W) y / (y^T W y)   (11)

Only the first term of Eq. (11) needs to be analyzed. The minimization problem of Eq. (11) is hard to solve, since the optimization variables x and y are binary. Therefore, these binary variables should be relaxed. The new index vector q = {a, −b} (a, b > 0) is introduced:

q_u = a (u ∈ A),  q_u = −b (u ∈ B)   (12)

where q_u is the uth element of q. Finally, the problem can be written as

min_q J_q = q^T (D − W) q / (q^T W q)   (13)
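The relaxed problem of Eq. (13) reduces to an eigenvalue problem, as derived next; the following NumPy sketch shows the resulting two-way Mcut partition on a small illustrative graph (the function name and example weights are not from the paper):

```python
import numpy as np

def mcut_bipartition(W):
    """Two-way Mcut partition from an affinity matrix W.

    Solves (I - D^{-1/2} W D^{-1/2}) z = lambda z and splits the nodes by the
    sign of q = D^{-1/2} z2, where z2 corresponds to the second smallest lambda.
    """
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    _, V = np.linalg.eigh(L)          # eigh returns ascending eigenvalues
    q = d_isqrt * V[:, 1]             # relaxed index vector q* = D^{-1/2} z2
    return q >= 0                     # boolean class labels

# A triangle {0, 1, 2} weakly linked (weight 1) to the pair {3, 4}:
W = np.array([[0, 2, 2, 0, 0],
              [2, 0, 2, 0, 1],
              [2, 2, 0, 0, 0],
              [0, 0, 0, 0, 2],
              [0, 1, 0, 2, 0]], dtype=float)
labels = mcut_bipartition(W)          # separates {0, 1, 2} from {3, 4}
```

The weak arc between nodes 1 and 4 is the one the Mcut objective prefers to cut, since it minimizes cut(A, B) while both subgraphs keep large internal affinity.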


Fig. 1. An example of the procedure of the NC method: the affine subspace P in the original space (left) and the linear subspace V in the translated space (right).

This minimization problem reduces to the eigenvalue problem

(I − D^{−1/2} W D^{−1/2}) z = λz   (14)

The solution q* of Eq. (13) is expressed as q* = D^{−1/2} z_2, where z_2 is the eigenvector corresponding to the second smallest eigenvalue λ_2. The detailed analysis of the Mcut method is described in Appendix A. With the above procedure, the Mcut method needs no parameter tuning and its calculation is fast. Since the Mcut method partitions a weighted graph into two subgraphs, the procedure described above is repeated when three or more subgraphs are needed.

In spectral clustering, the definition of an affinity is arbitrary and affects the results. Ng et al. defined the affinity between samples s_i and s_j by using the Gaussian kernel [28]:

(W)_{i,j} = exp(−d²(s_i, s_j) / 2σ²)   (15)

where d(·, ·) is a distance function and σ is a tuning parameter.

3. NC-spectral clustering

In the present work, a new clustering method based on the correlation among variables is proposed. In the proposed method, instead of solving Eq. (2), the correlation-based affinities between samples are calculated by using the nearest correlation (NC) method to construct a weighted graph, and the constructed weighted graph is partitioned by spectral clustering. This method is referred to as NC-spectral clustering.

3.1. Nearest correlation method

The NC method can detect samples whose correlation is similar to the query without any teacher signals [26]. The concept of the NC method is as follows. Suppose that the affine subspace P in Fig. 1 (left) shows the correlation among variables and all the samples on P have the same correlation. Although x_1, x_2, . . ., x_5 have the same correlation, x_6 and x_7 have a different correlation from the others. The NC method aims to detect samples whose correlation is similar to the query x_1; in this example, x_2, . . ., x_5 on P should be detected.

First, the whole space is translated so that the query becomes the origin, as shown in Fig. 1 (right). That is, x_1 is subtracted from all other samples x_i (i = 2, . . ., 7). Since the translated affine subspace contains the origin, it becomes the linear subspace V. Next, a line connecting each sample and the origin is drawn, and it is checked whether another sample can be found on this line. In this case, the pairs x_2–x_5 and x_3–x_4 satisfy such a relationship, and the correlation coefficients of these pairs must be 1 or −1. On the other hand, x_6 and x_7, which are not elements of V, cannot make such pairs. The pairs whose correlation coefficients are ±1 are thought to have the same correlation as x_1. In practice, a threshold γ (0 < γ ≤ 1) on the correlation coefficient has to be used, since there are no pairs whose correlation coefficient is strictly ±1; the pairs are selected when the absolute values of their correlation coefficients are larger than γ. The pairs whose correlation is similar to the query can be detected through the above procedure.

3.2. NC-spectral clustering

The correlation-based affinity matrix for spectral clustering can be constructed by using the NC method. Assume that samples x_n ∈ R^M (n = 1, 2, . . ., N) are stored in the database. The procedure of the proposed NC-spectral clustering is as follows:

Step 1. Set the zero matrix S ∈ R^{N×N}, γ (0 < γ ≤ 1) and L = 1.
Step 2. Set the zero matrix S^L ∈ R^{N×N}.
Step 3. x′_n = x_n − x_L for n = 1, 2, . . ., N (n ≠ L).
Step 4. Calculate the correlation coefficients C_{k,l} between all possible pairs x′_k and x′_l (k ≠ l).
Step 5. Select all the pairs (k̃, l̃) satisfying |C_{k̃,l̃}| ≥ γ.
Step 6. (S^L)_{k̃,l̃} = (S^L)_{l̃,k̃} = 1.
Step 7. S = S + S^L.
Step 8. If L = N, output S as the affinity matrix. Otherwise, set L = L + 1 and return to Step 2.
Step 9. Partition the graph expressed by S through spectral clustering.
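The nine steps above can be sketched as follows. This is a compact NumPy illustration, not the authors' code: the two-way split below handles only two classes (the paper repartitions for more), a median split of q is used as a robust stand-in for the sign split, and the test data, including a ninth sample that is collinear with one point of each line by chance, are illustrative:

```python
import numpy as np

def nc_affinity(X, gamma=0.999):
    """Steps 1-8: build the NC-method affinity matrix S for samples X (N x M)."""
    N = len(X)
    S = np.zeros((N, N))
    for q in range(N):                            # each sample becomes the query
        Xt = np.delete(X, q, axis=0) - X[q]       # Step 3: translate query to origin
        norms = np.linalg.norm(Xt, axis=1)
        C = (Xt @ Xt.T) / np.outer(norms, norms)  # Step 4: all pair correlations
        idx = np.delete(np.arange(N), q)
        rows, cols = np.where(np.triu(np.abs(C) >= gamma, k=1))   # Step 5
        S[idx[rows], idx[cols]] += 1              # Steps 6-7
        S[idx[cols], idx[rows]] += 1
    return S

def spectral_two_way(S):
    """Step 9 for two classes: split at the median of q = D^{-1/2} z2."""
    d = S.sum(axis=1)
    d_isqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    L = np.eye(len(S)) - d_isqrt[:, None] * S * d_isqrt[None, :]
    _, V = np.linalg.eigh(L)
    q = d_isqrt * V[:, 1]
    return q > np.median(q)

# Four samples on the line [1, 2], four on [2, 1], one chance-collinear point:
X = np.array([[t, 2 * t] for t in range(1, 5)]
             + [[2 * t, t] for t in range(1, 5)]
             + [[3.0, 0.0]], dtype=float)
S = nc_affinity(X)
labels = spectral_two_way(S)
```

As in the illustrative example of Section 3.3, chance-collinear triples contribute only small weights to S, so the spectral partition still recovers the two correlation-based classes.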

In the above procedure, Steps 3–5 correspond to the NC method. The number of classes can be determined on the basis of physical knowledge of the process, such as the number of product grades or production devices. However, it has to be determined by trial and error when such physical knowledge is not available.

3.3. Illustrative example

The detailed function of the proposed method is illustrated through a simple example. An objective data set consists of nine samples x_1, x_2, . . ., x_9 ∈ R²; x_1, . . ., x_4 and x_5, . . ., x_8 are on the lines l and k, respectively, as shown in Fig. 2, while x_9 is an outlier. That is, the data set consists of three classes {x_1, . . ., x_4}, {x_5, . . ., x_8} and {x_9}. In addition, samples x_2, x_7 and x_9 are lined up by chance.

Fig. 2. An objective data set.

First, the zero matrix S ∈ R^{9×9} is defined, and the correlations of all possible pairs of samples are checked by the NC method. For example, when x_1 is the query, x_2–x_3, x_2–x_4 and x_3–x_4 are detected as pairs whose correlation can describe x_1; (S)_{2,3} = (S)_{3,2} = 1, (S)_{2,4} = (S)_{4,2} = 1 and (S)_{3,4} = (S)_{4,3} = 1. In the same way, since x_1–x_3, x_1–x_4, x_3–x_4 and x_7–x_9 are detected when x_2 is the query, one is added to the elements of S corresponding to these pairs. At this time, (S)_{3,4} = (S)_{4,3} = 2 because the pair x_3–x_4 is detected again. This procedure is repeated until all samples have become the query. Finally, the affinity matrix S becomes

    ⎡ 0 2 2 2 0 0 0 0 0 ⎤
    ⎢ 2 0 2 2 0 0 1 0 1 ⎥
    ⎢ 2 2 0 2 0 0 0 0 0 ⎥
    ⎢ 2 2 2 0 0 0 0 0 0 ⎥
S = ⎢ 0 0 0 0 0 2 2 2 0 ⎥   (16)
    ⎢ 0 0 0 0 2 0 2 2 0 ⎥
    ⎢ 0 1 0 0 2 2 0 2 1 ⎥
    ⎢ 0 0 0 0 2 2 2 0 0 ⎥
    ⎣ 0 1 0 0 0 0 1 0 0 ⎦

In S, (S)_{2,7} = (S)_{7,2} = 1, (S)_{2,9} = (S)_{9,2} = 1 and (S)_{7,9} = (S)_{9,7} = 1, since samples x_2, x_7 and x_9 are arranged in line by chance and the pairs of these samples are detected by the NC method. However, the weights of these pairs in S are smaller than those of the pairs that have the true correlation.

Fig. 3 shows an example of the graph expression of the calculated affinity matrix S; the length of each arc is roughly inversely proportional to its weight. By partitioning this graph using spectral clustering, the nodes are classified into {x_1, . . ., x_4}, {x_5, . . ., x_8} and {x_9}. This example clearly shows that the proposed NC-spectral clustering can classify samples on the basis of the correlation among variables.

Fig. 3. A weighted graph expressing the affinity matrix S.

3.4. Numerical examples

The discrimination performance of the proposed NC-spectral clustering is compared with that of the k-means method, spectral clustering and the KPCA-based k-means method through two-dimensional numerical examples. The first objective data set consists of three classes that have different correlation and intersect at the origin. One hundred samples in each of the three classes are generated by

x = s a_i + n   (i = 1, 2, 3)   (17)

where s ∼ N(0, 1), the measurement noise is n = [n_1 n_2]^T with n_j ∼ N(0, 0.01), and N(m, σ) denotes the normal distribution with mean m and standard deviation σ. The coefficient vectors a_i ∈ R² are a_1 = [1 2]^T, a_2 = [2 2]^T and a_3 = [2 1]^T. Before clustering, the generated samples are standardized.

In the conventional spectral clustering, the affinities between samples are defined using the Gaussian kernel in Eq. (15), where the Euclidean distance is used as the distance function d(·, ·) and σ = 1. In the KPCA-based k-means method, the Gaussian kernel in Eq. (15) is used as the kernel function, d(·, ·) is the Euclidean distance, and σ = 0.1, 0.5, 1.0 and 2.0. In the proposed method, the parameter of the NC method is γ = 0.999. These parameters were determined by trial and error.

The generated samples and the clustering results of the k-means method, spectral clustering and the proposed NC-spectral clustering are shown in Fig. 4, and those of the KPCA-based k-means method are shown in Fig. 5. The conventional distance-based methods and the nonlinear clustering method cannot classify the samples correctly. The proposed NC-spectral clustering can classify the samples accurately in most regions except around the origin.

Fig. 4. Classification results of the 2-dimensional example when three classes intersect at the origin: true classes (top-left), the k-means method (top-right), spectral clustering (bottom-left) and NC-spectral clustering (bottom-right).

To evaluate the effect of the parameter γ on the clustering results, the samples are also classified using NC-spectral clustering with different γ. Fig. 6 shows the clustering results with γ = 0.998, 0.995, 0.99 and 0.98. The clustering performance deteriorates as γ becomes small, and the clustering result does not change much once γ is less than 0.99. This example shows that the parameter γ should be close to one, although the optimal value depends on the variance of the measurement noise.

In addition, another case, in which the three classes are located separately from each other and the samples are contaminated by large measurement noise, is investigated. One hundred samples in each of the three classes are generated by

x = s a_i + b_i + n_i   (i = 1, 2, 3)   (18)

where s and a_i are the same as in Eq. (17). The biases b_i ∈ R² are b_1 = [0 0.4]^T, b_2 = [−0.5 0]^T and b_3 = [0.5 0.4]^T, and the measurement noise is n_i = [n_1 n_2]^T with n_j ∼ N(0, σ_i), where σ_1 = 0.02, σ_2 = 0.03 and σ_3 = 0.04. Before clustering, the generated samples are standardized.

The generated samples and the clustering results of the k-means method, spectral clustering and the proposed NC-spectral clustering are shown in Fig. 7. The settings of spectral clustering and NC-spectral clustering are the same as those in the previous example. In Fig. 7, every clustering method can discriminate the samples accurately in most regions, and the clustering results are similar to each other. This example shows that the proposed NC-spectral clustering functions as successfully as the conventional distance-based methods even when the classes are located separately from each other.
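The first data set (Eq. (17)) can be generated as follows; a NumPy sketch in which the seed and the z-score standardization are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [2.0, 2.0], [2.0, 1.0]])   # a_1, a_2, a_3

samples, labels = [], []
for i, a in enumerate(A):
    s = rng.standard_normal(100)                     # s ~ N(0, 1)
    n = 0.01 * rng.standard_normal((100, 2))         # n_j ~ N(0, 0.01)
    samples.append(np.outer(s, a) + n)               # x = s a_i + n, Eq. (17)
    labels += [i] * 100

X = np.vstack(samples)
X = (X - X.mean(axis=0)) / X.std(axis=0)             # standardization
```

The three classes share the origin and differ only in the direction a_i, which is exactly the situation where distance-based clustering fails and correlation-based clustering is needed.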

Fig. 5. Classification results of the 2-dimensional example by the KPCA-based k-means method (Gaussian kernel): σ = 0.1 (top-left), σ = 0.5 (top-right), σ = 1.0 (bottom-left) and σ = 2.0 (bottom-right).

4. Process monitoring that can cope with the individuality of devices

In this section, the conventional MSPC is briefly explained and a new statistical process monitoring method considering the individuality of devices is proposed.

4.1. Conventional process monitoring method

In the conventional MSPC, the T² and Q statistics are used for process monitoring; they are derived from PCA, which is a tool for data compression and information extraction [11]. PCA finds linear combinations of variables that describe major trends in a data set. In PCA, a loading matrix V_R ∈ R^{M×R} is derived as the right singular matrix of a data matrix X ∈ R^{N×M} whose ith row is x_i^T. Here, M, N and R (≤ M) denote the numbers of variables, samples and principal components retained in the PCA model, respectively. All variables are mean-centered and appropriately scaled. The score is a projection of X onto the subspace spanned by the principal components, which is the column space of V_R. The score matrix T_R ∈ R^{N×R} is given by

T_R = X V_R   (19)

The data matrix X can be reconstructed, or estimated, from T_R by the linear transformation V_R:

X̂ = T_R V_R^T = X V_R V_R^T   (20)

The information lost through the dimensional compression, that is, the error, is written as

E = X − X̂ = X(I − V_R V_R^T)   (21)

Using these relationships, the T² and Q statistics of a sample x are defined as follows:

T² = Σ_{r=1}^{R} t_r² / σ_{t_r}²   (22)

Q = Σ_{m=1}^{M} (x_m − x̂_m)²   (23)

where σ_{t_r} denotes the standard deviation of the rth score t_r, and x_m and x̂_m are the mth element of x and its estimated value, respectively. The T² statistic expresses the normalized distance from the origin in the subspace spanned by the principal components; when T² is small, the sample is close to the mean of the modeling data. The Q statistic is the distance between the sample and the subspace spanned by the principal components.

To detect process faults, a reference model representing the normal operating condition is constructed, and both the T² and Q statistics of the current measured sample are monitored. If either of them is outside its control limit, the process is judged to be out of control.
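The statistics of Eqs. (19)–(23) can be computed as follows (a NumPy sketch; the example data and the retained dimension R = 1 are illustrative):

```python
import numpy as np

def pca_monitor(X, R):
    """Fit a PCA reference model on X (rows mean-centered); return a
    function mapping a sample x to its (T2, Q) statistics."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_R = Vt[:R].T                        # loading matrix, M x R
    T_R = X @ V_R                         # scores, Eq. (19)
    score_std = T_R.std(axis=0, ddof=1)   # sigma_{t_r} in Eq. (22)

    def stats(x):
        t = V_R.T @ x
        t2 = float(np.sum((t / score_std) ** 2))   # Eq. (22)
        residual = x - V_R @ t                     # x - x_hat, cf. Eq. (21)
        return t2, float(residual @ residual)      # Q, Eq. (23)

    return stats

# Reference data with one dominant latent direction plus small noise:
rng = np.random.default_rng(1)
X = np.outer(rng.standard_normal(200), [1.0, 2.0, 2.0]) \
    + 0.05 * rng.standard_normal((200, 3))
stats = pca_monitor(X - X.mean(axis=0), R=1)

t2_normal, q_normal = stats(0.5 * np.array([1.0, 2.0, 2.0]))  # follows the model
t2_fault, q_fault = stats(np.array([2.0, -1.0, 0.0]))         # breaks the correlation
```

A sample consistent with the modeled correlation yields a small Q, while a sample violating the correlation yields a large Q even if its magnitude is moderate.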

4.2. Process monitoring based on NC-spectral clustering

Although many successful applications have shown the practicability of MSPC, it is difficult to apply MSPC to parallelized production devices that may have different correlation among variables. NC-spectral clustering can be used for capturing the individuality of production devices, i.e., differences of correlation. In the present work, a new process monitoring method, referred to as NC-MSPC, is proposed by integrating MSPC and NC-spectral clustering.

Fig. 8 shows the difference between the normal operating conditions (NOC) of MSPC and NC-MSPC. The samples consist of two groups that have different correlation, representing the individuality of two devices; the ellipses show the NOC defined by the control limit of the T² statistic. A sample located midway between the two groups should be detected as a fault, because it is away from both groups. However, MSPC cannot detect this sample, because its NOC contains the region between the two groups, as shown in Fig. 8 (left). On the other hand, NC-MSPC can detect it correctly, because the NOC is determined more accurately than in MSPC, as shown in Fig. 8 (right). In the proposed method, the operation data are clustered by using NC-spectral clustering and the reference model of each class is derived by PCA. A fault is detected, in the same way as in MSPC, only when the T² or Q statistic is out of control.

Assume that samples x_n ∈ R^M (n = 1, 2, . . ., N) that represent the normal operating condition are stored in the database. The proposed process monitoring method is as follows:

1. Classify the samples x_n into P classes using NC-spectral clustering, and set Ω_j = {n | x_n ∈ K_j} (j = 1, 2, . . ., P).
2. Apply PCA to x_i (i ∈ Ω_j) to derive the reference models V_j.
3. Determine the control limits of the T² and Q statistics: T̄_j² and Q̄_j.
4. When a new sample x̃ is measured, calculate the T² and Q statistics, T_j² and Q_j, between x̃ and every V_j.
5. Detect a fault if T_j² > T̄_j² for all j or Q_j > Q̄_j for all j.

In step 3, to make the control limits represent (100 − α)% confidence limits (0 < α < 100), T̄_j² and Q̄_j can be determined so that α% of the normal samples in class K_j are detected as faults.

5. Soft-sensor design that can cope with the individuality of devices

Similarly to the process monitoring method described in the previous section, operation data need to be clustered according to their characteristics to construct soft-sensors that can cope with the individuality of production devices. In this section, a new soft-sensor design method based on NC-spectral clustering is proposed; the operation data are clustered by using NC-spectral clustering and a model is constructed for each class.


Fig. 6. Classification results of NC-spectral clustering with different parameters γ: γ = 0.998 (top-left), 0.995 (top-right), 0.99 (bottom-left) and 0.98 (bottom-right).

Assume that input samples x_n ∈ R^M and output samples y_n ∈ R^L (n = 1, 2, . . ., N) are stored in the database. The proposed soft-sensor design procedure is as follows:

1. Classify the input samples x_n into P classes using NC-spectral clustering, and set Ω_j = {n | x_n ∈ K_j} (j = 1, 2, . . ., P).
2. Construct models f_j : X → Y from x_i and y_i (i ∈ Ω_j) for K_j, where X is the set of inputs and Y is that of outputs.
3. Classify a new sample x̃ into class K_j̃ = h(x̃), where h : X → K is a classifier and K is the set of classes.
4. Calculate the estimate ŷ = f_j̃(x̃).

In the above algorithm, any modeling method can be used for building a local model f. In the present work, PLS is used to cope with the collinearity problem. The Q statistic can be used as the evaluation function of the classifier h in step 3. As described in the previous section, Q is the distance between the sample and the subspace spanned by the principal components; in other words, Q is a measure of dissimilarity between the sample and the modeling data from the viewpoint of the correlation among variables. Therefore, the class that minimizes the Q statistic of x should be selected as the class of x. From Eq. (23), the Q statistic can be rewritten as

Q = e^T e   (24)
  = x^T (I − V_R V_R^T) x   (25)

where e is the error of PCA. The classifier h is then described as

h(x) = argmin_{K_j} Q_j   (26)

Q_j = x^T (I − V_R^{{j}} V_R^{{j}T}) x   (27)

where V_R^{{j}} is the loading matrix derived from the matrix X^{{j}} whose rows are the samples belonging to class K_j.

6. Case study

In this section, the proposed NC-spectral clustering is compared with the k-means method through their application to operation data of parallelized batch processes. In addition, statistical process monitoring systems and soft-sensors for product composition are designed on the basis of the clustering results. The detailed batch process model used in this case study is described in [31,32].

6.1. Problem setting

A schematic diagram of the batch reactor is shown in Fig. 9. In this process, two liquid-phase reactions take place:

reaction 1: A + B → C
reaction 2: A + C → D   (28)

The component C is the desired product, while D is the byproduct, and the objective is to achieve a good conversion to product C. The initial reactor temperature is 20 °C, and the initial amounts of the raw materials A and B change as random numbers following N(20, 0.1).


Fig. 7. Classification results of the 2-dimensional example when three classes are located separately from each other: true classes (top-left), the k-means method (top-right), spectral clustering (bottom-left) and NC-spectral clustering (bottom-right).

Since reaction 1 proceeds at 90 °C or higher, the reactor temperature should be raised as fast as possible after the operation starts. As reaction 1 proceeds, the reactor temperature increases due to the reaction heat. However, reaction 2 proceeds when the reactor temperature exceeds 100 °C, and product C is then converted to the byproduct D. Therefore, the rise in the reactor temperature has to be controlled after it reaches around 90 °C. In this process, the reactor temperature and the jacket temperature are controlled through a multivariable control system.

The jacket temperature T_j and the reactor temperature T_r are measured every 1 min, the termination time of the operation is 120 min, and the measurements are arranged in a vector as follows:

x = [T_j(0), T_j(1), . . ., T_j(120), T_r(0), T_r(1), . . ., T_r(120)] ∈ R^{242}   (29)

The operation data of the batch process are shown in Fig. 10. In this case study, five reactors R_1, R_2, . . ., R_5 are operated in parallel. Their heat transfer coefficients are constant during each batch operation, but change randomly from batch to batch. The heat transfer coefficients U_i (i = 1, 2, . . ., 5) are

U_i = U(40.60, 40.62)   (i = 1, 2)
U_i = U(40.57, 40.59)   (i = 3, 4)
U_i = U(40.54, 40.56)   (i = 5)   (30)

Fig. 8. Normal operating conditions of MSPC and NC-MSPC.


[Fig. 9 here] Fig. 9. Schematic diagram of the batch reactor with control systems (reactions A + B → C and A + C → D; jacket fed with hot and cold streams; temperature transmitter/controller instrumentation).

where U(a, b) denotes a uniform random number on the closed interval [a, b]. Therefore, there are three types of correlation among variables although there are five reactors. In this case study, the operation data of 20 batches of each reactor are stored in the database, and process monitoring systems and soft-sensors are designed using these data.


6.2. Clustering

Before constructing process monitoring systems and soft-sensors, the operation data of all 100 batches stored in the database are clustered into three classes using the k-means method and the proposed NC-spectral clustering. As preprocessing, the dimension of the operation data is reduced by multiway PCA [29,30]; the number of retained principal components is two. Therefore, the input variables of these clustering methods are the scores t1 and t2. The parameter of NC-spectral clustering is γ = 0.99. The clustering results of the k-means method and NC-spectral clustering are shown in Fig. 11, which plots the sample distribution on the t1–t2 plane. In the result of the k-means method, the center of each class cj (j = 1, 2, 3) is designated by a circle, and samples are classified strictly according to distance. In the proposed method, by contrast, samples are classified regardless of distance.
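The preprocessing step above — batch-wise unfolding followed by PCA — can be sketched as follows (a minimal multiway PCA via SVD; the variable names are ours):

```python
import numpy as np

def multiway_pca_scores(X, n_components=2):
    """Batch-wise unfolding MPCA: X is (batches, time, variables).
    Each batch trajectory is unfolded into one row, mean-centered,
    and projected onto the leading principal components, giving the
    scores (t1, t2) used as clustering inputs."""
    B = X.shape[0]
    Xu = X.reshape(B, -1)                    # unfold: one row per batch
    Xc = Xu - Xu.mean(axis=0)                # mean-center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T          # (B, n_components) scores

# 100 batches, 121 time points, 2 measured variables (Tj and Tr)
X = np.random.default_rng(1).normal(size=(100, 121, 2))
T = multiway_pca_scores(X)
```
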

[Fig. 10 here] Fig. 10. Batch process operation data (reactor temperature and jacket temperature [°C] over the 120-min batch).

[Fig. 11 here] Fig. 11. Classification results: the k-means method (top) and NC-spectral clustering (bottom).

6.3. Process monitoring

In this case study, a newly developed reactor R6 is monitored. The heat transfer coefficient of R6 changes as a random number following U(40.60, 40.62), which is the same as that of R1 and R2. A drift of the jacket temperature measurement is investigated as a fault. The measured jacket temperature Tj(k) [°C] changes as follows:

Tj(k) = Tj0(k) + δT (k − k0)  (31)
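The drift fault of Eq. (31) can be simulated as below (assuming, as Eq. (31) implies, that the drift term applies only for k ≥ k0):

```python
import numpy as np

def inject_drift(Tj0, k0=20, delta_T=0.1):
    """Add sensor drift to a jacket temperature trajectory per Eq. (31):
    Tj(k) = Tj0(k) + delta_T * (k - k0), active from the drift start
    time k0 onward (k in minutes)."""
    k = np.arange(len(Tj0))
    return Tj0 + delta_T * np.clip(k - k0, 0, None)
```

With δT = 0 the trajectory is unchanged (Cases 1 and 2), while δT = 0.1 and −0.1 reproduce the upward and downward drifts of Cases 3 and 4.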

where Tj0(k) is the measurement without drift, k [min] is the sampling time, k0 [min] is the drift start time and δT [°C/min] is the rate of drift. In this case study, k0 = 20 min and δT = 0 (Cases 1 and 2), 0.1 (Case 3) and −0.1 (Case 4). That is, Cases 1 and 2 are normal conditions. The operation time is divided into twelve periods at 10-min intervals, and the process condition is checked at the end of every period. Therefore, the sensor drift starts at the beginning of the third period, and it should be detected as early as possible. In each operating period, three reference models are built for the three classes clustered by NC-spectral clustering in Section 6.2. The control limits of T² and Q are determined so that they represent 95% confidence limits, and the number of principal components retained in the reference models is five. In addition, another reference model is built on the basis of the clustering result of the k-means method described in Section 6.2. This k-means method-based MSPC is called KM-MSPC. The process monitoring results during the second to fifth periods are shown in Tables 1–4. In these tables, the T² and Q statistics divided by their control limits T²_lim and Q_lim, i.e., T̃² = T²/T²_lim


[Table 1 here] Table 1. Monitoring results of Case 1 by KM-MSPC and NC-MSPC: T̃² and Q̃ for Classes 1–3 at the 2nd–5th periods, with the detection results.
and Q̃ = Q/Q_lim, are described. A process fault is detected only when T̃² > 1 or Q̃ > 1 holds for all classes. From these results, both KM-MSPC and NC-MSPC correctly determine that no fault occurs in Cases 1 and 2 or at the second period of Cases 3 and 4, where the sensor drift has not yet occurred. In Case 3, both methods correctly detect the fault at the third period. In Case 4, however, KM-MSPC cannot detect the sensor drift until the fifth period, whereas NC-MSPC can detect it at the third period adequately. This case study clearly shows that the proposed NC-MSPC can detect faults earlier than KM-MSPC.

[Table 2 here] Table 2. Monitoring results of Case 2 by KM-MSPC and NC-MSPC: T̃² and Q̃ for Classes 1–3 at the 2nd–5th periods, with the detection results.

[Table 3 here] Table 3. Monitoring results of Case 3 by KM-MSPC and NC-MSPC: T̃² and Q̃ for Classes 1–3 at the 2nd–5th periods, with the detection results.

[Table 4 here] Table 4. Monitoring results of Case 4 by KM-MSPC and NC-MSPC: T̃² and Q̃ for Classes 1–3 at the 2nd–5th periods, with the detection results.
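The monitoring statistics and the all-classes detection rule described above can be sketched as follows (a minimal PCA-based MSPC check; the model dictionary layout and function names are our assumptions, not from the paper):

```python
import numpy as np

def monitoring_stats(x, mean, P, lam):
    """T^2 and Q statistics of sample x for one PCA reference model.
    P: (n_vars, n_pc) loading matrix, lam: retained eigenvalues."""
    xc = x - mean
    t = P.T @ xc
    T2 = float(np.sum(t**2 / lam))        # Hotelling's T^2
    Q = float(np.sum((xc - P @ t)**2))    # squared prediction error
    return T2, Q

def detect_fault(x, models):
    """A fault is flagged only when T~2 > 1 or Q~ > 1 for ALL class
    models, i.e. no reference model explains the sample as normal."""
    for m in models:
        T2, Q = monitoring_stats(x, m["mean"], m["P"], m["lam"])
        if T2 / m["T2_lim"] <= 1 and Q / m["Q_lim"] <= 1:
            return False                  # some class covers the sample
    return True
```
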

6.4. Soft-sensor design

A soft-sensor is constructed to estimate the amount of product MC [kmol] at the end of the batch. Three models are built, one for each of the three classes clustered by NC-spectral clustering in Section 6.2, and PLS is used for model construction. In addition, another soft-sensor is designed on the basis of the clustering result of the k-means method. In this case, the distance from the class center cj is used for sample discrimination when an output is requested. That is, the classifier g is described as

g(x) = argmin_j ||x − cj||², j = 1, . . ., K  (32)

In this case study, MC of the new reactor R6 is estimated for performance validation of the soft-sensors. The heat transfer coefficient of R6 changes as a random number following U(40.60, 40.62), which is the same as that of R1 and R2. The number of validation batches is 20. The estimation results of the soft-sensors are shown in Fig. 12, in which the horizontal axis and the vertical axis express the measurements and the estimates, respectively. RMSE is the root-mean-squared error, and r² denotes the coefficient of determination between measurements and estimates. These results clearly show that the proposed method achieves higher estimation performance than the k-means method-based soft-sensor; RMSE is improved by 43%.
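The distance-based discriminator of Eq. (32) and the routing of a query to its class-wise model can be sketched as below; for brevity a plain linear map stands in for the per-class PLS model:

```python
import numpy as np

def classify(x, centers):
    """k-means-based discriminator of Eq. (32):
    g(x) = argmin_j ||x - c_j||^2."""
    d = ((centers - x)**2).sum(axis=1)    # squared distance to each c_j
    return int(np.argmin(d))

def estimate(x, centers, models):
    """Route the query to the class selected by g(x) and evaluate that
    class's model. models[j] = (w_j, b_j); a linear map stands in for
    the per-class PLS model in this sketch."""
    j = classify(x, centers)
    w, b = models[j]
    return float(x @ w + b)
```
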

[Fig. 12 here] Fig. 12. Estimation results by PLS with the k-means method (top) and NC-spectral clustering (bottom).

7. Conclusion

A new clustering method, NC-spectral clustering, was proposed by integrating the NC method and spectral clustering. The proposed NC-spectral clustering can accurately discriminate differences in the correlation among variables. In addition, a new statistical process monitoring method and a new soft-sensor design method based on NC-spectral clustering were proposed. Since NC-spectral clustering can discriminate the individuality of production devices, the proposed methods can greatly improve monitoring and estimation performance in comparison with distance-based methods. The usefulness of the proposed methods was demonstrated through numerical examples and a case study of parallelized batch processes.


Appendix A. Analysis of the relaxed problem in the Mcut method

The detailed analysis of the relaxed problem in the Mcut method is described here. The objective function of the Mcut method is as follows:

min_q Jq = qᵀ(D − W)q / (qᵀWq)  (A.1)

where D = diag(We), W is an affinity matrix and e = [1, . . ., 1]ᵀ. First, q in Eq. (A.1) can be scaled as

min J̃q = q̃ᵀ(I − W̃)q̃ / (q̃ᵀq̃)  (A.2)

where W̃ = D^{−1/2}WD^{−1/2}, q̃ = D^{1/2}q/|D|^{1/2} and q̃ᵀW̃q̃ > 0. W̃ is a probability transition matrix because 0 ≤ (W̃)_{i,j} ≤ 1 and Σ_j (W̃)_{i,j} = 1, where (W̃)_{i,j} denotes the (i, j)th element of W̃.

The objective function J̃q can be expressed as a Rayleigh quotient when P ≡ I − W̃. From Rayleigh's theorem, a quotient R(x) is minimized by the eigenvector x1 corresponding to the smallest eigenvalue λ1, and the minimum value of R(x) is λ1. Using this theorem, the relaxed minimization problem (A.2) results in an eigenvalue problem. Suppose that w_i denotes the ith row of W; then (D)_{i,i} = d_i = w_i e and

P D^{1/2}e = I D^{1/2}e − W̃ D^{1/2}e = D^{1/2}e − D^{−1/2}WD^{−1/2}D^{1/2}e = 0  (A.3)

Therefore, one of the eigenvectors of P is D^{1/2}e and its eigenvalue is 0. In addition, the largest eigenvalue of W̃ is 1 because W̃ is a probability transition matrix, and xᵀPx = xᵀ(I − W̃)x ≥ 0. That is, P is a positive semidefinite matrix, and D^{1/2}e is the eigenvector z1 corresponding to the smallest eigenvalue λ1 = 0.

Although z1 minimizes J̃q, all elements of z1 are positive and Eq. (12) is not satisfied. From the max–min theorem, the second smallest solution is given by the second smallest eigenvalue λ2. The eigenvector z2 corresponding to λ2 satisfies z1ᵀz2 = 0 because P is a symmetric matrix. Therefore, the eigenvector z2 has both positive and negative elements, and it is the optimal solution q̃* of Eq. (A.2).

From the above analysis, Eq. (A.2) can be rewritten as

min J̃q = q̃ᵀ(I − D^{−1/2}WD^{−1/2})q̃ / (q̃ᵀq̃)  (A.4)

and this results in the eigenvalue problem

(I − D^{−1/2}WD^{−1/2})z = λz  (A.5)
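The eigenvalue problem (A.5) can be solved numerically as in the following sketch (NumPy; the function name is ours), which recovers q* = D^{−1/2}z2 and uses the sign of q* for the two-way partition:

```python
import numpy as np

def mcut_partition(W):
    """Relaxed Mcut solution: eigendecompose I - D^{-1/2} W D^{-1/2}
    (Eq. (A.5)); z2 is the eigenvector of the second smallest
    eigenvalue (the Fiedler vector), q* = D^{-1/2} z2, and the sign
    of q* gives the two-way partition."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                     # degrees: D = diag(We)
    Dm12 = np.diag(1.0 / np.sqrt(d))
    P = np.eye(len(W)) - Dm12 @ W @ Dm12
    lam, Z = np.linalg.eigh(P)            # eigenvalues in ascending order
    z2 = Z[:, 1]                          # Fiedler vector
    q = Dm12 @ z2                         # q* = D^{-1/2} z2
    return (q > 0).astype(int), float(lam[1])
```

On an affinity matrix with two weakly connected blocks, the sign pattern of q* separates the blocks and the Fiedler value λ2 is close to zero.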

The solution q* is expressed as q* = D^{−1/2}z2; λ2 and z2 are called the Fiedler value and the Fiedler vector, respectively [33].

References

[1] M. Kano, Y. Nakagawa, Data-based process monitoring, process control and quality improvement: recent developments and applications in steel industry, Computers & Chemical Engineering 32 (2008) 12–24.
[2] H. Shigemori, M. Kano, S. Hasebe, Optimum quality design system for steel products through locally weighted regression model, Journal of Process Control 21 (2011) 293–301.
[3] T. Mejdell, S. Skogestad, Estimation of distillation compositions from multiple temperature measurements using partial-least-squares regression, Industrial & Engineering Chemistry Research 30 (1991) 2543–2555.
[4] M. Kano, K. Miyazaki, S. Hasebe, I. Hashimoto, Inferential control system of distillation compositions using dynamic partial least squares regression, Journal of Process Control 10 (2000) 157–166.
[5] H. Kamohara, A. Takinami, M. Takeda, M. Kano, S. Hasebe, I. Hashimoto, Product quality estimation and operating condition monitoring for industrial ethylene fractionators, Journal of Chemical Engineering of Japan 37 (2004) 422–428.
[6] H. Kaneko, M. Arakawa, K. Funatsu, Development of a new soft sensor method using independent component analysis and partial least squares, AIChE Journal 55 (2009) 87–98.
[7] V. Radhakrishnan, A. Moham, Neural networks for the identification and control of blast furnace hot metal quality, Journal of Process Control 10 (2000) 509–524.
[8] R. Amirthalingam, J. Lee, Subspace identification based inferential control applied to a continuous pulp digester, Journal of Process Control 9 (1999) 397–406.
[9] M. Kano, S. Lee, S. Hasebe, Two-stage subspace identification for soft-sensor design and disturbance estimation, Journal of Process Control 19 (2009) 179–186.
[10] M. Kano, M. Ogawa, The state of the art in chemical process control in Japan: good practice and questionnaire survey, Journal of Process Control 20 (2010) 969–982.
[11] J.E. Jackson, G.S. Mudholkar, Control procedures for residuals associated with principal component analysis, Technometrics 21 (1979) 341–349.
[12] J.V. Kresta, J.F. MacGregor, T.E. Marlin, Multivariate statistical monitoring of process operating performance, The Canadian Journal of Chemical Engineering 69 (1991) 35–47.
[13] T. Kourti, J.F. MacGregor, Process analysis, monitoring and diagnosis, using multivariate projection methods, Chemometrics and Intelligent Laboratory Systems 28 (1995) 3–21.
[14] M. Kano, S. Tanaka, S. Hasebe, I. Hashimoto, H. Ohno, Monitoring independent components for fault detection, AIChE Journal 49 (2003) 969–976.
[15] S.J. Qin, Recursive PLS algorithms for adaptive data modeling, Computers & Chemical Engineering 22 (1998) 503–514.
[16] G. Bontempi, M. Birattari, H. Bersini, Lazy learning for local modeling and control design, International Journal of Control 72 (1999) 643–658.
[17] C.G. Atkeson, A.W. Moore, S. Schaal, Locally weighted learning, Artificial Intelligence Review 11 (1997) 11–73.
[18] C. Cheng, M.S. Chiu, A new data-based methodology for nonlinear process modeling, Chemical Engineering Science 59 (2004) 2801–2810.
[19] C. Cheng, M.S. Chiu, Nonlinear process monitoring using JITL-PCA, Chemometrics and Intelligent Laboratory Systems 76 (2005) 1–13.
[20] Z. Ge, Z. Song, Online monitoring of nonlinear multiple mode processes based on adaptive local model approach, Control Engineering Practice 16 (2008) 1427–1437.
[21] K. Fujiwara, M. Kano, S. Hasebe, Soft-sensor development using correlation-based Just-In-Time modeling, AIChE Journal 55 (2009) 1754–1765.
[22] B. Schölkopf, A.J. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (1998) 1299–1319.
[23] J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.
[24] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (2000) 2323–2326.
[25] G. Ferrari-Trecate, M. Muselli, D. Liberati, M. Morari, A clustering technique for the identification of piecewise affine systems, Automatica 39 (2003) 205–217.
[26] K. Fujiwara, M. Kano, S. Hasebe, Development of correlation-based pattern recognition algorithm and adaptive soft-sensor design, Control Engineering Practice, doi:10.1016/j.conengprac.2010.11.013, in press.
[27] C. Ding, X. He, H. Zha, M. Gu, H.D. Simon, A min–max cut algorithm for graph partitioning and data clustering, in: Proceedings of the IEEE International Conference on Data Mining, San Jose, 2001.
[28] A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in: Proceedings of Advances in Neural Information Processing Systems (NIPS), Vancouver, 2001.
[29] P. Nomikos, J.F. MacGregor, Monitoring batch processes using multiway principal component analysis, AIChE Journal 40 (1994) 1361–1375.
[30] P. Nomikos, J.F. MacGregor, Multiway partial least squares in monitoring batch processes, Chemometrics and Intelligent Laboratory Systems 30 (1995) 97–109.
[31] B.J. Cott, S. Macchietto, Temperature control of exothermic batch reactors using generic model control, Industrial & Engineering Chemistry Research 28 (1989) 1177–1184.
[32] K. Fujiwara, M. Kano, S. Hasebe, Development of correlation-based clustering method and its application to software sensing, Chemometrics and Intelligent Laboratory Systems 101 (2010) 130–138.
[33] M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Mathematical Journal 23 (1973) 298–305.