Quality prediction and quality-relevant monitoring with multilinear PLS for batch processes

Quality prediction and quality-relevant monitoring with multilinear PLS for batch processes

    Quality Prediction and Quality-relevant Monitoring with Multilinear PLS for Batch Processes Lijia Luo, Shiyi Bao, Jianfeng Mao, Di Ta...

3MB Sizes 1 Downloads 74 Views

    Quality Prediction and Quality-relevant Monitoring with Multilinear PLS for Batch Processes Lijia Luo, Shiyi Bao, Jianfeng Mao, Di Tang PII: DOI: Reference:

S0169-7439(15)00274-9 doi: 10.1016/j.chemolab.2015.11.004 CHEMOM 3132

To appear in:

Chemometrics and Intelligent Laboratory Systems

Received date: Revised date: Accepted date:

22 June 2015 26 October 2015 4 November 2015

Please cite this article as: Lijia Luo, Shiyi Bao, Jianfeng Mao, Di Tang, Quality Prediction and Quality-relevant Monitoring with Multilinear PLS for Batch Processes, Chemometrics and Intelligent Laboratory Systems (2015), doi: 10.1016/j.chemolab.2015.11.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Quality Prediction and Quality-relevant Monitoring with Multilinear PLS for Batch Processes

IP

T

Lijia Luo, Shiyi Bao, Jianfeng Mao, Di Tang

SC R

College of Mechanical Engineering, Zhejiang University of Technology, Engineering Research Center of Process Equipment and Remanufacturing, Ministry of Education, Hangzhou, China

NU

ABSTRACT: The multilinear regression method is applied for quality prediction and quality-relevant monitoring in batch processes. Four multilinear partial least squares (PLS) models

MA

are investigated, including three higher-order PLS (HOPLS) models, termed as HOPLS-Tucker, HOPLS-RTucker and HOPLS-CP, and the N-way PLS (N-PLS) model. These multilinear PLS

TE

D

methods have two advantages as compared to the unfold-PLS method. Firstly, they retain the inherent three-way representation of batch data and avoid the disadvantages caused by data

CE P

unfolding, resulting in more stable process models. Secondly, they summarize the main information on each mode of data and describe the three-way interactions between them, and therefore have

AC

better modeling accuracy and intuitive interpretability. Online quality prediction and quality-relevant monitoring methods are developed by combining multilinear PLS with the moving data window technique. These methods are tested in a fed-batch penicillin fermentation process. The results indicate that the multilinear PLS method has higher predictive accuracy, better anti-noise capability and monitoring performance than the unfold-PLS method. Keywords: tensor; multilinear PLS; quality-relevant monitoring; quality prediction; batch processes.



Corresponding Author. Tel.: +86 (0571) 88320349.

E-mail address: [email protected] (L.J. Luo). 1

ACCEPTED MANUSCRIPT 1. Introduction

T

Batch process is widely used to produce low volume and high value-added products, including

IP

polymers, chemicals, pharmaceuticals and semiconductors, because of its high flexibility to adapt the

SC R

rapidly changing market situations. The increasing international competition has aroused the demand for high quality products. Reliable process monitoring as well as precise quality prediction and

NU

control are therefore necessary to ensure the safe and profitable operation of batch processes. The process monitoring aims to warn abnormal operating conditions by timely detecting and diagnosing

MA

process faults, which is helpful for taking corrective actions to recover the normal production and

D

enhance the process safety. The objective of quality prediction and control is to estimate the product

TE

quality from operating conditions in a fast and accurate way, and further to derive better operating

CE P

conditions that can reach a high product quality. However, process monitoring and quality prediction are difficult for batch processes, because they suffer not only from those intractable nonlinear and

AC

time-varying process behaviors, but also from the unavailable online quality measurements, batch-to-batch variations, and so on. In last decades, data-driven process monitoring and control methods have been widely accepted in various industrials, benefiting from the widespread application of automation instrument and data acquisition technologies. Data-driven methods build process models using operation data and rarely require the knowledge of process mechanism. Therefore, they are especially applicable for those complicated industrial processes, in which precise first-principle models are not available or difficult to be built. Different from continuous processes, operation data of a batch process are usually recorded in a three-way data array with three directions of batch, variable and sampling time. 2

ACCEPTED MANUSCRIPT Bilinear analysis methods and multilinear analysis methods are often adopted to analyze this three-way data array. The bilinear analysis method is also known as the unfolding method, including

IP

T

multiway principal component analysis (MPCA) [1], multiway partial least squares (MPLS) [2],

SC R

multiway independent component analysis (MICA) [3], and so on. These methods unfold the three-way data array to a matrix along the batch-wise or variable-wise direction firstly, and then

NU

build process models based on the unfolded data. However, multilinear analysis methods maintain the three-way representation of the data array and build process models using tensor decompositions,

MA

such as parallel factor analysis (PARAFAC) [4] and Tucker3 decomposition [5]. So far, bilinear methods, especially MPCA and MPLS, have attracted more research attention than

TE

D

multilinear methods in process control applications. Lots of MPCA/MPLS-based process monitoring and quality prediction methods have been developed. For example, Lee et al. [6] proposed a

CE P

multiway kernel PCA method for monitoring nonlinear batch processes. Lu and Gao [7] and Zhao et al. [8] presented two phase-based PLS methods for the quality prediction of multiphase batch

AC

processes, which take the multiphase feature of batch processes into account. Yu [9] proposed a multi-way Gaussian mixture model based adaptive kernel partial least squares (MGMM-AKPLS) method to predict quality variables in nonlinear batch processes. As mentioned above, the MPCA/MPLS-based methods need to unfold the three-way batch data before building process models. However, two potential disadvantages are caused by the data unfolding operation. Firstly, the unfolded data may have the "large variable number but small sample size" problem. For instance, when a three-way array X(10×10×100) containing 10 batches, 10 variables and 100 time slices is unfolded to a matrix X(10×1000), only ten samples are available for PCA or PLS to compute the 3

ACCEPTED MANUSCRIPT 1000-dimensional loading vectors, probably resulting in an unreliable estimate of model parameters [10]. Secondly, MPLS and MPCA fail to offer an explicit description of the three-way interactions in

IP

T

the batch dataset, because the three-way data structure is destroyed and the information from two

SC R

directions is convolved together after data unfolding. These two disadvantages may reduce the monitoring performance, prediction ability and interpretability of MPCA/MPLS models.

NU

In recent years, researchers have started to analyze batch data with multilinear (i.e., tensor) analysis methods, where batch data are represented in their natural three-way form as tensors. For

MA

example, Meng et al. [4] and Louwerse and Smilde [5] applied PARAFAC and Tucker decomposition for batch process monitoring, respectively. Luo et al. [11] proposed a generalized

TE

D

Tucker2 (GTucker2) model for monitoring uneven-length batch processes. Since the batch data array is inherently three-way, using multilinear analysis methods can naturally avoid the disadvantages

CE P

caused by data unfolding. Multilinear models commonly have much fewer parameters than MPCA/MPLS models, because batch data are compressed in three directions instead of two. These

AC

multilinear models are therefore expected to be more stable [5]. Besides, multilinear models can summarize all main effects and interactions in batch data, because they extract the main information on each mode of data and describe relations between them [12]. Thus, multilinear models have better modeling accuracy and intuitive interpretability. Particularly, two multilinear regression algorithms haven been proposed, i.e., higher-order partial least squares (HOPLS) [10] and N-way partial least squares (N-PLS) [13]. HOPLS and N-PLS both have some advantages as compared to unfold-PLS (MPLS), including robustness to noise, stabilized solution, increased predictability and intuitive interpretability [10,13]. These multilinear regression methods may have good application prospects 4

ACCEPTED MANUSCRIPT in batch processes. In this paper, the multilinear regression method is applied for quality prediction and

IP

T

quality-relevant monitoring in batch processes. Four multilinear PLS models, including

SC R

HOPLS-Tucker (i.e., HOPLS), HOPLS-RTucker, HOPLS-CP and N-PLS, are investigated. Especially, HOPLS-RTucker and HOPLS-CP are two new multilinear PLS models, which are

NU

derived by imposing extra restrictions on HOPLS-Tucker. HOPLS-RTucker aims to handle the multi-way data with a specific (or free) mode, on which the information remain unchanged.

MA

HOPLS-CP replaces the Tucker decomposition used in HOPLS-Tucker with the CP decomposition (i.e., canonical decomposition/parallel factor analysis) [14] to simplify the model and improve the

TE

D

computational efficiency. Online quality prediction and quality-relevant monitoring methods are then proposed by combining multilinear PLS with the moving data window technique. Monitoring

CE P

statistics named T2, Qx and Qy are constructed for fault detection, and the contribution plot is applied for fault diagnosis. These methods are tested in a fed-batch penicillin fermentation process. The

AC

results show that multilinear PLS outperforms MPLS (i.e., unfold-PLS) in terms of higher predictive accuracy, better anti-noise capability, higher fault detection rates and lower false alarm rate.

2. Notation and Preliminaries 2.1. Notation and definitions

Tensors (multi-way arrays), matrices and vectors are denoted by underlined boldface capital letters, boldface capital letters and boldface lowercase letters respectively, e.g., X, X and x. The order of a tensor is the number of ways or modes. The ith entry of a vector x is xi. The ith column of a matrix X

5

ACCEPTED MANUSCRIPT is xi and the element (i, j) is xij. The element (i1, i2, …, iN) of an Nth-order tensor X I1I2 

xi1i2

iN

I N

is

. X(n) and x(n) denote the nth factor matrix and factor vector in a sequence, respectively. X(n) is

IP

T

the mode-n matricization of a tensor X [14]. Symbols “∘”, “⨂” and “⨁” denote the vector outer

SC R

product, Kronecker product and Khatri–Rao product, respectively. The superscript “+” denotes the Moore-Penrose pseudoinverse.

Y = X n A I1

I n  I N

 I n1 J  I n1 I N

I n   J M

Z = X ,Y

in 1

in

in1 jin1 iN

D

  xi1

i1 1 i2 1

[14,15].

t

iN in

  xi1 in 1

For

in

iN

iN 1

iN

. The n-mode product t In

or with a vector

a jin or

two

2 i1i2

Y = X n t I1

X I1

tensors

is

I n1 I n1 I N

I n  I N

and

with the same size on the nth-mode, their n-mode cross-covariance is

I1 n;n

 I n1  I n1   I N  J1  J n1 J n1  J M

In

with zi1

AC

The Tucker decomposition of a tensor X I1

where G J1

  x

TE

Y J1

in1in1 iN

IN

In

with yi1

CE P

yi1

I2

with a matrix A J In

In

with

I1

is X 

MA

X I1

of a tensor

I N

NU

The norm of an Nth-order tensor X I1I2 

Jn  J N

X  G 1 A(1) 2

I n  I N

in1in1 iN j1

jn1 jn1

jM

  xi1 in 1

in

iN

y j1

in

jM

.

is concisely expressed as [14,15]

N A( N )  G; A(1) ,

, A( N ) 

is the core tensor, and A( n ) Jn In denote factor matrices. The canonical

decomposition/parallel factor analysis (CP) of a tensor X I1

I n  I N

[14,15] R

X   ar(1) ar(2) r 1

ar( N )   A(1) ,

where ar( n ) In is the rth-column vector of matrix A( N ) .

6

, A( N ) 

is concisely expressed as

ACCEPTED MANUSCRIPT 2.2. Bilinear PLS and unfold-PLS

T

The partial least squares (PLS) is a well-known multivariate regression method for two-way data

IP

[16]. The unfold-PLS, also known as MPLS, is an extended version of PLS to deal with three-way

SC R

data [2]. The relation between PLS and MPLS is that MPLS performs ordinary PLS on two matrices unfolded from three-way data arrays. Given two three-way data arrays X(I×J×K) and Y(I×M×N),

NU

they are unfolded into two matrices X(I×JK) and Y(I×MN). MPLS decomposes matrices X and Y as R

X   t r prT  E  TP T  E R

MA

r 1

Y   t q  F  TQ  F r 1

T r r

(1)

T

D

where tr (I×1) are score vectors, p(JK×1) and q(MN×1) are loading vectors, R is the number of score

TE

vectors. E(I×JK) and F(I×MN) are residual matrices. T = [t1, …, tR] is the score matrix, P = [p1, …,

CE P

pR] and Q = [q1, …, qR] are loading matrices. MPLS has the same properties as PLS. More detailed descriptions about MPLS can be found in ref [2].

AC

3. Multilinear PLS

3.1. A brief review of multilinear PLS methods

The conventional bilinear PLS cannot deal with multi-way data directly. The unfold-PLS is applicable for multi-way data by using the data unfolding operation, while it is not a true multilinear method because it loses the multi-way representation of data. Thus, some researchers have attempted to propose multilinear regression algorithms [10,13,17,18]. Among them, N-PLS [13] and HOPLS [10] are two truly multilinear regression models.

7

ACCEPTED MANUSCRIPT 3.1.1. N-way PLS (N-PLS)

T

Bro [13] introduced the N-PLS as an extension of the bilinear PLS to multi-way data. N-PLS

IP

decomposes the independent data X and dependent data Y into score vectors and weight vectors,

SC R

subject to maximum pairwise covariance between score vectors of X and Y. Specifically, for a univariate vector y(I×1) and an three-way data array X(I×J×K), X is decomposed into one score

NU

vector t(I×1) and two weight vectors w(J)(J×1) and w(K)(K×1), namely one vector per mode, as shown in Fig. 1. Thus, the decomposition of X is given by

MA

X = t w ( J ) w ( K )  E , i.e., xijk  ti w(j J ) wk( K )  eijk

D

where E contains the residuals.

(2)

TE

Figure 1. One component decomposition of a three-way array X(I×J×K).

CE P

N-PLS solves the following optimization problem to find w(J) and w(K) such that the covariance

AC

between t and y is maximized [13]

   I J K (J ) (K ) 2  max cov( t , y ) min x  t w w        ijk i j k w J ,w K  i 1 j 1 k 1   

(3)

s.t. w ( J )  w ( K )  1

Since, for given w(J) and w(K) with w ( J )  w ( K )  1 , the least squares solution of Eq. (2) is J

K

ti   xijk w(j J ) wk( K )

(4)

j 1 k 1

the optimization problem Eq. (3) can be rewritten as J K   (J ) (K ) max cov( t , y ) t  x w w    i ijk j k w J ,wK j 1 k 1  

s.t. w ( J )  w ( K )  1

8

(5)

ACCEPTED MANUSCRIPT I

According to cov(t , y)   yi ti , Eq. (5) is further expressed as i 1

J K  I   I J K  (J ) (K ) max y t t  x w w  max yi xijk w(j J ) wk( K )        i i i ijk j k J K J K w ,w j 1 k 1   i 1  w , w  i 1 j 1 k 1

(6)

T

 w

1

(K )

IP

s.t. w

(J )

I

SC R

Defining a matrix Z(J×K) with elements z jk   yi xijk , Eq. (6) simplifies to i 1

 J K  max z jk w(j J ) wk( K )    J K w ,w  j 1 k 1 

NU

(7)

s.t. w ( J )  w ( K )  1

MA

Thus, optimal w(J) and w(K) are first singular vectors of the singular value decomposition (SVD) of Z



 J K  T max z jk w(j J ) wk( K )   max w ( J )  Zw ( K )    J K J K w ,w  j 1 k 1  w ,w

  SVD( Z )

(8)

D

 w , w



(K )

TE

(J )

More components can be calculated from residuals by applying a deflation operation. Deflation

CE P

operations for X and y with respect to the part explained by previous score and weight vectors are X i 1  X i  ti wi( J ) wi( K )

(9)

AC

yi 1  yi  Ti bi

where Ti  [t1 ,

, ti ] is a score matrix that consists of all previous score vectors, bi  (TiT Ti )1TiT y1

are regression coefficients, and X1 and y1 denote the initial X and y respectively. Accordingly, replacing X in Eq. (2) with Xi+1 and y in Eq. (3) with yi+1, the (i+1)th components can be computed. Analogously, the solution for an N-way independent data X I1I2 

I N

is obtained by finding the

weight vectors w ( I2 ) , …, w ( I N ) . In higher-order cases, weight vectors cannot be calculated by an SVD as in Eq. (8), but are obtained by the one-component CP decomposition of a tensor

Z I2 

I N

I1

with elements zi2

iN

  xi1 i1 1

iN

yi1 , namely

9

ACCEPTED MANUSCRIPT  I2 max  w I 2 , w I N i 1 2

IN

z

iN 1

s.t. w ( I2 ) 

i2

iN

 wi(NI N )  

wi(2I2 )

(10)

 w(IN )  1

N w ( I N )T . It is also possible to extend

IP

T

The score vector is then computed by t  X 2 w ( I2 )T 3

JM

in a similar way. Likewise,

SC R

above N-PLS algorithm to an M-way dependent data Y J1J2 

weight vectors q ( J 2 ) , …, q ( J M ) of Y are found by performing one-component CP decomposition on a JM

, and then the score vector is u  Y 2 q( J2 )T 3

Finally, the N-PLS model for X I1I2 

I N

NU

multi-way array S = Y 1 t J2 

and Y J1J2 

R

MA

X =  t r wr( I2 ) r 1

Y =  ur q is

the

number

of

components,

TE

qr( J2 ) 

R

(11)

q

( JM ) r

and

 FR r :

wr( I2 ) 

 qr( J M )  1 . The relation between the score matrix U = [u1 ,

matrix T = [t1 ,

CE P

where

D

r 1

with I1 = J1 is expressed as

wr( I N )  E R

R

( J2 ) r

JM

M q( J M )T .

, t R ] I1R is U = TB , where B = [b1 ,

 wr( I N )  1

and

, uR ] J1R and the score

, bR ] RR is a regression coefficient

AC

matrix. Note that B is an upper triangular matrix, and its rth column br is 1  b  br   r  with br  TrT Tr  TrT ur 0 R  r 

where 0R-r is a zero vector with the size of R-r, and Tr  [t1 ,

(12)

, tr ] I1r is the score matrix

containing previous r score vectors. More details about N-PLS can be found in refs [13] and [19].

3.1.2. Higher-order PLS based on Tucker decomposition (HOPLS-Tucker)

Zhao et al. [10] proposed HOPLS-Tucker as a generalized multilinear regression model for multi-way data. HOPLS-Tucker projects two tensors into a low-dimensional latent space using the block Tucker decomposition [20], and then performs regression on latent variables. Given an N-way 10

ACCEPTED MANUSCRIPT X I1I2 

(N ≥ 3) independent data

Y J1J2 

JM

I N

and an M-way (M ≥ 3) dependent data

with the same size on the first mode, i.e., I1 = J1, HOPLS-Tucker decomposes them as

IP

T

a sum of Tucker blocks [10] R

X   Gr 1 t r 2 Pr(1) 3 r 1 R

Y   Dr 1 t r 2 Q 3 (1) r

r 1

SC R

N Pr( N 1)  E R ( M 1) r

M Q

(13)

 FR

( m ) M 1 r m 1

Gr 1L2 

( n ) N 1 r n 1

In1Ln1 and

 LN

and Dr 1K2 

Ln n2  N

and

are core tensors, and ER I1I2 

KM

Km m2  M

I N

and FR J1J2 

JM

are

are tuning parameters for controlling the model

D

residuals.

P 

J m1Km1 are rth loading matrices on mode-n of X and mode-m of Y respectively,

MA

Q 

NU

where R is the number of latent vectors, t r I1 is the rth latent vectors,

TE

complexity [10]. The Tucker decomposition lacks of uniqueness due to permutation, scaling, and

CE P

rotation indeterminacies [14]. To improve the uniqueness property of HOPLS-Tucker, Zhao et al. [10] imposed three constraints: (1) core tensors Gr and Dr are all-orthogonal, (2) loading matrices Pr( n )

AC

and Qr( m ) are column-wise orthonormal, i.e., Pr( n )T Pr( n )  I , Qr( m)T Qr( m)  I , and (3) latent vectors tr are of length one, i.e., t r  1 . Defining a latent matrix T = [t1 , loading matrices P ( n ) = [ P1( n ) ,

, t R ] I1R , two sequences of

, PR( n ) ] In1RLn1 and Q( m) = [Q1( m) ,

core tensors G  blockdiag(G1 ,

, GR ) RRL2 

 RLN

, QR( m) ] Jm1RKm1 , and two

and D  blockdiag( D1 ,

, DR ) RRK2 

 RKM

,

Eq. (13) is rewritten as [10] X  G 1 T 2 P (1) 3

N P ( N 1)  E R

Y  D 1 T 2 Q (1) 3

M Q ( M 1)  FR

(14)

The schematic diagram of the HOPLS-Tucker model for two three-order tensors is shown in Fig. 2. Figure 2. Schematic diagram of a HOPLS-Tucker model for three-way arrays X and Y. 11

ACCEPTED MANUSCRIPT As shown in Eq. (13) and Eq. (14), HOPLS-Tucker aims to finding common latent vectors that can approximate X and Y simultaneously, which can be expressed as a optimization problem {t r ,Gr ,Dr ,Pr

E R  FR 2

,Qr( m ) }

2

T

min( n )

(15)

IP

s.t. t r  1, Pr( n )T Pr( n )  I , Qr( m )T Qr( m )  I

SC R

Because each pair of tr Gr, Dr, Pr( n ) and Qr( m ) can be obtained sequentially with the same criteria by the deflation operation, the problem of HOPLS-Tucker is simplified to finding the first latent vector t

NU

and two sequences of loading matrices P(n) and Q(m) [10]. Thus, removing the indexes r and R in Eq.

MA

(15) and applying the propositions introduced by Zhao et al. [10], the objective function of Eq. (15) is equivalently expressed as [10] ,Q( m ) }

2

 max G  D  max (n) ( m) ( n) ( m) 2

{t,P

,Q

}

2

2

{t,P

,Q

G, D }

{1,1}

(16)

D

{t,G,D,P

E  F 2

min (n)

TE

with

N P ( N 1)T

D  Y 1 t T 2 Q (1)T 3

M Q ( M 1)T

CE P

G  X 1 t T 2 P (1)T 3

2

According to Eq. (17) and t rT t r  1 ,

G, D

 X ,Y 

AC

(17)

{1,1}

; P (1)T , {1,1}

is equivalent to [10]

, P ( N 1)T , Q (1)T ,

Q ( M 1)T  

2

Q ( M 1)T  

2

(18)

Thus, the final optimization problem of HOPLS-Tucker is [10] max  X , Y { P ( n ) ,Q ( m ) }  s.t. P

( n )T

P

(n)

; P (1)T , {1,1}

 I,Q

( m )T

Q

( m)

, P ( N 1)T , Q (1)T ,

(19)

I

which can be solved by the higher order orthogonal iteration (HOOI) algorithm

[14,20]. After

obtained loadings P ( n ) , the optimal latent vector t is determined by minimizing the squared norm of the residual

2

E , namely t = arg min X  G; t , P (1) , 12

, P ( N 1) 

2

(20)

ACCEPTED MANUSCRIPT Eq. (20) is solved by choosing t as the first leading left singular vector of the following SVD [10] SVD  X 2 P (1)T 3 

N P ( N 1)T   (1) 

(21)

IP

T

The core tensors G and D are calculated by Eq. (17). More components can be obtained by repeating

SC R

above procedures with the following deflation operation

X r 1  X r  Gr ; t r , Pr(1) , , Pr( N 1)  Yr 1  Yr   Dr ; t r , Qr(1) , , Qr( M 1) 

I N

is an Nth-order tensor and Y I1J is a matrix, the

MA

Particularly, when X I1I2 

NU

where X1 and Y1 denote the initial X and Y.

HOPLS-Tucker model simplifies as R

D

X   Gr 1 t r 2 Pr(1) 3 r 1 R

(22)

N Pr( N 1)  E R (23)

TE

Y   d r t r qr  FR r 1

CE P

where dr is a scalar regression coefficient, and qr  1 . In this case, the optimization problem of HOPLS-Tucker is formulated as [10]

AC

( X 1 Y T ); qT , P (1)T , max (n) { P , q} s.t. P

( n )T

P

(n)

, P ( N 1)T 

2

(24)

 I,q q 1 T

which is solved by performing HOOI on the cross-covariance tensor C = X 1 Y T J I2  Subsequently, the core tensor G (C ) 1L2  vector t that minimizes E

2

 LN

I N

.

corresponding to C is also obtained. Then, the latent

is [10]

t =  X 2 P (1)T 3

(C ) N P ( N 1)T  G(1) (1)

(25)

t=t t After obtained the latent vector t and loading vector q, the regression coefficient d is calculated by

d  t T Yq . More components can be extracted via the deflation operation. 13

ACCEPTED MANUSCRIPT 3.2. Two new variants of HOPLS-Tucker

T

In this paper, by imposing extra restrictions on the HOPLS-Tucker model, two new important

IP

variants of HOPLS-Tucker are derived, termed as HOPLS based on relaxed Tucker decomposition

SC R

(HOPLS-RTucker) and HOPLS based on CP decomposition (HOPLS-CP).

HOPLS-Tucker compresses all modes of multi-way data to a few components. However, for some

NU

multi-way data, compressing all modes may be not necessary, such as the time mode in batch data.

MA

This is the main reason for developing the HOPLS-RTucker model. HOPLS-RTucker restricts the loading matrices on some specific modes (called free mode) of the multi-way data to be the identity

D

matrix, and thus the information on these free modes remains unchanged. For instance, a

TE

HOPLS-RTucker model is R

r 1 R

CE P

X   Gr 1 t r 2 Pr(1) 3   Gr ; t r , Pr(1) , r 1 R

, Pr( n 1) , I , Pr( n 1) ,

Y   Dr 1 t r 2 Q 3

AC

r 1 R

   Dr ; t r , Qr(1) , r 1

n 1 Pr( n 1) n I n 1 Pr( n 1) n  2

(1) r

 N Pr( N 1)  E R

, Pr( N 1)  +E R (26)

( m 1) r

m 1 Q

( m 1) r

m I m 1 Q

, Qr( m 1) , I , Qr( m 1) ,

m  2

( M 1) r

M Q

 FR

, Qr( M 1)  +FR

which is similar to Eq. (13) except that Ln = In and Km = Jm for Gr 1L2 

 LN

and Dr 1K2 

KM

,

as well as Pr( n )  I and Qr( m)  I . The schematic diagram of a HOPLS-RTucker model for two three-way arrays is shown in Fig. 3. Similar to HOPLS-Tucker, the HOPLS-RTucker model can be computed using the HOOI algorithm, while removing those steps related to loading matrices on the free modes. HOPLS-RTucker thus has higher computational efficiency than HOPLS-Tucker. Figure 3. Schematic diagram of a HOPLS-RTucker model for three-way arrays X and Y.

14

ACCEPTED MANUSCRIPT If restricting n : Ln  1 and m : Km  1 in Eq. (13), the block Tucker decomposition used in HOPLS-Tucker simplifies to the CP decomposition, resulting in the HOPLS-CP model R

X =  X ,r t r pr(1) Y =  Y ,r t r q

(1) r

where

t rT t r  1 ,

pr( n )T pr( n )  1

J X  diag (X ,r ) RR

R

q

 FR   J Y ; T , Q ,

and

(1)

qr( m)T qr( m)  1

and JY  diag (Y ,r ) RR

,Q

( M 1)

for

(27)

  FR

r

,

n

and

m

are two diagonal tensors. T  [t1 ,

, pR( n) ] and Q( m)  [q1( m) ,

.

, tR ]

, qR( m) ] are mode-n and mode-m

MA

is the latent matrix. P ( n)  [ p1( n) ,

R

NU

r 1

( M 1) r

IP

R

SC R

r 1

, P ( N 1)   E R

T

pr( N 1)  E R   J X ; T , P (1) ,

loading matrices of X and Y, respectively. Fig. 4 illustrates the HOPLS-CP model of two three-way

D

arrays. The HOPLS-CP model can be computed in the similar way as HOPLS-Tucker using the

TE

HOOI or alternating least squares (ALS) algorithm [14,21,22]. Note that the HOPLS-CP model has

CE P

the similar form as the N-PLS model, as shown in Eq. (11), if merging λX,r and λY,r together with tr as two new vectors tˆr  X ,r tr and uˆ r  Y ,r tr . Therefore, N-PLS can be regarded as a special case of

AC

HOPLS-Tucker. The main difference between HOPLS-CP and N-PLS lies in their solving algorithms. N-PLS combines the CP decomposition with a nonlinear iterative partial least squares (NIPALS)-like algorithm to finding each pair of latent vectors and loading vectors [10,13]. However, instead of using the iterative method, HOPLS-CP directly finds latent and loading vectors by performing CP decomposition on the cross-covariance of tensors X and Y, improving the computational efficiency. Benefiting from the better fitness ability of Tucker decomposition over CP decomposition [23], HOPLS-Tucker may have higher modeling accuracy than N-PLS and HOPLS-CP. However, HOPLS-CP and N-PLS are much more concise than HOPLS-Tucker by simplifying loading matrices to be loading vectors. They also have better uniqueness property than HOPLS-Tucker, owing to the 15

ACCEPTED MANUSCRIPT uniqueness of CP decomposition [14,23]. These advantages make HOPLS-CP and N-PLS more efficient in computation and easier to be interpreted.

IP

T

Figure 4. Schematic diagram of a HOPLS-CP model for three-way arrays X and Y.

SC R

3.3. Prediction with multilinear PLS

The above multilinear PLS models can be applied to predict new responses Ynew from new

NU

observations Xnew. The prediction procedure consists of two steps: extracting the new latent vector

MA

Tnew from Xnew, and then predicting Ynew with Tnew. For the HOPLS-Tucker model, Ynew is predicted from Xnew by [10]

new ˆ T new  X (1) P

D

(28)

, pˆ R ] I2

I N R

and Qˆ = [qˆ1 ,

, qˆR ] J2

J M R

pˆ r   Pr( N 1) 

 Pr(1)  Gr(1)

qˆr   Qr( M 1) 

 Qr(1)  DrT(1)

CE P

where Pˆ = [ pˆ1 ,

TE

new ˆ ˆ T Yˆ(1)new  T newQˆ T  X (1) PQ

consist of R columns as follows (29)

AC

If Y is a two-way matrix, the prediction is performed by new ˆ T new  X (1) P new ˆ Yˆ(1)new  T new DQT  X (1) PDQT

where D = diag(dr ) RR is a diagonal matrix, Q = [q1 ,

(30)

, qR ] J2 R , and dr and qr are defined in

Eq. (23). For the HOPLS-RTucker model, the prediction is also performed using Eq. (28) and Eq. (30) by replacing loading matrices on the free modes of X and Y with the identity matrix. In the HOPLS-CP model, Y is predicted by new  1 T new  X (1) P X new  1 Yˆ(1)new  T new  Y Q  X (1) P  X Y Q

16

(31)

ACCEPTED MANUSCRIPT  X  diag(X ,r ) RR IN

and Q R J 2

JM

are

two

diagonal

matrices,

are P  ( P ( N 1) 

 P (1) )T

Q  (Q ( M 1) 

 Q (1) )T

T

P R I2

Y  diag(Y ,r ) RR

and

(32)

IP

where

SC R

and λX,r, λX,r, P(n) and Q(m) are defined in Eq. (27). For the N-PLS model, Y is predicted by new ˆ T  X (1) W

(33)

NU

new ˆ Yˆ(1)new  TBQˆ T  X (1) WBQˆ T

where B is the regression coefficient matrix defined in Eq. (12), Wˆ = [wˆ 1 ,

, qˆR ] J2

J M R

I N R

and

are weight matrices, and the rth columns of Wˆ and Qˆ are given by

MA

Qˆ = [qˆ1 ,

, wˆ R ] I2

wˆ r  wr( I N )  wr( I N 1 ) 

 qr( J 2 )

(34)

TE

D

qˆr  qr( J M )  qr( J M 1 ) 

 wr( I2 )

CE P

4. Quality prediction and quality-relevant monitoring for batch processes 4.1. Preprocessing of batch data

AC

Suppose all batches in a batch process have the same durations. Let {X(I×J×K), Y(I×M×K)} be the training dataset, namely reference batches, where X contains independent process variables, Y consists of quality variables, I denotes the number of batches, J and M are numbers of variables, and K is the sampling number. For better modeling, raw batch data should be mean centered or scaled to eliminate the effect of different variable units and highlight differences among batches. However, it lacks of effective preprocessing methods that can directly cope with the three-way data structure of batch dataset. Thus, conventional preprocessing methods for two-way data are usually combined with a data unfolding procedure to preprocess the three-way batch dataset. This is carried out by unfolding X(I×J×K) and Y(I×M×K) into matrices X (I×JK) and Y(I×MK) firstly, and then mean 17

ACCEPTED MANUSCRIPT centering or scaling column vectors of X(I×JK) and Y(I×MK). After that, matrices X(I×JK) and Y(I×MK) are refolded back to three-way data arrays for modeling. Therefore, as illustrated in Fig. 5,

IP

T

the data preprocessing procedure includes three steps: (1) unfolding X(I×J×K) and Y(I×M×K) into

SC R

matrices X(I×JK) and Y(I×MK), (2) mean centering or scaling X(I×JK) and Y(I×MK) to have zero mean or unit variance, (3) refolding X(I×JK) and Y(I×MK) back to the initial three-way form of

NU

X(I×J×K) and Y(I×M×K). Note that, different from MPLS, here data unfolding is just used for data preprocessing rather than data modeling. After mean centering or scaling, the unfolded data matrix is

MA

refolded back to a three-way data array. Therefore, the data preprocessing procedure has less effect on original three-way correlations in the batch dataset.

CE P

4.2. Online quality prediction

TE

D

Figure 5. Preprocessing procedure for batch data.

To follow dynamic and nonlinear changes of the batch process, multilinear PLS is combined with

AC

the moving data window technique for online quality prediction. Let the moving window have a size of d, the training dataset in the window till sampling time k is denoted by {Xk(I×J×d), Yk(I×M×d)}, where the window data of ith reference batch is

 x1,i k d 1  i x X ki ( J  d )   1,k d 2   xi  1,k

xi2,k d 1 xi2,k d 2 xi2,k

 y1,i k d 1 xiJ ,k d 1    i xiJ ,k d 2   y1,k d 2 i  , Yk ( M  d )      yi xiJ ,k   1,k

y i2,k d 1 y i2,k d 2 y i2,k

y iM ,k d 1   y iM ,k d 2    y iM ,k 

A window mode multilinear PLS model is built from mean centered {Xk(I×J×d), Yk(I×M×d)}. As the window slides along the sampling time, the window mode multilinear PLS model will be updated periodically. 18

ACCEPTED MANUSCRIPT For a new batch, its window data X knew ( J  d ) till the sampling time k is

 x1,new k d 1  new x X knew ( J  d )   1,k d 2   x new  1,k

 x new J ,k d 1  x new  J ,k  d  2   new  x J ,k 

x 2,new k d 1

IP

T

x 2,new k d 2

SC R

x 2,new k

After mean centering X knew ( J  d ) , quality variables Yknew ( M  d ) at the sampling time k are predicted

NU

by the window mode multilinear PLS model. Note that, as the moving window slides from time k to time k+d-1, the prediction value at one sampling time k* (k ≤k*≤k+d-1) for each quality variable in

MA

Y new will be computed d times in adjacent windows, yielding yˆ new* | k , …, yˆ new* | (k  d  1) , m = m ,k

m ,k

1, …, M. The mean value of these predictions is thus calculated in sequence and set as the final

D

prediction of yˆ new* , namely

TE

m ,k

at t  k :

yˆ new*  yˆ new* | k m ,k

CE P

at t  k  1:



AC

at t  k  d  1: yˆ

new m ,k *

new m ,k *

m ,k







new m ,k *

| k  yˆ new* | (k  1) m ,k

2 yˆ new* | k  m ,k

(35)

 yˆ new* | (k  d  1) m ,k

d

This operation also has the advantage to reduce the noise influence.

4.3. Online quality-relevant monitoring

4.3.1. Fault detection and diagnosis Based on a multilinear PLS model, monitoring statistics named T2 statistic and Q statistic are constructed for fault detection. The T2 statistic measures the deviation in the latent (or score) subspace, while the Q statistic measures the deviation in the residual subspace. The T2 statistic for a 19

ACCEPTED MANUSCRIPT batch {Xs(J×K), Ys(M×K)} is defined as

Ts2  t s S 1t sT

(36)

IP

T

where ts is the projection of {Xs(J×K), Ys(M×K)} in the latent space, S denotes the covariance matrix

SC R

of the latent matrix T obtained from the training dataset, i.e., S  T T T ( I  1) . Two Q statistics are defined for a batch {Xs(J×K), Ys(M×K)} in the residual spaces of X and Y respectively, which are

Qs ,Y  Fs

  j ,k es2, jk

2

2

  m,k f s2,mk

(37)

NU

Qs , X  Es

MA

where Es and Fs are residuals of Xs(J×K) and Ys(M×K). Note that the QY statistic is only applicable for those batch processes whose quality variables can be measured online.

D

The control limit of T2 statistic at significance level α is [24]

TE

T2 ~

R( I 2  1) F ( R, I  R) I ( I  R)

(38)

CE P

where R is the number of components in the multilinear PLS model, I is the number of reference batches in training dataset, and F ( R, I  R) is an F-distribution with R and I-R degrees of freedom.

AC

The control limit of Q statistic is computed by kernel density estimation (KDE) [25]. Regarding Q statistics (QX or QY) of all reference batches as a univariate sample Q1 , Q2 ,

, QI   , its

distribution density is estimated by [25]

1 fˆ (Q, )  I

I

 Q  Qi   

 K  i 1

(39)

where Q denotes a data point under consideration, θ is a smoothing parameter, and K(∙) is a kernel function. The control limit Qα of Q statistic at significance level α is then estimated with the inverse cumulative distribution function of fˆ (Q, ) . For a new batch, faults are detected if any monitoring statistic exceeds its control limit. After 20

ACCEPTED MANUSCRIPT detected a fault, the contribution plot [26] is applied for fault diagnosis. The contribution plot shows the contribution of each variable to monitoring statistics. The largest contributor is probably the fault

IP

T

variable. According to the definition of Q statistic in Eq. (37), the contribution of jth process variable

csQ, Xjk  es2, jk

SC R

at sampling time k to the QX statistic is

(40)

NU

Similarly, the contribution of mth quality variable at sampling time k to the QY statistic is

csQ,Ymk  f s2,mk

(41)

MA

If using HOPLS-Tucker or HOPLS-RTucker, the definition of T2 statistic in Eq. (36) can be rewritten as

D

Ts2  t s S 1t sT  t s S 1 ( xs Pˆ )T

TE

 t s S 1



x pˆ j , k s , jk jk



T

(42)

CE P

  j ,k xs , jk t s S 1 pˆ Tjk

where xs(1×JK) is the vectoring of Xs(J×K), Pˆ JK R is defined in Eq. (28), and pˆ jk (1 R) is the

AC

jk-th row vector of Pˆ . Thus, the contribution of jth process variable at sampling time k to the T2 statistic is defined as

csT, jk = xs , jk ts S 1 pˆ Tjk

(43)

Similarly, for HOPLS-CP and N-PLS, the contribution of jth process variable at sampling time k to the T2 statistic is defined as HOPLS-CP: csT, jk = xs , jk ts S 1 pTjk N-PLS: csT, jk = xs , jk t s S 1wˆ Tjk

(44) (45)

In Eq. (44), p jk (1 R) is the jk-th row vector of the matrix P =P  JK R , and P is defined in Eq. (32). In Eq. (45), wˆ jk (1 R) is the jk-th row vector of the matrix Wˆ JK R defined in Eq. (33). 21

ACCEPTED MANUSCRIPT 4.3.2. Complete monitoring procedure

T

The moving data window technique is also applied for online monitoring to track dynamic and

IP

nonlinear changes of the batch process. A window mode multilinear PLS model is built from training

SC R

data, and then control limits of monitoring statistics are calculated. For a new batch, its window data { X knew ( J  d ) , Yknew ( M  d ) } at the sampling time k are projected on the window mode multilinear

NU

PLS model to compute the latent (or score) vector and residuals, which are further used to calculate

MA

the T2 and Q statistics. If any monitoring statistic exceeds its control limit, a fault is detected and the fault variable is identified by contribution plot.

D

The complete quality-relevant monitoring procedure for a batch process includes two phases:

TE

Phase I: Build offline window mode multilinear PLS models

CE P

Step 1: Select reference batches and construct the training dataset. Step 2: Choose the size d of the moving data window.

AC

Step 3: For all sampling time k≥d, preprocess window data of all reference batches to zero-mean and unit-variance. Step 4: Choose a multilinear PLS method, and build window mode multilinear PLS models for all sampling time k≥d. Step 5: Calculate control limits of QX, QY and T2 statistics for all sampling time k≥d. Phase II: Online monitoring Step 1: For a new batch, preprocess its window data { X knew ( J  d ) , Yknew ( M  d ) } at the sampling time k≥d. Step 2: Project window data on the window mode multilinear PLS model. 22

ACCEPTED MANUSCRIPT Step 3: Compute QX, QY and T2 statistics of the new batch. Step 4: Check whether QX, QY and T2 statistics exceed their control limits. If yes, go to step 6, or

IP

T

go to step 5.

SC R

Step 5: Return back to step 1 and monitor the next sampling time k + 1. Step 6: Diagnose the fault variable via contribution plots.

NU

5. Case study

MA

The multilinear PLS-based quality prediction and quality-relevant monitoring methods are tested in a benchmark fed-batch penicillin fermentation process. The simulator PenSim v2.0 developed by

D

Birol et al. [27] was used to simulate this batch process. 80 normal batches were generated by

TE

running the simulator under normal operation conditions with small variations. The duration of each

CE P

batch was 200 h. As listed in Table 1, fourteen process variables were measured online with a sampling interval of 1 h. The penicillin concentration and the biomass concentration were chosen as

AC

quality variables, and other twelve variables were classified into input variables. Similar to refs [28] and [29], six fault batches were generated considering various fault variables, directions and magnitudes, as listed in Table 2. Each fault was introduced into the batch at 100 h and lasted until the end of the batch.

5.1. Case I: Quality prediction

A total of 60 normal batches are chosen to constitute the training dataset, denoted by {X(60×12×200), Y(60×2×200)}. The remaining 20 normal batches serve as the test dataset, denoted by {Xnew(20×12×200), Ynew(20×2×200)}. The root mean square errors of prediction (RMSEP) and an 23

ACCEPTED MANUSCRIPT R index are used to quantify the prediction performance of different models. The RMSEP is

Y  Yˆ

2

n , where Yˆ denotes the prediction of Y, and n is the number Y . A smaller RMSEP and a larger R

IP

of elements in Y. The R index is defined as R  1  Y  Yˆ

T

calculated by RMSEP 

SC R

indicate the better prediction performance.

Four window mode (WM) multilinear PLS models, termed as WMHOPLS-Tucker,

NU

WMHOPLS-RTucker, WMHOPLS-CP and WMN-PLS, are built separately to make a comparison between them. In the WMHOPLS-RTucker model, the time mode of data is set as the free mode.

MA

These four multilinear PLS models are also compared with the window mode MPLS (WMMPLS) model that is built by combining MPLS with the moving data window technique. For all models, the

TE

D

size of the moving data window is chosen as 10. The numbers of latent (or score) vectors in different models are chosen by 5-fold cross-validation [30]. To investigate the effect of noise on quality

CE P

prediction, white Gaussian noise with varying signal noise ratios (SNRs) of 20 dB, 10 dB and 5 dB are added into both training data and test data.

AC

Table 3 compares the prediction performance of five different models under different noise levels, where the case without noise is denoted by SNR = Inf dB. Note that all the R indexes in Table 3 are larger than 0.97, because Y  Yˆ

is much smaller than Y , implying the good prediction results in

all cases. Fig. 6 shows prediction results of five models for a test batch without noise. When without noise, WMMPLS and WMHOPLS-Tucker have the smallest RMSEP and the largest R index, i.e., the best prediction performance, closely followed by WMHOPLS-RTucker and then the WMN-PLS. WMHOPLS-CP has relatively low prediction accuracy as its RMSEP is larger than other four models. However, as the noise level increases, the RMSEP of WMMPLS increases faster than those of four 24

ACCEPTED MANUSCRIPT multilinear models. As a result, WMMPLS has the worst prediction performance when SNR ≤ 10 dB. This means that multilinear models are more robust to noise, probably because information across all

IP

T

modes of data is used in multilinear models [13]. WMHOPLS-RTucker provides the best prediction

SC R

results under all noise conditions. Although WMHOPLS-CP has the worst prediction performance at SNR = Inf dB and 20 dB, its performance is comparable to WMHOPLS-Tucker,

NU

WMHOPLS-RTucker and WMN-PLS with the noise level increasing to 10 dB and 5 dB. The superiority of multilinear models is significant for SNRs = 10 dB and 5 dB. As shown in Table 3, the

MA

RMSEP of WMHOPLS-RTucker decreases by about 20.4% as compared to the RMSEP of WMMPLS when SNR = 10 dB. For SNR = 5 dB, in comparison with WMMPLS, RMSEP values of

TE

D

WMHOPLS-Tucker, WMHOPLS-RTucker, WMHOPLS-CP and WMN-PLS decrease by about 16.9%, 21.4%, 18.7% and 18.4%, respectively. Prediction results of five models for a test batch at

CE P

the noise level of 5 dB are shown in Fig. 7. As shown in Fig. 7a, multilinear models have better prediction performance than WMMPLS at a high noise level, because their predictions are closer to

AC

simulation data. Note that WMHOPLS-Tucker and WMHOPLS-RTucker have lower efficiencies than WMN-PLS and WMHOPLS-CP, because it is time-consuming to select appropriate model parameters for them. By contrast, because WMN-PLS and WMHOPLS-CP have higher computational efficiency, lower RMSEP and better noise robustness, they may be more applicable to real industrial processes. Figure 6. Quality prediction results of five models for a test batch without noise. (a) Penicillin concentration, (b) Biomass concentration. Figure 7. Quality prediction results of five models for a test batch at the noise level of 5 dB. (a) Penicillin concentration, (b) Biomass concentration.

25

ACCEPTED MANUSCRIPT 5.2. Case II: Quality-relevant monitoring

T

In this case study, four training datasets containing 5, 10, 20 and 50 reference batches are

IP

constructed. The six fault batches in Table 2 constitute the test dataset. Five window mode

SC R

monitoring models, namely WMHOPLS-Tucker, WMHOPLS-RTucker, WMHOPLS-CP, WMN-PLS and WMMPLS, are built separately for comparison. The size of the moving window is set as 10.

NU

Fault detection rate (FDR) and false alarm rate (FAR) are used to quantify the monitoring

MA

performance. FDR is defined as the percentage of samples outside the control limit in fault conditions. FAR is the percentage of samples outside the control limit in non-fault conditions. A

D

larger FDR and a smaller FAR indicate the better fault detection performance.

TE

Fault detection results of five models are compared in Table 4, where average FDRs and FARs of

CE P

six fault batches are listed for each model. Fig. 8 shows monitoring charts of five models for the training dataset with 10 reference batches. Clearly, all monitoring models provide well fault

AC

detection results when the training dataset contains more than 20 reference batches. However, if the number of reference batches in the training dataset decreases to 10 and 5, QX and QY statistics of WMMPLS have larger FARs (≥ 10%), implying the poor monitoring performance. For the training dataset with 10 reference batches, four multilinear models all have well monitoring performance, except that WMN-PLS has a small FAR (≈ 4%) in the QY statistic (see Fig. 8). However, the QX and QY statistics of WMMPLS have larger FARs (see Fig. 8). When the training dataset only has 5 reference batches, similar to WMMPLS, the T2 statistics of four multilinear models almost fail to detect faults (FDR < 4%), and the QY statistics have large FARs (> 25%). However, different from WMMPLS, the QX statistics of multilinear models still successfully detect all process faults with 26

ACCEPTED MANUSCRIPT high FDRs (> 97%) and without false alarms. These results indicate that multilinear models are better than WMMPLS when the training dataset has fewer reference batches.

IP

T

Figure 8. Monitoring charts of five models for the training dataset with 10 reference batches. (a) WMHOPLS-Tucker, (b) WMHOPLS-RTucker, (c) WMHOPLS-CP, (d) WMN-PLS, (e) WMMPLS.

SC R

To further investigate the effect of noise on fault detection, white Gaussian noise with SNRs of 20 dB, 10 dB and 5 dB are added into the training dataset containing 20 reference batches as well as the

NU

test dataset. Table 5 lists monitoring results of five models at different noise levels, where Inf dB

MA

indicates the case without noise. Obviously, the noise degrades the monitoring performance of all models. A higher noise level results in the worse monitoring performance. The noise has the largest

D

influence on the T2 statistic, then the QY statistic and last the QX statistic. In the noise conditions, T2

TE

statistics of five models all have very small FDRs, and QY statistics show sharp decreases in FDRs

CE P

and large increases in FARs. However, except at a high noise level of 5 dB, QX statistics of four multilinear models have smaller decreases in FDRs and slightly increases in FARs, keeping good

AC

monitoring performance. Four multilinear models almost have similar monitoring performance. Compared with multilinear models, WMMPLS is more sensitive to noise, because its T2 statistic has a smaller FDR and its two Q statistics have larger FARs than those of multilinear models under the same conditions. As shown in Table 5, WMMPLS is actually not applicable for process monitoring when SNR ≤ 20 dB, due to its very small FDRs (< 6%) in T2 statistic and larger FARs (≥ 7%) in two Q statistics. Therefore, multilinear models are better than WMMPLS in dealing with noise data. After a fault was successfully detected by monitoring statistics, the fault process variable is diagnosed using the contribution plot. In the case of using the training dataset with 20 reference batches and without noise, the contribution plot of WMHOPLS-Tucker for the fault batch No. 5 is 27

ACCEPTED MANUSCRIPT illustrated in Fig. 9. Clearly, variable 1, i.e., aeration rate, is the largest contributor to all monitoring statistics at most sampling time in the fault condition, as marked in the red color in Fig. 9. Therefore,

IP

T

it is reasonable to identify the aeration rate as the fault variable, which is consistent with the preset

SC R

fault variable in Table 2. Contribution plots of WMHOPLS-RTucker, WMHOPLS-CP, WMN-PLS, and WMMPLS can be found in Supplementary data, and they also indicate that the aeration rate is the fault variable for fault batch No. 5.

NU

Figure 9. Contribution plots of WMHOPLS-Tucker for the fault batch No. 5.

MA

6. Conclusion

D

The multilinear PLS method was applied for quality prediction and quality-relevant monitoring in

TE

batch processes. Four multilinear PLS models, including HOPLS-Tucker, HOPLS-RTucker,

CE P

HOPLS-CP and N-PLS, were investigated. Especially, HOPLS-RTucker and HOPLS-CP were derived by imposing extra constraints on HOPLS-Tucker, and thus they can be regarded as two special cases of HOPLS-Tucker. HOPLS-RTucker restricts the loading matrices on some specific

AC

modes (called free mode) of multi-way data to be the identity matrix to keep information on these free modes unchanged. HOPLS-CP replaces the Tucker decomposition used in HOPLS-Tucker with the CP decomposition to get a simplified model. HOPLS-RTucker and HOPLS-CP both have higher computational efficiencies than HOPLS-Tucker. Moreover, HOPLS-CP model has the similar form as the N-PLS model, while their solving algorithms are totally different. HOPLS-CP may have a higher computational efficiency than N-PLS, while N-PLS may fit data better than HOPLS-CP. The quality prediction and quality-relevant monitoring methods were then developed by combining multilinear PLS with the moving data window technique. These methods were tested in a 28

ACCEPTED MANUSCRIPT fed-batch penicillin fermentation process. The results indicate that the multilinear PLS method outperform the unfold-PLS (i.e., MPLS) method in two aspects. One is that multilinear PLS has a

IP

T

better anti-noise capability than unfold-PLS. The other is that multilinear PLS has better monitoring

SC R

performance than unfold-PLS for datasets with fewer reference batches. Four multilinear PLS methods have little differences in the prediction and monitoring performance. By contrast, N-PLS

NU

and HOPLS-CP are more practical because of their simplicity and high computational efficiency.

MA

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61304116),

AC

References

CE P

TE

D

the Zhejiang Provincial Natural Science Foundation of China (No. LQ13B060004).

[1] P. Nomikos, J.F. MacGregor, Monitoring batch processes using multiway principal component analysis, AIChE J. 40 (1994) 1361–1375. [2] P. Nomikos, J.F. MacGregor, Multi-way partial least squares in monitoring batch processes, Chemom. Intell. Lab. Syst. 30 (1995) 97–108. [3] C.K. Yoo, J.M. Lee, P.A. Vanrolleghem, I.B. Lee, On-line monitoring of batch processes using multiway independent component analysis, Chemom. Intell. Lab. Syst. 71 (2004) 151–163. [4] X. Meng, A.J. Morris, E.B. Martin, On-line monitoring of batch processes using a PARAFAC representation, J. Chemom. 17 (2003) 65–81. 29

ACCEPTED MANUSCRIPT [5] D.J. Louwerse, A.K. Smilde, Multivariate statistical process control of batch processes based on three-way models, Chem. Eng. Sci. 55 (1999) 1225–1235.

SC R

component analysis, Comput. Chem. Eng. 28 (2004) 1837–1847.

IP

T

[6] J.M. Lee, C. Yoo, I.B. Lee, Fault detection of batch processes using multiway kernel principal

[7] N. Lu, F. Gao, Stage-based process analysis and quality prediction for batch processes, Ind. Eng.

NU

Chem. Res. 44 (2005) 3547–3555.

[8] C. Zhao, F. Wang, Z. Mao, N. Lu, M. Jia, Quality prediction based on phase-specific average

MA

trajectory for batch processes, AIChE J. 54 (2008) 693−705.

[9] J. Yu, Multiway gaussian mixture model based adaptive kernel partial least squares regression

TE

D

method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes, Ind. Eng. Chem. Res. 51 (2012) 13227−13237.

CE P

[10] Q. Zhao, C.F. Caiafa, D.P. Mandic, Z.C. Chao, Y. Nagasaka, N. Fujii, L. Zhang, A. Cichocki, Higher-order partial least squares (HOPLS): A generalized multi-linear regression method,

IEEE

AC

Trans. Pattern Anal. 35 (2013) 1660–1673. [11] L.J. Luo, S.Y. Bao, Z.L. Gao, J.Q. Yuan, Batch process monitoring with GTucker2 model, Ind. Eng. Chem. Res. 53 (2014) 15101−15110. [12] H.A.L. Kiers, I. Van Mechelen, Three-way component analysis: Principles and illustrative application, Psych. Methods. 6 (2001) 84–110. [13] R. Bro, Multi-way calibration. Multi-linear PLS, J. Chemom. 10 (1996) 47–62. [14] T.G. Kolda, B.W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009) 455–500. 30

ACCEPTED MANUSCRIPT [15] T.G. Kolda, Multilinear operators for higher-order decompositions, Tech. Report. SAND2006-2081, Sandia National Laboratories, Albuquerque, New Mexico, Livermore, California,

IP

T

2006.

SC R

[16] P. Geladi, B.R. Kowalski, Partial least-squares regression: A tutorial, Anal. Chim. Acta 185 (1986) 1–17.

NU

[17] S. Wold, P. Geladi, K. Esbensen, J. Ohman, Multi-way principal components-and PLS-analysis, J. Chemom. 1 (1987) 41–56.

MA

[18] L. Stlhle, Aspects of the analysis of three-way data, Chemom. Intell. Lab. Syst. 7 (1989) 95–100.

TE

D

[19] A.K. Smilde, Comments on Multilinear PLS, J. Chemom. 11(1997) 367–377. [20] L. de Lathauwer, Decompositions of a higher-order tensor in block terms-Part II: Definitions

CE P

and uniqueness, SIAM J. Matrix Anal. Appl. 30 (2008) 1033–1066. [21] L. de Lathauwer, B. de Moor, J. Vandewalle, On the best rank-1 and rank-(R1, R2, ..., RN)

AC

approximation of higher-order tensors, SIAM J. Matrix Anal. Appl. 21 (2000) 1324–1342. [22] R.A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multi-modal factor analysis, UCLA Working Papers in Phonetics 16 (1970) 1–84. [23] R. Bro, A. Smilde, S. de Jong, On the difference between lowrank and subspace approximation: improved model for multilinear PLS regression, Chemom. Intell. Lab. Syst. 58 (2001) 3–13. [24] P. Nomikos, J.F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1995) 41–59. [25] E.B. Martin, A.J. Morris, Non-parametric confidence bounds for process performance 31

ACCEPTED MANUSCRIPT monitoring charts, J. Process Control 6 (1996) 349–358. [26] J.A. Westerhuis, S.P. Gurden, A.K. Smilde, Generalized contribution plots in multivariate

IP

T

statistical process monitoring, Chem. Intell. Lab. Syst. 51 (2000) 95–114.

SC R

[27] G. Birol, C. Undey, A. Cinar, A modular simulation package for fed-batch fermentation: penicillin production, Comput. Chem. Eng. 26 (2002) 1553–1565.

NU

[28] L.J. Luo, Tensor global-local preserving projections for batch process monitoring, Ind. Eng. Chem. Res. 53 (2014) 10166–10176.

MA

[29] L. J. Luo, S.Y. Bao, Z.L. Gao, J.Q. Yuan, Batch process monitoring with tensor global–local structure analysis, Ind. Eng. Chem. Res. 52 (2013) 18031–18042.

TE

D

[30] T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning: data mining, inference,

AC

CE P

and prediction (2nd ed.), Springer, New York, 2011.

32

ACCEPTED MANUSCRIPT Table 1.Process variables in the fed-batch penicillin fermentation process

IP

T

Type

SC R

Unit L/h W L/h K g/L % L mmol/L K kcal/h L/h g/L g/L

Input variable

MA

NU

Process variables Aeration rate Agitator power input Substrate feeding rate Substrate feeding temperature Substrate concentration DO saturation Culture volume CO2 concentration pH Bioreactor temperature Generated heat Cold water flow rate Biomass concentration Penicillin concentration

Quality variable

D

Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14

TE

Table 2. Fault batches in the fed-batch penicillin fermentation process Fault batch No. Fault description

AC

CE P

1 2 3 4 5 6

10% step increase in aeration rate 10% step increase in agitator power input 10% step increase in substrate feeding rate 10% step decrease in substrate feeding rate 2 L/h ramp decrease in aeration rate 5 W ramp decrease in agitator power input

33

ACCEPTED MANUSCRIPT

Table 3. Prediction performance of five models at different noise levels WMHOPLS-RTucker

WMHOPLS-CP

R

RMSEP (g/L)

R

RMSEP (g/L)

Inf dB 0.9994

0.0027

0.9989

0.0054

0.9948

0.0252

20 dB 0.9949

0.0247

0.9954

0.0223

0.9920

10 dB 0.9874

0.0607

0.9886

0.0550

5 dB

0.0954

0.9813

0.0902

0.9802

WMMPLS

R

RMSEP (g/L)

R

RMSEP (g/L)

0.9986

0.0070

0.9994

0.0027

0.0387

0.9954

0.0223

0.9950

0.0243

0.9869

0.0633

0.9881

0.0573

0.9857

0.0691

0.9807

0.0933

0.9806

0.0937

0.9762

0.1148

MA N

US

CR

RMSEP (g/L)

TE D

R

IP

SNR

WMN-PLS

T

WMHOPLS-Tucker

AC

CE P

Bold values present the smallest RMSEP for each SNR.

34

ACCEPTED MANUSCRIPT Table 4. Average fault detection rates (FDRs) and false alarm rates (FARs) of six faults for five models

QX

T2

QY

QX

WMN-PLS

IP

WMHOPLS-CP T2

QY

CR

T2

WMHOPLS-RTucker

QX

QY

T2

QX

WMMPLS QY

T2

QX

QY

100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

20

100.0 100.0

10

97.0

100.0 100.0

96.5

100.0

5

3.63

97.2

100.0

3.63

50

0

0

0

0

20

0

0

0

10

0

0

0

5

0

0

25.6

99.7

100.0 100.0

99.8

100.0

99.7

100.0 100.0 100.0

99.8

100.0

99.8

99.8

99.5

100.0

99.7

98.8

100.0

99.8

99.0

100.0

99.7

97.2

100.0

3.63

97.2

100.0

3.63

97.2

100.0

3.3

100.0

99.8

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

0

0

0

0

4.4

0

10.0

26.7

0

0

25.6

0

0

25.6

0

0

25.6

0

34.4

23.3

0

AC

FAR

99.5

CE P

FDR

MA N

50

TE D

batch number

WMHOPLS-Tucker

US

Total reference

T

using different numbers of reference batches without noise

Bold values present the largest FAR in each case.

35

ACCEPTED MANUSCRIPT Table 5. Average fault detection rates (FDRs) and false alarm rates (FARs) of six faults for five models

WMHOPLS-CP

QX

QY

100.0

100.0

99.5

T2

QX

QY

T2

QX

QY

T2

QX

QY

99.8 100.0 99.7 100.0 100.0 100.0 99.8 100.0 99.8

20 dB

8.6

84.8

37.8

8.6

85.0

38.1

8.4

85.0

38.1

8.4

85.0

38.4

2.6

86.6

39.1

10 dB

7.8

82.5

25.2

7.9

82.5

24.8

7.8

82.7

25.2

7.8

82.0

25.2

4.8

83.7

30.4

5 dB

9.4

72.1

19.5

9.9

71.6

20.3

8.3

73.6

19.1

7.8

71.6

20.6

5.4

72.1

21.8

Inf dB

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

20 dB

0

1.7

6.3

0

1.9

6.3

0

1.7

6.3

0

1.7

6.3

0

7.0

9.3

10 dB

0

2.2

8.5

AC

Inf dB 100.0 100.0 99.7

T2

WMMPLS

US

QY

MA N

QX

CR

SNR T2

WMN-PLS

IP

WMHOPLS-Tucker WMHOPLS-RTucker

T

using 20 reference batches at different noise levels

0

2.6

8.5

0

2.4

8.1

0

2.4

8.1

0

8.6

11.1

5 dB

0

10.7

7.4

0

10.6

7.6

0

11.1

7.8

0

11.1

7.6

0

12.6

10.6

FAR

CE P

TE D

FDR

Bold values present the smallest FDR or the largest FAR for each SNR.

36

US

CR

IP

T

ACCEPTED MANUSCRIPT

AC

CE P

TE D

MA N

Fig. 1

37

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 2 38

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 3 39

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 4 40

AC

Fig. 5

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

41

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 6a 42

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 6b

43

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 7a 44

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 7b 45

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 8a

46

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 8b 47

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 8c 48

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 8d 49

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 8e 50

AC

CE P

TE D

MA N

US

CR

IP

T

ACCEPTED MANUSCRIPT

Fig. 9

51

ACCEPTED MANUSCRIPT

Figures captions

IP

T

Figure 1. One component decomposition of a three-way array X(I×J×K).

CR

Figure 2. Schematic diagram of a HOPLS-Tucker model for three-way arrays X and Y.

US

Figure 3. Schematic diagram of a HOPLS-RTucker model for three-way arrays X and Y.

MA N

Figure 4. Schematic diagram of a HOPLS-CP model for three-way arrays X and Y. Figure 5. Preprocessing procedure for batch data.

TE D

Figure 6. Quality prediction results of five models for a test batch without noise.

CE P

(a) Penicillin concentration, (b) Biomass concentration. Figure 7. Quality prediction results of five models for a test batch at the noise level of 5 dB.

AC

(a) Penicillin concentration, (b) Biomass concentration. Figure 8. Monitoring charts of five models for the training dataset with 10 reference batches. (a) WMHOPLS-Tucker, (b) WMHOPLS-RTucker, (c) WMHOPLS-CP, (d) WMN-PLS, (e) WMMPLS. Figure 9. Contribution plots of WMHOPLS-Tucker for the fault batch No. 5. 52

ACCEPTED MANUSCRIPT

Highlights

IP

T

1. Two new multilinear PLS models named HOPLS-CP and HOPLS-RTucker are developed.

CR

2. Batch process monitoring methods are proposed based on multilinear PLS.

US

3. Multilinear PLS is used for the quality prediction in batch processes.

AC

CE P

TE D

MA N

4. Multilinear PLS has better monitoring and prediction abilities than unfold-PLS.

53