Quality Prediction and Quality-relevant Monitoring with Multilinear PLS for Batch Processes

Lijia Luo, Shiyi Bao, Jianfeng Mao, Di Tang

PII: S0169-7439(15)00274-9
DOI: 10.1016/j.chemolab.2015.11.004
Reference: CHEMOM 3132

To appear in: Chemometrics and Intelligent Laboratory Systems

Received date: 22 June 2015
Revised date: 26 October 2015
Accepted date: 4 November 2015

Please cite this article as: Lijia Luo, Shiyi Bao, Jianfeng Mao, Di Tang, Quality Prediction and Quality-relevant Monitoring with Multilinear PLS for Batch Processes, Chemometrics and Intelligent Laboratory Systems (2015), doi: 10.1016/j.chemolab.2015.11.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
College of Mechanical Engineering, Zhejiang University of Technology, Engineering Research Center of Process Equipment and Remanufacturing, Ministry of Education, Hangzhou, China
ABSTRACT: The multilinear regression method is applied for quality prediction and quality-relevant monitoring in batch processes. Four multilinear partial least squares (PLS) models are investigated: three higher-order PLS (HOPLS) models, termed HOPLS-Tucker, HOPLS-RTucker and HOPLS-CP, and the N-way PLS (N-PLS) model. These multilinear PLS methods have two advantages over the unfold-PLS method. Firstly, they retain the inherent three-way representation of batch data and avoid the disadvantages caused by data unfolding, resulting in more stable process models. Secondly, they summarize the main information in each mode of the data and describe the three-way interactions between the modes, and therefore have better modeling accuracy and intuitive interpretability. Online quality prediction and quality-relevant monitoring methods are developed by combining multilinear PLS with the moving data window technique. These methods are tested on a fed-batch penicillin fermentation process. The results indicate that the multilinear PLS method has higher predictive accuracy, better anti-noise capability and better monitoring performance than the unfold-PLS method.

Keywords: tensor; multilinear PLS; quality-relevant monitoring; quality prediction; batch processes.
Corresponding author. Tel.: +86 (0571) 88320349. E-mail address: [email protected] (L.J. Luo).
1. Introduction

Batch processes are widely used to produce low-volume, high value-added products, including polymers, chemicals, pharmaceuticals and semiconductors, because of their high flexibility to adapt to rapidly changing market situations. Increasing international competition has raised the demand for high-quality products. Reliable process monitoring as well as precise quality prediction and control are therefore necessary to ensure the safe and profitable operation of batch processes. Process monitoring aims to warn of abnormal operating conditions by timely detection and diagnosis of process faults, which helps operators take corrective actions to recover normal production and enhance process safety. The objective of quality prediction and control is to estimate the product quality from the operating conditions in a fast and accurate way, and further to derive better operating conditions that achieve a high product quality. However, process monitoring and quality prediction are difficult for batch processes, because they suffer not only from intractable nonlinear and time-varying process behaviors, but also from unavailable online quality measurements, batch-to-batch variations, and so on.

In recent decades, data-driven process monitoring and control methods have been widely accepted in various industries, benefiting from the widespread application of automation instruments and data acquisition technologies. Data-driven methods build process models from operation data and rarely require knowledge of the process mechanism. They are therefore especially applicable to complicated industrial processes for which precise first-principle models are unavailable or difficult to build. Different from continuous processes, the operation data of a batch process are usually recorded in a three-way data array with three directions: batch, variable and sampling time.
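To make this data layout concrete, the following numpy sketch (with hypothetical dimensions) builds such a three-way array and performs the batch-wise unfolding used by the bilinear methods discussed below:

```python
import numpy as np

# Hypothetical dimensions: I batches, J process variables, K sampling times.
I, J, K = 10, 4, 100
rng = np.random.default_rng(0)

# A batch dataset kept in its natural three-way form X(I x J x K).
X = rng.standard_normal((I, J, K))

# Batch-wise unfolding flattens the variable and time modes together,
# giving the I x JK matrix that bilinear methods such as MPLS work on.
X_unfolded = X.reshape(I, J * K)

print(X.shape, X_unfolded.shape)   # (10, 4, 100) (10, 400)
```

Note how a modest process (10 variables, 100 time slices) already yields a thousand unfolded columns per batch, which is the root of the "large variable number but small sample size" problem discussed below.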
Bilinear analysis methods and multilinear analysis methods are often adopted to analyze this three-way data array. Bilinear analysis methods are also known as unfolding methods, and include multiway principal component analysis (MPCA) [1], multiway partial least squares (MPLS) [2], multiway independent component analysis (MICA) [3], and so on. These methods first unfold the three-way data array to a matrix along the batch-wise or variable-wise direction, and then build process models on the unfolded data. In contrast, multilinear analysis methods maintain the three-way representation of the data array and build process models using tensor decompositions, such as parallel factor analysis (PARAFAC) [4] and the Tucker3 decomposition [5].

So far, bilinear methods, especially MPCA and MPLS, have attracted more research attention than multilinear methods in process control applications. Many MPCA/MPLS-based process monitoring and quality prediction methods have been developed. For example, Lee et al. [6] proposed a multiway kernel PCA method for monitoring nonlinear batch processes. Lu and Gao [7] and Zhao et al. [8] presented two phase-based PLS methods for the quality prediction of multiphase batch processes, which take the multiphase feature of batch processes into account. Yu [9] proposed a multiway Gaussian mixture model based adaptive kernel partial least squares (MGMM-AKPLS) method to predict quality variables in nonlinear batch processes.

As mentioned above, the MPCA/MPLS-based methods need to unfold the three-way batch data before building process models. However, the data unfolding operation causes two potential disadvantages. Firstly, the unfolded data may have the "large variable number but small sample size" problem. For instance, when a three-way array X(10×10×100) containing 10 batches, 10 variables and 100 time slices is unfolded to a matrix X(10×1000), only ten samples are available for PCA or PLS to compute the
1000-dimensional loading vectors, probably resulting in an unreliable estimate of the model parameters [10]. Secondly, MPLS and MPCA fail to offer an explicit description of the three-way interactions in the batch dataset, because the three-way data structure is destroyed and the information from two directions is convolved together after data unfolding. These two disadvantages may reduce the monitoring performance, prediction ability and interpretability of MPCA/MPLS models.

In recent years, researchers have started to analyze batch data with multilinear (i.e., tensor) analysis methods, in which batch data are represented in their natural three-way form as tensors. For example, Meng et al. [4] and Louwerse and Smilde [5] applied PARAFAC and the Tucker decomposition for batch process monitoring, respectively. Luo et al. [11] proposed a generalized Tucker2 (GTucker2) model for monitoring uneven-length batch processes. Since the batch data array is inherently three-way, multilinear analysis methods naturally avoid the disadvantages caused by data unfolding. Multilinear models commonly have far fewer parameters than MPCA/MPLS models, because batch data are compressed in three directions instead of two; multilinear models are therefore expected to be more stable [5]. Besides, multilinear models can summarize all main effects and interactions in batch data, because they extract the main information in each mode of the data and describe the relations between the modes [12]. Thus, multilinear models have better modeling accuracy and intuitive interpretability. In particular, two multilinear regression algorithms have been proposed: higher-order partial least squares (HOPLS) [10] and N-way partial least squares (N-PLS) [13]. HOPLS and N-PLS both have advantages over unfold-PLS (MPLS), including robustness to noise, a stabilized solution, increased predictability and intuitive interpretability [10,13]. These multilinear regression methods may therefore have good application prospects
in batch processes.

In this paper, the multilinear regression method is applied for quality prediction and quality-relevant monitoring in batch processes. Four multilinear PLS models, including HOPLS-Tucker (i.e., HOPLS), HOPLS-RTucker, HOPLS-CP and N-PLS, are investigated. In particular, HOPLS-RTucker and HOPLS-CP are two new multilinear PLS models, derived by imposing extra restrictions on HOPLS-Tucker. HOPLS-RTucker aims to handle multi-way data with a specific (or free) mode on which the information remains unchanged. HOPLS-CP replaces the Tucker decomposition used in HOPLS-Tucker with the CP decomposition (i.e., canonical decomposition/parallel factor analysis) [14] to simplify the model and improve the computational efficiency. Online quality prediction and quality-relevant monitoring methods are then proposed by combining multilinear PLS with the moving data window technique. Monitoring statistics named T2, QX and QY are constructed for fault detection, and the contribution plot is applied for fault diagnosis. These methods are tested on a fed-batch penicillin fermentation process. The results show that multilinear PLS outperforms MPLS (i.e., unfold-PLS) in terms of higher predictive accuracy, better anti-noise capability, higher fault detection rates and lower false alarm rates.
2. Notation and Preliminaries

2.1. Notation and definitions

Tensors (multi-way arrays), matrices and vectors are denoted by underlined boldface capital letters, boldface capital letters and boldface lowercase letters, respectively, e.g., X, X and x. The order of a tensor is the number of ways or modes. The ith entry of a vector x is x_i. The ith column of a matrix X is x_i and the element (i, j) is x_ij. The element (i_1, i_2, ..., i_N) of an Nth-order tensor X ∈ R^(I_1×I_2×...×I_N) is x_{i_1 i_2 ... i_N}. X^(n) and x^(n) denote the nth factor matrix and factor vector in a sequence, respectively. X_(n) is the mode-n matricization of a tensor X [14]. The symbols "∘", "⊗" and "⊙" denote the vector outer product, Kronecker product and Khatri-Rao product, respectively. The superscript "+" denotes the Moore-Penrose pseudoinverse.

The norm of an Nth-order tensor X ∈ R^(I_1×I_2×...×I_N) is [14,15]

||X|| = ( Σ_{i_1=1}^{I_1} Σ_{i_2=1}^{I_2} ... Σ_{i_N=1}^{I_N} x_{i_1 i_2 ... i_N}^2 )^{1/2}

The n-mode product of a tensor X ∈ R^(I_1×...×I_n×...×I_N) with a matrix A ∈ R^(J×I_n) is Y = X ×_n A ∈ R^(I_1×...×I_{n-1}×J×I_{n+1}×...×I_N), with

y_{i_1 ... i_{n-1} j i_{n+1} ... i_N} = Σ_{i_n=1}^{I_n} x_{i_1 ... i_n ... i_N} a_{j i_n}

The n-mode product of X with a vector t ∈ R^(I_n) is Y = X ×̄_n t ∈ R^(I_1×...×I_{n-1}×I_{n+1}×...×I_N), with

y_{i_1 ... i_{n-1} i_{n+1} ... i_N} = Σ_{i_n=1}^{I_n} x_{i_1 ... i_n ... i_N} t_{i_n}

For two tensors X ∈ R^(I_1×...×I_n×...×I_N) and Y ∈ R^(J_1×...×J_n×...×J_M) with the same size on the nth mode (I_n = J_n), their n-mode cross-covariance is Z = <X, Y>_{n;n} ∈ R^(I_1×...×I_{n-1}×I_{n+1}×...×I_N×J_1×...×J_{n-1}×J_{n+1}×...×J_M), with elements [14,15]

z_{i_1 ... i_{n-1} i_{n+1} ... i_N j_1 ... j_{n-1} j_{n+1} ... j_M} = Σ_{i_n=1}^{I_n} x_{i_1 ... i_n ... i_N} y_{j_1 ... i_n ... j_M}

The Tucker decomposition of a tensor X ∈ R^(I_1×...×I_N) is concisely expressed as [14,15]

X = G ×_1 A^(1) ×_2 ... ×_N A^(N) = [[G; A^(1), ..., A^(N)]]

where G ∈ R^(J_1×...×J_N) is the core tensor and A^(n) ∈ R^(I_n×J_n) are the factor matrices. The canonical decomposition/parallel factor analysis (CP) of a tensor X ∈ R^(I_1×...×I_N) is concisely expressed as [14,15]

X = Σ_{r=1}^{R} a_r^(1) ∘ a_r^(2) ∘ ... ∘ a_r^(N) = [[A^(1), ..., A^(N)]]

where a_r^(n) ∈ R^(I_n) is the rth column vector of the matrix A^(n).
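As a numerical check of these definitions, the sketch below implements the mode-n matricization and the n-mode product in plain numpy; the helper names `unfold` and `mode_n_product` are ours, not from the cited references:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4, 5))          # a third-order tensor in R^(3x4x5)

def unfold(T, n):
    """Mode-n matricization: mode n becomes the rows, and the
    remaining modes are flattened into the columns."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def mode_n_product(T, A, n):
    """n-mode product T x_n A: every mode-n fiber of T is multiplied by A."""
    out = np.tensordot(A, T, axes=(1, n))    # contracted mode comes out first
    return np.moveaxis(out, 0, n)

A = rng.standard_normal((2, 4))              # maps mode 1 (size 4) down to size 2
Y = mode_n_product(X, A, 1)
print(Y.shape)                               # (3, 2, 5)

# The tensor norm equals the Frobenius norm of any matricization.
print(np.isclose(np.linalg.norm(X), np.linalg.norm(unfold(X, 2))))   # True
```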
2.2. Bilinear PLS and unfold-PLS

Partial least squares (PLS) is a well-known multivariate regression method for two-way data [16]. The unfold-PLS, also known as MPLS, is an extended version of PLS that deals with three-way data [2]: MPLS performs ordinary PLS on two matrices unfolded from three-way data arrays. Given two three-way data arrays X(I×J×K) and Y(I×M×N), they are unfolded into two matrices X(I×JK) and Y(I×MN). MPLS decomposes the matrices X and Y as

X = Σ_{r=1}^{R} t_r p_r^T + E = TP^T + E
Y = Σ_{r=1}^{R} t_r q_r^T + F = TQ^T + F    (1)

where t_r(I×1) are score vectors, p_r(JK×1) and q_r(MN×1) are loading vectors, and R is the number of score vectors. E(I×JK) and F(I×MN) are residual matrices. T = [t_1, ..., t_R] is the score matrix, and P = [p_1, ..., p_R] and Q = [q_1, ..., q_R] are loading matrices. MPLS has the same properties as PLS. More detailed descriptions of MPLS can be found in ref [2].
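The following is a minimal, self-contained sketch of the unfold-PLS idea for a univariate y: one component extracted with a NIPALS-style update on synthetic, unfolded data (a full MPLS implementation would iterate this for R components):

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K = 20, 4, 30
X3 = rng.standard_normal((I, J, K))            # three-way process data
y = X3[:, 0, :].sum(axis=1, keepdims=True)     # synthetic quality variable

# Step 1: batch-wise unfolding and mean centering, as MPLS prescribes.
X = X3.reshape(I, J * K)
X = X - X.mean(axis=0)
y = y - y.mean(axis=0)

# Step 2: one component of PLS1 via a NIPALS-style update.
w = X.T @ y                                    # weight vector, proportional to cov(X, y)
w /= np.linalg.norm(w)
t = X @ w                                      # score vector t (I x 1)
p = X.T @ t / (t.T @ t)                        # X loading p (JK x 1)
q = y.T @ t / (t.T @ t)                        # y loading (a scalar here)

# Deflation, and the fraction of y-variance this component explains.
E = X - t @ p.T
F = y - t @ q.T
r2 = 1 - np.linalg.norm(F) ** 2 / np.linalg.norm(y) ** 2
print(E.shape, round(float(r2), 3))
```

Notice that the model has JK = 120 loading entries per component but only I = 20 samples, illustrating the "large variable number but small sample size" issue discussed in the introduction.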
3. Multilinear PLS
3.1. A brief review of multilinear PLS methods
The conventional bilinear PLS cannot deal with multi-way data directly. The unfold-PLS becomes applicable to multi-way data through the data unfolding operation, but it is not a true multilinear method because it loses the multi-way representation of the data. Thus, several researchers have proposed multilinear regression algorithms [10,13,17,18]. Among them, N-PLS [13] and HOPLS [10] are two truly multilinear regression models.
3.1.1. N-way PLS (N-PLS)

Bro [13] introduced N-PLS as an extension of the bilinear PLS to multi-way data. N-PLS decomposes the independent data X and dependent data Y into score vectors and weight vectors, subject to maximum pairwise covariance between the score vectors of X and Y. Specifically, for a univariate vector y(I×1) and a three-way data array X(I×J×K), X is decomposed into one score vector t(I×1) and two weight vectors w^(J)(J×1) and w^(K)(K×1), namely one vector per mode, as shown in Fig. 1. Thus, the decomposition of X is given by

X = t ∘ w^(J) ∘ w^(K) + E,  i.e.,  x_ijk = t_i w_j^(J) w_k^(K) + e_ijk    (2)

where E contains the residuals.

Figure 1. One-component decomposition of a three-way array X(I×J×K).

N-PLS solves the following optimization problem to find w^(J) and w^(K) such that the covariance between t and y is maximized [13]:

max_{w^(J), w^(K)} cov(t, y),  with t minimizing Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} ( x_ijk - t_i w_j^(J) w_k^(K) )^2
s.t. ||w^(J)|| = ||w^(K)|| = 1    (3)

Since, for given w^(J) and w^(K) with ||w^(J)|| = ||w^(K)|| = 1, the least squares solution of Eq. (2) is

t_i = Σ_{j=1}^{J} Σ_{k=1}^{K} x_ijk w_j^(J) w_k^(K)    (4)

the optimization problem Eq. (3) can be rewritten as

max_{w^(J), w^(K)} cov(t, y),  with  t_i = Σ_{j=1}^{J} Σ_{k=1}^{K} x_ijk w_j^(J) w_k^(K)
s.t. ||w^(J)|| = ||w^(K)|| = 1    (5)
According to cov(t, y) ∝ Σ_{i=1}^{I} y_i t_i, Eq. (5) is further expressed as

max_{w^(J), w^(K)} Σ_{i=1}^{I} y_i t_i = max_{w^(J), w^(K)} Σ_{i=1}^{I} y_i Σ_{j=1}^{J} Σ_{k=1}^{K} x_ijk w_j^(J) w_k^(K) = max_{w^(J), w^(K)} Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} y_i x_ijk w_j^(J) w_k^(K)
s.t. ||w^(J)|| = ||w^(K)|| = 1    (6)

Defining a matrix Z(J×K) with elements z_jk = Σ_{i=1}^{I} y_i x_ijk, Eq. (6) simplifies to

max_{w^(J), w^(K)} Σ_{j=1}^{J} Σ_{k=1}^{K} z_jk w_j^(J) w_k^(K),  s.t. ||w^(J)|| = ||w^(K)|| = 1    (7)

Thus, the optimal w^(J) and w^(K) are the first singular vectors of the singular value decomposition (SVD) of Z:

max_{w^(J), w^(K)} Σ_{j=1}^{J} Σ_{k=1}^{K} z_jk w_j^(J) w_k^(K) = max_{w^(J), w^(K)} w^(J)T Z w^(K)  ⟹  [w^(J), w^(K)] = SVD(Z)    (8)

More components can be calculated from the residuals by applying a deflation operation. The deflations of X and y with respect to the part explained by the previous score and weight vectors are

X_{i+1} = X_i - t_i ∘ w_i^(J) ∘ w_i^(K)
y_{i+1} = y_i - T_i b_i    (9)

where T_i = [t_1, ..., t_i] is a score matrix consisting of all previous score vectors, b_i = (T_i^T T_i)^{-1} T_i^T y_1 are regression coefficients, and X_1 and y_1 denote the initial X and y, respectively. Accordingly, replacing X in Eq. (2) with X_{i+1} and y in Eq. (3) with y_{i+1}, the (i+1)th component can be computed.

Analogously, the solution for an N-way independent data X ∈ R^(I_1×I_2×...×I_N) is obtained by finding the weight vectors w^(I_2), ..., w^(I_N). In higher-order cases, the weight vectors cannot be calculated by an SVD as in Eq. (8), but are obtained from the one-component CP decomposition of a tensor Z ∈ R^(I_2×...×I_N) with elements z_{i_2 ... i_N} = Σ_{i_1=1}^{I_1} x_{i_1 i_2 ... i_N} y_{i_1}, namely
max_{w^(I_2), ..., w^(I_N)} Σ_{i_2=1}^{I_2} ... Σ_{i_N=1}^{I_N} z_{i_2 ... i_N} w_{i_2}^(I_2) ... w_{i_N}^(I_N)
s.t. ||w^(I_2)|| = ... = ||w^(I_N)|| = 1    (10)

The score vector is then computed as t = X ×_2 w^(I_2)T ×_3 ... ×_N w^(I_N)T. It is also possible to extend the above N-PLS algorithm to an M-way dependent data Y ∈ R^(J_1×J_2×...×J_M) in a similar way. Likewise, the weight vectors q^(J_2), ..., q^(J_M) of Y are found by performing a one-component CP decomposition on the multi-way array S = Y ×̄_1 t ∈ R^(J_2×...×J_M), and then the score vector is u = Y ×_2 q^(J_2)T ×_3 ... ×_M q^(J_M)T.

Finally, the N-PLS model for X ∈ R^(I_1×I_2×...×I_N) and Y ∈ R^(J_1×J_2×...×J_M) with I_1 = J_1 is expressed as

X = Σ_{r=1}^{R} t_r ∘ w_r^(I_2) ∘ ... ∘ w_r^(I_N) + E_R
Y = Σ_{r=1}^{R} u_r ∘ q_r^(J_2) ∘ ... ∘ q_r^(J_M) + F_R    (11)

where R is the number of components, and ∀r: ||w_r^(I_2)|| = ... = ||w_r^(I_N)|| = 1 and ||q_r^(J_2)|| = ... = ||q_r^(J_M)|| = 1. The relation between the score matrix U = [u_1, ..., u_R] ∈ R^(J_1×R) and the score matrix T = [t_1, ..., t_R] ∈ R^(I_1×R) is U = TB, where B = [b_1, ..., b_R] ∈ R^(R×R) is a regression coefficient matrix. Note that B is an upper triangular matrix, and its rth column b_r is

b_r = [ b̃_r ; 0_{R-r} ]  with  b̃_r = (T_r^T T_r)^{-1} T_r^T u_r    (12)

where 0_{R-r} is a zero vector of size R-r, and T_r = [t_1, ..., t_r] ∈ R^(I_1×r) is the score matrix containing the previous r score vectors. More details about N-PLS can be found in refs [13] and [19].
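For the three-way case with univariate y, the one-component N-PLS solution of Eqs. (6)-(9) reduces to a single SVD, as the numpy sketch below shows on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 20, 5, 8
X = rng.standard_normal((I, J, K))
y = rng.standard_normal(I)

# Eqs. (6)-(8): build Z(J x K) with z_jk = sum_i y_i x_ijk, then one SVD.
Z = np.einsum('i,ijk->jk', y, X)
U, s, Vt = np.linalg.svd(Z)
wJ, wK = U[:, 0], Vt[0, :]          # first left and right singular vectors

# Eq. (4): least squares score vector for the given weight vectors.
t = np.einsum('ijk,j,k->i', X, wJ, wK)

# Eq. (9): deflation of X by the extracted rank-one component.
X1 = X - np.einsum('i,j,k->ijk', t, wJ, wK)

# The singular pair attains the maximum of Eq. (7), i.e. w'Zw = s[0].
print(np.isclose(wJ @ Z @ wK, s[0]))   # True
```

Further components would be extracted by repeating the same steps on the deflated X1 and the deflated y.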
3.1.2. Higher-order PLS based on Tucker decomposition (HOPLS-Tucker)

Zhao et al. [10] proposed HOPLS-Tucker as a generalized multilinear regression model for multi-way data. HOPLS-Tucker projects two tensors into a low-dimensional latent space using the block Tucker decomposition [20], and then performs regression on the latent variables. Given an N-way (N ≥ 3) independent data X ∈ R^(I_1×I_2×...×I_N) and an M-way (M ≥ 3) dependent data Y ∈ R^(J_1×J_2×...×J_M) with the same size on the first mode, i.e., I_1 = J_1, HOPLS-Tucker decomposes them as a sum of Tucker blocks [10]

X = Σ_{r=1}^{R} G_r ×_1 t_r ×_2 P_r^(1) ×_3 ... ×_N P_r^(N-1) + E_R
Y = Σ_{r=1}^{R} D_r ×_1 t_r ×_2 Q_r^(1) ×_3 ... ×_M Q_r^(M-1) + F_R    (13)

where R is the number of latent vectors, t_r ∈ R^(I_1) is the rth latent vector, {P_r^(n)}_{n=1}^{N-1} with P_r^(n) ∈ R^(I_{n+1}×L_{n+1}) and {Q_r^(m)}_{m=1}^{M-1} with Q_r^(m) ∈ R^(J_{m+1}×K_{m+1}) are the rth loading matrices on mode-n of X and mode-m of Y respectively, G_r ∈ R^(1×L_2×...×L_N) and D_r ∈ R^(1×K_2×...×K_M) are core tensors, and E_R ∈ R^(I_1×I_2×...×I_N) and F_R ∈ R^(J_1×J_2×...×J_M) are residuals. {L_n}_{n=2}^{N} and {K_m}_{m=2}^{M} are tuning parameters for controlling the model complexity [10].

The Tucker decomposition lacks uniqueness due to permutation, scaling and rotation indeterminacies [14]. To improve the uniqueness property of HOPLS-Tucker, Zhao et al. [10] imposed three constraints: (1) the core tensors G_r and D_r are all-orthogonal; (2) the loading matrices P_r^(n) and Q_r^(m) are column-wise orthonormal, i.e., P_r^(n)T P_r^(n) = I and Q_r^(m)T Q_r^(m) = I; and (3) the latent vectors t_r are of length one, i.e., ||t_r|| = 1. Defining a latent matrix T = [t_1, ..., t_R] ∈ R^(I_1×R), two sequences of loading matrices P^(n) = [P_1^(n), ..., P_R^(n)] ∈ R^(I_{n+1}×RL_{n+1}) and Q^(m) = [Q_1^(m), ..., Q_R^(m)] ∈ R^(J_{m+1}×RK_{m+1}), and two core tensors G = blockdiag(G_1, ..., G_R) ∈ R^(R×RL_2×...×RL_N) and D = blockdiag(D_1, ..., D_R) ∈ R^(R×RK_2×...×RK_M), Eq. (13) is rewritten as [10]

X = G ×_1 T ×_2 P^(1) ×_3 ... ×_N P^(N-1) + E_R
Y = D ×_1 T ×_2 Q^(1) ×_3 ... ×_M Q^(M-1) + F_R    (14)

The schematic diagram of the HOPLS-Tucker model for two third-order tensors is shown in Fig. 2.

Figure 2. Schematic diagram of a HOPLS-Tucker model for three-way arrays X and Y.
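To make the block structure of Eq. (13) concrete, the sketch below assembles a single Tucker block for a third-order X with einsum; all sizes are hypothetical, and the factors are drawn at random rather than fitted:

```python
import numpy as np

rng = np.random.default_rng(8)
I1, I2, I3 = 12, 5, 6       # sizes of the three modes of X
L2, L3 = 2, 3               # core sizes (model-complexity tuning parameters)

t  = rng.standard_normal((I1, 1)); t /= np.linalg.norm(t)    # latent vector, ||t|| = 1
P1 = np.linalg.qr(rng.standard_normal((I2, L2)))[0]          # column-orthonormal loadings
P2 = np.linalg.qr(rng.standard_normal((I3, L3)))[0]
G  = rng.standard_normal((1, L2, L3))                        # core tensor G_r

# One Tucker block: G x_1 t x_2 P1 x_3 P2, written as a single einsum.
X_block = np.einsum('wlm,iw,al,bm->iab', G, t, P1, P2)
print(X_block.shape)                                          # (12, 5, 6)
```

A full HOPLS-Tucker model sums R such blocks plus a residual tensor; with orthonormal factors the block's norm equals the core's norm, which is what makes the minimization in Eq. (15) below tractable.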
As shown in Eq. (13) and Eq. (14), HOPLS-Tucker aims to find common latent vectors that approximate X and Y simultaneously, which can be expressed as the optimization problem

min_{t_r, G_r, D_r, P_r^(n), Q_r^(m)} ||E_R||^2 + ||F_R||^2
s.t. ||t_r|| = 1, P_r^(n)T P_r^(n) = I, Q_r^(m)T Q_r^(m) = I    (15)

Because each set of t_r, G_r, D_r, P_r^(n) and Q_r^(m) can be obtained sequentially with the same criteria via the deflation operation, the problem of HOPLS-Tucker is simplified to finding the first latent vector t and the two sequences of loading matrices P^(n) and Q^(m) [10]. Thus, removing the indexes r and R in Eq. (15) and applying the propositions introduced by Zhao et al. [10], the objective function of Eq. (15) is equivalently expressed as [10]

min_{t, G, D, P^(n), Q^(m)} ( ||E||^2 + ||F||^2 )  ⟺  max_{t, P^(n), Q^(m)} ||<G, D>_{1;1}||^2    (16)

with

G = X ×_1 t^T ×_2 P^(1)T ×_3 ... ×_N P^(N-1)T
D = Y ×_1 t^T ×_2 Q^(1)T ×_3 ... ×_M Q^(M-1)T    (17)

According to Eq. (17) and t^T t = 1, ||<G, D>_{1;1}||^2 is equivalent to [10]

|| [[ <X, Y>_{1;1} ; P^(1)T, ..., P^(N-1)T, Q^(1)T, ..., Q^(M-1)T ]] ||^2    (18)

Thus, the final optimization problem of HOPLS-Tucker is [10]

max_{P^(n), Q^(m)} || [[ <X, Y>_{1;1} ; P^(1)T, ..., P^(N-1)T, Q^(1)T, ..., Q^(M-1)T ]] ||^2
s.t. P^(n)T P^(n) = I, Q^(m)T Q^(m) = I    (19)

which can be solved by the higher-order orthogonal iteration (HOOI) algorithm [14,20]. After obtaining the loadings P^(n), the optimal latent vector t is determined by minimizing the squared norm of the residual ||E||^2, namely

t = arg min_t || X - [[G; t, P^(1), ..., P^(N-1)]] ||^2    (20)

Eq. (20) is solved by choosing t as the first leading left singular vector of the following SVD [10]

SVD( ( X ×_2 P^(1)T ×_3 ... ×_N P^(N-1)T )_(1) )    (21)

The core tensors G and D are then calculated by Eq. (17). More components can be obtained by repeating the above procedures with the following deflation operation

X_{r+1} = X_r - [[G_r; t_r, P_r^(1), ..., P_r^(N-1)]]
Y_{r+1} = Y_r - [[D_r; t_r, Q_r^(1), ..., Q_r^(M-1)]]    (22)

where X_1 and Y_1 denote the initial X and Y.

In particular, when X ∈ R^(I_1×I_2×...×I_N) is an Nth-order tensor and Y ∈ R^(I_1×J) is a matrix, the HOPLS-Tucker model simplifies to

X = Σ_{r=1}^{R} G_r ×_1 t_r ×_2 P_r^(1) ×_3 ... ×_N P_r^(N-1) + E_R
Y = Σ_{r=1}^{R} d_r t_r q_r^T + F_R    (23)

where d_r is a scalar regression coefficient and ||q_r|| = 1. In this case, the optimization problem of HOPLS-Tucker is formulated as [10]

max_{P^(n), q} || [[ X ×_1 Y^T ; q^T, P^(1)T, ..., P^(N-1)T ]] ||^2
s.t. P^(n)T P^(n) = I, q^T q = 1    (24)

which is solved by performing HOOI on the cross-covariance tensor C = X ×_1 Y^T ∈ R^(J×I_2×...×I_N). Subsequently, the core tensor G^(C) ∈ R^(1×L_2×...×L_N) corresponding to C is also obtained. The latent vector t that minimizes ||E||^2 is then [10]

t = ( X ×_2 P^(1)T ×_3 ... ×_N P^(N-1)T )_(1) ( G^(C)_(1) )^+ ,  t = t / ||t||    (25)

After obtaining the latent vector t and the loading vector q, the regression coefficient d is calculated by d = t^T Y q. More components can be extracted via the deflation operation.
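The sketch below illustrates this special case on synthetic data. For brevity it replaces the HOOI iterations with a single truncated-HOSVD pass over the cross-covariance tensor C (HOSVD is the usual initialization of HOOI, so this is an approximation of the solution rather than the full algorithm of ref [10]); all names and sizes are ours:

```python
import numpy as np

rng = np.random.default_rng(4)
I1, I2, I3, J = 30, 6, 7, 2
X = rng.standard_normal((I1, I2, I3))        # third-order independent data
Y = rng.standard_normal((I1, J))             # two-way dependent data
L2, L3 = 2, 2                                # tuning parameters L_n

# Cross-covariance tensor C = X x_1 Y^T in R^(J x I2 x I3).
C = np.einsum('ij,iab->jab', Y, X)

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

# One truncated-HOSVD pass over C (an approximation of the HOOI solution):
q  = np.linalg.svd(unfold(C, 0))[0][:, :1]    # loading vector q (J x 1)
P1 = np.linalg.svd(unfold(C, 1))[0][:, :L2]   # P^(1) (I2 x L2)
P2 = np.linalg.svd(unfold(C, 2))[0][:, :L3]   # P^(2) (I3 x L3)

# Core of C, and the latent vector t following Eq. (25):
Gc = np.einsum('jab,jw,al,bm->wlm', C, q, P1, P2)    # (1 x L2 x L3)
Xp = np.einsum('iab,al,bm->ilm', X, P1, P2)          # X x_2 P1^T x_3 P2^T
t  = unfold(Xp, 0) @ np.linalg.pinv(unfold(Gc, 0))   # (I1 x 1)
t /= np.linalg.norm(t)

# Scalar regression coefficient d = t^T Y q.
d = (t.T @ Y @ q).item()
print(t.shape, round(d, 3) == round(d, 3))
```

In a faithful implementation the SVD steps would be iterated (HOOI sweeps) until the fit of C stops improving, and the whole procedure would be repeated on deflated data for further components.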
3.2. Two new variants of HOPLS-Tucker

In this paper, by imposing extra restrictions on the HOPLS-Tucker model, two new important variants of HOPLS-Tucker are derived, termed HOPLS based on relaxed Tucker decomposition (HOPLS-RTucker) and HOPLS based on CP decomposition (HOPLS-CP).

HOPLS-Tucker compresses all modes of the multi-way data to a few components. However, for some multi-way data, compressing all modes may be unnecessary; an example is the time mode in batch data. This is the main reason for developing the HOPLS-RTucker model. HOPLS-RTucker restricts the loading matrices on some specific modes (called free modes) of the multi-way data to be the identity matrix, and thus the information on these free modes remains unchanged. For instance, a HOPLS-RTucker model with one free mode in X and one free mode in Y is

X = Σ_{r=1}^{R} [[ G_r ; t_r, P_r^(1), ..., P_r^(n-1), I, P_r^(n+1), ..., P_r^(N-1) ]] + E_R
Y = Σ_{r=1}^{R} [[ D_r ; t_r, Q_r^(1), ..., Q_r^(m-1), I, Q_r^(m+1), ..., Q_r^(M-1) ]] + F_R    (26)

which is similar to Eq. (13) except that L_n = I_n and K_m = J_m on the free modes of the core tensors G_r ∈ R^(1×L_2×...×L_N) and D_r ∈ R^(1×K_2×...×K_M), and the corresponding loading matrices are fixed as P_r^(n) = I and Q_r^(m) = I. The schematic diagram of a HOPLS-RTucker model for two three-way arrays is shown in Fig. 3. Similar to HOPLS-Tucker, the HOPLS-RTucker model can be computed using the HOOI algorithm, with the steps related to the loading matrices on the free modes removed. HOPLS-RTucker thus has higher computational efficiency than HOPLS-Tucker.

Figure 3. Schematic diagram of a HOPLS-RTucker model for three-way arrays X and Y.
14
ACCEPTED MANUSCRIPT If restricting n : Ln 1 and m : Km 1 in Eq. (13), the block Tucker decomposition used in HOPLS-Tucker simplifies to the CP decomposition, resulting in the HOPLS-CP model R
X = X ,r t r pr(1) Y = Y ,r t r q
(1) r
where
t rT t r 1 ,
pr( n )T pr( n ) 1
J X diag (X ,r ) RR
R
q
FR J Y ; T , Q ,
and
(1)
qr( m)T qr( m) 1
and JY diag (Y ,r ) RR
,Q
( M 1)
for
(27)
FR
r
,
n
and
m
are two diagonal tensors. T [t1 ,
, pR( n) ] and Q( m) [q1( m) ,
.
, tR ]
, qR( m) ] are mode-n and mode-m
MA
is the latent matrix. P ( n) [ p1( n) ,
R
NU
r 1
( M 1) r
IP
R
SC R
r 1
, P ( N 1) E R
T
pr( N 1) E R J X ; T , P (1) ,
loading matrices of X and Y, respectively. Fig. 4 illustrates the HOPLS-CP model of two three-way
D
arrays. The HOPLS-CP model can be computed in the similar way as HOPLS-Tucker using the
TE
HOOI or alternating least squares (ALS) algorithm [14,21,22]. Note that the HOPLS-CP model has
CE P
the similar form as the N-PLS model, as shown in Eq. (11), if merging λX,r and λY,r together with tr as two new vectors tˆr X ,r tr and uˆ r Y ,r tr . Therefore, N-PLS can be regarded as a special case of
AC
HOPLS-Tucker. The main difference between HOPLS-CP and N-PLS lies in their solving algorithms. N-PLS combines the CP decomposition with a nonlinear iterative partial least squares (NIPALS)-like algorithm to finding each pair of latent vectors and loading vectors [10,13]. However, instead of using the iterative method, HOPLS-CP directly finds latent and loading vectors by performing CP decomposition on the cross-covariance of tensors X and Y, improving the computational efficiency. Benefiting from the better fitness ability of Tucker decomposition over CP decomposition [23], HOPLS-Tucker may have higher modeling accuracy than N-PLS and HOPLS-CP. However, HOPLS-CP and N-PLS are much more concise than HOPLS-Tucker by simplifying loading matrices to be loading vectors. They also have better uniqueness property than HOPLS-Tucker, owing to the 15
ACCEPTED MANUSCRIPT uniqueness of CP decomposition [14,23]. These advantages make HOPLS-CP and N-PLS more efficient in computation and easier to be interpreted.
IP
T
Figure 4. Schematic diagram of a HOPLS-CP model for three-way arrays X and Y.
3.3. Prediction with multilinear PLS

The above multilinear PLS models can be applied to predict new responses Y^new from new observations X^new. The prediction procedure consists of two steps: extracting the new latent matrix T^new from X^new, and then predicting Y^new with T^new. For the HOPLS-Tucker model, Y^new is predicted from X^new by [10]

T^new = X^new_(1) P̂
Ŷ^new_(1) = T^new Q̂^T = X^new_(1) P̂ Q̂^T    (28)

where P̂ = [p̂_1, ..., p̂_R] ∈ R^(I_2⋯I_N×R) and Q̂ = [q̂_1, ..., q̂_R] ∈ R^(J_2⋯J_M×R) consist of R columns given by

p̂_r = ( P_r^(N-1) ⊗ ... ⊗ P_r^(1) ) G_r(1)^+
q̂_r = ( Q_r^(M-1) ⊗ ... ⊗ Q_r^(1) ) D_r(1)^T    (29)

If Y is a two-way matrix, the prediction is performed by

T^new = X^new_(1) P̂
Ŷ^new = T^new D Q^T = X^new_(1) P̂ D Q^T    (30)

where D = diag(d_r) ∈ R^(R×R) is a diagonal matrix, Q = [q_1, ..., q_R] ∈ R^(J×R), and d_r and q_r are defined in Eq. (23). For the HOPLS-RTucker model, the prediction is also performed using Eq. (28) and Eq. (30), with the loading matrices on the free modes of X and Y replaced by the identity matrix. In the HOPLS-CP model, Y is predicted by

T^new = X^new_(1) P̃^+ Λ_X^{-1}
Ŷ^new_(1) = T^new Λ_Y Q̃ = X^new_(1) P̃^+ Λ_X^{-1} Λ_Y Q̃    (31)

where Λ_X = diag(λ_{X,r}) ∈ R^(R×R) and Λ_Y = diag(λ_{Y,r}) ∈ R^(R×R) are two diagonal matrices,

P̃ = ( P^(N-1) ⊙ ... ⊙ P^(1) )^T ∈ R^(R×I_2⋯I_N)
Q̃ = ( Q^(M-1) ⊙ ... ⊙ Q^(1) )^T ∈ R^(R×J_2⋯J_M)    (32)

and λ_{X,r}, λ_{Y,r}, P^(n) and Q^(m) are defined in Eq. (27). For the N-PLS model, Y is predicted by

T^new = X^new_(1) Ŵ
Ŷ^new_(1) = T^new B Q̂^T = X^new_(1) Ŵ B Q̂^T    (33)

where B is the regression coefficient matrix defined in Eq. (12), Ŵ = [ŵ_1, ..., ŵ_R] ∈ R^(I_2⋯I_N×R) and Q̂ = [q̂_1, ..., q̂_R] ∈ R^(J_2⋯J_M×R) are weight matrices, and the rth columns of Ŵ and Q̂ are given by

ŵ_r = w_r^(I_N) ⊗ w_r^(I_{N-1}) ⊗ ... ⊗ w_r^(I_2)
q̂_r = q_r^(J_M) ⊗ q_r^(J_{M-1}) ⊗ ... ⊗ q_r^(J_2)    (34)
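As a small check of these matricized prediction formulas, the sketch below verifies for a three-way X that the scores from the multilinear definition (Eq. (4)) match those from the unfolded form; note that with numpy's row-major reshape the Kronecker factors appear in the reverse of the column-major order assumed in Eq. (34):

```python
import numpy as np

rng = np.random.default_rng(5)
I, J, K = 15, 4, 6
X = rng.standard_normal((I, J, K))
wJ = rng.standard_normal(J); wJ /= np.linalg.norm(wJ)
wK = rng.standard_normal(K); wK /= np.linalg.norm(wK)

# Scores from the multilinear definition, Eq. (4).
t_direct = np.einsum('ijk,j,k->i', X, wJ, wK)

# The same scores from the matricized form t = X_(1) w_hat.  With numpy's
# row-major reshape, the correct Kronecker order is wJ (x) wK; Eq. (34)
# lists the reverse order because it assumes column-major matricization.
w_hat = np.kron(wJ, wK)
t_matrix = X.reshape(I, J * K) @ w_hat

print(np.allclose(t_direct, t_matrix))   # True
```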
4. Quality prediction and quality-relevant monitoring for batch processes

4.1. Preprocessing of batch data

Suppose all batches in a batch process have the same duration. Let {X(I×J×K), Y(I×M×K)} be the training dataset, namely the reference batches, where X contains the independent process variables, Y consists of the quality variables, I denotes the number of batches, J and M are the numbers of variables, and K is the number of sampling times. For better modeling, raw batch data should be mean centered or scaled to eliminate the effect of different variable units and to highlight differences among batches. However, effective preprocessing methods that can directly cope with the three-way structure of the batch dataset are lacking. Thus, conventional preprocessing methods for two-way data are usually combined with a data unfolding procedure to preprocess the three-way batch dataset. As illustrated in Fig. 5, the preprocessing procedure includes three steps: (1) unfolding X(I×J×K) and Y(I×M×K) into matrices X(I×JK) and Y(I×MK); (2) mean centering or scaling X(I×JK) and Y(I×MK) to zero mean or unit variance; (3) refolding X(I×JK) and Y(I×MK) back to the initial three-way form of X(I×J×K) and Y(I×M×K). Note that, different from MPLS, data unfolding is used here only for data preprocessing rather than for data modeling: after mean centering or scaling, the unfolded data matrix is refolded back to a three-way data array. The preprocessing procedure therefore has little effect on the original three-way correlations in the batch dataset.

Figure 5. Preprocessing procedure for batch data.

4.2. Online quality prediction

To follow the dynamic and nonlinear changes of the batch process, multilinear PLS is combined with the moving data window technique for online quality prediction. Let the moving window have size d. The training dataset in the window up to sampling time k is denoted by {X_k(I×J×d), Y_k(I×M×d)}, where the window data of the ith reference batch are

X_k^i(J×d) = [ x^i_{1,k-d+1}  x^i_{1,k-d+2}  ...  x^i_{1,k}
               x^i_{2,k-d+1}  x^i_{2,k-d+2}  ...  x^i_{2,k}
               ...
               x^i_{J,k-d+1}  x^i_{J,k-d+2}  ...  x^i_{J,k} ]

Y_k^i(M×d) = [ y^i_{1,k-d+1}  y^i_{1,k-d+2}  ...  y^i_{1,k}
               ...
               y^i_{M,k-d+1}  y^i_{M,k-d+2}  ...  y^i_{M,k} ]

A window-mode multilinear PLS model is built from the mean-centered {X_k(I×J×d), Y_k(I×M×d)}. As the window slides along the sampling time, the window-mode multilinear PLS model is updated periodically.
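Extracting the moving-window tensors is a simple slicing operation; the sketch below uses 0-based time indexing and hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(6)
I, J, M, K, d = 8, 5, 2, 50, 10        # d is the window size
X = rng.standard_normal((I, J, K))     # process variables
Y = rng.standard_normal((I, M, K))     # quality variables

def window_data(X, Y, k, d):
    """Training tensors {X_k(I x J x d), Y_k(I x M x d)} for the window
    ending at sampling time k (0-based, inclusive)."""
    Xk = X[:, :, k - d + 1:k + 1]
    Yk = Y[:, :, k - d + 1:k + 1]
    return Xk, Yk

Xk, Yk = window_data(X, Y, k=20, d=d)
print(Xk.shape, Yk.shape)               # (8, 5, 10) (8, 2, 10)
```

One window-mode multilinear PLS model would then be fitted on each such pair as the window slides along the time axis.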
For a new batch, its window data X_k^new(J×d) up to sampling time k is

X_k^new(J×d) = [ x^new_{1,k-d+1}  x^new_{1,k-d+2}  ...  x^new_{1,k}
                 x^new_{2,k-d+1}  x^new_{2,k-d+2}  ...  x^new_{2,k}
                 ...
                 x^new_{J,k-d+1}  x^new_{J,k-d+2}  ...  x^new_{J,k} ]

After mean centering X_k^new(J×d), the quality variables Y_k^new(M×d) at sampling time k are predicted by the window-mode multilinear PLS model. Note that, as the moving window slides from time k to time k+d-1, the prediction at a given sampling time k* (k ≤ k* ≤ k+d-1) for each quality variable in Y^new is computed d times in adjacent windows, yielding ŷ^new_{m,k*}|k, ..., ŷ^new_{m,k*}|(k+d-1), m = 1, ..., M. The mean of these predictions is calculated in sequence and taken as the final prediction of ŷ^new_{m,k*}, namely

at t = k:       ŷ^new_{m,k*} = ŷ^new_{m,k*}|k
at t = k+1:     ŷ^new_{m,k*} = ( ŷ^new_{m,k*}|k + ŷ^new_{m,k*}|(k+1) ) / 2
...
at t = k+d-1:   ŷ^new_{m,k*} = ( ŷ^new_{m,k*}|k + ... + ŷ^new_{m,k*}|(k+d-1) ) / d    (35)

This operation also has the advantage of reducing the influence of noise.
4.3.1. Fault detection and diagnosis

Based on a multilinear PLS model, two monitoring statistics, the T² statistic and the Q statistic, are constructed for fault detection. The T² statistic measures the deviation in the latent (or score) subspace, while the Q statistic measures the deviation in the residual subspace. The T² statistic for a batch {Xs(J×K), Ys(M×K)} is defined as

$$T_s^2 = \mathbf{t}_s\mathbf{S}^{-1}\mathbf{t}_s^{\mathrm{T}} \tag{36}$$

where t_s is the projection of {Xs(J×K), Ys(M×K)} in the latent space, and S is the covariance matrix of the latent matrix T obtained from the training dataset, i.e., $\mathbf{S}=\mathbf{T}^{\mathrm{T}}\mathbf{T}/(I-1)$. Two Q statistics are defined for a batch {Xs(J×K), Ys(M×K)} in the residual spaces of X and Y, respectively:

$$Q_{s,X} = \|\mathbf{E}_s\|^2 = \sum_{j,k} e_{s,jk}^2,\qquad Q_{s,Y} = \|\mathbf{F}_s\|^2 = \sum_{m,k} f_{s,mk}^2 \tag{37}$$

where E_s and F_s are the residuals of Xs(J×K) and Ys(M×K). Note that the QY statistic is only applicable to batch processes whose quality variables can be measured online.
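For a single batch, Eqs. (36) and (37) amount to a few lines of linear algebra once the score vector and residuals are available. A minimal numpy sketch (ts, S and the residual arrays would come from the fitted multilinear PLS model; names are illustrative):

```python
import numpy as np

def t2_statistic(ts, S):
    """T^2 = t_s S^{-1} t_s^T (Eq. (36)): deviation in the latent subspace."""
    ts = np.asarray(ts, float).ravel()
    return float(ts @ np.linalg.solve(S, ts))  # solve avoids an explicit inverse

def q_statistic(residual):
    """Q = ||E||^2 (Eq. (37)): squared Frobenius norm of the residual array."""
    return float(np.sum(np.asarray(residual, float) ** 2))
```

Using np.linalg.solve instead of explicitly inverting S is the standard numerically safer choice.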
The control limit of the T² statistic at significance level α is [24]

$$T_\alpha^2 \sim \frac{R(I^2-1)}{I(I-R)}\,F_\alpha(R,\,I-R) \tag{38}$$

where R is the number of components in the multilinear PLS model, I is the number of reference batches in the training dataset, and F_α(R, I-R) is an F-distribution with R and I-R degrees of freedom. The control limit of the Q statistic is computed by kernel density estimation (KDE) [25]. Regarding the Q statistics (QX or QY) of all reference batches as a univariate sample Q₁, Q₂, …, Q_I, their distribution density is estimated by [25]

$$\hat f(Q,\theta) = \frac{1}{I\theta}\sum_{i=1}^{I} K\!\left(\frac{Q-Q_i}{\theta}\right) \tag{39}$$

where Q denotes a data point under consideration, θ is a smoothing parameter, and K(·) is a kernel function. The control limit Qα of the Q statistic at significance level α is then estimated with the inverse cumulative distribution function of $\hat f(Q,\theta)$.

For a new batch, faults are detected if any monitoring statistic exceeds its control limit. After a fault is detected, the contribution plot [26] is applied for fault diagnosis. The contribution plot shows the contribution of each variable to the monitoring statistics; the largest contributor is probably the fault variable. According to the definition of the Q statistics in Eq. (37), the contribution of the jth process variable at sampling time k to the QX statistic is

$$c_{s,jk}^{Q_X} = e_{s,jk}^2 \tag{40}$$

Similarly, the contribution of the mth quality variable at sampling time k to the QY statistic is

$$c_{s,mk}^{Q_Y} = f_{s,mk}^2 \tag{41}$$
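The Q control limit of Eq. (39) can be sketched with a Gaussian-kernel density estimate; the Gaussian kernel and the rule-of-thumb bandwidth below are assumptions for illustration (the paper only specifies KDE with smoothing parameter θ, citing [25]):

```python
import numpy as np

def q_control_limit(q_train, alpha=0.01, grid_size=2000):
    """Q control limit via KDE (Eq. (39)): fit
    fhat(Q) = 1/(I*theta) * sum_i K((Q - Q_i)/theta) with a Gaussian kernel,
    then return the (1 - alpha) quantile from the numerical CDF."""
    q = np.asarray(q_train, float)
    I = q.size
    theta = 1.06 * q.std() * I ** (-1 / 5)        # rule-of-thumb bandwidth
    x = np.linspace(q.min() - 3 * theta, q.max() + 3 * theta, grid_size)
    # Gaussian KDE evaluated on the grid
    f = np.exp(-0.5 * ((x[:, None] - q[None, :]) / theta) ** 2).sum(axis=1)
    f /= I * theta * np.sqrt(2 * np.pi)
    cdf = np.cumsum(f)
    cdf /= cdf[-1]                                 # normalized numerical CDF
    return float(x[np.searchsorted(cdf, 1 - alpha)])
```

The T² limit of Eq. (38) is simply a scaled F quantile and can be obtained directly from, e.g., scipy.stats.f.ppf.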
If HOPLS-Tucker or HOPLS-RTucker is used, the definition of the T² statistic in Eq. (36) can be rewritten as

$$T_s^2 = \mathbf{t}_s\mathbf{S}^{-1}\mathbf{t}_s^{\mathrm{T}} = \mathbf{t}_s\mathbf{S}^{-1}(\mathbf{x}_s\hat{\mathbf{P}})^{\mathrm{T}} = \mathbf{t}_s\mathbf{S}^{-1}\Big(\sum_{j,k} x_{s,jk}\hat{\mathbf{p}}_{jk}\Big)^{\mathrm{T}} = \sum_{j,k} x_{s,jk}\,\mathbf{t}_s\mathbf{S}^{-1}\hat{\mathbf{p}}_{jk}^{\mathrm{T}} \tag{42}$$

where x_s(1×JK) is the vectorization of Xs(J×K), $\hat{\mathbf{P}}\in\mathbb{R}^{JK\times R}$ is defined in Eq. (28), and $\hat{\mathbf{p}}_{jk}(1\times R)$ is the jk-th row vector of $\hat{\mathbf{P}}$. Thus, the contribution of the jth process variable at sampling time k to the T² statistic is defined as

$$c_{s,jk}^{T} = x_{s,jk}\,\mathbf{t}_s\mathbf{S}^{-1}\hat{\mathbf{p}}_{jk}^{\mathrm{T}} \tag{43}$$

Similarly, for HOPLS-CP and N-PLS, the contribution of the jth process variable at sampling time k to the T² statistic is defined as

$$\text{HOPLS-CP:}\quad c_{s,jk}^{T} = x_{s,jk}\,\mathbf{t}_s\mathbf{S}^{-1}\mathbf{p}_{jk}^{\mathrm{T}} \tag{44}$$

$$\text{N-PLS:}\quad c_{s,jk}^{T} = x_{s,jk}\,\mathbf{t}_s\mathbf{S}^{-1}\hat{\mathbf{w}}_{jk}^{\mathrm{T}} \tag{45}$$

In Eq. (44), $\mathbf{p}_{jk}(1\times R)$ is the jk-th row vector of the matrix $\mathbf{P}\in\mathbb{R}^{JK\times R}$ defined in Eq. (32). In Eq. (45), $\hat{\mathbf{w}}_{jk}(1\times R)$ is the jk-th row vector of the matrix $\hat{\mathbf{W}}\in\mathbb{R}^{JK\times R}$ defined in Eq. (33).
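Eqs. (42)-(45) share one form, c_{s,jk} = x_{s,jk} · t_s S⁻¹ p_jkᵀ, with only the loading-type matrix (P̂, P or Ŵ) changing between models, so one vectorized routine covers all four models. A numpy sketch (argument names are illustrative):

```python
import numpy as np

def t2_contributions(xs, ts, S, P):
    """Contribution of each (variable, time) element to T^2, Eqs. (42)-(45):
    c_jk = x_jk * (t_s S^{-1} p_jk^T), where p_jk is the jk-th row of the
    JK x R loading/weight matrix of the chosen model. When t_s = x_s P, the
    contributions sum exactly to T^2."""
    w = np.linalg.solve(S, np.asarray(ts, float).ravel())  # S^{-1} t_s, shape (R,)
    # elementwise x_jk * <p_jk, S^{-1} t_s> for all jk at once
    return np.asarray(xs, float).ravel() * (P @ w)
```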
4.3.2. Complete monitoring procedure

The moving data window technique is also applied for online monitoring to track dynamic and nonlinear changes of the batch process. A window mode multilinear PLS model is built from the training data, and then the control limits of the monitoring statistics are calculated. For a new batch, its window data {X_k^new(J×d), Y_k^new(M×d)} at sampling time k are projected onto the window mode multilinear PLS model to compute the latent (or score) vector and the residuals, which are further used to calculate the T² and Q statistics. If any monitoring statistic exceeds its control limit, a fault is detected and the fault variable is identified by the contribution plot.

The complete quality-relevant monitoring procedure for a batch process includes two phases:

Phase I: Build offline window mode multilinear PLS models

Step 1: Select reference batches and construct the training dataset.
Step 2: Choose the size d of the moving data window.
Step 3: For all sampling times k ≥ d, preprocess the window data of all reference batches to zero mean and unit variance.
Step 4: Choose a multilinear PLS method, and build window mode multilinear PLS models for all sampling times k ≥ d.
Step 5: Calculate the control limits of the QX, QY and T² statistics for all sampling times k ≥ d.

Phase II: Online monitoring

Step 1: For a new batch, preprocess its window data {X_k^new(J×d), Y_k^new(M×d)} at sampling time k ≥ d.
Step 2: Project the window data onto the window mode multilinear PLS model.
Step 3: Compute the QX, QY and T² statistics of the new batch.
Step 4: Check whether the QX, QY and T² statistics exceed their control limits. If so, go to step 6; otherwise go to step 5.
Step 5: Return to step 1 and monitor the next sampling time k + 1.
Step 6: Diagnose the fault variable via contribution plots.
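Phase II reduces to a short per-sample loop. The sketch below is schematic: model.project stands for the projection of the window data onto the chosen window mode multilinear PLS model, and the control limits come from Phase I; all names are assumptions for illustration, not from the paper.

```python
import numpy as np

def monitor_step(Xw, Yw, model, limits):
    """One Phase II step: project the current window data, form the three
    monitoring statistics, and flag a fault when any statistic exceeds its
    control limit. `model.project` is assumed to return the score vector
    and the X and Y residuals; `model.S` is the latent covariance matrix."""
    ts, Ex, Fy = model.project(Xw, Yw)
    stats = {
        "T2": float(ts @ np.linalg.solve(model.S, ts)),
        "QX": float(np.sum(np.asarray(Ex) ** 2)),
        "QY": float(np.sum(np.asarray(Fy) ** 2)),
    }
    alarms = {name: stats[name] > limits[name] for name in stats}
    return stats, alarms  # any(alarms.values()) -> proceed to fault diagnosis
```

In a real application this step would run once per sampling time k ≥ d, moving on to k + 1 when no alarm fires.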
5. Case study
The multilinear PLS-based quality prediction and quality-relevant monitoring methods are tested on a benchmark fed-batch penicillin fermentation process. The simulator PenSim v2.0 developed by Birol et al. [27] was used to simulate this batch process. 80 normal batches were generated by running the simulator under normal operating conditions with small variations. The duration of each batch was 200 h. As listed in Table 1, fourteen process variables were measured online with a sampling interval of 1 h. The penicillin concentration and the biomass concentration were chosen as quality variables, and the other twelve variables were classified as input variables. Similar to refs [28] and [29], six fault batches were generated considering various fault variables, directions and magnitudes, as listed in Table 2. Each fault was introduced into the batch at 100 h and lasted until the end of the batch.
5.1. Case I: Quality prediction
A total of 60 normal batches are chosen to constitute the training dataset, denoted by {X(60×12×200), Y(60×2×200)}. The remaining 20 normal batches serve as the test dataset, denoted by {Xnew(20×12×200), Ynew(20×2×200)}. The root mean square error of prediction (RMSEP) and an R index are used to quantify the prediction performance of different models. The RMSEP is calculated by

$$\mathrm{RMSEP}=\sqrt{\|\mathbf{Y}-\hat{\mathbf{Y}}\|^2/n}$$

where $\hat{\mathbf{Y}}$ denotes the prediction of Y, and n is the number of elements in Y. The R index is defined as

$$R = 1 - \|\mathbf{Y}-\hat{\mathbf{Y}}\|\,/\,\|\mathbf{Y}\|$$

A smaller RMSEP and a larger R indicate better prediction performance.
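Both figures of merit are one-liners in numpy; a minimal sketch following the definitions above:

```python
import numpy as np

def rmsep(Y, Y_hat):
    """Root mean square error of prediction: sqrt(||Y - Yhat||^2 / n)."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    return float(np.sqrt(np.sum((Y - Y_hat) ** 2) / Y.size))

def r_index(Y, Y_hat):
    """R = 1 - ||Y - Yhat|| / ||Y||; values closer to 1 mean better prediction."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    return float(1.0 - np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y))
```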
Four window mode (WM) multilinear PLS models, termed WMHOPLS-Tucker, WMHOPLS-RTucker, WMHOPLS-CP and WMN-PLS, are built separately for comparison. In the WMHOPLS-RTucker model, the time mode of the data is set as the free mode. These four multilinear PLS models are also compared with the window mode MPLS (WMMPLS) model, built by combining MPLS with the moving data window technique. For all models, the size of the moving data window is chosen as 10. The numbers of latent (or score) vectors in the different models are chosen by 5-fold cross-validation [30]. To investigate the effect of noise on quality prediction, white Gaussian noise with signal-to-noise ratios (SNRs) of 20 dB, 10 dB and 5 dB is added to both the training data and the test data.

Table 3 compares the prediction performance of the five models under different noise levels, where the case without noise is denoted by SNR = Inf dB. Note that all R indexes in Table 3 are larger than 0.97, because ‖Y − Ŷ‖ is much smaller than ‖Y‖, implying good prediction results in all cases. Fig. 6 shows prediction results of the five models for a test batch without noise. Without noise, WMMPLS and WMHOPLS-Tucker have the smallest RMSEP and the largest R index, i.e., the best prediction performance, closely followed by WMHOPLS-RTucker and then WMN-PLS. WMHOPLS-CP has relatively low prediction accuracy, as its RMSEP is larger than those of the other four models. However, as the noise level increases, the RMSEP of WMMPLS increases faster than those of the four multilinear models. As a result, WMMPLS has the worst prediction performance when SNR ≤ 10 dB. This means that the multilinear models are more robust to noise, probably because information across all modes of the data is used in multilinear models [13]. WMHOPLS-RTucker provides the best prediction results under all noise conditions. Although WMHOPLS-CP has the worst prediction performance at SNR = Inf dB and 20 dB, its performance becomes comparable to WMHOPLS-Tucker, WMHOPLS-RTucker and WMN-PLS as the noise level increases to 10 dB and 5 dB. The superiority of the multilinear models is significant at SNRs of 10 dB and 5 dB. As shown in Table 3, the RMSEP of WMHOPLS-RTucker decreases by about 20.4% compared with that of WMMPLS when SNR = 10 dB. For SNR = 5 dB, in comparison with WMMPLS, the RMSEP values of WMHOPLS-Tucker, WMHOPLS-RTucker, WMHOPLS-CP and WMN-PLS decrease by about 16.9%, 21.4%, 18.7% and 18.4%, respectively. Prediction results of the five models for a test batch at a noise level of 5 dB are shown in Fig. 7. As shown in Fig. 7a, the multilinear models have better prediction performance than WMMPLS at a high noise level, because their predictions are closer to the simulation data. Note that WMHOPLS-Tucker and WMHOPLS-RTucker are less efficient than WMN-PLS and WMHOPLS-CP, because selecting appropriate model parameters for them is time-consuming. By contrast, because WMN-PLS and WMHOPLS-CP have higher computational efficiency, lower RMSEP and better noise robustness, they may be more applicable to real industrial processes.

Figure 6. Quality prediction results of five models for a test batch without noise. (a) Penicillin concentration, (b) Biomass concentration.
Figure 7. Quality prediction results of five models for a test batch at the noise level of 5 dB. (a) Penicillin concentration, (b) Biomass concentration.
5.2. Case II: Quality-relevant monitoring

In this case study, four training datasets containing 5, 10, 20 and 50 reference batches are constructed. The six fault batches in Table 2 constitute the test dataset. Five window mode monitoring models, namely WMHOPLS-Tucker, WMHOPLS-RTucker, WMHOPLS-CP, WMN-PLS and WMMPLS, are built separately for comparison. The size of the moving window is set as 10.

The fault detection rate (FDR) and the false alarm rate (FAR) are used to quantify the monitoring performance. The FDR is defined as the percentage of samples outside the control limit in fault conditions. The FAR is the percentage of samples outside the control limit in non-fault conditions. A larger FDR and a smaller FAR indicate better fault detection performance.

Fault detection results of the five models are compared in Table 4, where the average FDRs and FARs over the six fault batches are listed for each model. Fig. 8 shows monitoring charts of the five models for the training dataset with 10 reference batches. Clearly, all monitoring models provide good fault detection results when the training dataset contains 20 or more reference batches. However, if the number of reference batches in the training dataset decreases to 10 or 5, the QX and QY statistics of WMMPLS have large FARs (≥ 10%), implying poor monitoring performance. For the training dataset with 10 reference batches, the four multilinear models all have good monitoring performance, except that WMN-PLS has a small FAR (≈ 4%) in the QY statistic (see Fig. 8). However, the QX and QY statistics of WMMPLS have larger FARs (see Fig. 8). When the training dataset has only 5 reference batches, similar to WMMPLS, the T² statistics of the four multilinear models almost fail to detect faults (FDR < 4%), and the QY statistics have large FARs (> 25%). However, unlike WMMPLS, the QX statistics of the multilinear models still successfully detect all process faults with high FDRs (> 97%) and without false alarms. These results indicate that the multilinear models are better than WMMPLS when the training dataset has fewer reference batches.
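Given a boolean alarm sequence and the known fault interval (here, faults start at 100 h, i.e., sample index 100 at the 1 h sampling interval), FDR and FAR are simple proportions. A sketch (names are illustrative):

```python
import numpy as np

def fdr_far(alarms, fault_mask):
    """FDR: percentage of fault-condition samples flagged as faulty;
    FAR: percentage of normal-condition samples flagged as faulty."""
    alarms = np.asarray(alarms, bool)
    fault_mask = np.asarray(fault_mask, bool)
    fdr = 100.0 * alarms[fault_mask].mean()
    far = 100.0 * alarms[~fault_mask].mean()
    return fdr, far
```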
Figure 8. Monitoring charts of five models for the training dataset with 10 reference batches. (a) WMHOPLS-Tucker, (b) WMHOPLS-RTucker, (c) WMHOPLS-CP, (d) WMN-PLS, (e) WMMPLS.
To further investigate the effect of noise on fault detection, white Gaussian noise with SNRs of 20 dB, 10 dB and 5 dB is added to the training dataset containing 20 reference batches as well as to the test dataset. Table 5 lists the monitoring results of the five models at different noise levels, where Inf dB denotes the case without noise. Obviously, the noise degrades the monitoring performance of all models, and a higher noise level results in worse monitoring performance. The noise has the largest influence on the T² statistic, followed by the QY statistic and then the QX statistic. Under noisy conditions, the T² statistics of all five models have very small FDRs, and the QY statistics show sharp decreases in FDRs and large increases in FARs. However, except at the high noise level of 5 dB, the QX statistics of the four multilinear models show smaller decreases in FDRs and only slight increases in FARs, maintaining good monitoring performance. The four multilinear models have similar monitoring performance. Compared with the multilinear models, WMMPLS is more sensitive to noise, because its T² statistic has a smaller FDR and its two Q statistics have larger FARs than those of the multilinear models under the same conditions. As shown in Table 5, WMMPLS is actually not applicable for process monitoring when SNR ≤ 20 dB, owing to the very small FDRs (< 6%) of its T² statistic and the large FARs (≥ 7%) of its two Q statistics. Therefore, the multilinear models are better than WMMPLS at handling noisy data.

After a fault is detected by the monitoring statistics, the faulty process variable is diagnosed using the contribution plot. For the training dataset with 20 reference batches and without noise, the contribution plot of WMHOPLS-Tucker for fault batch No. 5 is illustrated in Fig. 9. Clearly, variable 1, i.e., the aeration rate, is the largest contributor to all monitoring statistics at most sampling times in the fault condition, as marked in red in Fig. 9. It is therefore reasonable to identify the aeration rate as the fault variable, which is consistent with the preset fault variable in Table 2. Contribution plots of WMHOPLS-RTucker, WMHOPLS-CP, WMN-PLS and WMMPLS can be found in the Supplementary data; they also indicate that the aeration rate is the fault variable for fault batch No. 5.

Figure 9. Contribution plots of WMHOPLS-Tucker for the fault batch No. 5.
6. Conclusion

The multilinear PLS method was applied for quality prediction and quality-relevant monitoring in batch processes. Four multilinear PLS models, including HOPLS-Tucker, HOPLS-RTucker, HOPLS-CP and N-PLS, were investigated. In particular, HOPLS-RTucker and HOPLS-CP were derived by imposing extra constraints on HOPLS-Tucker, and thus they can be regarded as two special cases of HOPLS-Tucker. HOPLS-RTucker restricts the loading matrices on some specific modes (called free modes) of the multi-way data to be identity matrices, so that the information on these free modes is kept unchanged. HOPLS-CP replaces the Tucker decomposition used in HOPLS-Tucker with the CP decomposition to obtain a simplified model. HOPLS-RTucker and HOPLS-CP both have higher computational efficiency than HOPLS-Tucker. Moreover, the HOPLS-CP model has a similar form to the N-PLS model, although their solution algorithms are entirely different. HOPLS-CP may have a higher computational efficiency than N-PLS, while N-PLS may fit the data better than HOPLS-CP.

Quality prediction and quality-relevant monitoring methods were then developed by combining multilinear PLS with the moving data window technique. These methods were tested on a fed-batch penicillin fermentation process. The results indicate that the multilinear PLS methods outperform the unfold-PLS (i.e., MPLS) method in two respects. First, multilinear PLS has a better anti-noise capability than unfold-PLS. Second, multilinear PLS has better monitoring performance than unfold-PLS for datasets with fewer reference batches. The four multilinear PLS methods show little difference in prediction and monitoring performance. Among them, N-PLS and HOPLS-CP are more practical because of their simplicity and high computational efficiency.
Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61304116) and the Zhejiang Provincial Natural Science Foundation of China (No. LQ13B060004).

References
[1] P. Nomikos, J.F. MacGregor, Monitoring batch processes using multiway principal component analysis, AIChE J. 40 (1994) 1361–1375.
[2] P. Nomikos, J.F. MacGregor, Multi-way partial least squares in monitoring batch processes, Chemom. Intell. Lab. Syst. 30 (1995) 97–108.
[3] C.K. Yoo, J.M. Lee, P.A. Vanrolleghem, I.B. Lee, On-line monitoring of batch processes using multiway independent component analysis, Chemom. Intell. Lab. Syst. 71 (2004) 151–163.
[4] X. Meng, A.J. Morris, E.B. Martin, On-line monitoring of batch processes using a PARAFAC representation, J. Chemom. 17 (2003) 65–81.
[5] D.J. Louwerse, A.K. Smilde, Multivariate statistical process control of batch processes based on three-way models, Chem. Eng. Sci. 55 (1999) 1225–1235.
[6] J.M. Lee, C. Yoo, I.B. Lee, Fault detection of batch processes using multiway kernel principal component analysis, Comput. Chem. Eng. 28 (2004) 1837–1847.
[7] N. Lu, F. Gao, Stage-based process analysis and quality prediction for batch processes, Ind. Eng. Chem. Res. 44 (2005) 3547–3555.
[8] C. Zhao, F. Wang, Z. Mao, N. Lu, M. Jia, Quality prediction based on phase-specific average trajectory for batch processes, AIChE J. 54 (2008) 693–705.
[9] J. Yu, Multiway Gaussian mixture model based adaptive kernel partial least squares regression method for soft sensor estimation and reliable quality prediction of nonlinear multiphase batch processes, Ind. Eng. Chem. Res. 51 (2012) 13227–13237.
[10] Q. Zhao, C.F. Caiafa, D.P. Mandic, Z.C. Chao, Y. Nagasaka, N. Fujii, L. Zhang, A. Cichocki, Higher-order partial least squares (HOPLS): A generalized multi-linear regression method, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2013) 1660–1673.
[11] L.J. Luo, S.Y. Bao, Z.L. Gao, J.Q. Yuan, Batch process monitoring with GTucker2 model, Ind. Eng. Chem. Res. 53 (2014) 15101–15110.
[12] H.A.L. Kiers, I. Van Mechelen, Three-way component analysis: Principles and illustrative application, Psychol. Methods 6 (2001) 84–110.
[13] R. Bro, Multi-way calibration. Multi-linear PLS, J. Chemom. 10 (1996) 47–62.
[14] T.G. Kolda, B.W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009) 455–500.
[15] T.G. Kolda, Multilinear operators for higher-order decompositions, Tech. Report SAND2006-2081, Sandia National Laboratories, Albuquerque, New Mexico and Livermore, California, 2006.
[16] P. Geladi, B.R. Kowalski, Partial least-squares regression: A tutorial, Anal. Chim. Acta 185 (1986) 1–17.
[17] S. Wold, P. Geladi, K. Esbensen, J. Öhman, Multi-way principal components- and PLS-analysis, J. Chemom. 1 (1987) 41–56.
[18] L. Ståhle, Aspects of the analysis of three-way data, Chemom. Intell. Lab. Syst. 7 (1989) 95–100.
[19] A.K. Smilde, Comments on multilinear PLS, J. Chemom. 11 (1997) 367–377.
[20] L. De Lathauwer, Decompositions of a higher-order tensor in block terms. Part II: Definitions and uniqueness, SIAM J. Matrix Anal. Appl. 30 (2008) 1033–1066.
[21] L. De Lathauwer, B. De Moor, J. Vandewalle, On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl. 21 (2000) 1324–1342.
[22] R.A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis, UCLA Working Papers in Phonetics 16 (1970) 1–84.
[23] R. Bro, A.K. Smilde, S. de Jong, On the difference between low-rank and subspace approximation: improved model for multi-linear PLS regression, Chemom. Intell. Lab. Syst. 58 (2001) 3–13.
[24] P. Nomikos, J.F. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1995) 41–59.
[25] E.B. Martin, A.J. Morris, Non-parametric confidence bounds for process performance monitoring charts, J. Process Control 6 (1996) 349–358.
[26] J.A. Westerhuis, S.P. Gurden, A.K. Smilde, Generalized contribution plots in multivariate statistical process monitoring, Chemom. Intell. Lab. Syst. 51 (2000) 95–114.
[27] G. Birol, C. Ündey, A. Çinar, A modular simulation package for fed-batch fermentation: penicillin production, Comput. Chem. Eng. 26 (2002) 1553–1565.
[28] L.J. Luo, Tensor global-local preserving projections for batch process monitoring, Ind. Eng. Chem. Res. 53 (2014) 10166–10176.
[29] L.J. Luo, S.Y. Bao, Z.L. Gao, J.Q. Yuan, Batch process monitoring with tensor global-local structure analysis, Ind. Eng. Chem. Res. 52 (2013) 18031–18042.
[30] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, New York, 2011.
Table 1. Process variables in the fed-batch penicillin fermentation process

Number  Process variable                Unit     Type
1       Aeration rate                   L/h      Input variable
2       Agitator power input            W        Input variable
3       Substrate feeding rate          L/h      Input variable
4       Substrate feeding temperature   K        Input variable
5       Substrate concentration         g/L      Input variable
6       DO saturation                   %        Input variable
7       Culture volume                  L        Input variable
8       CO2 concentration               mmol/L   Input variable
9       pH                                       Input variable
10      Bioreactor temperature          K        Input variable
11      Generated heat                  kcal/h   Input variable
12      Cold water flow rate            L/h      Input variable
13      Biomass concentration           g/L      Quality variable
14      Penicillin concentration        g/L      Quality variable
Table 2. Fault batches in the fed-batch penicillin fermentation process

Fault batch No.  Fault description
1                10% step increase in aeration rate
2                10% step increase in agitator power input
3                10% step increase in substrate feeding rate
4                10% step decrease in substrate feeding rate
5                2 L/h ramp decrease in aeration rate
6                5 W ramp decrease in agitator power input
Table 3. Prediction performance of five models at different noise levels

         WMHOPLS-Tucker     WMHOPLS-RTucker    WMHOPLS-CP         WMN-PLS            WMMPLS
SNR      R       RMSEP      R       RMSEP      R       RMSEP      R       RMSEP      R       RMSEP
                 (g/L)              (g/L)              (g/L)              (g/L)              (g/L)
Inf dB   0.9994  0.0027     0.9989  0.0054     0.9948  0.0252     0.9986  0.0070     0.9994  0.0027
20 dB    0.9949  0.0247     0.9954  0.0223     0.9920  0.0387     0.9954  0.0223     0.9950  0.0243
10 dB    0.9874  0.0607     0.9886  0.0550     0.9869  0.0633     0.9881  0.0573     0.9857  0.0691
5 dB     0.9802  0.0954     0.9813  0.0902     0.9807  0.0933     0.9806  0.0937     0.9762  0.1148

Bold values present the smallest RMSEP for each SNR.
Table 4. Average fault detection rates (FDRs) and false alarm rates (FARs) of six faults for five models using different numbers of reference batches without noise

              WMHOPLS-Tucker      WMHOPLS-RTucker     WMHOPLS-CP          WMN-PLS             WMMPLS
     Batches  T2     QX     QY    T2     QX     QY    T2     QX     QY    T2     QX     QY    T2     QX     QY
FDR  50       100.0  100.0  100.0 100.0  100.0  100.0 100.0  100.0  100.0 100.0  100.0  100.0 100.0  100.0  100.0
     20       100.0  100.0  99.7  100.0  100.0  99.5  99.8   100.0  99.7  100.0  100.0  100.0 99.8   100.0  99.8
     10       97.0   100.0  100.0 96.5   100.0  99.7  98.8   100.0  99.7  99.0   100.0  99.8         100.0  99.8
     5        3.63   97.2   100.0 3.63   97.2   100.0 3.63   97.2   100.0 3.63   97.2   100.0 3.3    100.0  99.8
FAR  50       0      0      0     0      0      0     0      0      0     0      0      0     0      0      0
     20       0      0      0     0      0      0     0      0      0     0      0      0     0      0      1
     10       0      0      0     0      0      0     0      0      0     0      0      4.4   0      10.0   26.7
     5        0      0      25.6  0      0      25.6  0      0      25.6  0      0      25.6  0      34.4   23.3

Bold values present the largest FAR in each case.
Table 5. Average fault detection rates (FDRs) and false alarm rates (FARs) of six faults for five models using 20 reference batches at different noise levels

             WMHOPLS-Tucker      WMHOPLS-RTucker     WMHOPLS-CP          WMN-PLS             WMMPLS
     SNR     T2     QX     QY    T2     QX     QY    T2     QX     QY    T2     QX     QY    T2     QX     QY
FDR  Inf dB  100.0  100.0  99.7  100.0  100.0  99.5  99.8   100.0  99.7  100.0  100.0  100.0 99.8   100.0  99.8
     20 dB   8.6    84.8   37.8  8.6    85.0   38.1  8.4    85.0   38.1  8.4    85.0   38.4  2.6    86.6   39.1
     10 dB   7.8    82.5   25.2  7.9    82.5   24.8  7.8    82.7   25.2  7.8    82.0   25.2  4.8    83.7   30.4
     5 dB    9.4    72.1   19.5  9.9    71.6   20.3  8.3    73.6   19.1  7.8    71.6   20.6  5.4    72.1   21.8
FAR  Inf dB  0      0      0     0      0      0     0      0      0     0      0      0     0      0      1
     20 dB   0      1.7    6.3   0      1.9    6.3   0      1.7    6.3   0      1.7    6.3   0      7.0    9.3
     10 dB   0      2.2    8.5   0      2.6    8.5   0      2.4    8.1   0      2.4    8.1   0      8.6    11.1
     5 dB    0      10.7   7.4   0      10.6   7.6   0      11.1   7.8   0      11.1   7.6   0      12.6   10.6

Bold values present the smallest FDR or the largest FAR for each SNR.
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6a
Fig. 6b
Fig. 7a
Fig. 7b
Fig. 8a
Fig. 8b
Fig. 8c
Fig. 8d
Fig. 8e
Fig. 9
Figure captions

Figure 1. One component decomposition of a three-way array X(I×J×K).
Figure 2. Schematic diagram of a HOPLS-Tucker model for three-way arrays X and Y.
Figure 3. Schematic diagram of a HOPLS-RTucker model for three-way arrays X and Y.
Figure 4. Schematic diagram of a HOPLS-CP model for three-way arrays X and Y.
Figure 5. Preprocessing procedure for batch data.
Figure 6. Quality prediction results of five models for a test batch without noise. (a) Penicillin concentration, (b) Biomass concentration.
Figure 7. Quality prediction results of five models for a test batch at the noise level of 5 dB. (a) Penicillin concentration, (b) Biomass concentration.
Figure 8. Monitoring charts of five models for the training dataset with 10 reference batches. (a) WMHOPLS-Tucker, (b) WMHOPLS-RTucker, (c) WMHOPLS-CP, (d) WMN-PLS, (e) WMMPLS.
Figure 9. Contribution plots of WMHOPLS-Tucker for the fault batch No. 5.
Highlights

1. Two new multilinear PLS models named HOPLS-CP and HOPLS-RTucker are developed.
2. Batch process monitoring methods are proposed based on multilinear PLS.
3. Multilinear PLS is used for quality prediction in batch processes.
4. Multilinear PLS has better monitoring and prediction abilities than unfold-PLS.