14th IFAC Symposium on System Identification, Newcastle, Australia, 2006

A FRAMEWORK FOR PLS-SIM INTEGRATION

Riccardo Muradore ∗,∗∗, Fabrizio Bezzo ∗∗

∗ European Southern Observatory, Garching bei München, Germany
∗∗ Dipartimento di Principi e Impianti di Ingegneria Chimica, University of Padova, Italy

Abstract: A novel algorithm is presented for the design of inferential estimators for process monitoring and control. The algorithm aims at integrating Partial Least Squares (PLS) techniques and Subspace Identification Methods (SIM) to exploit the main advantages of both methodologies. In particular, the algorithm retains the computational robustness of PLS in dealing with large sets of correlated inputs and outputs, whilst profiting from the dynamic description of the system provided by SIM.

Keywords: PLS, subspace identification methods

1. INTRODUCTION

Partial Least Squares (PLS) is a very widespread technique for dealing with large numbers of (possibly) correlated data (Geladi and Kowalski, 1986; Helland, 1988; Helland, 1990). However, a potential drawback of PLS is its inherently static nature. A potential solution is to include lagged variables in the input matrix in order to take past inputs and outputs into account (Ku et al., 1995; Lakshminarayanan et al., 1997). A different approach is to merge Subspace Identification Methods (SIM) with Principal Component Analysis (PCA) regression (Negiz and Çinar, 1997; Wang and Qin, 2002; Treasure et al., 2004). A comparison between static and dynamic regression can be found in (Shi and MacGregor, 2000). The goal of this paper is to suggest a methodology to combine PLS and SIM. In particular, the iterative structure of the PLS algorithm is preserved, but in each estimation step SIM is used instead of static regression.

2. PRELIMINARIES

In this section, PLS regression and SIM are briefly reviewed. The purpose is to highlight and compare their pros and cons in order to better understand the proposed algorithm.

Let $\{x(\cdot)\} \in \mathbb{R}^m$ and $\{y(\cdot)\} \in \mathbb{R}^p$ be two jointly stationary, second-order ergodic processes, and let $X$, $Y$ be two realizations¹
\[
X := [\, x(t_0) \;\; x(t_1) \;\; \cdots \;\; x(t_{N-1}) \,] \in \mathbb{R}^{m \times N},
\qquad
Y := [\, y(t_0) \;\; y(t_1) \;\; \cdots \;\; y(t_{N-1}) \,] \in \mathbb{R}^{p \times N}.
\]

The goal is to construct a static (using PLS) or a dynamic (using SIM) model mapping values of $x(t)$ into an estimate $\hat y(t)$ of $y(t)$. Observe that $x$ and $y$ do not necessarily have the meaning of input and output of the plant. In fact, they can be thought of as two different sets of measurements, i.e. process variables and quality variables. To obtain good product and/or to keep the system under control, it is necessary to estimate the set of quality variables (usually sampled at a low rate) from the process variables (sampled at a high rate). For the sake of readability, the random vectors $x(\cdot)$ and $y(\cdot)$ are assumed to be scaled to zero mean and unit variance.

¹ From now on, bold letters are used for random vectors, whereas italic letters denote sample values.
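As a side note, this autoscaling convention is easy to reproduce numerically. A minimal Python sketch (the synthetic data below are illustrative only):

```python
import numpy as np

def autoscale(M):
    """Scale each row (variable) of an (n_vars x N) data matrix
    to zero mean and unit variance, as assumed in the text."""
    mu = M.mean(axis=1, keepdims=True)
    sigma = M.std(axis=1, keepdims=True)
    return (M - mu) / sigma, mu, sigma

# Illustrative realizations: m = 4 process and p = 2 quality variables
rng = np.random.default_rng(0)
X, _, _ = autoscale(rng.normal(5.0, 2.0, size=(4, 200)))
Y, _, _ = autoscale(rng.normal(1.0, 0.5, size=(2, 200)))
```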

2.1 A review of PLS: the NIPALS algorithm

The nonlinear iterative partial least squares (NIPALS) algorithm (Geladi and Kowalski, 1986) is written here with the rows and columns reversed with respect to the usual notation and without normalization. The goal is to give each equation a geometric interpretation that will be useful in the following. Here, the Greek letter $\Sigma$ is used to indicate variance and covariance matrices. Moreover, $E[\cdot]$ and $E[\cdot|\cdot]$ are the expectation and the conditional expectation, respectively. The slightly different symbol $\hat E[\cdot|\cdot]$ is introduced for the linear projection operator (in the Gaussian case, $\hat E[\cdot|\cdot] = E[\cdot|\cdot]$).

After setting $Y_1 = Y$, $X_1 = X$, the following iterations are repeated for each loading, $k = 1, 2, \ldots$. Observe that the subscript of each variable denotes the loading number, whereas the superscript denotes the iteration number.

(1) Initialization: set $s_k^0$ equal to a row of $Y_k$.

(2) Linear regression step: find the loading vector $\hat w_k^i = \hat\Sigma_{s_k^i}^{-1} \hat\Sigma_{s_k^i X_k}$ that regresses the rows of $X_k$ on $s_k^i$:
\[
X_k = \hat w_k^i s_k^i + E_k^i, \qquad \hat w_k^i = \frac{X_k (s_k^i)^T}{s_k^i (s_k^i)^T}. \tag{1}
\]
The matrix $E_k^i$ is the estimation error.

(3) Estimation step: compute the score $t_k^i$ as the LSE of $s_k^i$ based on $X_k$ using the model (1):
\[
t_k^i := \hat s_k^i = \bigl( (\hat w_k^i)^T \hat w_k^i \bigr)^{-1} (\hat w_k^i)^T X_k .
\]

(4) Linear regression step: find the loading vector $\hat q_k^i = \Sigma_{t_k^i}^{-1} \Sigma_{t_k^i Y_k}$ that regresses the rows of $Y_k$ on $t_k^i$:
\[
Y_k = \hat q_k^i t_k^i + W_k^i, \qquad \hat q_k^i = \frac{Y_k (t_k^i)^T}{t_k^i (t_k^i)^T}. \tag{2}
\]
The matrix $W_k^i$ is the estimation error.

(5) Estimation step: compute the score $s_k^{i+1}$ as the LSE of $t_k^i$ based on $Y_k$ using the model (2):
\[
s_k^{i+1} := \hat t_k^i = \bigl( (\hat q_k^i)^T \hat q_k^i \bigr)^{-1} (\hat q_k^i)^T Y_k .
\]

(6) Check convergence: compare two subsequent realizations of $s_k$ ($s_k^i$, $s_k^{i+1}$). If $\| s_k^{i+1} - s_k^i \| > \varepsilon$, then set $i := i + 1$ and go back to step 2; else set $t_k := t_k^i$ and go to step 7.

(7) Linear regression step: find the loading vector $\hat p_k = \Sigma_{t_k}^{-1} \Sigma_{t_k X_k}$ regressing the rows of $X_k$ on $t_k$:
\[
X_k = \hat p_k t_k + V_k, \qquad \hat p_k = \frac{X_k (t_k)^T}{t_k (t_k)^T}.
\]

(8) Matrix updating: compute the residual matrices $X_{k+1}$ and $Y_{k+1}$:
\[
X_{k+1} = X_k - \hat E[X_k | t_k] = X_k - \hat p_k t_k, \qquad
Y_{k+1} = Y_k - \hat E[Y_k | t_k] = Y_k - \hat q_k t_k .
\]
They comprise the information of the original matrices $X$ and $Y$ not explained by the first $k$ loadings.

(9) Check the stopping rule: loadings can be computed until their maximum number, determined by $p$ and $m$, is reached. Otherwise, the procedure can be stopped when the explained variance on the $Y$-block is large enough or when the increment between two steps is not significant.

At the end of the procedure, the original matrices are written as
\[
X = \sum_{k=1}^{N} \hat p_k t_k + E = P\,T + E, \qquad
Y = \sum_{k=1}^{N} \hat q_k t_k + F = Q\,T + F,
\]
where $N$ is the number of loadings and $E$ and $F$ are the error matrices. These equations have good numerical properties even if the original data are collinear. From the above matrices, the estimated linear static model becomes
\[
y(t) = A\,x(t).
\]
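To make the steps above concrete, the following Python sketch (an illustration, not code from the paper) implements steps 1-9 in the reversed row/column convention used here; np.linalg.pinv realizes the projection $((\hat w)^T \hat w)^{-1} (\hat w)^T$:

```python
import numpy as np

def nipals_reversed(X, Y, n_load, eps=1e-10, max_iter=500):
    """NIPALS with variables in rows and samples in columns."""
    Xk, Yk = X.copy(), Y.copy()
    P, Q, T = [], [], []
    for _ in range(n_load):
        s = Yk[0:1, :]                        # step 1: a row of Yk
        for _ in range(max_iter):
            w = (Xk @ s.T) / (s @ s.T)        # step 2: regress Xk on s
            t = np.linalg.pinv(w) @ Xk        # step 3: LSE of s given Xk
            q = (Yk @ t.T) / (t @ t.T)        # step 4: regress Yk on t
            s_new = np.linalg.pinv(q) @ Yk    # step 5: LSE of t given Yk
            converged = np.linalg.norm(s_new - s) < eps
            s = s_new
            if converged:                     # step 6
                break
        p = (Xk @ t.T) / (t @ t.T)            # step 7: X-loading
        Xk, Yk = Xk - p @ t, Yk - q @ t       # step 8: deflation
        P.append(p); Q.append(q); T.append(t)
    return np.hstack(P), np.hstack(Q), np.vstack(T)
```

With these conventions the outputs satisfy $X \approx P\,T + E$ and $Y \approx Q\,T + F$, mirroring the decomposition above.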

2.2 A review of SIM

An innovation state-space model of $z = \begin{bmatrix} x \\ y \end{bmatrix}$ is given by²
\[
z(t) = \begin{bmatrix} A & K \\ C & I \end{bmatrix} e(t), \tag{4}
\]
where the innovation process $e$ has variance equal to $\Pi$. The unknowns are the matrices $A$, $K$, $C$, $\Pi$, where $K$ is the steady-state Kalman gain.

² Here and in the following, the compact notation
\[
y = \begin{bmatrix} A & B \\ C & D \end{bmatrix} u
\]
is used instead of the state-space model description
\[
x(t+1) = A x(t) + B u(t), \qquad y(t) = C x(t) + D u(t). \tag{3}
\]
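The compact notation of footnote 2 is just shorthand for a state-space simulation. A minimal sketch of the operator (the helper name is ours):

```python
import numpy as np

def ss_apply(A, B, C, D, u, x0=None):
    """Apply the compact operator [A B; C D] of footnote 2 to an
    input sequence u (n_inputs x N): simulate
    x(t+1) = A x(t) + B u(t),  y(t) = C x(t) + D u(t)."""
    x = np.zeros(A.shape[0]) if x0 is None else x0
    ys = []
    for t in range(u.shape[1]):
        ys.append(C @ x + D @ u[:, t])
        x = A @ x + B @ u[:, t]
    return np.column_stack(ys)
```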

Subspace methods for time series can be used to estimate all these matrices (Van Overschee and De Moor, 1993; Lindquist and Picci, 1996). As before, an estimator of $y(t)$ based on $x(t)$ needs to be designed. The model (4) can be written as
\[
\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} =
\begin{bmatrix} A & K_1 & K_2 \\ L & I & 0 \\ C & 0 & I \end{bmatrix}
\begin{bmatrix} e_1(t) \\ e_2(t) \end{bmatrix}, \tag{5}
\qquad
\mathrm{Var}\begin{bmatrix} e_1(t) \\ e_2(t) \end{bmatrix} =
\begin{bmatrix} \Pi_{11} & \Pi_{12} \\ \Pi_{21} & \Pi_{22} \end{bmatrix}.
\]

In order to find the dynamic system mapping $x(t)$ into $y(t)$, a transformation yielding uncorrelated noises $e_x$, $e_y$ is first performed:
\[
\begin{bmatrix} e_x(t) \\ e_y(t) \end{bmatrix} =
\begin{bmatrix} I & 0 \\ -\Pi_{21}\Pi_{11}^{-1} & I \end{bmatrix}
\begin{bmatrix} e_1(t) \\ e_2(t) \end{bmatrix}.
\]
After setting $J = \Pi_{21}\Pi_{11}^{-1}$, the system (5) becomes
\[
\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} =
\begin{bmatrix} A & K_x & K_y \\ L & I & 0 \\ C & J & I \end{bmatrix}
\begin{bmatrix} e_x(t) \\ e_y(t) \end{bmatrix}, \tag{6}
\qquad
\mathrm{Var}\begin{bmatrix} e_x(t) \\ e_y(t) \end{bmatrix} =
\begin{bmatrix} \Pi_x & 0 \\ 0 & \Pi_y \end{bmatrix}, \tag{7}
\]
where $K_x := K_1$ and $K_y := K_2 + K_1 J$. Substituting the $x$-measurement equation into the state equation and into the $y$-measurement equation (the quality measurement equation), a system mapping $x(t)$ and $e_y(t)$ into $y(t)$ is obtained. To get the estimator $\hat y(t)$ of $y(t)$ based on the past measurements $x(t_0), \ldots, x(t-1)$, it is necessary to

(1) compute the state estimate $\hat x(t)$ based on $x(t_0), \ldots, x(t-1)$ using a (one-step-ahead, or a priori) steady-state Kalman filter,
(2) substitute the state estimate $\hat x(t)$ into the quality measurement equation.

At the end, the estimator takes the form
\[
\hat y(t) = \begin{bmatrix} A - GL & G \\ C - JL & J \end{bmatrix} x(t), \tag{8}
\]
where $G$ is the Kalman gain
\[
G = (A - K_y L)\, P L^T \bigl( L P L^T + \Pi_x \bigr)^{-1} + K_y
\]
and $P$ is the positive-definite solution of the algebraic Riccati equation
\[
P = (A - K_y L)\Bigl[ P - P L^T \bigl( L P L^T + \Pi_x \bigr)^{-1} L P \Bigr](A - K_y L)^T + K_y \Pi_y K_y^T .
\]
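As a numerical aside (not part of the paper), once $A$, $L$, $C$, $K_y$, $\Pi_x$, $\Pi_y$ have been identified, $P$ and $G$ can be computed by treating the Riccati equation above as a filter-type DARE; the mapping of its arguments onto scipy.linalg.solve_discrete_are is our assumption:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def estimator_gain(A, L, Ky, Pix, Piy):
    """Solve the Riccati equation of Section 2.2 for P and return
    the gain G of the estimator (8)."""
    Abar = A - Ky @ L
    P = solve_discrete_are(Abar.T, L.T, Ky @ Piy @ Ky.T, Pix)
    G = Abar @ P @ L.T @ np.linalg.inv(L @ P @ L.T + Pix) + Ky
    return G, P

# The estimator (8) can then be simulated, e.g. with the ss_apply
# helper sketched earlier:
#   yhat = ss_apply(A - G @ L, G, C - J @ L, J, X)
```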

 e1 (t) . e2 (t)

After setting J = Π21 Π−1 11 , the system (5) becomes

X

⎤  A K x Ky  e (t) x(t) (6) =⎣ L I 0 ⎦ x y(t) ey (t) C J I     Πx 0 ex (t) , (7) = Var ey (t) 0 Πy 





W (z) : X → sˆ

G J

Q(z) : Y → tˆ

With respect to the standard PLS, there are two important differences: a) static linear regressions are now substituted by system identifications. b) instead of (dynamically) regress Xk on sik in step 2 and Yk on tik in step 4, the algorithm regresses sik on Xk and tik on Yk , respectively. This avoids the inversion of transfer functions. Thus, the algorithm is structured as follows:

 x(t)

s

Fig. 1. Flowchart of the Dynamic PLS

At the end, the estimator takes the form A − GL C − JL

5 t

ˆ (t) based on (1) compute the state estimation x x(t0 ), . . ., x(t − 1) using a (one–step–ahead or a priori) steady–state Kalman filter, ˆ (t) in the (2) substitute the state estimation x quality measurement equation.



Y

3

where Kx := K1 and Ky := K2 + K1 J. Substituting the x–measurement equation into the state equation and into the y–measurement equation (quality measurement equation), a system mapping x(t) and ey (t) into y(t) is obtained. To get ˆ (t) for y(t) based on past measurethe estimator y ments x(t0 ),. . .,x(t − 1), it is necessary to

ˆ (t) = y

4

(8)

where G is the Kalman gain

−1 + Ky G = [A − Ky L] P LT LP LT + Πx and P is the positive–definite solution of the algebraic Riccati equation  −1 T   L P · P = (A − Ky L) P − P L LP LT + Πx T

· (A − Ky L) + Ky Πy KyT . It is important to highlight that numerical robustness of SIMs is strongly dependent on the data. If

293

(1) Initialization: set $s_k^0$ equal to a variable of the vector $y_k$ and let $s_k^0$ be the corresponding data trajectory.

(2) SIM step: identify the state-space system mapping $x_k$ into $s_k^i$. The first step is to find the stochastic system describing the enhanced output vector $\begin{bmatrix} x_k \\ s_k^i \end{bmatrix}$ using the data $\begin{bmatrix} X_k \\ s_k^i \end{bmatrix}$:
\[
\begin{bmatrix} x_k(t) \\ s_k^i(t) \end{bmatrix} =
\begin{bmatrix} A_k^i & K_{1,k}^i & K_{2,k}^i \\ L_k^i & I & 0 \\ c_k^i & 0 & 1 \end{bmatrix}
\begin{bmatrix} e_{1,k}^i(t) \\ e_{2,k}^i(t) \end{bmatrix},
\qquad
\mathrm{Var}\begin{bmatrix} e_{1,k}^i \\ e_{2,k}^i \end{bmatrix} = \Pi_k^i =
\begin{bmatrix} \Pi_{11}^i & \Pi_{12}^i \\ \Pi_{21}^i & \Pi_{22}^i \end{bmatrix}.
\]
The quality measurement $s_k^i(t)$ is now a scalar fictitious variable. In order to find the system mapping $x_k$ into $s_k^i$, one needs to perform the same manipulations described in Section 2.2. All matrices have a subscript related to the loading number ($k$) and a superscript related to the iteration number ($i$). Since the meaning of the matrices is the same, the algebra is not repeated. The model is given by
\[
s_k^i(t) = \underbrace{W_k^i(z)\, x_k(t)}_{\hat s_k^i(t)} + n_k^i(t),
\qquad
W_k^i(z) = \begin{bmatrix} A_k^i - G_k^i L_k^i & G_k^i \\ c_k^i - J_k^i L_k^i & J_k^i \end{bmatrix}, \tag{9}
\]
where the model noise $n_k^i(t)$ is orthogonal to $\hat s_k^i(t)$.

(3) Estimation step: compute the scores $t_k^i$. By redefining the variable $\hat s_k^i$ as the score variable $t_k^i$, a realization $t_k^i(t)$, $t = t_0, \ldots, t_{N-1}$, is obtained as the output of the system (9) driven by $X_k(t)$, $t = t_0, \ldots, t_{N-1}$. The $k$-th score data matrix at the $i$-th step is defined as
\[
t_k^i := [\, t_k^i(t_0) \;\; t_k^i(t_1) \;\; \cdots \;\; t_k^i(t_{N-1}) \,] \in \mathbb{R}^{1 \times N}.
\]

(4) SIM step: identify the state-space system mapping $y_k$ into $t_k^i$. The first step is to find the stochastic system describing the enhanced output vector $\begin{bmatrix} y_k \\ t_k^i \end{bmatrix}$ using the data $\begin{bmatrix} Y_k \\ t_k^i \end{bmatrix}$:
\[
\begin{bmatrix} y_k(t) \\ t_k^i(t) \end{bmatrix} =
\begin{bmatrix} F_k^i & G_{1,k}^i & G_{2,k}^i \\ M_k^i & I & 0 \\ h_k^i & 0 & 1 \end{bmatrix}
\begin{bmatrix} w_{1,k}^i(t) \\ w_{2,k}^i(t) \end{bmatrix},
\qquad
\mathrm{Var}\begin{bmatrix} w_{1,k}^i(t) \\ w_{2,k}^i(t) \end{bmatrix} =
\begin{bmatrix} \Lambda_{11,k}^i & \Lambda_{12,k}^i \\ \Lambda_{21,k}^i & \Lambda_{22,k}^i \end{bmatrix}.
\]
The model takes the form
\[
t_k^i(t) = \underbrace{Q_k^i(z)\, y_k(t)}_{\hat t_k^i(t)} + v_k^i(t),
\qquad
Q_k^i(z) = \begin{bmatrix} F_k^i - G_k^i M_k^i & G_k^i \\ h_k^i - V_k^i M_k^i & V_k^i \end{bmatrix}, \tag{10}
\]
where the model noise $v_k^i(t)$ is orthogonal to $\hat t_k^i(t)$.

(5) Estimation step: compute the scores $s_k^{i+1}$. By redefining the variable $\hat t_k^i(t)$ as the score variable $s_k^{i+1}$, a new realization $s_k^{i+1}(t)$, $t = t_0, \ldots, t_{N-1}$, is obtained as the output of the system (10) driven by $Y_k(t)$, $t = t_0, \ldots, t_{N-1}$. The $s_k^{i+1}$ score matrix is
\[
s_k^{i+1} := [\, s_k^{i+1}(t_0) \;\; s_k^{i+1}(t_1) \;\; \cdots \;\; s_k^{i+1}(t_{N-1}) \,].
\]

(6) Check convergence: compare two subsequent realizations of $s_k$. If $\| s_k^{i+1} - s_k^i \| > \varepsilon$, then set $i := i + 1$ and go back to step 2; else set $t_k := t_k^i$ (with the corresponding data trajectory $t_k$) and go to step 7.

(7) SIM step: identify the state-space systems mapping $t_k$ into $\hat x_k$ and $\hat y_k$:
\[
\hat x_k(t) = \begin{bmatrix} A_{x,k} & B_{x,k} \\ C_{x,k} & D_{x,k} \end{bmatrix} t_k(t), \tag{11}
\qquad
\hat y_k(t) = \begin{bmatrix} A_{y,k} & B_{y,k} \\ C_{y,k} & D_{y,k} \end{bmatrix} t_k(t). \tag{12}
\]
Moreover, since it is necessary to move from the real measurements $x_k(t)$ to the fictitious variable $t_k(t)$, the system mapping $x_k$ into $t_k$,
\[
t_k(t) = \begin{bmatrix} A_{t,k} & B_{t,k} \\ C_{t,k} & D_{t,k} \end{bmatrix} x_k(t)
= \begin{bmatrix} A_k^i - G_k^i L_k^i & G_k^i \\ c_k^i - J_k^i L_k^i & J_k^i \end{bmatrix} x_k(t), \tag{13}
\]
is also computed.

(8) Matrix updating: compute the residual matrices $X_{k+1}$ and $Y_{k+1}$ by subtracting the estimates obtained using (11) and (12), respectively:
\[
X_{k+1} := X_k - \begin{bmatrix} A_{x,k} & B_{x,k} \\ C_{x,k} & D_{x,k} \end{bmatrix} t_k, \tag{14}
\qquad
Y_{k+1} := Y_k - \begin{bmatrix} A_{y,k} & B_{y,k} \\ C_{y,k} & D_{y,k} \end{bmatrix} t_k. \tag{15}
\]

(9) Check the stopping rule: the procedure is stopped when the explained variance on the $Y$-block is large enough or when the increment between two steps is not significant.
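Putting steps (1)-(9) together, the outer loop has the following structure. This is only a structural sketch: stoch_sim stands for a stochastic subspace identification routine returning the innovation model of the enhanced output (in the spirit of Van Overschee and De Moor, 1993), to_estimator for the manipulations of Section 2.2 producing the filters $W_k^i(z)$ or $Q_k^i(z)$, and fit_ss for the identification of (11)-(12); all three callables, and the .filter method that simulates a system on a data matrix, are hypothetical names:

```python
import numpy as np

def dynamic_pls(X, Y, n_load, stoch_sim, to_estimator, fit_ss, eps=1e-6):
    """Structural sketch of the Dynamic PLS loop (steps 1-9);
    a fixed n_load replaces the step-9 variance test here."""
    Xk, Yk, loadings = X.copy(), Y.copy(), []
    for _ in range(n_load):
        s = Yk[0:1, :]                                       # step 1
        while True:
            W = to_estimator(stoch_sim(np.vstack([Xk, s])))  # step 2
            t = W.filter(Xk)                                 # step 3
            Q = to_estimator(stoch_sim(np.vstack([Yk, t])))  # step 4
            s_new = Q.filter(Yk)                             # step 5
            if np.linalg.norm(s_new - s) < eps:              # step 6
                break
            s = s_new
        Gx, Gy = fit_ss(t, Xk), fit_ss(t, Yk)                # step 7
        Xk = Xk - Gx.filter(t)                               # step 8, (14)
        Yk = Yk - Gy.filter(t)                               # step 8, (15)
        loadings.append((W, Gx, Gy))
    return loadings
```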

Let $N$ be the number of dynamic loadings computed. The plant is then written as
\[
x(t) = \sum_{k=1}^{N} \begin{bmatrix} A_{x,k} & B_{x,k} \\ C_{x,k} & D_{x,k} \end{bmatrix} t_k(t) + e(t), \tag{16}
\]
\[
y(t) = \sum_{k=1}^{N} \begin{bmatrix} A_{y,k} & B_{y,k} \\ C_{y,k} & D_{y,k} \end{bmatrix} t_k(t) + f(t), \tag{17}
\]
where $e(t)$ and $f(t)$ are the approximation errors. Note that, to compute the estimate $\hat y(t)$ of $y(t)$ based on the measurements $x(t_0), x(t_1), \ldots, x(t-1)$, the transfer functions (13) from $x_1(t), x_2(t), \ldots, x_N(t)$ to $t_1(t), t_2(t), \ldots, t_N(t)$ are also needed (see the equations below). When the scores are available, the estimate is computed according to the formula
\[
\hat y(t) = \sum_{k=1}^{N} \begin{bmatrix} A_{y,k} & B_{y,k} \\ C_{y,k} & D_{y,k} \end{bmatrix} t_k(t)
= \sum_{k=1}^{N} G_{t_k y}(z)\, t_k(t).
\]

\[
\begin{aligned}
t_1(t) &= \begin{bmatrix} A_{t,1} & B_{t,1} \\ C_{t,1} & D_{t,1} \end{bmatrix} x_1(t)
        = \begin{bmatrix} A_{t,1} & B_{t,1} \\ C_{t,1} & D_{t,1} \end{bmatrix} x(t)
        =: G_{xt_1}(z)\, x(t) \\
t_2(t) &= \begin{bmatrix} A_{t,2} & B_{t,2} \\ C_{t,2} & D_{t,2} \end{bmatrix} x_2(t)
        = \begin{bmatrix} A_{t,2} & B_{t,2} \\ C_{t,2} & D_{t,2} \end{bmatrix}
          \left( x(t) - \begin{bmatrix} A_{x,1} & B_{x,1} \\ C_{x,1} & D_{x,1} \end{bmatrix} t_1(t) \right)
        =: G_{xt_2}(z)\, x(t) \\
&\;\;\vdots \\
t_N(t) &= \begin{bmatrix} A_{t,N} & B_{t,N} \\ C_{t,N} & D_{t,N} \end{bmatrix} x_N(t)
        = \begin{bmatrix} A_{t,N} & B_{t,N} \\ C_{t,N} & D_{t,N} \end{bmatrix}
          \left( x(t) - \sum_{k=1}^{N-1} \begin{bmatrix} A_{x,k} & B_{x,k} \\ C_{x,k} & D_{x,k} \end{bmatrix} t_k(t) \right)
        =: G_{xt_N}(z)\, x(t)
\end{aligned}
\]
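Numerically, the cascade above is plain state-space simulation. A sketch using scipy.signal.dlsim with a unit sample time; the dictionary keys naming the identified quadruples (11)-(13) are our naming convention:

```python
import numpy as np
from scipy.signal import dlsim

def dynamic_pls_predict(loadings, X):
    """Estimate y from x with the identified dynamic loadings.
    Each loading is a dict holding three quadruples:
    (At,Bt,Ct,Dt): x_k -> t_k,  (Ax,..): t_k -> xhat_k,
    (Ay,..): t_k -> yhat_k.  X is m x N."""
    U, yhat = X.T, 0.0                 # dlsim wants samples in rows
    for ld in loadings:
        _, t, _ = dlsim((ld["At"], ld["Bt"], ld["Ct"], ld["Dt"], 1), U)
        _, xk, _ = dlsim((ld["Ax"], ld["Bx"], ld["Cx"], ld["Dx"], 1), t)
        _, yk, _ = dlsim((ld["Ay"], ld["By"], ld["Cy"], ld["Dy"], 1), t)
        U = U - xk                     # deflate x, as in equation (14)
        yhat = yhat + yk               # sum the partial estimates
    return np.asarray(yhat).T          # p x N estimate of y
```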

Remark 1. Note that the updating of $X_k$ in equation (14) is not necessary, because the only required updating is the one related to $Y_k$ (this also holds in standard PLS). If the input matrix is not updated, the new scores $\bar t_k$ (which differ from the $t_k$ of equation (13)) can be calculated as
\[
\bar t_k(t) = \begin{bmatrix} \bar A_{t,k} & \bar B_{t,k} \\ \bar C_{t,k} & \bar D_{t,k} \end{bmatrix} x(t)
\]
without computing $x_2(t), x_3(t), \ldots, x_N(t)$. It is then possible to resort to the more compact block diagram of Figure 3 instead of the one obtained by the previous construction (Figure 2).

Fig. 2. Block diagram of the basic Dynamic PLS: each score $t_k(t)$ is generated from the deflated signal $x_k(t)$, and the partial estimates $\hat y_k(t)$ are summed to give $\hat y(t)$.

Fig. 3. Block diagram of the compact Dynamic PLS: each score $\bar t_k(t)$ is generated directly from $x(t)$, and the partial estimates $\hat y_k(t)$ are summed to give $\hat y(t)$.
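Under Remark 1 the deflation systems can therefore be dropped: every score is produced directly from $x(t)$ by the $\bar t_k$ realizations. A sketch of the compact variant (naming ours):

```python
import numpy as np
from scipy.signal import dlsim

def dynamic_pls_predict_compact(loadings, X):
    """Compact variant of Remark 1: scores computed directly from
    x(t); no x-deflation systems are needed."""
    U, yhat = X.T, 0.0
    for ld in loadings:
        _, tb, _ = dlsim((ld["Abar"], ld["Bbar"],
                          ld["Cbar"], ld["Dbar"], 1), U)
        _, yk, _ = dlsim((ld["Ay"], ld["By"], ld["Cy"], ld["Dy"], 1), tb)
        yhat = yhat + yk
    return np.asarray(yhat).T
```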

4. AN IMPROVEMENT: DYNAMIC PLS WITH INPUTS

Until now, only process and quality variables have been used as input and output data. It is also possible to introduce the controlled variables (if they are available) into the model by resorting to subspace identification algorithms with inputs (Van Overschee and De Moor, 1994). The above algorithm changes only in step 2, where the identified plant becomes
\[
\begin{bmatrix} x_k(t) \\ s_k^i(t) \end{bmatrix} =
\begin{bmatrix} A_k^i & B_k^i & K_{1,k}^i & K_{2,k}^i \\ L_k^i & 0 & I & 0 \\ c_k^i & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u(t) \\ e_{1,k}^i(t) \\ e_{2,k}^i(t) \end{bmatrix},
\qquad
\mathrm{Var}\begin{bmatrix} e_{1,k}^i \\ e_{2,k}^i \end{bmatrix} = \Pi_k^i =
\begin{bmatrix} \Pi_{11}^i & \Pi_{12}^i \\ \Pi_{21}^i & \Pi_{22}^i \end{bmatrix}.
\]
Therefore the dynamic PLS takes the form
\[
\hat y(t) = \sum_{k=1}^{N} \begin{bmatrix} A_{y,k} & B_{y,k} \\ C_{y,k} & D_{y,k} \end{bmatrix} t_k(t),
\qquad
t_k(t) = \begin{bmatrix} A_{t,k} & B_{t,k}^u & B_{t,k}^x \\ C_{t,k} & 0 & D_{t,k} \end{bmatrix}
\begin{bmatrix} u(t) \\ x_k(t) \end{bmatrix},
\]
where the controlled inputs and the process variables are first projected into the score space and then used to estimate the quality variables.
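A prediction sketch for this with-inputs variant (naming ours): following Remark 1, no x-deflation is performed and each score is driven by the stacked signal $[u; x]$:

```python
import numpy as np
from scipy.signal import dlsim

def dpls_with_inputs_predict(loadings, U, X):
    """Estimate y from the controlled inputs u (r x N) and the
    process variables x (m x N) with the identified loadings."""
    Z, yhat = np.vstack([U, X]).T, 0.0   # samples in rows: [u, x]
    for ld in loadings:
        _, t, _ = dlsim((ld["At"], ld["Bt"], ld["Ct"], ld["Dt"], 1), Z)
        _, yk, _ = dlsim((ld["Ay"], ld["By"], ld["Cy"], ld["Dy"], 1), t)
        yhat = yhat + yk
    return np.asarray(yhat).T
```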

5. CONCLUSION

The paper has presented and discussed a novel algorithm for integrating the PLS and SIM methodologies. The main benefit is that a truly dynamic description of the system under consideration becomes feasible without losing the numerical advantages of a PLS approach. Even if some work remains to be done to establish the theoretical foundation of the proposed algorithm and to prove its convergence, the first simulations have produced promising results. The method is effective for the same reasons as the PLS algorithm: it recursively constructs subspaces (the $s_k^i$) that become closer to each other at each iteration. The only difference is that the computation also takes the process dynamics into account. In other words, the data fit obtained by identification is more accurate than the static fit obtained by the least-squares method. Future work will include the application of the suggested algorithm to process data, to assess its effectiveness compared to standard PLS.

REFERENCES

Geladi, P. and B.R. Kowalski (1986). Partial least-squares regression: a tutorial. Analytica Chimica Acta 185, 1–17.
Helland, I.S. (1988). On the structure of partial least squares regression. Commun. Statist. B–Simulation Comput. 17(2), 581–607.
Helland, I.S. (1990). Partial least squares regression and statistical models. Scand. J. Statist. 17, 97–114.
Ku, W., R.H. Storer and C. Georgakis (1995). Disturbance detection and isolation by dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems 30, 179–196.
Lakshminarayanan, S., S.L. Shah and K. Nandakumar (1997). Modeling and control of multivariable processes: dynamic PLS approach. AIChE J. 43(9), 2307–2322.
Lindquist, A. and G. Picci (1996). Canonical correlation analysis, approximate covariance extension, and identification of stationary time series. Automatica 32(5), 709–733.
Negiz, A. and A. Çinar (1997). Statistical monitoring of multivariable dynamic processes with state-space models. AIChE J. 43(8), 2002–2020.
Shi, R. and J.F. MacGregor (2000). Modeling of dynamic systems using latent variable and subspace methods. J. of Chemometrics 14, 423–439.
Treasure, R.J., U. Kruger and J.E. Cooper (2004). Dynamic multivariate statistical process control using subspace identification. J. Process Control 14, 279–292.
Van Overschee, P. and B. De Moor (1993). Subspace algorithms for the stochastic identification problem. Automatica 29(3), 649–660.
Van Overschee, P. and B. De Moor (1994). N4SID: Subspace algorithms for the identification of combined deterministic-stochastic systems. Automatica 30(1), 75–93.
Wang, J. and S.J. Qin (2002). A new subspace identification approach based on principal component analysis. Journal of Process Control 12, 841–855.


