Chemometrics and Intelligent Laboratory Systems 107 (2011) 312–317
Journal homepage: www.elsevier.com/locate/chemolab
Maintenance-free soft sensor models with time difference of process variables

Hiromasa Kaneko, Kimito Funatsu ⁎

Department of Chemical System Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113–8656, Japan
Article info

Article history: Received 19 January 2011; received in revised form 27 March 2011; accepted 25 April 2011; available online 5 May 2011.

Keywords: Soft sensor; Process control; Maintenance-free; Prediction accuracy deterioration; Time difference

Abstract

In industrial plants, soft sensors are widely used to estimate process variables that are difficult to measure online. However, their predictive accuracy gradually decreases with changes in the state of the plants. Although regression models can be reconstructed from a database that includes the newest data to address this problem, some difficulties remain in practice. Therefore, we have attempted to reduce the effects of deterioration with age on soft sensor models without maintenance of the models. By constructing models based upon the time difference of an objective variable and that of explanatory variables, the effects of drift and gradual changes can be handled. We verified the superiority of the proposed method over traditional ones with simulation data and applied this method to actual industrial data. It was confirmed that the proposed method could achieve almost the same predictive accuracy as the updating model for 3 years without reconstruction of the model. © 2011 Elsevier B.V. All rights reserved.
⁎ Corresponding author. Tel.: +81 3 5841 7751; fax: +81 5841 7771. E-mail address: [email protected] (K. Funatsu).
0169-7439/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2011.04.016

1. Introduction

Soft sensors have been widely used to estimate process variables that are difficult to measure online [1,2]. An inferential model is constructed between variables that are easy to measure online and those that are not, and an objective variable is then estimated using that model. In particular, the partial least squares (PLS) method [3,4] has been used as a modeling method for soft sensors. In addition, various methods such as nonlinear PLS [5,6], artificial neural networks [7,8], and support vector machine based regression [9–11] have been studied for use in soft sensors.

Through the use of soft sensors, the values of objective variables can be estimated with a high degree of accuracy. Their use, however, involves some practical difficulties. One crucial difficulty is that their predictive accuracy gradually decreases due to changes in the state of chemical plants, catalyst performance loss, sensor and process drift, and so on. In order to reduce the degradation of a soft sensor model, the updating of regression models [12–17] and Just-In-Time (JIT) modeling [18,19] have been proposed.

While many excellent results have been reported with these methods, some problems remain for the introduction of soft sensors into practice. First, if soft sensor models are reconstructed with data that include abnormal samples, their predictive ability can deteriorate [17]. Such abnormal data must be detected with high accuracy, but at present it is difficult to detect all of them accurately [20]. Second, reconstructed models have a strong tendency to specialize in predictions over a narrow data range [21]. Consequently, when variations in the process variables occur, these models cannot predict the resulting variations in the data with a high degree of accuracy. Third, when a soft sensor model is reconstructed, the parameters of the model, for example the regression coefficients in linear regression modeling, change dramatically in some cases. A model cannot be applied in practice unless the operators understand it, so whenever soft sensor models are reconstructed, operators check the parameters of the models and/or the indices into which the parameters are condensed to confirm that they are safe for operation. This takes a lot of time and effort. Fourth, although it is often not regarded as significant, the data used to reconstruct soft sensor models are themselves affected by the drift of sensors and of the process. In the construction of the model, data must be selected from a database that includes both data affected by the drift and data recorded after correction of the drift.

In order to solve these problems, we have attempted to reduce the effects of sensor drift upon soft sensor models without maintenance of the models. By constructing models based upon the time difference of an objective variable y and that of explanatory variables X, the effects of deterioration with age, such as drift, and of gradual changes in the state of plants can be accounted for. In other words, if these changes progress at a constant rate, a model constructed from the time differences of the process variables rather than from their values is not affected by them. Therefore, there is no need to reconstruct the soft sensor model to keep up with those changes.
In this paper, a model whose construction is based upon the time difference of y and that of X is referred to as a ‘time difference model.’ Time difference models can also have high predictive accuracy even after drift correction because the data are represented as the time difference that cannot be affected by the drift.
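The cancellation of a constant-rate change can be made explicit with a short worked equation. This is only an illustrative sketch: the decomposition of a measured variable into a drift-free part s(t), a constant offset a, and a linear drift with rate b is an assumption introduced here, and these symbols do not appear in the original notation.

```latex
\begin{align}
x(t) &= s(t) + a + b\,t \\
\Delta x(t) &= x(t) - x(t-i) = s(t) - s(t-i) + b\,i
\end{align}
```

Under this assumption the offset a cancels and the drift leaves only the constant b i, which the intercept of a regression model can absorb. A drift whose rate varies with time would not cancel in this way.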
First, we show the superiority of a time difference model over traditional soft sensor models with simulation data and then apply the proposed method to real industrial data. The time difference model achieved high predictive accuracy over 3 years without reconstruction of the model.
2. Time difference modeling method

Fig. 1 shows the difference between the traditional procedure and the proposed one. In the traditional procedure, the relationship between explanatory variables, X(t), and an objective variable, y(t), is modeled by a regression method after preparing data, X(t) and y(t), related to time t. For prediction, the constructed model predicts the value of y(t′) from the new data x(t′). In time difference modeling, the time differences ΔX(t) and Δy(t) are first calculated between the present values, X(t) and y(t), and the values at a time i before the target time, X(t − i) and y(t − i):

$$\Delta X(t) = X(t) - X(t-i), \qquad \Delta y(t) = y(t) - y(t-i) \tag{1}$$

Then, the relationship between ΔX(t) and Δy(t) is modeled by a regression method. For prediction, the constructed model predicts the time difference of y(t′), Δy(t′), using the time difference of the new data, Δx(t′), calculated as follows:

$$\Delta x(t') = x(t') - x(t'-i) \tag{2}$$

y(t′) can be calculated as follows:

$$y(t') = \Delta y(t') + y(t'-i) \tag{3}$$

because y(t′ − i) is given previously. By constructing time difference models, the effects of deterioration with age, such as drift, and of gradual changes in the state of plants can be accounted for, because the data are represented as time differences that are not affected by these factors. Therefore, there is no need to reconstruct the models to keep up with the changes that derive from those factors.

In time series analysis, it is said that the signal to noise ratio of a time-differenced variable is affected by the auto-correlation of the variable [22], while a linear trend with time is removed by the time difference. The signal to noise ratios of a normal variable and of a time-differenced variable are shown in Appendix A. The signal to noise ratio decreases after time differencing in the presence of auto-correlation. When time difference models are used, we should pay attention to the effect of the auto-correlation on the signal to noise ratio.

3. Results and discussion

We verified the superiority of a time difference model over traditional soft sensor models with simulation data and applied the proposed method to real industrial data obtained from the operation of a distillation column. We used the PLS method [3] as the regression approach. The details of PLS are shown in Appendix B.

3.1. Modeling of the simulation data

The proposed method was compared with traditional methods using simulated data to verify its superiority. First, three vectors p1, p2, and p3 of uniform pseudorandom numbers ranging from 3 to 6 were prepared. Then, an objective variable, y, was set as follows:

$$y = p_1^{2} + (p_2 + 5)^{1/2} + p_3 + N(0, 0.1) \tag{4}$$

where N(0, 0.1) denotes random numbers from a normal distribution with a mean of 0 and a standard deviation of 0.1. Explanatory variables, X, were prepared as follows:

$$
\begin{aligned}
x_i &= p_i + 0.005 \times [1\ 2\ 3\ \cdots\ n]^{\mathrm{T}} + N(0, 0.1) \quad (i = 1, 2, 3) \\
x_4 &= x_1^{2} \\
x_5 &= (x_2 + 5)^{1/2} \\
x_6 &= N(0, 0.1) \\
X &= [x_1\ x_2\ x_3\ x_4\ x_5\ x_6]
\end{aligned} \tag{5}
$$

where the second term of xi (i = 1, 2, 3) represents the effects of change with age at a constant rate and n is the number of samples. There is nonlinearity between p1 and y, and between p2 and y, as shown in Eq. (4); x4 and x5 represent this nonlinearity in Eq. (5). Accordingly, the constant-rate change with age in x1 and x2 enters x4 and x5 in a slightly nonlinear form. n was set to 201, and the numbers of training and test data were set to 101 and 100, respectively. After the preparation of data, the three methods listed below were applied to the data.

A: Do not update a PLS model constructed between the values of y and those of X.
B: Update a PLS model constructed between the values of y and those of X.
C: Do not update a PLS model constructed between the time difference of y and that of X.

Method C is the proposed method. The time difference was calculated between the present values and the previous values. The modeling and prediction results are shown in Table 1. The details of the statistics are shown in Appendix C. The r²pred value is an r² value that is calculated with the test data. From Table 1, there was little difference among the methods in the results for the training data. The r²pred value of A was small, which indicates degradation of the soft sensor model. Fig. 2 shows the
Fig. 1. Difference between a traditional procedure and the proposed one. (a) Traditional procedure. (b) Proposed procedure.

Table 1. Modeling and prediction results of the simulation data.

| Method | Training data r² | Training data q² | Test data r²pred |
| --- | --- | --- | --- |
| A: do not update model (value) | 0.959 | 0.954 | 0.560 |
| B: update model (value) | 0.959 | 0.954 | 0.870 |
| C: do not update model (time difference) | 0.986 | 0.968 | 0.959 |
Table 2. Process variables.

Objective variable

| Symbol | Variable |
| --- | --- |
| A | Bottom product concentration |

Explanatory variables

| No. | Symbol | Variable |
| --- | --- | --- |
| 1 | F1 | Reflux flow |
| 2 | F2 | Reboiler flow |
| 3 | F3 | Feed 1 flow |
| 4 | F4 | Feed 2 flow |
| 5 | F5 | Bottom flow |
| 6 | F6 | Top flow |
| 7 | P1 | Pressure 1 |
| 8 | P2 | Pressure 2 |
| 9 | T1 | Temperature 1 |
| 10 | T2 | Temperature 2 |
| 11 | T3 | Temperature 3 |
| 12 | T4 | Temperature 4 |
| 13 | T5 | Bottom temperature |
| 14 | T6 | Feed 2 temperature |
| 15 | T7 | Feed 1 temperature |
| 16 | T8 | Top temperature |
| 17 | F1/F6 | Feed flow ratio |
| 18 | F4/F3 | Reflux ratio |
relationship between y values and predicted y values with test data. A bias in the prediction errors is visible in Fig. 2(a). This bias is derived from the effects of the change with age in Eq. (5). Although the r²pred value was increased by updating the model, as shown in the result of B, and there was an almost linear relationship between y values and predicted y values in Fig. 2(b), a bias in the prediction errors still appeared to remain. Predictive accuracy may increase if the number of data used to construct the updating model is changed, but in practice it is difficult to decide the optimal number of data with which soft sensors are constructed.
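The comparison between methods A and C in this section can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: ordinary least squares stands in for PLS, the data follow Eqs. (4) and (5) as read here (y = p1² + (p2 + 5)^(1/2) + p3 + noise), the random seed is arbitrary, and all function and variable names are hypothetical. The numerical values it produces will therefore differ from Table 1.

```python
import numpy as np


def make_data(n=201, drift=0.005, seed=0):
    """Simulated data following Eqs. (4) and (5); the seed is an arbitrary choice."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(3.0, 6.0, size=(n, 3))            # p1, p2, p3
    y = p[:, 0] ** 2 + np.sqrt(p[:, 1] + 5.0) + p[:, 2] + rng.normal(0.0, 0.1, n)
    t = np.arange(1, n + 1)                           # [1 2 3 ... n]
    x123 = p + drift * t[:, None] + rng.normal(0.0, 0.1, size=(n, 3))
    x4 = x123[:, 0] ** 2
    x5 = np.sqrt(x123[:, 1] + 5.0)
    x6 = rng.normal(0.0, 0.1, n)
    return np.column_stack([x123, x4, x5, x6]), y


def fit_linear(X, y):
    """Least-squares fit with intercept (a stand-in for PLS, assumed here)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def r2(y_obs, y_hat):
    return 1.0 - np.sum((y_obs - y_hat) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)


X, y = make_data()
n_train = 101

# Method A: model on the values, never updated.
coef_a = fit_linear(X[:n_train], y[:n_train])
y_hat_a = predict_linear(coef_a, X[n_train:])

# Method C: model on time differences with i = 1 (previous sample), never updated.
dX, dy = np.diff(X, axis=0), np.diff(y)               # row k is sample k+1 minus sample k
coef_c = fit_linear(dX[:n_train - 1], dy[:n_train - 1])
# Eq. (3): add the predicted difference to the previously measured y.
y_hat_c = predict_linear(coef_c, dX[n_train - 1:]) + y[n_train - 1:-1]

r2_a = r2(y[n_train:], y_hat_a)
r2_c = r2(y[n_train:], y_hat_c)
print(f"r2_pred, method A: {r2_a:.3f}")
print(f"r2_pred, method C: {r2_c:.3f}")
```

Note that method C uses the measured value y(t′ − i) at prediction time, exactly as Eq. (3) requires, so it is only applicable when the previous measurement of y is available.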
Fig. 2. The relationship between y values and predicted y values with test data. (a) Method A. (b) Method B. (c) Method C.

On the other hand, the time difference model achieved the highest r²pred value in this case study. In addition, as shown in Fig. 2(c), the bias and the variation of the prediction errors were small, and thus the predicted y values were in good agreement with the y values. Therefore, we can say that a highly predictive model that is not affected by change with age could be constructed by using the time difference of variables, without reconstruction of the model.

3.2. Application to a distillation column

We analyzed the data obtained from the operation of a distillation column at Mizushima works, Mitsubishi Chemical Corporation. Fig. 3 shows a schematic representation of the distillation column and Table 2 shows the process variables. The objective variable is the concentration of the bottom product having a lower boiling point, and the explanatory variables are 19 variables such as temperature and pressure. The input variables are F3 and F4, and the operational variables are F1 and F2. The measurement interval of y is 30 minutes. For training data we used data from monitoring that took place from January to March 2003, and for test data we used data from April 2003 to December 2006. Unsteady-state data reflecting variations caused by plant inspections were eliminated in advance, because no soft sensor model can predict the y-values in an unsteady state in which there is little association between X and y. Whether the plant is in a steady or an unsteady state can be judged with the model proposed previously [17].

Fig. 3. Schematic representation of the distillation column.

The four methods listed below were applied to the data. The labels follow those used in Table 3 and in Figs. 4 and 5.

A: Do not update a PLS model constructed between the values of y and those of X.
B: Update a PLS model constructed between the values of y and those of X.
C: Do not update a PLS model constructed between the time difference of y and that of X.
D: Update a PLS model constructed between the time difference of y and that of X.
The result of D is shown only for the comparison with the other methods because our objective is to reduce the degradation of a soft sensor model without maintenance of the model, that is, without update of the model. The time difference was calculated between the present values and those that were 30 minutes before the present time because the
measurement interval of y was 30 minutes, so the smallest available interval was also 30 minutes. The smaller the interval is, the better the predictive ability of the constructed time difference model will be. If the time interval is large, the model will be affected by disturbances during the interval.

In order to incorporate the dynamics of process variables into soft sensor models, X included each explanatory variable delayed for durations ranging from 0 minutes to 60 minutes in steps of 10 minutes; that is, X consists of 7 time points (0, 10, 20, 30, 40, 50, 60 minutes) times 19 variables (7 × 19 = 133). In addition, the value-based models of methods A and B include the y-variable delayed by 30 minutes among the explanatory variables in order to ensure consistency with the time difference models. Thus, the number of explanatory variables is 134 for methods A and B and 133 for methods C and D. The number of samples is 4297 for the non-updating methods A and C, and the models are updated with the newest 500 samples for each test sample for the updating methods B and D.

The models constructed by methods A, B, C, and D correspond to vector autoregressive (VAR) models in multiple time series analysis [23]. By using VAR models, we can incorporate useful information from the past of X and y into the soft sensors. Unlike pure VAR models, the explanatory variables of models A, B, C, and D also include the present values of the 19 explanatory variables in this case study. On the other hand, the signal to noise ratios of time-differenced variables are affected by the auto-correlation of the variables [22].

The modeling and prediction results are shown in Table 3. 1H and 2H mean the first half and the second half of the year, respectively. The details of the root mean squared error (RMSE) are shown in Appendix C. The RMSE values in Table 3 correspond to the r²pred values, which are r² values calculated with the test data. As an exception, the training data are included in 1H, 2003.
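The construction of the delayed explanatory variables and their 30-minute time difference can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes the 19 process variables are available on a regular 10-minute grid (the source states only the delay steps), and the helper names `lagged_matrix` and `time_difference` are hypothetical.

```python
import numpy as np


def lagged_matrix(X, lags):
    """Stack copies of X delayed by each lag (in samples) side by side.

    Row r of the result corresponds to time index max(lags) + r of X.
    """
    max_lag = max(lags)
    n = X.shape[0]
    return np.hstack([X[max_lag - lag:n - lag] for lag in lags])


def time_difference(Z, step):
    """Difference between each row and the row `step` samples earlier."""
    return Z[step:] - Z[:-step]


# Placeholder data: 100 samples of 19 variables on an assumed 10-minute grid.
X_demo = np.arange(100 * 19, dtype=float).reshape(100, 19)

# Delays of 0, 10, ..., 60 minutes are 0..6 samples: 7 x 19 = 133 columns.
Z = lagged_matrix(X_demo, lags=list(range(7)))

# A 30-minute time difference corresponds to 3 samples on this grid.
dZ = time_difference(Z, step=3)

print(Z.shape, dZ.shape)
```

For the value-based models, the 30-minute delayed y would be appended as one further column, giving the 134 explanatory variables mentioned above.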
The predictive accuracy of C was higher than that of A owing to the use of the time difference. In addition, the accuracy of C did not decrease with the passage of time, for example, in 2006. The deterioration of predictive accuracy was reduced by constructing a time difference model. The RMSE values for C are almost the same as those for B. It is important that the time difference model was constructed using only data from January to March 2003 and was not reconstructed, or maintained, for over 3 years. By using the time difference, a predictive model can thus be constructed that does not require updating. The results from distance-based JIT models are not presented here, but they were almost identical to those of the updating models. As shown by the RMSEs of D, updating a time difference model did not increase predictive accuracy compared with method C. This may come from the decrease in the signal to noise ratio caused by the time difference, as discussed in Appendix A. Fig. 4 shows the relationship between measured and predicted y in 2H, 2006. In Fig. 4(a), there is a bias in the prediction errors, as was seen with the simulation data in Section 3.1. The time series plot of the RMSE values calculated with the previous 48 data (1 day) in 2H, 2006 is shown in Fig. 5. The gray dashed line represents the result of A; the gray continuous line represents the result of B; and the black continuous line represents the result of C.
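A moving RMSE of the kind plotted in Fig. 5 can be computed with a cumulative sum. The sketch below is one plausible reading: it assumes that each window covers the current sample and the 47 samples before it, which the text does not state explicitly, and the function name `rolling_rmse` is hypothetical.

```python
import numpy as np


def rolling_rmse(y_obs, y_hat, window=48):
    """RMSE over a sliding window of `window` consecutive samples.

    Entry k covers samples k .. k + window - 1, so it belongs to the
    time of sample k + window - 1 (the most recent one in the window).
    """
    err2 = (np.asarray(y_obs, dtype=float) - np.asarray(y_hat, dtype=float)) ** 2
    csum = np.concatenate([[0.0], np.cumsum(err2)])
    return np.sqrt((csum[window:] - csum[:-window]) / window)


# Toy check: a constant error of 1 gives an RMSE of 1 in every window.
demo = rolling_rmse(np.zeros(100), np.ones(100))
print(demo.shape)
```

With a 30-minute measurement interval, 48 samples span one day, which matches the window described for Fig. 5.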
Fig. 4. The relationship between measured and predicted y in 2H, 2006. (a) Method A. (b) Method B. (c) Method C.
The RMSE values, that is, the prediction errors, started to increase at the middle of the plot for method A, which came from the effects of deterioration with age. Meanwhile, the models of methods B and C could predict y-values without long-term degradation of predictive accuracy. Although the plot in Fig. 4(b) showed an almost linear trend along the diagonal overall, a bias in the prediction errors remained where the values of y are small. The updating model could not keep up with a change in the plant. On the other hand, the plot in Fig. 4(c) showed a much tighter clustering of predicted values along the diagonal, reflecting higher predictive accuracy. Additionally, the bias of the prediction errors was reduced, as shown in Fig. 4(c). Therefore, it is concluded that overall
Table 3. Modeling and prediction results for y. The RMSE values of each period are shown in this table.

| Method | 2003 1H | 2003 2H | 2004 1H | 2004 2H | 2005 1H | 2005 2H | 2006 1H | 2006 2H |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 0.417 | 0.528 | 0.411 | 0.976 | 0.809 | 0.343 | 0.474 | 0.447 |
| B | 0.216 | 0.266 | 0.337 | 0.374 | 0.204 | 0.311 | 0.354 | 0.290 |
| C | 0.220 | 0.275 | 0.381 | 0.454 | 0.230 | 0.329 | 0.408 | 0.310 |
| D | 0.217 | 0.281 | 0.377 | 0.459 | 0.220 | 0.328 | 0.416 | 0.320 |

1H: first half of the year, 2H: second half of the year. A: do not update model (value), B: update model (value), C: do not update model (time difference), D: update model (time difference).
Fig. 5. The time series plot of the RMSE values calculated with previous 48 data (1 day) in 2H, 2006. The gray dashed line represents the result of A; the gray continuous line represents the result of B; and the black continuous line represents the result of C. The symbols of A, B, and C are summarized in the body text.
values of y could be predicted with high accuracy by using the proposed method.
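The signal to noise argument of Appendix A, invoked above to explain why updating did not help the time difference model, can be checked numerically. Dividing Eq. (A.8) by Eq. (A.2) gives η2/η1 = 1 − ρ(i), where ρ(i) is the lag-i auto-correlation of v. The sketch below assumes, purely for illustration, that the informative variation v is a first-order autoregressive process with coefficient phi, for which ρ(i) = phi^i; this model, the seed and all names are assumptions and are not taken from the source.

```python
import numpy as np


def ar1(n, phi, rng):
    """Stationary AR(1) series v[t] = phi * v[t-1] + w[t] with unit-variance w."""
    w = rng.normal(0.0, 1.0, n)
    v = np.empty(n)
    v[0] = w[0] / np.sqrt(1.0 - phi ** 2)   # start from the stationary distribution
    for t in range(1, n):
        v[t] = phi * v[t - 1] + w[t]
    return v


rng = np.random.default_rng(0)
n, phi, i = 20000, 0.9, 1

v = ar1(n, phi, rng)                 # important variation, positively auto-correlated
e = rng.normal(0.0, 1.0, n)          # white noise

eta1 = np.var(v) / np.var(e)                              # Eq. (A.2)
eta2 = np.var(v[i:] - v[:-i]) / np.var(e[i:] - e[:-i])    # Eq. (A.7)
ratio = eta2 / eta1                                       # expected near 1 - phi**i

print(f"eta1 = {eta1:.2f}, eta2 = {eta2:.2f}, eta2/eta1 = {ratio:.3f}")
```

The stronger the positive auto-correlation of v at lag i, the more of the signal variance is removed by differencing, while the white noise variance is doubled regardless.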
4. Conclusion

In this paper, we have proposed the construction of time difference models for practical soft sensors in order to solve the problems involved in updating regression models and JIT modeling, and we then applied the proposed method to actual industrial data. The proposed method displayed almost the same predictive accuracy as the updating model for a period of over 3 years, even though the time difference model was not updated. We therefore confirmed the usefulness of the proposed method even after a period of 3 years without maintenance of the constructed model. In addition, the time difference approach can be combined with any regression method used for constructing soft sensor models. In this study, the measurement interval of y was almost constant; in the future, cases in which the measurement interval is not constant should be investigated for practical soft sensors using the proposed method.

Acknowledgements

The authors acknowledge the support of Mizushima works, Mitsubishi Chemical and the financial support of Japan Society for the Promotion of Science.

Appendix A. The signal to noise ratios of a process variable and time difference of the process variable

When a process variable, x, is assumed to be a stationary process, we compare the signal to noise ratio of x with that of the time difference of x. First, x(t) is divided into the important variation, v(t), and white noise, e(t), as follows:

$$x(t) = v(t) + e(t) \tag{A.1}$$

The signal to noise ratio of x(t), η1, is calculated as follows:

$$\eta_1 = \frac{\mathrm{var}\{v(t)\}}{\mathrm{var}\{e(t)\}} \tag{A.2}$$

where var{v(t)} and var{e(t)} are the variances of v(t) and e(t), respectively. Then, we consider the time difference of x(t), Δx(t), as follows:

$$\Delta x(t) = x(t) - x(t-i) \tag{A.3}$$

By using Eqs. (A.1) and (A.3), Δx(t) can be expressed as follows:

$$\Delta x(t) = \{v(t) + e(t)\} - \{v(t-i) + e(t-i)\} = \Delta v(t) + \Delta e(t) \tag{A.4}$$

where

$$\Delta v(t) = v(t) - v(t-i) \tag{A.5}$$

$$\Delta e(t) = e(t) - e(t-i) \tag{A.6}$$

Because e does not have auto-correlation, the signal to noise ratio of Δx(t), η2, is calculated as follows:

$$\eta_2 = \frac{\mathrm{var}\{\Delta v(t)\}}{\mathrm{var}\{\Delta e(t)\}} = \frac{\mathrm{var}\{v(t) - v(t-i)\}}{\mathrm{var}\{e(t) - e(t-i)\}} = \frac{\mathrm{var}\{v(t)\} + \mathrm{var}\{v(t-i)\} - 2\,\mathrm{cov}\{v(t), v(t-i)\}}{\mathrm{var}\{e(t)\} + \mathrm{var}\{e(t-i)\}} \tag{A.7}$$

where cov{v(t), v(t − i)} is the covariance between v(t) and v(t − i), which expresses the auto-correlation of v. Here, var{v(t − i)} equals var{v(t)} and var{e(t − i)} equals var{e(t)}, because x is a stationary process. Thus, η2 is rewritten as follows:

$$\eta_2 = \frac{\mathrm{var}\{v(t)\} - \mathrm{cov}\{v(t), v(t-i)\}}{\mathrm{var}\{e(t)\}} \tag{A.8}$$

From Eqs. (A.2) and (A.8), η2 is lower than η1 whenever v(t) is positively auto-correlated, that is, whenever cov{v(t), v(t − i)} > 0. This means that the signal to noise ratio decreases by using the time difference, because a process variable and its i-delayed counterpart do not usually have a negative correlation in process data if i is small to some extent. When time difference models are used, we should pay attention to the effect of the auto-correlation on the signal to noise ratio, while a linear trend with time is removed by the time difference.

Appendix B. PLS

PLS is a method for relating explanatory variables, X, and an objective variable, y, using a linear multivariate model; it goes beyond traditional regression methods in that it also models the structures of X and y. In PLS modeling, the covariance between y and the score vector ti is maximized. A PLS model has higher predictive power than ordinary least-squares models [24]. A PLS model consists of the following two equations:

$$X = TP' + E \tag{B.1}$$

$$y = Tq + f \tag{B.2}$$

where T is a score matrix, P is an X-loading matrix, q is a y-loading vector, E is a matrix of X residuals, and f is the vector of y residuals. The PLS regression model is as follows:

$$y = Xb + \mathrm{const} \tag{B.3}$$

$$b = W(P'W)^{-1}q \tag{B.4}$$

where W is an X-weight matrix and b is a vector of regression coefficients.

Appendix C. Statistics

To construct a highly predictive model, the number of components in the PLS models must be chosen appropriately. The r² and q² values are used as the measures and are defined as follows:

$$r^2 = 1 - \frac{\sum (y_{\mathrm{obs}} - y_{\mathrm{calc}})^2}{\sum (y_{\mathrm{obs}} - \bar{y})^2} \tag{C.1}$$

$$q^2 = 1 - \frac{\sum (y_{\mathrm{obs}} - y_{\mathrm{pred}})^2}{\sum (y_{\mathrm{obs}} - \bar{y})^2} \tag{C.2}$$

where yobs is the measured y value, ȳ is the mean of the measured y values, ycalc is the calculated y value, and ypred is the predicted y value in the cross-validation procedure, which is a technique for assessing how the results will generalize. In this study, the leave-one-out method [25] is used in the calculation of ypred. r² represents the fitting accuracy of the constructed models, and q² represents the predictive accuracy of the constructed models. Values close to unity for both r² and q² are favorable. The r² and q² values must both be compared using models constructed with the same objective variable data. The r²pred value is an r² value that is calculated with the test data. The root-mean-square error (RMSE) of ycalc and ypred is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum \left(y_{\mathrm{obs}} - y_{\mathrm{calc,pred}}\right)^2}{n}} \tag{C.3}$$
where ycalc,pred means ycalc or ypred. The lower the RMSE value is, the more accurate the prediction obtained with the constructed model.

References

[1] M. Kano, Y. Nakagawa, Data-based process monitoring, process control, and quality improvement: recent developments and applications in steel industry, Comput. Chem. Eng. 32 (2008) 12–24.
[2] P. Kadlec, B. Gabrys, S. Strandt, Data-driven soft sensors in the process industry, Comput. Chem. Eng. 33 (2009) 795–814.
[3] S. Wold, M. Sjöström, L. Eriksson, PLS-regression: a basic tool of chemometrics, Chemom. Intell. Lab. Syst. 58 (2001) 109–130.
[4] B. Lin, B. Recke, J.K.H. Knudsen, S.B. Jorgensen, A systematic approach for soft sensor development, Comput. Chem. Eng. 31 (2007) 419–425.
[5] G. Baffi, E.B. Martin, A.J. Morris, Non-linear projection to latent structures revisited (the neural network PLS algorithm), Comput. Chem. Eng. 23 (1999) 1293–1307.
[6] S.J. Zhao, J. Zhang, Y.M. Xu, Z.H. Xiong, Nonlinear projection to latent structures method and its applications, Ind. Eng. Chem. Res. 45 (2006) 3843–3852.
[7] V.R. Radhakrishnan, A.R. Mohamed, Neural networks for the identification and control of blast furnace hot metal quality, J. Process Control 10 (2000) 509–524.
[8] X.Z. Dai, W.C. Wang, Y.H. Ding, Z.Y. Sun, “Assumed inherent sensor” inversion based ANN dynamic soft-sensing method and its application in erythromycin fermentation process, Comput. Chem. Eng. 30 (2006) 1203–1225.
[9] V.N. Vapnik, The nature of statistical learning theory, Springer, New York, 1999.
[10] W.W. Yan, H.H. Shao, X.F. Wang, Soft sensing modeling based on support vector machine and Bayesian model selection, Comput. Chem. Eng. 28 (2004) 1489–1498.
[11] D.E. Lee, J.H. Song, S.O. Song, E.S. Yoon, Weighted support vector machine for quality estimation in the polymerization process, Ind. Eng. Chem. Res. 44 (2005) 2101–2105.
[12] B.S. Dayal, J.F. MacGregor, Recursive exponentially weighted PLS and its applications to adaptive control and prediction, J. Process Control 7 (1997) 169–179.
[13] S.J. Qin, Recursive PLS algorithms for adaptive data modeling, Comput. Chem. Eng. 22 (1998) 503–514.
[14] S.J. Mu, Y.Z. Zeng, R.L. Liu, P. Wu, H.Y. Su, J. Chu, Online dual updating with recursive PLS model and its application in predicting crystal size of purified terephthalic acid (PTA) process, J. Process Control 16 (2006) 557–566.
[15] Y.F. Fu, H.Y. Su, J.A. Chu, MIMO soft-sensor model of nutrient content for compound fertilizer based on hybrid modeling technique, Chin. J. Chem. Eng. 15 (2007) 554–559.
[16] J.L. Liu, On-line soft sensor for polyethylene process with multiple production grades, Control Eng. Practice 15 (2007) 769–778.
[17] H. Kaneko, M. Arakawa, K. Funatsu, Development of a new soft sensor method using independent component analysis and partial least squares, AIChE J. 55 (2009) 87–98.
[18] C. Cheng, M.S. Chiu, A new data-based methodology for nonlinear process modeling, Chem. Eng. Sci. 59 (2004) 2801–2810.
[19] K. Fujiwara, M. Kano, S. Hasebe, A. Takinami, Soft-sensor development using correlation-based just-in-time modeling, AIChE J. 55 (2009) 1754–1765.
[20] K. Ookita, Operation and quality control for chemical plants by soft-sensors, CICSJ Bulletin 24 (2006) 31–33 (in Japanese).
[21] H. Kaneko, M. Arakawa, K. Funatsu, Applicability domains and accuracy of prediction of soft sensor models, AIChE J. 57 (2011) 1506–1513.
[22] C.T.J. Alkemade, W. Snelleman, G.D. Boutilier, B.D. Pollard, J.D. Winefordner, T.L. Chester, N. Omenetto, A review and tutorial discussion of noise and signal-to-noise ratios in analytical spectrometry-I. Fundamental principles of signal-to-noise ratios, Spectrochim. Acta, Part B 33 (1978) 383–399.
[23] H. Lütkepohl, New introduction to multiple time series analysis, Springer, Berlin, 2005.
[24] K. Faber, B.R. Kowalski, Propagation of measurement errors for the validation of predictions obtained by principal component regression and partial least squares, J. Chemom. 11 (1997) 181–238.
[25] J. Shao, Linear model selection by cross-validation, J. Am. Stat. Assoc. 88 (1993) 486–494.