Computers and Chemical Engineering 24 (2000) 871-877
A nonlinear soft sensor based on multivariate smoothing procedure for quality estimation in distillation columns

Sungyong Park, Chonghun Han*

Department of Chemical Engineering, Automation Research Center, Pohang University of Science and Technology (POSTECH), San 31 Hyojadong, Pohang, Kyungpuk 790-784, South Korea

* Corresponding author. Tel.: +82-562-279-2279; fax: +82-562-279-2699. E-mail address: [email protected] (C. Han).
Abstract
Accurate on-line measurement of quality variables is essential for successful monitoring and control in chemical process operations. However, owing to measurement difficulties such as large time delays, the soft sensor, an inferential model for the target quality variable, has been widely used as an alternative to physical sensors. Partial least-squares (PLS) has often been used to develop soft sensors because it can handle the correlations among many variables. However, successful applications of linear projection methods such as PLS have been limited to cases without strong nonlinearities. This paper proposes a design methodology, based on the smoothness concept, for building a soft sensor for chemical processes that can handle both the correlations among many process variables and nonlinearities. The method is directly motivated by locally weighted regression, which estimates a regression surface through multivariate smoothing. The proposed method is illustrated by comparisons with other familiar methods. Industrial case studies show that the proposed method gives better or equal performance compared with other methods such as PLS, nonlinear PLS, and artificial neural networks. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Soft-sensing; Non-parametric regression; Smoothing; Distillation columns
1. Introduction
For successful monitoring and control of chemical plants, there are important quality variables that are difficult to measure on-line, owing to limitations such as cost, reliability, and long dead time. These measurement limitations can cause serious problems such as product loss, energy loss, toxic byproduct generation, and safety hazards. A soft sensor, an inferential model, can estimate the qualities of interest on-line using other available on-line measurements such as temperatures and pressures. A soft sensor can be derived from a first-principles model when the model offers sufficient accuracy within a reasonable computation time. However, there are cases where a first-principles model is not available, or where it takes too much time to compute. As a result, empirical models are the most popular basis for developing soft sensors. Empirical models are usually obtained using various modeling techniques such as multivariate statistics and artificial neural networks. Recently, multivariate statistical methods based on linear projection, such as principal component analysis (PCA) and partial least-squares (PLS), have attracted wide interest as robust methods for constructing empirical models, particularly when there are high dimensionality and collinearities in the data. In particular, PLS and its variations have been applied to many practical regression problems in chemical engineering, such as estimating distillation compositions (Mejdell & Skogestad, 1991a,b; Kresta, Marlin & MacGregor, 1994) and estimating polymer quality variables (Skagerberg, MacGregor & Kiparissides, 1992). Although these linear projection methods can handle high dimensionality and collinearity, their major restriction is that only linear information is extracted from the data. Since many chemical processes are nonlinear in nature, it is desirable to have a robust method that can model any nonlinearity. At present, many nonlinear methods are available. For example, artificial
neural networks (ANN), alternating conditional expectations (ACE), projection pursuit regression (PPR), and nonlinear PLS (Frank & Lanteri, 1988; Wold, Kettaneh-Wold & Skagerberg, 1989; Qin & McAvoy, 1992; Malthouse, Tamhane & Mah, 1997) are well-known nonlinear regression methods. This paper proposes a new nonlinear method that is conceptually simple and quite easy to use. The method is motivated by locally weighted regression (LWR, or loess) proposed by Cleveland and Devlin (1988). LWR has been applied in several fields, such as providing a best-fit curve to data in geology (Howarth & McArthur, 1997), regression in near-infrared (NIR) or mid-infrared (MIR) spectroscopy (Naes, Isaksson & Kowalski, 1990; Andrade, Sanchez & Sarabia, 1999), and analyzing enzymatic browning of vegetables (Bro & Heimdal, 1996). LWR is motivated by the assumption that neighboring values of the predictor variables are the best indicators of the response variable in that range of predictor values. Hence, LWR is a way of estimating a regression surface through multivariate smoothing: the response variable is smoothed as a function of the predictor variables in a moving fashion. LWR consists of fitting a moving local model to a set of nearest neighbors. This paper proposes a multivariate LWR approach for chemical engineering applications with high dimensionality, collinearity, and nonlinearity to estimate qualities of interest, and shows the results of applications to an industrial splitter column and a crude column.
2. Theoretical background
2.1. Multivariate statistical methods based on linear projection: PCA, PLS

PCA and PLS are basically multilinear algorithms which can handle high dimensionality and collinearity by projecting the information in the data down into a low-dimensional space defined by a small number of orthogonal latent variables $(p_1, p_2, \ldots, p_A)$. The new values $(t_1, t_2, \ldots, t_A)$ for the latent variables in the new dimension summarize the information contained in the original data set. The scaled and centered X is represented in PCA as:

$$X = t_1 p_1^T + t_2 p_2^T + \cdots + t_A p_A^T + E$$

where $p_a$ is the latent variable calculated sequentially for each component to maximize its variance. In the case of PLS, the response block, Y, is also decomposed as follows:

$$Y = t_1 q_1^T + t_2 q_2^T + \cdots + t_A q_A^T + F$$
The decomposition is similar to PCA, but PLS does the calculation in a particular way (Wold, 1966). The PLS representation is such that the covariance between the latent variable $t_a$ and the response is maximized at each component. It is very important to determine the correct number of components for a PLS model: with many predictor variables there is a substantial risk of overfitting. Cross-validation, which is used in this paper, is a practical and reliable way to determine the number of components.
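As a concrete illustration, the following minimal sketch (ours, not from the original paper) selects the number of PLS components by cross-validation. It assumes scikit-learn is available and that `X` and `y` are the scaled and centered reference data; the function name `select_n_components` and the grid sizes are our own.

```python
# Hedged sketch: choosing the number of PLS latent variables by cross-validation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_score

def select_n_components(X, y, max_components=10, n_splits=10):
    """Return the number of latent variables with the lowest CV mean squared error."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    cv_errors = []
    for a in range(1, max_components + 1):
        # scikit-learn returns negative MSE, so flip the sign
        mse = -cross_val_score(PLSRegression(n_components=a), X, y,
                               cv=cv, scoring="neg_mean_squared_error").mean()
        cv_errors.append(mse)
    return int(np.argmin(cv_errors)) + 1  # component counts are 1-indexed
```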
2.2. Locally weighted regression

Locally weighted regression is a procedure for estimating a nonlinear regression surface from data through multivariate smoothing. The basic framework is as follows. Suppose that the data are generated by

$$y_i = g(x_i) + \varepsilon_i$$

As in the most commonly used framework for regression, the $\varepsilon_i$ are assumed to be independent normal variables. Contrary to other regression methods, however, it is only assumed that g is a smooth function of the predictor variables. With local fitting, this method can estimate a wide class of smooth functions, much wider than what we could reasonably expect from any specific parametric class of functions (Cleveland & Devlin, 1988). For estimating the regression surface, the method relies on local regressions based on a linear function. The estimate $\hat{g}(x_i)$ of the regression at any point $x_i$ uses the Q closest observations to $x_i$. Hence, a neighborhood in the space of predictor variables is defined. Each observation in the neighborhood is weighted according to its distance from $x_i$; observations close to $x_i$ have a large weight, and observations far from $x_i$ have a small weight. A linear function is then fitted using weighted least-squares (WLS) with these weights. This weighting minimizes the influence of outliers. There are several parameters to be decided. First, Q (the number of observations in each local regression), which determines the degree of smoothness of the estimated surface, has to be chosen. Instead of Q, we can think in terms of f (called the smoothness factor, 0 < f < 1), the ratio of the number of observations used in each local regression to the total number of observations. Cleveland and Devlin (1988) introduced the M plot, based on Mallows's Cp idea, and Naes et al. (1990) proposed cross-validation to choose an adequate f. Second, the type of function to fit at each local regression is a matter of choice. So far most LWR applications have used weighted least-squares
(WLS) regression. Third, the distance measure (called $\rho$) must be chosen in order to select the observations that are to be involved in each local model. In many cases the distance concept strongly affects the performance of LWR. Finally, a weighting function $w_j(x_i)$, for the Q points $j = 1, \ldots, Q$ in the local regression, may need to be selected. This corresponds to determining the relative influence of the closest compared with the more distant observations. The common weighting function is the tricube function,

$$w_j(x_i) = W\!\left(\frac{\rho(x_j, x_i)}{d(x_i)}\right), \qquad W(u) = (1 - u^3)^3 \text{ for } u < 1, \quad W(u) = 0 \text{ for } u \geq 1,$$

where $\rho(x_j, x_i)$ is the distance from $x_j$ to $x_i$, and $d(x_i)$ is the maximum of $\rho(x_j, x_i)$ over the Q points involved in the regression. Cleveland and Devlin (1988) showed that most weight functions with a smooth shape give reasonable results. For the prediction of a new observation, the Q points in the reference data set which are closest to the new observation are selected, and the local regression surface based on these Q observations is constructed in the same way.
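To make the mechanics concrete, here is a minimal numpy sketch (ours, not the authors' code) of the tricube weighting and one local weighted least-squares fit; `X_local`, `y_local`, and `x_query` are assumed names for the Q nearest neighbors, their responses, and the prediction point.

```python
# Hedged sketch of one local fit in LWR: tricube weights, then weighted least-squares.
import numpy as np

def tricube_weights(X_local, x_query):
    """w_j = W(rho(x_j, x_i)/d(x_i)) with W(u) = (1 - u^3)^3 for u < 1, else 0."""
    rho = np.linalg.norm(X_local - x_query, axis=1)  # Euclidean distances to the query
    u = rho / rho.max()                              # d(x_i): largest of the Q distances
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def local_wls_predict(X_local, y_local, x_query):
    """Fit a linear function by WLS over the neighborhood and evaluate it at x_query."""
    w = tricube_weights(X_local, x_query)
    A = np.hstack([np.ones((len(X_local), 1)), X_local])  # intercept plus linear terms
    sw = np.sqrt(w)[:, None]                              # sqrt-weights turn WLS into OLS
    beta, *_ = np.linalg.lstsq(A * sw, y_local * sw.ravel(), rcond=None)
    return float(np.concatenate(([1.0], x_query)) @ beta)
```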
3. Building an empirical soft sensor: general procedure
The general procedure for building empirical soft sensors consists of the following steps.

3.1. Step 1: preliminary process understanding
First of all, a general understanding of the process has to be obtained. Both theoretical analysis and the experience of the operators help identify the variables, variable relationships, approximate correlations, and dynamic characteristics such as time delays.

3.2. Step 2: data preprocessing and analysis
Because an empirical model is based on data, its success depends entirely on the quality of the reference data collected. Outlier detection, noise reduction, mode analysis, data transformation, information gathering, and generation of new information are frequently used data preprocessing techniques.

3.3. Step 3: model determination and validation
The model structure, the most important decision, is determined based on data properties such as linearity, correlation, auto-correlation, and sample distribution. After a model is determined, its accuracy and robustness are usually validated using a new test set.
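The validation in this paper is reported as RMSEP (root mean squared error of prediction) on a held-out set; a one-function sketch (ours), with `y_true` and `y_pred` as assumed names for the measured and estimated quality values:

```python
# Hedged sketch: RMSEP over a validation set.
import numpy as np

def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```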
4. Building a nonlinear soft sensor based on multivariate smoothing procedure
Chemical processes usually show high dimensionality, collinearity among process variables, and nonlinearity. To handle these difficulties, we propose a multivariate LWR method. As a simple linear approximation technique, LWR can adequately model the nonlinear relation between predictors and a response as a piecewise linear method. Compared with other nonlinear methods, LWR is much simpler to use because only a few parameters need to be optimized. However, as the original LWR by Cleveland cannot handle high dimensionality and collinearity, the method needs to be generalized to handle such problems. If LWR is used to build a soft sensor for such cases, the dimensionality and collinearity will increase in the local models. Two ways are proposed to handle the dimensionality and collinearity, as follows.

4.1. Step 1: projection into a smaller space
The first way is to transform the original variables into a few new variables using linear transformations such as PCA, PLS, the Fourier transform (FT), or the wavelet transform (WT) in the data preprocessing step. These linear transformations reduce the dimension significantly. For example, PCA and PLS transform the original variables into mutually orthogonal variables, so that the collinearity in the global space can be completely overcome, and a meaningful distance measure can be obtained as the Euclidean distance of the normalized new variables. The LWR approach using PCA as the dimension reduction method has been applied to NIR analysis by Naes et al. (1990). Although PCA can compress X greatly, we cannot be sure how strongly the selected new variables are related to Y. In contrast to PCA, the latent variables derived from PLS are strongly related to Y and can give better performance than PCA, as will be shown in the second case study for an industrial crude column. Although FT and WT are not commonly used for ordinary process variables such as temperatures, pressures, and flow rates, it is well known that they are very useful for dimension reduction in special cases such as NIR spectra (Trygg & Wold, 1998). These linear transformations before the application of LWR reduce the dimension to a smaller, manageable size and the collinearity to a minimum. As a result, it is quite easy to compute the distances. This leads to a significant reduction in computing time and the necessary memory size. A sketch of this projection step is shown below.
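A minimal sketch (ours) of this step, assuming PCA as the transformation: project the reference block onto K scores, normalize them, and find neighbors by Euclidean distance in the reduced space. `X_ref`, `K`, and the function names are assumptions; PLS scores could be substituted in the same way.

```python
# Hedged sketch of Step 1: PCA scores as the reduced, collinearity-free space.
import numpy as np
from sklearn.decomposition import PCA

def to_score_space(X_ref, K):
    """Fit a K-component PCA and return the model plus normalized reference scores."""
    pca = PCA(n_components=K).fit(X_ref)
    T = pca.transform(X_ref)
    scale = T.std(axis=0)          # normalize scores so Euclidean distance is meaningful
    return pca, scale, T / scale

def nearest_neighbors(T_ref_norm, t_query_norm, Q):
    """Indices of the Q reference observations closest to the query in score space."""
    d = np.linalg.norm(T_ref_norm - t_query_norm, axis=1)
    return np.argsort(d)[:Q]
```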
4.2. Step 2: use PLS/OLS as a local regression method

After step 1, where the high dimensionality and collinearity problems in the global space have been solved, a local regression method such as PLS or OLS is employed to handle the remaining dimensionality and collinearity in the local space. This approach is particularly useful when there are stronger nonlinearities or many variables in the local space. As the nonlinearity increases, the smoothness becomes smaller, and consequently the optimal number of observations at each local regression, Q, becomes smaller. Thus the dimensionality and the risk of collinearity at each local model increase. Furthermore, extreme collinearity among the new variables obtained through a linear orthogonal transform can exist in a local region, even though they are mutually orthogonal in the global space. These effects may cause overfitting, and PLS regression in each local region is therefore preferred. From the cross-validation at each local PLS regression, we can determine the latent variables and their number independently for each local model. This means that we can capture the local features more effectively, which leads to good estimation. If the surface is very smooth and there are only a few variables, OLS may give sufficient performance. There are several parameters involved in this procedure: the type of linear transformation, its reduced dimension (i.e. the number of new variables, K), the smoothness factor, f, and the type of regression method in the local region. Although Cleveland, the originator of LWR, used a statistical criterion such as the M plot to determine the parameters, it is more common to use general cross-validation.
Fig. 1. Splitter column.
In this paper, general cross-validation is used as the criterion to determine the four parameters at once, although this procedure is very time-consuming. We always use the Euclidean distance of the new variables as the distance measure and the tricube function as the weighting function. A compact sketch of the resulting parameter selection is given below.
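As an illustration of this selection loop, the following self-contained sketch (ours, with assumed names and grids, not the authors' exact procedure) computes a leave-one-out cross-validation error for local weighted linear fits in a PCA score space and searches over K and f; the transformation type and the local regression method could be added as two more grid axes in the same way.

```python
# Hedged sketch: leave-one-out cross-validation over K (number of new variables)
# and f (smoothness factor) for the LWR soft sensor.
import numpy as np
from sklearn.decomposition import PCA

def tricube(u):
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def lwr_loo_cve(T, y, f):
    """Leave-one-out CVE of local weighted linear fits in the score space T."""
    n = len(T)
    Q = min(max(int(round(f * n)), T.shape[1] + 2), n - 1)  # observations per local model
    errs = []
    for i in range(n):
        d = np.linalg.norm(T - T[i], axis=1)
        d[i] = np.inf                                  # leave observation i out
        idx = np.argsort(d)[:Q]
        w = tricube(d[idx] / d[idx].max())
        A = np.hstack([np.ones((Q, 1)), T[idx]])       # intercept + linear terms
        sw = np.sqrt(w)[:, None]
        beta, *_ = np.linalg.lstsq(A * sw, y[idx] * sw.ravel(), rcond=None)
        errs.append((y[i] - np.concatenate(([1.0], T[i])) @ beta) ** 2)
    return float(np.mean(errs))

def select_parameters(X, y, K_grid=(2, 4, 6, 8), f_grid=(0.1, 0.2, 0.4, 0.6)):
    """Grid search returning (K, f, CVE) with the minimum cross-validation error."""
    best = None
    for K in K_grid:
        T = PCA(n_components=K).fit_transform(X)
        T = T / T.std(axis=0)                          # normalized scores
        for f in f_grid:
            cve = lwr_loo_cve(T, y, f)
            if best is None or cve < best[2]:
                best = (K, f, cve)
    return best
```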
5. Case studies: industrial distillation columns
Two applications to industrial distillation columns are presented: one with strong nonlinearity and the other with relatively linear characteristics. These two cases demonstrate the performance of the LWR approach in both the linear and the nonlinear case.

5.1. An industrial splitter column

5.1.1. Preliminary process understanding
The splitter column shown in Fig. 1 consists of 60 sieve trays. Liquid reformates, naphtha reformed in the reaction process, are fed to the 22nd tray; the components lighter than C7 are removed as distillates and the components heavier than C7 are produced as bottoms. The objective of this case study is to estimate the composition of toluene at the bottom using 16 process variables, which are measured on-line.

5.1.2. Data preprocessing
There are 342 observations, of which 12 were removed based on Hotelling's T², SPE, and residual analysis in PLS. Of the remainder, 280 observations were chosen as the reference set and 50 as the validation set.

5.1.3. Model determination and validation
As previously stated, we use the Euclidean distance of the new variables as the distance measure and the tricube function as the weighting function. Thus the remaining four parameters, the type of transformation, K (the number of new variables obtained through the transformation), f (the smoothness factor), and the type of regression method in the local region, have to be determined first. The parameters were chosen to minimize the cross-validation error (CVE). Table 1 shows the cross-validation results. The combination of PCA as the transformation and PLS as the regression method in the local region gives the smallest CVE, and the corresponding choices of K and f are similar to those obtained with the other combinations. The fact that the best smoothness factor is very small indirectly suggests that the system may be strongly nonlinear. The model with eight new variables and a smoothness factor of 0.1 shows the best performance.
Table 1
CVE and RMSEP for four different combinations for the validation data set of the splitter column^a

Combination    f     K    CVE      RMSEP
[PCA OLS]      0.1   6    0.0112   0.00487
[PCA PLS]      0.1   8    0.0074   0.00218
[PLS OLS]      0.2   6    0.0134   0.00692
[PLS PLS]      0.1   7    0.0089   0.00448

^a The former term in the bracket is the transformation method and the latter the regression method in the local region; f and K are chosen to give the minimum CVE at each combination.
Table 2
Comparison of RMSEP with four different methods for the validation data set of the splitter column

Method          A^a   RMSEP    RE (%)
Linear PLS      10    0.0155   85.6
Nonlinear PLS   8     0.0130   83.1
FFBPN           8     0.0082   73.2

^a A is the number of latent variables or hidden nodes in the models.
Fig. 2. Residual plot from the linear PLS (residuals vs. observed values).
Its RMSEP (root mean squared error of prediction) for the validation set is 0.00218. Other combinations of the transformation method and the regression method in the local region were tested to confirm our choice, and the comparison is shown in Table 1. Notice that cross-validation gives a reasonable estimate of the optimal parameters. In addition, PLS is clearly superior to OLS as the regression method in the local region. This is easily understood because the dimensionality and collinearity in the local region increase, owing to the relatively large K and small f. The results of the proposed LWR approach have also been compared with other methods: conventional linear PLS, nonlinear PLS by Qin and McAvoy (1992), and a feedforward backpropagation network (FFBPN). The validation results are presented in Table 2. We used the PLS_Toolbox 1.5 and the Neural Network Toolbox 3.0 in MATLAB 5.0. In the case of the FFBPN, the Levenberg-Marquardt backpropagation method with hyperbolic tangent sigmoid transfer functions was used. The relative error (RE) is defined as follows:
$$\mathrm{RE}\,(\%) = \frac{\mathrm{RMSEP} - \mathrm{RMSEP}_{\mathrm{LWR\ approach}}}{\mathrm{RMSEP}} \times 100$$
The relative errors are about 73-86%; for linear PLS, for example, RE = (0.0155 − 0.00218)/0.0155 × 100 ≈ 86%. The superior estimation performance of the LWR approach over linear PLS is shown in Figs. 2 and 3. The residual plots allow further analysis. In the residual plot of linear PLS in Fig. 2, the most striking feature is the strong positive slope, which confirms the existence of strong nonlinearity. The residual plot of the LWR approach in Fig. 3, however, shows much less nonlinearity; no substantial nonlinearity or bias is detected in this plot. In this case study, to predict the 50 validation data, 50 local PLS models were used with 1-4 latent variables each. In total, 72 latent variables were used; on average, about 1.44 latent variables per local PLS model, i.e. only about one or two latent variables were needed for each prediction.
Fig. 3. Residual plot from the LWR approach (residuals vs. observed values).

5.2. An industrial crude column
This second case is also a real industrial case, with relatively linear characteristics. The objective is to estimate the 90% distillation temperature of the diesel product. The estimated value is used to control the column to decrease the quantity of bunker-C oil in the distilled diesel. There are 57 on-line process measurements, such as flow rates, temperatures, and pressures; 330 observations were sampled over 5 months, of which 280 were used as the reference set and 50 as the validation set. From the cross-validation, PLS and OLS were chosen as the transformation method and the regression method in the local region, respectively, and the optimal number of new variables and the smoothness factor are 4 and 0.6, respectively.
In the previous case study, the choice of the regression type in the local region was more important than the choice of the transformation type (Table 1). In this case study, however, the choice of transformation type is more important. The CVE and RMSEP of the validation set for the four combinations are shown in Table 3. The cross-validation results are interpreted as follows. Since the splitter case is strongly nonlinear, linear PLS performs poorly there, so the latent variables of linear PLS carry less meaningful information. In the relatively linear crude case, by contrast, linear PLS explains about 76% of the response with four latent variables, so the latent variables of PLS can be more efficient than those of PCA. In this case study the number of new variables is smaller and the smoothness factor is larger. This means that the local data structure is 'long and lean', so the dimensionality and the collinearity decrease. This may make OLS more efficient than PLS as the regression method in the local region. Table 4 shows the comparison with the other methods. The details of the other methods are the same as in the previous case, except that linear transfer functions were used instead of tangent sigmoid functions in the FFBPN to obtain better performance (with tangent sigmoid functions, the performance is poor). It is important to note in this table that the choice of K through cross-validation is not optimal: in practice, a K larger than 4 gives a better result in almost all combinations of the transformation method and the regression method in the local region, and over almost all f ranges. In this table the performances of all the methods are very similar and the REs are very small.

Table 3
CVE and RMSEP for four different combinations for the validation data set of the crude column^a

Combination    f     K    CVE     RMSEP
[PCA OLS]      0.7   7    6370    3.97
[PCA PLS]      0.7   8    7140    4.49
[PLS OLS]      0.6   4    4090    3.70
[PLS PLS]      0.6   4    4230    3.82

^a The former term in the bracket is the transformation method and the latter the regression method in the local region; f and K are chosen with the minimum CVE at each combination.

Table 4
Comparison of RMSEP with four different methods for the validation data set of the crude column

Method          A^a   RMSEP   RE (%)
Linear PLS      10    3.86    4.15
Nonlinear PLS   4     3.89    4.88
FFBPN           2     3.56    −3.93

^a A is the number of latent variables or hidden nodes in the models.
6. Conclusions

In this paper we have proposed a nonlinear regression method, locally weighted regression, for soft sensor building. First, a transformation method is applied to the raw data to solve the dimensionality problem. This is meaningful for nonparametric modeling in terms of variable reduction. Through this transformation procedure we obtain a few new variables, which are treated as new independent variables. Then we perform local regression modeling by the local fitting derived from LWR. The number of new variables, the value of the smoothness factor, and the types of transformation and regression in the local region are determined by general cross-validation. The case studies have shown that cross-validation is a proper method, although it does not always give the optimal choice. When PLS is used as the regression method in the local region, the latent variables are determined independently by cross-validation, so each local model can reflect the features of its specific region well. The proposed LWR approach is nonlinear, but can be considered a piecewise linear method that is conceptually simple and easy to use. Although this LWR approach cannot guarantee better performance than other well-known methods such as PLS and neural networks, it is certainly a good candidate for building soft sensors in chemical plants with high nonlinearity and collinearity.
Acknowledgements

The authors wish to acknowledge financial support from the Brain Korea 21 project, and Dr Hong Cheol Ko at LG-Caltex for providing the industrial data for the case studies.
References
Andrade, J. M., Sanchez, M. S., & Sarabia, L. A. (1999). Applicability of high-absorbance MIR spectroscopy in industrial quality control of reformed gasolines. Chemometrics and Intelligent Laboratory Systems, 46, 41-45.
Bro, R., & Heimdal, H. (1996). Enzymatic browning of vegetables. Calibration and analysis of variance by multiway methods. Chemometrics and Intelligent Laboratory Systems, 34, 85-102.
Cleveland, W. S., & Devlin, S. J. (1988). Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association, 83, 596-610.
Frank, I. E., & Lanteri, S. (1988). A nonlinear regression model. Chemometrics and Intelligent Laboratory Systems, 3, 301-313.
Howarth, R. J., & McArthur, J. M. (1997). Statistics for strontium isotope stratigraphy: a robust LOWESS fit to the marine Sr-isotope curve for 0 to 206 Ma, with look-up table for derivation of numeric age. Journal of Geology, 105, 441-456.
Kresta, J. V., Marlin, T. E., & MacGregor, J. F. (1994). Development of inferential process models using PLS. Computers & Chemical Engineering, 18, 597-611.
Malthouse, E. C., Tamhane, A. C., & Mah, R. S. H. (1997). Nonlinear partial least squares. Computers & Chemical Engineering, 21, 875-890.
Mejdell, T., & Skogestad, S. (1991a). Estimation of distillation compositions from multiple temperature measurements using partial least-squares regression. Industrial & Engineering Chemistry Research, 30, 2543-2555.
Mejdell, T., & Skogestad, S. (1991b). Composition estimator in a pilot-plant distillation column using multiple temperatures. Industrial & Engineering Chemistry Research, 30, 2555-2564.
Naes, T., Isaksson, T., & Kowalski, B. R. (1990). Locally weighted regression and scatter correction for near-infrared reflectance data. Analytical Chemistry, 62, 664-673.
Qin, S. J., & McAvoy, T. J. (1992). Nonlinear PLS modeling using neural networks. Computers & Chemical Engineering, 16, 379-391.
Skagerberg, B., MacGregor, J. F., & Kiparissides, C. (1992). Multivariate data analysis applied to low-density polyethylene reactors. Chemometrics and Intelligent Laboratory Systems, 14, 341-356.
Trygg, J., & Wold, S. (1998). PLS regression on wavelet compressed NIR spectra. Chemometrics and Intelligent Laboratory Systems, 42, 209-220.
Wold, H. (1966). Nonlinear estimation by iterative least-squares procedures. In F. N. David (Ed.), Research papers in statistics. New York: Wiley.
Wold, S., Kettaneh-Wold, N., & Skagerberg, B. (1989). Nonlinear PLS modeling. Chemometrics and Intelligent Laboratory Systems, 7, 53-65.