Control Engineering Practice 21 (2013) 1157–1164
Contents lists available at SciVerse ScienceDirect
Control Engineering Practice journal homepage: www.elsevier.com/locate/conengprac
Soft-sensor development with adaptive variable selection using nonnegative garrote

Jian-Guo Wang (a), Shi-Shang Jang (b,*), David Shan-Hill Wong (b,*), Shyan-Shu Shieh (c), Chan-Wei Wu (d)

(a) School of Mechatronical Engineering and Automation, Shanghai University, Shanghai Key Lab of Power Station Automation Technology, Shanghai 200072, China
(b) Department of Chemical Engineering, National Tsing-Hua University, Hsin-Chu 30013, Taiwan
(c) Department of Occupational Safety and Hygiene, Chang Jung University, Tainan 71101, Taiwan
(d) Energy & Air Pollution Control Section, New Materials R&D Department, China Steel Corporation, Kaohsiung 81233, Taiwan
Article info

Article history: Received 14 June 2012; Accepted 17 May 2013; Available online 7 June 2013

Keywords: Nonnegative garrote; Partial least squares; Variable selection; Soft sensor; Fault detection; Structural model change

Abstract

In this study, a soft-sensor modeling algorithm with adaptive partial least squares nonnegative garrote is developed by incorporating nonstationary disturbance. The approach is capable of monitoring the stationary and nonstationary behaviors of the process dynamics. The procedure of adaptive variable selection ensures that a compact and robust input–output relation is obtained online. Hence, in addition to simply tracking prediction, the model can be used for the detection of structural model change and the emergence of disturbance. The advantages of the proposed method are demonstrated with a simulation example and two industrial applications to predict the temperature of a blast furnace hearth wall and to estimate impurity composition of a distillation column.

© 2013 Elsevier Ltd. All rights reserved.
* Corresponding authors. Tel.: +886 3 571 3697; fax: +886 3 571 5408. E-mail addresses: [email protected] (S.-S. Jang), [email protected] (D.-H. Wong).
0967-0661/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.conengprac.2013.05.006

1. Introduction

Many processes contain a few key quality variables and a large number of real-time sensor variables. When quality variables cannot be easily measured, for example, when the measurement requires lab work with long processing time and delay, it would be beneficial to employ a soft-sensor model that is able to predict these quality characteristics (as response variables) using readily available sensor variables (as candidate predictors). Such models are called inferential sensor models or soft-sensors (Fortuna, Graziani, Rizzo, & Xibilia, 2007; Kresta, Marlin, & MacGregor, 1996). A variety of soft-sensor applications and modeling methods have been studied in different fields (Badhe, Lonari, Tambe, & Kulkarni, 2007; Bhat, Saraf, & Gupta, 2006; Desai, Badhe, Tambe, & Kulkarni, 2005; White, 2003; Yoo & Lee, 2004).

Soft-sensor development has many challenges. For example, the input–output relations may be nonlinear and may change with time as the process operating conditions change. Hence, nonparametric models such as neural networks or multiple local model networks are often used to handle nonlinearity (Dong & McAvoy, 1996;
Pan, Wong, & Jang, 2011a; Qin & McAvoy, 1992). Such black-box models are prone to overfitting and are difficult to interpret. To deal with the time-varying characteristics of industrial processes, adaptive models or just-in-time models are often used (Dayal & MacGregor, 1997; Pan, Wong, & Jang, 2011b; Li, Yue, Valle-Cervantes, & Qin, 2000; Qin, 1998).

Modern process plants are usually loaded with many sensors. Data availability, however, does not guarantee successful extraction of useful information. Linear multivariate statistical analyses such as principal component regression (PCR), partial least squares (PLS), and canonical variable analysis (CVA) are often used in soft-sensor development to overcome problems of high dimension and collinearity of candidate predictors (Höskuldsson, 1988; Jackson, 1991; Pan et al., 2011a, 2011b; Shi & Macgregor, 2000). These methods attempt to compress sensor variables into a few latent variables. In PCR, the latent variables represent the main directions of variation in sensor variables. However, quality is often unrelated to these dominant directions, in which case successful prediction is possible only if latent variables with small eigenvalues are included. PLS seeks the direction of maximum covariation between sensor variables and response variables, which may include directions that represent variations in the input that are irrelevant to predicting the response. A recursive PLS algorithm was derived by Qin (1998), who showed it to be a useful tool for process monitoring. CVA would indeed maximize the correlation between input and output, but there is no guarantee that
the coefficients of irrelevant variables are zero. Even if PCR, PLS and CVA are able to compress the sensor variables into a few key latent variables, these latent variables may still include contributions from a large number of predictor variables. This is highly undesirable because operators cannot readily understand the physical meaning of the input–output relation or focus on a manageable number of key factors.

Proper variable selection is an important characteristic of a soft-sensor model that gives the model transparency and robustness. Simple statistical analyses have been conducted for process variables to identify a subset of measurements for use in the inferential measurements (Bhartiya & Whiteley, 2001). Ma et al. used a conventional stepwise variable selection method to develop a soft-sensor (Ma, Ko, Jang, Shieh, & Wong, 2009). Recently, shrinkage methods, which conduct variable selection by shrinking or setting some coefficients of a “greedy” model to zero, have been receiving a large amount of attention. Two popular forms of these methods are the nonnegative garrote (NNG) (Breiman, 1995) and the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996). The NNG adopts a two-stage solution. First, it fixes the sign of the regressors using the sign of the estimates from ordinary least squares (OLS), and then it conducts magnitude shrinkage. LASSO (or its variants) can deal with the sign and magnitude shrinkage simultaneously, but the objective function of the resulting optimization is only piecewise smooth. These methods have the advantage that the coefficients of irrelevant variables are shrunk to zero. While these methods and other similar methods such as sparse PLS (Chun & Keles, 2010; Lee, Lee, Lee, & Pawitan, 2011) have been immensely popular in biological data exploration and machine learning, they have not been widely used for modeling in the process industry.
A modified implementation of NNG, partial least squares NNG (PLSNNG), has been developed for linear models (Pan, Bai, Yang, Jang, & Wong, 2012) for application in final quality predictions in the semiconductor manufacturing process, which is essentially a static input–output model. When the process conditions change, the model structure of this set of candidate predictor variables may change. In extreme cases, the process enters a “faulty” state, in which a nonstationary disturbance term that does not depend on any of the sensor variables or input variables may be dominant. Thus, in addition to providing faster and more frequent predictions of key quality characteristics, a good soft-sensor model can be used for fault diagnosis or even control purposes, but only if the true input–output relation is captured in a robust and transparent manner.

In the present work, an adaptive PLSNNG approach is developed by incorporating nonstationary disturbance for efficient online applications to dynamic processes. A numerical simulation example and two industrial applications, temperature prediction of a blast furnace hearth wall and impurity composition estimation of a distillation column, are used to illustrate the advantages of our method over a more traditional method such as PLS.

The remainder of this paper is organized as follows. In Section 2, the theory of NNG variable selection is reviewed; PLSNNG and the incorporation of nonstationary disturbance are then formulated. Computational results from a numerical simulation example are presented in Section 3. Section 4 describes the application of the proposed method to predict the temperature of a blast furnace hearth wall, and Section 5 presents its application in estimating the impurity composition of a distillation column. Concluding remarks with a summary are given in Section 6.
2. Theory

2.1. NNG variable selection

The NNG method can be generalized into a two-stage shrinkage method. In the first stage, the sign for each variable is determined using an OLS procedure, and in the second stage, the corresponding magnitudes are computed by solving a series of constrained quadratic programming problems. Suppose that, for a set of observation data $\{X, y\}$, $X \in \mathbb{R}^{n \times p_x}$ is the input matrix whose columns represent the measured candidate variables and $y \in \mathbb{R}^{n \times 1}$ is the corresponding vector of response data. The following derivation is given with the number of response variables being 1, but a similar procedure can be generalized to any number of variables. Suppose that $X$ and $y$ have been normalized to zero mean and unit standard deviation. Let $\hat{\beta}^{\mathrm{OLS}} \in \mathbb{R}^{p_x \times 1}$ be the set of OLS estimates of the coefficients of the following linear model:

$$y = X\beta + \varepsilon \tag{1}$$

The second-stage shrinkage, in which $.*$ denotes the element-wise product, can be formulated as the following constrained quadratic programming problem:

$$J = \min_{c_j} \left\| y - X\left(\hat{\beta}^{\mathrm{OLS}} \mathbin{.*} c\right) \right\|^2 \quad \text{subject to } c_j \ge 0,\ \sum_{j=1}^{p_x} c_j \le s \tag{2}$$
where $\{c_j \mid j = 1, \ldots, p_x\}$ are the magnitude coefficients to be determined and $s$ is the so-called garrote parameter. As $s$ decreases and the nonnegative garrote is tightened, more of the $c_j$ become zero and the remaining nonzero coefficients shrink. If $s > p_x$, the constraint $\sum_{j=1}^{p_x} c_j \le s$ is inactive. Thus, a solution path exists with $0 \le s \le p_x$ on which the appropriate shrinkage can be selected. Conventionally, cross-validation (Zhang, 1993) is used to estimate the prediction error and to select the solution on the path that minimizes the prediction error or the model error.

2.2. PLSNNG

In the first stage of the NNG method, the signs are determined using the OLS estimates, and in the second stage, the focus is on the magnitude of the coefficients. In NNG regression, the initial estimates (stage 1) appear to be more important than the nonnegative shrinkage (stage 2) because they specify the general direction for the subsequent search. If the general direction is incorrect, good results cannot be expected. Hence, a reasonable and systematic way is needed to locate the correct search direction. It is known that PLS regression extracts the strongest relationship between $X$ and $y$ by finding the maximum covariance between latent variables and is able to handle high multicollinearity among candidate variables. Taking PLS estimates as initial estimates of the coefficients of the linear model (1), PLSNNG regression can be formulated as

$$J = \min_{c_j} \left\| y - X\left(\hat{\beta}^{\mathrm{PLS}(l)} \mathbin{.*} c\right) \right\|^2 \quad \text{subject to } c_j \ge 0,\ \sum_{j=1}^{p_x} c_j \le s \tag{3}$$

where $\hat{\beta}^{\mathrm{PLS}(l)}$ denotes the PLS regression coefficient vector obtained with $l$ latent variables. The NNG regressions can be conducted for all possible values of $l$ in the PLS regression, and hence, the PLSNNG regression has two adjustable parameters: $l$ and $s$. Given $l$ and $s$, a regression model can be obtained by solving the above constrained quadratic problem. The search over all possible numbers of latent variables is helpful for locating the correct variables and for improving the estimation accuracy and the model interpretability. The optimal regression coefficients can be determined with ν-fold cross-validation, the procedure of which is outlined as follows:
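Before the cross-validation steps, a minimal sketch of the inner problem (3) for one fixed pair $(l, s)$ may be useful. It is an illustration only, under stated assumptions: the initial coefficients (a PLS fit in the paper) are simply passed in as `beta_init`, and the constrained quadratic program is solved approximately by projected gradient descent rather than by a dedicated QP solver, which is a substitution made here for the sake of a dependency-free example. The names `project_garrote` and `nng_shrink` are hypothetical.

```python
def project_garrote(v, s):
    """Euclidean projection of v onto {c : c_j >= 0, sum(c) <= s}."""
    if s <= 0:
        return [0.0] * len(v)
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= s:
        return clipped
    # The budget is active: project onto the simplex {c >= 0, sum(c) = s}.
    u = sorted(v, reverse=True)
    running, tau = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        t = (running - s) / j
        if uj - t > 0:
            tau = t
    return [max(x - tau, 0.0) for x in v]


def nng_shrink(X, y, beta_init, s, n_iter=2000):
    """Approximately solve min_c ||y - X(beta_init .* c)||^2
    subject to c_j >= 0 and sum(c) <= s (the form of Eqs. (2) and (3))."""
    n, p = len(X), len(beta_init)
    # Z = X * diag(beta_init), so that X(beta_init .* c) = Z c.
    Z = [[X[i][j] * beta_init[j] for j in range(p)] for i in range(n)]
    # Step 1/L, where L = 2 * ||Z||_F^2 bounds the gradient's Lipschitz constant.
    lipschitz = 2.0 * sum(z * z for row in Z for z in row)
    if lipschitz == 0.0:
        return [0.0] * p
    step = 1.0 / lipschitz
    c = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(Z[i][j] * c[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [2.0 * sum(Z[i][j] * resid[i] for i in range(n)) for j in range(p)]
        c = project_garrote([c[j] - step * grad[j] for j in range(p)], s)
    return c


# Toy data (hypothetical): y is reproduced exactly by beta_init = [2, -1].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
beta_init = [2.0, -1.0]
y = [2.0, -1.0, 1.0, 3.0]

c_loose = nng_shrink(X, y, beta_init, s=5.0)   # budget inactive
c_mid = nng_shrink(X, y, beta_init, s=1.0)     # both coefficients shrunk
c_tight = nng_shrink(X, y, beta_init, s=0.5)   # second variable removed
beta_tight = [b * c for b, c in zip(beta_init, c_tight)]
```

In this toy problem, tightening `s` first shrinks both magnitudes and then sets the second one to zero, which is the selection behavior described for the garrote parameter. Cross-validation, described next, repeats such a solve over a grid of $l$ and $s$.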
The dataset $L = \{X, y\}$ is divided into $V$ subsets $L_1, \ldots, L_V$. The optimization problem in (3) is solved for each subset, using the PLS coefficients with $l$ latent variables as the initial estimates:

$$c^{(\nu)}(l, s) = \arg\min_{c_j} \left\| y - X\left(\hat{\beta}^{\mathrm{PLS}(l)} \mathbin{.*} c\right) \right\|^2_{L_\nu} \quad \text{subject to } c_j \ge 0,\ \sum_{j=1}^{p_x} c_j \le s \tag{4}$$

Let $\bar{L}_\nu = L - L_\nu$ and the cross-validation error be given by

$$\mathrm{CVE}^{(\nu)}(l, s) = \left\| y - X\left(\hat{\beta}^{\mathrm{PLS}(l)} \mathbin{.*} c^{(\nu)}(l, s)\right) \right\|^2_{\bar{L}_\nu} \tag{5}$$

The optimal selection of $l$ and $s$ can be determined by minimization of the overall ν-fold cross-validation error given by

$$(l^*, s^*) = \arg\min_{0 \le l \le p_x,\ 0 \le s \le p_x} \frac{1}{V} \sum_{\nu=1}^{V} \mathrm{CVE}^{(\nu)}(l, s) \tag{6}$$

The upper bounds for $l$ and $s$ are both $p_x$ because the maximum number of latent components is equal to the number of variables, and $c_j = 1$ for all $j$ if the PLS model is not shrunk. The overall coefficients can be determined by solving the optimization problem again:

$$c^* = \arg\min_{c_j} \left\| y - X\left(\hat{\beta}^{\mathrm{PLS}(l^*)} \mathbin{.*} c\right) \right\|^2_{L} \quad \text{subject to } c_j \ge 0,\ \sum_{j=1}^{p_x} c_j \le s^* \tag{7}$$

2.3. Adaptive PLSNNG incorporating nonstationary disturbance

To allow for the process being in a faulty state, in which output responses cannot be explained by the candidate inputs, a nonstationary IMA(1,1) time series is added to the model as

$$\begin{cases} \eta_k = \eta_{k-1} + \varepsilon_k - \theta \varepsilon_{k-1} \\ y_k = x_k \beta + \eta_k \end{cases} \tag{8}$$

where $\varepsilon_k \sim N(0, \sigma^2)$ is a Gaussian distributed random disturbance with zero mean and variance $\sigma^2$. If $\theta = 1$, the differences cancel and $\eta_k$ reduces, up to a constant offset, to a stationary white-noise sequence. If $\theta = 0$, the time series is a nonstationary random walk. However, because disturbances are not measurable, they must be estimated using the existing model. In a moving-window approach, this can be achieved in the following manner. Let $w$ be the current estimation window for which a set of model coefficients with shrinkage selection, $\hat{\beta}^*[w]$ and $\hat{\theta}^*[w]$, is available. If a new window of data $(X[w+1], y[w+1])$ comes in, the disturbances can be estimated as

$$\hat{\eta}_k[w+1] = y_k[w+1] - x_k[w+1]\,\hat{\beta}^*[w] \tag{9}$$

$$\hat{\varepsilon}_k[w+1] = \hat{\eta}_k[w+1] - \hat{\eta}_{k-1}[w+1] + \hat{\theta}^*[w]\,\hat{\varepsilon}_{k-1}[w+1] \tag{10}$$

The new window of data $(X'[w+1], y'[w+1])$ can be set up with

$$\begin{aligned} x'_k[w+1] &= \left[\, x_k[w+1],\ -\hat{\varepsilon}_{k-1}[w+1] \,\right] \\ y'_k[w+1] &= y_k[w+1] - \hat{\eta}_{k-1}[w+1] \end{aligned} \tag{11}$$

With $X'[w+1]$ and $y'[w+1]$, PLSNNG regression can be formulated as

$$J = \min_{c'_j} \left\| y'[w+1] - X'[w+1]\left(\hat{\beta}'^{\,\mathrm{PLS}(l')}[w+1] \mathbin{.*} c'\right) \right\|^2 \quad \text{subject to } c'_j \ge 0,\ \sum_{j=1}^{p_x+1} c'_j \le s' \tag{12}$$

The optimal $l'^*$, $s'^*$, and $c'^*$ are determined with ν-fold cross-validation. A new set of $\hat{\beta}^*[w+1]$ and $\hat{\theta}^*[w+1]$ can be obtained from the element-wise product of $c'^*$ and $\hat{\beta}'^{\,\mathrm{PLS}(l'^*)}[w+1]$ and can then be used to provide a prediction for the response output in the next prediction horizon and a disturbance estimate in the next regression window. It should be pointed out that although the adaptive soft-sensor modeling algorithm in this study is used in moving-window mode, it can easily be given in a recursive form as described in the literature (Qin, 1998).

3. Simulation example

In this section, the advantages of the proposed soft-sensor algorithm, labeled PLSNNG, are investigated using a numerical example and compared with the PLS algorithm, labeled PLS, in which cross-validation is used to determine the optimal number of latent variables. In this example, the true model is given by

$$\begin{cases} \eta_k = \eta_{k-1} + \varepsilon_k - \theta \varepsilon_{k-1} \\ y_k = X_k \beta + \eta_k \end{cases} \tag{13}$$

The dataset contains 15,000 samples of the response variable, $y \in \mathbb{R}^{15000 \times 1}$, and the predictor vector has 100 candidate predictor variables, $X \in \mathbb{R}^{15000 \times 100}$. Here, $X$ is generated from a normal distribution with zero mean and covariance matrix $\Xi$, i.e., $X \sim N(0, \Xi)$, with $\Xi_{ij} = 0.8^{|i-j|}$, so that collinearity is introduced. The noise $\varepsilon \sim N(0, 1)$ is white noise with variance 1. The coefficient vector $\beta$ has only five large elements (41st, 42nd, 45th, 50th, and 51st) with values of order 1. The 41st, 42nd and 50th elements of $\beta$ remain at 3, 2 and 1, respectively, in the interval [1, 5000]. After $k = 5000$, the 41st element decreases linearly from 3 to 2 and the 42nd element increases linearly from 2 to 3. The 45th element remains constant at 1.5 throughout the process. The 50th element decreases linearly after $k = 5000$ and suddenly becomes zero at $k = 13{,}000$. The 51st element shows a sudden upward shift from zero to 1 at $k = 7000$ and then increases linearly to 1.75. There are also five small elements (61st, 62nd, 65th, 70th, and 71st) whose values are one-fourth of those of the five large elements throughout the process. The IMA(1,1) term is white noise ($\theta = 1$) in the intervals [1, 5000] and [10001, 15000] and is a random walk ($\theta = 0$) in [5001, 10000]. Observation windows of 500 samples and a prediction horizon of 100 samples are used. After each prediction, the data of the prediction horizon are included in the observation window, and the 100 oldest samples in the old observation window are discarded. There are 145 moving windows.

Prediction errors of PLS and PLSNNG are shown in Fig. 1. These show that PLS and PLSNNG are comparable in accuracy in terms of tracking the output. There is virtually no difference between PLS and PLSNNG in the estimation of the five big coefficients of $\hat{\beta}$. The corresponding estimates of the five small coefficients of $\hat{\beta}$ and of $\hat{\theta}$ are shown in Figs. 2 and 3. Both PLS and PLSNNG are able to track these coefficients, although the variations of the small coefficients in the PLS model are larger. The overall performance of the two models is evaluated by five statistics:

Model error (ME), given by Breiman (1995):

$$\mathrm{ME} \equiv (\hat{\beta} - \beta)^{T}\, \mathrm{Cov}(X)\, (\hat{\beta} - \beta)$$
Model size (MS):

$$\mathrm{MS} \equiv \frac{1}{145} \sum_{w=1}^{145} \operatorname*{Count}_{j \in [1, 2, \ldots, 100]} \left( \hat{\beta}_j[w] \ne 0 \right)$$

Prediction error (MSPE):

$$\mathrm{MSPE} \equiv \frac{1}{145} \sum_{w=1}^{145} \sum_{k=1}^{100} \left( y^{h}_{k}[w] - \hat{y}^{h}_{k}[w] \right)^2$$

Correct selection ratio (CSR):

$$\mathrm{CSR} \equiv \frac{1}{145} \sum_{w=1}^{145} \frac{\operatorname*{Count}_{j \in [1, 2, \ldots, 100]} \left( \hat{\beta}_j[w] \ne 0 \wedge \beta_j[w] \ne 0 \right)}{\operatorname*{Count}_{j \in [1, 2, \ldots, 100]} \left( \beta_j[w] \ne 0 \right)}$$

False selection ratio (FSR):

$$\mathrm{FSR} \equiv \frac{1}{145} \sum_{w=1}^{145} \frac{\operatorname*{Count}_{j \in [1, 2, \ldots, 100]} \left( \hat{\beta}_j[w] \ne 0 \wedge \beta_j[w] = 0 \right)}{\operatorname*{Count}_{j \in [1, 2, \ldots, 100]} \left( \beta_j[w] = 0 \right)}$$

The results are summarized in Table 1. It can be seen that whereas the PLS model and the PLSNNG model achieve nearly the same prediction error (MSPE = 1.625 for PLS and 1.591 for PLSNNG), the PLSNNG model has a considerably smaller model error (ME = 32.8 for PLS and 8.3 for PLSNNG). More importantly, in the PLS model, nonzero values are assigned to every irrelevant variable in every window. The magnitude of these coefficients is comparable to that of the small coefficients. Hence, the model size for PLS is always 100, indicating that all spurious candidate predictors are selected. In the PLSNNG model, the number of irrelevant variables that are falsely assigned values is much smaller. The PLSNNG has an average model size of 14.2, which is much closer to the actual value of 10. Furthermore, because every candidate variable is selected in the PLS model, its CSR is always 100%. However, its FSR is also 100%, which is unacceptably high. The PLSNNG model has a high CSR of 97.1%, while its FSR is kept at 7.4%. False positives and false negatives are thus kept at low levels in PLSNNG. According to Occam's razor, PLSNNG is the better model: it achieves the same level of explanation but employs fewer assumptions (nonzero coefficients).
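The selection statistics defined in this section can be illustrated with a short sketch. It is not the authors' code: the function name `selection_stats`, the toy coefficient lists, and the convention that a window with no relevant (or no irrelevant) variables contributes zero to the corresponding ratio are all assumptions made here.

```python
def selection_stats(beta_hat_windows, beta_true_windows):
    """Average MS, CSR and FSR over moving windows.

    Each argument holds one coefficient list per window.
    """
    n_windows = len(beta_hat_windows)
    ms = csr = fsr = 0.0
    for b_hat, b_true in zip(beta_hat_windows, beta_true_windows):
        selected = [b != 0 for b in b_hat]    # variables kept by the model
        relevant = [b != 0 for b in b_true]   # variables in the true model
        n_relevant = sum(relevant)
        n_irrelevant = len(relevant) - n_relevant
        hits = sum(1 for s, r in zip(selected, relevant) if s and r)
        false_hits = sum(1 for s, r in zip(selected, relevant) if s and not r)
        ms += sum(selected)
        # Assumed convention: an empty class contributes 0 to its ratio.
        csr += hits / n_relevant if n_relevant else 0.0
        fsr += false_hits / n_irrelevant if n_irrelevant else 0.0
    return ms / n_windows, csr / n_windows, fsr / n_windows


# Two toy windows with four candidate variables (hypothetical values).
beta_true = [[1.0, 0.0, 2.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
beta_sparse = [[0.9, 0.1, 1.8, 0.0], [1.1, 0.0, 0.0, 0.0]]  # shrinkage-style fit
beta_dense = [[0.9, 0.1, 1.8, 0.2], [1.1, 0.3, 0.1, 0.2]]   # every variable nonzero

stats_sparse = selection_stats(beta_sparse, beta_true)
stats_dense = selection_stats(beta_dense, beta_true)
```

A model that assigns a nonzero value to every candidate, as the dense toy fit does, necessarily reaches a CSR and an FSR of 1 together, which mirrors the pattern reported for PLS in Table 1.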
Fig. 1. Tracking of changes in response using PLS and PLSNNG model. (Prediction error of PLS and PLSNNG versus step k; image not reproduced.)

Fig. 2. Estimation of the small coefficients. (a) PLS and (b) PLSNNG. (Coefficient estimates versus step k; image not reproduced.)

Fig. 3. Estimation of the IMA(1,1) coefficients θ. (a) PLS and (b) PLSNNG. (Estimated and true values versus step k; image not reproduced.)

Table 1
Comparison of variable selection and prediction performance between PLS and PLSNNG.

Method    ME     MS     CSR (%)   FSR (%)   MSPE
PLS       32.8   100    100       100       1.625
PLSNNG    8.3    14.2   97.1      7.4       1.591

4. Application to temperature prediction of a blast furnace hearth wall

The blast furnace is a crucial reactor used in an integrated iron and steel plant to produce hot metal. To increase its working life and economize its operation, the temperature state of the blast furnace hearth wall (BFHW) must be strictly monitored and controlled by spraying cooling water onto the furnace shell. However, sometimes the operator may find that spraying water does not prevent temperature flare. This fault is generally attributed to the detachment of slag skull on the inside wall of the blast furnace. It is therefore desirable to use a model to describe and record such events.

The proposed soft-sensor algorithm is applied to the temperature modeling of a BFHW. The data used in this study were collected from a local blast furnace from January 1, 2010 to June 30, 2011. The variable to be predicted is the BFHW temperature, and the 22 measured variables listed in Table 2 are used as candidate predictors. Variable 1 is the spray cooling water flow. The remaining variables are related to the combustion process. Daily averages are used, and there are 546 samples. After removing the samples with a spray cooling flow of less than 450 m3/h during the maintenance period, 528 samples remain. With an observation window width of 35 and a prediction horizon of 1, there are 493 windows.

Table 2
Candidate predictors for the Blast Furnace Hearth Wall temperature predictions.

ID   Variable              Unit
1    Spray flow            m3/s
2    Air flow              m3/s
3    Air pressure          kg/m2
4    Air temperature       °C
5    Air humidity          kg/m3
6    Coal flow             kg/m3
7    Oxygen enrichment     %
8    Ventilation           m3/s
9    Flame temperature     °C
10   Tuyere speed          m/s
11   Furnace gas speed     m/s
12   Carbon solution       kg/kg
13   Blast efficiency      m3/kg
14   Top gas volume        m3/kg
15   Top gas pressure      kg/m2
16   Top gas temperature   °C
17   CO in BFG             %
18   CO2 in BFG            %
19   H2 in BFG             %
20   Utility of CO         %
21   H2 input              %
22   Small coke            kg/kg

CO: carbon monoxide; CO2: carbon dioxide; BFG: blast furnace gas; H2: hydrogen.

Fig. 4 illustrates the predictions of the BFHW temperature using PLS and PLSNNG. There are two clear periods (near k = 200 and for k > 450) in which the system experiences substantial temperature flare, i.e., the system is in fault. Again, the models are comparable in terms of tracking and prediction. Fig. 5 shows that both models are able to determine that a nonstationary disturbance is required (θ ∼ 0) to model the temperature behavior.

Fig. 4. Tracking of BFHW temperature using PLS and PLSNNG model. (Two panels of measured BFHW temperature against the temperature predicted with PLS and with PLSNNG, versus time in days, with the first and second stationary phases marked; image not reproduced.)

Fig. 5. Estimation of the disturbance parameter θ. (a) PLS and (b) PLSNNG. (Estimates versus time in days; image not reproduced.)

Because the actual model is not available, there is no way of estimating model error. In addition to model size, the magnitude of the regression coefficient vector (CM), defined as the average L1 norm of the coefficients,

$$\mathrm{CM} \equiv \frac{1}{N_w} \sum_{w=1}^{N_w} \left\| \beta_w \right\|_1$$

is used as an alternative measure of the size of the model. A model with a smaller CM can be regarded as a model with fewer assumptions. Table 3 compares the root mean square error (RMSE), MS and CM of the PLS and PLSNNG models. Again, PLS assigns nonzero values to all of the 22 candidate predictor variables, whereas PLSNNG has a much smaller model size of 7.8.

Table 3
Comparison of variable selection and prediction performance of PLS and PLSNNG models for the BFHW temperature predictions.

Method    RMSE     MS    CM
PLS       1.0320   22    3.852
PLSNNG    0.9424   7.8   1.465

It is interesting to examine the effect of the cooling water spray flow rate on the BFHW temperature. Fig. 6 compares the coefficients of the cooling water spray flow rate estimated by the PLS and PLSNNG models. While the models give similar coefficients during the normal operating period, the coefficients estimated during the periods of temperature flare are substantially different. The coefficients of the PLS model become positive during periods of temperature flare, which is physically unrealistic. The coefficients of the PLSNNG model become zero, indicating that the cooling effect of spraying water becomes negligible. It should be noted that the sign and value of the model gain of a predictor variable are extremely important if a soft-sensor model is used for control purposes. The PLSNNG model with a near zero but negative gain will require a large increase in the spray water rate, which would be a logical action in this event. On the other hand, the positive values of the PLS model will require a decrease in the spray water rate, which would not be the correct control action.

The temperature flare appears to be caused by the detachment of the slag skull that serves as an insulating layer for the furnace. As the slag skull detaches, the insulation effect disappears and the temperature rises substantially. New slag is then formed and serves as a new insulating layer, and the BFHW temperature returns to normal. Both models estimate a smaller absolute value for the model coefficient of spray water rate after the first temperature flare event. This value can be used as a good index for monitoring the physical condition of the equipment. If this coefficient becomes too small, there is substantial detachment of the slag skull and limited formation of a new slag layer. The furnace must then be attended to closely. However, the PLS model occasionally produced positive values that are not realistic. The PLSNNG, on the other hand, consistently gives negative but small values. Hence, estimates of the PLSNNG model will be more reliable and convincing when used for monitoring purposes. This example shows that the ability to create a physically realistic model with fewer spurious and incorrect coefficients is essential for a soft-sensor that is applied in monitoring, diagnosis and control.
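As a small illustration of the two size measures used in this section when the true model is unknown, the sketch below computes MS and CM from per-window coefficient vectors. The function names and the toy numbers are assumptions for illustration and are unrelated to the blast furnace data.

```python
def model_size(beta_windows):
    """MS: average number of nonzero coefficients per window."""
    return sum(sum(1 for b in beta if b != 0) for beta in beta_windows) / len(beta_windows)


def coefficient_magnitude(beta_windows):
    """CM: average L1 norm of the coefficient vector over the windows."""
    return sum(sum(abs(b) for b in beta) for beta in beta_windows) / len(beta_windows)


# Two toy windows with three candidate predictors (hypothetical values).
dense_fit = [[0.5, -0.2, 0.1], [0.4, 0.3, -0.1]]
sparse_fit = [[-0.6, 0.0, 0.0], [-0.5, -0.1, 0.0]]

ms_dense, cm_dense = model_size(dense_fit), coefficient_magnitude(dense_fit)
ms_sparse, cm_sparse = model_size(sparse_fit), coefficient_magnitude(sparse_fit)
```

Both measures need only the fitted coefficients, which is why they remain available in an industrial case where ME, CSR and FSR cannot be evaluated.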
5. Application to impurity predictions for o-xylene column

This section describes the application of the proposed soft-sensor algorithm to isopropylbenzene (IPB) impurity composition estimation of an o-xylene column located at the Taiwan CPC Corporation. In previous work (Ma et al., 2009), a stepwise regression variable selection procedure was used to develop a soft-sensor. In the present work, the PLS and PLSNNG algorithms are applied to another similar column in the CPC plant. The concentration of IPB impurity is detected by an online gas chromatograph (GC), and seven temperature sensors are installed as described previously. In the present study, the time horizon is also set to h = 20 considering the time constant of the column. The response variable is the concentration of IPB impurity, and the predictor variables are $x_{ij} = x_i(t - j\Delta T)$, $i = 1, \ldots, 7$, $j = 1, \ldots, 20$, which are obtained from seven temperature measurements at 20 sampling times. Thus, the number of predictor variables is 140, and the proposed algorithm is used to choose the key variables from the set $\Omega = \{x_{ij} = x_i(t - j\Delta T);\ i = 1, \ldots, 7;\ j = 1, \ldots, 20\}$.

The temperature dataset contains 15,000 samples in which temperature measurements are reported every 1 min (Fig. 7). Concentrations of IPB impurity at the top of the column are measured by GC and reported every 8 min. Hence, there are 15,000/8 = 1875 samples. When the observation window is set to 3000, there are 3000/8 = 375 measured IPB data points. A prediction horizon of 80 min is used, and hence, there are 80/8 = 10 impurity measurements in each prediction horizon and (1875 − 375)/10 = 150 horizons of predictions.

Fig. 8 shows the measured IPB impurities and those predicted with PLS and PLSNNG, respectively. Again, both models successfully track the dynamics of the IPB, but the performance indices in Table 4 show that the PLSNNG model is able to achieve better prediction accuracy with a smaller model size and a smaller magnitude of the coefficient vector. Fig. 9 illustrates (a) the model selection frequency and (b) the sum of the magnitudes of the regression coefficients for different temperatures (left to right in each cluster) at different delays (clusters with labels) with the PLSNNG model. It is interesting to note that both the model selection frequency and the magnitude of the regression coefficients peak for variables at j = 7–10, indicating that the delay of the system is approximately 7–10 min, or close to one composition measurement. For the PLS model, because every variable is selected, the importance of a variable can be indicated only by the magnitude of its coefficient. Fig. 10 also shows that the magnitude of the regression coefficients peaks for variables at j = 7–10, indicating that the PLS model can also be used to diagnose the delay in the system. However, it should be pointed out that the magnitudes of the regression coefficients for the other variables are higher than those in Fig. 9(b).
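The construction of the 140 lagged predictors and the window arithmetic quoted in the case studies can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the temperature readings are synthetic, and the column ordering (grouped by delay, sensors left to right within each delay) is chosen here to match the clustering described for Fig. 9.

```python
def lagged_predictors(temps, n_lags=20):
    """Build rows [x_1(t-1), ..., x_7(t-1), x_1(t-2), ..., x_7(t-n_lags)].

    temps holds one row of sensor readings per sampling instant; a predictor
    row is returned for every instant t that has n_lags earlier samples.
    """
    rows = []
    for t in range(n_lags, len(temps)):
        row = []
        for j in range(1, n_lags + 1):
            row.extend(temps[t - j])  # all sensors at delay j
        rows.append(row)
    return rows


def n_horizons(n_samples, window, horizon):
    """Number of prediction horizons after the first observation window."""
    return (n_samples - window) // horizon


# Synthetic readings: 7 sensors, 30 one-minute samples (hypothetical values).
temps = [[100 * i + t for i in range(7)] for t in range(30)]
design = lagged_predictors(temps)

# Window counts quoted in the text for the three case studies.
ipb_horizons = n_horizons(15000 // 8, 3000 // 8, 80 // 8)
simulation_windows = n_horizons(15000, 500, 100)
bfhw_windows = n_horizons(528, 35, 1)
```

Grouping the columns by delay makes it straightforward to sum selection counts or coefficient magnitudes per delay, which is the kind of summary used to read off the 7–10 min delay.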
Fig. 6. Estimate of coefficient of the cooling water spray flow rate. (a) PLS and (b) PLSNNG. (Estimate of cooling coefficient versus time in days; image not reproduced.)

Fig. 7. Variation of the temperature sensors. (Sensor temperatures versus time in minutes; image not reproduced.)

Fig. 8. Tracking of IPB using PLS and PLSNNG model. (Measured IPB (%) against IPB predicted with PLS and with PLSNNG, versus time in minutes; image not reproduced.)

Fig. 9. (a) Model selection frequency and (b) regression coefficient magnitudes at different delays using PLSNNG. (Nonzero frequency and sum of absolute regression coefficient for delays 0 to 20; image not reproduced.)

Table 4
Prediction and model size statistics for the IPB soft-sensor.

Algorithm   RMSE     MS    CM
PLS         0.0152   140   0.5738
PLSNNG      0.0145   11    0.3645

6. Conclusion

In soft-sensor development, it is crucial to obtain accurate online tracking predictions of the response variable. However, to ensure that the soft-sensor can be used for purposes other than monitoring, e.g., feedback control, fault isolation, and diagnosis, it is desirable for the soft-sensor model to capture the essential input–output behavior and to assign a disturbance term when the response cannot be explained by the sensor inputs. In this study, it has been demonstrated that whereas a “greedy” algorithm such as PLS can be very good in tracking predictions, it is merely adequate in identifying the key input–output relations and may include many spurious regressor variables, resulting in unreasonable regression coefficients. By incorporating nonstationary disturbance, we developed a soft-sensor modeling algorithm with adaptive PLSNNG. The developed model is able to capture the essential input–output behavior clearly, resulting in more reliable coefficients and fewer spurious regressors. PLSNNG is the preferred technique when process control is the aim, whereas a simple PLS model is good enough when the application is prediction only. The advantages of the proposed method have been demonstrated with an artificial example and with two industrial applications to predict the wall temperature of a blast furnace hearth and to estimate the impurity composition of a distillation column.
Fig. 10. Magnitude of the regression coefficient vector of the PLS model. (Sum of absolute regression coefficient for delays 0 to 20; image not reproduced.)
Acknowledgments S.S. Jang would like to thank the financial support provided by National Science Council, Taiwan under the grant NSC100-2221E-007–058-MY2. D.S.H. Wong would like to acknowledge the support of grant NTHU-101N2072E1 from the Advanced Manufacturing and Service Management Center, National Tsing-Hua University, Taiwan. J.G. Wang would like to acknowledge the National Nature Science Foundation of China under Grant 61171145 and 61074032. References Badhe, Y. P., Lonari, J., Tambe, S. S., & Kulkarni, B. D. (2007). Improve polyethylene process control and product quality. Hydrocarbon Processing, 86, 53–60. Bhartiya, S., & Whiteley, J. R. (2001). Development of inferential measurements using neural networks. ISA Transactions, 40, 307–323. Bhat, S. A., Saraf, D. N., & Gupta, S. A. (2006). Use of agitator power as a soft sensor for bulk free-radical polymerization of methyl methacrylate in batch reactors. Industrial and Engineering Chemistry Research, 45, 4243–4255. Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37, 373–384.
Chun, H., & Keles, S. (2010). Sparse partial least squares regression for simultaneous dimension reduction and variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72, 3–25.
Dayal, S. B., & MacGregor, J. F. (1997). Recursive exponentially weighted PLS and its applications to adaptive control and prediction. Journal of Process Control, 7, 169–179.
Desai, K., Badhe, Y., Tambe, S. S., & Kulkarni, B. D. (2005). Soft-sensor development for fed-batch bioreactors using support vector regression. Biochemical Engineering Journal, 27, 225–239.
Dong, D., & McAvoy, T. J. (1996). Nonlinear principal component analysis based on principal curves and neural networks. Computers and Chemical Engineering, 20, 65–78.
Fortuna, L., Graziani, S., Rizzo, A., & Xibilia, M. G. (2007). Soft sensors for monitoring and control of industrial processes. London, UK: Springer.
Höskuldsson, A. (1988). PLS regression methods. Journal of Chemometrics, 2, 211–228.
Jackson, J. E. (1991). A user's guide to principal components. New York: Wiley.
Kresta, J. V., Marlin, T. E., & MacGregor, J. F. (1996). Development of inferential process models using PLS. Computers and Chemical Engineering, 18, 597–611.
Lee, D., Lee, W., Lee, Y., & Pawitan, Y. (2011). Sparse partial least-squares regression and its applications to high-throughput data analysis. Chemometrics and Intelligent Laboratory Systems, 109, 1–8.
Li, W. H., Yue, H. H., Valle-Cervantes, S., & Qin, S. J. (2000). Recursive PCA for adaptive process monitoring. Journal of Process Control, 10, 471–486.
Ma, M. D., Ko, J. W., Wang, S. J., Jang, S. S., Shieh, S. S., & Wong, D. S. H. (2009). Development of adaptive soft sensor based on statistical identification of key variables. Control Engineering Practice, 17, 1026–1034.
Pan, C. C., Bai, J., Yang, G. K., Jang, S. S., & Wong, D. S. H. (2013). Variable selection for predicting end-of-line property using ergodic PLS nonnegative garrote regression. Journal of Process Control [Submitted for publication].
Pan, T. H., Wong, D. S. H., & Jang, S. S. (2011a). A virtual metrology model based on recursive canonical variate analysis with applications to sputtering process. Journal of Process Control, 21, 830–839.
Pan, T. H., Wong, D. S. H., & Jang, S. S. (2011b). Development of a novel soft sensor using a local model network with an adaptive subtractive clustering approach. Industrial and Engineering Chemistry Research, 49, 4738–4747.
Qin, S. J. (1998). Recursive PLS algorithms for adaptive data modeling. Computers and Chemical Engineering, 22, 503–514.
Qin, S. J., & McAvoy, T. J. (1992). Nonlinear PLS modeling using neural networks. Computers and Chemical Engineering, 16, 379–391.
Shi, R. J., & MacGregor, J. F. (2000). Modeling of dynamic systems using latent variable and subspace methods. Journal of Chemometrics, 14, 423–439.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58, 267–288.
White, D. C. (2003). Creating the smart plant. Hydrocarbon Processing, 82, 41–50.
Yoo, C. K., & Lee, I. B. (2004). Soft sensor and adaptive model-based dissolved oxygen control for biological waste water treatment processes. Environmental Engineering Science, 21, 331–340.
Zhang, P. (1993). Model selection via multifold cross validation. The Annals of Statistics, 21, 299–313.