Chemometrics and Intelligent Laboratory Systems 183 (2018) 147–157
A novel ensemble model using PLSR integrated with multiple activation functions based ELM: Applications to soft sensor development

Xiaohan Zhang a,b,1, Qunxiong Zhu a,b,1, Zhi-Ying Jiang a,b, Yanlin He a,b,*, Yuan Xu a,b,**

a College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China
b Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing, 100029, China
ARTICLE INFO

Keywords: Soft sensor; Extreme learning machine; Ensemble; Multi-activation functions; Partial least squares regression; Process industry

ABSTRACT
Soft sensors play a decisive role in making control strategies and production plans. However, establishing accurate and robust soft sensors with an individual model is increasingly difficult owing to the growing scale and complexity of modeling data. To handle this problem, an effective ensemble model using partial least squares regression (PLSR) integrated with extreme learning machine (ELM) with multiple activation functions (PLSR-MAFELM) is proposed in this paper. The proposed PLSR-MAFELM is simple to construct: firstly, several ELM models assigned different activation functions are trained using the least-squares solution; secondly, the ELM models are combined to enhance accuracy and stability; finally, the optimal ensemble outputs are obtained by aggregating the outputs of the individual ELM models using PLSR. To test the performance of the proposed PLSR-MAFELM model, a UCI benchmark dataset and two real-world applications are selected for simulation case studies. Simulation results show that PLSR-MAFELM achieves good stability and accuracy, indicating that the generalization capability of soft sensors can be improved by combining several single models.
1. Introduction

Intelligent measurement of key process variables plays a crucial role in implementing control strategies and production plans in modern process industries [1,2]. As processes grow larger in scale, it becomes increasingly difficult to measure key variables such as content and product quality directly with hardware analyzers, and purchasing such analyzers is often too costly. Compared with hardware analyzers, the soft sensor is a promising alternative, and soft sensor technology can be utilized as an effective way to tackle this problem. Soft sensor modeling methods are usually divided into two categories: mechanism modeling [3,4] and data-driven modeling [5,6]. Mechanism models are established by analyzing the process mechanism, which requires extensive process knowledge and practical experience; with the increasing complexity of processes, mechanism modeling is very costly and sometimes infeasible. Data-driven models, in contrast, are developed by learning the relationship between inputs and outputs, and data-driven approaches have been attracting increasing attention from researchers and
engineers. The artificial neural network (ANN) is very popular in the field of data-driven modeling. ANNs offer self-learning ability, self-adaptation ability, and strong nonlinear mapping ability [7], so ANN-based soft sensor models have been successfully applied to highly nonlinear and severely uncertain systems. He et al. adopted the ANN technique to develop soft sensors for complex chemical processes [8]. Cong and Yu developed ANN-based soft sensors for estimating water quality in a wastewater treatment process [9]. Although neural network based soft sensors have been applied successfully, the accuracy and stability of a single ANN model are not as good as expected, for two reasons: 1) changes in the training dataset, the number of hidden layer nodes, or the activation function can cause obvious changes in the outputs; 2) with the increasing scale of process industries, the collected process data tend to be more and more complex, with strong coupling and high nonlinearity. Thus, an individual model has insufficient ability to achieve accurate and robust performance. Some techniques can be used to enhance the generalization performance of ANNs, such as regularization and ensembles [10,11]; among these, ensemble learning appears particularly effective.
* Corresponding author. Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing, 100029, China.
** Corresponding author. Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing, 100029, China.
E-mail addresses: [email protected] (Y. He), [email protected] (Y. Xu).
1 These authors contributed to the work equally and should be regarded as co-first authors.
https://doi.org/10.1016/j.chemolab.2018.10.016
Received 1 August 2018; Received in revised form 30 October 2018; Accepted 31 October 2018; Available online 12 November 2018
0169-7439/© 2018 Elsevier B.V. All rights reserved.
An ANN ensemble was first presented by Hansen and Salamon [12]. Ensemble learning refers to learning the same problem with a finite set of individual neural networks (also known as subnets) and then aggregating the outputs of the individual ANN models. Generalization capability and stability can be much enhanced by aggregating the outputs of several single ANN models. Moreover, previous studies indicated that, because of this complementary strategy, an ANN ensemble model can achieve better performance than an individual ANN model on the same problem [13,14]. Owing to this capability, the ensemble learning strategy has continuously attracted attention from researchers in recent years.

In this paper, an effective ANN model called the extreme learning machine (ELM) is selected. ELM is a feed-forward ANN with only one hidden layer [15]. Unlike back-propagation based ANNs, it requires no initial parameters such as the learning rate or the expected error. In ELM, the input weights and biases are generated randomly, and an extremely fast, optimal solution is used to calculate the weights between the hidden layer and the output layer [16,17]; therefore, extremely fast learning speed can be achieved [18,19]. ELM has been widely used as an effective tool for soft sensor development. In addition, the activation functions of ELM can be assigned using any continuous or discontinuous functions. Because of these salient features, ELM is selected as the individual ANN model for the ensemble study in this paper.

The output of an ensemble model is usually calculated through a simple or weighted average [20]. One ELM-based ensemble model was developed by training several single ELM models with random input parameters and random hidden nodes and taking the median value as the overall model output [21]. Using the median value as the overall ensemble output may be inaccurate, leaving the ensemble model with large uncertainty and collinearity. To deal with this problem, the partial least squares regression (PLSR) algorithm is adopted to obtain the ensemble output by establishing an optimal model between the outputs of the individual models and the ensemble outputs. PLSR is good at handling multicollinearity and noisy data, so the generalization performance of ensemble models can be improved by using PLSR [22-24]. Hence, a novel ensemble model integrating partial least squares regression with multiple activation functions based ELM (PLSR-MAFELM) is proposed. In the proposed PLSR-MAFELM, five functions, tanh, sin, Sigmoid, cos, and Gaussian, are adopted as activation functions. The sin and cos functions are selected for handling process data with periodicity; the nonlinear functions Sigmoid, tanh, and Gaussian are selected for their successful applications as activation functions in various kinds of ANNs.

The proposed PLSR-MAFELM model is easy to build. Firstly, several single ELM models are trained with the five selected activation functions; ELM models with randomly assigned input weights and different nonlinear activation functions increase the diversity of the individual models for the ensemble. Secondly, the ELM models are combined to build an ensemble model: the optimal ensemble outputs are obtained by finding an optimal regression relationship between the outputs of the individual models and the expected outputs using partial least squares regression. Through this ensemble strategy, the proposed PLSR-MAFELM may obtain high accuracy and good stability. To test its performance, three case studies using a UCI benchmark dataset and two real-world applications are implemented; simulation results show that the proposed PLSR-MAFELM achieves good stability and high accuracy.

The remainder of this paper is organized as follows: Section 2 provides preliminaries on the basic ELM model and PLSR; the construction of the proposed PLSR-MAFELM is detailed in Section 3; simulations using a UCI benchmark dataset and two complex chemical processes are implemented and the results are analyzed in Section 4; concluding remarks are drawn in Section 5.

Fig. 1. Structure of ELM.
2. Preliminaries

The proposed soft sensor model is based on the extreme learning machine (ELM) and partial least squares regression (PLSR). This section provides preliminaries on both.

2.1. ELM

ELM, with its extremely fast training speed, was presented in 2006 [25] and has been popular with researchers in recent years. Fig. 1 shows the ELM structure: ELM is a three-layer ANN. Its learning algorithm is totally different from the traditional error back propagation (EBP) algorithm. The input weights and biases are generated randomly, and an extremely fast, optimal solution is used to calculate the weights connecting the hidden layer and the output layer [26]. In the EBP algorithm, initial parameters such as the learning rate, the expected error, and the number of iterations must be determined optimally, and unsuitable parameters degrade the performance of EBP-based ANNs. Compared with EBP, flaws such as tuning initial parameters and falling into local optima are avoided in ELM.

The learning process of ELM is as follows. Suppose a training dataset with $I$ different samples $S = \{(x_i, y_i) \mid i = 1, 2, \ldots, I\} \subset \mathbb{R}^n \times \mathbb{R}^m$ is available, and the number of hidden nodes is $K$. Eq. (1) gives the output of ELM:

$\theta(x_i) = \sum_{k=1}^{K} \beta_k \, g(W_k \cdot x_i + b_k)$   (1)
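As a concrete illustration of the forward pass in Eq. (1) and the least-squares training that follows, a minimal ELM can be sketched in NumPy. This is only a sketch: the sigmoid activation, node count, and toy regression data are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, Y, K, g):
    """Train a basic ELM: random hidden parameters, least-squares output weights."""
    n = X.shape[1]
    W = rng.standard_normal((n, K))   # random input weights W_k (never trained)
    b = rng.standard_normal(K)        # random hidden biases b_k
    H = g(X @ W + b)                  # hidden layer output matrix H (I x K)
    beta = np.linalg.pinv(H) @ Y      # Eq. (3): beta = H+ Y, minimal-norm LS solution
    return W, b, beta

def elm_predict(X, W, b, beta, g):
    """Eq. (1): theta(x) = sum_k beta_k g(W_k . x + b_k)."""
    return g(X @ W + b) @ beta

# illustrative toy regression problem
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W, b, beta = elm_train(X, y, K=50, g=sigmoid)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta, sigmoid) - y) ** 2))
```

Because the hidden parameters are fixed at random values, the only fitted quantity is `beta`, which is why training reduces to a single pseudoinverse.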
where $W_k = (w_{k1}, w_{k2}, \ldots, w_{kn})^T$ are the input connection weights, $b_k$ is the bias of the corresponding hidden node, $\beta_k = (\beta_{k1}, \beta_{k2}, \ldots, \beta_{km})^T$ are the output connection weights, $W_k \cdot x_i$ denotes the inner product, and $g(\cdot)$ is the activation function. Eq. (2) is the matrix form of Eq. (1):

$H\beta = Y$   (2)

where

$H(W, x, b) = \begin{pmatrix} g(W_1 \cdot x_1 + b_1) & \cdots & g(W_K \cdot x_1 + b_K) \\ \vdots & \ddots & \vdots \\ g(W_1 \cdot x_I + b_1) & \cdots & g(W_K \cdot x_I + b_K) \end{pmatrix}_{I \times K}$

is the hidden layer output matrix, $\beta = (\beta_1^T, \beta_2^T, \ldots, \beta_K^T)^T_{K \times m}$, and $Y = (y_1^T, y_2^T, \ldots, y_I^T)^T_{I \times m}$. A random generation manner is adopted for the learning parameters
of biases and weights, and the minimal 2-norm least-squares solution is used to obtain the output weights $\beta$:

$\hat{\beta} = H^{+} Y$   (3)

where $H^{+}$ is the Moore-Penrose generalized inverse of $H$. Thus, the ELM learning procedure is summarized as:
(1) Generate the input weights and biases randomly.
(2) Obtain the hidden layer output matrix $H$.
(3) Use the least-squares solution to obtain the output weights $\beta$: $\beta = H^{+} Y$.

Although ELM has salient features, the individual ELM model, the collinearity of its outputs, and the single activation function assigned to it still limit performance. A novel ensemble model based on ELM models with multiple activation functions is therefore proposed to enhance the performance of the individual ELM model.

Fig. 2. Structure of the ensemble model.

2.2. PLSR

PLSR is a multivariate statistical data analysis method proposed by Wold, Albano, and co-workers in 1983 [27]. Over the past decades, PLSR has developed rapidly in theory, methods, and applications, and can be described as a second-generation regression analysis method [28,29]. PLSR takes two factors into consideration: it extracts principal components of the input X and the output Y, and it maximizes the correlation between the components extracted from X and Y. In brief, PLSR is a combination of three basic algorithms: principal component analysis (PCA), canonical correlation analysis (CCA), and multiple linear regression. In particular, when the variables are highly collinear, the PLSR method is effective. The algorithm principle of PLSR is discussed below.
Table 1. Input factors of the HDPE dataset.

No. | Description
1 | Ethylene rate
2 | Catalyst feed rate
3 | Flash tank pressure
4 | Water in the ethane
5 | Reverse reactor pressure
6 | Hydrogen partial pressure ratio
7 | Reverse hydrogen ethylene partial pressure ratio
8 | Methane rate
9 | Ethylene partial pressure
10 | Reverse reactor temperature
11 | Ethylene partial pressure reactor
12 | Reactor temperature
13 | Reactor pressure
14 | Ethylene vinyl carbon monoxide content of impurities
15 | Catalyst feed rate

Table 2. Input factors of the PTA dataset.

No. | Description
1 | Steam flow
2 | Feed quantity
3 | Water reflux
4 | Reflux tank level
5 | Reflux temperature
6 | N-butyl acetate side reflux
7 | N-butyl acetate main reflux
8 | Feed temperature
9 | Temperature of top tower
10 | Temperature point above the 35th tray
11 | Tray temperature near the low sensitive plate
12 | Tray temperature near the up sensitive plate
13 | Temperature point between the 35th tray and 40th tray
14 | Temperature point between the 44th tray and the 50th tray
15 | Temperature point between the 53rd tray and the 58th tray
16 | Produced quantity of top tower
17 | Feed composition (acetic acid content)

Table 3. Information of the three selected datasets.

Dataset | Input dimension | Output dimension
CCS | 8 | 1
HDPE | 15 | 1
PTA | 17 | 1
2.2.1. Extract the first pair of components and make them most relevant

Consider a dataset with $m$ independent variables $X_1, X_2, \ldots, X_m$ and $p$ dependent variables $Y_1, Y_2, \ldots, Y_p$, all standardized. $T$ and $U$ are the components extracted from the independent and dependent variables, respectively; the extracted components are often referred to as partial least squares factors. $T_1$ and $U_1$ are given by:
Fig. 3. Construction flowchart of the proposed PLSR-MAFELM model.
$T_1 = \omega_{11} X_1 + \omega_{12} X_2 + \ldots + \omega_{1m} X_m = X \omega_1$   (4)

$U_1 = \upsilon_{11} Y_1 + \upsilon_{12} Y_2 + \ldots + \upsilon_{1p} Y_p = Y \upsilon_1$   (5)

where $\omega_1 = (\omega_{11}, \omega_{12}, \ldots, \omega_{1m})^T$ are the model effect weights and $\upsilon_1 = (\upsilon_{11}, \upsilon_{12}, \ldots, \upsilon_{1p})^T$ are the dependent variable weights. To ensure that $T_1$ and $U_1$ extract as much of the variation information of their variable groups as possible while the correlation between $T_1$ and $U_1$ is the largest, the extraction of the first component is transformed into a conditional extremum problem:
$\langle t, u \rangle = \langle X_0 \omega_1, Y_0 \upsilon_1 \rangle = \omega_1^T X_0^T Y_0 \upsilon_1$   (6)

$\omega_1^T \omega_1 = \|\omega_1\|^2 = 1, \quad \upsilon_1^T \upsilon_1 = \|\upsilon_1\|^2 = 1$   (7)

where $t_1$ and $u_1$ are the score vectors of the first pair of components obtained from the samples, and $X_0$ and $Y_0$ are the initial variables. Using the Lagrangian multiplier method, the problem becomes finding unit vectors $\omega_1$ and $\upsilon_1$ that maximize $\theta_1 = \omega_1^T X_0^T Y_0 \upsilon_1$, solved by computing the eigenvalues and eigenvectors of the matrix $X_0^T Y_0 Y_0^T X_0$: $\theta_1$ is the maximum eigenvalue, the corresponding unit eigenvector is the solution $\omega_1$, and $\upsilon_1$ is obtained from $\upsilon_1 = \frac{1}{\theta_1} Y_0^T X_0 \omega_1$.

2.2.2. Establish the equations from the initial variables to $T_1$

$X_0 = t_1 a_1^T + E_1$   (8)

$Y_0 = t_1 \beta_1^T + F_1$   (9)

where $a_1^T = (a_{11}, \ldots, a_{1m})$ and $\beta_1^T = (\beta_{11}, \ldots, \beta_{1p})$ are the parameter vectors for the single argument $t_1$; $E_1$ and $F_1$ are the $n \times m$ and $n \times p$ residual matrices of the input and output spaces, respectively. The coefficient vectors $a_1$ and $\beta_1$ can be obtained by ordinary least squares (OLS); $a_1$ is the model effect loading.

Fig. 4. Hidden layer node number determination of individual ELM models for CCS.

Table 4. Testing simulation results for the CCS benchmark dataset.

Methods | ARE | RMSE | SD
FLNN | 0.199 | 7.124 | 0.203
BPNN | 0.161 | 7.164 | 0.382
ELM (Sigmoid) | 0.163 | 6.646 | 0.262
ELM (tanh) | 0.168 | 7.328 | 0.246
ELM (Gaussian) | 0.163 | 6.897 | 0.214
ELM (cos) | 0.173 | 7.045 | 0.291
ELM (sin) | 0.161 | 6.633 | 0.237
ELM ensemble [22] | 0.154 | 6.080 | 0.193
PLSR-MAFELM | 0.141 | 6.181 | 0.159

Fig. 5. Testing prediction distributions of CCS.
Fig. 6. Testing measurement errors of CCS.
Fig. 7. Hidden layer node number determination of individual ELM models for HDPE.
2.2.3. Obtain the final regression relationship

If the first extracted component does not reach the required accuracy of the regression model, the residual matrices $E_1$ and $F_1$ are used instead of $X_0$ and $Y_0$, and Steps 1 and 2 are repeated to extract further components. Suppose $r$ components are finally extracted; the regression equations of $X_0$ and $Y_0$ on the $r$ components are:

$X_0 = t_1 a_1^T + \ldots + t_r a_r^T + E_r$   (10)

$Y_0 = t_1 \beta_1^T + \ldots + t_r \beta_r^T + F_r$   (11)

Combining the previous equations yields the regression of the standardized variables, $Y_j = a_{j1} X_1 + \ldots + a_{jm} X_m$, which is then transformed back to the original variables. The details of PLSR can be found in Ref. [30]. The salient features of PLSR are mainly: (1) PLSR is a regression modeling method of multiple dependent variables on multiple independent variables; (2) PLSR extracts the most comprehensive explanatory components and separates information from noise by decomposing and filtering the information; (3) PLSR supports the comprehensive application of multiple data analysis methods.
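The extraction-and-deflation procedure of Sections 2.2.1-2.2.3 can be sketched for a single output (PLS1) as follows. This is a minimal sketch: the toy collinear data and variable names are illustrative, and it uses the fact that, for one output, the dominant eigenvector of $X_0^T Y_0 Y_0^T X_0$ is simply the normalized $X_0^T y_0$.

```python
import numpy as np

def pls1(X, y, r):
    """PLS1: extract r components with deflation (Sections 2.2.1-2.2.3)."""
    xm, ym = X.mean(axis=0), y.mean()
    Xd, yd = X - xm, y - ym                 # center the variables (scaling omitted)
    W, P, q = [], [], []
    for _ in range(r):
        w = Xd.T @ yd                       # for one output, the dominant eigenvector of
        w /= np.linalg.norm(w)              #   X0' y0 y0' X0 is just X0' y0, normalized
        t = Xd @ w                          # score vector (Eq. (4))
        a = Xd.T @ t / (t @ t)              # model effect loading a via OLS (Eq. (8))
        beta = (yd @ t) / (t @ t)           # output loading beta via OLS (Eq. (9))
        Xd = Xd - np.outer(t, a)            # deflate: residual E replaces X0
        yd = yd - beta * t                  # deflate: residual F replaces Y0
        W.append(w); P.append(a); q.append(beta)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)     # coefficients back in the original X space
    return B, xm, ym

# collinear toy data: the third column is a noisy copy of the first
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
X = np.column_stack([X, X[:, 0] + 1e-3 * rng.standard_normal(100)])
y = 2 * X[:, 0] - X[:, 1]
B, xm, ym = pls1(X, y, r=3)
pred = (X - xm) @ B + ym
```

The deflation steps keep the components mutually orthogonal, which is why the method stays well behaved even though the third predictor is almost an exact copy of the first.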
Fig. 8. Hidden layer node number determination of individual ELM models for PTA.
Table 5. Testing performance comparison for HDPE.

Methods | ARE (%) | RMSE (%) | SD (%)
FLNN | 0.038 | 0.070 | 0.058
BPNN | 0.046 | 0.118 | 0.034
ELM (Sigmoid) | 0.163 | 6.646 | 0.262
ELM (tanh) | 0.168 | 7.328 | 0.246
ELM (Gaussian) | 0.163 | 6.897 | 0.214
ELM (cos) | 0.173 | 7.045 | 0.291
ELM (sin) | 0.161 | 6.633 | 0.237
ELM ensemble [22] | 0.042 | 0.115 | 0.106
PLSR-MAFELM | 0.029 | 0.043 | 0.031

Table 6. Testing performance comparison for PTA.

Methods | ARE (%) | RMSE | SD (%)
FLNN | 0.544 | 0.305 | 0.318
BPNN | 0.585 | 0.384 | 0.399
ELM (Sigmoid) | 0.561 | 0.432 | 0.666
ELM (tanh) | 0.538 | 0.328 | 0.409
ELM (Gaussian) | 0.488 | 0.378 | 0.598
ELM (cos) | 0.546 | 0.445 | 0.721
ELM (sin) | 0.577 | 0.498 | 0.831
ELM ensemble [22] | 0.443 | 0.304 | 0.440
PLSR-MAFELM | 0.367 | 0.228 | 0.298

PLSR is good at dealing with multicollinearity and noisy information; thus, PLSR is used to obtain the ensemble outputs by establishing an optimal regression model between the outputs of the individual models and the expected outputs.

3. The proposed PLSR-MAFELM model

A single ELM model cannot meet the accuracy and stability requirements of soft sensors. To handle this problem, a novel ensemble model integrating PLSR with multiple activation functions based ELM (PLSR-MAFELM) is proposed in this paper. Fig. 2 shows the topology of the proposed PLSR-MAFELM, which has several salient features: first, commonly used activation functions are adopted across the ELM models to increase the diversity of the individual models; second, several single ELM models are combined; third, PLSR is adopted to aggregate the outputs of the individual ELM models to achieve optimal ensemble outputs. In the proposed model, five widely used activation functions, cos, sin, Gaussian, tanh, and Sigmoid, are assigned to the five single ELM models, respectively. Because there is collinearity among the individual models' outputs, PLSR is used to deal with the collinearity. Finally, the proposed PLSR-MAFELM model is established. The steps of constructing the proposed PLSR-MAFELM are provided below.

Assume the data collected from the process are $S = \{(X_i, Y_i) \mid i = 1, 2, \ldots, I\}$, where $X_i = [x_{i1}, x_{i2}, \ldots, x_{in}] \in \mathbb{R}^n$ is the $i$th input sample, $x_{in}$ is the $n$th element of $X_i$, and $Y_i \in \mathbb{R}$ is the single output. The collected data are randomly separated into a training set $S_{tr} = \{(X_{tr}, Y_{tr}) \mid tr = 1, 2, \ldots, N_{tr},\ X_{tr} \in \mathbb{R}^{N_{tr} \times n},\ Y_{tr} \in \mathbb{R}^{N_{tr} \times 1}\}$ and a testing set $S_{te} = \{(X_{te}, Y_{te}) \mid te = 1, 2, \ldots, N_{te},\ X_{te} \in \mathbb{R}^{N_{te} \times n},\ Y_{te} \in \mathbb{R}^{N_{te} \times 1}\}$, with $I = N_{tr} + N_{te}$.

Step 1. Data preprocessing

The input and output attributes are normalized to the same order of magnitude using Eq. (12) and Eq. (13), respectively:

$\bar{x}_{in} = \frac{x_n^{max} - x_{in}}{x_n^{max} - x_n^{min}} \quad (i = 1, 2, \ldots, I)$   (12)

$\bar{Y}_i = \frac{Y_i - Y^{min}}{Y^{max} - Y^{min}} \quad (i = 1, 2, \ldots, I)$   (13)
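The Step 1 scaling of Eqs. (12) and (13) can be expressed as below. This is a minimal sketch with toy data; note that, as printed, the input scaling of Eq. (12) is reversed relative to the usual min-max form, mapping each column maximum to 0 and minimum to 1, and the sketch mirrors the printed form.

```python
import numpy as np

def normalize_inputs(X):
    """Eq. (12), as printed: (x_max - x) / (x_max - x_min), column-wise."""
    xmax, xmin = X.max(axis=0), X.min(axis=0)
    return (xmax - X) / (xmax - xmin)

def normalize_outputs(Y):
    """Eq. (13): standard min-max scaling (Y - Y_min) / (Y_max - Y_min)."""
    return (Y - Y.min()) / (Y.max() - Y.min())

X = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0]])
Xn = normalize_inputs(X)          # every column now lies in [0, 1]
```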
where $x_n^{max} = \max\{x_{1n}, x_{2n}, \ldots, x_{In}\}$, $x_n^{min} = \min\{x_{1n}, x_{2n}, \ldots, x_{In}\}$, $x_n^{max} \neq x_n^{min}$; $Y^{min}$ and $Y^{max}$ are the minimum and maximum values of the output vector $Y$.

Fig. 9. Testing prediction distributions of HDPE.

Fig. 10. Testing prediction distributions of PTA.

Fig. 11. Testing errors of HDPE.

Fig. 12. Testing errors of PTA.

Step 2. Generate five input weight matrices $W^1, W^2, W^3, W^4, W^5$ randomly, where $W^1 = (w^1_{1a}, w^1_{2a}, \ldots, w^1_{na})^T$, $a = 1, 2, \ldots, A$; $W^2 = (w^2_{1b}, w^2_{2b}, \ldots, w^2_{nb})^T$, $b = 1, 2, \ldots, B$; $W^3 = (w^3_{1c}, w^3_{2c}, \ldots, w^3_{nc})^T$, $c = 1, 2, \ldots, C$; $W^4 = (w^4_{1d}, w^4_{2d}, \ldots, w^4_{nd})^T$, $d = 1, 2, \ldots, D$; $W^5 = (w^5_{1e}, w^5_{2e}, \ldots, w^5_{ne})^T$, $e = 1, 2, \ldots, E$. The hidden node numbers of the five ELM models assigned the activation functions $\sin(x)$, $\cos(x)$, $\tanh(x)$, $\frac{1}{1+e^{-x}}$, and $\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ are $A$, $B$, $C$, $D$, and $E$, respectively. The trial-and-error method is used to determine the optimal hidden node numbers. The hidden layer inputs of the five ELM models are calculated as:

$X^1_{tr} = X_{tr} W^1 = (x^1_{tr,a})_{N_{tr} \times A}$   (14)

$X^2_{tr} = X_{tr} W^2 = (x^2_{tr,b})_{N_{tr} \times B}$   (15)

$X^3_{tr} = X_{tr} W^3 = (x^3_{tr,c})_{N_{tr} \times C}$   (16)

$X^4_{tr} = X_{tr} W^4 = (x^4_{tr,d})_{N_{tr} \times D}$   (17)

$X^5_{tr} = X_{tr} W^5 = (x^5_{tr,e})_{N_{tr} \times E}$   (18)
Step 3. Assign the five activation functions and obtain the hidden layer outputs

For the five single ELM models, $\sin(x)$, $\cos(x)$, $\tanh(x)$, $\frac{1}{1+e^{-x}}$, and $\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ are assigned, respectively. Then, five hidden layer output matrices can be obtained as follows:

$H^1_{tr} = \left(g(x^1_{tr,1} + t^1_1) \cdots g(x^1_{tr,A} + t^1_A)\right)_{N_{tr} \times A}$   (19)

$H^2_{tr} = \left(g(x^2_{tr,1} + t^2_1) \cdots g(x^2_{tr,B} + t^2_B)\right)_{N_{tr} \times B}$   (20)

$H^3_{tr} = \left(g(x^3_{tr,1} + t^3_1) \cdots g(x^3_{tr,C} + t^3_C)\right)_{N_{tr} \times C}$   (21)

$H^4_{tr} = \left(g(x^4_{tr,1} + t^4_1) \cdots g(x^4_{tr,D} + t^4_D)\right)_{N_{tr} \times D}$   (22)

$H^5_{tr} = \left(g(x^5_{tr,1} + t^5_1) \cdots g(x^5_{tr,E} + t^5_E)\right)_{N_{tr} \times E}$   (23)

where $t^1_a, t^2_b, t^3_c, t^4_d, t^5_e$ represent the hidden node biases of the corresponding five single ELM models, $a = 1, 2, \ldots, A$; $b = 1, 2, \ldots, B$; $c = 1, 2, \ldots, C$; $d = 1, 2, \ldots, D$; $e = 1, 2, \ldots, E$.

Step 4. Obtain the output weights of the five ELM models

$\beta^1_{tr} = (H^1_{tr})^{+} Y_{tr}$   (24)

$\beta^2_{tr} = (H^2_{tr})^{+} Y_{tr}$   (25)

$\beta^3_{tr} = (H^3_{tr})^{+} Y_{tr}$   (26)

$\beta^4_{tr} = (H^4_{tr})^{+} Y_{tr}$   (27)

$\beta^5_{tr} = (H^5_{tr})^{+} Y_{tr}$   (28)

where $(\cdot)^{+}$ denotes the Moore-Penrose generalized inverse of the hidden layer output matrix and $Y_{tr}$ are the actual outputs of the corresponding training samples. The least-squares solution is thus adopted to train each single ELM model through Eqs. (24)-(28).

Step 5. Calculate the outputs of the individual ELM models for the training samples $S_{tr}$:

$\hat{Y}^1_{tr} = H^1_{tr} \beta^1_{tr}$   (29)

$\hat{Y}^2_{tr} = H^2_{tr} \beta^2_{tr}$   (30)

$\hat{Y}^3_{tr} = H^3_{tr} \beta^3_{tr}$   (31)

$\hat{Y}^4_{tr} = H^4_{tr} \beta^4_{tr}$   (32)

$\hat{Y}^5_{tr} = H^5_{tr} \beta^5_{tr}$   (33)

For the testing samples $S_{te}$, the outputs of the individual ELM models $\hat{Y}^1_{te}, \hat{Y}^2_{te}, \hat{Y}^3_{te}, \hat{Y}^4_{te}, \hat{Y}^5_{te}$ are calculated in the same manner.

Step 6. Aggregate the outputs of the five individual ELM models by establishing an optimal regression model between the outputs of each single ELM model and the expected outputs using PLSR:

$\hat{Y}_{tr} = \mu_1 \hat{Y}^1_{tr} + \mu_2 \hat{Y}^2_{tr} + \mu_3 \hat{Y}^3_{tr} + \mu_4 \hat{Y}^4_{tr} + \mu_5 \hat{Y}^5_{tr}$   (34)

Step 7. For the testing samples $S_{te}$, the predictions of the proposed PLSR-MAFELM ensemble model $\hat{Y}_{te}$ are obtained with the coefficients of Eq. (34):

$\hat{Y}_{te} = \mu_1 \hat{Y}^1_{te} + \mu_2 \hat{Y}^2_{te} + \mu_3 \hat{Y}^3_{te} + \mu_4 \hat{Y}^4_{te} + \mu_5 \hat{Y}^5_{te}$   (35)

Step 8. Evaluate the performance of the proposed PLSR-MAFELM

To evaluate the performance, the root mean square error (RMSE), the average relative error (ARE), and the standard deviation (SD) are calculated using Eqs. (36)-(38), respectively:

$RMSE = \sqrt{\frac{1}{N_{te}} \sum_{te=1}^{N_{te}} \left(Y_{te} - \hat{Y}_{te}\right)^2}$   (36)

$ARE = \frac{1}{N_{te}} \sum_{te=1}^{N_{te}} \left| \frac{Y_{te} - \hat{Y}_{te}}{Y_{te}} \right|$   (37)

$SD = \sqrt{\frac{1}{N_{te}} \sum_{te=1}^{N_{te}} \left(Y_{te} - \mu\right)^2}$   (38)

where $N_{te}$ is the number of testing samples, $Y_{te}$ the actual value, $\hat{Y}_{te}$ the prediction of $Y_{te}$, and $\mu$ the arithmetic mean of $Y_{te}$. PLSR, with its good ability to deal with collinearity and noise, is adopted to aggregate the outputs of the individual models. Based on the analyses above, the construction flowchart of the proposed PLSR-MAFELM model is shown in Fig. 3.

4. Case study

The PLSR-MAFELM performance is validated using one UCI benchmark dataset and two real-world industrial datasets. For comparison, five individual ELM models based on the different activation functions, the ELM ensemble model proposed in Ref. [21], the back propagation neural network (BPNN), and the functional link neural network (FLNN) are also developed. In each case study, 10-fold cross-validation is carried out: the training data are randomly divided into ten folds, each fold serves in turn as the validation set, and the average of the ten results indicates the accuracy of the models.

4.1. Data descriptions

To confirm the effectiveness of PLSR-MAFELM, three datasets are selected: one UCI benchmark dataset and two industrial datasets. The benchmark dataset, Concrete Compressive Strength (CCS), is obtained from the University of California, Irvine (UCI) repository at http://archive.ics.uci.edu/ml/datasets.html. The two real-world industrial datasets are collected from a Purified Terephthalic Acid (PTA) process and a High Density Polyethylene (HDPE) process, respectively. The CCS dataset consists of 1030 samples with 8 input factors and 1 output factor. High density polyethylene is produced by the HDPE process. In the HDPE process, a key process
errors of the other models, the proposed PLSR-MAFELM's testing errors are closer to the zero line. Simulation results of two industrial datasets also verify that the proposed PLSR-MAFELM's performance is more accurate and stable than other models.
variable named ethylene unit consumption should be predicted. From mechanism analyses, 15 variables shown in Table 1 are selected as input factors. Altogether 135 HDPE samples are collected. For the PTA process, the acetic acid consumption is necessary to predict accurately. Through mechanism analyses, 17 variables shown in Table 2 closely related to the acetic acid consumption are selected as the input factors. Altogether 260 PTA samples are collected. The selected three datasets are typical regression datasets with high nonlinearity and complexity. The information of the selected datasets is shown in Table 3.
4.3. Summary of simulations Simulation results of one UCI dataset and two real-world industrial dataset validate that the proposed PLSR-MAFELM achieves good accuracy and stability performance. The individual model performance is not as good as expected indicating that the performance of the individual model is unstable. In the proposed PLSR-MAFELM model, a complementary role is played by integrating different activation functions based ELM models for identifying complex nonlinear regression exactly. In addition, PLSR with good ability in dealing with collinearity and noise is utilized to aggregate the outputs of individual models for obtaining optimal ensemble outputs. Hence, the proposed PLSR-MAFELM ensemble model can achieve better performance than other models.
4.2. Simulation results analyses To validate the PLSR-MAFELM performance, five individual ELM ! models with cosðxÞ; sinðxÞ; tanhðxÞ;
1 pffiffiffiffi exp 2π σ
μÞ ðx 2σ 2
2
; 1þe1 x , an ELM
ensemble model, the BPNN, and the FLNN are developed. First, appropriate parameters of these models should be firstly selected. The optimal parameters can be determined when the corresponding relative errors reach the least. Finally, ARE, RMSE and SD are calculated.
5. Conclusions
4.2.1. Simulation result analyses of the CCS dataset Fig. 4 shows the performance in terms of the relative error for the CCS dataset. From Fig. 4, when the hidden layer node number is 160, the corresponding relative error of the ELM model with the sinðxÞ activation function reaches the least. In other words, the optimal hidden node number of the corresponding ELM model can be assigned as 160. Similarly, the hidden node number for the other ELM models with activation functions of cos, tanh, Sigmoid, and Gaussian is determined as 160, 170, 165 and 185, respectively. The proposed PLSR-MAFELM model is established based on the optimal structure of ELM models. The testing simulation performance for CCS is shown in Table 4. It can be seen that the proposed PLSR-MAFELM can achieve much smaller value in terms of ARE, RMSE and SD than other models. To further show the generalization performance, CCS testing prediction distributions and testing errors of the proposed PLSRMAFELM model, the ensemble model, the BPNN model, the FLNN model, and the ELM (sin) model (the individual model with the best performance) are provided in Fig. 5 and Fig. 6, respectively. From Fig. 5, the proposed PLSR-MAFELM model outputs are closer to the actual values than those of other models. From Fig. 6, the testing errors of the proposed PLSR-MAFELM are much closer to the zero line than those of other models. Higher accuracy and better stability can be obtained by the proposed PLSR-MAFELM than other models based on the simulation results.
In order to build reliable soft sensor models, a novel ensemble model integrating partial least squares regression with multiple activation functions based ELM (PLSR-MAFELM) is presented. The proposed PLSRMAFELM model can be easily established: firstly, assign different activation functions on ELM models and train the individual ELM models; secondly, aggregate the outputs of individual ELM models by using PLSR. The performance of the proposed PLSR-MAFELM model is verified by one benchmark dataset and two real-world industrial datasets. Simulation results indicated the proposed PLSR-MAFELM could obtain good performance in not only accuracy but also stability. The proposed PLSRMAFELM model can serve as an effective regression tool for reliable and intelligent soft sensor. In the future research work, other kinds of aggregating methods will be studied and utilized. Acknowledgements This research is supported by the National Natural Science Foundation of China under Grant Nos. 61703027 and 61533003, the Fundamental Research Funds for the Central Universities under Grant Nos. JD1808 and XK1802-4. Appendix A. Supplementary data Supplementary data to this article can be found online at https://doi. org/10.1016/j.chemolab.2018.10.016.
4.2.2. Simulation result analyses of industrial datasets

Fig. 7 and Fig. 8 show the performance in terms of the relative error for the HDPE and PTA datasets, respectively. Similar to the node number determination in the CCS simulation, the optimal hidden node numbers of the five corresponding ELM models for the HDPE dataset are determined as 65, 60, 55, 65 and 70, respectively. From Fig. 8, the optimal hidden node numbers of the five corresponding ELM models for the PTA dataset are determined as 55, 60, 35, 50 and 60, respectively. The proposed PLSR-MAFELM model can then be constructed.

The testing performance for the HDPE and PTA datasets is shown in Table 5 and Table 6, respectively. It can be concluded that the proposed PLSR-MAFELM model achieves much smaller values of ARE, RMSE and SD than the other models. To further illustrate the performance, the HDPE and PTA testing prediction distributions of the proposed PLSR-MAFELM model, the ensemble model, the BPNN model, the FLNN model, and the individual model with the best performance are provided in Fig. 9 and Fig. 10, respectively. From Figs. 9 and 10, the outputs of the proposed PLSR-MAFELM are closer to the actual values than those of the other models. The HDPE and PTA testing errors are provided in Fig. 11 and Fig. 12, respectively. From Figs. 11 and 12, compared with the testing errors of the other models, those of the proposed PLSR-MAFELM lie much closer to the zero line.
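The PLSR aggregation step that turns the individual ELM predictions into the ensemble output can be sketched in plain numpy: the predictions of the individual models form the input block, a PLS1 regression (NIPALS with deflation) maps them to the target, and the fitted coefficients produce the ensemble prediction. The simulated model outputs, the noise levels and the component number are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model via NIPALS deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)           # weight vector
        t = Xc @ w                       # scores
        tt = t @ t
        p = Xc.T @ t / tt                # X loadings
        qk = (yc @ t) / tt               # y loading
        Xc = Xc - np.outer(t, p)         # deflate X block
        yc = yc - qk * t                 # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression coefficient vector
    return B, x_mean, y_mean

def pls1_predict(model, X):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean

rng = np.random.default_rng(1)
y = rng.uniform(10.0, 20.0, 200)         # "actual" target values
# simulated outputs of five individual ELM models: target plus a
# model-specific bias and noise
Z = np.column_stack([y + bias + 0.3 * rng.standard_normal(200)
                     for bias in (-0.5, 0.2, 0.4, -0.1, 0.3)])

model = pls1_fit(Z[:150], y[:150], n_components=3)
y_hat = pls1_predict(model, Z[150:])
rmse = np.sqrt(np.mean((y[150:] - y_hat) ** 2))
print("ensemble testing RMSE:", rmse)
```

Because PLSR works on latent components, it tolerates the strong collinearity between the individual model outputs, which is the motivation for using it rather than ordinary least squares as the aggregator.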