Soft Sensor Modeling of Industrial Process Data Using Kernel Latent Variables-Based Relevance Vector Machine

Hongbin Liu 1,*, Chong Yang 1,**, Mingzhi Huang 2, ChangKyoo Yoo 3

1 Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing 210037, China
2 Environmental Research Institute, Key Laboratory of Theoretical Chemistry of Environment, Ministry of Education, South China Normal University, Guangzhou 510631, China
3 Department of Environmental Science and Engineering, College of Engineering, Kyung Hee University, Yongin 446701, Korea

Corresponding authors:
* H.L. Tel.: +86-25-85427620; Fax: +86-25-85428793; E-mail: [email protected]
** C.Y. Tel.: +86-25-85427620; Fax: +86-25-85428793; E-mail: [email protected]
First author: H.L. E-mail: [email protected]

Revised version, Applied Soft Computing, January 8, 2020 (https://doi.org/10.1016/j.asoc.2020.106149)
Abstract: A composite model integrating the latent variables of kernel partial least squares with the relevance vector machine (KPLS-RVM) is proposed to improve the prediction performance of conventional soft sensors for industrial processes. First, latent variables are extracted by KPLS projection to cope with the high dimensionality and complex collinearity of nonlinear process data. Then, the probabilistic RVM method is used to develop the predictive function between the latent variables and the output variable. The performance of the proposed method is evaluated through two case studies based on subway indoor air quality (IAQ) data and wastewater treatment process (WWTP) data, respectively. The results show the superiority of KPLS-RVM in prediction performance over its counterparts, including the least squares support vector machine (LSSVM), PLS-LSSVM, PLS-RVM, and KPLS-LSSVM. For the prediction of the effluent chemical oxygen demand in the WWTP data, the coefficient of determination of KPLS-RVM is improved by approximately 7.30-19.65% in comparison with the other methods.

Keywords: Latent variable modeling; Kernel partial least squares; Relevance vector machine; Indoor air quality; Wastewater treatment processes
1. Introduction

Important variables in industrial processes should be accurately measured to guarantee process quality, such as the effluent chemical oxygen demand (COD) and total nitrogen (TN) in wastewater treatment processes [1, 2]. Nevertheless, it is often difficult to measure these quality variables online. Although hardware sensors can be used for online measurement, some problems still exist, including time-consuming maintenance, the need for calibration, deterioration with age, insufficient accuracy, and measurement delay [3, 4]. In recent years, soft sensors have been used to alleviate the shortcomings of their hardware counterparts by constructing statistical models between the difficult-to-measure variables and the easy-to-measure variables, through which the important variables can be predicted online [5]. In practice, soft sensors are generally divided into two types: model-driven and data-driven. The model-driven approach is usually based on first-principle models, which interpret the physical and chemical traits of the process. However, it is often difficult to develop soft sensors from first-principle models due to their lack of generalization ability. For the data-driven approach, the emphasis is placed on mining the hidden information in the data. Thus, data-driven soft sensors can be more suitable for dealing with complex process features [1].
In industrial processes, nonlinearity is one of the most obvious factors that hinder the modeling performance of soft sensors. To deal with the nonlinear characteristics of data efficiently, several machine learning-based methods have been proposed [6-12]. Gonzaga et al. [7] developed a soft sensor based on an artificial neural network (ANN) to provide an online estimate of polyethylene terephthalate viscosity and to enable real-time process control. Based on the combination of just-in-time learning and the least squares support vector machine (LSSVM), Liu and Yoo [13] developed an accurate soft sensor to predict particulate matter in a subway station. Compared to ANN, the LSSVM model provides better generalization ability for nonlinear industrial process data [14]. In addition to nonlinearity, however, the uncertainty introduced by random noise is also very common among process variables. In this case, a probabilistic interpretation is more suitable for soft sensor modeling. Nevertheless, LSSVM cannot capture the uncertainty of the variables, and its kernel function must satisfy Mercer's conditions. Different from the aforementioned models, the relevance vector machine (RVM) is a probabilistic method proposed in 2001 [15]. While it takes a functional form similar to that of the support vector machine, RVM adopts a probabilistic framework without restrictions on the form of the kernel function. In particular, the posterior distributions of the model weights, governed by the hyper-parameters, provide the RVM model with a sparse structure. Detailed comparisons of the advantages and disadvantages of LSSVM and RVM are listed in Table 1. Based on the RVM model, Ge and Song [14] made a comparison with LSSVM through two industrial case studies and argued that the probabilistic estimation of RVM is very important for the prediction of process data that contain random noise. Liu et al. [16] combined Bayesian principal component analysis with RVM to construct a probabilistic self-validating soft sensor for wastewater treatment plants. Considering the modeling efficiency of the RVM method, Chen et al. [17] built a nonlinear regression between the score matrices of PLS for WLAN indoor localization, and the results showed that the RVM-PLS algorithm achieved higher positioning accuracy. Other applications of RVM soft sensors have been developed in the works of Wong et al. [18], Ji et al. [19], Wang et al. [20], etc.
In practice, however, industrial data sets usually have high-dimensional and collinear features derived from the complexity of process control and the underlying physical or chemical principles, which often results in unreliable predictions for the soft sensor [21]. To handle the high dimensionality of process data, latent variable methods (LVMs) have been proposed [22]. The main idea of LVMs is to transform the high-dimensional data into desirable components, usually named latent variables (LVs), which interpret the properties behind the original data with reduced dimensionality. Moreover, because the LVs are uncorrelated, the collinearity of the process data can also be removed. Generally, LVMs are developed by linear statistical methods, such as principal component analysis (PCA), partial least squares (PLS), and canonical correlation analysis (CCA) [22-24]. Based on their LVs, LVMs have mainly been used in fault detection and diagnosis. However, traditional process monitoring is based on the assumption that the process variables are linearly correlated with each other. In reality, due to the various operating conditions, some industrial processes contain variables with more complicated features [25]. Therefore, traditional LVMs may not perform well when used for industrial process data. For the nonlinear extension of LVMs, kernel-based algorithms are widely accepted due to their efficient interpretation of nonlinear features [26]. By using kernel functions, the nonlinear mapping and the complex inner-product calculations can be implemented in a simpler way. Thus, the nonlinearity and collinearity of industrial process data can be overcome by kernel-based LVMs simultaneously. Table 1 also summarizes the advantages and disadvantages of LVMs and kernel-based LVMs. Successful applications have been reported in different areas including fault detection, reconstruction, and diagnosis [27-31]. For prediction model construction, there are also quite a few studies. Nicolaï et al. [32] applied kernel partial least squares (KPLS) to the estimation of the sugar content of apples based on near-infrared reflectance measurements. Wang et al. [33] evaluated the effectiveness of KPLS-based prediction by using two nonlinear simulation examples. To improve the prediction performance for modeling structure-activity relationships in the pharmaceutical industry, Huang et al. [34] proposed a modified method that takes the variable importance of KPLS into account.
Although the powerful nonlinear models and the latent variable methods have been introduced separately above, they can actually be used at the same time. For example, nonlinear methods such as LSSVM and ANN can be used to build the inner structures of linear PLS to address the nonlinearity problem [35, 36]. Moreover, the LVs of latent variable models can also be utilized as the input of nonlinear methods. As mentioned above, by using LVMs, the high dimensionality and collinearity of the process data can be handled simultaneously. Therefore, the prediction tasks can be carried out more easily and efficiently [37].
Based on the research above, the modeling performance can be further improved by combining LVMs with nonlinear methods. In this work, a new kernel-LV-based soft sensor, denoted KPLS-RVM, is proposed to handle the high dimensionality, nonlinear relations, and uncertainty of industrial process data concurrently. First, KPLS is used to handle the nonlinear characteristics and collinear structure of the industrial data through lower-dimensional latent variables. Then, the kernel LVs, instead of the original inputs, are used for the probabilistic estimation of the RVM model. In this way, the LVs of the KPLS model are integrated to obtain the KPLS-RVM model, through which more accurate prediction results can be achieved. To make a fair comparison, the corresponding counterparts, including LSSVM, RVM, PLS-LSSVM, PLS-RVM, and KPLS-LSSVM, are also applied to the industrial modeling tasks. Detailed explanations of the compared methods (represented by acronyms) are provided in Appendix A.
This paper is organized in the following manner. Section 2 gives the construction process of the composite KPLS-RVM algorithm. Section 3 provides two industrial case studies, implemented by ourselves in MATLAB R2010b on a desktop computer with a 3.3 GHz CPU and 4 GB of memory. Finally, conclusions are presented in Section 4.
Table 1. Comparisons of four basic elements of composite soft sensors

LVMs
  Advantages: (1) Easy to implement. (2) Able to reduce dimensionality and eliminate collinearity of industrial data.
  Disadvantages: (1) Can be unreliable when facing nonlinear data sets. (2) The number of LVs should be determined in advance.

Kernel-based LVMs
  Advantages: (1) Able to reduce dimensionality and eliminate collinearity of industrial data. (2) By using kernel functions, the nonlinearity can be interpreted.
  Disadvantages: (1) Suitable kernel parameter tuning is always needed. (2) The number of LVs should be determined in advance. (3) Kernel selection is necessary for various types of nonlinearity.

LSSVM
  Advantages: (1) Provides good generalization performance. (2) Simple calculation.
  Disadvantages: (1) The kernel function must satisfy Mercer's conditions. (2) The kernel parameters should be tuned. (3) Cannot provide the uncertainty of predictions.

RVM
  Advantages: (1) Has no limitation on the form of the kernel function. (2) Provides a sparse structure. (3) Probabilistic prediction.
  Disadvantages: (1) The hyper-parameters should be tuned.
2. Materials and methods

2.1. Partial least squares (PLS)

The target of partial least squares is to search for suitable latent variables that can exploit the variance structure of the process variables while trying to interpret the quality variables [23].

Assume that the input matrix $X = [x_1, \dots, x_n]^T \in \mathbb{R}^{n \times m}$ and the output matrix $Y = [y_1, \dots, y_n]^T \in \mathbb{R}^{n \times p}$ used throughout the paper share the same structure, where $n$ is the number of measurements, and $m$ and $p$ represent the numbers of process variables and quality variables, respectively. The original input and output data are usually scaled to zero mean and unit variance before further decomposition. PLS then projects the scaled data $X$ and $Y$ onto a low-dimensional space defined by the following score vectors:

$$X = TP^{T} + E, \qquad Y = UQ^{T} + F \qquad (1)$$

where $T$ and $U$ are the latent (score) matrices of $X$ and $Y$, $P$ and $Q$ are the loading matrices, and $E$ and $F$ represent the residual matrices. Traditionally, all important latent variables can be extracted one after another using the deflation-and-iteration procedure known as nonlinear iterative partial least squares (NIPALS). Details of the NIPALS algorithm can be found in [38].
17
urn a
12
19
The construction of KPLS can be divided into two steps: first the input variables are
20
transformed into a high-dimensional feature space by nonlinear mapping, and then the
21
linear PLS method can be modelled between the high-dimensional variables and
22
output variables. According to Cover’s theorem, the nonlinear features of input data 9
Journal Pre-proof 1
could be changed into linear ones after the proposed feature mapping [39]. Therefore,
2
the nonlinear information can be extracted efficiently by linear composition of KPLS. The nonlinear mapping used here can be expressed as i (xi ) [34]. Then in the
4
feature space, the inner product of two mapped vectors i and j can be performed
5
explicitly by using the kernel function of the original vectors x i and x j
6
( i 1, 2,
pro of
3
, n , j 1, 2, , n )
i T j K (xi , x j )
7 8
(2)
The kernel function K ( ) used in this paper is Gaussian function: 2
K (xi , x j ) exp(( xi x j ) / cG2 )
(3)
re-
9 10
where cG is the kernel width need be tuned before. The sample data in feature space
11
can be set as
13
n
(4)
decomposition can be written as
urn a
15
, (xn ))T
T T Then K ( X) ( X) denotes the kernel Gram matrix, and the KPLS i i i 1
14
lP
( X) ( (x1 ), (x2 ),
12
= TPT E T Y = UQ F
(5)
16
where P and Q are loading matrices for and Y, respectively, E and F represent
17
the residual matrices, T = [t1 , t 2 ,
18
matrices (l is the number of latent variables) which derived from the NIPALS
19
algorithm. Then the deflations of K and Y are given by
U = [u1 ,u 2 ,
,ul ] are the latent
Jo
, t l ] and
20
K (I tt T K(I tt T
(6)
21
Y Y tt T Y
(7) 10
Journal Pre-proof 1
where I is an identify matrix with n-dimensional and t represents a latent variable of
2
matrix K in each loop. For the training data in feature space, the kernel latent matrix
3
T takes the form [26]
T KU(TTKU)-1
5
(8)
pro of
4
For the test data, the latent matrix is expressed as
Tt K t U(TT KU)-1
6
(9)
where K t K (xi , x j ) , x i is the ith test variable, x j is the jth training variable. It is
8
worth mentioning that before decomposition of high-dimensional space, centralization
9
should be carried out [33].
re-
7
10
13 14
lP
12
2.3. Relevance vector machine
To compare with RVM, LSSVM is an appropriate choice for modeling purpose, and the detailed description of LSSVM is provided in reference [40]. Based on the framework of SVM, RVM was developed through the Bayesian
urn a
11
x , y
15
probabilistic method by Tipping [14, 15]. Let
16
pairs, the following probabilistic function can be used to express the regression
17
between the input data and output data
y f (x, w ) e
i
i 1,2, n
denote the input-target
(10)
Jo
18
i
19
where e comes from the noise process, which is subject to the mean-zero Gaussian
20
distribution with variance σ2. Similar to the LSSVM, the nonlinear function f(x) can
21
also be described as a linearly-parameterized model with the weighted sum of the
22
kernel functions 11
Journal Pre-proof n
n
i 1
i 0
f (x, w ) wi K (x, xi ) w0 wi (x)
1
w w0 , w1 , w2 ,
, wn
and ( x)
2
where
3
(x)= 1,K (x, x1 ), K (x, x 2 ), , K ( x, x n ) .
4
T
,
are
(11)
kernel
functions
with
T
According to the description above, the condition distribution of the output data can
p( y x) N ( f (x, w), 2 ) . Considering the assumption of
be described as
6
independence of y, the likelihood of the data set can be developed as p(y w, 2 )
7
y ( y1 , y2 , , yn )T
1 (2 )
2 n 2
pro of
5
exp{
1
2
y w } 2
2
= 1 (x), 2 (x),
, n ( x)
(12)
8
where
9
maximum-likelihood estimation could be used to get suitable parameters of w and σ2
10
in equation (12). However, over fitting problem could be inevitable due to the same
11
number of parameters and training data. Thus a Bayesian perspective is adopted by
12
RVM to constrain the parameters w and σ2 submitting a prior probability distribution,
13
and the details of which can be found in Appendix B, from which the posterior
14
parameter distribution is also provided by using the Bayesian rule [15].
T
.
The
lP
re-
and
urn a
15
,
Assuming that the optimal hyper-parameters α o and
o2 have been obtained
16
from Appendix B, then for the new sample data (xnew), the estimation using RVM can
17
be calculated as
19 20 21
p( yˆnew xnew , y, αo , o2 ) p( yˆnew xnew , w, o2 ) p(w xnew , y, αo , o2 ) dw
Jo
18
(13)
which is also Gaussian distributed, with
y ,new oT (x new )
(14)
y2,new o2 T (x new ) Σo (x new )
(15)
12
Journal Pre-proof 1
where
(xnew ) [1, K (xnew , x1 ), K (xnew , x2 ), , K (xnew , xn )]T
,
and
Σo ( o2ψ T (x)ψ (x) diag(αo ))1 .
2 3
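Once the hyper-parameters have been optimized (Appendix B), the predictive distribution of Eqs. (13)-(15) reduces to a few matrix-vector products. A minimal sketch follows, assuming the posterior mean mu_o, covariance Sigma_o, noise variance sigma2_o, the retained training inputs X_rv, and the Gaussian kernel width are already available; all names are illustrative.

```python
import numpy as np

def rvm_predict(x_new, X_rv, mu_o, Sigma_o, sigma2_o, c_G):
    """Predictive mean and variance of Eqs. (14)-(15) for a single new sample."""
    k = np.exp(-np.sum((X_rv - x_new) ** 2, axis=1) / c_G ** 2)
    psi = np.concatenate(([1.0], k))          # psi(x_new) = [1, K(x_new, x_1), ...]
    mean = psi @ mu_o                         # Eq. (14)
    var = sigma2_o + psi @ Sigma_o @ psi      # Eq. (15)
    return mean, var
```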
2.4. Implementation procedure of soft sensors based on latent variables

In this work, Jolliffe's three-parameter method was adopted as the detection criterion for data pretreatment; details of this method can be found in references [13, 41]. The root-mean-square error (RMSE) and the coefficient of determination (R2) were employed to compare the modeling performance of the proposed models:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^{2}} \qquad (16)$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^{2}}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}} \qquad (17)$$

In equations (16) and (17), $y_i$ and $\hat{y}_i$ are the measured and predicted values, respectively, $\bar{y}$ is the mean of the measured values, and $n$ represents the number of samples.
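Equations (16) and (17) translate directly into the following helper functions (names are illustrative), which are reused in the sketches below.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, Eq. (16)."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (17)."""
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```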
The overall procedure of the aforementioned prediction model is shown in Fig. 1, and the main steps are described as follows (a worked sketch of Steps 2-6 follows the list):

Step 1. Detect outliers using Jolliffe's three parameters [42].
Step 2. Scale X and y to zero mean and unit variance.
Step 3. Calculate the latent variables of the KPLS model.
  Step 3.1. Compute the kernel matrix of X by $K = \Phi(X)\Phi(X)^{T}$.
  Step 3.2. Calculate the latent matrix for the training data set by $T = KU(T^{T}KU)^{-1}$.
  Step 3.3. Calculate the latent matrix for the test data set by $T_t = K_t U(T^{T}KU)^{-1}$.
Step 4. Obtain a training model using RVM.
  Step 4.1. Optimize the hyper-parameters $\alpha_o$ and $\sigma_o^{2}$ using T and y.
  Step 4.2. Calculate the posterior mean $\mu_o$.
  Step 4.3. Predict $\hat{y}_{new}$ using $\mu_{y,i} = \mu_o^{T}\psi(t_{t,i})$, where $t_{t,i}$ is a row vector of $T_t$ and i is the sample index.
Step 5. Rescale $\hat{y}_{new}$ according to the mean and variance of y.
Step 6. Measure the prediction accuracy using RMSE and R2.
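The worked sketch of Steps 2-6 below strings together the illustrative helpers introduced in Sections 2.1-2.4 (gaussian_kernel, center_kernel, kpls_scores, rmse, r2) and an rvm_train routine whose update loop is sketched in Appendix B; the kernel widths and the number of LVs are placeholders that would be tuned per data set as in Section 3, and Step 1 (outlier removal) is assumed to have been applied beforehand.

```python
import numpy as np

def kpls_rvm_fit_predict(X_tr, y_tr, X_te, n_lv=5, c_kpls=1.9, c_rvm=2.3):
    # Step 2: scale inputs and output to zero mean and unit variance
    mx, sx = X_tr.mean(axis=0), X_tr.std(axis=0)
    my, sy = y_tr.mean(), y_tr.std()
    Xs, ys, Xt = (X_tr - mx) / sx, (y_tr - my) / sy, (X_te - mx) / sx
    # Step 3: kernel latent variables of the KPLS model
    K_raw = gaussian_kernel(Xs, Xs, c_kpls)
    K = center_kernel(K_raw)
    Kt = center_kernel(gaussian_kernel(Xt, Xs, c_kpls), K_train=K_raw)
    T, U = kpls_scores(K, ys.reshape(-1, 1), n_lv)
    Tt = Kt @ U @ np.linalg.inv(T.T @ K @ U)          # Eq. (9)
    # Step 4: RVM trained on the latent variables
    Psi = np.hstack([np.ones((len(T), 1)), gaussian_kernel(T, T, c_rvm)])
    keep, mu_o, Sigma_o, sigma2_o = rvm_train(Psi, ys)
    Psi_t = np.hstack([np.ones((len(Tt), 1)), gaussian_kernel(Tt, T, c_rvm)])
    y_hat = Psi_t[:, keep] @ mu_o                     # posterior-mean prediction
    # Step 5: rescale the predictions back to the original units
    return y_hat * sy + my
```

Step 6 then reduces to calling rmse and r2 on the rescaled predictions.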
Fig. 1. Flow chart of KPLS-RVM soft sensor modeling.
3. Results and discussion

Given that indoor air quality (IAQ) plays an important role in public health, and that the emission of hazardous pollutants from wastewater treatment processes (WWTPs) presents significant challenges for environmental protection, it is necessary to construct reliable and accurate prediction models for process control and monitoring. In this section, the proposed soft sensor is used to model two quality variables, one from IAQ data collected in a subway station [43] and one from papermaking WWTP data [44]. The preprocessing step based on Jolliffe's three parameters was implemented before further analysis. As shown in plots (a) and (b) of Fig. 2, for the IAQ data and the papermaking WWTP data, respectively, the points beyond the control limits were excluded from the samples.

Fig. 2. Outlier detection using Jolliffe's three parameters in terms of (a) IAQ data and (b) papermaking WWTP data (dotted lines represent the control limits).
3.1. Indoor air quality in a subway station

The IAQ data for testing the modeling performance were collected from a central subway station in Seoul in January 2010. As shown in Fig. 3, there are eight IAQ variables: nitrogen monoxide (NO), nitrogen dioxide (NO2), carbon monoxide (CO), carbon dioxide (CO2), particulate matter with diameters of less than 10 μm (PM10) and 2.5 μm (PM2.5), temperature, and humidity. Originally, 744 data points with a sampling interval of 1 hour were extracted from the subway IAQ measuring system. Among the 744 original data points, eight samples were removed because at least one variable had a missing value. After removing the 41 outliers shown in Fig. 2(a), the total number of samples used for process modeling is 695, and the numbers of training and test data are set as 535 and 160, respectively.

Fig. 3. Variations in the IAQ data.
To illustrate the performance of KPLS-RVM, the PM2.5 variable of the IAQ data is used for prediction. Five counterparts, including LSSVM, RVM, PLS-LSSVM, PLS-RVM, and KPLS-LSSVM, are considered for comparison. Moreover, five-run Monte Carlo cross validation (repeated random train/test splits; a brief sketch follows this paragraph) is used to evaluate the prediction accuracy of each model. In each cross validation of the latent variable-based models, the optimum number of LVs should be determined in advance. According to the variance of each LV listed in Table 2, it is clear that there is no obvious increase in the cumulative variance of the output variable after the third LV. Therefore, three LVs are chosen for the PLS modeling. In the same way, it is appropriate to select five LVs for the KPLS modeling based on Table 3. With the chosen latent variables, the cumulative variance values of PLS (78.75% for the input data and 77.52% for the output data) appear similar to those of KPLS (80.23% for the input data and 78.36% for the output data). However, there is a significant gap between the input data variance (16.07%) and the output data variance (3.68%) of the third LV in PLS, which could, to some degree, weaken its capacity for data interpretation. The number of LVs and all of the other parameters for the first cross validation are provided in Table 4.
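The five-run Monte Carlo cross-validation used above amounts to repeatedly drawing a random train/test split and averaging RMSE and R2 over the runs. A brief sketch under the IAQ split sizes (535 training and 160 test samples) is shown below, reusing the illustrative kpls_rvm_fit_predict, rmse, and r2 helpers from Section 2; the seed and run count are arbitrary assumptions.

```python
import numpy as np

def monte_carlo_cv(X, y, n_runs=5, n_train=535, seed=0):
    """Average RMSE and R2 over random train/test splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_runs):
        idx = rng.permutation(len(X))
        tr, te = idx[:n_train], idx[n_train:]
        y_hat = kpls_rvm_fit_predict(X[tr], y[tr], X[te])
        scores.append((rmse(y[te], y_hat), r2(y[te], y_hat)))
    return np.mean(scores, axis=0)
```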
pro of
1
Table 2. Percent variance captured by PLS model Input variables (X)
re-
LV
Output variable (y)
Cumulative variance (%)
Variance (%)
Cumulative variance (%)
1
39.14
39.14
58.42
58.42
2
23.54
62.68
15.42
73.84
3
16.07
78.75
3.68
77.52
4
7.94
86.69
0.56
78.09
5
6.99
93.68
0.25
78.34
6
3.37
97.05
0.03
78.37
7
2.95
100
0
78.37
10
urn a
9
lP
Variance (%)
Jo
Table 3. Percent variance captured by KPLS model Input variables (X)
LV
1
Output variable (y)
Variance (%)
Cumulative variance (%)
Variance (%)
Cumulative variance (%)
37.89
37.89
48.01
48.01
18
Journal Pre-proof
21.30
59.19
17.39
65.40
3
9.98
69.17
6.73
72.13
4
8.14
77.31
2.28
74.41
5
2.92
80.23
3.95
78.36
6
7.28
87.50
7
1.15
88.66
1 2
pro of
2
1.03
79.39
1.99
81.38
Table 4. Parameters of different modeling methods for IAQ data set Parameters for latent variables
Parameters for soft sensors
LSSVM
NaN
Kernel-‘Gaussian’;
re-
Methods
NaN
urn a
RVM
PLS-LSSVM
Number of LVs-3
Number of LVs-3
Jo
PLS-RVM
lP
kernel width-3.23;
KPLS-LSSVM
KPLS-RVM
regularization parameter-9.35 Kernel-‘Gaussian’; kernel width-1.60 Kernel-‘Gaussian’; kernel width-1.66; regularization parameter-4.40 Kernel-‘Gaussian’; kernel width-8.95
Number of LVs-5;
Kernel-‘Gaussian’
kernel-‘Gaussian’;
kernel width-1.55;
kernel width-1.87
regularization parameter-4.33
Number of LVs-5;
Kernel-‘Gaussian’; 19
Journal Pre-proof kernel-‘Gaussian’;
kernel width-2.30
kernel width-1.87 1
For the detailed performance of different models, the average prediction indices
3
derived from cross validations are shown in Table 5. In terms of the non-LVs models,
4
RVM shows slightly better prediction performance than LSSVM, with a RMSE of
5
19.565 and a R2 of 0.785. The reason could be the uncertainty of the process data
6
captured by RVM, which provides this model with more accurate estimation than
7
LSSVM does. After using the latent variables of PLS, however, neither PLS-LSSVM
8
nor PLS-RVM shows any obvious improvement of estimation accuracy. Moreover,
9
the prediction results of PLS-RVM are even worse, giving the values of RMSE and R2
10
are 20.163 and 0.773, respectively. This could be because of the nonlinearity between
11
the IAQ data cannot be interpreted by linear latent variables of PLS efficiently, which
12
can be reinforced by analyzing the uneven distribution of variance reported in Table 2.
13
In terms of the KPLS-based methods, the KPLS-LSSVM shows a little improvement
14
in prediction results of the proposed models, and the values of RMSE and R2 for test
15
data are 19.854 and 0.798, respectively. Moreover, the KPLS-RVM exhibits the best
16
prediction ability with the minimum value of RMSE 18.770 and the maximum
17
value of R 2 0.820 , which illustrates that the latent variables of kernel PLS are
18
beneficial to enhancing the modeling accuracy. Compared with LSSVM, RVM,
19
PLS-LSSVM, PLS-RVM, and KPLS-LSSVM methods, the RMSE value of
20
KPLS-RVM has been reduced by 4.69%, 4.06%, 5.13%, 7.42%, and 5.46%, and the
Jo
urn a
lP
re-
pro of
2
20
Journal Pre-proof R2 value has been improved by 6.91%, 4.46%, 5.94%, 6.08%, and 2.76%, respectively.
2
To provide more information about the performance of KPLS-RVM model, PM2.5
3
prediction plots of training and test data are shown in Fig. 4, through which we can
4
find that the data samples are well fitted.
pro of
1
In addition to prediction accuracy, the execution time of each model has also been
6
provided in Table 5. In terms of the training set, running times of LSSVM and
7
PLS-LSSVM are 0.079 s and 0.082 s, which are significantly lower than the time of
8
0.618 s for KPLS-LSSVM. This is mainly because of the fact that kernel processing
9
step of input data is a little bit time consuming. Similarly, the prediction time of
10
KPLS-LSSVM (0.139 s) for all 162 testing samples is larger than those of LSSVM
11
(0.015 s) and PLS-LSSVM (0.013 s) models. In terms of RVM model, the optimal
12
hyper-parameters ( α o and
13
Therefore, the training time of RVM is pretty long, with 150 iterations for 59.219 s.
14
Meanwhile, it is worthy to note that both of PLS-RVM and KPLS-RVM models
15
provide relatively fewer iterations in Table 5, thereby saving the running time and
16
reaching 10.775 s and 22.195 s, respectively. This results from the fact that latent
17
variables have lower dimensionality (3 LVs of PLS and 5 LVs of KPLS), thus leading
18
to the higher convergence rate of iterative process for hyper-parameters. Although the
19
training times of RVM-related models are obviously longer than those of
20
LSSVM-related models, there are dramatic changes in terms of online prediction
21
efficiency. Compared to the results of LSSVM-related models, all of the prediction
22
times for RVM-related models are reduced slightly. This is mainly due to the sparse
re-
5
Jo
urn a
lP
o2 in Section 2.3) are obtained by suitable iterations.
21
Journal Pre-proof 1
structure of RVM, which simplifies the prediction steps to some extent.
2 3
Table 5. Comparison of different modeling methods for prediction of PM2.5 Training set
Test set
Methods RMSE R2
Time (s)
LSSVM
14.320 0.890
0.079
RVM
14.124 0.892
59.219 (150 iterations) 19.565 0.785 0.010
PLS-LSSVM
18.355 0.820
0.082
PLS-RVM
20.606 0.770
10.775 (20 iterations)
20.163 0.773 0.011
0.618
19.854 0.798 0.139
18.081 0.8181 22.195 (50 iterations)
re-
KPLS-RVM
19.694 0.767 0.015
19.786 0.774 0.013
18.770 0.820 0.105
5 6 7 8
Jo
urn a
lP
4
Time (s)
pro of
KPLS-LSSVM 16.059 0.854
RMSE R2
Fig. 4. Prediction results of PM2.5 using KPLS-RVM
3.2. Wastewater treatment process

To evaluate the generalization ability of the proposed methods, process data collected from a paper mill wastewater treatment plant in Guangdong province were also used for modeling. A detailed description of the plant is given in reference [44]. As shown in Fig. 5, there are eight process variables, including the influent chemical oxygen demand (COD), influent suspended solids (SS), flow rate (Q), pH, temperature, dissolved oxygen (DO), effluent COD, and effluent SS. After the data pretreatment shown in plot (b) of Fig. 2, the total number of samples used for modeling is 162. The first 100 samples are used for model training, and the remaining 62 samples for testing. In this section, the first six variables are used to construct the models and predict the effluent COD.
Fig. 5. Variations in the papermaking WWTP data
15
Similar to the IAQ case, the LSSVM, RVM, PLS-LSSVM, PLS-RVM, and KPLS-LSSVM methods are also used for comparison, and the tuning parameters of the different methods in the first cross validation are listed in Table 6. The average prediction results in terms of RMSE and R2 are shown in Table 7. For the LSSVM model, the RMSE value is 5.071 and the R2 value is 0.565, which still shows insufficient modeling capacity in comparison with RVM (RMSE = 4.819 and R2 = 0.608). At the same time, it is worth noting that the prediction results of both conventional models (LSSVM and RVM) are improved slightly by using the latent variables of PLS, a phenomenon different from that in Table 5. Therefore, the latent variables of linear PLS do have the potential to improve the prediction performance. Building on the PLS-LSSVM and PLS-RVM methods, KPLS-LSSVM and KPLS-RVM further enhance the prediction results to a more desirable extent. Moreover, it is clear that the smallest RMSE value (4.443) and the largest R2 value (0.676) are obtained by the KPLS-RVM method, and its prediction plots for effluent COD are shown in Fig. 6. In comparison with the other counterparts (LSSVM, RVM, PLS-LSSVM, PLS-RVM, and KPLS-LSSVM), the RMSE value of KPLS-RVM decreases by 12.38%, 7.80%, 17.63%, 8.82%, and 4.12%, and the R2 value increases by 19.65%, 11.18%, 15.95%, 10.10%, and 7.30%, respectively.

In terms of model execution efficiency, the conclusions drawn from Table 7 are similar to those of the preceding case. Not surprisingly, the LSSVM-related models show desirable training times, with values of 0.027 s for LSSVM, 0.031 s for PLS-LSSVM, and 0.033 s for KPLS-LSSVM. It is notable that there is no significant difference among the LSSVM-related models in training time. This is principally because the number of training samples in this case is obviously smaller than for the IAQ data, which reduces the time consumed by the kernel processing step. For the RVM-related models, the training times still depend mainly on the iterations used for obtaining the hyper-parameters. It can be observed that the PLS-RVM model runs fastest among the three RVM-related models, with 20 iterations taking 0.154 s. Since the numbers of training iterations for RVM and KPLS-RVM are 120 and 100, respectively, their training times increase accordingly, with values of 0.639 s and 0.541 s. Compared with KPLS-RVM, the number of PLS-RVM iterations is reduced by 80 along with the reduction in the number of LVs reported in Table 6. Therefore, it is clear that the number of LVs is the key factor for the number of iterations. This once again illustrates that suitable latent variables are beneficial for improving the training speed of the RVM model. For the test set, the RVM-related models still show higher execution efficiency than the LSSVM-related models due to the lower computational complexity provided by the sparse behaviour of the RVM model. Among the three RVM-related models, RVM provides the minimum prediction time (0.9 ms), which is slightly lower than that of the PLS-RVM model (1.7 ms). Considering the presence of the kernel step, the prediction time is somewhat larger for the KPLS-RVM model (10.3 ms).

Table 6. Parameters of different modeling methods for the papermaking WWTP data set

LSSVM
  Parameters for latent variables: NaN
  Parameters for soft sensor: Kernel-'Gaussian'; kernel width-6.85; regularization parameter-11.95
RVM
  Parameters for latent variables: NaN
  Parameters for soft sensor: Kernel-'Gaussian'; kernel width-12.0
PLS-LSSVM
  Parameters for latent variables: Number of LVs-2
  Parameters for soft sensor: Kernel-'Gaussian'; kernel width-10.38; regularization parameter-23.93
PLS-RVM
  Parameters for latent variables: Number of LVs-2
  Parameters for soft sensor: Kernel-'Gaussian'; kernel width-1.6
KPLS-LSSVM
  Parameters for latent variables: Number of LVs-3; kernel-'Gaussian'; kernel width-14
  Parameters for soft sensor: Kernel-'Gaussian'; kernel width-1.27; regularization parameter-32.81
KPLS-RVM
  Parameters for latent variables: Number of LVs-3; kernel-'Gaussian'; kernel width-2.74
  Parameters for soft sensor: Kernel-'Gaussian'; kernel width-2.2

Table 7. Comparison of different modeling methods for prediction of effluent COD

               Training set                                    Test set
Methods        RMSE     R2      Time (s)                       RMSE     R2      Time (ms)
LSSVM          3.812    0.774   0.027                          5.071    0.565   6
RVM            3.622    0.795   0.639 (120 iterations)         4.819    0.608   0.9
PLS-LSSVM      3.986    0.730   0.031                          5.394    0.583   8
PLS-RVM        4.206    0.723   0.154 (20 iterations)          4.873    0.614   1.7
KPLS-LSSVM     4.250    0.724   0.033                          4.634    0.630   17
KPLS-RVM       4.174    0.804   0.541 (100 iterations)         4.443    0.676   10.3
Fig. 6. Prediction results of effluent COD using KPLS-RVM
4 5
3.3. Discussion

According to these two case studies, it can be observed that KPLS, through its latent variables, brings evident improvements in the prediction results of the conventional models. This means that the kernel LVs can efficiently exploit the information in nonlinear industrial data with lower dimensionality. For linear PLS, however, the latent variables do not necessarily perform well in feature extraction. Compared with the KPLS-based models, the PLS-based ones appear to be more susceptible to the sample size and the internal characteristics of the data.

For the prediction model, RVM provides a sparse structure and probabilistic prediction, which outperforms the point estimation of LSSVM. Therefore, it is reasonable that the KPLS-RVM model, composed of the latent variables of KPLS and the RVM regression function, achieves the most accurate prediction results with high execution efficiency on the test set. For the training set, however, the hyper-parameters of RVM need to be obtained by iterative steps in advance. Thus, the training times of the RVM-related models are apparently higher than those of the LSSVM-related models. According to the experimental results of the two cases, the latent variables can be used to accelerate the convergence of the iterative process owing to their lower dimensionality, and therefore alleviate the time-consuming training process. Consequently, the superiority of the KPLS-RVM model lies not only in its better prediction accuracy but also in the improvement of training efficiency.
on the better prediction accuracy, but also on the improvement of training efficiency.
12
4. Conclusions

In this paper, an advanced model named KPLS-RVM has been introduced for industrial process modeling. The basic idea of this model is to apply kernel latent variables to interpret the nonlinear features of industrial data with lower dimensionality and a concise data structure. These features further facilitate the modeling performance of the sparse RVM model. To evaluate the performance of the proposed approach, two real industrial data sets were used for modeling. The results of the case studies showed the strong robustness of the KPLS-RVM model, with the highest predictive accuracy for unknown output data and a desirable prediction time. For the training of the RVM-related models, however, the time consumption is large due to the iterations required to obtain suitable hyper-parameters. Although the latent variables of LVMs can be used to improve the training efficiency, the training time is still too long for data of large volume. Therefore, faster algorithms should be considered in future work to fundamentally overcome the time-consuming training process of the RVM-related models.
1
5 6
Appendix A. Explanations of main acronyms

Table A1. Explanations of the acronyms for different modeling methods

LSSVM: Least squares support vector machine
RVM: Relevance vector machine
PLS-LSSVM: Partial least squares-based least squares support vector machine
PLS-RVM: Partial least squares-based relevance vector machine
KPLS-LSSVM: Kernel partial least squares-based least squares support vector machine
KPLS-RVM: Kernel partial least squares-based relevance vector machine

Appendix B. Prior and posterior parameter distribution of the RVM algorithm

In order to obtain the parameters $w$ and $\sigma^2$ from a Bayesian perspective, a zero-mean Gaussian prior distribution is defined first:

$$p(w \mid \alpha) = \prod_{i=0}^{n} N(w_i \mid 0, \alpha_i^{-1}) = \frac{1}{(2\pi)^{(n+1)/2}} \prod_{i=0}^{n} \alpha_i^{1/2} \exp\left(-\frac{\alpha_i w_i^{2}}{2}\right) \qquad (B.1)$$

$$p(\alpha) = \prod_{i=0}^{n} \mathrm{gamma}(\alpha_i \mid a, b) \qquad (B.2)$$

$$p(\beta) = \mathrm{gamma}(\beta \mid c, d) \qquad (B.3)$$

where $\beta = \sigma^{-2}$, $\alpha$ is a vector of $n+1$ hyper-parameters, and the gamma distribution takes the form

$$\mathrm{gamma}(\alpha \mid a, b) = \Gamma(a)^{-1} b^{a} \alpha^{a-1} e^{-b\alpha} \qquad (B.4)$$

where $\Gamma(a) = \int_{0}^{\infty} t^{a-1} e^{-t}\, dt$. To make the proposed priors non-informative, the parameters a, b, c, and d are set as $a = b = c = d = 10^{-4}$.

With the prior probability of the parameters defined, the Bayesian rule can be used to obtain the posterior parameter distribution

$$p(w \mid \mathbf{y}, \alpha, \sigma^{2}) = \frac{p(\mathbf{y} \mid w, \sigma^{2})\, p(w \mid \alpha)}{p(\mathbf{y} \mid \alpha, \sigma^{2})} = \frac{1}{(2\pi)^{(n+1)/2}\, |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(w - \mu)^{T} \Sigma^{-1}(w - \mu)\right\} \qquad (B.5)$$

where the posterior covariance and mean are given as follows:

$$\Sigma = (\sigma^{-2}\psi^{T}(x)\psi(x) + A)^{-1} \qquad (B.6)$$

$$\mu = \sigma^{-2}\Sigma\psi^{T}(x)\mathbf{y} \qquad (B.7)$$

with $A = \mathrm{diag}(\alpha_0, \alpha_1, \dots, \alpha_n)$. The marginal likelihood can be used to obtain the optimal point estimates of the hyper-parameters $\alpha_o$ and $\sigma_o^{2}$; the details of this procedure can be found in Tipping [15].

During the process of optimizing the hyper-parameters, many of the $\alpha_i$ tend to infinity. This means that many weights $w_i$ in equation (B.1) become zero, which leads to a pruned, less complex set of basis functions. Equations (B.6) and (B.7) can then be updated using the optimal hyper-parameters as shown below:

$$\Sigma_o = (\sigma_o^{-2}\psi^{T}(x)\psi(x) + A_o)^{-1} \qquad (B.8)$$

$$\mu_o = \sigma_o^{-2}\Sigma_o\psi^{T}(x)\mathbf{y} \qquad (B.9)$$

where $A_o = \mathrm{diag}(\alpha_o)$.
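For illustration, the evidence-maximization loop that produces alpha_o and sigma_o^2 and applies the pruning described above can be sketched as follows. The re-estimation rules follow Tipping [15], while the iteration count, pruning threshold, and initial values are arbitrary assumptions; Psi denotes the design matrix with rows psi(x_i)^T.

```python
import numpy as np

def rvm_train(Psi, y, n_iter=150, alpha_init=1.0, prune=1e9):
    """Iteratively re-estimate (alpha, sigma^2) and return the pruned posterior of Eqs. (B.8)-(B.9)."""
    n, m = Psi.shape
    alpha = np.full(m, alpha_init)
    sigma2 = 0.1 * np.var(y) + 1e-6
    keep = np.arange(m)
    for _ in range(n_iter):
        keep = keep[alpha[keep] < prune]              # prune basis functions whose alpha diverges
        P = Psi[:, keep]
        A = np.diag(alpha[keep])
        Sigma = np.linalg.inv(P.T @ P / sigma2 + A)   # Eq. (B.6)/(B.8)
        mu = Sigma @ P.T @ y / sigma2                 # Eq. (B.7)/(B.9)
        gamma = 1.0 - alpha[keep] * np.diag(Sigma)
        alpha[keep] = gamma / (mu ** 2 + 1e-12)       # re-estimate the alphas
        sigma2 = np.sum((y - P @ mu) ** 2) / max(n - gamma.sum(), 1e-6)
    return keep, mu, Sigma, sigma2
```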
2
Acknowledgments

This study was supported by the Foundation of Nanjing Forestry University (No. GXL029), a grant from the Subway Fine Dust Reduction Technology Development Project of the Ministry of Land, Infrastructure and Transport (19QPPW-B152306-01), and the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (No. NRF-2019H1D3A1A02071051).

References
[1] P. Kadlec, B. Gabrys, S. Strandt, Data-driven soft sensors in the process industry, Comput. Chem. Eng. 33 (4) (2009) 795-814.
[2] H. Liu, M. Huang, C. Yoo, A fuzzy neural network-based soft sensor for modeling nutrient removal mechanism in a full-scale wastewater treatment system, Desalin. Water Treat. 51 (31-33) (2013) 6184-6193.
[3] M. Kano, K. Fujiwara, Virtual sensing technology in process industries: Trends and challenges revealed by recent industrial applications, J. Chem. Eng. Jpn. 46 (1) (2013) 1-17.
[4] M. Kano, Y. Nakagawa, Data-based process monitoring, process control, and quality improvement: Recent developments and applications in steel industry, Comput. Chem. Eng. 32 (1-2) (2008) 12-24.
[5] H. Kaneko, K. Funatsu, Moving window and just-in-time soft sensor model based on time differences considering a small number of measurements, Ind. Eng. Chem. Res. 54 (2) (2015) 700-704.
[6] J.C. Principe, N.R. Euliano, W.C. Lefebvre, Neural and Adaptive Systems: Fundamentals Through Simulations, Wiley, New York, 2000.
[7] J.C.B. Gonzaga, L.A.C. Meleiro, C. Kiang, R.M. Filho, ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process, Comput. Chem. Eng. 33 (1) (2009) 43-49.
[8] D.M. Himmelblau, Accounts of experiences in the application of artificial neural networks in chemical engineering, Ind. Eng. Chem. Res. 47 (16) (2008) 5782-5796.
[9] G. Mountrakis, J. Im, C. Ogole, Support vector machines in remote sensing: A review, ISPRS J. Photogramm. 66 (3) (2011) 247-259.
[10] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 2000.
[11] C.T. Lin, C.S.G. Lee, Neural Fuzzy Systems: A Neuro-fuzzy Synergism to Intelligent Systems, Prentice Hall PTR, Upper Saddle River, NJ, 1996.
[12] M. Huang, Y. Ma, J. Wan, X. Chen, A sensor-software based on a genetic algorithm-based neural fuzzy system for modeling and simulating a wastewater treatment process, Appl. Soft Comput. 27 (2015) 1-10.
[13] H. Liu, C. Yoo, A robust localized soft sensor for particulate matter modeling in Seoul metro systems, J. Hazard. Mater. 305 (2016) 209-218.
[14] Z. Ge, Z. Song, Nonlinear soft sensor development based on relevance vector machine, Ind. Eng. Chem. Res. 49 (18) (2010) 8685-8693.
[15] M.E. Tipping, A. Smola, Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res. 1 (3) (2001) 211-244.
[16] Y. Liu, J. Chen, Z. Sun, Y. Li, D. Huang, A probabilistic self-validating soft-sensor with application to wastewater treatment, Comput. Chem. Eng. 71 (2014) 263-280.
[17] C. Chen, Y. Wang, Y. Zhang, Y. Zhai, Indoor positioning algorithm based on nonlinear PLS integrated with RVM, IEEE Sens. J. 18 (2) (2018) 660-668.
[18] P.K. Wong, Q. Xu, C.M. Vong, H.C. Wong, Rate-dependent hysteresis modeling and control of a piezostage using online support vector machine and relevance vector machine, IEEE T. Ind. Electron. 59 (4) (2011) 1988-2001.
[19] J. Ji, H. Wang, K. Chen, Y. Liu, N. Zhang, J. Yan, Recursive weighted kernel regression for semi-supervised soft-sensing modeling of fed-batch processes, J. Taiwan Inst. Chem. E. 43 (1) (2012) 67-76.
[20] X. Wang, B. Jiang, N. Lu, Adaptive relevant vector machine based RUL prediction under uncertain conditions, ISA T. 87 (2019) 217-224.
[21] Y. Dong, S.J. Qin, Dynamic latent variable analytics for process operations and control, Comput. Chem. Eng. 114 (2018) 69-80.
[22] S.J. Qin, Survey on data-driven industrial process monitoring and diagnosis, Annu. Rev. Control. 36 (2) (2012) 220-234.
[23] Q. Zhu, Q. Liu, S.J. Qin, Concurrent quality and process monitoring with canonical correlation analysis, J. Process Contr. 60 (2017) 95-103.
[24] L. Van Der Maaten, E. Postma, J. Van den Herik, Dimensionality reduction: A comparative review, J. Mach. Learn. Res. 10 (2009) 66-71.
[25] Z. Ge, Z. Song, F. Gao, Review of recent research on data-based process monitoring, Ind. Eng. Chem. Res. 52 (10) (2013) 3543-3562.
[26] R. Rosipal, L.J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, J. Mach. Learn. Res. 2 (Dec) (2001) 97-123.
[27] J.M. Lee, C.K. Yoo, W.C. Sang, P.A. Vanrolleghem, I.B. Lee, Nonlinear process monitoring using kernel principal component analysis, Chem. Eng. Sci. 59 (1) (2004) 223-234.
[28] Y. Zhang, R. Sun, Y. Fan, Fault diagnosis of nonlinear process based on KCPLS reconstruction, Chemom. Intell. Lab. Syst. 140 (2015) 49-60.
[29] R. Jia, Z. Mao, F. Wang, D. He, Self-tuning final product quality control of batch processes using kernel latent variable model, Chem. Eng. Res. Des. 94 (2015) 119-130.
[30] N. Sheng, Q. Liu, S.J. Qin, T. Chai, Comprehensive monitoring of nonlinear processes based on concurrent kernel projection to latent structures, IEEE T. Autom. Sci. Eng. 13 (2) (2016) 1129-1137.
[31] Q. Liu, Q. Zhu, S.J. Qin, T. Chai, Dynamic concurrent kernel CCA for strip-thickness relevant fault diagnosis of continuous annealing processes, J. Process Contr. 67 (2018) 12-22.
[32] B.M. Nicolaï, K.I. Theron, J. Lammertyn, Kernel PLS regression on wavelet transformed NIR spectra for prediction of sugar content of apple, Chemom. Intell. Lab. Syst. 85 (2) (2007) 243-252.
[33] M. Wang, G. Yan, Z. Fei, Kernel PLS based prediction model construction and simulation on theoretical cases, Neurocomputing 165 (2015) 389-394.
[34] X. Huang, Y.-P. Luo, Q.-S. Xu, Y.-Z. Liang, Incorporating variable importance into kernel PLS for modeling the structure-activity relationship, J. Math. Chem. 56 (3) (2018) 713-727.
[35] Y. Lv, J. Liu, T. Yang, Nonlinear PLS integrated with error-based LSSVM and its application to NOx modeling, Ind. Eng. Chem. Res. 51 (49) (2012) 16092-16100.
[36] S.J. Zhao, J. Zhang, Y.M. Xu, Z.H. Xiong, Nonlinear projection to latent structures method and its applications, Ind. Eng. Chem. Res. 45 (11) (2006) 3843-3852.
[37] B. Xie, Y.-w. Ma, J.-q. Wan, Y. Wang, Z.-c. Yan, L. Liu, Z.-y. Guan, Modeling and multi-objective optimization for ANAMMOX process under COD disturbance using hybrid intelligent algorithm, Environ. Sci. Pollut. R. 25 (21) (2018) 20956-20967.
[38] P. Geladi, B.R. Kowalski, Partial least-squares regression: A tutorial, Anal. Chim. Acta 185 (1986) 1-17.
[39] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, 2004.
[40] J.A.K. Suykens, T.V. Gestel, J.D. Brabanter, B.D. Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, London, 2002.
[41] L. Fortuna, S. Graziani, A. Rizzo, M.G. Xibilia, Soft Sensors for Monitoring and Control of Industrial Processes, Springer Science & Business Media, London, 2007.
[42] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 2002.
[43] H. Liu, C. Yang, M. Huang, D. Wang, C. Yoo, Modeling of subway indoor air quality using Gaussian process regression, J. Hazard. Mater. 359 (2018) 266-273.
[44] J. Wan, M. Huang, Y. Ma, W. Guo, Y. Wang, H. Zhang, W. Li, X. Sun, Prediction of effluent quality of a paper mill wastewater treatment using an adaptive network-based fuzzy inference system, Appl. Soft Comput. 11 (3) (2011) 3238-3246.
Jo
urn a
lP
re-
8
36
Declaration of Interest Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Research Highlights

1. Latent variables (LVs) of KPLS and the relevance vector machine (RVM) are used simultaneously.
2. LVs of KPLS are integrated with RVM for modeling the nonlinear data of industrial processes.
3. LVs are used to handle the high dimensionality and complex collinearity of nonlinear data.
4. RVM is developed to construct a probabilistic predictive function between the LVs and the y variables.
5. KPLS-RVM shows desirable predictive accuracy for the quality variable.

CRediT author statement

Hongbin Liu: Supervision, Conceptualization, Writing - Reviewing and Editing. Chong Yang: Writing - Original draft preparation, Methodology, Software. Mingzhi Huang: Data curation, Validation. ChangKyoo Yoo: Data curation, Funding acquisition.