Soft Sensor Modeling of Industrial Process Data Using Kernel Latent Variables-Based Relevance Vector Machine

Hongbin Liu1,*, Chong Yang1,**, Mingzhi Huang2, ChangKyoo Yoo3

1 Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, Nanjing Forestry University, Nanjing 210037, China

2 Environmental Research Institute, Key Laboratory of Theoretical Chemistry of Environment, Ministry of Education, South China Normal University, Guangzhou 510631, China
3 Department of Environmental Science and Engineering, College of Engineering, Kyung Hee University, Yongin 446701, Korea


Corresponding authors:


*H.L. Tel.: +86-25-85427620; Fax: +86-25-85428793; E-mail: [email protected] **C.Y. Tel.: +86-25-85427620; Fax: +86-25-85428793; E-mail: [email protected]


First author:


H.L. E-mail: [email protected]

Revised version, Applied Soft Computing, January 8, 2020

Abstract: A composite model integrating the latent variables of kernel partial least squares with the relevance vector machine (KPLS-RVM) is proposed to improve the prediction performance of conventional soft sensors for industrial processes. First, latent variables are extracted by KPLS projection to cope with the high dimensionality and complex collinearity of nonlinear process data. Then, the probabilistic relevance vector machine is used to develop a predictive function between the latent variables and the output variable. The performance of the proposed method is evaluated through two case studies based on subway indoor air quality (IAQ) data and wastewater treatment process (WWTP) data, respectively. The results show the superiority of KPLS-RVM in prediction performance over its counterparts, including the least squares support vector machine (LSSVM), PLS-LSSVM, PLS-RVM, and KPLS-LSSVM. For the prediction of effluent chemical oxygen demand in the WWTP data, the coefficient of determination of KPLS-RVM is improved by approximately 7.30-19.65% in comparison with the other methods.

Keywords: Latent variable modeling; Kernel partial least squares; Relevance vector machine; Indoor air quality; Wastewater treatment processes

1. Introduction

Important variables in industrial processes should be measured accurately to guarantee process quality, such as the effluent chemical oxygen demand (COD) and total nitrogen (TN) in wastewater treatment processes [1, 2]. Nevertheless, it is often difficult to measure these quality variables online. Although hardware sensors can be used for online measurement, some problems still exist, including time-consuming maintenance, the need for calibration, aging deterioration, insufficient accuracy, and measurement delay [3, 4]. In recent years, soft sensors have been used to alleviate the shortcomings of their hardware counterparts by constructing statistical models between the difficult-to-measure variables and the easy-to-measure variables, through which the important variables can be predicted online [5]. In practice, soft sensors are generally divided into two types: model-driven and data-driven. Model-driven methods are usually based on first-principles models, which describe the physical and chemical characteristics of the process. However, it is often difficult to develop soft sensors from first-principles models because of their limited generalization ability. Data-driven models, in contrast, place the emphasis on mining the hidden information in the data. Thus, data-driven soft sensors can be more suitable for dealing with complex process features [1].

In industrial processes, nonlinearity is one of the most obvious factors that hinder the modeling performance of soft sensors. To deal with the nonlinear characteristics of data efficiently, several machine learning-based methods have been proposed [6-12]. Gonzaga et al. [7] developed a soft sensor based on an artificial neural network (ANN) to provide an online estimation of polyethylene terephthalate viscosity and enable real-time process control. Based on the combination of just-in-time learning and the least squares support vector machine (LSSVM), Liu and Yoo [13] developed an accurate soft sensor to predict particulate matter in a subway station. Compared with ANN, the LSSVM model provides better generalization ability for nonlinear industrial process data [14]. In addition to nonlinearity, however, the uncertainty introduced by random noise is also very common among process variables. In this case, a probabilistic interpretation is more suitable for soft sensor modeling. Nevertheless, LSSVM cannot capture the uncertainty of the variables, and its kernel function must satisfy Mercer's conditions. Different from the aforementioned models, the relevance vector machine (RVM) is a probabilistic method proposed in 2001 [15]. While adopting a functional form similar to that of the support vector machine, RVM uses a probabilistic framework without restrictions on the form of the kernel function. In particular, the posterior distributions of the model weights, governed by the hyper-parameters, provide the RVM model with a sparse structure. Detailed comparisons of the advantages and disadvantages of LSSVM and RVM are listed in Table 1. Based on the RVM model, Ge and Song [14] made a comparison with LSSVM through two industrial case studies and concluded that the probabilistic estimation of RVM is very important for the prediction of process data containing random noise. Liu et al. [16] combined Bayesian principal component analysis with RVM to construct a probabilistic self-validating soft sensor for wastewater treatment plants. Considering the modeling efficiency of the RVM method, Chen et al. [17] built a nonlinear regression between the score matrices of PLS for WLAN indoor localization, and the results showed that the RVM-PLS algorithm achieved higher positioning accuracy. Other applications of the RVM soft sensor have been developed in the works of Wong et al. [18], Ji et al. [19], and Wang et al. [20].

In practice, however, industrial data sets usually have high-dimensional and collinear features derived from the complexity of process control and the underlying physical or chemical principles, which often results in unreliable predictions from the soft sensor [21]. To handle the high dimensionality of process data, latent variable methods (LVMs) have been proposed [22]. The main idea of LVMs is to transform the high-dimensional data into a small number of components, usually named latent variables (LVs), that interpret the properties behind the original data with reduced dimensionality. Moreover, because the LVs are uncorrelated, the collinearity of the process data can also be removed. Generally, LVMs are developed with linear statistical methods, such as principal component analysis (PCA), partial least squares (PLS), and canonical correlation analysis (CCA) [22-24]. Based on the LVs, LVMs have mainly been used in fault detection and diagnosis. However, traditional process monitoring is based on the assumption that the process variables are linearly correlated with each other. In reality, owing to varying operating conditions, some industrial processes contain variables with more complicated features [25]. Therefore, traditional LVMs may not perform well when used for industrial process data. For the nonlinear extension of LVMs, kernel-based algorithms are widely accepted because of their efficient interpretation of nonlinear features [26]. By using kernel functions, the nonlinear mapping and the complex inner product calculations can be implemented in a simpler way. Thus, the nonlinearity and collinearity of industrial process data can be overcome by kernel-based LVMs simultaneously. Table 1 provides detailed illustrations of the advantages and disadvantages of LVMs and kernel-based LVMs. Successful applications have been reported in different areas, including fault detection, reconstruction, and diagnosis [27-31]. For prediction model construction, there are also quite a few studies. Nicolaï et al. [32] applied kernel partial least squares (KPLS) to the estimation of the sugar content of apples based on near-infrared reflectance measurements. Wang et al. [33] evaluated the effectiveness of KPLS-based prediction by using two nonlinear simulation examples. To improve the prediction performance when modeling structure-activity relationships in the pharmaceutical industry, Huang et al. [34] proposed a modified method that takes the variable importance of KPLS into account.

Although the nonlinear models and the latent variable methods have been introduced separately above, they can actually be used together. For example, nonlinear methods such as LSSVM and ANN can be used to build the inner structures of linear PLS to address the nonlinearity problem [35, 36]. Moreover, the LVs of latent variable models can also be utilized as the inputs of nonlinear methods. As mentioned above, by using LVMs, the high dimensionality and collinearity of the process data can be handled simultaneously. Therefore, the prediction tasks can be carried out more easily and efficiently [37].

Based on the research above, the modeling performance can be further improved by combining LVMs with nonlinear methods. In this work, a new kernel LV-based soft sensor, denoted KPLS-RVM, is proposed to handle the high dimensionality, nonlinear relations, and uncertainty of industrial process data concurrently. First, KPLS is used to deal with the nonlinear characteristics and collinear structure of the industrial data by means of lower-dimensional latent variables. Then the kernel LVs, instead of the original inputs, are used for the probabilistic estimation of the RVM model. The LVs of the KPLS model are therefore integrated with RVM to obtain the KPLS-RVM model, through which more accurate prediction results can be achieved. To make a fair comparison, the corresponding counterparts, including LSSVM, RVM, PLS-LSSVM, PLS-RVM, and KPLS-LSSVM, are also implemented for industrial modeling. Detailed explanations of the acronyms for the compared methods are provided in Appendix A.

This paper is organized as follows. Section 2 gives the construction process of the composite KPLS-RVM algorithm. Section 3 presents two industrial case studies, implemented in MATLAB R2010b on a desktop computer with a 3.3 GHz CPU and 4 GB of memory. Finally, conclusions are presented in Section 4.

Table 1. Comparison of the four basic elements of the composite soft sensors

LVMs
  Advantages: (1) Easy to implement; (2) able to reduce the dimensionality and eliminate the collinearity of industrial data.
  Disadvantages: (1) Can be unreliable when facing nonlinear data sets; (2) the number of LVs should be determined in advance.

Kernel-based LVMs
  Advantages: (1) Able to reduce the dimensionality and eliminate the collinearity of industrial data; (2) by using kernel functions, the nonlinearity can be interpreted.
  Disadvantages: (1) Suitable kernel parameter tuning is always needed; (2) the number of LVs should be determined in advance; (3) kernel selection is necessary for various types of nonlinearity.

LSSVM
  Advantages: (1) Provides good generalization performance; (2) simple calculation.
  Disadvantages: (1) The kernel function must satisfy Mercer's conditions; (2) the kernel parameters should be tuned; (3) cannot provide the uncertainty of predictions.

RVM
  Advantages: (1) No limitation on the form of the kernel function; (2) provides a sparse structure; (3) probabilistic prediction.
  Disadvantages: (1) The hyper-parameters should be tuned.

2. Materials and methods


2.1. Partial least squares (PLS)


The target of partial least squares is to search for suitable latent variables that exploit the variance structure of the process variables while interpreting the quality variables [23].

It is assumed that the input matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n] \in \mathbb{R}^{n \times m}$ and the output matrix $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_n] \in \mathbb{R}^{n \times p}$ used throughout this paper share the same structure, where $n$ is the number of measurements, and $m$ and $p$ represent the numbers of process variables and quality variables, respectively. The original input and output data are usually scaled to zero mean and unit variance before further decomposition. PLS then projects the scaled data X and Y onto a low-dimensional space defined by the respective score vectors:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \mathbf{E}, \qquad \mathbf{Y} = \mathbf{U}\mathbf{Q}^{\mathrm{T}} + \mathbf{F} \tag{1}$$

where T and U are the latent (score) matrices of X and Y, P and Q are the loading matrices, and E and F are the residual matrices. Traditionally, the important latent variables are extracted one after another using the deflation-and-iteration procedure known as the nonlinear iterative partial least squares (NIPALS) algorithm. The details of the NIPALS algorithm can be found in [38].
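As a concrete illustration of how the latent variables in equation (1) can be obtained, the following minimal NumPy sketch of the NIPALS loop may be used. The function name, initialization, and tolerance are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def pls_nipals(X, Y, n_components, max_iter=500, tol=1e-10):
    """Extract PLS latent variables (eq. (1)) with the NIPALS algorithm.

    X : (n, m) inputs and Y : (n, p) outputs, both already scaled to
    zero mean and unit variance. Returns scores T, loadings P, Y-scores U.
    """
    X, Y = X.copy(), Y.copy()
    T, U, P = [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                          # initialize with a column of Y
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)            # input weights
            w /= np.linalg.norm(w)
            t = X @ w                          # latent variable (score vector)
            q = Y.T @ t / (t.T @ t)            # output loadings
            u_new = Y @ q / (q.T @ q)          # output scores
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        p = X.T @ t / (t.T @ t)                # input loadings
        X -= t @ p.T                           # deflation of X
        Y -= t @ (Y.T @ t / (t.T @ t)).T       # deflation of Y
        T.append(t); U.append(u); P.append(p)
    return np.hstack(T), np.hstack(P), np.hstack(U)
```

The score matrix T returned by such a routine is what the latent variable-based soft sensors in this work use as the regression input.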

2.2. Kernel partial least squares (KPLS)

12

19

The construction of KPLS can be divided into two steps: first the input variables are

20

transformed into a high-dimensional feature space by nonlinear mapping, and then the

21

linear PLS method can be modelled between the high-dimensional variables and

22

output variables. According to Cover’s theorem, the nonlinear features of input data 9

Journal Pre-proof 1

could be changed into linear ones after the proposed feature mapping [39]. Therefore,

2

the nonlinear information can be extracted efficiently by linear composition of KPLS. The nonlinear mapping used here can be expressed as i   (xi ) [34]. Then in the

4

feature space, the inner product of two mapped vectors i and  j can be performed

5

explicitly by using the kernel function of the original vectors x i and x j

6

( i  1, 2,

pro of

3

, n , j  1, 2, , n )

i T j  K (xi , x j )

7 8

(2)

The kernel function K ( ) used in this paper is Gaussian function: 2

K (xi , x j )  exp(( xi  x j ) / cG2 )

(3)

re-

9 10

where cG is the kernel width need be tuned before. The sample data in feature space

11

can be set as

13

n

(4)

decomposition can be written as

urn a

15

,  (xn ))T

T T Then K   ( X)  ( X)   denotes the kernel Gram matrix, and the KPLS i i i 1

14

lP

 ( X)  ( (x1 ),  (x2 ),

12

 = TPT  E  T Y = UQ  F

(5)

16

where P and Q are loading matrices for  and Y, respectively, E and F represent

17

the residual matrices, T = [t1 , t 2 ,

18

matrices (l is the number of latent variables) which derived from the NIPALS

19

algorithm. Then the deflations of K and Y are given by

U = [u1 ,u 2 ,

,ul ] are the latent

Jo

, t l ] and

20

K  (I  tt T K(I  tt T 

(6)

21

Y  Y  tt T Y

(7) 10

Journal Pre-proof 1

where I is an identify matrix with n-dimensional and t represents a latent variable of

2

matrix K in each loop. For the training data in feature space, the kernel latent matrix

3

T takes the form [26]

T  KU(TTKU)-1

5

(8)

pro of

4

For the test data, the latent matrix is expressed as

Tt  K t U(TT KU)-1

6

(9)

where K t  K (xi , x j ) , x i is the ith test variable, x j is the jth training variable. It is

8

worth mentioning that before decomposition of high-dimensional space, centralization

9

should be carried out [33].

re-

7

10

13 14

lP

12

2.3. Relevance vector machine

To compare with RVM, LSSVM is an appropriate choice for modeling purpose, and the detailed description of LSSVM is provided in reference [40]. Based on the framework of SVM, RVM was developed through the Bayesian

urn a

11

x , y 

15

probabilistic method by Tipping [14, 15]. Let

16

pairs, the following probabilistic function can be used to express the regression

17

between the input data and output data

y  f (x, w )  e

i

i 1,2, n

denote the input-target

(10)

Jo

18

i

19

where e comes from the noise process, which is subject to the mean-zero Gaussian

20

distribution with variance σ2. Similar to the LSSVM, the nonlinear function f(x) can

21

also be described as a linearly-parameterized model with the weighted sum of the

22

kernel functions 11

Journal Pre-proof n

n

i 1

i 0

f (x, w )   wi K (x, xi )  w0   wi (x)

1

w   w0 , w1 , w2 ,

, wn 

and  ( x)

2

where

3

 (x)= 1,K (x, x1 ), K (x, x 2 ), , K ( x, x n )  .

4

T

,

are

(11)

kernel

functions

with

T

According to the description above, the condition distribution of the output data can

p( y x)  N ( f (x, w),  2 ) . Considering the assumption of

be described as

6

independence of y, the likelihood of the data set can be developed as p(y w,  2 ) 

7

y  ( y1 , y2 , , yn )T

1 (2 )

2 n 2

pro of

5

exp{

1

2

y  w } 2

2

 =  1 (x), 2 (x),

, n ( x) 

(12)

8

where

9

maximum-likelihood estimation could be used to get suitable parameters of w and σ2

10

in equation (12). However, over fitting problem could be inevitable due to the same

11

number of parameters and training data. Thus a Bayesian perspective is adopted by

12

RVM to constrain the parameters w and σ2 submitting a prior probability distribution,

13

and the details of which can be found in Appendix B, from which the posterior

14

parameter distribution is also provided by using the Bayesian rule [15].

T

.

The

lP

re-

and

urn a

15

,

Assuming that the optimal hyper-parameters α o and

 o2 have been obtained

16

from Appendix B, then for the new sample data (xnew), the estimation using RVM can

17

be calculated as

19 20 21

p( yˆnew xnew , y, αo ,  o2 )   p( yˆnew xnew , w,  o2 ) p(w xnew , y, αo ,  o2 ) dw

Jo

18

(13)

which is also Gaussian distributed, with

 y ,new  oT (x new )

(14)

 y2,new   o2  T (x new ) Σo (x new )

(15)

12

Journal Pre-proof 1

where

 (xnew )  [1, K (xnew , x1 ), K (xnew , x2 ), , K (xnew , xn )]T

,

and

Σo  ( o2ψ T (x)ψ (x)  diag(αo ))1 .

2 3

2.4. Implementation procedure of soft sensors based on latent variables

pro of

4

In this work, the method of Jolliffe’s three parameters was adopted as a detection

6

criteria for data pretreatment, and the details of this method can be found in references

7

[13, 41]. The root-mean-square error (RMSE) and coefficient of determination (R2)

8

were employed to compare the modeling performance of the proposed models,

9

precisely.

re-

5

10

n

RMSE=

 ( yˆi  yi )

 ( yˆ i  yi )

lP

R = 1

11

(16)

n

n

2

2

i 1

i=1 n

 ( yi  y )

2

2

(17)

i=1

In equations (16) and (17), yi and yˆ i are the measured and predicted values,

12

respectively, y is the mean value and n represents the number of samples.

urn a

13 14

The overall procedure of the aforementioned prediction model is shown in Fig. 1. And the main steps are described as follows:

16

Step 1. Detect outliers using Jolliffe’s three parameters [42].

17

Step 2. Scale X and y to be zero mean and unit variance.

18

Step 3. Calculate latent variables of KPLS model.

Jo

15

19

Step 3.1. Compute the kernel matrix of X by K   (X)T  (X) .

20

Step 3.2. Calculate the latent matrix for the training data set by T  KU(TT KU)-1

21

. 13

Journal Pre-proof 1 2

T -1 Step 3.3. Calculate the latent matrix for the test data set by Tt  K t U(T KU) .

Step 4. Obtain a training model using RVM.

 o2 using T and y.

Step 4.1. Optimize hyper-parameters α o and

4

Step 4.2. Calculate posterior mean  .

5

T Step 4.3. Predict yˆ new using  y ,i  o  (t t ,i ) , where t t ,i is a row vector of Tt

6

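To make Steps 1-6 concrete, the sketch below walks through the same pipeline on synthetic data in Python/NumPy: the Gaussian kernel of eq. (3), feature-space centering, KPLS score extraction following eqs. (5)-(9), a basic form of the RVM hyper-parameter re-estimation described in Appendix B, and prediction with the RMSE and R2 of eqs. (16)-(17). It is a minimal, simplified illustration under stated assumptions, not the authors' MATLAB implementation; all function names, kernel widths, LV counts, and the synthetic data are placeholders.

```python
import numpy as np

def gaussian_kernel(A, B, width):
    """Gaussian kernel matrix between the rows of A and B, eq. (3)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / width ** 2)

def center_train_kernel(K):
    """Mean-center the training Gram matrix in feature space."""
    n = K.shape[0]
    J = np.full((n, n), 1.0 / n)
    return K - J @ K - K @ J + J @ K @ J

def center_test_kernel(Kt, K):
    """Center a test kernel matrix consistently with the training centering."""
    n = K.shape[0]
    J = np.full((n, n), 1.0 / n)
    Jt = np.full((Kt.shape[0], n), 1.0 / n)
    return Kt - Jt @ K - Kt @ J + Jt @ K @ J

def kpls_scores(Kc, Y, n_lv, n_iter=100, tol=1e-10):
    """NIPALS-style KPLS on a centered Gram matrix; returns scores T and U."""
    K, Yd, n = Kc.copy(), Y.copy(), Kc.shape[0]
    T, U = [], []
    for _ in range(n_lv):
        u = Yd[:, [0]]
        for _ in range(n_iter):
            t = K @ u
            t /= np.linalg.norm(t)
            u_new = Yd @ (Yd.T @ t)
            u_new /= np.linalg.norm(u_new)
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        T.append(t); U.append(u)
        D = np.eye(n) - t @ t.T
        K = D @ K @ D                       # deflation of K, eq. (6)
        Yd = Yd - t @ (t.T @ Yd)            # deflation of Y, eq. (7)
    return np.hstack(T), np.hstack(U)

def rvm_fit(Psi, y, n_iter=150):
    """Basic sparse Bayesian (RVM) re-estimation loop, cf. Appendix B."""
    n, m = Psi.shape
    alpha, sigma2 = np.full(m, 1e-3), 0.1 * np.var(y) + 1e-6
    for _ in range(n_iter):
        Sigma = np.linalg.inv(Psi.T @ Psi / sigma2 + np.diag(alpha))
        mu = Sigma @ Psi.T @ y / sigma2                  # posterior mean
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = np.clip(gamma / (mu ** 2 + 1e-12), 1e-9, 1e12)  # huge alpha => pruned weight
        denom = n - gamma.sum()
        if denom > 1e-3:                                 # re-estimate the noise variance
            sigma2 = np.sum((y - Psi @ mu) ** 2) / denom
    return mu, Sigma, sigma2

# ---- illustrative end-to-end run on synthetic data (all settings are placeholders) ----
rng = np.random.default_rng(0)
X_tr, X_te = rng.normal(size=(100, 6)), rng.normal(size=(40, 6))
y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.normal(size=100)
y_te = np.sin(X_te[:, 0])

cG, n_lv, c_rvm = 2.0, 3, 2.0                            # kernel widths and number of LVs
K = gaussian_kernel(X_tr, X_tr, cG)                      # Step 3.1
Kc = center_train_kernel(K)
Kt = center_test_kernel(gaussian_kernel(X_te, X_tr, cG), K)

T, U = kpls_scores(Kc, y_tr[:, None], n_lv)              # Step 3
R = U @ np.linalg.inv(T.T @ Kc @ U)                      # projection used in eqs. (8)-(9)
T_tr, T_te = Kc @ R, Kt @ R                              # Steps 3.2 and 3.3

Psi_tr = np.hstack([np.ones((len(T_tr), 1)), gaussian_kernel(T_tr, T_tr, c_rvm)])
mu, Sigma, s2 = rvm_fit(Psi_tr, y_tr)                    # Step 4
Psi_te = np.hstack([np.ones((len(T_te), 1)), gaussian_kernel(T_te, T_tr, c_rvm)])
y_hat = Psi_te @ mu                                      # posterior mean prediction, eq. (14)

rmse = np.sqrt(np.mean((y_hat - y_te) ** 2))             # eq. (16)
r2 = 1.0 - np.sum((y_hat - y_te) ** 2) / np.sum((y_te - y_te.mean()) ** 2)   # eq. (17)
print(f"RMSE = {rmse:.3f}, R2 = {r2:.3f}")
```

In the actual case studies, the kernel widths, regularization parameters, and numbers of LVs are tuned per model; the values used for the first cross-validation are reported in Tables 4 and 6.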
pro of

3

and i is the sample number.

7

Step 5. Rescale yˆ new according to the mean and variance of y.

8

Step 6. Measure the prediction accuracy using RMSE and R2.

11 12 13

Jo

10

urn a

lP

re-

9

Fig. 1. Flow chart of KPLS-RVM soft sensor modeling.

3. Results and discussion

14

Journal Pre-proof

Given that indoor air quality (IAQ) plays an important role in public health, and that the emission of hazardous pollutants from wastewater treatment processes (WWTPs) presents significant challenges for environmental protection, it is necessary to construct reliable and accurate prediction models for process control and monitoring. In this section, the proposed soft sensor is used to model two types of quality variables, from IAQ data collected in a subway station [43] and from papermaking WWTP data [44]. The preprocessing step based on Jolliffe's three parameters was implemented before further analysis. As shown in plots (a) and (b) of Fig. 2, the points beyond the control limits were excluded from the IAQ data and the papermaking WWTP data, respectively.

1

control limits should be excluded from the sample, respectively.

(a)

Jo

urn a

lP

11

15

(b)

pro of

Journal Pre-proof

Fig. 2. Outlier detection using Jolliffe’s three parameters in terms of (a) IAQ data

2

and (b) papermaking WWTP data (dotted lines represent the control limits).

re-

1

3

3.1. Indoor air quality in a subway station

lP

4

The IAQ data used for testing the modeling performance were collected from a central subway station in Seoul in January 2010. As shown in Fig. 3, there are eight IAQ variables: nitric oxide (NO), nitrogen dioxide (NO2), carbon monoxide (CO), carbon dioxide (CO2), particulate matter with diameters of less than 10 μm (PM10) and less than 2.5 μm (PM2.5), temperature, and humidity. Originally, 744 data points with a sampling interval of 1 hour were extracted from the subway IAQ measuring system. Among the 744 original data points, eight samples were removed because at least one variable had a missing value. After removing the 41 outliers shown in Fig. 2 (a), the total number of samples used for process modeling is 695, and the numbers of training and test data are set to 535 and 160, respectively.

Jo

urn a

5

15

16

re-

pro of

Journal Pre-proof

1

lP

Fig. 3. Variations in the IAQ data

2

To illustrate the performance of KPLS-RVM, PM2.5 of IAQ data is used for

4

prediction. Five counterparts, including LSSVM, RVM, PLS-LSSVM, PLS-RVM,

5

and KPLS-LSSVM, are considered for comparison. Moreover, five times Monte

6

Carlo cross validation is used to evaluate the prediction accuracy of each model. In

7

each cross validation of the latent variable-based model, the optimum number of LVs

8

should be determined in advance. According to the variance of each LV listed in Table

9

2, it is clear that there is no obvious increase in the cumulative variance of output

10

variable after the third LV is reached. Therefore, we choose three LVs for the PLS

11

modeling. In the same way, it is appropriate to select five LVs for KPLS modeling

12

based on Table 3. Combined with the latent variables chosen before, the cumulative

Jo

urn a

3

17

Journal Pre-proof

variance values of PLS (78.75% for input data and 77.52% for output data) seem to be

2

similar with those of KPLS (80.23% for input data and 78.36% for output data).

3

However, there is a significant difference between the input data variance (16.07%)

4

and output data variance (3.68%) of the third LV in PLS, which could, to some degree,

5

weaken its capacity of data interpretation. The number of LVs, all of the other

6

parameters for the first cross validation are provided in Table 4.

7 8

pro of

1

Table 2. Percent variance captured by PLS model Input variables (X)

re-

LV

Output variable (y)

Cumulative variance (%)

Variance (%)

Cumulative variance (%)

1

39.14

39.14

58.42

58.42

2

23.54

62.68

15.42

73.84

3

16.07

78.75

3.68

77.52

4

7.94

86.69

0.56

78.09

5

6.99

93.68

0.25

78.34

6

3.37

97.05

0.03

78.37

7

2.95

100

0

78.37

10

urn a

9

lP

Variance (%)

Jo

Table 3. Percent variance captured by KPLS model Input variables (X)

LV

1

Output variable (y)

Variance (%)

Cumulative variance (%)

Variance (%)

Cumulative variance (%)

37.89

37.89

48.01

48.01

18

Journal Pre-proof

21.30

59.19

17.39

65.40

3

9.98

69.17

6.73

72.13

4

8.14

77.31

2.28

74.41

5

2.92

80.23

3.95

78.36

6

7.28

87.50

7

1.15

88.66

1 2

pro of

2

1.03

79.39

1.99

81.38

Table 4. Parameters of different modeling methods for IAQ data set Parameters for latent variables

Parameters for soft sensors

LSSVM

NaN

Kernel-‘Gaussian’;

re-

Methods

NaN

urn a

RVM

PLS-LSSVM

Number of LVs-3

Number of LVs-3

Jo

PLS-RVM

lP

kernel width-3.23;

KPLS-LSSVM

KPLS-RVM

regularization parameter-9.35 Kernel-‘Gaussian’; kernel width-1.60 Kernel-‘Gaussian’; kernel width-1.66; regularization parameter-4.40 Kernel-‘Gaussian’; kernel width-8.95

Number of LVs-5;

Kernel-‘Gaussian’

kernel-‘Gaussian’;

kernel width-1.55;

kernel width-1.87

regularization parameter-4.33

Number of LVs-5;

Kernel-‘Gaussian’; 19

Journal Pre-proof kernel-‘Gaussian’;

kernel width-2.30

kernel width-1.87 1

For the detailed performance of different models, the average prediction indices

3

derived from cross validations are shown in Table 5. In terms of the non-LVs models,

4

RVM shows slightly better prediction performance than LSSVM, with a RMSE of

5

19.565 and a R2 of 0.785. The reason could be the uncertainty of the process data

6

captured by RVM, which provides this model with more accurate estimation than

7

LSSVM does. After using the latent variables of PLS, however, neither PLS-LSSVM

8

nor PLS-RVM shows any obvious improvement of estimation accuracy. Moreover,

9

the prediction results of PLS-RVM are even worse, giving the values of RMSE and R2

10

are 20.163 and 0.773, respectively. This could be because of the nonlinearity between

11

the IAQ data cannot be interpreted by linear latent variables of PLS efficiently, which

12

can be reinforced by analyzing the uneven distribution of variance reported in Table 2.

13

In terms of the KPLS-based methods, the KPLS-LSSVM shows a little improvement

14

in prediction results of the proposed models, and the values of RMSE and R2 for test

15

data are 19.854 and 0.798, respectively. Moreover, the KPLS-RVM exhibits the best

16

prediction ability with the minimum value of RMSE  18.770 and the maximum

17

value of R 2  0.820 , which illustrates that the latent variables of kernel PLS are

18

beneficial to enhancing the modeling accuracy. Compared with LSSVM, RVM,

19

PLS-LSSVM, PLS-RVM, and KPLS-LSSVM methods, the RMSE value of

20

KPLS-RVM has been reduced by 4.69%, 4.06%, 5.13%, 7.42%, and 5.46%, and the

Jo

urn a

lP

re-

pro of

2

20

Journal Pre-proof R2 value has been improved by 6.91%, 4.46%, 5.94%, 6.08%, and 2.76%, respectively.

2

To provide more information about the performance of KPLS-RVM model, PM2.5

3

prediction plots of training and test data are shown in Fig. 4, through which we can

4

find that the data samples are well fitted.

pro of

1

In addition to prediction accuracy, the execution time of each model has also been

6

provided in Table 5. In terms of the training set, running times of LSSVM and

7

PLS-LSSVM are 0.079 s and 0.082 s, which are significantly lower than the time of

8

0.618 s for KPLS-LSSVM. This is mainly because of the fact that kernel processing

9

step of input data is a little bit time consuming. Similarly, the prediction time of

10

KPLS-LSSVM (0.139 s) for all 162 testing samples is larger than those of LSSVM

11

(0.015 s) and PLS-LSSVM (0.013 s) models. In terms of RVM model, the optimal

12

hyper-parameters ( α o and

13

Therefore, the training time of RVM is pretty long, with 150 iterations for 59.219 s.

14

Meanwhile, it is worthy to note that both of PLS-RVM and KPLS-RVM models

15

provide relatively fewer iterations in Table 5, thereby saving the running time and

16

reaching 10.775 s and 22.195 s, respectively. This results from the fact that latent

17

variables have lower dimensionality (3 LVs of PLS and 5 LVs of KPLS), thus leading

18

to the higher convergence rate of iterative process for hyper-parameters. Although the

19

training times of RVM-related models are obviously longer than those of

20

LSSVM-related models, there are dramatic changes in terms of online prediction

21

efficiency. Compared to the results of LSSVM-related models, all of the prediction

22

times for RVM-related models are reduced slightly. This is mainly due to the sparse

re-

5

Jo

urn a

lP

 o2 in Section 2.3) are obtained by suitable iterations.

21

Journal Pre-proof 1

structure of RVM, which simplifies the prediction steps to some extent.

2 3

Table 5. Comparison of different modeling methods for the prediction of PM2.5

                        Training set                               Test set
Methods        RMSE     R2      Time (s)                  RMSE     R2      Time (s)
LSSVM          14.320   0.890   0.079                     19.694   0.767   0.015
RVM            14.124   0.892   59.219 (150 iterations)   19.565   0.785   0.010
PLS-LSSVM      18.355   0.820   0.082                     19.786   0.774   0.013
PLS-RVM        20.606   0.770   10.775 (20 iterations)    20.163   0.773   0.011
KPLS-LSSVM     16.059   0.854   0.618                     19.854   0.798   0.139
KPLS-RVM       18.081   0.8181  22.195 (50 iterations)    18.770   0.820   0.105

4

Time (s)

pro of

KPLS-LSSVM 16.059 0.854

RMSE R2

Fig. 4. Prediction results of PM2.5 using KPLS-RVM

3.2. Wastewater treatment process

22

Journal Pre-proof

To evaluate the generalization ability of the proposed methods, process data collected from a paper mill wastewater treatment plant in Guangdong province were also used for modeling. Detailed information about the plant is given in reference [44]. As shown in Fig. 5, the data contain eight variables: influent chemical oxygen demand (COD), influent suspended solids (SS), flow rate (Q), pH, temperature, dissolved oxygen (DO), effluent COD, and effluent SS. After the data pretreatment shown in plot (b) of Fig. 2, the total number of samples used for modeling is 162. The first 100 samples are used for model training, and the remaining 62 samples for testing. In this section, the first six variables are used to construct the models and predict the effluent COD.

12

urn a

lP

11

re-

10

pro of

1

Fig. 5. Variations in the papermaking WWTP data

15

Similar to the case of IAQ, the LSSVM, RVM, PLS-LSSVM, PLS-RVM, and

16

KPLS-LSSVM methods are also used for comparison, and the tuning parameters of 23

Journal Pre-proof

different methods in the first cross validation are listed in Table 6. The average

2

prediction results of RMSE and R2 are shown in Table 7. In terms of the LSSVM

3

model, the RMSE value is 5.071, and the R2 value is 0.565, which still shows

4

insufficient modeling capacity in comparison with RVM ( RMSE  4.819 and

5

R 2  0.608 ). Simultaneously, it is worth noting that both of the prediction results of

6

conventional models (LSSVM and RVM) have been improved slightly by using the

7

latent variables of PLS, and this phenomenon is different with that in Table 5.

8

Therefore, it can be found that latent variables of linear PLS indeed have potentials to

9

improve the prediction performance. Based on the PLS-LSSVM and PLS-RVM

10

methods, KPLS-LSSVM and KPLS-RVM continue to enhance the prediction results

11

to a more desirable extend. Moreover, it is clear that the smallest RMSE value (4.443)

12

and the biggest R2 value (0.676) can be obtained by using KPLS-RVM method, and

13

its prediction plots for effluent COD are shown in Fig. 6. In comparison with the other

14

counterparts (LSSVM, RVM, PLS-LSSVM, PLS-RVM, and KPLS-LSSVM), the

15

RMSE value of KPLS-RVM decreases by 12.38%, 7.80%, 17.63%, 8.82%, and

16

4.12%, and the R2 value increases by 19.65%, 11.18%, 15.95%, 10.10%, and 7.30%,

17

respectively.

urn a

lP

re-

pro of

1

In terms of the model execution efficiency, the conclusions drawn from the Table 7

19

are similar to the preceding case. Not surprisingly, LSSVM-related models show the

20

desirable training times, with the values of 0.027 s for LSSVM, 0.031 s for

21

PLS-LSSVM, and 0.033 s for KPLS-LSSVM. It is notable that there is no significant

22

difference in all of the LSSVM-related models for training times. This is principally

Jo

18

24

Journal Pre-proof

because the number of training samples in this case is obviously less than the IAQ

2

data, thereby reducing the time consumption of kernel processing step. For the

3

RVM-related models, the training times still mainly depend on the iterations used for

4

obtaining the hyper-parameters. It can be observed that PLS-RVM model runs fastest

5

in the three RVM-related models, with 20 iterations for 0.154 s. Since the numbers of

6

training iterations for RVM and KPLS-RVM are 120 and 100, respectively, the

7

training times are increased accordingly, with values of 0.639 s and 0.541 s.

8

Compared with the KPLS-RVM, the number of PLS-RVM iterations is reduced by 80

9

with the reduction of LVs number reported in Table 6. Therefore, it is clear that the

10

number of LVs is the key to iterations number. This once again illustrates that the

11

suitable latent variables are beneficial for improving the training speed of RVM model.

12

For the test set, RVM-related models still show higher execution efficiency than the

13

LSSVM-related models due to the lower computational complexity provided by the

14

sparse behaviour of RVM model. Among the three RVM-related models, RVM

15

provides the minimum prediction time (0.9 ms), which is slightly lower than that of

16

PLS-RVM model (1.7 ms). Considering the presence of kernel step, the prediction

17

time is a little bit larger for the KPLS-RVM model (10.3 ms).

urn a

lP

re-

pro of

1

19

Jo

18

Table 6. Parameters of different modeling methods for papermaking WWTP data set Methods

Parameters for latent variables

Parameters for soft sensors

LSSVM

NaN

Kernel-‘Gaussian’;

25

Journal Pre-proof kernel width-6.85; regularization parameter-11.95 RVM

NaN

Kernel-‘Gaussian’; kernel width-12.0

Number of LVs-2

Kernel-‘Gaussian’;

pro of

PLS-LSSVM

kernel width-10.38;

regularization parameter-23.93

PLS-RVM

Number of LVs-2

Kernel-‘Gaussian’; kernel width-1.6

Number of LVs-3; kernel-‘Gaussian’; kernel width-14

kernel width-1.27; regularization parameter-32.81

Number of LVs-2.74;

lP

KPLS-RVM

Kernel-‘Gaussian’

re-

KPLS-LSSVM

kernel-‘Gaussian’;

Kernel-‘Gaussian’; kernel width-2.2

kernel width-2.55

Table 7. Comparison of different modeling methods for the prediction of effluent COD

                        Training set                               Test set
Methods        RMSE     R2      Time (s)                  RMSE     R2      Time (ms)
LSSVM          3.812    0.774   0.027                     5.071    0.565   6
RVM            3.622    0.795   0.639 (120 iterations)    4.819    0.608   0.9
PLS-LSSVM      3.986    0.730   0.031                     5.394    0.583   8
PLS-RVM        4.206    0.723   0.154 (20 iterations)     4.873    0.614   1.7
KPLS-LSSVM     4.250    0.724   0.033                     4.634    0.630   17
KPLS-RVM       4.174    0.804   0.541 (100 iterations)    4.443    0.676   10.3

0.723 0.154 (20 iterations)

4.873

0.614 1.7

0.724 0.033

4.634

0.630 17

KPLS-LSSVM 4.250

26

Journal Pre-proof KPLS-RVM

4.174

0.804 0.541 (100 iterations) 4.443

0.676 10.3

re-

pro of

1

2

Fig. 6. Prediction results of effluent COD using KPLS-RVM

4 5

3.3. Discussion

lP

3

According to these two case studies, it can be observed that KPLS, through its latent variables, brings evident improvements in the prediction results of the conventional models. This means that the kernel LVs can efficiently exploit the information in nonlinear industrial data with lower dimensionality. For linear PLS, however, the latent variables do not necessarily perform well in feature extraction. Compared with the KPLS-based models, the PLS-based ones appear more susceptible to the sample size and the internal characteristics of the data.

12

could be more susceptible to the sample size and internal characteristics of data.

Jo

urn a

6

13

For the prediction model, RVM provides a sparse structure and probabilistic

14

prediction, which outperforms the point estimation of LSSVM. Therefore, it is 27

Journal Pre-proof

reasonable that the KPLS-RVM model, composed by the latent variables of KPLS and

2

RVM function, can be used to get the most accurate prediction results with high

3

execution efficiency for the test set. In terms of the training set, however, the

4

hyper-parameters of RVM need to be obtained by the iterative steps in advance. Thus

5

the training times for RVM-related models are apparently higher than those of

6

LSSVM-related models. According to the experimental results of the proposed cases,

7

the latent variables can be used to accelerate the convergence speed of the iterative

8

process with their lower data dimensions, and therefore alleviate the time-consuming

9

training process. Therefore, the superiority of KPLS-RVM model is not only placed

re-

10

pro of

1

on the better prediction accuracy, but also on the improvement of training efficiency.

12

4. Conclusions

lP

11

In this paper, an advanced model named KPLS-RVM has been introduced for industrial process modeling. The basic idea of this model is to apply kernel latent variables to interpret the nonlinear features of industrial data with lower dimensionality and a more concise data structure. These features further facilitate the modeling performance of the sparse RVM model. To evaluate the performance of the proposed approach, two real industrial data sets were used for modeling. The results of the case studies showed the strong robustness of the KPLS-RVM model, with the highest predictive accuracy for unknown output data and a desirable prediction time. For the training of RVM-related models, however, the time consumption is large because of the iterations needed to obtain suitable hyper-parameters. Although the latent variables of LVMs can be used to improve the training efficiency, the training time is still too long for data sets of large volume. Therefore, faster algorithms should be considered in future work to fundamentally overcome the time-consuming training process of RVM-related models.

3

considered to fundamentally overcome the time-consuming drawback for the training

4

process of the RVM-related model in the future.

pro of

1

5 6

Appendix A. Explanations of main acronyms

7

Table A1. Explanations of the acronyms for different modeling methods Explanations

LSSVM

Least squares support vector machine

RVM

Relevance vector machine

PLS-LSSVM

Partial least squares-based least squares support vector machine

PLS-RVM

Partial least squares-based relevance vector machine

lP

re-

Acronyms

KPLS-LSSVM Kernel partial least squares-based least squares support vector

KPLS-RVM

9 10 11

12

13

Kernel partial least squares-based relevance vector machine

Appendix B. Prior and posterior parameter distribution of RVM algorithm In order to obtain the parameters of w and σ2 from Bayesian perspective, the

Jo

8

urn a

machine

zero-mean Gaussian prior distribution should be defined first: n i wi2 1 12 p (w α )   N ( wi 0,  )  )   i exp(  i 0 (2 )( n 1) 2 i 0 2 n

1 i

(B.1)

n

p()   gamma(i a, b) i 0

(B.2)

29

Journal Pre-proof p( )  gamma( c, d )

1

 = -2 , α is a vector of

2

where

3

takes the form

(B.3)

M +1 hyper-parameters, and the gamma function

gamma( a, b)  (a)1 ba αa-1ebα

4 

(B.4)

where (a)  0 t a 1e  t dt . To make the priors proposed non-informative, the

6

parameters a, b, c, and d are set as: a  b  c  d  104 .

7 8

pro of

5

With the prior probability of the parameters defined, the Bayesian rule can be used to obtain the posterior parameter distribution

p ( w y , α,  ) 

p ( y w,  2 ) p( w α ) p ( y α,  2 )

re-

2

9

(B.5)

1 1 1 2  Σ exp{ ( w  μ) T Σ 1 ( w  μ)} ( n 1) 2 (2 ) 2 where the posterior mean and covariance are given as follows

11 12

lP

10

=( -2 T (x)ψ(x)+A)-1

(B.6)

 = -2 Σψ T (x)y

(B.7)

13

with A=diag(α0 ,α1 ,

14

optimal point prediction of the hyper-parameters α o =   o  o ,

15

and the details of this procedure can also be found in Tipping [15].

urn a

,αn ) . The marginal likelihood can be used to obtain the , no  and  o2 ,

During the process of optimizing the hyper-parameters, many of the α i tend to

17

infinity. It means that many weights wi in equation (B.1) will be zero, which leads to

18

a pruned basis function with less complex. Then the equations (B.6) and (B.7) can be

19

updated using the optimal hyper-parameters as shown below

Jo

16

20

Σo  ( o2ψ T (x)ψ (x)  Ao )1

(B.8)

21

o = o2 Σoψ T (x)y

(B.9) 30

Journal Pre-proof 1

where Ao =diag(α o ) .

2

Acknowledgments

4

This study was supported by the Foundation of Nanjing Forestry University (No.

5

GXL029), a grant from the Subway Fine Dust Reduction Technology Development

6

Project of the Ministry of Land Infrastructure and Transport (19QPPW-B152306-01),

7

and the National Research Foundation of Korea (NRF) grant funded by the Ministry

8

of Science and ICT (No. NRF-2019H1D3A1A02071051).

pro of

3

re-

9 10

References

12

[1] P. Kadlec, B. Gabrys, S. Strandt, Data-driven soft sensors in the process industry,

13

Comput. Chem. Eng. 33 (4) (2009) 795-814.

14

[2] H. Liu, M. Huang, C. Yoo, A fuzzy neural network-based soft sensor for modeling

15

nutrient removal mechanism in a full-scale wastewater treatment system, Desalin.

16

Water Treat. 51 (31-33) (2013) 6184-6193.

17

[3] M. Kano, K. Fujiwara, Virtual sensing technology in process industries : Trends

18

and challenges revealed by recent industrial applications, J. Chem. Eng. Jpn. 46 (1)

19

(2013) 1-17.

20

[4] M. Kano, Y. Nakagawa, Data-based process monitoring, process control, and

21

quality improvement: Recent developments and applications in steel industry, Comput.

22

Chem. Eng. 32 (1–2) (2008) 12-24.

Jo

urn a

lP

11

31

Journal Pre-proof

[5] H. Kaneko, K. Funatsu, Moving window and just-in-time soft sensor model based

2

on time differences considering a small number of measurements, Ind. Eng. Chem.

3

Res. 54 (2) (2015) 700-704.

4

[6] J.C. Principe, N.R. Euliano, W.C. Lefebvre, Neural and Adaptive Systems:

5

Fundamentals Through Simulations, Wiley, New York, 2000.

6

[7] J.C.B. Gonzaga, L.A.C. Meleiro, C. Kiang, R.M. Filho, ANN-based soft-sensor

7

for real-time process monitoring and control of an industrial polymerization process,

8

Comput. Chem. Eng. 33 (1) (2009) 43-49.

9

[8] D.M. Himmelblau, Accounts of experiences in the application of artificial neural

re-

pro of

1

networks in chemical engineering, Ind. Eng. Chem. Res. 47 (16) (2008) 5782-5796.

11

[9] G. Mountrakis, J. Im, C. Ogole, Support vector machines in remote sensing: A

12

review, ISPRS J. Photogramm. 66 (3) (2011) 247-259.

13

[10] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York,

14

2000.

15

[11] C.T. Lin, C.S.G. Lee, Neural Fuzzy Systems: A Neuro-fuzzy Synergism to

16

Intelligent Systems, Prentice Hall PTR, Upper Saddle River NJ, 1996.

17

[12] M. Huang, Y. Ma, J. Wan, X. Chen, A sensor-software based on a genetic

18

algorithm-based neural fuzzy system for modeling and simulating a wastewater

19

treatment process, Appl. Soft Comput. 27 (2015) 1-10.

20

[13] H. Liu, C. Yoo, A robust localized soft sensor for particulate matter modeling in

21

Seoul metro systems, J. Hazard. Mater. 305 (2016) 209-218.

22

[14] Z. Ge, Z. Song, Nonlinear soft sensor development based on relevance vector

Jo

urn a

lP

10

32

Journal Pre-proof

machine, Ind. Eng. Chem. Res. 49 (18) (2010) 8685-8693.

2

[15] M.E. Tipping, A. Smola, Sparse Bayesian learning and the relevance vector

3

machine, J. Mach. Learn. Res. 1 (3) (2001) 211-244.

4

[16] Y. Liu, J. Chen, Z. Sun, Y. Li, D. Huang, A probabilistic self-validating

5

soft-sensor with application to wastewater treatment, Comput. Chem. Eng. 71 (2014)

6

263-280.

7

[17] C. Chen, Y. Wang, Y. Zhang, Y. Zhai, Indoor positioning algorithm based on

8

nonlinear PLS integrated with RVM, IEEE Sens. J. 18 (2) (2018) 660-668.

9

[18] P.K. Wong, Q. Xu, C.M. Vong, H.C. Wong, Rate-dependent hysteresis modeling

10

and control of a piezostage using online support vector machine and relevance vector

11

machine, IEEE T. Ind. Electron. 59 (4) (2011) 1988-2001.

12

[19] J. Ji, H. Wang, K. Chen, Y. Liu, N. Zhang, J. Yan, Recursive weighted kernel

13

regression for semi-supervised soft-sensing modeling of fed-batch processes, J.

14

Taiwan Inst. Chem. E. 43 (1) (2012) 67-76.

15

[20] X. Wang, B. Jiang, N. Lu, Adaptive relevant vector machine based RUL

16

prediction under uncertain conditions, ISA T. 87 (2019) 217-224.

17

[21] Y. Dong, S.J. Qin, Dynamic latent variable analytics for process operations and

18

control, Comput. Chem. Eng. 114 (2018) 69-80.

19

[22] S.J. Qin, Survey on data-driven industrial process monitoring and diagnosis,

20

Annu. Rev. Control. 36 (2) (2012) 220-234.

21

[23] Q. Zhu, Q. Liu, S.J. Qin, Concurrent quality and process monitoring with

22

canonical correlation analysis, J. Process Contr. 60 (2017) 95-103.

Jo

urn a

lP

re-

pro of

1

33

Journal Pre-proof

[24] L. Van Der Maaten, E. Postma, J. Van den Herik, Dimensionality reduction : A

2

comparative review, J. Mach. Learn. Res. 10 (2009) 66-71.

3

[25] Z. Ge, Z. Song, F. Gao, Review of recent research on data-based process

4

monitoring, Ind. Eng. Chem. Res. 52 (10) (2013) 3543-3562.

5

[26] R. Rosipal, L.J. Trejo, Kernel partial least squares regression in reproducing

6

kernel Hilbert space, J. Mach. Learn. Res. 2 (Dec) (2001) 97-123.

7

[27] J.M. Lee, C.K. Yoo, W.C. Sang, P.A. Vanrolleghem, I.B. Lee, Nonlinear process

8

monitoring using kernel principal component analysis, Chem. Eng. Sci. 59 (1) (2004)

9

223-234.

re-

pro of

1

[28] Y. Zhang, R. Sun, Y. Fan, Fault diagnosis of nonlinear process based on KCPLS

11

reconstruction, Chemom. Intell. Lab. Syst. 140 (2015) 49-60.

12

[29] R. Jia, Z. Mao, F. Wang, D. He, Self-tuning final product quality control of batch

13

processes using kernel latent variable model, Chem. Eng. Res. Des. 94 (2015)

14

119-130.

15

[30] N. Sheng, Q. Liu, S.J. Qin, T. Chai, Comprehensive monitoring of nonlinear

16

processes based on concurrent kernel projection to latent structures, IEEE T. Autom.

17

Sci. Eng. 13 (2) (2016) 1129-1137.

18

[31] Q. Liu, Q. Zhu, S.J. Qin, T. Chai, Dynamic concurrent kernel CCA for

19

strip-thickness relevant fault diagnosis of continuous annealing processes, J. Process

20

Contr. 67 (2018) 12-22.

21

[32] B.M. Nicolaï, K.I. Theron, J. Lammertyn, Kernel PLS regression on wavelet

22

transformed NIR spectra for prediction of sugar content of apple, Chemom. Intell.

Jo

urn a

lP

10

34

Journal Pre-proof

Lab. Syst. 85 (2) (2007) 243-252.

2

[33] M. Wang, G. Yan, Z. Fei, Kernel PLS based prediction model construction and

3

simulation on theoretical cases, Neurocomputing 165 (2015) 389-394.

4

[34] X. Huang, Y.-P. Luo, Q.-S. Xu, Y.-Z. Liang, Incorporating variable importance

5

into kernel PLS for modeling the structure–activity relationship, J. Math. Chem. 56 (3)

6

(2018) 713-727.

7

[35] Y. Lv, J. Liu, T. Yang, Nonlinear PLS integrated with error-based LSSVM and its

8

application to NOx modeling, Ind. Eng. Chem. Res. 51 (49) (2012) 16092-16100.

9

[36] S.J. Zhao, J. Zhang, Y.M. Xu, Z.H. Xiong, Nonlinear projection to latent

10

structures method and its applications, Ind. Eng. Chem. Res. 45 (11) (2006)

11

3843-3852.

12

[37] B. Xie, Y.-w. Ma, J.-q. Wan, Y. Wang, Z.-c. Yan, L. Liu, Z.-y. Guan, Modeling

13

and multi-objective optimization for ANAMMOX process under COD disturbance

14

using hybrid intelligent algorithm, Environ. Sci. Pollut. R. 25 (21) (2018)

15

20956-20967.

16

[38] P. Geladi, B.R. Kowalski, Partial least-squares regression: A tutorial, Anal. Chim.

17

Acta 185 (1986) 1-17.

18

[39] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge

19

University Press, Cambridge, 2004.

20

[40] J.A.K. Suykens, T.V. Gestel, J.D. Brabanter, B.D. Moor, J. Vandewalle, Least

21

Squares Support Vector Machines, World Scientific, London, 2002.

22

[41] L. Fortuna, S. Graziani, A. Rizzo, M.G. Xibilia, Soft Sensors for Monitoring and

Jo

urn a

lP

re-

pro of

1

35

Journal Pre-proof

Control of Industrial Processes, Springer Science & Business Media, London, 2007.

2

[42] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 2002.

3

[43] H. Liu, C. Yang, M. Huang, D. Wang, C. Yoo, Modeling of subway indoor air

4

quality using Gaussian process regression, J. Hazard. Mater. 359 (2018) 266-273.

5

[44] J. Wan, M. Huang, Y. Ma, W. Guo, Y. Wang, H. Zhang, W. Li, X. Sun, Prediction

6

of effluent quality of a paper mill wastewater treatment using an adaptive

7

network-based fuzzy inference system, Appl. Soft Comput. 11 (3) (2011) 3238-3246.

pro of

1

Jo

urn a

lP

re-

8

36

Journal Pre-proof *Declaration of Interest Statement

Declaration of interests ☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Jo

urn a

lP

re-

pro of

☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

*Highlights (for review)

Journal Pre-proof

Research Highlights

1. Latent variables (LVs) of KPLS and relevance vector machine (RVM) are proposed

pro of

simultaneously. 2. LVs of KPLS are integrated with RVM for modeling the nonlinear data of industrial processes.

3. LVs are used to handle high dimensionality and complex collinearity of nonlinear data.

re-

4. RVM is developed to construct probabilistic and predictive function between LVs and y variables.

Jo

urn a

lP

5. KPLS-RVM shows the desirable predictive accuracy of the quality variable.

*Credit Author Statement

Journal Pre-proof

CRediT author statement

Jo

urn a

lP

re-

pro of

Hongbin Liu: Supervision, Conceptualization, Writing- Reviewing and Editing Chong Yang: Writing- Original draft preparation, Methodology, Software Mingzhi Huang: Data curation, Validation ChangKyoo Yoo: Data curation, Funding acquisition