Chemical Engineering Research and Design 94 (2015) 466–474

Multivariate data modeling using modified kernel partial least squares

Gao Yingbin, Kong Xiangyu*, Hu Changhua, Zhang Zhengxin, Li Hongzeng, Hou Li’an

The Xi’an Research Institute of High Technology, Xi’an, Shaanxi 710025, PR China

* Corresponding author. Tel.: +86 02984744954. E-mail address: [email protected] (X. Kong).
Received 16 May 2014; received in revised form 1 September 2014; accepted 3 September 2014; available online 15 September 2014.
http://dx.doi.org/10.1016/j.cherd.2014.09.004
0263-8762/© 2014 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

Abstract

Two problems deserve attention when using kernel partial least squares (KPLS): one is overfitting, and the other is how to eliminate the useless information mixed into the independent variables X. In this paper, the stochastic gradient boosting (SGB) method is adopted to solve the overfitting problem, and a new method called kernel net analyte preprocessing (KNAP) is proposed to remove undesirable systematic variation in X that is unrelated to Y. By combining the two methods, a final modeling approach named modified KPLS (MKPLS) is proposed. Two simulation experiments are carried out to evaluate the performance of the MKPLS method. The simulation results show that MKPLS is not only resistant to overfitting but also improves the prediction accuracy. © 2014 The Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.

Keywords: Kernel partial least squares; Stochastic gradient boosting; Kernel net analyte preprocessing; Overfitting; Multivariate data modeling; Data preprocessing

1. Introduction

Partial least squares (PLS) regression, proposed by Wold in 1983, is one of the most commonly used calibration methods in chemometrics (Wold et al., 1983). PLS regression searches for a set of components (called latent variables) that performs a simultaneous decomposition of the independent variables (X) and the dependent variables (Y) with the constraint that these components explain as much as possible of the covariance between X and Y (Abdi, 2010). PLS is a powerful technique for process modeling and calibration in systems where the predictor variables are collinear, the measurement data contain noise, the variables have high dimensionality, and there are fewer observations than predictor variables (Zhang et al., 2010a). However, PLS regression is a linear method and can be inappropriate for describing the underlying data structure when the system exhibits significant nonlinear characteristics (Zhang and Zhang, 2009). To address this issue, a nonlinear PLS method, called kernel partial least squares (KPLS), was proposed by Rosipal and Trejo (Rosipal and Trejo, 2002). The original data are nonlinearly transformed into a feature space of arbitrary dimensionality via a nonlinear mapping, and a linear model is then created in that feature space (Zhang et al., 2012; Zhang and Hu, 2011). Because it is easy to understand and implement, KPLS has been widely used in many fields, such as pattern recognition (Qu et al., 2010), signal processing (Helander et al., 2012), and fault diagnosis (Zhang et al., 2010b).

Data preprocessing methods can reduce the effect of noise on the data, extract more useful information for model building, and improve prediction ability and model robustness. Many data preprocessing methods have been proposed in recent years, such as multiplicative scatter correction (MSC) (Thennadil et al., 2006), standard normal variate (SNV) (Barnes et al., 1989), and Savitzky–Golay smoothing and differentiation (Savitzky and Golay, 1964). Recent work has focused on net analyte preprocessing (NAP), first proposed by Lorber (Lorber, 1986). Lorber proved that the useless information in X, which is not related to the dependent variables Y, can be completely removed by NAP and that the prediction accuracy can thereby be improved.


However, NAP can be performed effectively only on observations that vary linearly. When the variation is nonlinear, linear NAP is inappropriate for fitting the data, which degrades its ability to remove systematic variation in the input set X that is not correlated with the response set Y (Zhang et al., 2010a). For this reason, and inspired by kernel function methods, we propose a new method called kernel net analyte preprocessing (KNAP), which overcomes the nonlinearity limitation of NAP.

Another problem that has to be faced when using PLS or KPLS is how to avoid overfitting. Overfitting is the commonly observed situation in which a learner performs well on the training data but has a large error on the test data (Hawkins, 2004). In Zhang et al. (2004), Massart proposed a weighted averaged PLS (APLS) method, which has comparable prediction ability and is relatively robust to overfitting. More recently, the boosting method has drawn much attention. In Schapire (1990), based on the so-called margin theory, Schapire proved that boosting is robust to overfitting. Combining boosting with PLS, Massart proposed boosting partial least squares (BPLS) to address the overfitting problem (Zhang et al., 2005). BPLS has been used in many fields, such as quantitative structure–activity/property relationship (QSAR/QSPR) studies (Zhou et al., 2007), near-infrared spectroscopy (Tan et al., 2010), and mass spectrometry analysis (He et al., 2004). Although boosting has seen many applications in PLS regression modeling, little attention has been paid to KPLS regression modeling. Since many applications have demonstrated the superiority of KPLS over PLS in solving nonlinear problems (Zhang and Hu, 2011; Qu et al., 2010; Helander et al., 2012; Zhang et al., 2010b), it is necessary to study the performance of boosting in KPLS modeling. In Friedman (2002), a variant of boosting called stochastic gradient boosting (SGB) was proposed; it requires less computation time and achieves higher prediction accuracy than the original boosting method. Combining SGB with KPLS, we propose a new method called stochastic gradient boosting-kernel partial least squares (SGB-KPLS), which aims to solve the overfitting problem of KPLS. Finally, the KNAP method is introduced into the SGB-KPLS modeling procedure, yielding the method called KNAP-SGB-KPLS (modified kernel partial least squares, MKPLS, for short).

The rest of the paper is organized as follows. In Section 2, the basic theories and algorithms of KPLS and SGB are introduced, and detailed descriptions of the proposed methods (KNAP and MKPLS) are given. Computer simulations are carried out in Section 3. Finally, our conclusions are drawn in Section 4.

2. Modified kernel partial least squares

2.1. Notations

To make the symbols used below easier to follow, the essential notation is summarized in this section. Throughout the present work, matrices are denoted by bold capital letters (as in X), column vectors by bold lowercase letters (as in x), and scalar variables by italic characters (as in n). The main symbols are listed below:

F        feature space
Φ(X)     data matrix in the feature space F
K        kernel matrix, K = Φ(X)Φ^T(X)
K*       net analyte kernel matrix
X        independent variables matrix
Y        dependent variables matrix
I        identity matrix
b        regression coefficient vector
h        number of latent variables
m        number of basic regression models used in the SGB method
n        number of samples in the data set
n_c      size of the subsample used in the SGB method
v        shrinkage value

2.2. Kernel partial least squares

KPLS is an extension of PLS to a nonlinear feature space. According to Cover's theorem, nonlinear structure in the original data is more likely to become linear after a high-dimensional nonlinear mapping (Rosipal, 2003). This higher-dimensional linear space is referred to as the feature space F (Zhang et al., 2010b). First, consider a nonlinear transformation of the input data x_i, i = 1, 2, ..., n, into the feature space F:

\Phi : x_i \in R^n \to \Phi(x_i) \in F    (1)

where it is assumed that \sum_{i=1}^{n} \Phi(x_i) = 0, i.e. mean centering in the high-dimensional space should be performed before applying KPLS. Φ(·) is a nonlinear mapping function that projects the input vectors from the original space into F. Note that the dimensionality of the feature space F is arbitrarily large and can even be infinite. Denote by Φ(X) the (n × s) matrix whose ith row is the vector Φ(x_i) in the s-dimensional feature space F. By introducing the kernel trick K(x_i, x_j) = Φ(x_i)Φ^T(x_j), one avoids both performing explicit nonlinear mappings and computing dot products in the feature space (Cao et al., 2011). Commonly used kernel functions are the polynomial kernel K(x_i, x_j) = ⟨x_i, x_j⟩^r, the radial basis kernel K(x_i, x_j) = exp(−‖x_i − x_j‖^2 / c), and the sigmoidal kernel K(x_i, x_j) = tanh(β_0⟨x_i, x_j⟩ + β_1), where c, r, β_0 and β_1 are kernel parameters that must be predefined by the user.

The steps of the KPLS method are as follows. For i = 1, 2, ..., h (h is the number of latent variables), repeat the following:

Step 1: Initialize: set K_1 = K and Y_1 = Y, and set u_i equal to any column of Y_i.
Step 2: Compute the score vector of Φ(X): t_i = K_i u_i / (u_i^T K_i u_i).
Step 3: Compute the loading vector of Y_i: q_i = Y_i^T t_i / (t_i^T t_i).
Step 4: Compute the score vector of Y_i: u_i = Y_i q_i / (q_i^T q_i).
Step 5: If u_i has converged, go to Step 6; otherwise return to Step 2.
Step 6: Deflate: K_{i+1} = (I − t_i t_i^T / (t_i^T t_i)) K_i (I − t_i t_i^T / (t_i^T t_i)) and Y_{i+1} = (I − t_i t_i^T / (t_i^T t_i)) Y_i; set i = i + 1 and go to Step 2.

After all h latent variables have been extracted, the regression coefficient vector b of KPLS is obtained from

b = \Phi^T U (T^T K U)^{-1} T^T Y    (2)

where T = [t_1, t_2, ..., t_h] and U = [u_1, u_2, ..., u_h] are the score matrices. As a result, when the number of test samples is n_t, the


predictions for the training data and the test data can be made, respectively, as

\hat{Y}_{train} = \Phi b = K U (T^T K U)^{-1} T^T Y    (3)

\hat{Y}_{test} = \Phi_{test} b = K_{test} U (T^T K U)^{-1} T^T Y    (4)

where Φ_test is the matrix of the mapped test data and K_test is the n_t × n test kernel matrix whose elements are K_test(i, j) = K(x_i, x_j), with x_i the ith test vector and x_j the jth training vector.

Before applying KPLS, mean centering of the data should be carried out in the feature space (Schölkopf et al., 1998). This can be done by substituting the kernel matrices K and K_test with \tilde{K} and \tilde{K}_{test}, where

\tilde{K} = \left(I - \frac{1}{n} 1_n 1_n^T\right) K \left(I - \frac{1}{n} 1_n 1_n^T\right)    (5)

\tilde{K}_{test} = \left(K_{test} - \frac{1}{n} 1_{n_t} 1_n^T K\right) \left(I - \frac{1}{n} 1_n 1_n^T\right)    (6)

where I is the n-dimensional identity matrix, and 1_n and 1_{n_t} are vectors of ones with lengths n and n_t, respectively.
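To make the procedure above concrete, the following NumPy sketch shows one possible implementation of the KPLS steps, the centering of Eqs. (5) and (6), and the predictions of Eqs. (3) and (4). It is an illustration under the reconstruction given here, not the authors' code; the function and variable names (rbf_kernel, center_kernels, kpls_fit, kpls_predict) and the convergence tolerance are my own choices.

```python
import numpy as np

def rbf_kernel(A, B, c=1.0):
    """Radial basis kernel K(x_i, x_j) = exp(-||x_i - x_j||^2 / c)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / c)

def center_kernels(K, K_test):
    """Mean centering in the feature space, Eqs. (5) and (6)."""
    n, nt = K.shape[0], K_test.shape[0]
    In = np.eye(n) - np.ones((n, n)) / n
    return In @ K @ In, (K_test - np.ones((nt, n)) @ K / n) @ In

def kpls_fit(K, Y, h, tol=1e-10, max_iter=500):
    """Extract h latent variables from a centered kernel K and response Y (Steps 1-6)."""
    n = K.shape[0]
    Ki = K.copy()
    Yi = np.asarray(Y, dtype=float).reshape(n, -1).copy()
    T, U = [], []
    for _ in range(h):
        u = Yi[:, [0]]                           # Step 1: u from a column of Y_i
        for _ in range(max_iter):
            t = Ki @ u / (u.T @ Ki @ u)          # Step 2: score vector of Phi(X)
            q = Yi.T @ t / (t.T @ t)             # Step 3: loading vector of Y_i
            u_new = Yi @ q / (q.T @ q)           # Step 4: score vector of Y_i
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:                        # Step 5: convergence check
                break
        T.append(t)
        U.append(u)
        D = np.eye(n) - t @ t.T / (t.T @ t)      # Step 6: deflation of K_i and Y_i
        Ki, Yi = D @ Ki @ D, D @ Yi
    return np.hstack(T), np.hstack(U)

def kpls_predict(K_eval, K_train, T, U, Y):
    """Eqs. (3)-(4): predictions = K_eval U (T' K U)^{-1} T' Y."""
    return K_eval @ U @ np.linalg.solve(T.T @ K_train @ U, T.T @ Y)
```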

2.3. Kernel net analyte preprocessing

Generally speaking, multivariate process data contain unwanted systematic variation in X that is unrelated to Y. In order to remove this undesirable systematic variation and achieve better models in multivariate calibration, Lorber proposed a signal preprocessing method called NAP (Lorber, 1986), and many NAP variants have been proposed in recent years (Skibsted et al., 2005; Lorber et al., 1997; Bro and Andersen, 2003; Ferre and Faber, 2003; Goicoechea and Olivier, 2001). In Goicoechea and Olivier (2001), Goicoechea proposed the simple NAP (SNAP) method, a preprocessing and filtering method that removes unrelated variation from a given data set X. Although SNAP performs well on linear data, it cannot deal with systems that exhibit strong nonlinear characteristics. Inspired by kernel function methods, we propose a new method called kernel net analyte preprocessing (KNAP), whose steps are as follows:

Step 1: Calculate Φ(X*), the projection of Φ(X) orthogonal to y:

\Phi(X^*) = \left[I - y (y^T y)^{-1} y^T\right] \Phi(X)    (7)

Step 2: Calculate the matrix V, an n × a matrix composed of the first a eigenvectors of the square matrix Φ^T(X*)Φ(X*) (i.e. those associated with the a largest eigenvalues). The optimum number a can be estimated by cross-validation (Collado et al., 2000) or from the error in predicting a set of validation samples (Berger et al., 1998).

Step 3: Obtain the net analyte variables Φ(X*_SC) by applying the filter matrix F_NAP:

\Phi(X^*_{SC}) = \Phi(X) F_{NAP} = \Phi(X)(I - V V^T)    (8)

where F_NAP is a projection matrix that projects the independent variables onto the space orthogonal to that spanned by the analytes.

From the above steps, KNAP consists of two consecutive orthogonal projection operations: Φ(X*) is obtained after the first projection and Φ(X*_SC) after the second. Before applying KPLS, the matrix Φ(X) is replaced by Φ(X*_SC), and the ordinary KPLS procedure is then carried out to build a regression model. This method handles nonlinear data effectively, as demonstrated by the following simulations; we call the combination kernel net analyte preprocessing-kernel partial least squares (KNAP-KPLS).
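As an illustration of the two consecutive projections, the sketch below applies the construction of Eqs. (7) and (8) to an explicit feature matrix. The paper works in the kernel-induced feature space, so this is only the analogous explicit-feature version; the names (knap_filter, F_nap) and the SVD-based eigenvector computation are my own choices, not the authors' implementation.

```python
import numpy as np

def knap_filter(Phi, y, a):
    """Return the filtered features Phi @ F_NAP and the filter matrix F_NAP (Eqs. (7)-(8))."""
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Step 1: project Phi onto the space orthogonal to y
    P_y = np.eye(len(y)) - y @ y.T / float(y.T @ y)
    Phi_star = P_y @ Phi
    # Step 2: V holds the a leading eigenvectors of Phi_star' Phi_star
    #         (computed here as the right singular vectors of Phi_star)
    _, _, Vt = np.linalg.svd(Phi_star, full_matrices=False)
    V = Vt[:a].T
    # Step 3: F_NAP = I - V V', and the net analyte variables are Phi @ F_NAP
    F_nap = np.eye(Phi.shape[1]) - V @ V.T
    return Phi @ F_nap, F_nap

# Example: strip two systematic-variation directions from simulated data
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 8))
y = rng.normal(size=50)
Phi_sc, F_nap = knap_filter(Phi, y, a=2)
```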

2.4. Stochastic gradient boosting as an additive model

The boosting method was first introduced into the machine learning field by Schapire in 1990 (Schapire, 1990). In 1995, Freund and Schapire proposed a modified method called AdaBoost (adaptive boosting) for regression modeling (Freund and Schapire, 1995). In 2002, a variant of boosting called SGB (stochastic gradient boosting) was invented by Friedman (Friedman, 2002). Based on the probably approximately correct learning framework and the concept of ensemble methods, boosting can combine the outputs of many ‘weak’ learners (classifiers or regression methods) to produce a powerful model (Cao et al., 2010). The basic principle of boosting as an additive model is to sequentially construct additive regression models by fitting a basic model to the current residuals that are not fitted by previous models. Generally speaking, an additive model can be expressed as:

F(X) = v_1 f_1(X) + v_2 f_2(X) + \cdots + v_m f_m(X) = \sum_{i=1}^{m} v_i f_i(X)    (9)

where

f_i(X) = \arg\min_{f_i(X)} \left\| Y - F_{i-1}(X) - f_i(X) \right\|^2    (10)

Here argmin(·) finds the f_i(X) that minimizes the bracketed quantity, v_i is the shrinkage parameter, which may or may not be constant (Cao et al., 2010), and m is the number of added base models f(X). The above method is termed gradient boosting (GB). In Friedman (2002), Friedman incorporated randomization analogous to bagging into the procedure and proposed the well-known stochastic gradient boosting (SGB) method. At each iteration of SGB, a fraction n_c/n of the training data (where n_c is the size of the subsample) is randomly sampled without replacement from the full training set. These randomly selected subsamples are then used in place of the full samples to fit the base learner and to compute the model update for the current iteration; the rest of SGB is identical to gradient boosting (Cao et al., 2010). According to Friedman's research, the smaller this fraction is, the more the random samples used in successive iterations differ, thereby introducing more overall randomness into the procedure. However, making the fraction smaller also reduces the amount of data available to train the base learner, which increases the variance associated with the individual base-learner estimates (Friedman, 2002).
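The following minimal sketch illustrates the SGB idea described above under squared-error loss: random subsampling without replacement, fitting a base learner to the current residuals, and adding it with shrinkage v. An ordinary least-squares fit is used as a stand-in base learner only to keep the example self-contained; it is not the authors' implementation, and all names are my own.

```python
import numpy as np

def least_squares_learner(X, r):
    """Stand-in base learner: ordinary least squares on the residuals."""
    w, *_ = np.linalg.lstsq(X, r, rcond=None)
    return lambda Xq: Xq @ w

def sgb_fit(X, y, m=100, nc=40, v=0.9, base_learner=least_squares_learner, seed=0):
    """Fit m base models sequentially to shrunken residuals on random subsamples."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).ravel()
    residual, models = y.copy(), []
    for _ in range(m):
        idx = rng.choice(len(y), size=nc, replace=False)   # subsample without replacement
        f = base_learner(X[idx], residual[idx])            # fit the current residuals
        residual = residual - v * f(X)                     # keep only the v part of the fit
        models.append(f)
    return models

def sgb_predict(models, X, v=0.9):
    """Eq. (9) with a constant shrinkage: F(X) = sum_i v * f_i(X)."""
    return v * sum(f(X) for f in models)
```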


2.5. Modified KPLS

Fig. 1 – Scheme for explanation of MKPLS.

Combining the SGB method with the KNAP-KPLS regression model, we propose a new method called MKPLS in this section. Initially, the systematic variation is removed from the input data set X by KNAP. Then a KPLS model is built with n_c samples randomly drawn from the whole training data; this model is used to predict all the training data, and the prediction residual of the response y is calculated. From then on, MKPLS sequentially adds a series of KPLS models. Unlike the original KPLS, which fits the residual of y with the residual of Φ(X), MKPLS only considers the residual of y, i.e. it always fits the residual of y with the original Φ(X) (Zhang et al., 2005). Note that this makes MKPLS less interpretable than the original KPLS model; however, it does not affect the application of MKPLS when the interest is in avoiding overfitting and improving prediction accuracy. The sequential adding is repeated, resulting in many weighted KPLS models, and the final prediction is the sum of these weighted models. The scheme of MKPLS is given in Fig. 1 and its steps are as follows.

For the training data set X and y:

Step 1: Remove the systematic variation from the input data set X by KNAP.
Step 2: Build an initial KPLS model (sub-model 1), y_1 = f_1(X), with n_c samples. Use sub-model 1 to predict all the training data, denote the prediction results by \hat{y}_1, and calculate the residual y_{res,1} = y − v_1 \hat{y}_1.
Step 3: For i = 2, ..., m, repeat the following steps:
(a) Randomly pick n_c samples as the new training data.
(b) Fit the current residual y_{res,i−1} with these n_c samples through a basic model y_i = f_i(X).
(c) Use the current model to predict all the training data in the data set.
(d) Update the residual: y_{res,i} = y_{res,i−1} − v_i \hat{y}_i.
Step 4: The final prediction is

\hat{y} = v_1 \hat{y}_1 + v_2 \hat{y}_2 + \cdots + v_m \hat{y}_m = \sum_{i=1}^{m} v_i \hat{y}_i    (11)

For the test data X_new and y_new:

Step 5: Calculate Φ(X*_new) by applying the filter matrix F_NAP to the test data X_new:

\Phi(X^*_{new}) = \Phi(X_{new}) F_{NAP}    (12)

Step 6: Calculate \hat{y}_{new} according to Eq. (11).

In Step 4, only the v_i part of the fitted \hat{y}_i is used as regression information, which is a very important operation: only the v_i part of the current model is kept at each step and the remaining 1 − v_i part is put back, which effectively avoids overfitting (Zhang et al., 2005).
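A compact sketch of the MKPLS loop is given below. It assumes the rbf_kernel, center_kernels, kpls_fit, kpls_predict and knap_filter helpers sketched in Sections 2.2 and 2.3 are in scope, and it works with an explicit feature matrix for simplicity; it is only an illustration of Steps 1–6 under those assumptions, not the authors' code.

```python
import numpy as np

def mkpls_fit(Phi, y, a, h, m=100, nc=40, v=0.9, c=1.0, seed=0):
    rng = np.random.default_rng(seed)
    Phi_sc, F_nap = knap_filter(Phi, y, a)               # Step 1: KNAP filtering of X
    residual = np.asarray(y, dtype=float).reshape(-1, 1).copy()
    submodels = []
    for _ in range(m):                                    # Steps 2-3: sequential KPLS sub-models
        idx = rng.choice(len(residual), size=nc, replace=False)
        Xi, ri = Phi_sc[idx], residual[idx]
        K = rbf_kernel(Xi, Xi, c)                         # kernel on the subsample
        K_full = rbf_kernel(Phi_sc, Xi, c)                # all training data vs. subsample
        Kc, Kc_full = center_kernels(K, K_full)
        T, U = kpls_fit(Kc, ri, h)
        y_hat = kpls_predict(Kc_full, Kc, T, U, ri)       # predict all training data
        residual = residual - v * y_hat                   # Step 3(d): keep only the v part
        submodels.append((Xi, K, Kc, T, U, ri))
    return F_nap, submodels

def mkpls_predict(Phi_new, F_nap, submodels, v=0.9, c=1.0):
    Phi_new_sc = Phi_new @ F_nap                          # Step 5: apply F_NAP to the test data
    y_new = 0.0
    for Xi, K, Kc, T, U, ri in submodels:                 # Steps 4 and 6: weighted sum, Eq. (11)
        K_test = rbf_kernel(Phi_new_sc, Xi, c)
        _, Kc_test = center_kernels(K, K_test)
        y_new = y_new + v * kpls_predict(Kc_test, Kc, T, U, ri)
    return y_new
```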

3. Simulation experiments

In this section, two experiments are carried out to evaluate the performance of the proposed method (MKPLS) against three other methods: KPLS, SGB-KPLS and SGB-NAP-KPLS, where SGB-KPLS combines the SGB and KPLS methods and SGB-NAP-KPLS combines the SGB, NAP and KPLS methods. The first experiment is a numerical example and the second uses concrete compressive strength data. To compare the performance of the four methods in the two experiments, we introduce two measures of prediction ability. The first is the root-mean-square error (RMSE) of the prediction residuals, defined as

RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}    (13)

where y_i is the true value, \hat{y}_i is the predicted value, and n is the total number of samples. The second measure of model fit is the coefficient of determination R², defined as

R^2 = 1 - \frac{SSR}{SSY}    (14)

where SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 is the sum of squares of the prediction residuals and SSY = \sum_{i=1}^{n} (y_i - \bar{y})^2 is the sum of squares of the response variable corrected for the mean. R² is the ratio of the explained variation to the total variation, so 0 < R² < 1; the higher the value of R², the better the prediction. Generally speaking, a model with an R² value of about 0.7 or higher is considered useful (Zhang et al., 2010a).
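For reference, Eqs. (13) and (14) translate directly into a few lines of NumPy; the function names below are arbitrary choices, not from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Eq. (13): root-mean-square error of the prediction residuals."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Eq. (14): R^2 = 1 - SSR / SSY."""
    ssr = float(np.sum((y_true - y_pred) ** 2))
    ssy = float(np.sum((y_true - np.mean(y_true)) ** 2))
    return 1.0 - ssr / ssy
```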

3.1. Numerical simulation

Consider the following four equations (Wang and Wu, 2006):

g_1(x_1) = \sin(x_1 / 2)
g_2(x_2) = x_2^2
g_3(x_3) = -x_3
y = g_1(x_1) + g_2(x_2) + g_3(x_3) + e    (15)

where x_1, x_2 and x_3 are drawn independently from a uniform distribution on the interval [−1, 1], and e is Gaussian noise with mean 0 and standard deviation 0.1. We randomly generate 100 points from these equations, among which


70 samples are randomly chosen as the training set and the remaining 30 are used as the test set. The radial basis kernel is selected, and the other initial parameters are set as follows: a = 21, h = 17, m = 100, n_c = 40, v = 0.9.
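A sketch of this data-generating process and of the 70/30 split is given below; the random seed is arbitrary and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)                           # arbitrary seed
n = 100
x = rng.uniform(-1.0, 1.0, size=(n, 3))                   # x1, x2, x3 ~ U[-1, 1]
e = rng.normal(0.0, 0.1, size=n)                          # Gaussian noise, sd = 0.1
y = np.sin(x[:, 0] / 2) + x[:, 1] ** 2 - x[:, 2] + e      # Eq. (15)

idx = rng.permutation(n)                                  # random 70/30 split
X_train, y_train = x[idx[:70]], y[idx[:70]]
X_test, y_test = x[idx[70:]], y[idx[70:]]
```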

3.1.1. Overfitting problem test

Figs. 2 and 3 show the RMSE curves obtained by the KPLS method with different numbers of latent variables. For the training data, the RMSE curve drops quickly at the beginning and then levels off as the number of latent variables (h) increases; this is not the case in Fig. 3. For the test data, the RMSE decreases quickly while h is less than 7, but then rises again as h increases. That is to say, overfitting occurs.

Next, we study whether the SGB method can solve the overfitting problem. The SGB-KPLS, SGB-NAP-KPLS and MKPLS methods are used to build models, with the number of basic models (m) in SGB set from 1 to 100, and the corresponding RMSE values are obtained for the three models. As shown in Figs. 4 and 5, the RMSE curves drop dramatically after the first iterations and tend to a constant as the iterations continue. No increase can be found in the two figures, which means that the RMSE is insensitive to m; in other words, the SGB method is resistant to overfitting. At the same time, MKPLS has a lower RMSE than the other two methods for the test data. In the next part, we study the prediction accuracy of the four methods.

Fig. 2 – RMSE curve of the training data using KPLS in the numerical example.
Fig. 3 – RMSE curve of the test data using KPLS in the numerical example.
Fig. 4 – The RMSE curves of the training data with different iteration times in the numerical example.
Fig. 5 – The RMSE curves of the test data with different iteration times in the numerical example.

3.1.2. Prediction accuracy contrast

The prediction results of the four regression models for the training and test data sets are listed in Table 1. SGB-KPLS predicts the test data with R² = 0.6967 and RMSE = 0.3033. After NAP preprocessing, SGB-NAP-KPLS shows slightly improved predictive ability (R² = 0.6999 and RMSE = 0.3001) compared to SGB-KPLS, while after KNAP preprocessing, MKPLS shows the best predictive ability for the test data, with the highest R² of 0.7506 and the lowest RMSE of 0.2488. In other words, KNAP can effectively remove undesirable systematic variation and improve the prediction capability. To further illustrate the prediction capability of MKPLS, the prediction results are shown in Figs. 6–9, where the true data are plotted in red and the predictions in green. From the four figures, we can conclude that MKPLS maintains the highest predictive ability over the test data among the four methods.


Table 1 – Summary of prediction results in the numerical example.

                     KPLS      SGB-KPLS   SGB-NAP-KPLS   MKPLS
Training   RMSE      0.0166    0.0143     0.0139         0.0021
           R2        0.9834    0.9857     0.9861         0.9963
Test       RMSE      0.3067    0.3033     0.3001         0.2488
           R2        0.6933    0.6967     0.6999         0.7506

Table 2 – The input and output variables of the concrete data.

Input variables (kg/m3):  cement, blast furnace slag, fly ash, water, super-plasticizer, coarse aggregate, fine aggregate
Output variables:         slump (cm), flow (cm), compressive strength (MPa)

Fig. 6 – Prediction results of test data using KPLS in the numerical example.
Fig. 7 – Prediction results of test data using SGB-KPLS in the numerical example.
Fig. 8 – Prediction results of test data using SGB-NAP-KPLS in the numerical example.
Fig. 9 – Prediction results of test data using MKPLS in the numerical example.


From the above simulation experiments, we can conclude that MKPLS not only avoids the overfitting problem but also improves the prediction accuracy in the numerical example. To further validate its performance, the MKPLS method is applied to model concrete compressive strength in the next experiment.

Fig. 10 – RMSE curve of the training data using KPLS in the concrete example.
Fig. 11 – RMSE curve of the test data using KPLS in the concrete example.
Fig. 12 – The RMSE curves of the training data with different iteration times in the concrete example.
Fig. 13 – The RMSE curves of the test data with different iteration times in the concrete example.

3.2. Modeling concrete compressive strength

Concrete is one of the most important materials in civil engineering. A real concrete data set was described in Yeh (2007) and can be downloaded from http://archive.ics.uci.edu/ml/datasets/Concrete+Slump+Test. The data set consists of 103 samples with 7 input variables and 3 output variables; the input and output variables are listed in Table 2. According to Yeh (2007), the compressive strength, which is the most important index for concrete, is a highly nonlinear function of these input variables. In this experiment, 70 samples, randomly picked from the 103, are used for modeling, and the remaining 33 are used as the test set. The radial basis kernel is selected, and the other initial parameters are set as follows: a = 13, h = 12, m = 100, n_c = 40, v = 0.9.
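A hedged sketch of loading and splitting this data set is shown below; the file name "slump_test.data" and the column order are assumptions about the UCI download, not details given in the paper, so adjust them to the actual file.

```python
import numpy as np
import pandas as pd

# Assumed UCI layout: an index column, 7 mixture inputs, then the 3 outputs.
df = pd.read_csv("slump_test.data")
X = df.iloc[:, 1:8].to_numpy(dtype=float)     # 7 input variables (kg/m3)
Y = df.iloc[:, 8:11].to_numpy(dtype=float)    # slump (cm), flow (cm), compressive strength (MPa)

rng = np.random.default_rng(0)
idx = rng.permutation(len(df))                # 70 samples for modeling, the remaining 33 for testing
X_train, Y_train = X[idx[:70]], Y[idx[:70]]
X_test, Y_test = X[idx[70:]], Y[idx[70:]]
```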

3.2.1. Overfitting problem test

The RMSE curves of the training data and the test data obtained with the KPLS method for different numbers of latent variables (h) are shown in Figs. 10 and 11. The RMSE curve for the training data drops quickly at the beginning and then tends to a constant, while the RMSE curve for the test data drops at first, when h is less than 10, and then rises again as h increases. It is obvious that overfitting occurs. The effect of the number of basic models (m) on the prediction results is examined in the following simulations. The RMSE values of the training data and the test data over a wide range of m are shown in Figs. 12 and 13. The RMSE curves drop dramatically at first and then tend to constants as m increases. This means that the SGB method has the capability to avoid overfitting.
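The latent-variable sweep behind Figs. 10 and 11 can be reproduced along the following lines, reusing the rbf_kernel, center_kernels, kpls_fit, kpls_predict and rmse helpers sketched earlier; the kernel width and the range of h are illustrative choices, not the paper's settings, and the RMSE is aggregated over all output variables.

```python
import numpy as np

def rmse_vs_latent_variables(X_train, Y_train, X_test, Y_test, h_max=30, c=1.0):
    """Train/test RMSE of plain KPLS as a function of the number of latent variables."""
    K = rbf_kernel(X_train, X_train, c)
    K_test = rbf_kernel(X_test, X_train, c)
    Kc, Kc_test = center_kernels(K, K_test)
    curves = []
    for h in range(1, h_max + 1):
        T, U = kpls_fit(Kc, Y_train, h)
        train_err = rmse(Y_train, kpls_predict(Kc, Kc, T, U, Y_train))
        test_err = rmse(Y_test, kpls_predict(Kc_test, Kc, T, U, Y_train))
        curves.append((h, train_err, test_err))
    return curves
```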

Table 3 – Summary of prediction results in the concrete example.

                     KPLS      SGB-KPLS   SGB-NAP-KPLS   MKPLS
Training   RMSE      0.0116    0.0109     0.0096         0.0005
           R2        0.9884    0.9891     0.9904         0.9995
Test       RMSE      0.3035    0.3002     0.2981         0.2299
           R2        0.6965    0.6998     0.7019         0.7701


Fig. 14 – Prediction results of test data using KPLS in the concrete example.

Fig. 15 – Prediction results of test data using SGB-KPLS in the concrete example.

3.2.2. Prediction accuracy contrast

The prediction results of the four methods for the concrete data are listed in Table 3. From Table 3, we can see that MKPLS shows clearly improved predictive ability for the training data compared to the other three methods. Furthermore, MKPLS shows the best predictive ability for the test data, with the highest R² of 0.7701 and the lowest RMSE of 0.2299. Thus, we can confirm again that MKPLS has the best prediction performance among the four methods. For further comparison, Figs. 14–17 show the detailed prediction results of the four methods in the concrete example, with the true data plotted in red and the predictions in green. From the four figures, we can conclude that the MKPLS model fits the data more adequately than the other methods.

Fig. 16 – Prediction results of test data using SGB-NAP-KPLS in the concrete example.
Fig. 17 – Prediction results of test data using MKPLS in the concrete example.

4. Conclusions

In this paper, we proposed a new multivariate data modeling method called MKPLS. The proposed method combines SGB, which solves the overfitting problem, with KNAP, which effectively removes information in X that is not correlated with the target. The prediction performance of MKPLS was compared with those of KPLS, SGB-KPLS and SGB-NAP-KPLS. Of the four methods, MKPLS gave the best performance in terms of both regression fitting capacity and resistance to overfitting.

Acknowledgements This work was supported by the National Science Fund for Distinguished Youth Scholars of China (61025014) and National Natural Science Foundation of China under Grants 61074072 and 61374120.

References Abdi, H., 2010. Partial least squares regression and projection on latent structure regression (PLS regression). Wiley Interdiscip. Rev.: Comput. Stat. 2 (1), 97–106. Barnes, R.J., Dhanoa, M.S., Lister, S.J., 1989. Standard normal variate transformation and detrending of near infrared diffuse reflectance spectra. Appl. Spectrosc. 43 (5), 772–777. Berger, A.J., Koo, T.W., Itzkan, I., Feld, M.S., 1998. An enhanced algorithm for linear multivariate calibration. Anal. Chem. 70 (3), 623–627. Bro, R., Andersen, C.M., 2003. The theory of net analyte signal vectors in inverse regression. J. Chemom. 17 (12), 646–652. Cao, D.S., Xu, Q.S., Liang, Y.Z., Zhang, L.X., Li, H.D., 2010. The boosting: a new idea of building models. Chemom. Intell. Lab. Syst. 100 (1), 1–11. Cao, D.S., Liang, Y.Z., Xu, Q.S., Hu, Q.N., Zhang, L.X., Fu, G.H., 2011. Exploring nonlinear relationships in chemical data using kernel-based methods. Chemom. Intell. Lab. Syst. 107 (1), 106–115. Collado, M.S., Mantovani, V.E., Goicoechea, H.C., Olivieri, A.C., 2000. Simultaneous spectrophotometric multivariate calibration determination of several components of ophthalmic solutions: phenylephrine, chloramphenicol, antipyrine, methylparaben and thimerosal. Talanta 52 (5), 909–920. Ferre, J., Faber, N.M., 2003. Net analyte signal calculation in multivariate calibration. Chemom. Intell. Lab. Syst. 69 (1–2), 123–1236. Freund, Y., Schapire, R.E., 1995. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational learning theory. Springer, Berlin Heidelberg, pp. 23–37. Friedman, J.H., 2002. Stochastic gradient boosting. Comput. Stat. Data Anal. 38 (4), 367–378. Goicoechea, H.C., Olivier, A.C., 2001. A comparison of orthogonal signal correction and net analyte preprocessing methods. Theoretical and experimental study. Chemom. Intell. Lab. Syst. 56 (2), 73–81. Hawkins, D.M., 2004. The problem of overfitting. J. Chem. Inf. Comput. Sci. 44 (1), 1–12. He, P., Xu, C.J., Liang, Y.Z., Fang, K.T., 2004. Improving the classification accuracy in chemistry via boosting technique. Chemom. Intell. Lab. Syst. 70 (1), 39–46.

Helander, E., Silen, H., Virtanen, T., Gabbouj, M., 2012. Voice conversion using dynamic kernel partial least squares regression. Audio Speech Lang. Process. IEEE Trans. 20 (3), 806–817. Lorber, A., 1986. Error propagation and figures of merit for quantification by solving matrix equations. Anal. Chem. 58 (6), 1167–1172. Lorber, A., Faber, K., Kowalski, B.R., 1997. Net analyte signal calculation for multivariate calibration. Anal. Chem. 69 (8), 1620–1626. Qu, H.N., Li, G.Z., Xu, W.S., 2010. An asymmetric classifier based on partial least squares. Pattern Recognit. 43 (10), 3448–3457. Rosipal, R., 2003. Kernel partial least squares for nonlinear regression and discrimination. Neural Netw. World 13, 291–300. Rosipal, R., Trejo, L.J., 2002. Kernel partial least squares regression in reproducing kernel Hilbert space. J. Mach. Learn. Res. 2, 97–123. Savitzky, A., Golay, M.J.E., 1964. Smoothing and differentiation of data by simplified least squares procedure. Anal. Chem. 36 (8), 1627–1639. Schapire, R.E., 1990. The strength of weak learnability. Mach. Learn. 5 (2), 197–227. Schölkopf, B., Smola, A., Müller, K.R., 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10 (5), 1299–1319. Skibsted, E.T.S., Boelens, H.F.M., Westerhuis, J.A., et al., 2005. Net analyte signal based statistical quality control. Anal. Chem. 77 (22), 7103–7114. Tan, C., Wang, J., Wu, T., Qin, X., Li, M., 2010. Determination of nicotine in tobacco samples by near-infrared spectroscopy and boosting partial least squares. Vib. Spectrosc. 54 (1), 35–41. Thennadil, S.N., Martens, H., Kohler, A., 2006. Physics-based multiplicative scatter correction approaches for improving the performance of calibration models. Appl. Spectrosc. 60 (3), 315–321. Wang, H.W., Wu, Z.B.M.J., 2006. Partial Least Squares Regression-Linear and Nonlinear Methods. National Defense Industry Press, China, pp. 210–212. Wold, S., Martens, H., Wold, H., 1983. The multivariate calibration problem in chemistry solved by the PLS method. In: Matrix Pencils. Springer, Berlin Heidelberg, pp. 286–293. Yeh, I.-C., 2007. Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement Concr. Compos. 29 (6), 474–480. Zhang, Y., Hu, Z., 2011. Multivariate process monitoring and analysis based on multi-scale KPLS. Chem. Eng. Res. Des. 89 (12), 2667–2678. Zhang, Y., Zhang, Y., 2009. Complex process monitoring using modified partial least squares method of independent component regression. Chemom. Intell. Lab. Syst. 98 (2), 143–148. Zhang, M.H., Xu, Q.S., Massart, D.L., 2004. Averaged and weighted average partial least squares. Anal. Chim. Acta 504 (2), 279–289. Zhang, M.H., Xu, Q.S., Massart, D.L., 2005. Boosting partial least squares. Anal. Chem. 77 (5), 1423–1431. Zhang, Y., Teng, Y., Zhang, Y., 2010a. Complex process quality prediction using modified kernel partial least squares. Chem. Eng. Sci. 65 (6), 2153–2158. Zhang, Y., Zhou, H., Qin, S.J., Chai, T., 2010b. Decentralized fault diagnosis of large-scale processes using multi-block kernel partial least squares. Ind. Inform. IEEE Trans. 6 (1), 3–10. Zhang, Y., Li, S., Hu, Z., 2012. Improved multi-scale kernel principal component analysis and its application for fault detection. Chem. Eng. Res. Des. 90 (9), 1271–1280. Zhou, Y.P., Cai, C.B., Huan, S., Jiang, J.H., Wu, H.L., Shen, G.L., Yu, R.Q., 2007. QSAR study of angiotensin II antagonists using robust boosting partial least squares regression. Anal. Chim. Acta 593 (1), 68–74.