Chemometrics and Intelligent Laboratory Systems 139 (2014) 58–63
Software Description
A MATLAB toolbox for class modeling using one-class partial least squares (OCPLS) classifiers

Lu Xu a,⁎, Mohammad Goodarzi b, Wei Shi a, Chen-Bo Cai c,⁎, Jian-Hui Jiang d,⁎

a College of Material and Chemical Engineering, Tongren University, Tongren 554300, Guizhou, PR China
b Department of Biosystems, Faculty of Bioscience Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 30, B-3001 Leuven, Belgium
c College of Chemistry and Life Science, Chuxiong Normal University, Chuxiong 675000, PR China
d State Key Laboratory of Chemo/BioSensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, PR China

⁎ Corresponding authors. Tel.: +86 856 5222556; fax: +86 856 5230977. E-mail addresses: [email protected] (L. Xu), [email protected] (C.-B. Cai), [email protected] (J.-H. Jiang).

http://dx.doi.org/10.1016/j.chemolab.2014.09.005
Article info

Article history:
Received 25 June 2014
Received in revised form 12 September 2014
Accepted 16 September 2014
Available online 23 September 2014

Keywords:
MATLAB toolbox
Class modeling
One-class partial least squares (OCPLS) classifiers
Nonlinear and robust algorithms
Fault diagnosis
Abstract

One-class classifiers are widely used to solve classification problems where the control or class modeling of a target class is necessary, e.g., untargeted analysis of food adulterations and frauds, tracing the origins of foods with a Protected Denomination of Origin, fault diagnosis, etc. Recently, one-class partial least squares (OCPLS) has been developed and demonstrated to be a useful technique for class modeling. For the analysis of nonlinear and outlier-contaminated data, nonlinear and robust OCPLS algorithms are required. This paper describes a free MATLAB toolbox for class modeling using OCPLS classifiers. The toolbox includes the ordinary, nonlinear and robust OCPLS methods. The nonlinear algorithm is based on the Gaussian radial basis function (GRBF), and the robust algorithm is based on partial robust M-regression (PRM). The usage of the toolbox is demonstrated by the analysis of a real data set.

© 2014 Elsevier B.V. All rights reserved.
1. Introduction

It was recognized by one of the founders of chemometrics, Kowalski, together with Bender [1], that "whenever something must be learned from objects (elements, compounds, and mixtures) and a chemical/physical theory has not been sufficiently developed, pattern recognition may provide a solution." This viewpoint has been amply confirmed by the many applications of pattern recognition techniques to the understanding of complex objects in chemistry [2–4]. Besides the commonly used multi-class classification or discriminant analysis (DA) techniques, the so-called one-class classifiers [5–7] or class modeling techniques (CMTs) [8–16] have recently attracted much attention, and the difference between DA and CMTs has been discussed by several authors [7,13–15]. While DA aims at classifying two or more predefined classes [17], CMTs are especially useful when it is necessary to define or model the range of a target class. Typical problems that require CMTs include the untargeted detection of food adulterations or frauds, tracing the geographical origins of protected denomination of origin (PDO) foods [18,19], fault diagnosis, etc. Commonly used CMTs include the following: (1) soft independent modeling of class analogy (SIMCA) [8] using principal component analysis (PCA); (2) unequal dispersed classes (UNEQ) [9,10] based
on the hypothesis of a multivariate normal distribution and the Hotelling's T² test; (3) potential function methods [20], which estimate the multivariate probability distribution; and (4) methods based on artificial neural networks (ANNs) and support vector machines (SVMs) [5,21,22]. The most popular CMTs are SIMCA and related PCA-based techniques [23–25], which are especially useful in chemometrics because they extract a few primary and informative components or latent variables (LVs).

Partial least squares (PLS), one of the cornerstones of chemometrics, has been widely used to solve both regression and classification problems. The rationale of PLS-DA has been demonstrated through the relationship among PLS, canonical correlation analysis (CCA) and linear discriminant analysis (LDA) [26]. Recently, one-class partial least squares (OCPLS), or the PLS class model (PLSCM) [27], has been proposed and demonstrated to be an effective tool for class modeling. Unlike SIMCA, whose components explain most of the data variance, the OCPLS components consider simultaneously the explained variance and the compactness of the target class. Moreover, OCPLS can be performed as a special PLS regression and works within the framework of multivariate calibration.

Practical data analysis sometimes encounters nonlinear and outlier-contaminated data sets, which can bias or even break down the estimation of the OCPLS parameters. Therefore, it is necessary to develop nonlinear and robust algorithms for OCPLS. This paper describes a free MATLAB toolbox for OCPLS, including the ordinary linear OCPLS, the nonlinear Gaussian radial basis function (RBF) OCPLS (GRBF-OCPLS) and the robust
OCPLS using partial robust M-regression (PRM) [28,29]. A real data set of fruit juices is used to demonstrate the usage of the toolbox. The software can be downloaded from http://www.tinyupload.org/q3fvf84wj6g or requested from the corresponding authors.
2. Methods

2.1. Ordinary OCPLS
Suppose X (n by p) contains n objects with p feature variables from the target class to be controlled. OCPLS performs a special PLS regression:

\mathbf{1} = \mathbf{X}\mathbf{b}_{PLS} + \mathbf{e}    (1)

where \mathbf{1} is the response vector (n by 1) with all elements being units (1s), \mathbf{b}_{PLS} (p by 1) contains the PLS regression coefficients, and the vector \mathbf{e} (n by 1) contains the model residuals. Note that the variables or features in X should NOT be centered; otherwise, all the variables in X would be orthogonal to \mathbf{1}. The above OCPLS model is computed using the SIMPLS algorithm [30]. The number of primary LVs or components can be estimated by cross validation (CV). When an OCPLS model has been built, two types of distance measures, Hotelling's T² based on the score distance (SD) and the absolute centered residual (ACR) of the responses, can be derived and computed as

ACR_j = \left| 1 - \hat{y}_j - \hat{\mu}_e \right|    (2)

and

T^2 = \sum_{i=1}^{K} (t_i - \bar{t}_i)^2 / s_{t,i}^2    (3)

where \hat{y}_j is the fitted response of object j and \hat{\mu}_e is the mean of the training errors in Eq. (1); \bar{t}_i and s_{t,i}^2 are the mean and sample variance of the ith LV, t_i, respectively; and K is the number of significant LVs. The ACR can be assumed to follow a normal distribution with a mean of 0. The standard deviation of the model residual can be estimated by CV:

\hat{\sigma}_e = \sqrt{ \sum_{i=1}^{N} (1 - \hat{y}_i - \hat{\mu}_e)^2 / (N - 1) }    (4)

where N is the total number of left-out objects during CV and \hat{y}_i is the predicted response of the ith left-out object. Given a confidence level, α, the upper confidence limits (UCLs) for ACR and T² can be derived as

ACR_{UCL} = Z_{\alpha/2} \, \hat{\sigma}_e    (5)

and

T^2_{UCL} = \frac{K (n^2 - 1)}{n (n - K)} F_{\alpha}(K, n - K)    (6)

where Z_{\alpha/2} is the upper 100(α/2)% critical point of the standard normal distribution and F_{\alpha}(K, n − K) is the upper 100α% critical point of the F-distribution with (K, n − K) degrees of freedom [31].

SD measures the distance from an object to the center of the class in the space spanned by the primary OCPLS LVs, while ACR can be seen as a dispersion measure of the projection onto the vector of OCPLS regression coefficients. An excessively large value of SD or ACR indicates that an object deviates from the bulk of the class. According to the values of T² and ACR, a test object can be assigned to one of four groups: regular or normal objects (with a small SD and a small ACR), good leverage objects (with a large SD and a small ACR), response outliers (with a small SD and a large ACR), and bad leverage objects (with a large SD and a large ACR). For multivariate statistical quality control (MSQC), good leverage objects, response outliers, and bad leverage objects can be detected as different types of outliers.

2.2. Nonlinear Gaussian RBF-OCPLS (GRBF-OCPLS)

As a commonly used nonlinear RBF, the Gaussian RBF is adopted to develop a nonlinear OCPLS model. A Gaussian RBF is placed at the position of each training object, so the number of RBFs equals the number of training objects [32,33]. For the RBF-transformed matrix (n by n), the value of the jth (j = 1, 2, 3, …, n) feature for the ith (i = 1, 2, 3, …, n) object is computed as

a_{ij} = \exp\left( - \| \mathbf{x}_i - \mathbf{x}_j \|^2 / \sigma^2 \right)    (7)

where ‖·‖ denotes the Euclidean distance and σ is the kernel width. The number of OCPLS LVs and the magnitude of σ² can be estimated simultaneously by examining the predicted residuals obtained by CV.

2.3. Robust OCPLS using PRM (PRM-OCPLS)

Partial robust M-regression (PRM) [28,29] has proved to be an effective and reliable robust PLS algorithm that downweights both orthogonal and leverage objects. For PLS regression, the object-weighting strategy of PRM not only yields robust regression coefficients but can also improve model accuracy in the presence of noisy calibration objects. For one-class classifiers, however, some difference or diversity among regular objects is expected and should be allowed, so that the model covers the range of variation within the target class. Our major concern with reweighting the normal training objects is that it would complicate the representativity of the objects and the estimation of the model parameters. Therefore, in this toolbox, the robust OCPLS algorithm uses PRM to detect the orthogonal and leverage outliers and then develops an ordinary OCPLS model with the outliers removed. Given a cutoff value for the percentage of outliers, PRM is performed on all the training data and the objects with the least PRM weights are flagged as outliers.
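To make the statistics of Section 2.1 concrete, the following minimal MATLAB sketch fits the one-class model of Eq. (1) on uncentered data and computes ACR, T² and their UCLs as in Eqs. (2)–(6). It is an illustrative stand-in rather than the toolbox code: a plain NIPALS-style PLS1 loop replaces SIMPLS, \hat{\sigma}_e is estimated from the training residuals instead of the cross-validated residuals of Eq. (4), the random matrix X is a placeholder for real target-class data, and norminv/finv require the Statistics Toolbox.

% Illustrative sketch of Eqs. (1)-(6); NOT the toolbox implementation.
rng(0);
X = rand(50, 11);              % placeholder for uncentered target-class data
[n, p] = size(X);
K = 3;                         % number of primary LVs (chosen by CV in practice)
y = ones(n, 1);                % response vector of 1s, Eq. (1)

% Plain NIPALS-style PLS1 on uncentered data (stand-in for SIMPLS [30])
T = zeros(n, K); W = zeros(p, K); P = zeros(p, K); q = zeros(K, 1);
Xk = X; yk = y;
for k = 1:K
    w = Xk' * yk;  w = w / norm(w);       % weight vector
    t = Xk * w;                           % score vector
    P(:, k) = Xk' * t / (t' * t);         % X-loading
    q(k) = yk' * t / (t' * t);            % y-loading
    Xk = Xk - t * P(:, k)';               % deflation
    yk = yk - t * q(k);
    T(:, k) = t;  W(:, k) = w;
end
b = W * ((P' * W) \ q);                   % regression coefficients b_PLS

yhat = X * b;                             % fitted responses
mu_e = mean(1 - yhat);                    % mean training error
ACR  = abs(1 - yhat - mu_e);              % Eq. (2)
T2   = sum(bsxfun(@rdivide, ...           % Eq. (3): Hotelling's T^2 (SD)
       bsxfun(@minus, T, mean(T, 1)).^2, var(T, 0, 1)), 2);

alpha   = 0.05;
sigma_e = std(1 - yhat);                  % crude stand-in for the CV-based Eq. (4)
ACR_UCL = norminv(1 - alpha/2) * sigma_e;                       % Eq. (5)
T2_UCL  = K*(n^2 - 1)/(n*(n - K)) * finv(1 - alpha, K, n - K);  % Eq. (6)

flagged = (T2 > T2_UCL) | (ACR > ACR_UCL);  % objects outside the class model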
3. MATLAB functions and an example of analysis

3.1. Data set

The fruit juice data set [34] contains 69 pure juice objects characterized by the contents of 11 amino acids. As suggested by the original source, the first 50 pure objects are used to train the OCPLS models and the remaining 19 objects are used for testing. Moreover, the assay values of the last 19 pure objects are multiplied by (1 − D) to simulate dilution with water. In this work, D is 0.3, meaning 30% (w/w) adulteration with water. Therefore, there are 38 objects (19 pure and 19 adulterated) for prediction. All 11 feature variables are scaled to be in [0, 1].

3.2. MATLAB version and description of functions
The MATLAB code in the toolbox was written and tested with MATLAB R2010a (Mathworks, Sherborn, MA). Descriptions of the MATLAB functions are presented in Table 1. Three example M-files showing how to call the functions are also included in the toolbox.

3.3. Preprocessing and rescaling of data

OCPLS requires that the input training data NOT be centered by 0. Traditional data preprocessing methods can be applied when necessary. The toolbox does not include any data preprocessing except
a function, "optrescale", for optionally rescaling the raw variables to be in [0, 1] or to have unit length (without centering by 0).

Table 1
The MATLAB functions included in the toolbox.

MATLAB function    Description
dist1              Calculate the matrix of squared distances
mccv               MCCV for ordinary (linear) OCPLS
mccvgrbf           MCCV for nonlinear GRBF-OCPLS
ocpls              Develop an ordinary (linear) OCPLS model
optrescale         Rescale the original variables to be in [0, 1] or to have unit lengths (without centering by zero)
prmocpls           Develop a robust PRM-OCPLS model
tcv                Traditional leave-K-out cross validation for ordinary (linear) OCPLS
tcvgrbf            Traditional leave-K-out cross validation for nonlinear GRBF-OCPLS
tpgrbfocpls        Training and/or prediction with nonlinear GRBF-OCPLS
tpocpls            Training and/or prediction with ordinary (linear) OCPLS
tpprmocpls         Training and/or prediction with robust PRM-OCPLS
tunegrbfocpls      Estimate the parameters of nonlinear GRBF-OCPLS by cross validation
tuneocpls          Estimate the parameters of ordinary (linear) OCPLS by cross validation
tuneprmocpls       Estimate the parameters of robust PRM-OCPLS by cross validation

3.4. Ordinary OCPLS

Two functions, called successively, develop an ordinary OCPLS model. The function "tuneocpls" performs the traditional leave-K-out CV or Monte Carlo cross validation (MCCV) [35] of OCPLS. Its output is a vector containing the standard deviations of the model residual estimated by CV or MCCV with different numbers of LVs; the graphical output for the juice data set is shown in Fig. 1a. The model residuals are also necessary to define the UCL of the ACR. From Fig. 1a, one can estimate and select the number of primary LVs; it is obvious that the first 3 LVs are significant. With the standard deviations of the model residual and the selected number of LVs, one can run the function "tpocpls" to train the model and predict the test objects. Besides the graphical results in Fig. 1b and c, the following data outputs are obtained from "tpocpls" (a hypothetical call sequence is sketched after the list):

• UCL.SD, a scalar, the UCL of the SD;
• UCL.ACR, a scalar, the UCL of the ACR;
• Tresults.SD, a vector of the SD of the training objects;
• Tresults.ACR, a vector of the ACR of the training objects;
• Tresults.SDclass, a vector of the class labels (0/1) of the training objects by SD;
• Tresults.ACRclass, a vector of the class labels (0/1) of the training objects by ACR;
• Tresults.Tclass, a vector of the class labels (0/1) of the training objects considering both SD and ACR;
• Presults.SD, a vector of the SD of the prediction objects;
• Presults.ACR, a vector of the ACR of the prediction objects;
• Presults.SDclass, a vector of the class labels (0/1) of the prediction objects by SD;
• Presults.ACRclass, a vector of the class labels (0/1) of the prediction objects by ACR;
• Presults.Tclass, a vector of the class labels (0/1) of the prediction objects considering both SD and ACR.

[Figure] Fig. 1. The cross validation (a), training (b) and prediction (c) of ordinary OCPLS for the fruit juice data. Panel (a) plots the standard deviation of the residual against the number of LVs ("MCCV of OCPLS"); panels (b) "Training of OCPLS (3 LVs)" and (c) "Prediction of OCPLS (3 LVs)" plot ACR against score distance.
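A hypothetical call sequence for this two-step workflow is sketched below. The argument lists and the grouping of the outputs into structures are assumptions made for illustration only; the actual signatures are documented in the example M-files shipped with the toolbox.

% Hypothetical usage sketch; argument lists are assumed -- consult the
% example M-files in the toolbox for the real signatures.
Xtrain = rand(50, 11);   % placeholder for the 50 pure training objects
Xtest  = rand(38, 11);   % placeholder for the 19 pure + 19 adulterated objects

sds = tuneocpls(Xtrain, 10);   % assumed: CV/MCCV residual SDs for 1..10 LVs
K   = 3;                       % chosen from the plot (Fig. 1a)

% assumed grouping of the outputs described in the list above
[UCL, Tresults, Presults] = tpocpls(Xtrain, Xtest, K, sds(K));
accepted = Presults.Tclass;    % 0/1 class labels using both SD and ACR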
3.5. Nonlinear GRBF-OCPLS

The nonlinear GRBF-OCPLS model can also be developed by calling two functions. All the raw variables are rescaled into [0, 1] with
reference to the training set before the RBF transformation. By running the function "tunegrbfocpls", traditional CV or MCCV can be performed to select the proper number of LVs and the value of the Gaussian kernel width, σ², in Eq. (7). A plot like Fig. 2a is obtained, in which the values of σ² within the screened interval that give the lowest model residuals are labeled for the different numbers of LVs. Besides the graphical results, the following outputs are obtained from "tunegrbfocpls":

• errsm, a matrix of the standard deviations of the model residual for different values of σ² and numbers of LVs;
• errs, a vector of the lowest standard deviations of the model residual in the screened range of σ² for different numbers of LVs;
• wid, a vector of the values of σ² that give the lowest model residuals for different numbers of LVs.

With the selected number of LVs and value of σ², as well as the estimated standard deviations of the model residual, one can run the function "tpgrbfocpls" to train the model and predict the test objects. The graphical results for training and prediction of the juice data are shown in Fig. 2b and c (5 LVs and σ² = 10), respectively. The data outputs are the same as for ordinary OCPLS.

[Figure] Fig. 2. The cross validation (a), training (b) and prediction (c) of nonlinear GRBF-OCPLS for the juice data. Panel (a) plots the standard deviation of the residual against the number of LVs, with the best σ² (here σ² = 10) labeled at each number of LVs ("MCCV of GRBF-OCPLS"); panels (b) "Training of GRBF-OCPLS (5 LVs, σ² = 10)" and (c) "Prediction of GRBF-OCPLS (5 LVs, σ² = 10)" plot ACR against score distance.
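The Gaussian RBF transformation of Eq. (7) itself is easy to reproduce outside the toolbox. The sketch below builds the n-by-n transformed training matrix and expands the test objects against the training objects; it is independent of the toolbox's own dist1 implementation, and the data matrices are placeholders assumed to be already rescaled to [0, 1].

% Gaussian RBF transformation of Eq. (7) -- a stand-alone sketch.
Xtr = rand(50, 11);  Xte = rand(38, 11);   % placeholder data, already in [0, 1]
sigma2 = 10;                               % kernel width sigma^2

G   = sum(Xtr.^2, 2);                      % squared norms of training objects
D2  = bsxfun(@plus, G, G') - 2*(Xtr*Xtr'); % squared Euclidean distances (n by n)
A   = exp(-D2 / sigma2);                   % RBF-transformed training matrix

D2t = bsxfun(@plus, sum(Xte.^2, 2), G') - 2*(Xte*Xtr');
At  = exp(-D2t / sigma2);                  % test objects expanded on training RBFs

% A and At then take the place of X in the ordinary OCPLS computation of
% Section 2.1.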
3.6. Robust PRM-OCPLS

Two functions can be called to perform robust PRM-OCPLS. First, given the percentage of outliers (0.1 by default), the function "tuneprmocpls" detects the outliers by PRM and performs traditional CV or MCCV using only the regular objects. Fig. 3a demonstrates the MCCV of OCPLS with the outliers removed. The data outputs of "tuneprmocpls" are as follows:

• outls, a matrix with elements 0/1 indicating regular objects and outliers detected by PRM with different numbers of LVs;
• errs, a vector of the standard deviations of the model residual of ordinary OCPLS (with outliers removed) with different numbers of LVs, estimated by CV or MCCV.

After running "tuneprmocpls", the outliers have been detected. The function "tpprmocpls" can then be run to train an ordinary OCPLS model on the training set without the outliers and to predict the test objects. The graphical results for training and prediction of the juice data by PRM-OCPLS are shown in Fig. 3b and c, respectively. The data outputs of "tpprmocpls" are the same as for ordinary OCPLS. The outlier-screening step is sketched below.

[Figure] Fig. 3. The cross validation (a), training (b) and prediction (c) of robust PRM-OCPLS for the juice data. Panel (a) plots the standard deviation of the residual against the number of LVs ("MCCV of OCPLS (outliers removed)"); panels (b) "Training of OCPLS (2 LVs, outliers removed)" and (c) "Prediction of PRM-OCPLS (2 LVs)" plot ACR against score distance.
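The outlier-screening logic of Section 2.3 reduces to ranking the PRM object weights and removing the lowest fraction before refitting an ordinary OCPLS model. In the sketch below, prmWeights is a hypothetical stand-in for the adapted PRM code of Serneels et al. [28] that the toolbox actually uses; only the ranking-and-removal step is shown.

% Sketch of the PRM-based outlier screening (Section 2.3).
% prmWeights is hypothetical: any routine returning one PRM weight per object.
cutoff = 0.10;                    % assumed default fraction of outliers
w = prmWeights(Xtrain, K);        % hypothetical PRM object weights

[~, idx] = sort(w, 'ascend');     % objects with the least weight first
nOut = round(cutoff * numel(w));
isOutlier = false(numel(w), 1);
isOutlier(idx(1:nOut)) = true;

Xclean = Xtrain(~isOutlier, :);   % ordinary OCPLS is then fit on Xclean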
4. Discussions and remarks

The toolbox is very easy to use: for each of the three algorithms, namely ordinary (linear) OCPLS, nonlinear GRBF-OCPLS and robust PRM-OCPLS, calling two functions is sufficient to tune and train the models and to predict the test objects. Some remarks can be made on the toolbox.

(1) The features or predictor variables should NOT be centered by 0; other rescaling or preprocessing is optional.

(2) When using the function "tunegrbfocpls" to select the parameters of the nonlinear GRBF-OCPLS, different ranges of σ² with proper steps (e.g., [0.01:0.01:1], [1:1:100], etc.) can be screened by running "tunegrbfocpls" repeatedly. Moreover, besides the plot shown in Fig. 2a, one can also examine the elements of "errsm" to see the model residuals for the different combinations of the number of LVs and σ². Selecting the best parameters may require some experience. One should avoid selecting an excessively large or small value of σ², which would cause overfitting of OCPLS, because a too large/small σ² makes the Gaussian function return responses that excessively approach 1/0, respectively.
This is not desirable for class modeling because it implies overfitting, with all the objects becoming too close to each other.

(3) When outliers exist, robust PRM-OCPLS should be used. Based on an evaluation of the uncertainty in the data, it is suggested that
one should assume a proper (not too large) percentage of outliers, to avoid excluding too many regular objects, when running the functions "tuneprmocpls" and "tpprmocpls".

(4) For the ordinary, nonlinear and robust OCPLS algorithms, the sensitivity and specificity of prediction can be estimated using the output "Presults.Tclass" if the real labels of the prediction objects are known.
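For example, with known 0/1 reference labels (1 = target class) for the prediction set, both figures of merit follow in two lines from "Presults.Tclass"; the variable names and the label ordering below are illustrative.

% Sensitivity and specificity from the predicted 0/1 labels (illustrative).
yTrue = [ones(19, 1); zeros(19, 1)];  % e.g., 19 pure then 19 adulterated objects
yPred = Presults.Tclass;              % from tpocpls, tpgrbfocpls or tpprmocpls

sensitivity = sum(yPred == 1 & yTrue == 1) / sum(yTrue == 1);
specificity = sum(yPred == 0 & yTrue == 0) / sum(yTrue == 0);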
5. Independent testing

5.1. Test result 1
My research group and I have tested the toolbox for class modeling using one-class partial least squares (OCPLS) classifiers on several data sets. OCPLS is a powerful tool for classification problems. The MATLAB routines that compose the toolbox are easy to use and generate many graphical outputs useful to the user.

Dr. Adriano A. Gomes, Dr. Licarion Pinto and Dr. Mário Cesar Ugulino de Araújo, Universidade Federal da Paraíba, CCEN, Departamento de Química, Caixa Postal 5093, CEP 58051-970, João Pessoa, PB, Brazil.
5.2. Test result 2

During the last weeks, we have carefully and thoroughly tested the MATLAB routines. The OCPLS method and its modified versions indeed represent a valuable tool in the field. As for the MATLAB routines, we found them satisfactorily user-friendly and provided with comprehensive help comments.

Dr. Paolo Oliveri, Department of Pharmacy, University of Genoa, Via Brigata Salerno 13, 16147 Genoa, Italy.
Dr. M. Isabel López, Chemometrics, Qualimetrics and Nanosensors Group, Department of Analytical and Organic Chemistry, Rovira i Virgili University, Marcel·lí Domingo s/n, 43007 Tarragona, Spain.
Conflict of interest statement

The authors declare no conflicts of interest.
Acknowledgment
This work was financially supported by the Guizhou Provincial Department of Education 125 Plan Major Project No. QJH2013(027). The authors are greatly indebted to Dr. S. Serneels and Dr. C. Croux for generously allowing us to adapt their MATLAB code for PRM and include it in this toolbox to perform robust OCPLS. We are also grateful to Dr. Paolo Oliveri, Dr. M. Isabel López, Dr. Adriano A. Gomes, Dr. Licarion Pinto and Dr. Mário Cesar Ugulino de Araújo, as well as the other reviewers, for testing the MATLAB code and reviewing the manuscript.
References
[1] B.R. Kowalski, C.F. Bender, Solving chemical problems with pattern recognition, Naturwissenschaften 62 (1975) 10–14.
[2] G. Downey, Food and food ingredient authentication by mid-infrared spectroscopy and chemometrics, TrAC Trends Anal. Chem. 17 (1998) 418–424.
[3] B.K. Lavine, N. Mirjankar, R.K. Vander Meer, Chemometrics, Anal. Chem. 85 (2011) 1308–1316.
[4] B.K. Lavine, J. Workman, Chemometrics, Anal. Chem. 83 (2013) 705–714.
[5] D.M.J. Tax, One-Class Classification, Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, 2001.
[6] S. Kittiwachana, D.L.S. Ferreira, G.R. Lloyd, L.A. Fido, D.R. Thompson, R.E.A. Escott, R.G. Brereton, One class classifiers for process monitoring illustrated by the application to online HPLC of a continuous process, J. Chemom. 24 (2010) 96–110.
[7] R.G. Brereton, One-class classifiers, J. Chemom. 25 (2011) 225–246.
[8] S. Wold, Pattern recognition by means of disjoint principal components models, Pattern Recogn. 8 (1976) 127–139.
[9] M.P. Derde, D.L. Massart, UNEQ: a disjoint modelling technique for pattern recognition based on normal distribution, Anal. Chim. Acta 184 (1986) 33–51.
[10] M.P. Derde, D.L. Massart, Comparison of the performance of the class modeling techniques UNEQ, SIMCA, and PRIMA, Chemom. Intell. Lab. Syst. 4 (1988) 65–93.
[11] M.P. Derde, L. Kaufman, D.L. Massart, A non-parametric class modeling technique, J. Chemom. 3 (1989) 375–395.
[12] M. Forina, S. Lanteri, L. Sarabia, Distance and class space in the UNEQ class-modeling technique, J. Chemom. 9 (1995) 69–89.
[13] E.K. Kemsley, Discriminant Analysis and Class Modelling of Spectroscopic Data, Wiley, Chichester, 1998.
[14] M. Forina, P. Oliveri, S. Lanteri, M. Casale, Class-modeling techniques, classic and new, for old and new problems, Chemom. Intell. Lab. Syst. 93 (2008) 132–148.
[15] M. Forina, P. Oliveri, M. Casale, S. Lanteri, Multivariate range modeling, a new technique for multivariate class modeling: the uncertainty of the estimates of sensitivity and specificity, Anal. Chim. Acta 622 (2008) 85–93.
[16] M. Forina, M. Casale, P. Oliveri, S. Lanteri, CAIMAN brothers: a family of powerful classification and class modeling techniques, Chemom. Intell. Lab. Syst. 96 (2009) 239–245.
[17] M. Goodarzi, W. Saeys, M.C.U. de Araujo, R.K.H. Galvão, Y. Vander Heyden, Binary classification of chalcone derivatives with LDA or KNN based on their antileishmanial activity and molecular descriptors selected using the successive projections algorithm feature-selection technique, Eur. J. Pharm. Sci. 51 (2014) 189–195.
[18] P. Oliveri, V. Di Egidio, T. Woodcock, G. Downey, Application of class-modelling techniques to near infrared data for food authentication purposes, Food Chem. 125 (2011) 1450–1456.
[19] P. Oliveri, G. Downey, Multivariate class modeling for the verification of food-authenticity claims, TrAC Trends Anal. Chem. 35 (2012) 74–86.
[20] M. Forina, C. Armanino, R. Leardi, G. Drava, A class-modelling technique based on potential functions, J. Chemom. 5 (1991) 435–453.
[21] F. Marini, A.L. Magrì, R. Bucci, Multilayer feed-forward artificial neural networks for class modeling, Chemom. Intell. Lab. Syst. 88 (2007) 118–124.
[22] F. Marini, J. Zupan, A.L. Magrì, Class-modeling using Kohonen artificial neural networks, Anal. Chim. Acta 544 (2005) 306–314.
[23] M. Hubert, P.J. Rousseeuw, S. Verboven, A fast method for robust principal components with applications to chemometrics, Chemom. Intell. Lab. Syst. 60 (2002) 101–111.
[24] M. Hubert, P. Rousseeuw, T. Verdonck, Robust PCA for skewed data and its outlier map, Comput. Stat. Data Anal. 53 (2009) 2264–2274.
[25] I. Stanimirova, B. Walczak, D.L. Massart, V. Simeonov, A comparison between two robust PCA algorithms, Chemom. Intell. Lab. Syst. 71 (2004) 83–95.
[26] M. Barker, W. Rayens, Partial least squares for discrimination, J. Chemom. 17 (2003) 166–173.
[27] L. Xu, C.B. Cai, D.H. Deng, Multivariate quality control solved by one-class partial least squares regression: identification of adulterated peanut oils by mid-infrared spectroscopy, J. Chemom. 25 (2011) 568–574.
[28] S. Serneels, C. Croux, P. Filzmoser, P.J. Van Espen, Partial robust M-regression, Chemom. Intell. Lab. Syst. 79 (2005) 55–64.
[29] M. Daszykowski, S. Serneels, K. Kaczmarek, P. Van Espen, C. Croux, B. Walczak, TOMCAT: a MATLAB toolbox for multivariate calibration techniques, Chemom. Intell. Lab. Syst. 85 (2007) 269–277.
[30] S. de Jong, SIMPLS: an alternative approach to partial least squares regression, Chemom. Intell. Lab. Syst. 18 (1993) 251–263.
[31] T. Kourti, J.F. MacGregor, Multivariate SPC methods for process and product monitoring, J. Qual. Technol. 28 (1996) 409–428.
[32] B. Walczak, D.L. Massart, The radial basis functions—partial least squares approach as a flexible non-linear regression technique, Anal. Chim. Acta 331 (1996) 177–185.
[33] B. Walczak, D.L. Massart, Application of radial basis functions—partial least squares to non-linear pattern recognition problems: diagnosis of process faults, Anal. Chim. Acta 331 (1996) 187–193.
[34] C. Fuchs, R.S. Kenett, Multivariate Quality Control: Theory and Applications, Marcel Dekker, New York, 1998.
[35] Q.S. Xu, Y.Z. Liang, Monte Carlo cross validation, Chemom. Intell. Lab. Syst. 56 (2001) 1–11.