Designing classification filters for integrated sensing and processing using optimal discriminant vectors


Chemometrics and Intelligent Laboratory Systems 149 (2015) 22–27

Contents lists available at ScienceDirect

Chemometrics and Intelligent Laboratory Systems journal homepage: www.elsevier.com/locate/chemolab

Yan Liu, Wensheng Cai, Xueguang Shao ⁎

Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300071, PR China

Article info

Article history: Received 17 July 2015; received in revised form 25 September 2015; accepted 27 September 2015; available online 3 October 2015

Keywords: Integrated sensing and processing; Near-infrared spectroscopy; Discriminant vectors; Optical filters; Classification

Abstract

Integrated sensing and processing (ISP) is a new strategy for instrument design to simplify quantitative or qualitative analysis. One of the ISP approaches is processing the optical spectrum with filters to obtain analytical results directly. ISP filters based on optimal discriminant vectors are designed in this study for the problem of classification. The method starts by performing principal component analysis (PCA) on the spectra of multiclass samples, and then constructs the optimal orthogonal discriminant vectors from the PCA scores by maximizing Fisher's discriminant function. The filters for discriminating the samples are then obtained by transforming the loadings with the discriminant vectors. When the filters are applied to the spectra of new samples, the differences between samples of different classes are obtained and can be used to discriminate these samples. NIR datasets of vitamins, cephalosporins and Chinese patent medicines are used to test the performance of the filters. The results show that, for each of the datasets, the samples of different classes can be correctly identified.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Integrated sensing and processing (ISP) is receiving more and more attention for designing new instruments that simplify the procedures of spectral analysis. ISP aims to design and optimize sensing systems that integrate the traditionally independent units of sensing, signal processing, communication and modeling. By employing ISP, quantitative or qualitative analysis can be achieved within the sensing system itself. One of the ISP approaches involves processing the optical spectrum with filters to calculate the analytical results about the samples directly in the sensing stage [1]. Hieftje et al. reported that an organic solvent can be used as a filter for quantitative determination of trace substances in solutions [2–4]. Myrick et al. designed a pair of optical filters to simulate the regression vector produced by PCA. Light passing through the paired filters produces an analog detector signal that is directly proportional to the chemical or physical property [5]. In later works, the design was improved and used in UV-visible and NIR spectroscopy [6–9]. The design is based on solid-state optical filters fabricated with thin films, termed multivariate optical elements (MOEs). On the other hand, Lodder et al. demonstrated that absorption filters can be constructed with different molecules and used as mathematical factors to generate a factor-analytic optical

⁎ Corresponding author at: College of Chemistry, Nankai University, Tianjin 300071, PR China. Tel.: +86 22 23503430, fax: +86 22 23502458. E-mail address: [email protected] (X. Shao).

http://dx.doi.org/10.1016/j.chemolab.2015.09.015 0169-7439/© 2015 Elsevier B.V. All rights reserved.

calibration in a high-throughput spectrometer, termed molecular factor computing (MFC) [10–12].

For either MFC or MOEs, a digital filter that describes the transmittance curve of the filter should be obtained before making the real filter. Although studies on MFC and MOEs have been reported, few methods for designing the digital filter have been proposed. To design the digital filter, Small et al. constructed multivariate calibration models using Gaussian basis functions to extract relevant information from single-beam spectral data [13]. The widths and peak positions of the basis functions were optimized by a genetic algorithm (GA). The method was used for quantitative analysis of glucose in mimic biological samples, and the results were found to be better than those of partial least-squares regression (PLSR).

Fisher linear discriminant analysis has been a widely used method for designing filters for discriminant analysis. The method can effectively classify samples from different classes [14]. It consists of a series of non-orthogonal discriminant vectors. Okada et al., however, have shown that orthogonal vectors are more powerful than the classical non-orthogonal ones, in terms of both discriminant ratio and mean error probability [15]. As a result, the optimal discriminant vectors were developed as an improvement of Fisher linear discriminant analysis [16–18]. The method may be a good choice for designing the filters in MFC or MOEs.

Near-infrared (NIR) spectroscopy is a rapid and nondestructive analytical technique and has been widely used in food, agriculture, industrial production, etc. The weak signals and the broad, overlapped peaks of near-infrared spectra, however, contain not only chemical information

Y. Liu et al. / Chemometrics and Intelligent Laboratory Systems 149 (2015) 22–27


Table 1
Information of samples in the datasets.

Data  Medicine     Manufacturer  Label  Samples  Calibration  Prediction
1     Vitamin B1   1             Va1    18       12           6
      Vitamin B1   2             Va2    18       12           6
      Vitamin B2   3             Vb3    18       12           6
      Vitamin B2   4             Vb4    18       12           6
      Vitamin B4   5             Vc5    18       12           6
      Vitamin B12  6             Vd6    18       12           6
      Vitamin C    7             Ve7    18       12           6
      Vitamin C    8             Ve8    18       12           6
      Total                             144      96           48
2     Cefaclor     1             Ca1    18       12           6
      Cefixime     2             Cb2    36       24           12
      Cefixime     3             Cb3    19       12           7
      Cefalexin    4             Cc4    24       16           8
      Cefalexin    5             Cc5    18       12           6
      Cefalexin    6             Cc6    18       12           6
      Total                             133      88           45
3     CPM1         1             Pa1    45       30           15
      CPM1         2             Pa2    36       24           12
      CPM1         3             Pa3    35       23           12
      CPM2         4             Pb4    43       28           15
      CPM2         5             Pb5    48       32           16
      CPM3         6             Pc6    31       20           11
      CPM3         7             Pc7    53       35           18
      Total                             291      192          99

Fig. 2. Scores of the calibration samples in LD1–LD2 subspace.

but also noise and variant background. Therefore, to obtain an accurate result of quantitative or qualitative analysis from the spectra, a reliable multivariate calibration model is indispensable [19,20]. The most commonly used methods for constructing a multivariate calibration model, for quantitative and qualitative analyses, respectively, are partial least

Fig. 1. Scores of the calibration samples in PC1–PC2 (a), PC3–PC4 (b), PC5–PC6 (c), PC7–PC8 (d), PC9–PC10 (e) and PC11–PC12 (f) subspaces.


squares (PLS) and principal component analysis (PCA). Various approaches have been developed to improve the performance of the two methods. For instance, chemometric approaches are used to remove noise and variant background from the spectra. Denoising by Savitzky–Golay (SG) smoothing, Fourier transform and wavelet transform [21,22] has been investigated. Various methods for correcting or removing the background variation have been developed, such as multiplicative scatter correction (MSC), standard normal variate (SNV) and wavelet transform [23,24]. A large number of methods for variable selection have been proposed to simplify and improve PLS and other multivariate analysis models [25–37]. For discrimination or classification problems, soft independent modeling of class analogies (SIMCA) [38,39], kernel PCA [40], robust PCA [41], and the principal component accumulation method [42,43] are adopted to improve the classification results of PCA.

In MFC or MOEs, physical filters are used to process the spectrum and obtain the results of quantitative or qualitative analysis. Compared with the physical filters, chemometric models are more useful when updates to the filter are necessary. On the other hand, physical filters can be designed according to the models of these methods. In this work, the filters for ISP are designed based on the optimal discriminant vectors. By passing the NIR spectra of the samples through the designed filters, the drugs of different classes can be classified directly. The results for three datasets indicate that the filters provide an efficient way to solve classification problems and can be used in designing ISP instruments.

Fig. 3. Mean spectra of the eight classes of medicines and the coefficients of the filters.

2. Theory and calculation

The basic principle of ISP can be summarized as follows. A mixed light beam passes sequentially through the sample cell and the MFC or MOE filter, and then reaches the detector. The detected signal is the sum of the light at different wavelengths that passes through the sample and the filter. When the filter is well designed, only light at selected wavelengths is detected, and thus the detected signal is the final result of the analysis. The process is equivalent to the projection of the spectrum onto the filter, which can be described by

$$ y = \sum_{i=1}^{N} x_i f_i \tag{1} $$

where $x_i$ and $f_i$ are the intensity of the spectrum and the coefficient of the filter at the ith wavelength, respectively, N is the number of wavelengths and y is the response of the detector. For quantitative analysis, the detected response (y) will be proportional to the content of the target analyte, and for classification, the value will be a parameter that determines the class of the sample.

On the other hand, because NIR spectra are commonly composed of weak, broad and overlapping peaks and contain a large amount of redundant information, the spectra can be represented by a series of scores in the domain of principal components (PCs), which reduces the dimension of the spectra by principal component analysis (PCA). The spectra X can be decomposed into scores and loadings by

$$ X = TP \tag{2} $$

where T contains the scores of the spectra and P contains the loadings of the PC space. Because the major information of the spectra is generally contained in the scores of the first few PCs, the first m scores can be used for classification. To further improve the classification, the optimal set of discriminant vectors $d_k$ can be obtained by optimizing Eqs. (3)–(5),

$$ \arg\max_{d_k} \left( \frac{d_k^{T} B d_k}{d_k^{T} W d_k} \right) \tag{3} $$

$$ d_{k1}^{T} d_{k2} = \begin{cases} 1 & (k1 = k2) \\ 0 & (k1 \neq k2) \end{cases} \qquad (k1, k2 = 1, 2, \ldots, n) \tag{4} $$

$$ d_k^{T} W d_k = 1 \tag{5} $$

where B and W are the between-class covariance and the within-class covariance calculated from the first m scores, respectively. Eq. (4) guarantees that the discriminant vectors are orthogonal to each other, and Eq. (5) normalizes the discriminant vectors. The function of the discriminant vectors is therefore to obtain the best separation between the samples of different classes in the PC (score) space. Because two discriminant vectors generally include enough information for classification, the first two discriminant vectors (k = 1 and 2) are used in this study.
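The steps described so far (PCA scores, the matrices B and W, and two mutually orthogonal discriminant vectors) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the synthetic data, the mean-centring before PCA and the use of unnormalized scatter matrices for B and W are all assumptions made for illustration. Loadings are stored as columns here, so the decomposition of Eq. (2) reads `Xc = T @ P.T` in this convention. Each new vector is found greedily by maximizing the ratio of Eq. (3) within the subspace orthogonal to the vectors already found, and is scaled to unit length as in Eq. (4); dividing a vector by the square root of d'Wd would instead give the scaling of Eq. (5) without changing its direction.

```python
import numpy as np

def pca_scores(X, m):
    """Scores T (n x m) and loadings P (N x m) from an SVD of mean-centred spectra."""
    Xc = X - X.mean(axis=0)              # assumption: spectra are mean-centred
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (U * s)[:, :m], Vt[:m].T

def scatter_matrices(T, labels):
    """Between-class (B) and within-class (W) scatter of the score matrix T."""
    mean_all = T.mean(axis=0)
    m = T.shape[1]
    B = np.zeros((m, m))
    W = np.zeros((m, m))
    for c in np.unique(labels):
        Tc = T[labels == c]
        diff = (Tc.mean(axis=0) - mean_all)[:, None]
        B += len(Tc) * (diff @ diff.T)
        centred = Tc - Tc.mean(axis=0)
        W += centred.T @ centred
    return B, W

def orthogonal_discriminant_vectors(B, W, n_vectors=2):
    """Greedy sketch: each vector maximises d'Bd / d'Wd among directions
    orthogonal to the vectors already found."""
    m = B.shape[0]
    D = np.zeros((m, 0))
    for _ in range(n_vectors):
        if D.shape[1] == 0:
            Q = np.eye(m)
        else:
            # Columns of Q span the orthogonal complement of the columns of D.
            U, _, _ = np.linalg.svd(D, full_matrices=True)
            Q = U[:, D.shape[1]:]
        Bq, Wq = Q.T @ B @ Q, Q.T @ W @ Q
        # Generalised symmetric eigenproblem Bq z = lambda Wq z via Cholesky of Wq.
        L = np.linalg.cholesky(Wq)
        Linv = np.linalg.inv(L)
        vals, vecs = np.linalg.eigh(Linv @ Bq @ Linv.T)
        z = Linv.T @ vecs[:, -1]         # eigh sorts ascending, so take the last one
        d = Q @ z
        d /= np.linalg.norm(d)           # unit length, as in Eq. (4)
        D = np.column_stack([D, d])
    return D

# Hypothetical data: 3 classes, 30 spectra each, 40 channels (not the paper's data).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 30)
centres = rng.normal(scale=3.0, size=(3, 40))
X = centres[labels] + rng.normal(size=(90, 40))

T, P = pca_scores(X, m=5)
B, W = scatter_matrices(T, labels)
D = orthogonal_discriminant_vectors(B, W, n_vectors=2)
```

Restricting the search to an explicit orthogonal complement keeps every eigenproblem symmetric, which is a design choice of this sketch; closed-form expressions for such vectors are given in the literature cited as [16–18].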

Fig. 4. Transformed scores of the calibration samples obtained with the filters constructed with the loadings of PC1–4 (a), PC1–8 (b) and PC1–12 (c).


After the discriminant vectors are obtained, the loadings ($P_{1 \ldots m}$) can be further transformed by

$$ f_k = P_{1 \ldots m} d_k \tag{6} $$

and $f_k$ can be used as conceptual optical filters because applying $f_k$ to a spectrum obtains the transformed scores of the spectrum, which can be used for classification or discrimination.

$$ y_{new,k} = \sum_{i=1}^{N} x_{new,i} f_{k,i} = x_{new} P_{1 \ldots m} d_k \tag{7} $$
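Eqs. (6) and (7) can be checked numerically with a short sketch. The loadings, discriminant vectors and spectrum below are random placeholders rather than values from the paper, and the spectrum is assumed to be already pretreated (and centred, if the PCA model was built on centred data).

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 40, 6                           # hypothetical: 40 channels, 6 PCs

# Hypothetical orthonormal loadings P (N x m) and two orthonormal
# discriminant vectors D (m x 2), generated by QR factorisation.
P, _ = np.linalg.qr(rng.normal(size=(N, m)))
D, _ = np.linalg.qr(rng.normal(size=(m, 2)))

F = P @ D                              # Eq. (6): column k holds the filter f_k
x_new = rng.normal(size=N)             # hypothetical spectrum

y_filter = x_new @ F                   # Eq. (7), left form: sum_i x_i f_{k,i}
y_scores = (x_new @ P) @ D             # Eq. (7), right form: PCA scores, then d_k

# Eq. (1) written out as an explicit sum for the first filter.
y_loop = sum(x_new[i] * F[i, 0] for i in range(N))
```

Both forms of Eq. (7) give the same two numbers, which is why the filter can replace the two-step computation. With orthonormal loadings and orthonormal discriminant vectors, as assumed here, the two filters are themselves orthogonal (`F.T @ F` is the identity).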

Therefore, with the two filters (k = 1 and 2), two values of $y_{new}$ can be obtained and used for classification. $y_{new}$ can be regarded as the transformed scores of PCA. If a model is built with the transformed scores of the calibration spectra, discrimination can be achieved. On the other hand, if the filters are used in an MFC or MOE instrument, the two values of $y_{new}$ can be obtained directly in the measurement.

3. Data description

Three NIR spectral datasets were used in this study. Dataset 1 includes 144 NIR spectra of uncoated vitamin tablets in eight classes. Dataset 2 includes 133 NIR spectra of cephalosporin capsules in six classes. Dataset 3 includes 291 NIR spectra of sugar-coated Chinese patent medicine (CPM) tablets in seven classes. A class means the samples of one medicine made by one manufacturer. In the following description, 'V', 'C' and 'P' followed by a lowercase letter and a number are used to define the classes, where the lowercase letter represents the type of the medicine and the number denotes the manufacturer. For example, Va1 denotes the class of vitamin tablets of type 'a' produced by manufacturer '1' and Pc7 denotes the class of CPM of type 'c' produced by manufacturer '7'. The information on the samples in the three datasets is summarized in Table 1.

Before the collection, the tablets of datasets 1 and 3 were polished in order to remove the impurity and the sugar coating, respectively. The capsules of dataset 2 were measured by the probe directly without pretreatment. All the spectra were recorded on an MPA FT-NIR spectrometer (Bruker, Germany) in reflectance mode over the wavenumber range 3999.7–11995.3 cm−1 with a digitization interval of 3.857 cm−1. In the calculations, the variables from 3999.7 to 10,000.2 cm−1 (1557 data points) were used.
Before calculation, continuous wavelet transform (CWT) with the Haar wavelet and a scale parameter of 20 is applied as the pretreatment method to eliminate the variant background in the spectra [44,45]. For each dataset, the spectra are divided into a calibration set and a validation set by the Kennard–Stone (KS) method [46].

4. Results and discussion

4.1. Classification results with PCA and Fisher linear discriminant analysis

PCA is used to investigate the classification of the samples. Taking dataset 1 as an example, the distribution of the calibration samples in the PC1–PC2, PC3–PC4, PC5–PC6, PC7–PC8, PC9–PC10 and PC11–PC12 subspaces is shown in Fig. 1(a)–(f), respectively. Different colors are used to denote the samples in different classes. It can be seen that some of the classes can be separated in the PC subspaces. For example, Vd6, Ve7 and Ve8 are separated in the subspace of PC1–PC2, and Vb3, Va1 and Vb4 are separated in the subspaces of PC3–PC4, PC5–PC6 and PC7–PC8, respectively. However, some classes, e.g., Vc5, cannot be separated in any PC subspace. Moreover, although the scores of Va2 are well clustered in the subspace of PC7–PC8, it is hard to find a clear border between Va2 and Vc5. On the other hand, as shown in Fig. 1, it is hard to see classification information after PC10. If more classes are involved, it may be impossible to achieve a good classification. Such results indicate that a satisfactory classification cannot be obtained by PCA.

Fig. 5. Transformed scores of the validation samples in dataset 1.

Fisher linear discriminant analysis is used to further investigate the classification of the samples. The scores in the linear discriminant (LD) subspace are calculated from the scores of the calibration samples in the PC subspaces and are shown in Fig. 2. It can be seen that Va2, Vb3, Vb4 and Vd6 are separated in the subspace of LD1–LD2. However, some classes, e.g., Va1, Vc5, Ve7 and Ve8, cannot be separated. The results indicate that Fisher linear discriminant analysis can improve the classification, but the results still need to be improved.
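For comparison, classical Fisher discriminant directions can be sketched as the leading eigenvectors of W⁻¹B computed from PC scores. The function name and the synthetic scores below are hypothetical, and the scatter matrices are unnormalized; the source does not state which software or scaling was used for Fig. 2.

```python
import numpy as np

def fisher_directions(T, labels, n_vectors=2):
    """Leading eigenvectors of W^{-1} B, scaled to unit length (illustrative)."""
    mean_all = T.mean(axis=0)
    m = T.shape[1]
    B = np.zeros((m, m))
    W = np.zeros((m, m))
    for c in np.unique(labels):
        Tc = T[labels == c]
        diff = (Tc.mean(axis=0) - mean_all)[:, None]
        B += len(Tc) * (diff @ diff.T)          # between-class scatter
        centred = Tc - Tc.mean(axis=0)
        W += centred.T @ centred                # within-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(vals.real)[::-1][:n_vectors]
    V = vecs[:, order].real
    return V / np.linalg.norm(V, axis=0), W

# Hypothetical PC scores: 3 classes, 30 samples each, 5 PCs.
rng = np.random.default_rng(4)
labels = np.repeat(np.arange(3), 30)
centres = rng.normal(scale=3.0, size=(3, 5))
T = centres[labels] + rng.normal(size=(90, 5))

V, W = fisher_directions(T, labels)
ld_scores = T @ V                     # coordinates in the LD1-LD2 plane
cosine = float(V[:, 0] @ V[:, 1])     # ordinary inner product of the two directions
```

These directions are orthogonal with respect to W (`V[:, 0] @ W @ V[:, 1]` is numerically zero) but in general not in the ordinary sense, so `cosine` is typically nonzero. This is the non-orthogonality that the optimal discriminant vectors are meant to remove.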

4.2. Construction of the filters

The optimal discriminant vectors are calculated by optimizing Eq. (3). In order to use as much as possible of the information contained in the PCs, the scores of PC1–PC12 are used as the input in the calculation. Then, the conceptual filters $f_1$ and $f_2$ are constructed by transforming the loadings of PC1–PC12 with the obtained discriminant vectors $d_1$ and $d_2$ using Eq. (6). Fig. 3 shows the coefficients of $f_1$ and $f_2$. For comparison, the mean pretreated spectra of the eight classes of medicines in dataset 1 are plotted in the figure. Comparing the coefficients of $f_1$ and $f_2$ with the mean spectra of the eight classes, it can be found that there is a close relationship between the differences of the spectra and the intensity of the filters. For example, in the spectral regions of 4300–4700, 4950–5400, and 6700–7030 cm−1, as labeled by the three rectangles in the figure, the coefficients of the two filters are much larger than at other wavenumbers. In these regions, the differences between the spectra of different classes can clearly be observed. This characteristic of the filters can be explained by the principle used in designing them. The optimal set of discriminant vectors is obtained by maximizing the ratio of the between-class variance to the within-class variance in the spectra of different classes. On the other hand, if the details of the two filters are compared, their complementarity, which results from the orthogonality, can be clearly seen. Therefore, the differences between the spectra can be ultimately

Table 2
Calculation results of the true positive and false positive in dataset 1.

Class  True positive (%)    False positive (%)
       Cal.     Val.        Cal.    Val.
Va1    100.0    100.0       0.0     0.0
Va2    100.0    100.0       0.0     0.0
Vb3    100.0    100.0       0.0     0.0
Vb4    100.0    83.3        0.0     0.0
Vc5    100.0    100.0       0.0     0.0
Vd6    100.0    100.0       0.0     0.0
Ve7    100.0    100.0       0.0     0.0
Ve8    100.0    83.3        0.0     0.0


encoded in the filters, and each filter encodes a different aspect of the differences. As a result, the filters may have a strong ability in classification or discrimination of samples with similar NIR spectra.

4.3. Classification with the filters

The constructed filters $f_1$ and $f_2$ are used to investigate the classification of dataset 1. The calibration set is used to build the classification model. Fig. 4(a)–(c) shows the transformed scores obtained with the filters constructed with the loadings of PC1–4, PC1–8 and PC1–12, respectively. The confidence ellipse of each class is plotted in the figure, and the ellipses serve as the models for discriminating the prediction samples. It is clear that, compared with the results in Fig. 1, the classification is much more efficient than with PCA. On the other hand, comparing the results in Fig. 4(a), (b) and (c), it is obvious that a better classification is obtained when more PCs are used. Most of the classes overlap when only four PCs are used in Fig. 4(a), two classes overlap when eight PCs are used in Fig. 4(b), but all the classes are separated in Fig. 4(c) when 12 PCs are used. Therefore, using more PCs to include more information is beneficial for constructing efficient filters.

Fig. 5 shows the classification of the validation spectra. The confidence ellipses (p = 0.05) of the classes are the same as in Fig. 4(c) and are used as the models for the classes, and the values of $y_1$ and $y_2$ for the prediction samples are obtained with Eq. (7) and the filters constructed with PC1–12. It can be seen that almost all the samples are located in the corresponding ellipse. Although there are exceptions for one Vb4 sample and one Ve8 sample, they are located near the border of the ellipse.
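One way to use a confidence ellipse as a class model is to test whether a new pair of transformed scores lies inside it. The sketch below assumes that each ellipse is derived from the mean and covariance of the class's two-dimensional calibration scores under a bivariate normal model; the source does not specify how the ellipses were computed, so this is an illustration rather than the authors' procedure. For two dimensions the chi-square quantile has the closed form −2 ln(p), which avoids a statistics library. Function names and data are hypothetical.

```python
import numpy as np

def fit_ellipse(Y):
    """Mean and inverse covariance of one class's 2-D transformed scores."""
    mean = Y.mean(axis=0)
    cov = np.cov(Y, rowvar=False)
    return mean, np.linalg.inv(cov)

def inside(y, mean, cov_inv, p=0.05):
    """True if y lies within the (1 - p) ellipse, assuming bivariate normal scores."""
    diff = y - mean
    d2 = diff @ cov_inv @ diff           # squared Mahalanobis distance
    threshold = -2.0 * np.log(p)         # chi-square quantile for 2 degrees of freedom
    return bool(d2 <= threshold)

# Hypothetical calibration scores (y1, y2) for two classes, 12 samples each.
rng = np.random.default_rng(3)
Y_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(12, 2))
Y_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(12, 2))
models = {"a": fit_ellipse(Y_a), "b": fit_ellipse(Y_b)}

# Assign a new sample to every class whose ellipse contains it.
y_new = np.array([4.8, 5.1])
hits = [name for name, (mu, ci) in models.items() if inside(y_new, mu, ci)]
```

A sample that falls in no ellipse, like the Vb4 and Ve8 validation samples mentioned above, would produce an empty `hits` list under this scheme.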
To further investigate the performance of the filters, discrimination results in terms of the true positive (TP) and false positive (FP) rates for both the calibration and validation sets are calculated and shown in Table 2. For a given class, TP is the ratio of the number of correctly classified samples to the total number of samples that belong to the class, and FP is the ratio of the number of samples misclassified into the class to the total number of samples that do not belong to the class. All the samples in the calibration set are correctly classified, with TPs of 100% and FPs of 0%. For the validation samples, only the TPs for the classes Vb4 and Ve8 are 83.3%, because one of the six validation samples in each of these classes falls outside the model (5/6 = 83.3%), but the FPs are still 0%. The results demonstrate the efficiency of the filters.

4.4. Classification results of datasets 2 and 3

Datasets 2 and 3 are used to further investigate the performance of the conceptual optical filters. As for dataset 1, the filters for the two datasets are constructed using the proposed method, and then the transformed scores are calculated using Eq. (7) and the respective filters. The results are plotted in Figs. 6 and 7, in which the circles

Fig. 7. Transformed scores of the calibration (circle) and validation (square) samples in dataset 3.

represent the samples in the calibration set and the squares represent the samples in the validation set. It can be found from Fig. 6 that all the samples are correctly classified except for one validation sample of Cc6. Furthermore, the differences between the compositions and between the producers can be seen in the figure. Clearly, the medicines of different types are separated along the $y_1$ direction, and the difference between manufacturers can be seen along the $y_2$ direction for the Cb and Cc samples.

Fig. 7 shows the results of dataset 3 for the seven classes of Chinese patent medicines. It can be found that the classification of the dataset is also acceptable. Only one validation sample of Pc6 is out of the model. However, there is one calibration sample out of the ellipse for each class except Pa2 and Pb5. On one hand, this is normal for samples in the calibration set because the confidence ellipse is obtained with p = 0.05. On the other hand, the results indicate that the variation between the samples of Chinese patent medicine is large due to the large variation of the raw materials.

5. Conclusion

A method for designing filters for ISP is proposed in this study based on the problem of medicine classification. The method starts with a principal component analysis (PCA) on the spectra of multi-class samples, and then calculates two optimal orthogonal discriminant vectors from the PCA scores by maximizing Fisher's discriminant function. By transforming the loadings with the discriminant vectors, the filters for discrimination are obtained. The filters are found to be highly efficient in extracting the differences between the spectra of samples of different classes, which gives the filters a strong ability in classification or discrimination. Three NIR datasets are used to test the performance of the filters. Almost all the samples are correctly identified, which indicates that the filters can be used in designing new ISP instruments.
Conflict of interest

There is no conflict of interest.

Acknowledgments

This study is supported by National Natural Science Foundation of China (Nos. 21175074 and 21475068).

References

Fig. 6. Transformed scores of the calibration (circle) and validation (square) samples in dataset 2.

[1] S.E. Bialkowski, Anal. Chem. 58 (1986) 2561–2563.
[2] A. Fong, G.M. Hieftje, Appl. Spectrosc. 49 (1995) 493–498.
[3] A. Fong, G.M. Hieftje, Appl. Spectrosc. 49 (1995) 1261–1267.
[4] M.R. Fisher, G.M. Hieftje, Appl. Spectrosc. 50 (1996) 1246–1252.
[5] M.P. Nelson, J.F. Aust, J.A. Dobrowolski, P.G. Verly, M.L. Myrick, Anal. Chem. 70 (1998) 73–82.
[6] O. Soyemi, D. Eastwood, L. Zhang, H. Li, J. Karunamuni, P. Gemperline, R.A. Synowicki, M.L. Myrick, Anal. Chem. 73 (2001) 1069–1079.
[7] J.A. Swanstrom, L.S. Bruckman, M.R. Pearl, M.N. Simcock, K.A. Donaldson, T.L. Richardson, T.J. Shaw, M.L. Myrick, Appl. Spectrosc. 67 (2013) 320–629.
[8] J.A. Swanstrom, L.S. Bruckman, M.R. Pearl, E. Abernathy, T.L. Richardson, T.J. Shaw, M.L. Myrick, Appl. Spectrosc. 67 (2013) 630–639.
[9] M.R. Pearl, J.A. Swanstrom, L.S. Bruckman, T.L. Richardson, T.J. Shaw, H.M. Sosik, M.L. Myrick, Appl. Spectrosc. 67 (2013) 640–637.
[10] L.A. Cassis, B. Dai, A. Urbas, R.A. Lodder, Proc. SPIE-Int. Soc. Opt. Eng. 5329 (2004) 239–253.
[11] L.A. Cassis, A. Urbas, R.A. Lodder, Anal. Bioanal. Chem. 382 (2005) 868–872.
[12] B. Dai, A. Urbas, C.C. Douglas, R.A. Lodder, Pharm. Res. 24 (2007) 1441–1449.
[13] T. Tarumi, Y.P. Wu, G.W. Small, Anal. Chem. 81 (2009) 2199–2207.
[14] R.A. Fisher, Ann. Hum. Genet. 7 (1936) 179–188.
[15] T. Okada, S. Tomita, Pattern Recogn. 18 (1985) 139–144.
[16] J.W. Sammon Jr., IEEE Trans. Comput. 19 (1970) 826–829.
[17] D.H. Foley, J.W. Sammon Jr., IEEE Trans. Comput. 24 (1975) 281–289.
[18] J. Duchene, S. Leclercq, IEEE Trans. Pattern Anal. Mach. Intell. 10 (1988) 978–983.
[19] M. Blanco, I. Villarroya, TrAC Trends Anal. Chem. 21 (2002) 240–250.
[20] J. Moros, S. Garrigues, M. de la Guardia, TrAC Trends Anal. Chem. 29 (2010) 578–591.
[21] X.G. Shao, A.K.M. Leung, F.T. Chau, Acc. Chem. Res. 36 (2003) 276–283.
[22] X.G. Shao, C.X. Ma, Chemom. Intell. Lab. Syst. 69 (2003) 157–165.
[23] M. Hu, Y.Z. Li, G. Yang, G.B. Li, M.L. Li, Z.N. Wen, Amino Acids 42 (2011) 1773–1781.
[24] F.Y. Tan, X.Y. Feng, M.L. Li, Z.M. Wang, L. Yang, Y.Z. Li, Y. Feng, F.S. Nie, Anal. Chim. Acta 629 (2008) 38–46.
[25] H.D. Li, Y.Z. Liang, Q.S. Xu, D.S. Cao, Anal. Chim. Acta 648 (2009) 77–84.
[26] C. Tan, X. Qin, M.L. Li, Chinese J Anal. Chem. 37 (2009) 1834–1838.
[27] X.G. Shao, G.R. Du, M. Jing, W. Cai, Chemom. Intell. Lab. Syst. 114 (2012) 44–49.
[28] X.G. Shao, M. Zhang, W.S. Cai, Anal. Methods 4 (2012) 467–473.
[29] R.F. Shan, W.S. Cai, X.G. Shao, Chemom. Intell. Lab. Syst. 131 (2014) 31–36.
[30] V. Centner, D.L. Massart, O.E. de Noord, S. de Jong, B.M. Vandeginste, C. Sterna, Anal. Chem. 68 (1996) 3851–3858.
[31] W.S. Cai, Y.K. Li, X.G. Shao, Chemom. Intell. Lab. Syst. 90 (2008) 188–194.
[32] T. Chen, E. Martin, Anal. Chim. Acta 631 (2009) 13–21.
[33] R.K.H. Galvao, M.C.U. Araujo, W.D. Fragoso, E.C. Silva, G.E. Jose, S.F.C. Soares, H.M. Paiva, Chemom. Intell. Lab. Syst. 92 (2008) 83–91.
[34] Y.H. Yun, W.T. Wang, M.L. Tan, Y.Z. Liang, H.D. Li, D.S. Cao, H.M. Lu, Q.S. Xu, Anal. Chim. Acta 807 (2014) 36–43.
[35] R. Leardi, J. Chemom. 14 (2000) 643–655.
[36] Q. Shen, J.H. Jiang, C.X. Jiao, S.Y. Huan, G.L. Shen, R.Q. Yu, J. Chem. Inf. Comp. Sci. 44 (2004) 2027–2031.
[37] F. Lindgren, P. Geladi, A. Berglund, M. Sjostrom, S. Wold, J. Chemom. 9 (1995) 331–342.
[38] S. Wold, Pattern Recogn. 8 (1976) 127–139.
[39] N. Kumar, A. Bansal, G.S. Sarma, R.K. Rawal, Talanta 123 (2014) 186–199.
[40] B. Scholkopf, S. Mika, C.J.C. Burges, IEEE T. Neural Networ. 10 (1999) 1000–1017.
[41] M. Hubert, P.J. Rousseeuw, K. Vanden Branden, Technometrics 47 (2005) 64–79.
[42] J.J. Liu, W.S. Cai, X.G. Shao, Sci. China Chem. 54 (2011) 802–811.
[43] Y. Wang, X. Ma, Y.D. Wen, J.J. Liu, W.S. Cai, X.G. Shao, Anal. Methods 4 (2012) 2893–2899.
[44] C.X. Ma, X.G. Shao, J. Chem. Inf. Comput. Sci. 44 (2004) 907–911.
[45] D. Chen, X.G. Shao, B. Hu, Q.D. Su, Anal. Chim. Acta 511 (2004) 37–45.
[46] R.W. Kennard, L.A. Stone, Technometrics 11 (1969) 137–148.