Journal Pre-proof

Near infrared spectral variable optimization by final complexity adapted models combined with uninformative variables elimination-A validation study

Xiangzhong Song, Yue Huang, Kuangda Tian, Shungeng Min
PII: S0030-4026(19)31918-7
DOI: https://doi.org/10.1016/j.ijleo.2019.164019
Reference: IJLEO 164019
To appear in: Optik
Received Date: 30 October 2019
Accepted Date: 6 December 2019
Please cite this article as: Song X, Huang Y, Tian K, Min S, Near infrared spectral variable optimization by final complexity adapted models combined with uninformative variables elimination-A validation study, Optik (2019), doi: https://doi.org/10.1016/j.ijleo.2019.164019
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier.
Near infrared spectral variable optimization by final complexity adapted models combined with uninformative variables elimination-A validation study

Xiangzhong Song a, Yue Huang a,*, Kuangda Tian b, Shungeng Min b

a College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, P.R. China
b College of Science, China Agricultural University, Beijing 100193, P.R. China

* Corresponding author: Tel.: +86 10 62733091. Email address: [email protected] (Y. Huang).
Abstract:

A combination method for spectral variable selection is proposed in this study. In this method, predictive-property-ranked variable reduction with final complexity adapted models (FCAM) is used for further variable refinement after uninformative variable elimination (UVE). Two different near infrared spectral (NIRS) datasets were investigated to evaluate the quantitative performance of the proposed method. Results showed that UVE-FCAM selected far fewer variables than UVE alone while giving better predictions on both spectral datasets. Moreover, both the prediction performance and the modeling stability of UVE-FCAM proved better than those of the widely used combination method UVE-SPA. Overall, the results demonstrate that UVE-FCAM is a promising alternative method for variable optimization, and that FCAM can serve as an effective variable refinement step after other variable selection methods.
Keywords: Uninformative variables elimination (UVE), Final complexity adapted models (PPRVR-FCAM), Near-infrared spectroscopy (NIRS), Variable selection

1. Introduction
In analytical chemistry, multivariate calibration algorithms are widely used to establish regression models on spectral data. With the development of instrumental technology, the spectra obtained usually consist of hundreds or even thousands of response signals, which are treated as a large number of variables. In most cases, however, only a few informative variables are responsible for the property of interest or even for the whole sample. The remaining noisy or uninformative variables interfere with the prediction performance of the calibration model, or make the calibration process very time-consuming. Therefore, many variable selection methods have been developed in the past decades, such as interval PLS (iPLS) [1], moving window PLS (MWPLS) [2, 3], uninformative variable elimination (UVE) [4-6], the successive projections algorithm (SPA) [7, 8], competitive adaptive reweighted sampling (CARS) [9], genetic algorithm-PLS (GA-PLS) [10, 11], iteratively retaining informative variables (IRIV) [12], the variable iterative space shrinkage approach (VISSA) [13-15], and other integrated variable optimization studies [16-19]. All these methods focus on selecting informative variables or spectral regions according to a specific strategy. Every strategy has its own advantages and, of course, disadvantages, so more and more combinations of different strategies have been developed to achieve complementary advantages [20-25].
As a classic variable selection method, UVE has been used to solve such problems and improve model quality by eliminating irrelevant information and noise from the data matrix. Commonly, a regression model is then developed by PLS with the retained variables. Calibration and prediction models built on the selected characteristic information can outperform those built on the full spectrum, because characteristic wavelengths rather than the raw spectra are used to develop the model. Although UVE eliminates uninformative variables very effectively, it is not designed to find the optimal variable subset with the best prediction performance, and the number of variables it retains is still large. To overcome these drawbacks, several variable selection methods have been used for further refinement after UVE, such as iPLS, GA and SPA [21, 23, 24, 26]. Among these combinations, UVE-SPA, proposed by our group, may be one of the most popular choices [27-32]. Benefiting from the advantages of the two treatments, UVE-SPA selects informative variables with as little multicollinearity as possible. However, the stability of UVE-SPA is unsatisfactory: it tends to select different
variables in different runs, probably because any variation of the input variables can greatly influence the selection results of SPA [7].

Predictive-property-ranked variable reduction with final complexity adapted models, abbreviated as FCAM, is a promising stepwise variable selection method [26, 33, 34]. In this method, variable reduction is implemented iteratively according to the absolute values of the regression coefficients. Although the computational burden of FCAM is usually heavy, especially when there are many variables in the spectra, the stability of FCAM has been shown to be better than that of UVE-iPLS and UVE-GA [26]. Considering the poor stability of traditional combination methods based on UVE, a new combination strategy was proposed in this study that uses FCAM as a variable refining method after UVE. The proposed method was applied to two near infrared spectral datasets for analyzing the protein content of corn and the fipronil content of a pesticide formulation. Meanwhile, UVE-SPA, a commonly used selection method based on UVE, was chosen for performance comparison.
2. Theory and algorithms

2.1. FCAM method

FCAM is a variable selection method based on the regression coefficients of a PLS model. A flowchart and detailed information about FCAM have been reported elsewhere [26]; its main steps can be summarized as follows:
(a) Build a full-spectrum PLS model on the training set with the best model complexity A determined by cross validation. The absolute value of the PLS regression coefficient corresponding to each variable, abbreviated as REG, is used as the variable property for variable reduction. Meanwhile, the root mean squared error of cross validation (RMSECV) of the PLS model is assessed by five-fold cross validation on the training set.
(b) Rank the variables in descending order of the REG values generated in step (a), and remove the variable with the smallest REG.
(c) Build a new PLS model with A latent variables on the retained variables; the RMSECV of the new model is assessed in the same way as in step (a).
(d) Rank the variables in descending order of the REG values generated in step (c), and eliminate the variable with the smallest REG.
(e) Repeat steps (c)-(d) until the number of remaining variables equals A.
(f) For the remaining A variables, continue repeating steps (c)-(d) until only one variable remains, except that the PLS model complexity used in step (c) is decreased gradually so that it always equals the number of remaining variables.
(g) Find the lowest RMSECV value, denoted RMSECVmin, among all RMSECV values. The cutoff value, denoted RMSECVcrit, is calculated according to Equation (1):

\( \mathrm{RMSECV}_{\mathrm{crit}}^{2} = F(\alpha, M, M) \times \mathrm{RMSECV}_{\mathrm{min}}^{2} \)    (1)

where α is the significance level of the one-tailed F-test, and M is the degrees of freedom of both the numerator and the denominator. In this work, α is set to 0.05 and M to the number of training set samples.
(h) Denote the RMSECV value from step (a) as RMSECVFS; if RMSECVcrit > RMSECVFS, RMSECVcrit is set to RMSECVFS. The smallest variable set whose RMSECV is less than or equal to RMSECVcrit is taken as the best variable set selected by FCAM. A minimal code sketch of steps (a)-(h) is given below.
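The stepwise procedure above lends itself to a compact implementation. The following is a minimal, illustrative Python sketch (not the authors' MATLAB code), assuming a scikit-learn PLSRegression model and five-fold cross validation; the function names `rmsecv` and `fcam_select` are hypothetical.

```python
import numpy as np
from scipy.stats import f
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

def rmsecv(X, y, n_lv, folds=5):
    # RMSECV of a PLS model with n_lv latent variables (five-fold CV, as in step (a))
    pred = cross_val_predict(PLSRegression(n_components=n_lv), X, y,
                             cv=KFold(n_splits=folds, shuffle=True, random_state=0))
    return float(np.sqrt(np.mean((y - pred.ravel()) ** 2)))

def fcam_select(X, y, A, alpha=0.05):
    idx = np.arange(X.shape[1])        # indices of currently retained variables
    history = []                       # (retained indices, RMSECV) for every model size
    while True:
        n_lv = min(A, idx.size)        # complexity A; lowered once fewer than A variables remain
        history.append((idx.copy(), rmsecv(X[:, idx], y, n_lv)))
        if idx.size == 1:
            break
        pls = PLSRegression(n_components=n_lv).fit(X[:, idx], y)
        reg = np.abs(pls.coef_).ravel()          # REG = |PLS regression coefficients|
        idx = np.delete(idx, np.argmin(reg))     # steps (b)/(d): drop variable with smallest REG
    # steps (g)-(h): F-test based cutoff on the RMSECV curve
    M = X.shape[0]
    rmsecv_fs = history[0][1]                    # full-spectrum RMSECV from step (a)
    rmsecv_min = min(r for _, r in history)
    rmsecv_crit = min(np.sqrt(f.ppf(1 - alpha, M, M)) * rmsecv_min, rmsecv_fs)
    # smallest variable set whose RMSECV does not exceed the cutoff
    return min((ids for ids, r in history if r <= rmsecv_crit), key=len)
```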
2.2. UVE method

UVE is a variable selection method based on the stability values of a PLS model and has been described in many reports [4-6]. Therefore, only two key points of UVE are summarized here:
(a) Stability value. The stability value c is obtained from hundreds of repeated Monte Carlo calculations and can be regarded as a measure of how consistently a variable contributes to the model. It is calculated by Monte Carlo cross validation, which has been shown to be more effective than leave-one-out cross validation [5, 6, 35]. In this work, Monte Carlo cross validation was repeated 500 times, and 80% of the training samples were selected randomly to construct the PLS model in each repetition.
(b) Cutoff threshold. The maximum absolute value of c among the artificial random variables is used as the cutoff threshold; variables whose absolute c values fall below this threshold are regarded as uninformative and eliminated. A hedged sketch of this procedure is given below.
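For illustration, the following Python sketch implements the two points above under the common assumption that the stability value of each variable is the mean of its PLS regression coefficients over the Monte Carlo runs divided by their standard deviation; the function name `uve_select` and the noise amplitude are illustrative choices, not the authors' code.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def uve_select(X, y, n_lv, n_runs=500, frac=0.8, seed=0):
    """Sketch of UVE: append artificial random variables, repeat PLS on random 80%
    subsets of the training set, and keep real variables whose stability |c| exceeds
    the largest |c| found among the artificial variables."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    noise = rng.uniform(size=(n, p)) * 1e-10           # artificial random variables
    Xa = np.hstack([X, noise])
    coefs = np.empty((n_runs, 2 * p))
    for i in range(n_runs):
        sub = rng.choice(n, size=int(frac * n), replace=False)  # 80% of training samples
        pls = PLSRegression(n_components=n_lv).fit(Xa[sub], y[sub])
        coefs[i] = pls.coef_.ravel()                    # regression coefficients b of this run
    c = coefs.mean(axis=0) / coefs.std(axis=0)          # stability value c = mean(b) / std(b)
    cutoff = np.abs(c[p:]).max()                        # max |c| among artificial variables
    return np.where(np.abs(c[:p]) > cutoff)[0]          # indices of retained variables
```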
2.3. UVE-FCAM method

UVE-FCAM is a combined variable selection method in which uninformative variables are first eliminated by UVE, and the informative variables retained by UVE are then further refined by FCAM. In terms of variable selection, the advantages of UVE-FCAM can be summarized in three aspects (a short usage sketch follows this list):
(a) UVE-FCAM selects far fewer variables than UVE alone with comparable prediction performance, because FCAM retains as few informative variables as possible without loss of prediction ability.
(b) Both the selection performance and the computational burden of FCAM are improved significantly in the combined method, since most uninformative variables have already been eliminated by UVE.
(c) Owing to the good stability of FCAM, the stability of UVE-FCAM is better than that of UVE-SPA.
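Assuming the hypothetical `uve_select` and `fcam_select` functions sketched in Sections 2.1 and 2.2, the combined UVE-FCAM selection reduces to their composition on the calibration data; variable names such as `X_cal`, `y_cal` and the number of latent variables are illustrative.

```python
# UVE first eliminates uninformative variables, FCAM then refines the survivors
kept = uve_select(X_cal, y_cal, n_lv=12)                   # column indices kept by UVE
refined = kept[fcam_select(X_cal[:, kept], y_cal, A=12)]   # original-column indices after FCAM
```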
3. Data collection and treatment

3.1. Corn

This dataset consists of the spectra of 80 corn samples measured on three different NIR spectrometers. Each NIR spectrum consists of 700 variables ranging from 1100 to 2498 nm with an interval of 2 nm. In this study, the spectra obtained by the instrument denoted "m5" were used, and the protein content was taken as the property of interest. In addition, the dataset was divided into a calibration set (50 samples) and an independent test set (30 samples) by the Kennard-Stone (KS) algorithm [36].
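As a point of reference, the KS selection can be sketched as follows in Python; this is a generic, illustrative implementation (the function name `kennard_stone` is an assumption), not the code actually used in this study.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Pick n_select calibration samples: start from the two most distant samples,
    then repeatedly add the sample farthest from its nearest already-selected sample."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_select:
        remaining = [k for k in range(X.shape[0]) if k not in selected]
        selected.append(max(remaining, key=lambda k: d[k, selected].min()))
    return np.array(selected)

# e.g. cal_idx = kennard_stone(spectra, 50); the remaining samples form the test set
```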
3.2. Pesticide
The pesticide dataset consists of 90 pesticide formulation samples. According to our previous study [37], a 10% acetamiprid reagent spiked with different amounts of fipronil was used to prepare 90 fipronil-fortified samples, each with a total mass of approximately 10 g. The mass percentage of fipronil ranged from 0.1% to 4.5%; 35 samples had concentrations between 0.1% and 1.0%, and 55 samples between 1.0% and 4.5%. NIR spectra were obtained on an FT-NIR spectrometer (Spectrum ONE NTS, PerkinElmer, USA) over the wavenumber range of 4000-10000 cm-1 at 4 cm-1 resolution. Each spectrum was the average of 32 scans. Spectra of fipronil technical (>99% purity, w/w) were obtained with the integrating sphere and sampling rotator accessories under the same spectral acquisition conditions. The dataset was divided randomly into a calibration set (50 samples) and an independent test set (40 samples).

3.3. Calculation

All algorithms were implemented in MATLAB (Version 2012a, The MathWorks, Natick, MA, USA). Code for the SPA algorithm was obtained from www.ele.ita.br/kawakami/spa/. The KS,
UVE and FCAM algorithms were programmed independently. To compare the intrinsic behavior of each algorithm, all calculations were based on the original spectra without any pre-treatment. PLS regression was used as the multivariate calibration method. For each PLS model, the optimal number of latent variables (LVs) was chosen by five-fold cross validation, with the maximum number of latent variables set to 15 for both datasets. The root mean square errors of cross validation (RMSECV), calibration (RMSEC) and prediction (RMSEP) were used for model evaluation. In addition, each variable selection method was repeated 100 times to evaluate its stability.
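To make the evaluation protocol concrete, the following Python sketch (an assumed translation of the MATLAB workflow, with hypothetical names such as `evaluate_pls`) optimizes the number of latent variables by five-fold cross validation and reports RMSECV, RMSEC and RMSEP.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

def evaluate_pls(X_cal, y_cal, X_test, y_test, max_lv=15):
    rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    # RMSECV for every candidate number of latent variables
    rmsecv = []
    for lv in range(1, min(max_lv, X_cal.shape[1]) + 1):
        y_cv = cross_val_predict(PLSRegression(n_components=lv), X_cal, y_cal, cv=cv)
        rmsecv.append(rmse(y_cal, y_cv.ravel()))
    best_lv = int(np.argmin(rmsecv)) + 1
    pls = PLSRegression(n_components=best_lv).fit(X_cal, y_cal)
    rmsec = rmse(y_cal, pls.predict(X_cal).ravel())     # calibration error
    rmsep = rmse(y_test, pls.predict(X_test).ravel())   # prediction error on the test set
    return best_lv, min(rmsecv), rmsec, rmsep
```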
4. Results and discussion

4.1. Corn data

Results of the different methods on the corn dataset are presented in Table 1. The prediction performance of all four variable selection methods was better than that of the full spectrum, demonstrating that the quantitative performance of NIR spectral analysis can indeed be improved by proper variable selection. The prediction performance of the four variable selection methods ranked as UVE-FCAM > UVE > UVE-SPA > FCAM. All UVE-based methods performed better than FCAM alone, which conversely indicated that the performance of FCAM could be further improved by removing more uninformative variables from the full spectrum. More importantly, the computational burden of FCAM was reduced significantly by UVE, since UVE retained only about 86 of the 700 variables in the full spectrum. Compared with the 86 variables retained by UVE, UVE-FCAM selected only 22 variables with even slightly better prediction performance, indicating that FCAM can be an effective variable refining method after UVE. In contrast, although UVE-SPA selected even fewer variables than UVE-FCAM, its prediction was inferior to that of UVE, because certain informative variables may be eliminated together with the collinear variables removed by SPA.

The mean spectrum of the corn samples and the accumulated selection frequencies of the variables retained by the four variable selection methods are illustrated in Fig. 1a-e, respectively. The spectral regions of 1700-1800 nm and 2100-2200 nm, marked by grey shading, are assigned to the first overtone of C-H stretching and to various N-H combination vibrations of the protein structure, respectively, and have been identified as informative regions in Fu's report [38]. Most variables selected by the four methods were concentrated in these two informative regions, which explains why all four variable selection methods outperformed no selection. Although all four methods also selected uninformative variables outside the informative regions, the selection frequencies of uninformative variables for UVE-FCAM were clearly much lower than those for UVE, FCAM and UVE-SPA. In particular, the distribution of the variables selected by UVE-SPA was more scattered than that of UVE-FCAM even though fewer variables were selected, reflecting the better stability of UVE-FCAM relative to UVE-SPA. This is because some informative variables within the two informative regions may be eliminated by SPA together with collinear variables, as the chance of collinearity between adjacent variables within a specific waveband is usually high, especially for NIR data [15].

4.2. Pesticide system

Each pesticide sample spectrum consisted of 6001 variables. After 100 repeated runs of
each variable selection method, the results are listed in Table 2. Although the average number of variables retained by UVE-FCAM was far smaller than that of UVE, its predictive performance was still comparable with that of UVE, again validating FCAM as a good approach for refining the variables retained by UVE. Moreover, the finding that the computational burden of FCAM can be reduced significantly by UVE was confirmed again in this system, as the average number of variables to be refined by FCAM was greatly reduced from 6001 to about 1300. However, both FCAM alone and UVE-SPA performed poorly, probably because some important variables were eliminated, as both retained no more than 20 variables.
The mean spectra of the pesticide formulation and of fipronil technical are shown in Fig. 2a and 2b, respectively. Four major characteristic absorption peaks of fipronil technical are marked in grey shading. Among them, the peaks in the 4400-4600 cm-1 region are assigned to C-H stretching combination bands, the peaks near 5000 cm-1 correspond to N-H stretching combination bands, the weak peak near 6000 cm-1 is related to the first overtone of aromatic C-H stretching, and the peaks in the range of 6500-6800 cm-1 are attributed to the first overtone of N-H stretching. Because the content of fipronil technical in the pesticide formulation ranged from 0.1% to 5% (w/w), its characteristic peaks can hardly be identified in the spectrum of the pesticide formulation; nevertheless, these four characteristic peaks were still considered informative regions for establishing a calibration model of fipronil content. The variables obtained by UVE, FCAM, UVE-FCAM and UVE-SPA are presented in Fig. 2c-f, respectively. From Fig. 2c and Fig. 2e, both UVE and UVE-FCAM retained variables from all the informative regions mentioned above, but UVE selected many more uninformative variables outside these four regions than UVE-FCAM. In contrast, FCAM and UVE-SPA selected too few variables, drawn from only two or three informative regions, which clearly led to their poor prediction performance.

It is worth noting that, although FCAM and UVE-SPA each retained fewer than 20 variables, their distributions were quite different, as shown in Fig. 2d and Fig. 2f, respectively. The distribution of the variables retained by UVE-SPA was more dispersed, and their selection frequencies were all below 50 out of 100 repeats. This may result from SPA being designed to select variables with minimum collinearity. In contrast, for FCAM the selection frequency of most informative variables exceeded 80 out of 100 repeats, indicating that the stability of FCAM was superior to that of UVE-SPA. Owing to the good stability of FCAM, the selection frequencies of most informative variables obtained by UVE-FCAM exceeded 60 out of 100 repeats, illustrating that the stability of UVE-FCAM was also better than that of UVE-SPA. For example, 5026 cm-1 is an informative variable, yet its selection frequency with UVE-SPA was only 6 out of 100 repeats. Clearly, the stability of UVE-SPA was unsatisfactory compared with the other methods.
5. Conclusion

The UVE-FCAM method was proposed for spectral variable selection in this study, and its performance was evaluated on two NIR datasets. The results show that UVE-FCAM selects far fewer variables than UVE alone with comparable or even slightly better prediction performance on both datasets, indicating that FCAM is a useful variable refining method after UVE. Meanwhile, since most uninformative variables have already been eliminated by UVE, the computational burden of FCAM is reduced significantly compared with FCAM alone. In addition, the results show that UVE-FCAM outperforms UVE-SPA in terms of both prediction performance and stability. Therefore, UVE-FCAM is a good alternative method for selecting a more refined set of variables, and FCAM has the potential to be used as an effective refinement step after other variable selection methods without loss of prediction performance.
Declaration of Interest Statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
The authors gratefully acknowledge the financial support provided by the Open Fund of the State Key Laboratory of Water Resource Protection and Utilization in Coal Mining (Grant No. SHJT-17-42.17) and the Fundamental Research Funds for the Central Universities of China (No. 3142017100).
References:
[1] M.G. Nespeca, W.D. Pavini, J.E. Oliveira, Multivariate filters combined with interval partial least square method: A strategy for optimizing PLS models developed with near infrared data of multicomponent solutions. Vib. Spectrosc. 102 (2019) 97-102.
[2] J.H. Jiang, R.J. Berry, H.W. Siesler, Y. Ozaki, Wavelength interval selection in multicomponent spectral analysis by moving window partial least-squares regression with applications to mid-infrared and near-infrared spectroscopic data. Anal. Chem. 74 (2002) 3555-3565.
[3] J.M. Chen, Z.W. Yin, Y. Tang, T. Pan, Vis-NIR spectroscopy with moving-window PLS method applied to rapid analysis of whole blood viscosity. Anal. Bioanal. Chem. 409 (2017) 2737-2745.
[4] V. Centner, D.L. Massart, O.E. Noord, et al., Elimination of uninformative variables for multivariate calibration. Anal. Chem. 68 (1996) 3851-3858.
[5] Q.J. Han, H.L. Wu, C.B. Cai, L. Xu, R.Q. Yu, An ensemble of Monte Carlo uninformative variable elimination for wavelength selection. Anal. Chim. Acta 612 (2008) 121-125.
[6] J.T. Rocha, L. Oliveira, J.C. Dias, et al., Sulfur determination in Brazilian petroleum fractions by mid-infrared and near-infrared spectroscopy and partial least squares associated with variable selection methods. Energ. Fuel 30 (2016) 698-705.
[7] A.A. Gomes, R.H. Galvão, M.U. Araújo, et al., The successive projections algorithm for interval selection in PLS. Microchem. J. 110 (2013) 202-208.
[8] T. Mizutani, M. Tanaka, Efficient preconditioning for noisy separable nonnegative matrix factorization problems by successive projection based low-rank approximations. Mach. Learn. 107 (2018) 643-673.
[9] K.Y. Zheng, Q.Q. Li, J.J. Wang, et al., Stability competitive adaptive reweighted sampling (SCARS) and its applications to multivariate calibration of NIR spectra. Chemometr. Intell. Lab 112 (2012) 48-54.
[10] M.W. Assis, D.O. Fusco, R.C. Costa, et al., PLS, iPLS, GA-PLS models for soluble solids content, pH and acidity determination in intact dovyalis fruit using near-infrared spectroscopy. J. Sci. Food. Agr. 98 (2018) 5750-5755.
[11] C.S. Miaw, C. Assis, A.R. Silva, et al., Determination of main fruits in adulterated nectars by ATR-FTIR spectroscopy combined with multivariate calibration and variable selection methods. Food Chem. 254 (2018) 272-280.
[12] Y.H. Yun, W.T. Wang, M.L. Tan, et al., A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. Anal. Chim. Acta 807 (2014) 36-43.
[13] B.C. Deng, P. Ma, C.C. Lin, et al., A new method for wavelength interval selection that intelligently optimizes the locations, widths and combinations of the intervals. Analyst 140 (2015) 1876-1885.
[14] B.C. Deng, Y.H. Yun, D.S. Cao, et al., A bootstrapping soft shrinkage approach for variable selection in chemical modeling. Anal. Chim. Acta 908 (2016) 63-74.
[15] Y.H. Yun, W.T. Wang, B.C. Deng, et al., Using variable combination population analysis for variable selection in multivariate calibration. Anal. Chim. Acta 862 (2015) 14-23.
[16] C.M. Andersen, R. Bro, Variable selection in regression-a tutorial. J. Chemometrics 24 (2010) 728-737.
[17] X.B. Zou, J.W. Zhao, M.J. Povey, M. Holmes, H.P. Mao, Variables selection methods in near-infrared spectroscopy. Anal. Chim. Acta 667 (2010) 14-32.
[18] T. Mehmood, K.H. Liland, L. Snipen, S. Sabo, A review of variable selection methods in partial least squares regression. Chemometr. Intell. Lab 118 (2012) 62-69.
[19] L.C. Paula, A.S. Soares, T.W. Soares, Epistasis-based FSA: Two versions of a novel approach for variable selection in multivariate calibration. Eng. Appl. Artif. Intel. 81 (2019) 213-222.
[20] S.F. Ye, D. Wang, S. Min, Successive projections algorithm combined with uninformative variable elimination for spectral variable selection. Chemometr. Intell. Lab 91 (2008) 194-199.
[21] J. Li, C. Zhao, W. Huang, C. Zhang, Y. Peng, A combination algorithm for variable selection to determine soluble solid content and firmness of pears. Anal. Methods 6 (2014) 2170-2180.
[22] G. Tang, Y. Huang, K. Tian, et al., A new spectral variable selection pattern using competitive adaptive reweighted sampling combined with successive projections algorithm. Analyst 139 (2014) 4894-4902.
[23] Q. Ouyang, J. Zhao, Q. Chen, Measurement of non-sugar solids content in Chinese rice wine using near infrared spectroscopy combined with an efficient characteristic variables selection algorithm. Spectrochim. Acta A 151 (2015) 280-285.
[24] M. Barycki, A. Sosnowska, K. Jagiello, T. Puzyn, Multi-objective genetic algorithm (MOGA) as a feature selecting strategy in the development of ionic liquids' quantitative toxicity-toxicity relationship models. J. Chem. Inf. Model. 58 (2018) 2467-2476.
[25] A.S. Saad, A.M. AlAlamein, M.M. Galal, Traditional versus advanced chemometric models for the impurity profiling of paracetamol and chlorzoxazone: Application to pure and pharmaceutical dosage forms. Spectrochim. Acta A 205 (2018) 376-380.
[26] G. Tang, X. Song, J. Hu, H. Yan, K. Qiu, Characterization of a pesticide formulation by medium wave near-infrared spectroscopy with uninformative variable elimination and successive projections algorithm. Anal. Lett. 47 (2014) 2570-2579.
[27] J. Elder, The apparent paradox of complexity in ensemble modeling. Handbook Stat Anal Data Min Appl (Second Edit), (2018) 705-718.
[28] R.M. Balabin, S.V. Smirnov, Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data. Anal. Chim. Acta 692 (2011) 63-72.
[29] H. Yang, B. Kuang, A.M. Mouazen, Quantitative analysis of soil nitrogen and carbon at a farm scale using visible and near infrared spectroscopy coupled with wavelength reduction. Eur. J. Soil. Sci 63 (2011) 410-420.
[30] N. Omidikia, M. Kompany-Zareh, Uninformative variable elimination assisted by Gram-Schmidt orthogonalization/successive projection algorithm for descriptor selection in QSAR. Chemometr. Intell. Lab 128 (2013) 56-65.
[31] Z. Li, J. Wang, Y. Xiong, Z. Li, S. Feng, The determination of the fatty acid content of sea buckthorn seed oil using near infrared spectroscopy and variable selection methods for multivariate calibration. Vib. Spectrosc. 84 (2016) 24-29.
[32] E. Giese, O. Winkelmann, S. Rohn, J. Fritsche, Determining quality parameters of fish oils by means of 1H nuclear magnetic resonance, mid-infrared, and near-infrared spectroscopy in combination with multivariate statistics. Food Res. Int. 106 (2018) 116-128.
[33] J.P. Andries, Y.V. Heyden, L.M. Buydens, Predictive-property-ranked variable reduction with final complexity adapted models in partial least squares modeling for multiple responses. Anal. Chem. 85 (2013) 5444-5453.
[34] J.P. Andries, Y.V. Heyden, L.M. Buydens, Predictive-property-ranked variable reduction in partial least squares modelling with final complexity adapted models: Comparison of properties for ranking. Anal. Chim. Acta 760 (2013) 34-45.
[35] Q.S. Xu, Y.Z. Liang, Monte Carlo cross validation. Chemometr. Intell. Lab 56 (2001) 1-11.
[36] R.W. Kennard, L.A. Stone, Computer aided design of experiments. Technometrics 11 (1969) 137-148.
[37] K. Qiu, X. Song, G. Tang, S.G. Min, Determination of fipronil in acetamiprid formulation by attenuated total reflectance-mid-infrared spectroscopy combined with partial least squares regression. Anal. Lett. 46 (2013) 2388-2399.
[38] G.H. Fu, Q.S. Xu, H.D. Li, D.S. Cao, Y.Z. Liang, Elastic net grouping variable selection combined with partial least squares regression (EN-PLSR) for the analysis of strongly multi-collinear spectroscopic data. Appl. Spectrosc. 6 (2011) 402-408.
Figure captions:

Figure 1 Spectra of corn (a) and selected frequencies of variables obtained by different methods on corn samples within 100 repeats: (b) UVE; (c) FCAM; (d) UVE-FCAM and (e) UVE-SPA.

Figure 2 Spectra of pesticide formulation (a), fipronil technical (b) and selected frequencies of variables obtained by different methods on pesticide samples within 100 repeats: (c) UVE; (d) FCAM; (e) UVE-FCAM and (f) UVE-SPA.
Tables

Table 1 Results of different methods on the corn dataset.

| Methods | nVAR a | nLVs b | RMSECV | RMSEC | RMSEP |
|---|---|---|---|---|---|
| PLS | 700 | 14 | 0.114 | 0.062 | 0.100 |
| UVE-PLS | 86.2±15.7 c | 11.8±0.6 | 0.036±0.010 | 0.018±0.007 | 0.032±0.010 |
| FCAM-PLS | 25.8±17.1 | 8.6±2.1 | 0.060±0.014 | 0.046±0.012 | 0.071±0.010 |
| UVE-FCAM-PLS | 22.0±8.3 | 9.8±1.7 | 0.019±0.010 | 0.013±0.009 | 0.031±0.015 |
| UVE-SPA-PLS | 17.0±3.2 | 11.0±1.3 | 0.034±0.009 | 0.022±0.006 | 0.046±0.011 |

a Number of variables; b Number of latent variables; c Statistical results in the form mean value ± standard deviation from 100 repeats.
Table 2 Results of different methods on the pesticide data set.

| Methods | nVAR a | nLVs b | RMSECV | RMSEC | RMSEP |
|---|---|---|---|---|---|
| PLS | 6001 | 14 | 0.098 | 0.032 | 0.103 |
| UVE-PLS | 1287.3±181.2 c | 14.0±0.9 | 0.072±0.003 | 0.034±0.006 | 0.091±0.006 |
| FCAM-PLS | 19.1±4.4 | 9.2±1.0 | 0.071±0.009 | 0.059±0.013 | 0.140±0.022 |
| UVE-FCAM-PLS | 79.3±29.7 | 11.9±0.5 | 0.045±0.007 | 0.029±0.005 | 0.090±0.009 |
| UVE-SPA-PLS | 14.0±3.2 | 12.4±2.0 | 0.068±0.009 | 0.049±0.008 | 0.126±0.026 |

a Number of variables; b Number of latent variables; c Statistical results in the form mean value ± standard deviation from 100 repeats.