Prediction of fatty acid composition in camellia oil by 1H NMR combined with PLS regression


Accepted Manuscript

PII: S0308-8146(18)32120-4
DOI: https://doi.org/10.1016/j.foodchem.2018.12.025
Reference: FOCH 23984

To appear in: Food Chemistry

Received Date: 25 September 2018
Revised Date: 26 November 2018
Accepted Date: 6 December 2018

Please cite this article as: Zhu, M., Shi, T., Chen, Y., Luo, S., Leng, T., Wang, Y., Guo, C., Xie, M., Prediction of fatty acid composition in camellia oil by 1H NMR combined with PLS regression, Food Chemistry (2018), doi: https://doi.org/10.1016/j.foodchem.2018.12.025

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Running title: Prediction of fatty acid composition in camellia oil by NMR and PLS

MengTing Zhuᵃ, Ting Shiᵃ, Yi Chenᵃ,*, Shuhan Luoᵃ, Tuo Lengᵃ, YangLing Wangᵃ, Cong Guo, Mingyong Xieᵃ

ᵃ State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang 330047, People’s Republic of China

*To whom correspondence should be addressed. Professor Yi Chen, PhD; E-mail address: [email protected] Tel.: 0791-88304449. Fax: 0791-88304449.


Abstract: A rapid method for determining the fatty acid (FA) composition of camellia oils was developed based on 1H NMR spectroscopy combined with partial least squares (PLS) regression. Outlier detection, optimization of the number of latent variables (LVs) and selection of the data pretreatment method were explored during model building. The optimal models for predicting the contents of C18:1, C18:2, C18:3 and of saturated, unsaturated, monounsaturated and polyunsaturated FAs were obtained with Pareto scaling (Par) pretreatment, with coefficients of determination (R2) above 0.99 and root mean square errors of estimation and prediction (RMSEE, RMSEP) lower than 0.954 and 0.947, respectively. Mean centering (Ctr) was more suitable for the C16:0 and C18:0 models, giving the best performance indicators (R2 ≥ 0.945, RMSEE ≤ 0.377, RMSEP ≤ 0.212). This study indicates that 1H NMR has the potential to be applied as a rapid, routine method for analyzing the FA composition of camellia oils.

Keywords: Camellia oil, Fatty acid composition, PLS, 1H NMR

1. Introduction

As one of the main species in the genus Camellia, Camellia oleifera (C. oleifera) belongs to the Theaceae family and is widely cultivated in the southern provinces of China (Zeb, 2012). Camellia oleifera oil (C. oleifera oil), obtained from the seeds of C. oleifera, is one of the most popular edible vegetable oils in China. Because its fatty acid composition is similar to that of olive oil, C. oleifera oil is also known as “Eastern Olive Oil” (Wang, Lee, Wang, & He, 2006). C. oleifera oil is rich in unsaturated fatty acids (UFAs), including palmitoleic, oleic, linoleic, linolenic, eicosenoic and docosenoic acids, which account for approximately 90% of total fatty acids. In particular, oleic acid accounts for 74–87% of the total fatty acids, which is almost the highest among reported edible oils (Su, Shih, & Lin, 2014). C. oleifera oil is also rich in other fat-soluble natural compounds with health benefits, such as vitamin E, sterols, squalene and flavonoids; its vitamin E level is about twice that of olive oil (He, Zhou, Zhang, & Liu, 2011). C. oleifera oil has been reported to have many health-promoting effects, including reducing blood cholesterol and triglycerides and lowering blood pressure, and it has traditionally been used in China against cardiovascular diseases, arteriosclerosis and burn injuries (Kim, Hui, Kim, Lim, Cho, Choi, et al., 2014; Yuan, Wang, Chen, Zhou, & Ye, 2013). These properties mean that C. oleifera oil sells at a much higher price than other vegetable oils, and adulterated and inferior C. oleifera oil is frequently found on the market as sellers seek higher profits. We recently reported a rapid method based on proton nuclear magnetic resonance (1H NMR) spectroscopy and chemometrics for detecting adulteration of camellia oil (Shi, Zhu, Chen, Yan, Chen, Wu, et al., 2018). The results indicated that fatty acid composition can be used as an indicator to assess the purity and adulteration level of C. oleifera oil. Similarly, free fatty acid content has been reported as a key quality index for classifying olive oil (Muik, Lendl, Molina-Díaz, & Ayora-Cañada, 2003). Moreover, Kamal-Eldin and Andersson (1997) reported a correlation between fatty acid composition and tocopherol content in vegetable oils. As a key factor in evaluating the quality of vegetable oils, fatty acid composition is therefore a required test item in the quality standards for vegetable oils.
However, the conventional methods specified in quality standards for fatty acid determination, such as gas chromatography (GC) (Wang, Zeng, Verardo, & Contreras, 2017) and gas chromatography–mass spectrometry (GC–MS) (Li, Kong, Shi, & Shen, 2016), are tedious and time-consuming, require fatty acid standards and involve complicated sample pretreatment. Rapid, nondestructive methods that combine spectroscopic techniques (including near-infrared, infrared and Raman spectroscopy and nuclear magnetic resonance) with chemometrics are now recognized as alternative analytical tools for oil quality control. Among these, nuclear magnetic resonance (NMR) spectroscopy has attracted the most interest because of its many advantages, such as minimal sample preparation, rapid data acquisition and the ability to provide a complete view of chemical composition with both qualitative and quantitative information (Mckenzie, Donarski, Wilson, & Charlton, 2011). Consequently, many studies have applied NMR and chemometrics to the quality assessment and authentication of vegetable oils. Kritioti, Menexes, and Drouza (2018) used 1H NMR together with principal component analysis (PCA) and hierarchical cluster analysis to classify virgin olive oils by geographical region and variety. Popescu, Costinel, Dinca, Marinescu, Stefanescu, and Ionete (2015) also developed an NMR-based PCA model to discriminate between different vegetable oils. Wu and He (2014) developed a quantification model based on 1H NMR and partial least squares (PLS) regression to determine docosahexaenoic acid and eicosapentaenoic acid in algal oil. Fang, Jing, Tay, Lau, and Li (2013) built PLS models with 1H NMR and GC–MS data to quantify animal fats in canola oil. In our previous study, NMR was successfully applied, with the help of orthogonal projections to latent structures–discriminant analysis (OPLS–DA) and PLS, for the rapid prediction of adulteration levels in camellia oils (Shi et al., 2018). These studies confirm that NMR is a powerful tool for both qualitative and quantitative analysis. To the best of our knowledge, however, no NMR–PLS approach has been reported for determining the fatty acid composition of camellia oils.
Thus, in this study, the feasibility of determining the fatty acid composition of camellia oils with a PLS model based on 1H NMR data was investigated as an alternative to chromatographic analysis. To improve the predictive ability of the PLS model, three pretreatment methods, mean centering (Ctr), unit variance scaling (UV) and Pareto scaling (Par), were applied and compared during model development.

2. Materials and methods

2.1. Samples

Sixty-six camellia oil samples, including 49 pure camellia oils and 17 camellia oil blends, were collected from different areas of Jiangxi Province between January and May 2016. All samples were stored at −20 °C until analysis.

2.2. Chemicals and reagents

Reference standard GLC-463 and the internal standards triheneicosanoin (C21:0 TAG) and methyl heneicosanoate (C21:0 FAME) were purchased from Nu-Chek Prep, Inc. (Elysian, MN). C18:2 and C18:3 FAME isomer mixtures were purchased from Supelco Inc. (Bellefonte, PA). HPLC-grade n-heptane and chloroform were purchased from Fisher Scientific (Fair Lawn, NJ), and HPLC-grade methanol from Merck (Darmstadt, Germany). Guaranteed-grade potassium hydroxide and deuterated chloroform (CDCl3, 99.8% D) containing tetramethylsilane (TMS, 0.3% v/v) were purchased from Aladdin (Shanghai, China).

2.3. Fatty acid composition analysis

Fatty acid composition was analyzed by GC with a flame ionization detector (FID), following a previously reported method (Chen, Yang, Nie, Yang, Wang, Yang, et al., 2014).

2.4. 1H NMR spectroscopy acquisition

1H NMR samples were prepared by thoroughly mixing 200 μL of oil with 800 μL of CDCl3 containing tetramethylsilane (TMS, 0.3% v/v) for 10 s. Then 600 μL of the mixture was transferred to a 5-mm NMR tube for 1H NMR analysis. Spectra were acquired at 298 K on a Bruker Avance 600 MHz spectrometer (Bruker, Karlsruhe, Germany) equipped with a cryoprobe incorporating a z-axis gradient. Following our published work (Shi et al., 2018), the acquisition parameters were: 32 scans and 4 dummy scans per free induction decay, 32 K time-domain points, a spectral width of 13.0 ppm, a 90° pulse width of 6.5 μs, an acquisition time of 3.0 s and a relaxation delay of 1.0 s.
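To illustrate why this acquisition step is rapid, the nominal pulse-sequence time per sample can be estimated from the parameters above. This is a rough sketch under simplifying assumptions: each scan, dummy or recorded, is taken to last the acquisition time plus the relaxation delay, while the 6.5 μs pulse and instrument overhead (locking, shimming, sample changing) are ignored. The function name is hypothetical.

```python
# Rough per-sample 1H NMR acquisition time from the parameters in Section 2.4.
# Assumption: one scan (dummy or recorded) lasts acquisition time + relaxation
# delay; the 6.5 us pulse and instrument overhead are ignored.

def experiment_time_s(scans: int, dummy_scans: int, aq_s: float, d1_s: float) -> float:
    """Approximate pulse-sequence duration in seconds."""
    return (scans + dummy_scans) * (aq_s + d1_s)


t = experiment_time_s(scans=32, dummy_scans=4, aq_s=3.0, d1_s=1.0)
print(f"{t:.0f} s (about {t / 60:.1f} min) per sample")  # 144 s (about 2.4 min) per sample
```

Under these assumptions, the spectrum of one sample takes only a few minutes to acquire, compared with the derivatization and chromatographic run needed for GC.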

2.5. 1H NMR spectra pre-processing

The raw NMR data were imported into MestReNova (Mestrelab Research, Santiago de Compostela, Spain). Chemical shifts were referenced to the TMS signal (0 ppm). Phase and baseline corrections were performed automatically to allow better quantitative comparison of the spectra. For statistical analysis, the spectra were integrated in MestReNova over the region from 10 to 0.5 ppm, excluding the residual solvent signal (7.60–6.90 ppm). The integration width of each signal must be chosen carefully to avoid including side bands, which would reduce the precision of the result. Following our published work, 16 spectral signals (Fig. 1) were integrated and normalized for subsequent data processing.
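The integration and normalization step can be sketched in NumPy as follows. This is an illustration on a synthetic spectrum, not the MestReNova procedure. The two integration windows are hypothetical stand-ins for the 16 regions of Fig. 1, and normalizing each integral to the sum of all selected integrals is an assumption, since the normalization scheme is not specified above.

```python
import numpy as np


def integrate_regions(ppm, intensity, regions):
    """Integrate a 1D spectrum over each (high_ppm, low_ppm) region.

    Rectangle rule: sum of the points inside the region multiplied by the
    (uniform) point spacing in ppm.
    """
    step = abs(ppm[1] - ppm[0])
    areas = []
    for high, low in regions:
        inside = (ppm <= high) & (ppm >= low)
        areas.append(intensity[inside].sum() * step)
    return np.array(areas)


def normalize_to_total(areas):
    """Scale the integrals so that they sum to 1 (assumed normalization)."""
    return areas / areas.sum()


def lorentzian(x, center, height, half_width):
    return height / (1.0 + ((x - center) / half_width) ** 2)


# Synthetic spectrum on a 10 to 0.5 ppm axis: a peak at 5.30 ppm and a peak
# three times as tall at 0.88 ppm (illustrative only).
ppm = np.linspace(10.0, 0.5, 9501)
spectrum = lorentzian(ppm, 5.30, 1.0, 0.01) + lorentzian(ppm, 0.88, 3.0, 0.01)

# Hypothetical integration windows (high, low), symmetric around each peak.
regions = [(5.45, 5.15), (1.03, 0.73)]
fractions = normalize_to_total(integrate_regions(ppm, spectrum, regions))
print(np.round(fractions, 3))
```

Because the two peaks have the same width, the normalized integrals come out close to 1:3. This is the kind of relative signal information that the PLS models use as X variables.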

2.6. Chemometric analysis

Prediction models were developed with the partial least squares (PLS) regression algorithm by relating the 1H NMR spectra to the reference measurements. PLS regression extracts latent components from the X-block and Y-block simultaneously and then builds a multivariate linear model (Li, Zhang, Yao, & Jiang, 2017). The regression models were built from an X matrix containing the intensities of the 15 selected 1H NMR resonances and a Y vector containing the fatty acid concentrations determined by GC (palmitic, stearic, oleic, linoleic and linolenic acids, as well as saturated fatty acids (SFA), unsaturated fatty acids (USFA), monounsaturated fatty acids (MUFA) and polyunsaturated fatty acids (PUFA)). Three data pretreatment methods, mean centering (Ctr), auto-scaling (unit variance, UV) and Pareto scaling (Par), were tested on the data matrix before PLS model construction. Data pretreatment and PLS regression were performed in SIMCA-P+ version 13 (Umetrics, Malmö, Sweden). The 66 camellia oil samples were randomly assigned to a calibration set comprising 2/3 of the samples (n = 44) and a validation set containing the remaining 1/3 (n = 22). Internal 7-fold cross-validation was applied to the calibration set during model development. Model quality was summarized by R2Y(cum) and Q2(cum). R2Y(cum), the explained variation in Y, evaluates goodness of fit, while Q2(cum), the variation predicted in cross-validation, indicates the predictive ability of the model (Bjarnestad & Dahlman, 2002). The performance of the PLS models was further assessed by the coefficient of determination (R2), root mean square error of cross-validation (RMSECV), root mean square error of estimation (RMSEE) and root mean square error of prediction (RMSEP). RMSECV was obtained by summarizing the cross-validation residuals of the observations in the calibration set and was used to select the number of latent variables (LVs) in the PLS models. In general, a good model should have a high R2 and low RMSEE and RMSEP values (Chen, Zhu, Xie, Nie, Liu, Li, et al., 2008). A response permutation test was used to check the validity of the PLS models and their degree of over-fitting.
In this test, X is kept fixed while the elements of the Y vector are randomly permuted several times. Each time, a new PLS model with the same number of LVs is fitted to X and the permuted Y, and the cross-validation described above is repeated. R2Y and Q2 are then calculated for each derived model. For robust models, the intercepts of the R2Y and Q2 regression lines should be less than 0.4 and 0.05, respectively (Bjarnestad & Dahlman, 2002). In addition, the CV–ANOVA procedure was applied to check the statistical significance of the PLS models, with significance set at p ≤ 0.05.
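The permutation test can be sketched outside SIMCA as follows. This is a minimal NumPy illustration on synthetic data, not SIMCA's implementation. It uses a compact NIPALS PLS1 with mean centering, interleaved 7-fold cross-validation for Q2, and 100 permutations. Following the usual permutation plot, it regresses R2Y and Q2 on the correlation between original and permuted Y (the unpermuted model sits at correlation 1) and reports the intercepts. Taking the absolute correlation and the interleaved fold assignment are assumptions of this sketch.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centered data; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p, q = Xr.T @ t / tt, yr @ t / tt
        Xr, yr = Xr - np.outer(t, p), yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, np.array(Q))


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def r2y_q2(X, y, n_lv, n_folds=7):
    """R2Y from the full fit and Q2 from interleaved k-fold cross-validation."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2y = 1 - np.sum((y - pls1_predict(pls1_fit(X, y, n_lv), X)) ** 2) / ss_tot
    folds = np.arange(len(y)) % n_folds
    press = 0.0
    for k in range(n_folds):
        test = folds == k
        model = pls1_fit(X[~test], y[~test], n_lv)
        press += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
    return r2y, 1 - press / ss_tot


def permutation_intercepts(X, y, n_lv, n_perm=100, seed=0):
    """Intercepts of the R2Y and Q2 lines vs |corr(original y, permuted y)|."""
    rng = np.random.default_rng(seed)
    r2, q2 = r2y_q2(X, y, n_lv)  # unpermuted model at correlation 1
    corr, r2s, q2s = [1.0], [r2], [q2]
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        corr.append(abs(np.corrcoef(y, y_perm)[0, 1]))
        r2, q2 = r2y_q2(X, y_perm, n_lv)
        r2s.append(r2)
        q2s.append(q2)
    # np.polyfit(x, y, 1) returns [slope, intercept]
    return np.polyfit(corr, r2s, 1)[1], np.polyfit(corr, q2s, 1)[1]


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=60)
r2_int, q2_int = permutation_intercepts(X, y, n_lv=3)
print(f"R2Y intercept {r2_int:.3f} (< 0.4?), Q2 intercept {q2_int:.3f} (< 0.05?)")
```

A model that only fits noise gives permuted R2Y and Q2 values close to those of the real model, which raises both intercepts. This is why low intercepts are taken as evidence against over-fitting.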

2.7. Statistical analysis

All chemical analyses were carried out in triplicate, and values are expressed as mean ± standard deviation (SD). Statistical significance was determined with a two-tailed, two-sample t-test assuming equal variances in SPSS 23.0 (SPSS Inc., Chicago, IL). Differences at p ≤ 0.05 were considered significant.
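For readers without SPSS, the same summary and test can be reproduced with Python's statistics module and SciPy's `scipy.stats.ttest_ind`, whose `equal_var=True` option gives the pooled-variance (Student) t-test and whose default alternative is two-sided. The triplicate values below are made up for illustration and are not data from this study.

```python
import statistics

from scipy import stats

# Hypothetical triplicate results (% of total fatty acids) for two oils;
# illustrative numbers only, not data from this study.
oil_a = [79.8, 80.1, 79.9]
oil_b = [75.2, 75.6, 75.0]

for name, values in (("A", oil_a), ("B", oil_b)):
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    print(f"oil {name}: {statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}")

# Two-sample t-test assuming equal variances; two-sided by default.
result = stats.ttest_ind(oil_a, oil_b, equal_var=True)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.2g}, significant: {result.pvalue <= 0.05}")
```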

3. Results and discussion

3.1. Fatty acid composition

The fatty acid composition of the camellia oils determined by the conventional GC method is shown in Table 1. Consistent with previous studies (Ma, Ye, Rui, Chen, & Zhang, 2011; Wang et al., 2017), 12 fatty acids were detected in the pure camellia oils. In addition to these 12 fatty acids, eicosadienoic acid (C20:2) was detected only in the camellia oil blends, indicating that C20:2 was introduced by the other vegetable oils in the blends. In both pure and blended camellia oils, the predominant fatty acids were oleic (C18:1), linoleic (C18:2), palmitic (C16:0), stearic (C18:0) and linolenic (C18:3) acids, which together made up more than 96% of the total fatty acids. Similar results were reported for olive, canola, hazelnut, cottonseed and sunflower oils by Yalcin, Toker, Ozturk, Dogan, and Kisi (2012). However, the relative contents of these individual fatty acids varied considerably among the 66 samples in the present work, which may be explained by differences in cultivar. The highest variability was observed for oleic acid, with a standard deviation of 19.7, in agreement with reported results (Yuan et al., 2013). According to Yuan et al. (2013), a broad range of variation in the reference values of these fatty acids is helpful for developing feasible and robust calibration models. Prediction models were therefore developed for these five abundant fatty acids. To analyze the fatty acid composition of camellia oils comprehensively, PLS models were also built for the main fatty acid groups, SFA, USFA, MUFA and PUFA.
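The group totals used as PLS responses follow from the individual FA percentages by counting double bonds in the "Cx:y" shorthand. The sketch below illustrates this with a hypothetical profile (not values from Table 1). It does not handle isomer labels such as cis/trans or n-3/n-6, and the function name is hypothetical.

```python
def fa_groups(profile):
    """Sum fatty acid percentages into SFA, MUFA and PUFA, plus USFA = MUFA + PUFA.

    Keys use the 'Cx:y' shorthand; y, the number of double bonds, decides the
    group. Isomer labels (cis/trans, n-3/n-6) are not handled in this sketch.
    """
    groups = {"SFA": 0.0, "MUFA": 0.0, "PUFA": 0.0}
    for name, percent in profile.items():
        double_bonds = int(name.split(":")[1])
        if double_bonds == 0:
            groups["SFA"] += percent
        elif double_bonds == 1:
            groups["MUFA"] += percent
        else:
            groups["PUFA"] += percent
    groups["USFA"] = groups["MUFA"] + groups["PUFA"]
    return groups


# Hypothetical camellia-oil-like profile (% of total FAs), for illustration only.
profile = {"C16:0": 8.5, "C18:0": 2.0, "C18:1": 80.0, "C18:2": 8.0, "C18:3": 0.5, "C20:1": 1.0}
print(fa_groups(profile))
# {'SFA': 10.5, 'MUFA': 81.0, 'PUFA': 8.5, 'USFA': 89.5}
```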

3.2. 1H NMR analysis

Fig. 1 shows characteristic 1H NMR spectra of pure and blended camellia oil. The assignment of these signals is well established (Castejón, Mateos-Aparicio, Molero, Cambero, & Herrera, 2014; Merchak, Bacha, Khouzam, Rizk, Akoka, & Bejjani, 2016; Popescu et al., 2015) and is given in Table 2. Because triglycerides (TGs) are the main component of all vegetable oils, most of the signals in Fig. 1 were assigned to groups in the fatty acid chains. Several weak signals around 1.7 and 0.7 ppm were assigned to minor components of vegetable oil, mainly squalene and sterols.

The 1H NMR spectra of pure and blended camellia oil were similar in peak profile but differed in peak intensities. The three signals at about 2.8 ppm (signal 7, =CH–CH2–CH=), 2.7 ppm (signal 8, =CH–CH2–CH=) and 1.0 ppm (signal 13, –CH=CH–CH2–CH3) were attributed to the bis-allylic protons of the linolenyl and linoleyl groups and the terminal methyl protons of linolenic acid, respectively. These signals accounted for the main characteristic variations among the tested camellia oils of different purity. As shown in Table 1, the camellia oil blends contained more linoleic and linolenic acid than pure camellia oil, and correspondingly these signals were more intense (Fig. 1). The intensities of these characteristic 1H NMR signals can therefore serve as an important criterion for evaluating the fatty acid composition of camellia oil.

However, because of peak overlap in the 1H NMR spectra, it is difficult to quantify individual fatty acids from the spectra alone (Li, Xu, Wang, Zhai, Chen, & Liu, 2017). Chemometric methods are therefore needed to make full use of the 1H NMR signals for determining the different fatty acids.

3.3. PLS regression analysis

Among the many chemometric methods for quantitative analysis, PLS, which combines features of PCA and multiple linear regression, is one of the most widely used multivariate calibration methods. It is particularly recommended for regression problems in which the predictor matrix has more variables than observations and the explanatory variables are likely to be correlated (Gómez-Caravaca, Maggio, & Cerretani, 2016; Chen, Xie, Zhang, Wang, Nie, & Li, 2012). PLS was therefore chosen to construct the quantitative models for the fatty acids determined in the present study, i.e., oleic (C18:1), linoleic (C18:2), palmitic (C16:0), stearic (C18:0) and linolenic (C18:3) acids, as well as SFA, USFA, MUFA and PUFA.
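To make the description of PLS as a combination of PCA and regression concrete, a minimal PLS1 sketch using the NIPALS algorithm is shown below. NIPALS is one common way of computing PLS; SIMCA's internal implementation is not reproduced here. Each latent variable is built from the direction of maximal covariance between X and y, both blocks are deflated, and the weights are finally converted into regression coefficients. The synthetic data have more variables than observations and strongly collinear columns, the situation in which PLS is recommended.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 with mean centering.

    Returns (x_mean, y_mean, coef) so that y_hat = y_mean + (X - x_mean) @ coef.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                    # scores of this latent variable
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        q = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q * t               # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)
    W, P, q = np.array(weights).T, np.array(loadings).T, np.array(y_loadings)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


# 20 samples, 30 collinear variables generated from 2 hidden factors.
rng = np.random.default_rng(0)
T = rng.normal(size=(20, 2))
X = T @ rng.normal(size=(2, 30))
y = T @ np.array([1.0, -0.5])

model = pls1_fit(X, y, n_lv=2)
residual = y - pls1_predict(model, X)
print(f"max |residual| with 2 LVs: {np.abs(residual).max():.1e}")
```

Ordinary least squares cannot be solved uniquely here because X has 30 columns but only 20 rows and rank 2. PLS sidesteps this by regressing y on a few latent scores instead of on the raw variables.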

3.3.1. Excluding outliers

Outliers must be detected and removed from the dataset during model development. In the present study, outliers were identified with two methods available in SIMCA-P: Hotelling’s T2, which identifies strong outliers, and distance to the model in X-space (DModX), which identifies moderate outliers. Hotelling’s T2 is a multivariate generalization of Student’s t-test and checks whether observations adhere to multivariate normality (Tugizimana, Steenkamp, Piater, & Dubery, 2016), whereas DModX is compared with a critical distance (Dcrit) calculated from the inverse cumulative F-distribution (Eriksson, Trygg, & Wold, 2014). Both tests were performed at the 95% confidence level (Fig. 2). In the Hotelling’s T2 range plot (Fig. 2a), two observations (No. 8 and 48) had T2 values above 8.514, the critical T2 limit at the 95% confidence level, and were regarded as strong outliers; these two samples were removed before further modeling. Moderate outliers were then identified from the DModX plot. Observations with a DModX larger than twice Dcrit are generally considered moderate outliers (Silva, Vercruysse, Vervaet, Remon, Lopes, De Beer, et al., 2018). As shown in Fig. 2b, sample No. 50, with an exceptionally high DModX value above 2.934, was considered a moderate outlier and was also removed. After outlier removal, 63 samples remained for model development.
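Both outlier statistics can be sketched from a PCA model of the centered data. This is a NumPy/SciPy illustration, not SIMCA's exact implementation. Two choices are assumptions. The T2 limit uses the common F-based approximation A(N − 1)/(N − A) · F(0.95; A, N − A). DModX is taken as each observation's residual standard deviation divided by the pooled residual standard deviation, compared with the square root of an F quantile. As a consistency check only, with N = 66 and A = 3 this T2 formula gives about 8.51, close to the limit quoted above; this is not a statement of the settings actually used.

```python
import numpy as np
from scipy import stats


def pca_outlier_stats(X, n_pc, alpha=0.05):
    """Hotelling's T2 and normalized DModX for each row of X (mean-centered PCA)."""
    N, K = X.shape
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_pc] * S[:n_pc]           # scores
    P = Vt[:n_pc].T                      # loadings
    t2 = np.sum(T ** 2 / T.var(axis=0, ddof=1), axis=1)
    # Assumed F-based limit: A(N-1)/(N-A) * F(1-alpha; A, N-A)
    t2_crit = n_pc * (N - 1) / (N - n_pc) * stats.f.ppf(1 - alpha, n_pc, N - n_pc)
    E = Xc - T @ P.T                     # residuals not explained by the PCs
    s_obs = np.sqrt(np.sum(E ** 2, axis=1) / (K - n_pc))
    s_pooled = np.sqrt(np.sum(E ** 2) / ((N - n_pc - 1) * (K - n_pc)))
    dmodx = s_obs / s_pooled
    d_crit = np.sqrt(stats.f.ppf(1 - alpha, K - n_pc, (N - n_pc - 1) * (K - n_pc)))
    return t2, t2_crit, dmodx, d_crit


# Synthetic data: 66 samples x 15 signals from 3 hidden factors plus small noise.
rng = np.random.default_rng(42)
loadings = rng.normal(size=(3, 15))
X = rng.normal(size=(66, 3)) @ loadings + 0.05 * rng.normal(size=(66, 15))
X[3] = np.array([6.0, 6.0, 6.0]) @ loadings      # extreme, but on the model plane
X[20] += rng.normal(size=15)                     # large off-model residual

t2, t2_crit, dmodx, d_crit = pca_outlier_stats(X, n_pc=3)
print(f"T2 limit {t2_crit:.3f}; above limit: {np.flatnonzero(t2 > t2_crit)}")
print(f"2 x Dcrit {2 * d_crit:.3f}; above: {np.flatnonzero(dmodx > 2 * d_crit)}")
```

Row 3 follows the model structure but lies far from the center, so it is flagged by T2 only. Row 20 lies off the model plane, so it is flagged by DModX. At the 95% level a few ordinary samples are also expected to exceed the T2 limit by chance, which is why flagged samples are usually inspected before removal.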

3.3.2. Development of the PLS models with different pretreated data

An appropriate data pretreatment method is vital during model building to reduce the influence of disturbing factors in the dataset. To assess the effect of data pretreatment on the PLS model for each fatty acid, three pretreatment methods were compared: mean centering (Ctr), unit variance scaling (UV, also called auto-scaling) and Pareto scaling (Par). Ctr subtracts the column mean from each variable, so that only the relevant variation, namely the variation between samples, remains for further analysis (Van den Berg, Hoefsloot, Westerhuis, Smilde, & Van der Werf, 2006). UV scaling additionally divides each variable by its standard deviation, giving all variable axes the same length and hence all variables equal importance. Pareto scaling divides each centered variable by the square root of its standard deviation, so that variables with larger variation are weighted more heavily in the analysis than variables with smaller variation (Eriksson, Trygg, & Wold, 2014). The LVs represent the spectral information of the sample components, and a reasonable choice of the number of LVs is crucial for eliminating noise while making full use of the spectral information. The number of LVs therefore has to be optimized jointly with the pretreatment method.
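The three pretreatments differ only in the divisor applied after centering: none for Ctr, the standard deviation s for UV and the square root of s for Par. The NumPy sketch below is illustrative, and its function name is hypothetical. The optional `ref` argument reflects common practice rather than a documented detail of this study: means and standard deviations are computed on the calibration set and then reused to scale the validation set.

```python
import numpy as np


def pretreat(X, method, ref=None):
    """Column-wise pretreatment: 'ctr' (mean centering), 'uv' (unit variance)
    or 'par' (Pareto). Means and SDs come from `ref` (e.g. the calibration set)
    so that a validation set can be scaled with calibration statistics."""
    ref = X if ref is None else ref
    mean = ref.mean(axis=0)
    sd = ref.std(axis=0, ddof=1)
    if method == "ctr":
        return X - mean
    if method == "uv":
        return (X - mean) / sd
    if method == "par":
        return (X - mean) / np.sqrt(sd)
    raise ValueError(f"unknown method: {method!r}")


X = np.array([[1.0, 10.0], [3.0, 50.0], [5.0, 90.0]])  # column SDs: 2 and 40
for method in ("ctr", "uv", "par"):
    # Column SDs after Ctr, UV, Par: [2, 40], [1, 1], [sqrt(2), sqrt(40)]
    print(method, np.round(pretreat(X, method).std(axis=0, ddof=1), 3))
```

Pareto scaling sits between the other two options. It shrinks the 20-fold spread between the two columns to about 4.5-fold, so large signals still dominate but far less than after Ctr alone.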

In the present study, the default SIMCA-P cross-validation procedure (7-fold cross-validation) was used to optimize the number of LVs. In 7-fold cross-validation, the dataset is split into 7 subsets. For a fixed number of LVs, the Y-values of all observations in each subset are predicted by a submodel built from the other 6 subsets (the calibration subsets). The differences between the predicted and observed Y-values are used to calculate the RMSECV, which is then used to select the optimal number of LVs. The procedure starts at LV = 1 and is repeated with increasing numbers of LVs, and the number of LVs giving the lowest RMSECV is selected as optimal (Triba, Le, Amathieu, Goossens, Bouchemal, Nahon, et al., 2014). Taking the oleic acid model with Par pretreatment as an example, Fig. 3 shows that RMSECV remained almost constant from the third LV onward; the optimal number of LVs for this model was therefore set to 3. The optimal numbers of LVs for all prediction models with the different pretreatment methods are shown in Table 3 and range from 2 to 4.
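The LV selection procedure can be sketched as a loop over candidate LV numbers, each evaluated by 7-fold cross-validation. This is a minimal NumPy illustration with a compact NIPALS PLS1 (mean centering only) on synthetic data with three hidden factors. The interleaved fold assignment is an assumption of this sketch, not a documented SIMCA setting.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Compact NIPALS PLS1 on mean-centered data; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p, q = Xr.T @ t / tt, yr @ t / tt
        Xr, yr = Xr - np.outer(t, p), yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, np.array(Q))


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def rmsecv_curve(X, y, max_lv, n_folds=7):
    """RMSECV for 1..max_lv latent variables with interleaved k-fold CV."""
    folds = np.arange(len(y)) % n_folds
    curve = []
    for n_lv in range(1, max_lv + 1):
        sq_err = 0.0
        for k in range(n_folds):
            test = folds == k
            model = pls1_fit(X[~test], y[~test], n_lv)
            sq_err += np.sum((y[test] - pls1_predict(model, X[test])) ** 2)
        curve.append(np.sqrt(sq_err / len(y)))
    return np.array(curve)


rng = np.random.default_rng(7)
T = rng.normal(size=(63, 3))
X = T @ rng.normal(size=(3, 15)) + 0.05 * rng.normal(size=(63, 15))
y = T @ np.array([1.0, 0.5, -0.8]) + 0.05 * rng.normal(size=63)

curve = rmsecv_curve(X, y, max_lv=6)
best = int(np.argmin(curve)) + 1
print("RMSECV by LV:", np.round(curve, 3), "-> lowest at", best, "LVs")
```

Beyond the number of real factors the curve typically flattens, as described above for the oleic acid model. In that regime, choosing the smallest LV number on the plateau instead of the strict minimum is a common way to avoid fitting noise.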

3.3.3. Performance validation of the model

Cross-validation was also used to estimate the ability of each model to correctly predict the Y response of new observations. The results of the cross-validation procedure are summarized in Table 3 by the quality parameters R2Y(cum) and Q2(cum). As a measure of goodness of fit, a larger R2Y(cum) indicates better modeling (Özdemir, Dağ, Makuc, Ertaş, Plavec, & Bekiroğlu, 2018). As an indicator of predictive ability, Q2(cum) values above 0.5 and 0.9 are generally accepted as indicating good and excellent predictability, respectively. Because a large discrepancy between R2Y(cum) and Q2(cum) indicates over-fitting caused by too many components, the difference between them should be lower than 0.3 (Bjarnestad & Dahlman, 2002; Han, Huang, Hylands, & Legido-Quigley, 2016). As can be seen from Table 3, all models except the C16:0 model built with Pareto scaling had R2Y(cum) above 0.929 and Q2(cum) above 0.91, indicating good fit and excellent predictability. A permutation test was applied as an additional check of model reliability and over-fitting. As shown by the permutation parameters in Table 3, all PLS models built with the raw data had R2Y and Q2 intercepts above 0.4 and 0.05, respectively, indicating possible over-fitting caused by noise and undesirable features in the raw data. Removing undesirable features from the X-data by pretreatment before constructing a PLS model is usually a vital step. As expected, the models built with the three pretreatment methods showed an appropriate fit to the data. CV–ANOVA was used to check the statistical significance of the models, and all PLS models for each fatty acid were significant, with extremely low p-values (Table 3). The performance parameters (R2, RMSEE and RMSEP) of all models constructed with the different pretreatment methods are also summarized in Table 3. The optimal pretreatment method for each fatty acid model was the one giving the lowest RMSEE and RMSEP. The optimal models for predicting C18:1, C18:2, C18:3, SFA, USFA, MUFA and PUFA were obtained with Par pretreatment, with R2 above 0.99 and RMSEE and RMSEP values lower than 0.954 and 0.947, respectively. For C16:0 and C18:0, the PLS models with Ctr pretreatment were the most robust, giving the best performance indicators (R2 ≥ 0.945, RMSEE ≤ 0.377, RMSEP ≤ 0.212). Fig. S1 shows the fatty acid values predicted from the 1H NMR spectra versus the reference values for the optimized models.
Strong correlations were observed between the selected 1H NMR signals of all tested oil samples and their fatty acid composition. These results demonstrate that the fatty acid composition of camellia oil can be predicted directly from its 1H NMR spectrum with the developed PLS models.
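The calibration/validation split, the error metrics and the rule-of-thumb acceptance thresholds used above can be gathered into a few small functions. This NumPy sketch is illustrative and all function names are hypothetical. R2 is computed as 1 − SS_res/SS_tot. The N − A − 1 denominator in RMSEE (A = number of LVs) is our assumption about the convention for calibration error. The reference values and predictions are made up.

```python
import numpy as np


def split_indices(n, frac_cal=2 / 3, seed=0):
    """Randomly assign sample indices to a calibration and a validation set."""
    idx = np.random.default_rng(seed).permutation(n)
    n_cal = int(round(n * frac_cal))
    return idx[:n_cal], idx[n_cal:]


def r2(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def rmsee(y_cal, y_fit, n_lv):
    """Calibration error with N - A - 1 degrees of freedom (assumed convention)."""
    return np.sqrt(np.sum((y_cal - y_fit) ** 2) / (len(y_cal) - n_lv - 1))


def rmsep(y_val, y_pred):
    """Root mean square error over an external validation set."""
    return np.sqrt(np.mean((y_val - y_pred) ** 2))


def assess(r2y_cum, q2_cum, r2_intercept, q2_intercept):
    """Apply the rule-of-thumb thresholds quoted in the text."""
    if q2_cum > 0.9:
        predictability = "excellent"
    elif q2_cum > 0.5:
        predictability = "good"
    else:
        predictability = "below 0.5"
    return {
        "predictability": predictability,
        "r2y_q2_gap_ok": r2y_cum - q2_cum < 0.3,
        "permutation_ok": r2_intercept < 0.4 and q2_intercept < 0.05,
    }


cal, val = split_indices(66)
print(len(cal), len(val))  # 44 22
y_obs = np.array([80.0, 75.0, 82.0, 78.0])   # hypothetical reference values
y_hat = np.array([79.5, 75.5, 81.0, 78.5])   # hypothetical predictions
print(round(r2(y_obs, y_hat), 3), round(rmsep(y_obs, y_hat), 3))
print(assess(r2y_cum=0.95, q2_cum=0.93, r2_intercept=0.2, q2_intercept=-0.3))
```

RMSEE uses the calibration samples that the model was fitted to, so it is optimistic. RMSEP on the held-out validation samples is the more honest estimate of performance on new oils.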

4. Conclusions

This study investigated whether 1H NMR spectroscopy combined with multivariate statistical analysis can predict the fatty acid composition of camellia oil samples. The results showed that outlier samples, the number of LVs and the data pretreatment method can all substantially affect the quality of the computed PLS models. After optimization, Pareto scaling was the best pretreatment method for most of the prediction models (seven of the nine), giving high R2 values and small errors. This good performance indicates that 1H NMR combined with PLS can be applied as a precise and rapid method for determining the fatty acid composition of camellia oils. The method is faster and easier than traditional methods (Mckenzie, Donarski, Wilson, & Charlton, 2011) and can therefore serve as an alternative method for determining the fatty acid composition of camellia oils. The methodology described here may also be useful for predicting the fatty acid composition of other vegetable oils.

Acknowledgements

Financial support from the National Natural Science Foundation of China (Nos. 31471647, 21467016), the Key Technologies R&D Program of Jiangxi Province (No. 20152ACF60012), the Program of the State Key Laboratory of Food Science and Technology (SKLF-QN-201502) and the Natural Science Foundation of Jiangxi (20171ACB21015) is gratefully acknowledged.


References

Bjarnestad, S., & Dahlman, O. (2002). Chemical compositions of hardwood and softwood pulps employing photoacoustic Fourier transform infrared spectroscopy in combination with partial least-squares analysis. Analytical Chemistry, 74(22), 5851-5858.
Castejón, D., Mateos-Aparicio, I., Molero, M. D., Cambero, M. I., & Herrera, A. (2014). Evaluation and Optimization of the Analysis of Fatty Acid Types in Edible Oils by 1H NMR. Food Analytical Methods, 7(6), 1285-1297.
Chen, Y., Xie, M. Y., Zhang, H., Wang, Y. X., Nie, S. P., & Li, C. (2012). Quantification of total polysaccharides and triterpenoids in Ganoderma lucidum and Ganoderma atrum by near infrared spectroscopy and chemometrics. Food Chemistry, 135(1), 268-275.
Chen, Y., Yang, Y., Nie, S. P., Yang, X., Wang, Y. T., Yang, M. Y., Li, C., & Xie, M. Y. (2014). The analysis of trans fatty acid profiles in deep frying palm oil and chicken fillets with an improved gas chromatography method. Food Control, 44, 191-197.
Chen, Y., Zhu, S. B., Xie, M. Y., Nie, S. P., Liu, W., Li, C., Gong, X. F., & Wang, Y. X. (2008). Quality control and original discrimination of Ganoderma lucidum based on high-performance liquid chromatographic fingerprints and combined chemometrics methods. Analytica Chimica Acta, 623(2), 146-156.
Eriksson, L., Trygg, J., & Wold, S. (2008). CV-ANOVA for significance testing of PLS and OPLS® models. Journal of Chemometrics, 22(11-12), 594-600.
Eriksson, L., Trygg, J., & Wold, S. (2014). A chemometrics toolbox based on projections and latent variables. Journal of Chemometrics, 28(5), 332-346.
Fang, G., Jing, Y. G., Tay, M., Lau, H. F., & Li, S. F. Y. (2013). Characterization of oils and fats by 1H NMR and GC/MS fingerprinting: Classification, prediction and detection of adulteration. Food Chemistry, 138(2-3), 1461-1469.
Gómez-Caravaca, A. M., Maggio, R. M., & Cerretani, L. (2016). Chemometric applications to assess quality and critical parameters of virgin and extra-virgin olive oil. A review. Analytica Chimica Acta, 913, 1-21.
Han, P., Huang, Y., Hylands, P. J., & Legido-Quigley, C. (2016). Assessment of Polygonum capitatum Buch.-Ham. ex D.Don by metabolomics based on gas chromatography with mass spectrometry. Journal of Separation Science, 39(10), 1979-1986.
He, L., Zhou, G. Y., Zhang, H. Y., & Liu, J. A. (2011). Research progress on the health function of tea oil. Journal of Medicinal Plants Research, 5(4), 485-489.
Kamal-Eldin, A., & Andersson, R. (1997). A multivariate study of the correlation between tocopherol content and fatty acid composition in vegetable oils. Journal of the American Oil Chemists' Society, 74(4), 375-380.
Kim, J. K., Hui, G. P., Kim, C. R., Lim, H. J., Cho, K. M., Choi, J. S., Shin, D. H., & Shin, E. C. (2014). Quality Evaluation on Use of Camellia Oil as an Alternative Method in Dried Seaweed Preparation. Preventive Nutrition and Food Science, 19(3), 234-241.
Kritioti, A., Menexes, G., & Drouza, C. (2018). Chemometric characterization of virgin olive oils of the two major Cypriot cultivars based on their fatty acid composition. Food Research International, 103, 426-437.
Li, B. Q., Xu, M. L., Wang, X., Zhai, H. L., Chen, J., & Liu, J. J. (2017). An approach to the simultaneous quantitative analysis of metabolites in table wines by 1H NMR self-constructed three-dimensional spectra. Food Chemistry, 216, 52-59.
Li, M., Zhang, L., Yao, X., & Jiang, X. (2017). Membrane Introduction Mass Spectrometry Combined with an Orthogonal Partial-Least Squares Calibration Model for Mixture Analysis. Analytical Sciences, 33(11), 1225-1230.
Li, X., Kong, W., Shi, W., & Shen, Q. (2016). A combination of chemometrics methods and GC–MS for the classification of edible vegetable oils. Chemometrics & Intelligent Laboratory Systems, 155, 145-150.
Ma, J., Ye, H., Rui, Y., Chen, G., & Zhang, N. (2011). Fatty acid composition of Camellia oleifera oil. Journal of Consumer Protection and Food Safety, 6(1), 9-12.
Mckenzie, J. S., Donarski, J. A., Wilson, J. C., & Charlton, A. J. (2011). Analysis of complex mixtures using high-resolution nuclear magnetic resonance spectroscopy and chemometrics. Progress in Nuclear Magnetic Resonance Spectroscopy, 59(4), 336-359.
Merchak, N., Bacha, E. E., Khouzam, R. B., Rizk, T., Akoka, S., & Bejjani, J. (2016). Geoclimatic, morphological, and temporal effects on Lebanese olive oils composition and classification: A 1H NMR metabolomic study. Food Chemistry, 217, 379-388.
Muik, B., Lendl, B., Molina-Díaz, A., & Ayora-Cañada, M. J. (2003). Direct, reagent-free determination of free fatty acid content in olive oil and olives by Fourier transform Raman spectrometry. Analytica Chimica Acta, 487(2), 211-220.
Özdemir, İ. S., Dağ, Ç., Makuc, D., Ertaş, E., Plavec, J., & Bekiroğlu, S. (2018). Characterisation of the Turkish and Slovenian extra virgin olive oils by chemometric analysis of the presaturation 1H NMR spectra. Food Science and Technology, 92, 10-15.
Popescu, R., Costinel, D., Dinca, O. R., Marinescu, A., Stefanescu, I., & Ionete, R. E. (2015). Discrimination of vegetable oils using NMR spectroscopy and chemometrics. Food Control, 48, 84-90.
Shi, T., Zhu, M., Chen, Y., Yan, X., Chen, Q., Wu, X., Lin, J., & Xie, M. (2018). 1H NMR combined with chemometrics for the rapid detection of adulteration in camellia oils. Food Chemistry, 242, 308-315.
Silva, A. F., Vercruysse, J., Vervaet, C., Remon, J. P., Lopes, J. A., De Beer, T., & Sarraguça, M. C. (2018). Process monitoring and evaluation of a continuous pharmaceutical twin-screw granulation and drying process using multivariate data analysis. European Journal of Pharmaceutics and Biopharmaceutics, 128, 36-47.
Su, M. H., Shih, M. C., & Lin, K. H. (2014). Chemical composition of seed oils in native Taiwanese Camellia species. Food Chemistry, 156(3), 369-373.
Triba, M. N., Le, M. L., Amathieu, R., Goossens, C., Bouchemal, N., Nahon, P., Rutledge, D. N., & Savarin, P. (2014). PLS/OPLS models in metabolomics: the impact of permutation of dataset rows on the K-fold cross-validation quality parameters. Molecular BioSystems, 11(1), 13-19.
Tugizimana, F., Steenkamp, P. A., Piater, L. A., & Dubery, I. A. (2016). A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps. Metabolites, 6(4), 40.
Van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K., & Van der Werf, M. J. (2006). Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics, 7(1), 142.
Wang, L., Lee, F. S. C., Wang, X., & He, Y. (2006). Feasibility study of quantifying and discriminating soybean oil adulteration in camellia oils by attenuated total reflectance MIR and fiber optic diffuse reflectance NIR. Food Chemistry, 95(3), 529-536.
Wang, X., Zeng, Q., Verardo, V., & Contreras, M. d. M. (2017). Fatty acid and sterol composition of tea seed oils: Their comparison by the "FancyTiles" approach. Food Chemistry, 233, 302-310.
Wu, D., & He, Y. (2014). Potential of spectroscopic techniques and chemometric analysis for rapid measurement of docosahexaenoic acid and eicosapentaenoic acid in algal oil. Food Chemistry, 158(11), 93-100.
Yalcin, H., Toker, O. S., Ozturk, I., Dogan, M., & Kisi, O. (2012). Prediction of fatty acid composition of vegetable oils based on rheological measurements using nonlinear models. European Journal of Lipid Science and Technology, 114(10), 1217-1224.
Yuan, J., Wang, C., Chen, H., Zhou, H., & Ye, J. (2013). Prediction of fatty acid composition in Camellia oleifera oil by near infrared transmittance spectroscopy (NITS). Food Chemistry, 138(2-3), 1657-1662.
Zeb, A. (2012). Triacylglycerols composition, oxidation and oxidation compounds in camellia oil using liquid chromatography-mass spectrometry. Chemistry and Physics of Lipids, 165(5), 608-614.

17

Figure Captions

Fig. 1 600 MHz 1H NMR spectra of a pure camellia oil and a camellia oil blend. The 16 selected NMR signals are labeled, and some of them are shown on an expanded scale.

Fig. 2 (a) Hotelling's T2 score plot for the detection of strong outliers; (b) distance to the model in X space (DModX) plot for the detection of moderate outliers.
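The two outlier diagnostics in Fig. 2 can be illustrated with a short NumPy sketch. It assumes a samples-by-variables matrix (for example, the integrated 1H NMR signals), unit-variance scaling and a two-component PCA model. `pca_outlier_stats` is a hypothetical helper, not the authors' code, and the DModX computed here is a simplified form (per-sample residual SD divided by the pooled residual SD) that omits the small-sample correction applied by software such as SIMCA, so absolute values will differ from those plotted.

```python
import numpy as np


def pca_outlier_stats(X, n_components=2):
    """Return Hotelling's T2 and a simplified DModX for each row of X.

    Assumptions: rows are samples, columns are variables, and the data are
    autoscaled (unit variance) before PCA. Illustrative sketch only.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    a = n_components
    # Unit-variance (UV) scaling: centre each column, divide by its sample SD.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA by singular value decomposition: scores T = U*S, loadings P = V.
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :a] * S[:a]
    P = Vt[:a].T
    # Hotelling's T2: squared scores divided by the variance of each score.
    t2 = np.sum(T**2 / T.var(axis=0, ddof=1), axis=1)
    # X residuals not explained by the a-component model.
    E = Xs - T @ P.T
    # Per-sample residual SD relative to the pooled residual SD.
    s_i = np.sqrt(np.sum(E**2, axis=1) / (k - a))
    s_0 = np.sqrt(np.sum(E**2) / ((n - a - 1) * (k - a)))
    return t2, s_i / s_0
```

Because the scores are centred, the T2 values of the calibration set sum to a(n − 1). A sample lying far from the others along the principal components (a strong outlier) stands out in T2, whereas a sample poorly described by the model (a moderate outlier) stands out in DModX. Critical limits, such as the F-based 95% limit for T2, are not computed in this sketch.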

Fig. 3 Root mean square error of cross-validation (RMSECV) as a function of the number of latent variables (LVs) in the PLS model.
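The LV selection in Fig. 3 can be sketched as follows: fit a single-response PLS (PLS1) model with the NIPALS algorithm for 1, 2, ... LVs and record the cross-validated prediction error. This is a minimal NumPy illustration, not the authors' implementation; the function names are hypothetical, and the seven interleaved cross-validation groups are an assumption (seven groups is a common software default).

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Fit a PLS1 model with NIPALS; returns (coef, x_mean, y_mean).

    X and y are mean-centred internally; any further scaling (UV, Pareto)
    is assumed to have been applied to X beforehand.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # residual (deflated) matrices
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # X weights
        t = Xr @ w                   # X scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        c = yr @ t / tt              # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def rmsecv_curve(X, y, max_lv, n_groups=7):
    """RMSECV for 1..max_lv latent variables using interleaved CV groups."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    groups = [np.arange(g, n, n_groups) for g in range(n_groups)]
    curve = []
    for a in range(1, max_lv + 1):
        y_cv = np.empty(n)
        for test in groups:
            train = np.setdiff1d(np.arange(n), test)
            model = pls1_fit(X[train], y[train], a)
            y_cv[test] = pls1_predict(model, X[test])
        curve.append(np.sqrt(np.mean((y - y_cv) ** 2)))
    return np.array(curve)
```

A usual choice is the smallest number of LVs after which RMSECV stops decreasing appreciably, since additional components mainly fit noise and raise the risk of overfitting.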


Table 1 The reference concentration of each fatty acid in 66 samples determined by GC (%)

Blends: camellia oil blends (n = 17); Pure: pure camellia oils (n = 49).

| Fatty acids | Blends Min. | Blends Max. | Blends Mean ± SD | Pure Min. | Pure Max. | Pure Mean ± SD |
|---|---|---|---|---|---|---|
| Major fatty acids^c (% of total FA) | | | | | | |
| C16:0 | 4.00 | 10.84 | 7.23a ± 3.21 | 5.84 | 9.64 | 8.54a ± 0.61 |
| C18:0 | 1.85 | 4.66 | 3.03a ± 1.23 | 2.02 | 2.65 | 2.18b ± 0.11 |
| C18:1 | 21.07 | 64.68 | 44.94a ± 19.7 | 72.32 | 80.94 | 79.59b ± 1.23 |
| C18:2 | 17.99 | 53.57 | 34.37a ± 16.31 | 6.61 | 13.36 | 7.99b ± 0.99 |
| C18:3 | 4.79 | 9.33 | 7.30a ± 1.37 | 0.44 | 0.92 | 0.68b ± 0.08 |
| Minor fatty acids^d (% of total FA) | | | | | | |
| C14:0 | 0.04 | 0.08 | 0.06a ± 0.02 | 0.04 | 0.06 | 0.04b ± 0.00 |
| C16:1 | 0.08 | 0.19 | 0.14a ± 0.06 | 0.08 | 0.14 | 0.10b ± 0.01 |
| C17:0 | 0.04 | 0.10 | 0.07a ± 0.02 | 0.04 | 0.08 | 0.06a ± 0.01 |
| C20:0 | 0.33 | 0.62 | 0.49a ± 0.12 | 0.04 | 0.18 | 0.06b ± 0.03 |
| C20:2 | 0.03 | 0.06 | 0.05 ± 0.01 | –^e | – | – |
| C22:0 | 0.35 | 0.49 | 0.43a ± 0.04 | 0.03 | 0.57 | 0.08b ± 0.1 |
| C22:1 | 0.12 | 0.16 | 0.15a ± 0.01 | 0.03 | 0.21 | 0.05b ± 0.03 |
| C24:0 | | 0.16 | 0.08a ± 0.08 | 0.04 | 0.08 | 0.07a ± 0.01 |
| Fatty acid groups (% of total FA) | | | | | | |
| TFA^f | 0.60 | 2.92 | 1.67a ± 0.74 | | 1.45 | 0.55b ± 0.35 |
| SFA^f | 7.06 | 16.27 | 11.39a ± 4.30 | 9.38 | 12.26 | 11.03a ± 0.44 |
| USFA^f | 81.42 | 92.15 | 86.94a ± 4.66 | 86.58 | 90.43 | 88.42a ± 0.63 |
| MUFA^f | 21.26 | 65.03 | 45.22a ± 19.77 | 72.49 | 81.25 | 79.75b ± 1.24 |
| PUFA^f | 25.06 | 60.51 | 41.72a ± 15.19 | 7.23 | 14.09 | 8.67b ± 1.01 |

Values represent the means ± SD from triplicate experiments. Values in the same row with different letters are significantly different (p < 0.05). c: >1% of total FA; d: ≤1% of total FA; e: not detected; f: TFA, trans-fatty acids; SFA, saturated fatty acids; USFA, unsaturated fatty acids; MUFA, monounsaturated fatty acids; PUFA, polyunsaturated fatty acids.


Table 2 Chemical shift assignment of 1H NMR signals for the main components in vegetable oils, with the peak notation shown in Fig. 1

| Signal | Chemical shift (ppm) | Multiplicity | Proton | Assignment |
|---|---|---|---|---|
| 1 | 5.42–5.29 | m | –CH=CH– | all unsaturated fatty acids |
| 2 | 5.29–5.22 | m | >CHOCOR | sn-2 triacylglycerols |
| 3 | 4.36–4.24 | dd | –CH2OCOR | sn-1 triacylglycerols |
| 4 | 4.20–4.10 | dd | –CH2OCOR | sn-3 triacylglycerols |
| | 4.10–4.05 | m | –CH2– | sn-1,3-diacylglycerols |
| 5 | 4.04–3.98 | m | –CH2– | sn-1,3-diacylglycerols |
| 6 | 3.76–3.68 | d | –CH2– | sn-1,2-diacylglycerols |
| 7 | 2.84–2.79 | t | =CH–CH2–CH= | linolenyl group |
| 8 | 2.79–2.70 | t | =CH–CH2–CH= | linoleyl group |
| 9 | 2.40–2.20 | dt | –OCO–CH2– | all acyl groups |
| 10 | 2.08–1.94 | m | –CH2–CH=CH– | oleyl, linoleyl and linolenyl groups |
| 11 | 1.70–1.67 | s | | |
| | 1.67–1.50 | m | –OCO–CH2–CH2– | all acyl groups |
| 12 | 1.40–1.14 | m | –(CH2)n– | all acyl groups |
| 13 | 1.02–0.92 | t | –CH=CH–CH2–CH3 | linolenyl group |
| 14 | 0.92–0.80 | t | –CH2–CH2–CH2–CH3 | all acyl groups except linolenyl |
| 15 | 0.72–0.69 | s | | stigmasterol |
| | 0.69–0.66 | s | | β-sitosterol |
| | 0.60–0.50 | d | | triterpene alcohol (cycloartenol) |
| 16 | | | | squalene |

s: singlet; d: doublet; t: triplet; m: multiplet; dt: doublet of triplets; dd: doublet of doublets.
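The chemical-shift windows in Table 2 define the signals that serve as model variables. One plausible way to turn a processed spectrum into such variables is to integrate each window; the plain-Python sketch below does this with the trapezoidal rule. The region dictionary copies a few ranges from Table 2, while `integrate_regions`, `TABLE2_REGIONS` and the use of simple trapezoidal integration are illustrative assumptions rather than the authors' documented procedure.

```python
# A subset of the Table 2 windows, keyed by signal number: (high, low) in ppm.
TABLE2_REGIONS = {
    "1": (5.42, 5.29),   # -CH=CH-, all unsaturated fatty acids
    "7": (2.84, 2.79),   # bis-allylic =CH-CH2-CH=, linolenyl
    "8": (2.79, 2.70),   # bis-allylic =CH-CH2-CH=, linoleyl
    "13": (1.02, 0.92),  # terminal CH3, linolenyl
    "14": (0.92, 0.80),  # terminal CH3, all other acyl groups
}


def integrate_regions(ppm, intensity, regions):
    """Trapezoidal area of a 1D spectrum inside each (high, low) ppm window."""
    areas = {}
    for label, (high, low) in regions.items():
        # Keep points inside the window and sort by ppm so each step is
        # positive; NMR axes are usually stored from high to low ppm.
        pts = sorted((x, y) for x, y in zip(ppm, intensity) if low <= x <= high)
        area = 0.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            area += 0.5 * (y0 + y1) * (x1 - x0)
        areas[label] = area
    return areas
```

Windows that share a boundary (2.79 ppm between signals 7 and 8) count the boundary point in both. The resulting areas could then be normalized, for example to a reference signal, before modeling; that step is left out here.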


Table 3 The parameters of the fatty acid predictive models (n = 63)

R2Y(cum)

Q2(cum)

Q2

intercept

intercept

0.077

–0.375

0.024

–0.257

–0.044

–0.213

0.959

0.947

0.054

–0.395

–0.008

–0.3

–0.021

–0.275

0.918

0.902

0.056

–0.383

0.00

–0.006

–0.315

1.182

0.00

–0.05

–0.208

1.23

0.818

0.00

0.934

0.921

2.397

1.132

0.062

–0.376

acids

Treatment

C16:0

UV

0.956

0.92

3

0.956

0.373

0.583

0.417

Par

0.962

0.549

3

0.962

0.348

1.187

0.231

Ctr

0.954

0.951

2

0.954

0.377

1.003

0.212

None

0.998

0.998

4

0.959

0.365

0.403

0.28

UV

0.929

0.91

3

0.929

0.194

0.211

0.143

Par

0.942

0.924

3

0.942

0.176

0.19

0.13

Ctr

0.945

0.923

3

0.945

0.171

0.191

0.127

None

0.996

0.995

3

0.946

0.171

0.184

0.125

UV

0.992

0.987

3

0.992

1.697

2.609

1.352

Par

0.999

0.999

3

0.999

0.651

0.968

0.937

Ctr

0.995

0.995

2

0.995

1.276

1.372

None

1

1

3

0.998

0.881

UV

0.998

0.98

3

0.988

1.589

C18:1

C18:2

RMSEE

R2Y

Data

C18:0

LVs

R2

Fatty


RMSECV

RMSEP

p-Value

9.91×10−15 2.28×10−4 5.61×10−8 0.00 1.17×10−16 3.01×10−18 3.49×10−18 0.00 5.23×10−28

2.11×10−25

C18:3

SFA

USFA

Par

0.996

0.994

3

0.996

0.954

1.261

0.74

Ctr

0.994

0.992

2

0.994

1.095

1.304

0.898

None

0.999

0.998

3

0.998

0.719

1.171

0.643

UV

0.99

0.984

4

0.99

0.313

0.374

0.176

Par

0.998

0.994

4

0.998

0.148

0.377

0.108

Ctr

0.998

0.979

2

0.988

0.332

0.559

0.211

None

0.992

0.988

3

0.988

0.341

0.506

0.197

UV

0.979

0.957

3

0.979

0.317

0.627

0.287

Par

0.99

0.989

3

0.99

0.215

0.27

0.214

Ctr

0.982

0.975

2

0.982

0.289

0.35

0.297

None

0.999

0.999

3

0.968

0.389

0.426

0.251

UV

0.971

0.946

3

0.971

0.427

0.806

0.535

Par

0.982

0.97

4

0.982

0.338

0.439

0.383

Ctr

0.966

0.955

2

0.966

0.458

0.573

0.424


3.81×10−35 1.14×10−37 0.00 4.30×10−27 5.42×10−27 6.16×10−26 1.84×10−29 1.03×10−16 1.83×10−29 5.35×10−28 0.00 4.43×10−15 4.92×10−22 2.13×10−22

–0.01

–0.329

–0.048

–0.206

0.482

0.375

0.105

–0.482

0.146

–0.28

–0.048

–0.208

0.338

0.201

0.072

–0.37

0.022

–0.287

–0.064

–0.236

0.965

0.958

0.079

–0.369

0.133

–0.287

–0.039

–0.195

MUFA

PUFA

None

1

1

4

0.966

0.471

0.619

0.402

UV

0.992

0.989

3

0.992

1.733

1.977

2.258

Par

0.999

0.999

3

0.999

0.612

0.696

0.947

Ctr

0.996

0.996

2

0.996

1.166

1.142

None

1

1

3

0.998

0.787

UV

0.99

0.986

3

0.99

Par

0.997

0.996

3

Ctr

0.997

0.997

None

0.999

0.999

0.999

0.999

0.055

–0.389

0.00

0.005

–0.313

1.146

0.00

–0.056

–0.228

0.89

0.861

0.00

0.935

0.922

1.66

1.97

1.702

0.071

–0.365

0.997

0.927

0.956

0.848

0.00

0.001

–0.313

2

0.997

0.895

0.909

0.926

0.00

–0.049

–0.216

3

0.998

0.763

0.891

0.843

0.00

0.499

0.399


0.00 3.83×10−32

3.39×10−30
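Table 3 compares four data treatments labeled UV, Par, Ctr and None. Assuming these are the usual chemometric abbreviations for unit-variance scaling (autoscaling), Pareto scaling, mean-centering and no pretreatment, each is a simple column-wise operation, sketched below in plain Python. `scale_columns` is a hypothetical helper; UV divides each centred column by its SD, Pareto by the square root of its SD, which shrinks large signals less aggressively than UV.

```python
import statistics


def scale_columns(X, method):
    """Column-wise pretreatment of a samples-by-variables matrix (list of rows).

    method: "UV"   -> (x - mean) / sd          (unit variance, autoscaling)
            "Par"  -> (x - mean) / sqrt(sd)    (Pareto scaling)
            "Ctr"  -> x - mean                 (mean-centering only)
            "None" -> x unchanged
    """
    out_cols = []
    for col in zip(*X):
        mean = statistics.fmean(col)
        sd = statistics.stdev(col)  # sample SD (n - 1 denominator)
        if method == "None":
            new = list(col)
        elif method == "Ctr":
            new = [v - mean for v in col]
        elif method == "UV":
            new = [(v - mean) / sd for v in col]
        elif method == "Par":
            new = [(v - mean) / sd ** 0.5 for v in col]
        else:
            raise ValueError(f"unknown method: {method}")
        out_cols.append(new)
    # Transpose back to a list of sample rows.
    return [list(row) for row in zip(*out_cols)]
```

For example, with columns [1, 2, 3] and [10, 20, 30], UV scaling maps both to [-1, 0, 1], whereas Pareto scaling keeps the second column about three times wider than the first.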

Fig. 1


Fig. 2


Fig. 3

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Highlights

1. 1H NMR combined with PLS regression was used to predict the fatty acid composition of camellia oil.
2. Outlier removal, LV selection and data preprocessing were explored for model optimization.
3. The reliability of the PLS models was verified by a response permutation test and CV-ANOVA.
4. The method showed high accuracy for determining fatty acid composition.
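The response permutation test named in the highlights (its R2Y and Q2 intercepts appear in Table 3) can be sketched as follows. The model is refitted many times on randomly reordered responses, the permuted R2Y or Q2 values are regressed on the correlation between the original and permuted y vectors, and the intercept at zero correlation indicates how much apparent fit arises by chance. A commonly cited rule of thumb asks for an R2Y intercept below about 0.3–0.4 and a Q2 intercept below about 0.05. This plain-Python sketch assumes absolute correlations on the x-axis and a caller-supplied refitting function `fit_metric`; all helper names are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def line_intercept(xs, ys):
    """Ordinary least-squares intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return my - (sxy / sxx) * mx


def permutation_intercept(y, fit_metric, n_perm=100, seed=0):
    """Intercept of the metric-versus-|correlation| line.

    fit_metric(y_vector) must refit the model on y_vector and return its
    R2Y or Q2 (assumed interface). The unpermuted model sits at corr = 1.
    """
    rng = random.Random(seed)
    xs, ys = [1.0], [fit_metric(y)]
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        xs.append(abs(pearson(y, y_perm)))
        ys.append(fit_metric(y_perm))
    return line_intercept(xs, ys)
```

With a genuine relationship between X and y, permuted models fit poorly, the points at low correlation sit near zero, and the intercept stays small; a large intercept would suggest that the original fit is largely an artefact of model flexibility.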
