Detection of unexpected frauds: Screening and quantification of maleic acid in cassava starch by Fourier transform near-infrared spectroscopy




Accepted Manuscript

Hai-Yan Fu, He-Dong Li, Lu Xu, Qiao-Bo Yin, Tian-Ming Yang, Chuang Ni, Chen-Bo Cai, Ji Yang, Yuan-Bin She

PII: S0308-8146(17)30072-9
DOI: http://dx.doi.org/10.1016/j.foodchem.2017.01.061
Reference: FOCH 20453

To appear in: Food Chemistry

Received Date: 30 June 2015
Revised Date: 27 December 2016
Accepted Date: 13 January 2017

Please cite this article as: Fu, H-Y., Li, H-D., Xu, L., Yin, Q-B., Yang, T-M., Ni, C., Cai, C-B., Yang, J., She, Y-B., Detection of unexpected frauds: Screening and quantification of maleic acid in cassava starch by Fourier transform near-infrared spectroscopy, Food Chemistry (2017), doi: http://dx.doi.org/10.1016/j.foodchem.2017.01.061

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Hai-Yan Fu a,*, He-Dong Li a, Lu Xu b,c,*, Qiao-Bo Yin a, Tian-Ming Yang a, Chuang Ni a, Chen-Bo Cai d, Ji Yang a, and Yuan-Bin She c,*

a The Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province, School of Pharmaceutical Sciences, South-Central University for Nationalities, Wuhan 430074, PR China

b College of Material and Chemical Engineering, Tongren University, Tongren 554300, Guizhou, PR China

c College of Chemical Engineering, Zhejiang University of Technology, Hangzhou 310014, PR China

d College of Chemistry and Life Science, Chuxiong Normal University, Chuxiong 675000, PR China

* Corresponding authors. Tel.: +86 (0)27 67841196; fax: +86 (0)27 67841196. E-mail addresses: [email protected] (Hai-Yan Fu); [email protected] (Lu Xu); [email protected] (Yuan-Bin She).

Abstract

Fourier transform near-infrared (FT-NIR) spectroscopy and chemometrics were adopted for the rapid analysis of a toxic additive, maleic acid (MA), which has recently emerged as an extraneous adulterant in cassava starch (CS). An untargeted screening method for detecting MA in CS was first developed using one-class partial least squares (OCPLS), and multivariate calibration models were then developed using least squares support vector machine (LS-SVM) to quantify MA. The OCPLS model based on second-order derivative (D2) spectra detected MA adulteration in CS at levels of 0.6% (w/w) or higher, with a sensitivity of 0.954 and a specificity of 0.956. LS-SVM with the standard normal variate (SNV) transformation gave a root mean squared error of prediction (RMSEP) of 0.192% (w/w). These results demonstrate the potential of FT-NIR spectroscopy and chemometrics for rapid screening and quantitative analysis of MA in CS, and suggest that the approach may also be useful for other untargeted analyses.

Keywords: Maleic acid; Cassava starch; Fourier transform near-infrared spectroscopy (FT-NIR); Least squares support vector machine (LS-SVM); One-class partial least squares (OCPLS)

1. Introduction

Cassava (Manihot esculenta Crantz) is an important food crop in tropical and subtropical regions, including Africa, Asia, Latin America and the Caribbean (Ampe, Sirvent, & Zakhia, 2001; Morales, Álvarez, & Sánchez, 2008). Cassava starch (CS) is widely used in food formulations for bread, cakes, biscuits, pasta, tapioca balls and cassava noodles owing to its specific characteristics, such as bland taste and flavor, high paste clarity, and lower amylose content compared with wheat, potato and maize starches (Fiorda, Soares, Silva, Grosmann, & Souto, 2013; Shittu, Dixon, Awonorin, Sanni, & Maziya-Dixon, 2008; Raja, 1995). However, in 2013, a serious food safety scandal caused significant public concern in Taiwan and Singapore when an industrial substance, maleic acid (MA), was identified in popular foods ranging from meat balls to the tapioca "pearls" used in bubble tea (ANN, 2014). Although MA can cause acute kidney failure, it had been intentionally added to powdered CS at high concentrations to improve the chewiness of starch products. Similar to melamine (Filazi, Sireli, Ekici, Can, & Karagoz, 2012) and diethylhexyl phthalate (DEHP) (Yang, Hauser, & Goldman, 2013), MA is another emerging toxic and illegal food additive that was not included in the routine analysis of CS before the scandal occurred. Although analytical methods to detect MA in powdered CS were quickly developed after the scandal (Xu et al., 2013; Chen, Wu, & Wu, 2015; Tsai et al., 2015), it remains challenging to respond quickly to, or obtain early warning of, "unexpected" adulterants. Traditional responses to food safety problems are of limited use here because the analytical methods target known adulterants; in other words, the relevant analyses are only required and performed once an adulterant has been discovered or is suspected. Because new fraudulent foods and extraneous adulterants are constantly emerging, reliance on routine targeted analysis traps the control of food adulteration in a vicious cycle of "adulteration, scandal, targeted analysis, new adulterants, etc." (Xu, Yan, Cai, Wang, & Yu, 2013).

To tackle such challenges, untargeted analysis has been proposed as a strategy for simultaneously screening a range of known and unknown adulterants in foods (Moore, Ganguly, Smeller, Botros, & Bergana, 2012). In chemometrics, simultaneous screening for known targets and potential adulterants can be handled by class modeling techniques (CMTs), which predict whether a new object should be accepted or rejected by the modeled class (e.g., pure and authentic CS) based on analytical signals (Forina, Oliveri, Lanteri, & Casale, 2008; Oliveri, Egidio, Woodcock, & Downey, 2011). Untargeted screening of adulterants can be performed in three steps (Deng et al., 2012; Vlachos & Arvanitoyannis, 2008): (1) typical objects of a pure and authentic food are collected, and their chemical features are measured; (2) a class model is trained on these typical, authentic objects to define the distribution of the class; and (3) a new object is measured and predicted by the class model. A critical issue in developing a class model is to collect a set of representative objects and to control unwanted variations in the features of the training objects. In this way, slight differences caused by low levels of adulteration or by fraudulent foods can be detected by the class model.
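The three steps above can be illustrated with a deliberately simple class model. The sketch below is not OCPLS or any published CMT: as an assumption made only for illustration, each object is summarized by its Euclidean distance to the centroid of the authentic training objects, and the acceptance limit is set at an empirical quantile (a hypothetical 95%) of the training distances. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_class_model(X_train, quantile=0.95):
    """Steps (1)-(2): learn the distribution of authentic training objects.

    Illustrative model only: distance to the class centroid, with the
    acceptance limit set at an empirical quantile of training distances.
    """
    X_train = np.asarray(X_train, dtype=float)
    center = X_train.mean(axis=0)
    d_train = np.linalg.norm(X_train - center, axis=1)
    return center, np.quantile(d_train, quantile)


def predict_class(model, X_new):
    """Step (3): True = accepted by the modeled class, False = rejected."""
    center, limit = model
    d_new = np.linalg.norm(np.asarray(X_new, dtype=float) - center, axis=1)
    return d_new <= limit


rng = np.random.default_rng(0)
authentic = rng.normal(0.0, 1.0, size=(100, 20))  # synthetic authentic profiles
suspect = rng.normal(3.0, 1.0, size=(10, 20))     # synthetic shifted profiles
model = fit_class_model(authentic)
print(predict_class(model, suspect))
```

Only authentic objects are used for training; the suspect objects are never seen by the model, which is what allows a class model to reject adulterants that were not anticipated.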

Near-infrared (NIR) spectroscopy is regarded as a rapid and effective technique for profiling multiple chemical components in complex mixtures (Escuredo, González-Martín, Rodríguez-Flores, & Seijo, 2015; Núñez-Sánchez et al., 2016). Combined with chemometrics, NIR spectra have been widely used for component analysis and property prediction of food materials (Barbin et al., 2015; López, Trullols, Callao, & Ruisánchez, 2014; Viegas, Mata, Duarte, & Lima, 2016). The general objective of this study was to investigate the feasibility of using NIR spectroscopy and chemometrics for the rapid analysis of MA in CS. An untargeted screening method to detect MA in CS was developed using one-class partial least squares (OCPLS) (Xu, Cai, & Deng, 2011; Xu, Yan, Cai, & Yu, 2013). In addition, MA was quantified using least squares support vector machine (LS-SVM) regression (Suykens & Vandewalle, 1999).

2. Materials and Methods

2.1. Preparation of samples

A set of 165 pure and authentic CS powder samples was collected from seven provinces in southern China: Guangdong (26), Guangxi (29), Fujian (21), Yunnan (22), Jiangxi (20), Sichuan (22) and Hainan (25). The CS samples were uniform fine powders, and no sample pretreatment was performed. To prevent the influence of humidity, samples were taken from freshly opened packages. In addition, adulterated CS samples were prepared by manually mixing the above pure CS powders with MA powder in an agate mortar, with the help of a stirring rod, at nine different levels (0.10%, 0.60%, 1.1%, 3.5%, 6.1%, 8.6%, 11.0%, 16.0%, and 20%, w/w). Because this work studied the analytical performance of the technique for adulterants, the doping material and its content were considered the two primary factors. Although the selection of the pure CS samples to be adulterated was not a factor of interest, it could influence the discrimination of adulterated samples through variations in the composition of the pure samples. Therefore, an incomplete, unbalanced randomized block design (Gupta, Das, & Dey, 1991) was used, with the type and content of the adulterant as primary factors and the selection of pure CS samples to be adulterated as a blocking factor. In this way, the influence of selecting particular pure CS samples for adulteration could be largely reduced while keeping the total number of adulterated samples moderate. At each doping level, 25 pure CS samples were randomly selected from the 165 pure CS samples and adulterated with MA, giving a total of 225 adulterated CS samples. All of the pure and adulterated samples were fully dried by sun exposure, sealed with moisture-proof materials and kept in a cool (4 °C), dark and dry (relative humidity below 60%) place prior to NIR analysis.

2.2. NIR analysis of samples

The NIR diffuse reflectance spectra of the CS samples, taken from freshly opened packages, were measured from 4000 to 10000 cm-1 on a Nicolet 6770 FTIR spectrometer (Thermo Fisher Scientific Inc., USA) using OMNIC 8.2 data acquisition software. All spectra were measured with a PbS detector, using an internal gold background as the reference. To reduce the influence of instrument drift, the samples were measured in random order, and the instrument background was updated every hour using the internal gold background. Each spectrum was averaged over 32 scans; more scans did not significantly improve the signal quality. The resolution was 8 cm-1 and the data spacing was 3.857 cm-1, so each spectrum contained 1557 data points for chemometric analysis.

2.3. Spectral preprocessing and splitting of data sets

Instrument drift and background differences between batches can degrade the performance of class modeling. Therefore, proper data preprocessing is required to reduce unwanted spectral variations in the raw data that are unrelated to composition. In this paper, smoothing, second-order derivatives (D2) and the standard normal variate (SNV) transformation (Barnes, Dhanoa, & Lister, 1989) were used as data preprocessing methods. Both smoothing and D2 were computed with the Savitzky-Golay (S-G) algorithm (Savitzky & Golay, 1964) for its simplicity and effectiveness. The Kennard-Stone (K-S) algorithm (Kennard & Stone, 1969) was used to divide the measured objects into representative training and test sets for both class modeling and multivariate calibration.

2.4. Class modeling using robust OCPLS

The OCPLS classifier (Xu et al., 2011) is a recently proposed CMT based on partial least squares (PLS) regression. The response variable of OCPLS is a vector of ones and, unlike in ordinary PLS regression, the predictors are not mean-centered. Two distance measures, the score distance (SD) and the absolute centered model residual (ACR), are defined based on the principal latent variables (LVs) and on the error of the response variable, respectively. Outliers in the training or test set can bias, or even break down, the estimation of the OCPLS parameters. In this work, a robust OCPLS algorithm based on partial robust M-regression (PRM) (Daszykowski, Vander Heyden, & Walczak, 2007; Serneels, Croux, Filzmoser, & Van Espen, 2005) was used for class modeling to remove the influence of potential outliers. Given a preset percentage of potential outliers, PRM was performed using all of the training objects, and the objects with the lowest weights were flagged as outliers. Finally, an ordinary OCPLS model was developed with the outliers removed. The PRM-OCPLS models were developed using the MATLAB routines in the OCPLS toolbox (Xu, Goodarzi, Shi, Cai, & Jiang, 2014).
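To make the OCPLS idea concrete, the following sketch fits a PLS1 model of a vector of ones on uncentered spectra and accepts an object only when both its SD and its ACR fall within limits learned from the training objects. It is a minimal sketch under stated assumptions, not the OCPLS toolbox: the robust PRM step is omitted, SD is assumed to be the Mahalanobis distance of the centered scores, ACR is assumed to be the absolute deviation of the response residual from its training mean, and the critical limits are hypothetical empirical 97.5% quantiles rather than the toolbox's estimates. All function names and the spectra are synthetic or hypothetical.

```python
import numpy as np


def ocpls_distances(model, X):
    """Score distance (SD) and absolute centered residual (ACR) of objects X."""
    X = np.asarray(X, dtype=float)
    T = X @ model["R"] - model["t_mean"]                  # centered OCPLS scores
    sd = np.sqrt(np.einsum("ij,jk,ik->i", T, model["S_inv"], T))
    resid = 1.0 - X @ model["b"]                          # error of the response (target = 1)
    return sd, np.abs(resid - model["r_mean"])


def fit_ocpls(X, n_lv, quantile=0.975):
    """PLS1 (NIPALS) of a vector of ones on uncentered X, plus SD/ACR limits."""
    X = np.asarray(X, dtype=float)
    Xd, y = X.copy(), np.ones(X.shape[0])                 # no mean-centering in OCPLS
    p = X.shape[1]
    W, P, q = np.zeros((p, n_lv)), np.zeros((p, n_lv)), np.zeros(n_lv)
    for a in range(n_lv):
        w = Xd.T @ y
        w /= np.linalg.norm(w)
        t = Xd @ w
        tt = t @ t
        W[:, a], P[:, a], q[a] = w, Xd.T @ t / tt, (y @ t) / tt
        Xd -= np.outer(t, P[:, a])                        # deflate X
        y = y - q[a] * t                                  # deflate response
    R = W @ np.linalg.inv(P.T @ W)                        # scores of raw X: T = X @ R
    T = X @ R
    model = {"R": R, "b": R @ q, "t_mean": T.mean(axis=0),
             "S_inv": np.linalg.inv(np.atleast_2d(np.cov(T, rowvar=False)))}
    model["r_mean"] = float(np.mean(1.0 - X @ model["b"]))
    sd, acr = ocpls_distances(model, X)
    model["sd_lim"] = np.quantile(sd, quantile)
    model["acr_lim"] = np.quantile(acr, quantile)
    return model


def ocpls_accept(model, X):
    """Accept an object only if both SD and ACR are within the training limits."""
    sd, acr = ocpls_distances(model, X)
    return (sd <= model["sd_lim"]) & (acr <= model["acr_lim"])


rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)


def band(center, width):
    return np.exp(-((x - center) / width) ** 2)


base = band(0.3, 0.1) + 0.5 * band(0.7, 0.15)   # synthetic "pure" profile
comp2 = band(0.85, 0.05)                        # synthetic natural variation
adulterant = band(0.35, 0.08)                   # synthetic adulterant band


def simulate(n, level):
    s = rng.uniform(0.9, 1.1, (n, 1))
    u = rng.uniform(-0.1, 0.1, (n, 1))
    noise = rng.normal(0.0, 0.002, (n, x.size))
    return s * base + u * comp2 + level * adulterant + noise


model = fit_ocpls(simulate(60, 0.0), n_lv=2)
accepted_pure = ocpls_accept(model, simulate(40, 0.0))
accepted_adult = ocpls_accept(model, simulate(20, 0.5))
print(accepted_pure.mean(), accepted_adult.mean())
```

Because the predictors are not centered and the response is constant, the first weight vector is proportional to the column sums of X, i.e., to the mean spectrum of the class; an added band that overlaps this profile shifts both the scores and the predicted response away from the training distribution.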

The performance of the OCPLS models with different spectral preprocessing was evaluated in terms of the sensitivity (Sens) and specificity (Spec) of prediction, defined as

$$\mathrm{Sens} = \frac{TP}{TP + FN} \qquad (1)$$

$$\mathrm{Spec} = \frac{TN}{TN + FP} \qquad (2)$$

where TP, FN, TN and FP are the numbers of true positives, false negatives, true negatives and false positives, respectively. In this work, pure and adulterated CS objects were treated as positives and negatives, respectively.

2.5. LS-SVM

The multivariate calibration model relating the measured NIR spectra to the MA level in CS was developed using LS-SVM (Suykens & Vandewalle, 1999). LS-SVM is a simplified version of support vector machines (SVMs): it uses equality constraints instead of the inequality constraints of ordinary SVMs, so training reduces to solving a set of linear equations rather than a quadratic programming problem, which makes computation much faster. In this study, the Gaussian radial basis function (RBF) was used as the nonlinear kernel in LS-SVM, so the model has two parameters, γ and σ, that must be optimized. The regularization parameter, γ, controls the tradeoff between minimizing the structural risk of the model and the learning error, and the kernel width, σ, controls the degree of nonlinearity of the RBF. The parameters of the LS-SVM models (γ and σ) were optimized by minimizing the prediction errors of leave-one-out cross validation (LOOCV) using simplex optimization. The tuning, training and prediction of the LS-SVM models were performed using the LS-SVMLab v1.8 MATLAB toolbox.

3. Results and Discussion

The raw NIR spectra of MA and of the pure and adulterated CS samples are plotted in Fig. 1. In the pure CS spectra, the peak at 4007 cm-1 and those at 4312 and 4373 cm-1 can be attributed to combinations of C-H and C-C stretching. Other peak assignments (Hódsági, Gergely, Gelencsér, & Salgó, 2012; Cozzolino, Roumeliotis, & Eglinton, 2013) are as follows: 4748 cm-1 (combination of C-O stretching and O-H deformation), 5184 cm-1 (combination of the fundamental O-H stretching band and the first overtone of C-O deformation), the broad band from 5300 to 6000 cm-1 (first overtones of C-H stretching in various groups), 6329 and 6846 cm-1 (first overtone of O-H or N-H stretching), and 8308 cm-1 (second overtones of C-H stretching in various groups). As Fig. 1 shows, although the reflectance pattern of MA differs markedly from that of CS and its maximum intensity in terms of log(1/R) is higher, the MA spectrum has poor resolution, and the spectral variations caused by low-level adulteration of CS are not obvious. [please insert Figure 1 here]
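The preprocessing and splitting steps of Section 2.3, which produce the smoothed, D2 and SNV spectra compared in Fig. 2 and the 100/65 split of the pure CS objects, can be sketched in Python with NumPy and SciPy. This is an illustration under assumptions, not the authors' MATLAB code: the S-G window length (11 points) and polynomial order (2) are hypothetical because the paper does not report them, Euclidean distance is assumed for K-S, and the paper does not state whether K-S was applied to raw or preprocessed spectra, so using SNV spectra below is an arbitrary choice. The spectra are synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.spatial.distance import cdist


def snv(X):
    """Standard normal variate: center and scale each spectrum (row) separately."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def sg_smooth(X, window=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavenumber axis (hypothetical window)."""
    return savgol_filter(X, window_length=window, polyorder=polyorder, axis=1)


def sg_d2(X, window=11, polyorder=2):
    """Savitzky-Golay second-order derivative (D2), per data point."""
    return savgol_filter(X, window_length=window, polyorder=polyorder, deriv=2, axis=1)


def kennard_stone(X, n_train):
    """Split rows of X into Kennard-Stone training and test index arrays."""
    D = cdist(X, X)                                   # pairwise Euclidean distances
    i, j = np.unravel_index(np.argmax(D), D.shape)    # start: two most distant objects
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(D)) if k not in selected]
    while len(selected) < n_train:
        # for each remaining object, distance to its nearest selected object
        nearest = D[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(nearest))))
    return np.array(selected), np.array(remaining)


rng = np.random.default_rng(0)
spectra = rng.normal(size=(165, 300)).cumsum(axis=1)  # synthetic smooth-ish spectra
train_idx, test_idx = kennard_stone(snv(spectra), 100)
print(len(train_idx), len(test_idx))                  # 100 65
```

K-S repeatedly adds the object farthest from those already chosen, so the training set spans the whole spectral domain and the test objects fall inside it; this is why it suits class models whose limits are learned from the training objects.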

The spectra of pure and adulterated CS after smoothing, D2 and SNV preprocessing are shown in Fig. 2. With smoothing and SNV, the differences between pure and MA-adulterated CS were still not very obvious. Taking D2 spectra enhanced some detailed spectral features caused by MA, for example, the peaks at 8809 and 4756 cm-1. However, even with D2 spectra, chemometric models were required to extract useful information for discrimination, because the spectral variations caused by low-level adulteration could not easily be recognized by eye. [please insert Figure 2 here]

For untargeted detection of MA in CS, the K-S algorithm was used to split the 165 pure CS objects into a training set of 100 objects and a test set of 65 objects. The 100 training objects were used to build the OCPLS model, and the 65 pure test objects, together with the 225 adulterated CS objects, were used for prediction.

Wavelength (feature) selection for class modeling is not trivial. Although feature selection has become routine in multivariate calibration and classification and is generally recognized as effective in improving model performance, there are still no general guidelines for feature selection in class models (Xu et al., 2012). Because potential frauds cannot be collected exhaustively, one cannot evaluate how including or excluding certain features affects the detection of objects from non-target classes. Moreover, retaining more features generally helps to describe a class sufficiently. Therefore, no wavelength intervals were excluded from the measured data. Robust OCPLS models based on the PRM algorithm were developed using the raw, smoothed, D2 and SNV spectra over 4000–10000 cm-1. For PRM-OCPLS, the percentage of outliers was set to 10% (0.10) by default. Monte Carlo cross validation (MCCV) (Xu & Liang, 2001) was used to select the number of significant OCPLS LVs: the training objects (with outliers removed by robust OCPLS) were randomly split 100 times, each time leaving ten objects out for testing. The predicted residual sum of squares (PRESS) values from MCCV were examined, and a lower model complexity was preferred when PRESS could not be significantly reduced. [please insert Table 1 here]

The training and prediction results of the OCPLS models are summarized in Table 1. FN is the number of pure CS objects that were wrongly rejected as adulterated, and FP is the number of adulterated CS objects that were wrongly accepted as pure. As shown in Table 1, the OCPLS models with smoothed, D2 and SNV spectra used 3 LVs, and the OCPLS model with the raw spectra used 4 LVs. These results indicate that the spectral variations among pure CS objects were well controlled and that the pure CS spectra had a data structure of low complexity. For all four OCPLS models, the sensitivity was high, and only three or four pure CS objects were wrongly classified as adulterated (FN). For the detection of adulterated CS, the best model was D2-OCPLS, with a prediction sensitivity of 0.954 and a specificity of 0.956. Examining the origins of the wrongly accepted adulterated objects (FP) showed that all of them came from the lowest of the nine doping levels investigated (25 objects per level). Because the spectra of pure CS and pure MA differ, adulterated CS with higher MA levels had larger SD and ACR values and was therefore easier for the class model to detect. For D2-OCPLS, all ten FP objects had a 0.1% doping level, indicating that MA at 0.6% or higher could be reliably detected in CS. SNV-OCPLS had 29 FP objects (25 at the 0.1% level and four at the 0.6% level) and could detect MA at 1.1% or higher. OCPLS with the raw and smoothed spectra could only detect MA at levels of 6.1% and 3.5% or higher, respectively. The training and prediction results of D2-OCPLS are also shown in Fig. 3. When the OCPLS model was trained with D2 spectra, there were three outlying pure CS objects with significantly larger SD and/or ACR values. In the PRM-OCPLS model, these three objects had the lowest weights and were diagnosed as outliers, so they did not influence the estimation of the critical values of SD and ACR. The high sensitivity (low FN) obtained with all of the preprocessing techniques also demonstrates that PRM-OCPLS could automatically detect and remove outliers. [please insert Figure 3 here] [please insert Table 2 here]

For the quantitative analysis of MA in CS, multivariate calibration models were developed using LS-SVM. The 250 CS objects, covering ten MA levels from 0% to 20%, were divided into a training set (175 objects) and a test set (75 objects) using the K-S algorithm. The training and prediction results for the different preprocessing methods are listed in Table 2. D2 and SNV significantly reduced the prediction errors, indicating that removing the spectral background and baseline improved the accuracy of the multivariate calibration. The lowest root mean squared error of prediction (RMSEP), 0.192%, was obtained by SNV-LS-SVM. For D2-LS-SVM, the RMSEP (0.791%) was noticeably higher than the root mean squared error of cross validation (RMSECV, 0.655%), indicating that taking D2 spectra may produce somewhat unstable calibration results (Bouveresse, 1997). The SNV transformation therefore appeared to be the most suitable preprocessing method for multivariate calibration. The prediction results of SNV-LS-SVM are plotted in Fig. 4 and indicate good prediction accuracy. The advantage of SNV might be attributed to its ability to reduce scattering effects, which were caused by the different particle sizes of the CS and MA powders. [please insert Figure 4 here]

4. Conclusions

FT-NIR spectroscopy and chemometrics were combined to analyze MA adulteration in CS. Untargeted detection of MA in CS was achieved by developing OCPLS class models of pure CS. Under the current experimental conditions, MA adulteration at levels of 0.6% or higher was reliably detected by D2-OCPLS. Because MA has usually been added to CS at relatively high levels (approximately 0.83%) (Liu & Ruan, 2014), D2-OCPLS should be sufficient to detect MA adulteration in products on public markets. In addition, for quality control and screening of potential adulterations, the proposed untargeted analysis can serve as a supplement to traditional targeted analytical methods. MA was quantified by LS-SVM calibration, and the most accurate calibration model was obtained by SNV-LS-SVM. In summary, FT-NIR spectroscopy and chemometrics show potential for the analysis of adulteration in CS.

Acknowledgments

Dr. Hai-Yan Fu and Dr. Yuan-Bin She are very grateful to Dr. Feng Chen of the Department of Food, Nutrition and Packaging Sciences, Clemson University, South Carolina, USA, for his thorough review and correction of this manuscript. All authors are grateful to the editor and the anonymous reviewers, who offered many very helpful suggestions, and appreciate the language editing services of Elsevier Webshop Support. The authors also appreciate financial support from the National Natural Science Foundation of China (Grant Nos. 21665022, 21576297, 21205145, and 21476270) and the major project of the Science and Technology Department of Hubei Province (2016ACA138). Dr. Lu Xu acknowledges financial support from the Postdoctoral Science Project (No. 171172), the Open Research Program (No. GCTKF2014007) of the State Key Laboratory Breeding Base of Green Chemistry Synthesis Technology (Zhejiang University of Technology), the Research Fund for the Doctoral Program of Tongren University (No. trxyDH1501), the Open Research Program (No. 2015ZY006) of the Modernization Engineering Technology Research Center of Ethnic Minority Medicine of Hubei Province (South-Central University for Nationalities), and the research funds of the Education Department of Guizhou Province (No. QJHKYZ[2015]498).

References

Ampe, F., Sirvent, A., & Zakhia, N. (2001). Dynamics of the microbial community responsible for traditional sour cassava starch fermentation studied by denaturing gradient gel electrophoresis and quantitative rRNA hybridization. International Journal of Food Microbiology, 65(1-2), 45-54.

Asia News Network [ANN] (2013). Food vendors in Taiwan wring hands over toxic starch. Retrieved from http://www.asianewsnet.net/Food-vendors-in-Taiwan-wring-hands-over-toxic-star-47326.html

Barbin, D. F., Kaminishikawahara, C. M., Soares, A. L., Mizubuti, I. Y., Grespan, M., Shimokomaki, M., et al. (2015). Prediction of chicken quality attributes by near infrared spectroscopy. Food Chemistry, 168, 554-560.

Barnes, R. J., Dhanoa, M. S., & Lister, S. J. (1989). Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Applied Spectroscopy, 43, 772-777.

Bouveresse, E. (1997). Maintenance and transfer of multivariate calibration models based on near-infrared spectroscopy. Doctoral thesis, Vrije Universiteit Brussel.

Chen, H. C., Wu, C., & Wu, K. Y. (2015). Determination of the maleic acid in rat urine and serum samples by isotope dilution-liquid chromatography-tandem mass spectrometry with on-line solid phase extraction. Talanta, 136, 9-14.

Cozzolino, D., Roumeliotis, S., & Eglinton, J. (2013). Exploring the use of near infrared (NIR) reflectance spectroscopy to predict starch pasting properties in whole grain barley. Food Biophysics, 8, 256-261.

Daszykowski, M., Vander Heyden, Y., & Walczak, B. (2007). Robust partial least squares model for prediction of green tea antioxidant capacity from chromatograms. Journal of Chromatography A, 1176(1-2), 12-18.

Deng, D. H., Xu, L., Ye, Z. H., Cui, H. F., Cai, C. B., & Yu, X. P. (2012). FTIR spectroscopy and chemometric class modeling techniques for authentication of Chinese sesame oil. Journal of the American Oil Chemists' Society, 89(6), 1003-1009.

Escuredo, O., González-Martín, M. I., Rodríguez-Flores, M. S., & Seijo, M. C. (2015). Near infrared spectroscopy applied to the rapid prediction of the floral origin and mineral content of honeys. Food Chemistry, 170, 47-54.

Filazi, A., Sireli, U. T., Ekici, H., Can, H. Y., & Karagoz, A. (2012). Determination of melamine in milk and dairy products by high performance liquid chromatography. Journal of Dairy Science, 95(2), 602-608.

Fiorda, F. A., Soares, M. S., Silva, F. A., Grosmann, M. V. E., & Souto, L. R. F. (2013). Microestructure, texture and colour of gluten-free pasta made with amaranth flour, cassava starch and cassava bagasse. LWT - Food Science and Technology, 54(1), 132-138.

Forina, M., Oliveri, P., Lanteri, S., & Casale, M. (2008). Class-modeling techniques, classic and new, for old and new problems. Chemometrics and Intelligent Laboratory Systems, 93(2), 132-148.

Gupta, V. K., Das, A., & Dey, A. (1991). Universal optimality of block designs with unequal block sizes. Statistics & Probability Letters, 11, 177-180.

Hódsági, M., Gergely, S., Gelencsér, T., & Salgó, A. (2012). Investigations of native and resistant starches and their mixtures using near infrared spectroscopy. Food and Bioprocess Technology, 5, 401-407.

Kennard, R. W., & Stone, L. A. (1969). Computer aided design of experiments. Technometrics, 11, 137-148.

Liu, S. W., & Ruan, Z. L. (2014). The trouble after enjoying the feeling of elasticity: food safety problems caused by maleic acid in starches. Quality and Standardization, (1), 31-32. (In Chinese)

López, M. L., Trullols, E., Callao, M. P., & Ruisánchez, I. (2014). Multivariate screening in food adulteration: Untargeted versus targeted modeling. Food Chemistry, 147, 177-181.

Moore, J. C. A., Ganguly, J., Smeller, L., Botros, M., & Bergana, M. M. (2012). Standardisation of non-targeted screening tools to detect adulterations in skim milk powder using NIR spectroscopy and chemometrics. NIR News, 23(5), 9-11.

Morales, S., Álvarez, H., & Sánchez, C. (2008). Dynamic models for the production of glucose syrups from cassava starch. Food and Bioproducts Processing, 86(1), 25-30.

Núñez-Sánchez, N., Martínez-Marín, A. L., Polvillo, O., Fernández-Cabanás, V. M., Carrizosa, J., Urrutia, B., et al. (2016). Near infrared spectroscopy (NIRS) for the determination of the milk fat fatty acid profile of goats. Food Chemistry, 190, 244-252.

Oliveri, P., Egidio, V., Woodcock, T., & Downey, G. (2011). Application of class-modelling techniques to near infrared data for food authentication purposes. Food Chemistry, 125(4), 1450-1456.

Raja, K. C. M. (1995). Cassava starch: scope and limitations. In New Developments in Carbohydrates and Related Natural Products (pp. 55-62). Oxford & IBH Publishing.

Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36, 1627-1639.

Serneels, S., Croux, C., Filzmoser, P., & Van Espen, P. J. (2005). Partial robust M-regression. Chemometrics and Intelligent Laboratory Systems, 79(1-2), 55-64.

Shittu, T. A., Dixon, A., Awonorin, S. O., Sanni, L. O., & Maziya-Dixon, B. (2008). Bread from composite cassava-wheat flour. II: Effect of cassava genotype and nitrogen fertilizer on bread quality. Food Research International, 41(6), 569-578.

Suykens, J. A. K., & Vandewalle, J. (1999). Least squares support vector machine classifiers. Neural Processing Letters, 9, 293-300.

Tsai, C. F., Wu, G. Y., Kuo, C. H., Lin, Y. W., Chang, C. H., Tseng, S. H., Kao, Y. M., Chiueh, L. C., Lu, T. J., & Shih, D. Y. C. (2015). Effective extraction method through alkaline hydrolysis for the detection of starch maleate in foods. Journal of Food and Drug Analysis, 23(3), 442-446.

Viegas, T. R., Mata, A. L. M. L., Duarte, M. M. L., & Lima, K. M. G. (2016). Determination of quality attributes in wax jambu fruit using NIRS and PLS. Food Chemistry, 190, 1-4.

Vlachos, A., & Arvanitoyannis, I. S. (2008). A review of rice authenticity/adulteration methods and results. Critical Reviews in Food Science and Nutrition, 48(6), 553-598.

Xu, D., Chen, Y., Zhou, S., Lian, Y., Chen, L., Lin, L., Zhou, Y., & Huang, Z. (2013). Determination of the total amount of maleic acid and maleic anhydride in starch and its products by high performance liquid chromatography-tandem mass spectrometry. Chinese Journal of Chromatography, 31(12), 1224-1227.

Xu, L., Cai, C. B., & Deng, D. H. (2011). Multivariate quality control solved by one-class partial least squares regression: identification of adulterated peanut oils by mid-infrared spectroscopy. Journal of Chemometrics, 25(10), 568-574.

Xu, L., Goodarzi, M., Shi, W., Cai, C. B., & Jiang, J. H. (2014). A MATLAB toolbox for class modeling using one-class partial least squares (OCPLS) classifiers. Chemometrics and Intelligent Laboratory Systems, 139, 58-63.

Xu, L., Yan, S. M., Cai, C. B., & Yu, X. P. (2013). One-class partial least squares (OCPLS) classifier. Chemometrics and Intelligent Laboratory Systems, 126, 1-5.

Xu, L., Yan, S. M., Cai, C. B., Wang, Z. J., & Yu, X. P. (2013). The feasibility of using near-infrared spectroscopy and chemometrics for untargeted detection of protein adulteration in yogurt: removing unwanted variations in pure yogurt. Journal of Analytical Methods in Chemistry, 2013, 201873.

Xu, L., Ye, Z. H., Yan, S. M., Shi, P. T., Cui, H. F., Fu, X. S., & Yu, X. P. (2012). Combining local wavelength information and ensemble learning to enhance the specificity of class modeling techniques: Identification of food geographical origins and adulteration. Analytica Chimica Acta, 754, 31-38.

Xu, Q. S., & Liang, Y. Z. (2001). Monte Carlo cross validation. Chemometrics and Intelligent Laboratory Systems, 56, 1-11.

Yang, J., Hauser, R., & Goldman, R. H. (2013). Taiwan food scandal: the illegal use of phthalates as a clouding agent and their contribution to maternal exposure. Food and Chemical Toxicology, 58, 362-368.

Figure and table captions:

Table 1. Prediction results of OCPLS models for pure CS and CS adulterated with MA.

Table 2. Quantitative analysis of MA in CS using LS-SVM.

Figure 1. NIR spectra of pure CS, adulterated CS and MA. The adulteration levels were from 0.1% to 20% (w/w).

Figure 2. NIR spectra of the pure and adulterated CS preprocessed by smoothing, taking second-order derivatives (D2) and SNV transformation. An extra shift was added to the spectra of adulterated CS.

Figure 3. Training and prediction results obtained by D2-OCPLS.

Figure 4. The prediction results of MA in CS by SNV-LS-SVM.

Fig. 1. NIR spectra of pure CS, adulterated CS and MA. The adulteration levels were from 0.1% to 20% (w/w). [Figure: three panels (pure cassava starch, adulterated cassava starch, maleic acid); x-axis wavenumber (cm-1), 4000 to 10000; y-axis log(1/R).]

Fig. 2. NIR spectra of the pure and adulterated CS preprocessed by smoothing, taking second-order derivatives (D2) and SNV transformation. An extra shift was added to the spectra of adulterated CS. [Figure: three panels (smoothing, D2, SNV), each showing pure and adulterated spectra; x-axis wavenumber (cm-1), 4000 to 10000; y-axis log(1/R).]

Fig. 3. Training and prediction results obtained by D2-OCPLS. [Figure: three panels (training of OCPLS, 3 LVs, outliers removed; prediction of pure CS by PRM-OCPLS, 3 LVs; prediction of adulterated CS by PRM-OCPLS, 3 LVs); x-axis score distance; y-axis ACR.]

Fig. 4. The prediction results of MA in CS by SNV-LS-SVM. [Figure: single panel (SNV-LS-SVM); x-axis reference doping level (%); y-axis predicted doping level (%).]

Table 1. Prediction results of OCPLS models for pure CS and CS adulterated with MA.

| Preprocessing | LVs [a] | False negatives (FN) [d] | False positives (FP) [d] | Sensitivity [b] | Specificity [c] | Detected MA level |
|---|---|---|---|---|---|---|
| Raw data | 4 | 3 | 77 | 0.954 (62/65) | 0.658 (148/225) | 6.1% |
| Smoothing | 3 | 4 | 71 | 0.938 (61/65) | 0.684 (154/225) | 3.5% |
| D2 | 3 | 3 | 10 | 0.954 (62/65) | 0.956 (215/225) | 0.6% |
| SNV | 3 | 3 | 29 | 0.954 (62/65) | 0.871 (196/225) | 1.1% |

[a] The number of OCPLS latent variables (LVs).
[b] The numbers in parentheses represent TP/(TP + FN).
[c] The numbers in parentheses represent TN/(TN + FP).
[d] The number of objects that were wrongly classified.
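The sensitivity and specificity values in Table 1 follow directly from Eqs. (1) and (2) with 65 pure test objects (positives) and 225 adulterated objects (negatives). The short sketch below recomputes them from the FN and FP counts. The helper that infers the lowest reliably detected level assumes, as reported in the text, that false positives fill the lowest doping levels first, with 25 objects per level; the function names are hypothetical.

```python
import math

LEVELS = [0.1, 0.6, 1.1, 3.5, 6.1, 8.6, 11.0, 16.0, 20.0]  # doping levels, % (w/w)
PER_LEVEL = 25                                              # adulterated objects per level


def sens_spec(fn, fp, n_pure=65, n_adult=225):
    """Eqs. (1)-(2) with pure CS as positives and adulterated CS as negatives."""
    tp, tn = n_pure - fn, n_adult - fp
    return tp / (tp + fn), tn / (tn + fp)


def lowest_detected_level(fp):
    """Lowest level with no false positives, assuming FPs fill the lowest levels first."""
    return LEVELS[math.ceil(fp / PER_LEVEL)]


table1 = {"Raw data": (3, 77), "Smoothing": (4, 71), "D2": (3, 10), "SNV": (3, 29)}
for name, (fn, fp) in table1.items():
    sens, spec = sens_spec(fn, fp)
    print(f"{name:<9} Sens={sens:.3f} Spec={spec:.3f} detected>={lowest_detected_level(fp)}%")
```

For example, the 77 false positives of the raw-data model exceed three full levels (75 objects), so the first fully rejected level is 6.1%, matching the last column of Table 1.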

Table 2. Quantitative analysis of MA in CS using LS-SVM.

| Preprocessing | Optimized parameters (log10 γ, σ²) | RMSECV (%) | RMSEP (%) |
|---|---|---|---|
| Raw data | (4.330, 882.5) | 1.472 | 1.763 |
| Smoothing | (4.931, 6077) | 1.165 | 1.266 |
| D2 | (4.839, 3827) | 0.655 | 0.791 |
| SNV | (5.249, 186.7) | 0.208 | 0.192 |
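The LS-SVM calibration summarized in Table 2 can be sketched as follows. This is a minimal NumPy/SciPy illustration, not LS-SVMLab: it assumes the usual LS-SVM regression dual system with a bias term and an RBF kernel written as exp(-||x - z||²/σ²), which is our understanding of the form parameterized by σ² in Table 2. For simplicity it replaces the simplex search with a small hypothetical grid over γ and σ², scored by brute-force LOOCV. The data are synthetic, so the selected parameters bear no relation to those in Table 2.

```python
import numpy as np
from scipy.spatial.distance import cdist


def rbf_kernel(A, B, sig2):
    """Gaussian RBF kernel exp(-||a - b||^2 / sig2)."""
    return np.exp(-cdist(A, B, "sqeuclidean") / sig2)


def lssvm_fit(X, y, gamma, sig2):
    """Solve the LS-SVM dual: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sig2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return {"X": X, "b": sol[0], "alpha": sol[1:], "sig2": sig2}


def lssvm_predict(model, X_new):
    return rbf_kernel(X_new, model["X"], model["sig2"]) @ model["alpha"] + model["b"]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def loocv_rmse(X, y, gamma, sig2):
    """Brute-force leave-one-out cross-validation error (RMSECV)."""
    pred = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        fold_model = lssvm_fit(X[keep], y[keep], gamma, sig2)
        pred[i] = lssvm_predict(fold_model, X[i:i + 1])[0]
    return rmse(y, pred)


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (40, 5))                    # synthetic "spectra"
y = 20.0 * X[:, 0] ** 2 + rng.normal(0.0, 0.2, 40)    # synthetic MA level, %
grid = [(g, s2) for g in (1e1, 1e2, 1e3) for s2 in (0.5, 2.0, 8.0)]
best_gamma, best_sig2 = min(grid, key=lambda p: loocv_rmse(X, y, *p))
model = lssvm_fit(X, y, best_gamma, best_sig2)
print(best_gamma, best_sig2, loocv_rmse(X, y, best_gamma, best_sig2))
```

Training requires only one linear solve of size n + 1, which is the computational advantage of LS-SVM mentioned in Section 2.5. With real data, X would hold the preprocessed spectra of the 175 K-S training objects, and rmse() applied to the 75 test objects would give the RMSEP.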


Highlights

- Untargeted detection of maleic acid (MA) in cassava starch (CS) was developed.
- One-class partial least squares detected 0.6% (w/w) or more MA in CS.
- Taking second-order derivatives improved the specificity for untargeted detection.
- Accurate calibration of MA was achieved by least squares support vector machines.
