Journal Pre-proofs
Multi-block classification of Italian semolina based on Near Infrared Spectroscopy (NIR) analysis and alveographic indices
Patrizia Firmani, Alessandro Nardecchia, Francesca Nocente, Laura Gazza, Federico Marini, Alessandra Biancolillo
PII: S0308-8146(19)31804-7
DOI: https://doi.org/10.1016/j.foodchem.2019.125677
Reference: FOCH 125677
To appear in: Food Chemistry
Received Date: 15 June 2019
Revised Date: 29 August 2019
Accepted Date: 7 October 2019
Please cite this article as: Firmani, P., Nardecchia, A., Nocente, F., Gazza, L., Marini, F., Biancolillo, A., Multiblock classification of Italian semolina based on Near Infrared Spectroscopy (NIR) analysis and alveographic indices, Food Chemistry (2019), doi: https://doi.org/10.1016/j.foodchem.2019.125677
© 2019 Elsevier Ltd. All rights reserved.
Multi-block classification of Italian semolina based on Near Infrared Spectroscopy (NIR) analysis and alveographic indices
Authors
Patrizia Firmani (a), Alessandro Nardecchia (a), Francesca Nocente (b), Laura Gazza (b), Federico Marini (a), Alessandra Biancolillo (a,c,*)
(a) Department of Chemistry, University of Rome “La Sapienza”, Piazzale Aldo Moro, 5, 00185 Rome, Italy
(b) CREA Research Centre for Engineering and Agro-Food Processing, Via Manziana, 30 00189 Rome, Italy
(c) Department of Physical and Chemical Sciences, University of L'Aquila, Via Vetoio, Coppito, 67100, Italy
*Corresponding Author: dr. Alessandra Biancolillo, Dept. of Chemistry, University of Rome La Sapienza, P.le Aldo Moro 5, I-00185 Rome, Italy. Tel +39 06 49913680; Fax +39 06 4969 3292; e-mail: [email protected]
Abstract
Durum wheat (Triticum turgidum ssp durum) is widely grown in the Mediterranean area. The semolina obtained from this grain is used to prepare pasta, couscous, and baked products all over the world. The growing area affects the characteristics of durum wheat; consequently, it is relevant to trace this product. The present study aims at developing an analytical methodology which would allow tracing durum semolina harvested in 7 different Italian macro-areas. In order to achieve this goal, 597 samples of semolina have been analysed by Near Infrared Spectroscopy and by measuring alveographic parameters. The information collected has then been handled by a multi-block classifier (SO-PLS-LDA) in order to predict the origin of samples. The proposed approach provided extremely satisfactory results (in external validation, on a test set of 140 objects), correctly classifying all samples according to their growing area, which confirms that it represents a suitable solution for tracing durum wheat semolina.
Keywords
Multi-block; SO-PLS; Classification; Triticum durum semolina; Durum wheat; alveographic indices; Near Infrared Spectroscopy (NIR); LDA
1. Introduction
Durum wheat (Triticum turgidum ssp durum) is a tetraploid species mainly grown in the Mediterranean area and in the Middle East, whose flour and semolina are used in bakery products or for pasta, couscous and bulgur production. In Italy, due to the long tradition of durum wheat pasta making, it is important to identify the origin in order to valorise the quality of semolina and, as reported in the Official Gazette of the Italian Republic, based on the ordinance of July 26th, 2017, it is mandatory to indicate on the label of the pasta box the Country where the durum wheat was grown. This led to the necessity of developing analytical methods to prevent possible food frauds, which could affect both producers and consumers in different respects (e.g., economic losses and food safety). As already demonstrated in many studies, it is possible to distinguish different wheat species and cultivars employing various analytical techniques. Electrophoresis (Bietz & Simpson, 1992) is one of the first and most used methodologies to analyse wheat proteins, and it is applicable to cultivar identification. Chromatography is also used for the investigation of wheats: Gas Chromatography (GC) was employed for the profiling and analysis of volatile composition in order to demonstrate the correlation between a certain cultivar and the flavour of cooked pasta (Beleggia, Platani, Spano, Monteleone, & Cattivelli, 2009); liquid chromatography was used to estimate the colour of wheat, which varies with botanical origin and growing conditions (Fratianni, Irano, Panfili & Acquistucci, 2005), to detect adulterations (McCarthy, Scanlon, Lumley & Griffin, 1990), and to investigate the gliadins in different wheat cultivars (Burnouf & Bietz, 1984).
Inductively Coupled Plasma-Mass Spectrometry (ICP-MS) and X-ray fluorescence were used to extract fingerprints of geographical origin from the concentrations of 22 elements, which depend on the provenance soil (Zhao, Guo, Wei & Zhang, 2013a), also in combination with Scanning Electron Microscopy (SEM) analysis (Beltrami et al, 2011). Furthermore, Nuclear Magnetic Resonance (NMR) and Isotope Ratio Mass Spectrometry (IRMS) were employed to study the geographical origin of wheat on the basis of its metabolic profile (Longobardi, Sacco, Casiello, Ventrella & Sacco, 2015). Despite the development of technology and the increasing capacities of the instruments, it is not straightforward to handle the information provided by a matrix as complex as wheat. Consequently, several methodologies combining analytical techniques and chemometric approaches have been proposed in the literature. For instance, Principal Component Analysis (PCA) was used together with Gas Chromatography-Mass Spectrometry (GC-MS) and High-Performance Liquid Chromatography (HPLC) data in order to investigate furosine and flavour compounds for the characterization of durum wheat pasta (Giannetti,
Boccacci Mariani, Mannino & Testani, 2014). Alveographic indices have been used to assess dough strength and to predict the quality of wheat flour; for instance, they have been inspected, together with other rheological approaches, by correlation analysis, in order to evaluate the characteristics of durum wheat semolina (Marti, Cecchini, D’Egidio, Dreisoerner, & Pagani, 2014). NMR and IRMS were also coupled with chemometric methods (PCA and discriminant analysis) to assess the geographical origin and quality of durum wheat, combining isotopic 13C/12C, 18O/16O and 15N/14N ratios with polysaccharide and triacylglycerol NMR data (Consonni & Cagliani, 2010). Analysis of Variance (ANOVA) was used in an isotopic composition study in order to verify the importance of the geographical origin in distinguishing different wheat samples (Brescia et al, 2002). Despite the efficiency of these approaches, they present a common drawback: the preparation of samples can be time-consuming, as well as economically and environmentally disadvantageous. Chemometrics allows the interpretation and a more complete exploitation of Near-Infrared (NIR) spectroscopy data; NIR is a faster, relatively cheap, non-destructive, non-invasive and automatable methodology. This spectroscopic technique is widely applied for the quality analysis of agro-food products, in particular in combination with classification approaches to achieve diverse goals: for instance, to determine salubrity (Biancolillo, Firmani, Bucci, Magrì, & Marini, 2019a), for on-line quality assessment (Pérez-Marín, Torres, Entrenas, Vega, & Sánchez, 2019), or to authenticate PDO foodstuff (Biancolillo et al, 2018a) and high-added value products (Eisenstecken et al., 2019; Firmani, De Luca, Bucci, Marini & Biancolillo, 2019a; Firmani, Bucci, Marini & Biancolillo, 2019b). Concerning the application of NIR for grain process control, several studies, aiming at different goals, are present in the literature.
Partial Least Squares regression (PLS) has been applied to NIR data collected on wheat to predict milling and baking parameters such as flour extraction, protein content, gas and baking volume of different varieties (Jirsa, Hrušková, & Švec, 2008). Furthermore, this technique has been used, in combination with Discriminant Partial Least Squares (DPLS) analysis, to trace wheat grains, achieving acceptable results (~80% of correct classification rates) (Gonzáles-Martín et al, 2014); similarly, LDA and DPLS were also applied for assessing the origin of different wheat samples (Zhao, Guo, Wei, & Zhang, 2013b). In another study, PCA was employed to follow the in-line agglomeration process of durum wheat semolina using different water supply conditions (Mandato, Taliani, Ait-Kaddour, Ruiz, & Cuq, 2013). Despite the great impact that chemometrics has had on the obtainable results and on the interpretation of data, it is still difficult to obtain ever more reliable results, exploitable from a commodity point of view. From this perspective, in recent years, for several reasons (for instance, the relatively higher availability of analytical instruments/facilities), it has become not uncommon to handle multi-platform data sets. In principle, when handling
these kinds of data sets, the diverse data blocks could be individually examined, but it has been demonstrated that it could be more powerful to handle them by means of data fusion strategies. In the literature, several multi-block methods have been proposed; the main difference among them is the level at which data are fused. In fact, these approaches are often divided into low-level methodologies, where the original data are joined, mid-level approaches, where the fusion takes place at the feature level, and high-level strategies, which exploit a posteriori probabilities (Biancolillo, Boqué, Cocchi & Marini, 2019b). These approaches have been quite widely used for food-quality assessment; for instance, on tomatoes (Hohmann et al, 2015), fruit juices (Haddi et al, 2014), beer (Biancolillo, Bucci, Magrì, Magrì & Marini, 2014), wine (Silvestri et al, 2014; Biancolillo, Næs, Bro, & Måge, 2017) and several other foodstuffs. Accordingly, the aim of this work is to classify different cultivars of durum wheat grown in seven geographical areas of Italy by means of NIR spectra, alveographic parameters and a data-fusion classification approach. To achieve this goal, Sequential and Orthogonalized-Partial Least Squares-Linear Discriminant Analysis (SO-PLS-LDA) has been exploited. This multi-block approach has been applied because it has been shown to perform well in terms of predictions and, at the same time, it lends itself well to the interpretation of the systems under study.
2. Materials and methods
2.1 Samples
Semolina samples were obtained from 30 Triticum turgidum ssp durum cultivars grown at 7 locations in Italy in 2016-17 using a randomized block design with 3 replications. Cultivars are divided into 8 geographical groups, according to their macro-areas of provenance (Po Valley, Adriatic coast, Rome, Central Apennines, South Apennines, Adriatic-Ionic, Sicily and Sardinia). Samples belonging to group 3 (Rome) were not available. Details about all the analyzed samples are reported in Table 1. Wheat cultivars are listed in Table 2.
------------------------------Insert Table 1 approx. here------------------------------
------------------------------Insert Table 2 approx. here------------------------------
2.2 Spectra acquisition
NIR spectra were collected by means of a Nicolet 6700 FT-NIR instrument (Thermo Scientific Inc., Madison, WI) equipped with an InGaAs detector and an integrating sphere (Thermo Scientific Inc., Madison, WI); signals were collected in reflectance mode in the range from 4000 to 10000 cm−1 with a nominal resolution of 4 cm−1. Spectra were acquired directly on semolina samples (avoiding any physical-chemical pretreatment), which were put in glass vials and then placed on the window of the integrating sphere. For each of the 3 replicated plot samples, spectra were collected in duplicate. Finally, signals were acquired by the OMNIC software (Thermo Scientific Inc., Madison, WI) and exported to MATLAB 2015b (The Mathworks, Natick, MA) for the subsequent analysis. Prior to the creation of any classification model, spectra were converted into pseudo-absorbance (log(1/R)) and replicates were averaged, leading to a data matrix of dimensions 597×3112.
2.3 W-index measurement
Wheat kernels were milled using a Bühler MLU 202 laboratory mill (Bühler; Uzwil, Switzerland) to obtain semolina samples with a particle size lower than 500 µm, as established by AACC International (2000), Method 26-41. Alveographic parameters (P, L and W) of the milled samples were analyzed by the Chopin Alveograph (Chopin, Villeneuve La Garenne, France) according to UNI 10453 (1995). In the present work, only the W-index (× 10−4 J) has been considered, and the values (listed following the same order used in Table 2) are reported in the supplementary material.
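The spectral pretreatment described in Section 2.2 (conversion of reflectance into pseudo-absorbance, followed by averaging of the duplicate spectra) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code: the function names and the toy reflectance values are hypothetical, and a base-10 logarithm is assumed for log(1/R).

```python
import math

def pseudo_absorbance(reflectance):
    """Convert one reflectance spectrum R into log10(1/R), point by point."""
    return [math.log10(1.0 / r) for r in reflectance]

def average_replicates(spectra):
    """Average replicate spectra (equal-length lists) variable by variable."""
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]

# Toy duplicate reflectance spectra for one sample (hypothetical values,
# three spectral variables instead of the 3112 used in the study).
replicate_a = [0.50, 0.25, 0.10]
replicate_b = [0.50, 0.20, 0.10]

# Convert each duplicate first, then average, following the order in the text.
absorbances = [pseudo_absorbance(r) for r in (replicate_a, replicate_b)]
sample_spectrum = average_replicates(absorbances)
```

Applied to every sample, the averaged spectra would be stacked row by row to give the 597×3112 matrix mentioned in the text.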
2.4 Sequential and Orthogonalized-Partial Least Squares-Linear Discriminant Analysis
Sequential and Orthogonalized-Partial Least Squares-Linear Discriminant Analysis (SO-PLS-LDA) (Biancolillo, Måge & Næs, 2015) is a data fusion classification method developed by combining Linear Discriminant Analysis (LDA) (Fisher, 1936) with the multi-block regression method called Sequential and Orthogonalized-Partial Least Squares (SO-PLS) (Biancolillo & Næs, 2019c; Næs, Tomic, Mevik & Martens, 2011). This approach exploits SO-PLS as a feature reduction step, in order to allow the calculation of LDA even when the variance/covariance matrix is ill-conditioned (Biancolillo et al, 2015). Considering two predictor blocks $\mathbf{X}$ and $\mathbf{Z}$, used to predict a response $\mathbf{Y}$, the algorithm can be summarized in the following 5 steps:
1) $\mathbf{X}$ is used to estimate $\mathbf{Y}$ by PLS. In this step, model parameters (i.e., the PLS regression coefficients), as well as the $\mathbf{X}$-scores ($\mathbf{T}_X$) and the $\mathbf{Y}$-residuals ($\mathbf{E}_Y$), are calculated.
2) $\mathbf{Z}$ is orthogonalized with respect to the $\mathbf{X}$-scores ($\mathbf{T}_X$), obtaining $\mathbf{Z}_{Orth}$. This step is “the core” of the method; it removes the redundancies between the predictors.
3) $\mathbf{Z}_{Orth}$ is used to estimate $\mathbf{E}_Y$ by PLS. Model parameters (PLS regression coefficients), as well as the $\mathbf{Z}_{Orth}$-scores ($\mathbf{T}_{Z_{Orth}}$), are calculated.
4) Calculation of the full predictive model. The general regression equation $\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{c}$ can be solved by summing the predictions from step 1 and step 3.
5) LDA can be applied either on the row-augmented scores ($\mathbf{T}_{SO} = [\mathbf{T}_X \; \mathbf{T}_{Z_{Orth}}]$) or on the $\mathbf{Y}$ estimated in step 4.
In a classification context, the $\mathbf{Y}$ response is the so-called Dummy matrix, a binary structure encoding the class information; examples of the generation of this matrix can be found in (Biancolillo & Marini, 2018b). A Matlab function for the calculation of SO-PLS models can be freely downloaded at https://www.chem.uniroma1.it/romechemometrics/research/algorithms/so-pls/.
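The five steps above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions, not the Matlab function referenced in the text: the helper names (`pls_scores`, `project`, `so_pls_scores`, `lda_fit_predict`), the synthetic data and the numbers of components are all hypothetical, the PLS step is a bare-bones deflation algorithm, and LDA is written with equal class priors and evaluated on the training samples only.

```python
import numpy as np

def pls_scores(X, Y, n_comp):
    """Minimal PLS2 on centered data: returns mutually orthogonal X-scores."""
    X = X.copy()
    T = np.zeros((X.shape[0], n_comp))
    for a in range(n_comp):
        # Weight vector: dominant left singular vector of X'Y.
        w = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]
        t = X @ w
        p = X.T @ t / (t @ t)
        X -= np.outer(t, p)              # deflate X before the next component
        T[:, a] = t
    return T

def project(T, M):
    """Least-squares fit of M on the scores T (fitted values)."""
    coef, *_ = np.linalg.lstsq(T, M, rcond=None)
    return T @ coef

def so_pls_scores(X, Z, Y, n_x, n_z):
    Xc, Zc, Yc = X - X.mean(0), Z - Z.mean(0), Y - Y.mean(0)
    T_x = pls_scores(Xc, Yc, n_x)        # step 1: PLS of Y on X
    E_y = Yc - project(T_x, Yc)          # Y-residuals after the X block
    Z_orth = Zc - project(T_x, Zc)       # step 2: orthogonalize Z w.r.t. T_x
    T_z = pls_scores(Z_orth, E_y, n_z)   # step 3: PLS of residuals on Z_orth
    return np.hstack([T_x, T_z]), T_x, Z_orth

def lda_fit_predict(T, labels):
    """LDA with equal priors, fitted and evaluated on the same samples."""
    classes = np.unique(labels)
    means = np.array([T[labels == k].mean(0) for k in classes])
    resid = T - means[np.searchsorted(classes, labels)]
    S_inv = np.linalg.pinv(resid.T @ resid / (len(T) - len(classes)))
    scores = T @ S_inv @ means.T - 0.5 * np.sum(means @ S_inv * means, axis=1)
    return classes[np.argmax(scores, axis=1)]

# Synthetic stand-ins for the two blocks (3 classes, 30 samples each).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 30)
Y = np.eye(3)[labels]                                   # dummy matrix
X = rng.normal(size=(90, 20)) + 3.0 * Y @ rng.normal(size=(3, 20))
Z = rng.normal(size=(90, 1)) + 2.0 * labels[:, None]

T_so, T_x, Z_orth = so_pls_scores(X, Z, Y, n_x=2, n_z=1)
predicted = lda_fit_predict(T_so, labels)               # step 5 on the scores
accuracy = np.mean(predicted == labels)
```

The sketch applies LDA to the concatenated scores (the first option in step 5); in a real application the numbers of components would be selected by cross-validation and the model assessed on an external test set.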
3. Results and discussion
Raw NIR spectra of all the samples are presented in Figure 1.
------------------------------Insert Figure 1 approx. here------------------------------
Prior to the classification, data were imported into MATLAB 2015b (The Mathworks, Natick, MA). As mentioned above, the aim of the present work is to develop an analytical methodology, based on the analysis of NIR spectra and W indices, aimed at discriminating durum wheat semolina according to its geographical origin. In order to externally validate the classification model, it is necessary to divide the available samples into a training and a test set, which is not a trivial task in a multi-block framework. In fact, when dividing objects into the two sets, it is necessary to take into account the sample variability present in all data blocks. Consequently, in order to create calibration and validation sets which could be a good representation of all the categories (i.e., which would cover
as much as possible the sample space) accounting for both data blocks under study, the following procedure has been pursued. NIR signals and W-indices have been divided into separate data matrices according to the class of the samples they were collected on, obtaining seven matrices for NIR spectra ($\mathbf{X}_{NIR,1}$, $\mathbf{X}_{NIR,2}$, $\mathbf{X}_{NIR,3}$, $\mathbf{X}_{NIR,4}$, $\mathbf{X}_{NIR,5}$, $\mathbf{X}_{NIR,6}$ and $\mathbf{X}_{NIR,7}$) and seven for W-indices ($\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{W}_3$, $\mathbf{W}_4$, $\mathbf{W}_5$, $\mathbf{W}_6$ and $\mathbf{W}_7$), one per category. Then, matrices are row-augmented class by class (for instance, for Class Po, $\mathbf{C}_1 = [\mathbf{X}_{NIR,1} \; \mathbf{W}_1]$) and Principal Component Analysis (PCA) (Jolliffe, 2011) is calculated on each concatenated data matrix. Afterwards, the first 5 Principal Components (PCs) from each model are extracted and samples are divided by the Duplex algorithm (Snee, 1977). A graphical representation of this procedure applied to a generic Class 1 can be found in Figure 2.
------------------------------Insert Figure 2 approx. here------------------------------
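The class-wise splitting procedure described above (concatenate the two blocks, compute PCA, run Duplex on the first 5 PC scores) can be sketched as follows. This is an illustrative Python/NumPy sketch, not the authors' implementation: the function names, the random stand-in data and the stopping rule (alternate until the test set holds `n_test` samples, then assign the rest to training) are assumptions, and the text does not say whether the blocks were scaled before concatenation, so none is applied here.

```python
import numpy as np

def pca_scores(C, n_pc):
    """Scores on the first n_pc principal components of mean-centered C."""
    Cc = C - C.mean(axis=0)
    U, sing, _ = np.linalg.svd(Cc, full_matrices=False)
    return U[:, :n_pc] * sing[:n_pc]

def duplex_split(scores, n_test):
    """Duplex-style split of the rows of `scores`; returns (train, test)."""
    D = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=2)
    remaining = list(range(len(scores)))
    sets = [[], []]                      # sets[0] = training, sets[1] = test

    # Seed each set with the two mutually farthest remaining samples.
    for group in sets:
        sub = D[np.ix_(remaining, remaining)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pair = [remaining[i], remaining[j]]
        group.extend(pair)
        remaining = [r for r in remaining if r not in pair]

    # Alternately add the sample farthest from its nearest neighbour in the set.
    turn = 0
    while remaining and len(sets[1]) < n_test:
        current = sets[turn]
        min_dist = D[np.ix_(remaining, current)].min(axis=1)
        chosen = remaining[int(np.argmax(min_dist))]
        current.append(chosen)
        remaining.remove(chosen)
        turn = 1 - turn
    sets[0].extend(remaining)            # leftovers complete the training set
    return sorted(sets[0]), sorted(sets[1])

# Random stand-ins for one class with 90 samples (hypothetical data).
rng = np.random.default_rng(1)
X_nir_1 = rng.normal(size=(90, 50))      # class-1 NIR spectra (50 variables)
W_1 = rng.normal(size=(90, 1))           # class-1 W-indices
C_1 = np.hstack([X_nir_1, W_1])          # concatenated class matrix

train_idx, test_idx = duplex_split(pca_scores(C_1, 5), n_test=20)
```

With 90 samples and `n_test=20`, this yields 70 training and 20 test objects, the sizes reported for the classes with 90 samples.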
Eventually, all the single-class training and test sub-sets are collected together, ending up with a calibration set consisting of 457 samples (70 belonging to Class Po, 70 to Class Adriatic, 61 to Class Central, 61 to Class South, 70 to Class Adriatic-Ionic, 67 to Class Sicily and 58 to Class Sardinia), while the validation set is made of 20 samples for each category (for a total of 140 objects).
Before adopting a multi-block approach by building the SO-PLS model, the possibility of using the individual blocks to trace the available samples was also tested. In particular, given the different nature of the blocks, PLS-LDA was used to build a classification model based on NIR spectra, while LDA was directly applied to the vector of W-indices. The PLS-LDA model built on the NIR data provided good classification rates in external validation; in fact, it led to an overall correct prediction rate on the test set slightly lower than 98% (3 samples misclassified). On the other hand, when the model was built on the W-index only, predictions were not satisfactory: LDA misclassified about 30% of the test samples. The SO-PLS model was calculated in a 7-fold cross-validation procedure. NIR spectra were used as the first input block and W indices as the second one; both of them were mean centered (MC) in order to correct possible spurious variance due to offset differences attributable to the average trend. The SO-PLS calibration model provided 100% correct classification (in cross-validation) for all the seven categories under study. Eventually, the calibration model was applied to the mean centered validation set and, also in this case, 100% of samples were correctly classified. In Figure 3, samples are projected onto the canonical variate space.
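The set sizes and the single-block classification rate quoted above can be checked with a short Python snippet. The dictionary simply restates the per-class training counts from the text; the variable and function names are illustrative.

```python
# Training-set sizes per class, as reported in the text.
training = {"Po": 70, "Adriatic": 70, "Central": 61, "South": 61,
            "Adriatic-Ionic": 70, "Sicily": 67, "Sardinia": 58}
n_test_per_class = 20

n_train = sum(training.values())                 # calibration objects
n_test = n_test_per_class * len(training)        # validation objects
n_total = n_train + n_test                       # all available samples

def correct_rate(n_samples, n_misclassified):
    """Percentage of correctly classified samples."""
    return 100.0 * (n_samples - n_misclassified) / n_samples

# PLS-LDA on NIR only: 3 test samples misclassified.
nir_only_rate = correct_rate(n_test, 3)
```

The totals come out as 457 training and 140 test objects (597 overall), and 3 errors out of 140 correspond to roughly 97.9% correct predictions, consistent with "slightly lower than 98%".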
As expected, the seven groups appear well divided, except for a slight overlap between Class Adriatic-Coast (blue squares) and Class Adriatic-Ionic (black stars), which fall at similar values for the first two canonical variates but are distinguishable along the third one; this observation is not surprising considering that these categories include grains grown very close to one another from the geographical point of view.
------------------------------Insert Figure 3 approx. here------------------------------
Additionally, in order to gain a deeper insight into the data set under study, Variable Importance in Projection (VIP) (Wold, Johansson, & Cocchi, 1993) indices were calculated (following the embedded procedure suggested in (Biancolillo, Liland, Måge, Næs & Bro, 2016)), in order to understand which spectral bands gave the highest contribution in discriminating the groups. VIP coefficients weight the importance of each spectral variable in defining the LV subspace; the average of the squared VIP indices is 1, so this value can be assumed as a cut-off to define which spectral variables are the most significant. Starting from this assumption, all the variables presenting a VIP value higher than 1 are considered the most relevant for the class-belonging prediction. A graphical representation of VIP coefficients is reported in Figure 4. The red points correspond to spectral variables having VIP scores higher than one, while the black line is the average training-set spectrum.
------------------------------Insert Figure 4 approx. here------------------------------
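The statement that the squared VIP indices average to 1 follows directly from the usual single-block PLS definition of VIP, reported here as a reference sketch. The symbols are assumptions introduced for illustration ($p$ spectral variables, $A$ latent variables, $w_{ja}$ the weight of variable $j$ on component $a$, $\mathrm{SS}_a$ the sum of squares of $\mathbf{Y}$ explained by component $a$); the embedded multi-block procedure cited in the text may differ in its details.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} \mathrm{SS}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2}{\sum_{a=1}^{A} \mathrm{SS}_a}},
\qquad
\frac{1}{p} \sum_{j=1}^{p} \mathrm{VIP}_j^2
= \frac{\sum_{a=1}^{A} \mathrm{SS}_a \sum_{j=1}^{p} w_{ja}^2 / \lVert \mathbf{w}_a \rVert^2}{\sum_{a=1}^{A} \mathrm{SS}_a}
= 1
```

The last equality holds because $\sum_{j} w_{ja}^2 = \lVert \mathbf{w}_a \rVert^2$ for every component, which is why 1 is a natural threshold for "greater than average" importance.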
As reported in Figure 4, the most relevant spectral zones are:
From 4000 to ~5500 cm-1 where combination bands of N-H bonds, O-H bonds, C-H bonds and C=O bonds are present (Cocchi et al, 2005)
From ~6700 to ~7400 cm-1 where the first overtone of the O–H stretching is present (Zhao, Guo, Wei, & Zhang, 2013b)
From ~9800 to ~10000 cm-1, where N–H 2nd overtone associated with peptides and proteins and 2nd overtone of C–H stretch (methyl and methylene) of starches, lipids and/or proteins are present (Zhao, Guo, Wei, & Zhang, 2013b).
4. Conclusions
In conclusion, NIR spectroscopy and W-indices combined with SO-PLS-LDA in a multi-block strategy have proven to be a suitable methodology for the assessment of the geographical origin of Italian Triticum durum semolina. In fact, the proposed approach allowed tracing all the available samples, providing a 100% correct classification rate (in external validation).
Acknowledgements
The Authors thank C. Cecchini, E. Gosparini and F. Quaranta of CREA IT for technical assistance.
References
AACC International. (2000). Approved Methods of the American Association of Cereal Chemists. Method 26-41, 10th ed. The Association: St. Paul, MN.
Beleggia, R., Platani, C., Spano, G., Monteleone, M., & Cattivelli, L. (2009). Metabolic profiling and analysis of volatile composition of durum wheat semolina and pasta. Journal of Cereal Science, 49, 301-309.
Beltrami, D., Calestani, D., Maffini, M., Suman, M., Melegari, B., Zappettini, A., Zanotti, A., Zanotti, L., Casellato, U., Careri, M., & Mangia, A. (2011). Development of a combined SEM and ICP-MS approach for the qualitative and quantitative analyses of metal microparticles and sub-microparticles in food products. Analytical and Bioanalytical Chemistry, 401, 1401-1409.
Biancolillo, A., Bucci, R., Magrì, A.L., Magrì, A.D., & Marini, F. (2014). Data-fusion for multiplatform characterization of an Italian craft beer aimed at its authentication. Analytica Chimica Acta, 820, 23-31.
Biancolillo, A., Måge, I., & Næs, T. (2015). Combining SO-PLS and linear discriminant analysis for multi-block classification. Chemometrics and Intelligent Laboratory Systems, 141, 58–67.
Biancolillo, A., Liland, K.H., Måge, I., Næs, T., & Bro, R. (2016). Variable selection in multi-block regression. Chemometrics and Intelligent Laboratory Systems, 156, 89–101.
Biancolillo, A., Næs, T., Bro, R., & Måge, I. (2017). Extension of SO-PLS to multi-way arrays: SO-N-PLS. Chemometrics and Intelligent Laboratory Systems, 164, 113–126.
Biancolillo, A., De Luca, S., Bassi, S., Roudier, L., Bucci, R., Magrì, A.D., & Marini, F. (2018a). Authentication of an Italian PDO hazelnut (“Nocciola Romana”) by NIR spectroscopy. Environmental Science and Pollution Research, 25, 28780-28786.
Biancolillo, A., & Marini, F. (2018b). Chapter Four - Chemometrics Applied to Plant Spectral Analysis. In: J. Lopes, & C. Sousa (Eds.), Vibrational Spectroscopy for Plant Varieties and Cultivars Characterization, Comprehensive Analytical Chemistry (vol. 80, pp. 69-104). Amsterdam: Elsevier.
Biancolillo, A., Firmani, P., Bucci, R., Magrì, A.D., & Marini, F. (2019a). Determination of insect infestation on stored rice by near infrared (NIR) spectroscopy. Microchemical Journal, 145, 252-258.
Biancolillo, A., Boqué, R., Cocchi, M., & Marini, F. (2019b). Data Fusion strategies in food analysis. In: M. Cocchi (Ed.), Data Handling in Science and Technology, Vol 31. Amsterdam: Elsevier.
Biancolillo, A., & Næs, T. (2019c). The sequential and orthogonalised PLS regression (SO-PLS) for multi-block regression; theory, examples and extensions. In: M. Cocchi (Ed.), Data Handling in Science and Technology, Vol 31. Amsterdam: Elsevier.
Bietz, J.A., & Simpson, D.G. (1992). Electrophoresis and chromatography of wheat proteins: available methods, and procedures for statistical evaluation of the data. Journal of Chromatography, 624, 53-80.
Brescia, M.A., Di Martino, G., Guillou, C., Reniero, F., Sacco, A., & Serra, F. (2002). Differentiation of the geographical origin of durum wheat semolina samples on the basis of isotopic composition. Rapid Communications in Mass Spectrometry, 16, 2286-2290.
Burnouf, T., & Bietz, J.A. (1984). Reversed-Phase High-Performance Liquid Chromatography of Durum Wheat Gliadins: Relationships to Durum Wheat Quality. Journal of Cereal Science, 2, 3-14.
Cocchi, M., Corbellini, M., Foca, G., Lucisano, M., Pagani, M.A., Tassi, L., & Ulrici, A. (2005). Classification of bread wheat flours in different quality categories by a wavelet-based feature selection/classification algorithm on NIR spectra. Analytica Chimica Acta, 544, 100–107.
Consonni, R., & Cagliani, L.R. (2010). Nuclear Magnetic Resonance and Chemometrics to Assess Geographical Origin and Quality of Traditional Food Products. Advances in Food and Nutrition Research, 59, 87-165.
Eisenstecken, D., Stürz, B., Robatscher, P., Lozano, L., Zanella, A., & Oberhuber, M. (2019). The potential of near infrared spectroscopy (NIRS) to trace apple origin: Study on different cultivars and orchard elevations. Postharvest Biology and Technology, 147, 123-131.
Firmani, P., De Luca, S., Bucci, R., Marini, F., & Biancolillo, A. (2019a). Near Infrared (NIR) spectroscopy-based classification for the authentication of Darjeeling black tea. Food Control, 100, 292-299.
Firmani, P., Bucci, R., Marini, F., & Biancolillo, A. (2019b). Authentication of “Avola almonds” by near infrared (NIR) spectroscopy and chemometrics. Journal of Food Composition and Analysis, 82, 103235.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Fratianni, A., Irano, M., Panfili, G., & Acquistucci, R. (2005). Estimation of Color of Durum Wheat. Comparison of WSB, HPLC, and Reflectance Colorimeter Measurements. Journal of Agricultural and Food Chemistry, 53, 2373-2378.
Giannetti, V., Boccacci Mariani, M., Mannino, P., & Testani, E. (2014). Furosine and flavour compounds in durum wheat pasta produced under different manufacturing conditions: Multivariate chemometric characterization. Journal of Food Science and Technology, 56, 15-20.
Gonzáles-Martín, M.I., Moncada, G.W., Gonzáles-Pérez, C., San Martín, N.Z., López-Gonzáles, F., Ortega, I.L., & Hernández-Hierro, J.M. (2014). Chilean flour and wheat grain: Tracing their origin using near infrared spectroscopy and chemometrics. Food Chemistry, 145, 802-806.
Haddi, Z., Mabrouk, S., Bougrini, M., Tahri, K., Sghaier, K., Barhoumi, H., El Bari, N., Maaref, A., Jaffrezic-Renault, N., & Bouchikhi, B. (2014). E-Nose and e-Tongue combination for improved recognition of fruit juice samples. Food Chemistry, 150, 246-253.
Hohmann, M., Monakhova, Y., Erich, S., Christoph, N., Wachter, H., & Holzgrabe, U. (2015). Differentiation of Organically and Conventionally Grown Tomatoes by Chemometric Analysis of Combined Data from Proton Nuclear Magnetic Resonance and Mid-infrared Spectroscopy and Stable Isotope Analysis. Journal of Agricultural and Food Chemistry, 63, 9666-9675.
Jirsa, O., Hrušková, M., & Švec, I. (2008). Near-infrared prediction of milling and baking parameters of wheat varieties. Journal of Food Engineering, 87, 21-25.
Jolliffe, I. (2011). Principal Component Analysis. In: M. Lovric (Ed.), International Encyclopedia of Statistical Science. Heidelberg: Springer.
Longobardi, F., Sacco, D., Casiello, G., Ventrella, A., & Sacco, A. (2015). Characterization of the Geographical and Varietal Origin of Wheat and Bread by Means of Nuclear Magnetic Resonance (NMR), Isotope Ratio Mass Spectrometry (IRMS) Methods and Chemometrics: A Review. The Journal of Agricultural Science, 6, 126-136.
Mandato, S., Taliani, C.C., Ait-Kaddour, A., Ruiz, T., & Cuq, B. (2013). In-line monitoring of durum wheat semolina wet agglomeration by near infrared spectroscopy for different water supply conditions and water addition levels. Journal of Food Engineering, 119, 533-543.
Marti, A., Cecchini, C., D'Egidio, M.G., Dreisoerner, J., & Pagani, M.A. (2014). Characterization of Durum Wheat Semolina by Means of a Rapid Shear-Based Method. Cereal Chemistry, 91, 542-547.
McCarthy, P.K., Scanlon, B.F., Lumley, I.D., & Griffin, M. (1990). Detection and Quantification of Adulteration of Durum Wheat Flour by Flour from Common Wheat Using Reverse Phase HPLC. Journal of the Science of Food and Agriculture, 50, 211-226.
Næs, T., Tomic, O., Mevik, B.H., & Martens, H. (2011). Path modelling by sequential PLS regression. Journal of Chemometrics, 25, 28–40.
Pérez-Marín, D., Torres, I., Entrenas, J.A., Vega, M., & Sánchez, M.T. (2019). Pre-harvest screening on-vine of spinach quality and safety using NIRS technology. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 207, 242-250.
Silvestri, M., Elia, A., Bertelli, D., Salvatore, E., Durante, C., Li Vigni, M., Marchetti, A., & Cocchi, M. (2014). A mid level Data Fusion strategy for the varietal classification of Lambrusco P.D.O. Wines. Chemometrics and Intelligent Laboratory Systems, 137, 181-189.
Snee, R.D. (1977). Validation of regression models: Methods and examples. Technometrics, 19, 415–428.
UNI 10453 (1995). Grano duro e semole. Determinazione delle caratteristiche reologiche mediante alveografia. UNI, Ente Nazionale Italiano di Unificazione, Milan, Italy (In Italian).
Wold, S., Johansson, E., & Cocchi, M. (1993). PLS: Partial least squares projections to latent structures. In: H. Kubinyi (Ed.), 3D QSAR in drug design: Theory, methods and applications (pp. 523–550). Leiden: Kluwer/ESCOM Science Publisher.
Zhao, H., Guo, B., Wei, Y., & Zhang, B. (2013a). Multi-element composition of wheat grain and provenance soil and their potentialities as fingerprints of geographical origin. Journal of Cereal Science, 57, 391-397.
Zhao, H., Guo, B., Wei, Y., & Zhang, B. (2013b). Near infrared reflectance spectroscopy for the determination of the geographical origin of wheat. Food Chemistry, 138, 1902-1907.
Figure Captions
Fig. 1: Raw NIR spectra of semolina samples
Fig. 2: Graphical representation of the sets splitting procedure for a generic Class 1
Fig. 3: SO-PLS-LDA analysis: Samples projected onto the space of the first 2 canonical variates. Legend: Red circles: Class Po; Blue squares: Class Adriatic-Coast; Magenta triangles: Class Central; Cyan triangles: Class South; Black stars: Class Adriatic-Ionic; Green hexagrams: Class Sicily; Yellow triangles: Class Sardinia
Fig. 4: VIP analysis of NIR spectra. Legend: black line - average spectrum; red points - variables presenting VIP>1
Table 1: Description of the analysed semolina samples
Geographical origin   Class Name       Group number   Number of lots   Number of available samples
Po Valley             Po               1              30               90
Adriatic coast        Adriatic         2              30               90
Rome                  --               3              NOT AVAILABLE    NOT AVAILABLE
Central Apennines     Central          4              27               81
South Apennines       South            5              27               81
Adriatic-Ionic        Adriatic-Ionic   6              30               90
Sicily                Sicily           7              29               87
Sardinia              Sardinia         8              26               78
Total                                                                  597
Table 2: Investigated cultivars
Cultivars Group 1: 1. Achille; 2. Antalis; 3. Augusto; 4. Calò; 5. Claudio; 6. Core; 7. Daurur; 8. Diamante; 9. Duilio; 10. Dylan; 11. Ettore; 12. Furio Camillo; 13. Iride; 14. Kanakis; 15. Marakas; 16. Marco Aurelio; 17. Mario; 18. Monastir; 19. Obelix; 20. Odisseo; 21. Pigreco; 22. Ramirez; 23. Santo Graal; 24. Saragolla; 25. Secolo; 26. Simeto; 27. Solstizio; 28. Svevo; 29. Tirex; 30. Tito Flavio
Cultivars Group 2: 1. Achille; 2. Antalis; 3. Augusto; 4. Calò; 5. Claudio; 6. Core; 7. Daurur; 8. Diamante; 9. Duilio; 10. Dylan; 11. Ettore; 12. Furio Camillo; 13. Iride; 14. Kanakis; 15. Marakas; 16. Marco Aurelio; 17. Mario; 18. Monastir; 19. Obelix; 20. Odisseo; 21. Pigreco; 22. Ramirez; 23. Santo Graal; 24. Saragolla; 25. Secolo; 26. Simeto; 27. Solstizio; 28. Svevo; 29. Tirex; 30. Tito Flavio
Cultivars Group 4: 1. Achille; 2. Augusto; 3. Calò; 4. Claudio; 5. Core; 6. Daurur; 7. Duilio; 8. Ettore; 9. Furio Camillo; 10. Iride; 11. Kanakis; 12. Marakas; 13. Marco Aurelio; 14. Mario; 15. Monastir; 16. Obelix; 17. Odisseo; 18. Pigreco; 19. Ramirez; 20. Santo Graal; 21. Saragolla; 22. Secolo; 23. Simeto; 24. Solstizio; 25. Svevo; 26. Tirex; 27. Tito Flavio
Cultivars Group 5: 1. Acadur; 2. Alemanno; 3. Antalis; 4. Aureo; 5. Burgos; 6. Calò; 7. Claudio; 8. Core; 9. Duilio; 10. Egeo; 11. Ettore; 12. Furio Camillo; 13. Iride; 14. Kanakis; 15. Marakas; 16. Marco Aurelio; 17. Monastir; 18. Odisseo; 19. Pigreco; 20. Prospero; 21. Ramirez; 22. Secolo; 23. Simeto; 24. Svevo; 25. Teodorico; 26. Tirex; 27. Tito Flavio
Cultivars Group 6: 1. Acadur (2 lots); 2. Alemanno; 3. Antalis; 4. Aureo; 5. Burgos; 6. Calò; 7. Claudio; 8. Core; 9. Duilio; 10. Egeo; 11. Ettore; 12. Furio Camillo; 13. Iride; 14. Kanakis; 15. Marakas; 16. Marco Aurelio; 17. Monastir; 18. Odisseo; 19. Pigreco; 20. Prospero; 21. Ramirez; 22. Salgado; 23. Saragolla; 24. Secolo; 25. Simeto; 26. Svevo; 27. Teodorico; 28. Tirex; 29. Tito Flavio
Cultivars Group 7: 1. Acadur; 2. Alemanno; 3. Antalis; 4. Aureo; 5. Burgos; 6. Calò; 7. Claudio; 8. Core; 9. Duilio; 10. Egeo; 11. Ettore; 12. Furio Camillo; 13. Iride; 14. Kanakis; 15. Marakas; 16. Marco Aurelio; 17. Monastir; 18. Odisseo; 19. Pigreco; 20. Prospero; 21. Ramirez; 22. Salgado; 23. Saragolla; 24. Secolo; 25. Simeto; 26. Svevo; 27. Teodorico; 28. Tirex; 29. Tito Flavio
Cultivars Group 8: 1. Acadur; 2. Alemanno; 3. Antalis; 4. Aureo; 5. Calò; 6. Claudio; 7. Core; 8. Duilio; 9. Egeo; 10. Ettore; 11. Furio Camillo; 12. Iride; 13. Kanakis; 14. Karalis; 15. Marakas; 16. Marco Aurelio; 17. Monastir; 18. Odisseo; 19. Pigreco; 20. Ramirez; 21. Salgado; 22. Secolo; 23. Simeto; 24. Svevo; 25. Tirex; 26. Tito Flavio
Highlights
The quality of Durum wheat semolina is affected by the growing area
597 samples of semolina have been analysed by Near Infrared Spectroscopy (NIR)
Alveographic parameters have been measured on the same samples.
A data fusion approach (SO-PLS-LDA) was used to predict the origin of samples
All test samples (140 objects) were correctly classified