Vibrational Spectroscopy 107 (2020) 103040
Vibrational Spectroscopy journal homepage: www.elsevier.com/locate/vibspec
Authentication of Grappa (Italian grape marc spirit) by Mid and Near Infrared spectroscopies coupled with chemometrics
Stefano Schiavone (a), Benedetta Marchionni (a), Remo Bucci (a), Federico Marini (a), Alessandra Biancolillo (a, b, c, *)
(a) University of Rome “La Sapienza”, Department of Chemistry, Piazzale Aldo Moro 5, 00185, Rome, Italy
(b) ITAP, INRAe, Montpellier SupAgro, University of Montpellier, Montpellier, France
(c) University of L’Aquila, Via Vetoio, 67100 Coppito, L’Aquila, Italy
ARTICLE INFO
Keywords: IR; Data fusion; Adulteration; PLS-DA; SO-PLS; SO-CovSel
ABSTRACT
The aim of the present study is to authenticate Grappa spirits and to develop a non-destructive methodology for detecting possible adulteration of this product with less valuable spirits. Grappa is an Italian alcoholic drink obtained by distillation of grape marcs, which has recently received the Geographical Indication (GI) label. Because it is a high added-value product, it is important to develop methodologies for its authentication and for detecting possible frauds (e.g., adulterations). Where feasible, these goals should be achieved through non-destructive approaches, in order to minimize economic losses. Mid Infrared (MIR) and Near Infrared (NIR) spectroscopies were used for the authentication and characterization of the spirits under investigation. The work is conceptually divided into two parts. The first, centered on the authentication of Grappa, focuses on distinguishing it from other Italian distillates. The second aims at developing an analytical methodology to discriminate between pure and adulterated Grappas. Both classification problems were investigated by PLS-DA and by three multi-block strategies: Multi-Block Partial Least Squares (MB-PLS), Sequential and Orthogonalized Partial Least Squares (SO-PLS) and Sequential and Orthogonalized Covariance Selection (SO-CovSel). The multi-block strategies were included to test whether a data-fusion approach would improve the classification rates. The best predictions were provided by the multi-block strategies. In particular, SO-PLS-LDA and SO-CovSel-LDA provided 100 % correct classification of the test samples when applied to discriminate pure from adulterated samples, suggesting that these methodologies are suitable for the proposed purpose.
1. Introduction
Grappa is an alcoholic beverage obtained by distillation of the grape marcs left over from the vinification process. The name "Grappa" is exclusively reserved for spirits produced in Italy (from Italian pomace) following a strictly regulated production process. In 2011, this spirit obtained the Geographical Indication label. As a consequence, liquors obtained by the same process outside Italy cannot use this name; they usually carry other typical denominations (e.g., Ouzo in Greece). As a high added-value product, Grappa is subject to adulteration (e.g., dilution with spirits of lower commercial value) and to counterfeiting (e.g., a lower quality spirit sold as Grappa). Consequently, checking its authenticity is a relevant issue, and different methods have been proposed for this purpose. In general, spirits are characterized by gas chromatography or isotope ratio measurements [1,2]; for marc liquors, different approaches based on NMR have been proposed [3,4]. Nevertheless, these methodologies present two main drawbacks: they are time-consuming and destructive. For this reason, the present paper proposes a non-destructive method for the authentication of Grappa spirits; in particular, MIR and NIR spectroscopy combined with classification approaches are tested for this purpose. So far, these spectroscopic techniques have been applied to the classification of fruit [5] and sugarcane spirits [6], but not to Grappa.
The work is divided into two parts. The first discusses a methodology for distinguishing Grappa from other fruit/cereal spirits, while the second focuses on detecting Grappas adulterated with spirits of lower market value. For the first part, pure Grappas and other Italian liquors obtained by distillation of fruits and cereals were analyzed by NIR and MIR spectroscopy. The distinction between Italian marc spirits and other distillates was then investigated by means of classification approaches.
* Corresponding author at: Department of Physical and Chemical Sciences, University of L’Aquila, Via Vetoio, I-67100 L’Aquila, Italy. E-mail address: [email protected] (A. Biancolillo).
https://doi.org/10.1016/j.vibspec.2020.103040 Received 9 December 2019; Received in revised form 1 February 2020; Accepted 15 February 2020 Available online 19 February 2020 0924-2031/ © 2020 Elsevier B.V. All rights reserved.
ZnSe, working between 400 and 4000 cm−1 (nominal resolution 4 cm−1). Each spirit/mixture was placed on the ATR crystal using a glass pipette. Two replicate measurements were collected for each sample. Each spirit/mixture was spread as a film on the crystal and its spectrum was collected; then, after cleaning the ATR crystal surface, a new aliquot of the same sample was placed on the crystal and measured again. Between samples, the ATR crystal was cleaned with isopropanol and distilled water and then gently dried with soft tissues. MIR spectra were collected and exported with the software Spectra v. 1.50. Both the MIR instrument and the software are produced by Perkin-Elmer (San Jose, CA). All models were built on the spectra averaged over the replicates.
Similarly, in the second part of the present work, mixtures of Grappa and Vodka were prepared to mimic adulterated samples. These mixtures were analyzed by MIR and NIR spectroscopy, and classification models were then built to distinguish pure samples from adulterated ones (i.e., Grappa–Vodka mixtures). In general, classification methods are widely used for the authentication of foodstuffs, and they have been successfully applied to the classification of alcoholic beverages [7–19]. In the present work, four chemometric methods were used: one based on the individual analysis of each data block, and three data-fusion strategies. The first is Partial Least Squares Discriminant Analysis (PLS-DA) [20,21]. The other three are multi-block methods combined with Linear Discriminant Analysis (LDA): Multi-Block Partial Least Squares (MB-PLS) [22,23], Sequential and Orthogonalized Partial Least Squares (SO-PLS) [24] and Sequential and Orthogonalized Covariance Selection (SO-CovSel) [25]. These methods were used to achieve two aims: first, to discriminate between Grappa and other Italian fruit/cereal spirits; second, to detect adulterated samples of Grappa. These four approaches were chosen because they are often combined with NIR and MIR spectroscopies for the authentication and characterization of high added-value food products. In particular, several authors have demonstrated the potential of data-fusion strategies for analyzing multi-platform data [26–30]. They have also shown that these strategies often outperform the analysis of individual data blocks.
2.2.2. Near infrared spectroscopy (NIR)
NIR spectra were acquired with a Nicolet 6700 FT-NIR spectrometer equipped with an integrating sphere, in the spectral range between 10,000 and 4000 cm−1 (nominal resolution 4 cm−1). Samples were poured into a cylindrical glass vial (inner diameter 19 mm, height 26 mm) placed over the sphere window. Two replicate measurements were collected on each sample, by measuring four different aliquots of the same spirit/mixture in the same (cleaned) glass vial. Between measurements, glass vials were washed with soap and water, distilled water, and acetone. Spectra were exported by means of the OMNIC software. The FT-NIR instrument, the sphere and the software are Thermo Scientific Inc. products (Madison, WI). All models were built on the spectra averaged over the replicates.
2. Material and methods
2.1. Samples
Seventy-six (76) samples of different Italian spirits were available for the analysis: 59 grappas and 17 other alcoholic beverages obtained by distillation of cereals or fruits (pears, apples or berries). All samples were collected directly from the producers and came from two regions of northern Italy (Veneto and Trentino Alto Adige); more details are reported in Table 1. As mentioned, the second part of the work investigated the possibility of distinguishing between pure and adulterated grappa spirits. To this aim, 36 different grappas were adulterated with two different vodkas of lower market value, obtaining thirty-six Grappa–Vodka mixtures. The amount of adulterant (% v/v) in each mixture is reported in Table 2, where the two adulterants are indicated as A (vodka A) and B (vodka B). Both vodkas were purchased from a well-known supermarket chain in Italy.
2.3. Chemometric methods
In the present study, MIR and NIR spectra were collected on the pure spirits and on the mixtures. First, the individual data matrices were used to build PLS-DA models. Then, the matrices were analyzed simultaneously by three multi-block approaches: MB-PLS-LDA, SO-PLS-LDA and SO-CovSel-LDA. The applied methods are briefly described in the sub-sections below. All calculations were run in Matlab 2015b (The MathWorks Inc., Natick, MA) using in-house routines and functions.
2.3.1. PLS-DA
Partial Least Squares-Discriminant Analysis (PLS-DA) [20,21] is a discriminant classifier which, unlike linear discriminant analysis, can cope with ill-conditioned data matrices (e.g., spectra with many more, highly correlated, variables than samples). Briefly, the method transforms a classification problem into a regression one. A predictor block is used to estimate (by PLS) a binary response, called dummy Y, which encodes the class information [29]. More details on the algorithm can be found in [20,21,31–33]. Once the calibration model is built, unknown samples can be assigned to categories. In the present study, this is done following the Bayesian approach proposed by Perez and collaborators [34].
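The PLS-DA scheme described above (dummy y, PLS regression, probabilistic class assignment) can be sketched in Python/NumPy for the two-class case. This is a simplified illustration under stated assumptions, not the authors' Matlab routines. PLS is computed with the NIPALS PLS1 algorithm. The Bayesian assignment is approximated by fitting one Gaussian to the fitted y values of each calibration class and assuming equal priors, in the spirit of [34]. All function names are hypothetical.

```python
import numpy as np


def pls1(X, y, n_lv):
    """NIPALS PLS1 on column-centred X (n x p) and centred y (n,)."""
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    n, p = X.shape
    W, P = np.zeros((p, n_lv)), np.zeros((p, n_lv))
    T, q = np.zeros((n, n_lv)), np.zeros(n_lv)
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)            # unit-length weight vector
        t = X @ w                         # scores of latent variable a
        tt = t @ t
        P[:, a] = X.T @ t / tt            # X-loadings
        q[a] = (y @ t) / tt               # y-loading
        X -= np.outer(t, P[:, a])         # deflate X
        y -= q[a] * t                     # deflate y
        W[:, a], T[:, a] = w, t
    B = W @ np.linalg.solve(P.T @ W, q)   # regression vector for centred X
    return B, W, T, q


def log_gauss(v, mu, sd):
    """Log-density of a univariate Gaussian."""
    return -0.5 * ((v - mu) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))


def plsda_fit(X, labels, n_lv):
    """Two-class PLS-DA; labels are 0/1 and the dummy y is 1 for class 1."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    y = (labels == 1).astype(float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    B = pls1(X - x_mean, y - y_mean, n_lv)[0]
    y_fit = (X - x_mean) @ B + y_mean
    # one Gaussian per class on the fitted dummy response (simplifying assumption)
    stats = {c: (y_fit[labels == c].mean(), y_fit[labels == c].std(ddof=1))
             for c in (0, 1)}
    return x_mean, y_mean, B, stats


def plsda_predict(model, X):
    x_mean, y_mean, B, stats = model
    y_hat = (np.asarray(X, dtype=float) - x_mean) @ B + y_mean
    # equal priors: assign each sample to the class with the larger log-density
    loglik = np.column_stack([log_gauss(y_hat, *stats[c]) for c in (0, 1)])
    return loglik.argmax(axis=1)
```

In practice, n_lv would be chosen by cross-validation (7-fold in this work), and new samples must be preprocessed exactly like the calibration set before calling `plsda_predict`.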
2.2. Instrumental apparatus
Each sample (either pure spirit or mixture) was analyzed by MIR and NIR spectroscopy; the operating procedures applied for collecting the spectra are described in the sections below.
2.2.1. Mid infrared spectroscopy (MIR)
MIR spectra were collected with a Perkin Elmer 1600 Series FT-IR spectrometer equipped with a Globar source, a DTGS detector and an ATR cell in
2.3.2. Data fusion
As mentioned, multi-platform data sets have been shown to be better handled by multi-block approaches than by individual models built on each block. For this reason, data-fusion approaches aimed at extracting information simultaneously from MIR and NIR spectra were pursued. The following sub-sections describe the procedures applied to calculate the multi-block classification models. They consider the case where two predictor blocks, X (N × J) and Z (N × K), are used to predict the response Y (N × G).
Table 1 Pure distillates: type of spirit and origin.
Type of spirit | N. samples | Origin: Trentino Alto Adige | Origin: Veneto
Grappa | 59 | 30 | 29
Distillate of pear | 8 | 6 | 2
Distillate of cereal | 4 | 3 | 1
Distillate of apples | 3 | – | 3
Distillate of berries | 2 | – | 2
Tot | 76 | 39 | 37
Table 2 Adulterated Grappa samples: amount of adulterant (% volume/volume) and type (A: vodka A; B: vodka B).
# | Adulterant % (v/v) | Type | # | Adulterant % (v/v) | Type | # | Adulterant % (v/v) | Type | # | Adulterant % (v/v) | Type
1 | 1% | A | 10 | 4% | A | 19 | 7% | A | 28 | 10% | A
2 | 1% | B | 11 | 4% | B | 20 | 7% | B | 29 | 10% | B
3 | 1% | A | 12 | 4% | A | 21 | 7% | A | 30 | 10% | A
4 | 2% | B | 13 | 5% | B | 22 | 8% | B | 31 | 15% | B
5 | 2% | A | 14 | 5% | A | 23 | 8% | A | 32 | 15% | A
6 | 2% | B | 15 | 5% | B | 24 | 8% | B | 33 | 15% | B
7 | 3% | A | 16 | 6% | A | 25 | 9% | A | 34 | 20% | A
8 | 3% | B | 17 | 6% | B | 26 | 9% | B | 35 | 20% | B
9 | 3% | A | 18 | 4% | A | 27 | 9% | A | 36 | 20% | A
residuals E_Y and the regression coefficients B).
4) The second input block, Z, is orthogonalized with respect to X_sel (obtaining Z_Orth).
5) Covariance Selection is applied on Z_Orth, in order to select a fixed number of variables. The selected features are organized in a matrix (Z_Orth,sel).
6) Z_Orth,sel is used to predict the Y residuals E_Y (from step 3) by ordinary least squares (obtaining the regression coefficients C).
7) The full predictive model is calculated by summing up the results from steps 3 and 6:
$$\hat{\mathbf{Y}} = \mathbf{X}_{sel}\mathbf{B} + \mathbf{Z}_{Orth,sel}\mathbf{C} \qquad (2)$$
2.3.2.1. Multi-Block-PLS (MB-PLS)
Different algorithms have been proposed for MB-PLS [22,23,26]; the one adopted in the present work is that proposed by Qin et al. [27]. Briefly, the procedure can be summarized in the following steps:
1) X and Z are block-scaled (obtaining X_n and Z_n).
2) X_n and Z_n are concatenated, obtaining X_C (X_C = [X_n Z_n]).
3) A standard PLS regression is calculated, fitting Y to X_C (obtaining the PLS scores T_C).
Finally, the classification model is obtained by applying LDA to the predicted Y or to the scores extracted from the PLS regression (T_C) [32].
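Below is a minimal NumPy sketch of MB-PLS steps 1–3. It assumes that block scaling means mean centring followed by division by the block's Frobenius norm, as stated for the MB-PLS-LDA models in the Results. It also uses a NIPALS PLS1 on a single dummy response (two-class case). LDA would then be fitted on the returned scores T_C (or on the predicted y). Function names are illustrative and do not come from the authors' toolbox.

```python
import numpy as np


def pls1(X, y, n_lv):
    """NIPALS PLS1 on column-centred X (n x p) and centred y (n,)."""
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    n, p = X.shape
    W, P = np.zeros((p, n_lv)), np.zeros((p, n_lv))
    T, q = np.zeros((n, n_lv)), np.zeros(n_lv)
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = (y @ t) / tt
        X -= np.outer(t, P[:, a])         # deflate X
        y -= q[a] * t                     # deflate y
        W[:, a], T[:, a] = w, t
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, W, T, q


def block_scale(M):
    """Step 1: mean-centre a block and divide it by its Frobenius norm."""
    M = np.asarray(M, dtype=float)
    Mc = M - M.mean(axis=0)
    return Mc / np.linalg.norm(Mc)        # 2-D norm defaults to Frobenius


def mb_pls(X, Z, y, n_lv):
    """Steps 1-3: block scaling, concatenation, PLS on the concatenated matrix."""
    Xn, Zn = block_scale(X), block_scale(Z)
    XC = np.hstack([Xn, Zn])              # step 2: XC = [Xn Zn]
    y = np.asarray(y, dtype=float)
    B, _, TC, q = pls1(XC, y - y.mean(), n_lv)   # step 3
    y_hat = XC @ B + y.mean()
    return y_hat, TC, q, XC, B
```

Because each block has unit Frobenius norm after step 1, a block with many more variables (e.g., a full NIR spectrum) cannot dominate the concatenated matrix just because of its size.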
If a classification model is required, an additional step must be added to the previous list:
2.3.2.2. Sequential and orthogonalized-PLS (SO-PLS)
Sequential and Orthogonalized-PLS (SO-PLS) [24] is a multi-block regression method in which information is extracted sequentially from the data blocks. For the two-predictor-block case introduced above, the algorithm consists of four main steps:
1) LDA is applied on the predicted Y.
A set of Matlab functions suitable for creating (and validating) SO-CovSel models is freely downloadable at https://www.chem.uniroma1.it/romechemometrics/research/algorithms.
1) Y is fitted to X by PLS (obtaining the model parameters, including the scores T_X).
2) Z is orthogonalized with respect to the scores extracted from the first PLS model (obtaining Z_Orth).
3) The residuals from step 1 are fitted to Z_Orth by PLS (obtaining the model parameters, including the scores T_ZOrth).
4) The full predictive model is calculated by simply summing up the predictions from steps 1 and 3.
3. Results
3.1. Distinction between Grappa and other spirits
To externally validate the predictive models, samples were divided into a training and a test set by the Duplex algorithm [38]. The calibration set comprised 58 samples (46 belonging to Class Grappa and 12 to Class “Other Spirits”). The validation set comprised 18 objects (13 belonging to Class Grappa and 5 to Class “Other Spirits”). Before building any classification model, spectra were pretreated with different preprocessing approaches, to remove spurious variability possibly present in the data (mainly due to scattering). The following pretreatments, and their combinations, were tested:
- first and second derivatives calculated by the Savitzky–Golay method [39–41] (2nd order polynomial and 19-point window for the 1st derivative; 3rd order polynomial and 19-point window for the 2nd derivative);
- standard normal variate (SNV) [42,43].
The optimal pretreatment and the number of latent variables (LVs) of the calibration models were chosen by inspecting the classification errors calculated by 7-fold cross-validation.
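The pretreatments listed above can be sketched in NumPy as follows. This is an illustration, not the authors' implementation. The Savitzky–Golay filter is built from its least-squares definition: a polynomial is fitted in a sliding window and the derivative is taken at the window centre. Edges are handled by simply repeating the end points, which is an assumption; dedicated implementations such as SciPy's `savgol_filter` offer other edge modes. Derivatives are expressed per data point rather than per cm−1.

```python
import math

import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def savgol_coeffs(window, polyorder, deriv):
    """Least-squares Savitzky-Golay coefficients (window must be odd)."""
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, polyorder + 1, increasing=True)   # columns x^0 ... x^polyorder
    # row `deriv` of pinv(A) estimates the x^deriv coefficient of the local
    # polynomial; the deriv-th derivative at the centre is deriv! times it
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)


def savgol_derivative(X, window=19, polyorder=2, deriv=1):
    """Apply a Savitzky-Golay derivative to every row of X (unit point spacing)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    coeffs = savgol_coeffs(window, polyorder, deriv)
    half = window // 2
    out = np.empty_like(X)
    for i, row in enumerate(X):
        padded = np.pad(row, half, mode="edge")        # repeat end points (assumption)
        out[i] = np.correlate(padded, coeffs, mode="valid")
    return out
```

For example, `savgol_derivative(snv(X_mir), window=19, polyorder=2, deriv=1)` would give the “SNV + 1st derivative” pretreatment with the settings reported above. Here `X_mir` is a hypothetical array holding one MIR spectrum per row, and SNV is applied first.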
$$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{B} + \mathbf{Z}\mathbf{C} \qquad (1)$$
where B (J × L) and C (K × L) are the regression coefficients from steps 1 and 3, respectively. To create the classification model, the scores T_X and T_ZOrth are concatenated (T_CSO = [T_X T_ZOrth]), and LDA is then applied to the resulting matrix T_CSO. As discussed in [35,36], calculating LDA on the scores is equivalent to applying it to the predicted Y [30]. A set of Matlab functions suitable for creating (and validating) SO-PLS models is freely downloadable at https://www.chem.uniroma1.it/romechemometrics/research/algorithms.
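The sketch below implements the four SO-PLS steps in NumPy for a single dummy response (two-class case). It is illustrative only. PLS is a NIPALS PLS1 helper, and both blocks are mean centred. The numbers of latent variables per block (a_x, a_z) are given rather than cross-validated. The prediction is written in terms of Z_Orth; expanding the orthogonalization turns it into an expression linear in X and Z, i.e., the form of Eq. (1). The concatenated scores T_CSO returned at the end are what LDA would be fitted on. Function names are hypothetical.

```python
import numpy as np


def pls1(X, y, n_lv):
    """NIPALS PLS1 on column-centred X (n x p) and centred y (n,)."""
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    n, p = X.shape
    W, P = np.zeros((p, n_lv)), np.zeros((p, n_lv))
    T, q = np.zeros((n, n_lv)), np.zeros(n_lv)
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = (y @ t) / tt
        X -= np.outer(t, P[:, a])
        y -= q[a] * t
        W[:, a], T[:, a] = w, t
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, W, T, q


def so_pls(X, Z, y, a_x, a_z):
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    Zc = np.asarray(Z, dtype=float) - np.mean(Z, axis=0)
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    Bx, _, Tx, _ = pls1(Xc, yc, a_x)                            # step 1: y on X
    Z_orth = Zc - Tx @ np.linalg.lstsq(Tx, Zc, rcond=None)[0]   # step 2: Z ⟂ T_X
    resid = yc - Xc @ Bx
    Bz, _, Tz, _ = pls1(Z_orth, resid, a_z)                     # step 3: residuals on Z_orth
    y_hat = y.mean() + Xc @ Bx + Z_orth @ Bz                    # step 4: sum of predictions
    T_cso = np.hstack([Tx, Tz])                                 # scores passed to LDA
    return y_hat, T_cso, Tx, Tz, Z_orth
```

The orthogonalization in step 2 makes the scores of the second block uncorrelated with those of the first. Each block therefore only adds information not already explained, which is also why the block order matters and is chosen by cross-validation.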
2.3.2.3. Sequential and orthogonalized-covariance selection (SO-CovSel)
Sequential and Orthogonalized-Covariance Selection [25] is a sequential multi-block regression method. Like SO-PLS, it uses orthogonalization to remove redundancies between data blocks; unlike SO-PLS, it does not involve any transformation of the original variables into latent ones. Its algorithm is similar to that of SO-PLS, with the feature reduction provided by PLS replaced by Covariance Selection (CovSel) [37]. Briefly, it can be summarized by the following steps:
1) Predictor blocks are mean centered and the response Y is autoscaled.
2) Covariance Selection is applied on X, in order to select an appropriate number of variables. The selected variables are organized in a matrix (X_sel).
3) X_sel is fitted to Y by ordinary least squares (obtaining the Y
3.1.1. PLS-DA on MIR spectra
As mentioned in the Introduction, PLS-DA was used to distinguish samples of Grappa from the other spirits. Spectra were pretreated with each of the preprocessing approaches listed in Table 3, and a PLS-DA model was built on each pretreated data set. The number of latent variables and the cross-validated correct classification rates are reported in Table 3. The highest classification rates are obtained when MIR spectra are pretreated by SNV + 1st derivative. This model was used to predict the test set and led to correct classification rates of 69.2 % and 40.0 % for class “Grappa” and class “Other Spirits”, respectively.
Table 3 Grappa vs “Other Spirits”. PLS-DA models on MIR spectra: number of latent variables and (%) correct classification rate per class (CV).
Pretreatment | LVs | Class “Grappa” (CV) | Class “Other spirits” (CV)
Mean centering | 4 | 84.8 | 66.7
1st derivative (+MC) | 4 | 87.0 | 66.7
2nd derivative (+MC) | 2 | 84.8 | 66.7
SNV (+MC) | 4 | 89.1 | 58.3
SNV + 1st derivative (+MC) | 2 | 89.8 | 75.0
SNV + 2nd derivative (+MC) | 2 | 84.8 | 66.7
Table 4 Grappa vs “Other Spirits”. PLS-DA models on NIR spectra: number of latent variables and (%) correct classification rate per class (CV).
Pretreatment | LVs | Class “Grappa” (CV) | Class “Other spirits” (CV)
Mean centering | 1 | 60.9 | 66.7
1st derivative (+MC) | 1 | 60.9 | 66.7
2nd derivative (+MC) | 1 | 58.7 | 66.7
SNV (+MC) | 2 | 69.6 | 50.0
SNV + 1st derivative (+MC) | 1 | 63.0 | 66.7
SNV + 2nd derivative (+MC) | 1 | 67.4 | 50.0
absorptions (highlighted in green in Fig. 2) fall in the wavenumber range between 5500 and 6000 cm−1 (first overtone of C–H stretching) and around 7000 cm−1 (first overtone of O–H stretching).
3.1.1.1. VIP on MIR spectra
Variable Importance in Projection (VIP) indices [44] were calculated for two purposes: to test whether variable selection could improve predictions, and to identify which spectral variables contribute the most to the classification models. A new PLS-DA model was then calculated on the selected variables; it provided the same classification rates on the validation set. Fig. 1 displays the mean spectrum (black line), with variables having a VIP index higher than 1 highlighted in green. According to the figure, the most significant variables are mainly those corresponding to the absorption of the ester C–O bond (in the fingerprint region, between 1000 and 1100 cm−1).
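VIP indices can be computed from a fitted PLS1 model with the usual formula VIP_j = sqrt(J · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a), where SS_a = q_a² t_a′t_a is the y sum of squares explained by latent variable a. The NumPy sketch below includes a NIPALS PLS1 helper so that it runs on its own, and applies the VIP > 1 selection rule used here. It is not the authors' code, and the function names are hypothetical.

```python
import numpy as np


def pls1(X, y, n_lv):
    """NIPALS PLS1 on column-centred X (n x p) and centred y (n,)."""
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    n, p = X.shape
    W, P = np.zeros((p, n_lv)), np.zeros((p, n_lv))
    T, q = np.zeros((n, n_lv)), np.zeros(n_lv)
    for a in range(n_lv):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = (y @ t) / tt
        X -= np.outer(t, P[:, a])
        y -= q[a] * t
        W[:, a], T[:, a] = w, t
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, W, T, q


def vip(W, T, q):
    """VIP scores for a PLS1 model: W (p x A) weights, T (n x A) scores, q (A,)."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)            # y sum of squares explained per LV
    w2 = (W / np.linalg.norm(W, axis=0)) ** 2       # squared normalised weights
    return np.sqrt(p * (w2 @ ss) / ss.sum())
```

By construction, the squared VIP values average to 1 over all variables. This is why a VIP threshold of 1 is the customary cut-off: it keeps the variables with above-average importance, and the selected columns (`np.flatnonzero(v > 1)`) are then used to rebuild the PLS-DA model.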
3.1.2. PLS-DA on NIR spectra
As for the MIR data, the NIR spectra were divided into a training and a test set, pretreated, and used to build PLS-DA models. The number of latent variables and the correct classification rates per class are reported in Table 4. The results indicate that the most suitable pretreatment is SNV + 1st derivative. The model built on these pretreated data was applied to the test set (preprocessed accordingly). It led to a correct classification rate of 53.8 % for class “Grappa” (corresponding to 6 misclassified grappa samples) and of 40.0 % for the other spirits (i.e., 3 samples belonging to class “Other Spirits” were wrongly assigned).
3.1.3. Data-fusion
To split the MIR and NIR data consistently into a training and a test set, the two data matrices were concatenated (obtaining X_Conc = [X_nir X_mir]). A PCA model was then calculated on X_Conc, and the Duplex algorithm was run on the PCA scores. In this way, 58 samples were selected for the training set and 18 for the test set. The number of latent variables required by the calibration models was defined by inspecting the classification errors calculated by 7-fold cross-validation.
3.1.3.1. MB-PLS-LDA
MB-PLS-LDA models were created after data pretreatment, as described in Section 2.3.2.1; the number of latent variables and the correct classification rates per class are reported in Table 5. The blocks were preprocessed by the optimal approaches found in the PLS-DA analysis of the individual blocks (i.e., SNV + 1st derivative for both NIR and MIR). Each block was also divided by its Frobenius norm, to prevent the block with more variables from dominating the model. When applied to the test set, the calibration model provided correct classification rates of 61.5 % for class “Grappa” and 40.0 % for class “Other Spirits”, corresponding to 5 and 3 misclassified samples, respectively.
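The splitting procedure described above (concatenation, PCA, Duplex on the PCA scores) can be sketched as follows. Assumptions: the number of retained PCs (n_pc) is not reported for this step and is set here only for illustration; distances are Euclidean; and once the test set reaches its target size, all remaining samples go to the training set. The code follows the textbook form of Snee's Duplex algorithm rather than the authors' specific routine, and the function names are hypothetical.

```python
import numpy as np


def pca_scores(X, n_pc):
    """Scores on the first n_pc principal components of column-centred X (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pc] * s[:n_pc]


def duplex(S, n_test):
    """Duplex split of the rows of S; returns (train, test) index lists."""
    d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)  # Euclidean distances
    remaining = set(range(len(S)))

    def farthest_pair(idx):
        idx = sorted(idx)
        sub = d[np.ix_(idx, idx)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        return [idx[i], idx[j]]

    train = farthest_pair(remaining)          # two most distant samples -> training
    remaining -= set(train)
    test = farthest_pair(remaining)           # next two most distant -> test
    remaining -= set(test)
    while remaining:
        for group, cap in ((train, None), (test, n_test)):
            if not remaining or (cap is not None and len(group) >= cap):
                continue
            rem = sorted(remaining)
            # add the candidate farthest from the group (max of min distance)
            k = rem[int(np.argmax(d[np.ix_(rem, group)].min(axis=1)))]
            group.append(k)
            remaining.discard(k)
    return sorted(train), sorted(test)
```

With 76 samples and `n_test=18`, this yields the 58/18 split sizes reported above. Running Duplex on the scores of the concatenated matrix guarantees that the same samples form the test set for both spectral blocks.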
3.1.3.2. SO-PLS-LDA
After MB-PLS-LDA, a mid-level classification approach was tested. The SO-PLS-LDA model was calculated on data blocks pretreated by the preprocessing approaches found most suitable when building the PLS-DA models on the individual data sets; consequently, both spectral blocks were preprocessed by SNV and 1st derivative. Both possible block orders were tested. The best results (lowest cross-validated classification error) were provided by the model built using NIR as 1st input block and MIR as
3.1.2.1. VIP on NIR spectra
As for the MIR data, VIP indices were also calculated on the NIR spectra (Fig. 2), and a new PLS-DA model was built on the 330 selected spectral variables. This model provided slightly better predictions on the test set: it correctly classified 61.5 % of the grappa samples and 40 % of the other spirits. Inspecting the selected variables, it emerged that the relevant
Fig. 1. VIP on MIR spectra. Selected variables are highlighted in green.
Fig. 2. VIP on NIR spectra. Selected variables are highlighted in green.
by VIP on the individual blocks; in this case, however, variables between 3000 and 3500 cm−1 (ascribable to the stretching of O–H or N–H groups) were also selected.
Table 5 Grappa vs “Other Spirits”. MB-PLS-LDA: number of latent variables (LVs) and (%) correct classification rate per class, both in calibration (CV) and in prediction (test set).
LVs | Class “Grappa” (CV) | Class “Other spirits” (CV) | Class “Grappa” (test set) | Class “Other spirits” (test set)
4 | 80.4 | 66.7 | 61.5 | 40.0
3.1.3.3. SO-CovSel-LDA
The SO-CovSel-LDA model was calculated on the data preprocessed as described above for SO-PLS-LDA (Section 3.1.3.2). The order of the blocks and the number of variables to be selected were chosen by 7-fold cross-validation, taking the model with the lowest classification error on the training set. The final calibration model used NIR as 1st input block and MIR as 2nd one, selecting 1 variable from NIR and 5 from MIR. Applied to the test set, it gave the same results as SO-PLS-LDA, i.e., correct classification rates of 76.9 % for Class Grappa and 60 % for Class “Other Spirits”. Also in this case, the results can be represented as histograms of the canonical variate distributions (Fig. 4a for the calibration set and Fig. 4b for the validation set). Red and blue bars represent samples belonging to Class “Grappa” and Class “Other Spirits”, respectively. The two groups show a rather clear trend: grappa samples have strongly negative canonical variate scores, while the other spirits lie closer to zero. As mentioned above, SO-CovSel-LDA inherently provides information on the variables that contribute the most to the classification problem. The feature selected from the NIR block is the absorption at 5323 cm−1, attributable to the second overtone of C=O stretching or to the first overtone of O–H stretching. For MIR, the selected variables are those at 1054 cm−1 and 1038 cm−1, ascribable to the ester C–O bond, and some skeletal vibrations (features at 674 cm−1, 654 cm−1 and 650 cm−1). Although Covariance Selection is a much more parsimonious approach than VIP (in terms of the number of selected variables), these findings agree with the VIP analysis performed on the individual blocks.
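The SO-CovSel steps listed in the Methods can be sketched in NumPy as below: CovSel on X, ordinary least squares, orthogonalization of Z with respect to X_sel, CovSel on Z_Orth, least squares on the residuals and Eq. (2). Assumptions: CovSel is implemented in its basic greedy form (pick the variable with maximum squared covariance with Y, then deflate X and Y by projection); CovSel on Z_Orth is run against the residuals E_Y; and the numbers of selected variables (k_x, k_z) are given rather than chosen by cross-validation. LDA would then be fitted on the predicted Y. Function names are hypothetical.

```python
import numpy as np


def covsel(X, Y, n_var):
    """Greedy Covariance Selection: pick the column with the largest squared
    covariance with Y, then project X and Y off that column."""
    X = np.array(X, dtype=float)
    Y = np.array(Y, dtype=float).reshape(len(X), -1)
    selected = []
    for _ in range(n_var):
        crit = np.sum((X.T @ Y) ** 2, axis=1)
        crit[selected] = -np.inf               # never pick a column twice
        j = int(np.argmax(crit))
        selected.append(j)
        x = X[:, [j]]
        X -= x @ (x.T @ X) / (x.T @ x)         # deflate X
        Y -= x @ (x.T @ Y) / (x.T @ x)         # deflate Y
    return selected


def so_covsel(X, Z, Y, k_x, k_z):
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)   # step 1: centre blocks
    Zc = np.asarray(Z, dtype=float) - np.mean(Z, axis=0)
    Y = np.asarray(Y, dtype=float).reshape(len(Xc), -1)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)      # step 1: autoscale Y
    sel_x = covsel(Xc, Ys, k_x)                            # step 2
    X_sel = Xc[:, sel_x]
    B = np.linalg.lstsq(X_sel, Ys, rcond=None)[0]          # step 3: OLS
    E_Y = Ys - X_sel @ B
    Z_orth = Zc - X_sel @ np.linalg.lstsq(X_sel, Zc, rcond=None)[0]  # step 4
    sel_z = covsel(Z_orth, E_Y, k_z)                       # step 5 (against E_Y: assumption)
    Z_os = Z_orth[:, sel_z]
    C = np.linalg.lstsq(Z_os, E_Y, rcond=None)[0]          # step 6: OLS on residuals
    Y_hat = X_sel @ B + Z_os @ C                           # step 7, Eq. (2)
    return sel_x, sel_z, Y_hat, Ys
```

Because the model works on original variables, `sel_x` and `sel_z` directly give the wavenumbers used for classification. This is what allows the selected features to be interpreted spectroscopically, as done above.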
2nd one. A 7-fold cross-validation procedure was also used to assess the optimal complexity of the model, and the best solution was achieved by extracting 3 LVs from each block. The calibration model was then applied to the test set. It provided correct classification rates of 76.9 % for Class Grappa (corresponding to 3 misclassified samples) and 60 % for Class “Other Spirits” (i.e., two objects belonging to this category were wrongly assigned). A graphical representation of the results is displayed in Fig. 3, where histograms show the distribution of the canonical variates for the two groups (red for Class Grappa and blue for Class “Other Spirits”). Fig. 3a and Fig. 3b refer to the calibration and validation sets, respectively. The plots show a clear trend: grappa samples have strongly negative scores, whereas objects belonging to Class “Other Spirits” fall at positive or near-zero canonical variate scores. The test samples that do not follow this tendency are misclassified; Fig. 3b shows that this happens for 3 objects belonging to Class Grappa and 2 belonging to Class “Other Spirits”. To inspect which variables contributed the most to the SO-PLS-LDA model, VIP analysis was performed following the embedded strategy suggested in [45]. This led to good agreement with the relevant features highlighted
Fig. 3. Distribution of canonical variate scores for Class Grappa (in red) and Class “Other Spirits” (in blue). a) Calibration set; b) Validation set.
Fig. 4. Distribution of canonical variate scores for Class Grappa (in red) and Class “Other Spirits” (in blue). a) Calibration set; b) Validation set.
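The canonical variates plotted in Figs. 3 and 4 come from LDA applied to the multi-block scores. For two classes there is a single canonical variate. Below is a minimal Fisher LDA sketch that assumes equal priors, so the decision threshold lies at the midpoint of the projected class means. The sign of a canonical variate is arbitrary: this sketch orients the direction so that class 1 projects higher, whereas in Figs. 3 and 4 grappa samples fall at negative values. Function names are hypothetical.

```python
import numpy as np


def fisher_lda_fit(T, labels):
    """Two-class Fisher LDA on a score matrix T (n x a); labels are 0/1."""
    T = np.asarray(T, dtype=float)
    labels = np.asarray(labels)
    T0, T1 = T[labels == 0], T[labels == 1]
    m0, m1 = T0.mean(axis=0), T1.mean(axis=0)
    # pooled within-class scatter matrix
    Sw = (T0 - m0).T @ (T0 - m0) + (T1 - m1).T @ (T1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)      # discriminant (canonical) direction
    w /= np.linalg.norm(w)
    threshold = 0.5 * (m0 + m1) @ w       # midpoint rule: equal priors assumed
    return w, threshold


def canonical_variate(T, w):
    """Projection of the scores on the discriminant direction."""
    return np.asarray(T, dtype=float) @ w


def fisher_lda_predict(T, w, threshold):
    return (canonical_variate(T, w) > threshold).astype(int)
```

Histograms like those in Figs. 3 and 4 can then be obtained by binning `canonical_variate(T, w)` separately for each class, with calibration and validation scores projected on the same direction `w`.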
3.2. Distinction between pure and adulterated grappa spirits
As mentioned above, the aim of the present study was twofold: to test whether grappas can be distinguished from other distillates, and to build a classification model able to discriminate between pure and adulterated grappa spirits (i.e., between Class “Pure” and Class “Adulterated”). Consequently, adulterated samples were prepared as described in Section 2.1, measured by MIR and NIR spectroscopy, and then analyzed by PLS-DA and data-fusion approaches. The 95 samples (59 pure + 36 adulterated grappas) were divided into training and test sets as follows:
- The first 6 PCs calculated on the MIR and NIR data of the adulterated samples were concatenated, and the Duplex algorithm was applied to this matrix to select 22 samples for the training set and 14 for the test set.
- The 14 pure grappas used to prepare the adulterated samples selected for the validation set were also added to that set, while the remaining 45 were included in the training set.
Overall, the data set was split into a calibration set of 67 samples (45 “Pure” and 22 “Adulterated”) and a validation set of 28 samples (14 “Pure” and 14 “Adulterated”).
3.2.1. PLS-DA on MIR spectra
Six PLS-DA models were created, one for each preprocessed data set. The number of latent variables and the classification rate per class (%) in 7-fold cross-validation are reported in Table 6, which shows that the optimal preprocessing approach for these data is bare mean centering. Consequently, the model built on mean-centered spectra was applied to the test set (preprocessed accordingly), providing 100 % correct classification for both classes. A graphical representation of the results is reported in Fig. 5, where the predicted Y is displayed against the test sample index. Pure grappas are represented as red diamonds and adulterated samples as black triangles, while the blue dashed line separates the pure (above) from the adulterated (below) grappa region. Fig. 5 shows that each test sample is correctly classified by the model.
3.2.2. PLS-DA on NIR spectra
The possibility of detecting adulterated grappa spirits by NIR spectroscopy was tested as well. NIR spectra were preprocessed with the same approaches mentioned in Section 3.1, and PLS-DA models were built on the pretreated data sets; results are reported in Table 7. Independently of the preprocessing approach applied, the correct classification rates are always the same. The calibration model used to predict the test samples was the one built on mean-centered data. It led to 92.9 % correct classification for Class “Pure” and 100.0 % for Class “Adulterated” (corresponding to 1 misclassified sample).
3.2.3. Data-fusion
Although the models built on the individual data blocks provided extremely good predictions, data-fusion approaches were also calculated, to investigate whether they could provide additional information on the system under study. MB-PLS-LDA, SO-PLS-LDA and SO-CovSel-LDA models were all calculated on MIR and NIR data pretreated by the optimal preprocessing approaches identified in the individual PLS-DA models. To take into account the variability present in both data blocks, calibration and validation sets were reorganized following the same procedure described in Section 3.1.3.
3.2.3.1. MB-PLS-DA. The MB-PLS-DA model was built on mean-centered, block-scaled matrices (i.e., each block was divided by its Frobenius norm). The calibration model, whose optimal complexity was 4 latent variables, provided correct classification rates of 92.9 % and 100 % on the test set for Pure and Adulterated grappas, respectively. 3.2.3.2. SO-PLS-LDA. As before, the SO-PLS model was built on data blocks preprocessed with the pretreatments found most suitable in the individual PLS-DA models; consequently, both the MIR (1st input block) and NIR (2nd input block) spectra were pretreated by bare mean centering. This calibration model, whose optimal complexity was 4 latent variables per block, correctly classified all training samples; when applied to the validation set, it led to 100 % correct classification for both categories. The score plot is shown in Fig. 6: samples belonging to the two classes are clearly separated, with pure grappa samples (red diamonds) falling at positive values of LV1 and adulterated objects (black triangles) at negative values of this component.
Table 6 Pure Grappa vs Adulterated. PLS-DA models on MIR spectra: number of latent variables (LVs) and correct classification rate (%) per class in cross-validation (CV).
Pretreatment | LVs | Class “Pure” (CV) | Class “Adulterated” (CV)
Mean centering | 2 | 100.0 | 100.0
1st Derivative (+MC) | 3 | 100.0 | 95.5
2nd Derivative (+MC) | 4 | 97.8 | 54.5
SNV (+MC) | 2 | 95.6 | 95.5
SNV + 1st Derivative (+MC) | 2 | 100.0 | 63.6
SNV + 2nd Derivative (+MC) | 4 | 100.0 | 50.0
Fig. 5. PLS-DA analysis on MIR data: predicted Y vs sample index. Legend: red diamonds, Class Grappa; black triangles, Class “Adulterated”. The blue dashed line represents the boundary between the class regions. Empty and filled symbols represent calibration and validation samples, respectively.
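The block scaling used for MB-PLS-DA (mean centering each block, then dividing it by its Frobenius norm) can be sketched in plain Python. This is an illustration, not the authors' implementation: it assumes each block is stored as a list of rows (samples × variables), and the function names and toy blocks are hypothetical. After scaling, every block has a total sum of squares of 1, so a block with many variables (e.g., a NIR block compared with a MIR block) does not dominate the fused model merely because of its size.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = variables


def mean_center(block: Matrix) -> Matrix:
    """Subtract each column (variable) mean from every sample."""
    n = len(block)
    means = [sum(col) / n for col in zip(*block)]
    return [[x - m for x, m in zip(row, means)] for row in block]


def frobenius_norm(block: Matrix) -> float:
    """Square root of the sum of all squared entries."""
    return math.sqrt(sum(x * x for row in block for x in row))


def block_scale(block: Matrix) -> Matrix:
    """Mean-center a block, then divide it by its Frobenius norm."""
    centered = mean_center(block)
    norm = frobenius_norm(centered)
    if norm == 0.0:
        raise ValueError("block has no variance after centering")
    return [[x / norm for x in row] for row in centered]


# Toy blocks (hypothetical): a 2-variable "MIR" block and a 4-variable "NIR" block.
mir = block_scale([[1.0, 2.0], [3.0, 4.0]])
nir = block_scale([[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]])
print(mir)                             # [[-0.5, -0.5], [0.5, 0.5]]
print(round(frobenius_norm(nir), 12))  # 1.0
```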
3.2.3.3. SO-CovSel-LDA. The SO-CovSel-LDA model was built on data blocks preprocessed by bare mean centering. Also in this case, the most suitable order of the blocks was chosen in a 7-fold cross-validation procedure, which selected the MIR spectra as the first input block and the NIR spectra as the second. Covariance Selection retained 1 and 3 variables from the two blocks, respectively. This model led to 100 % correct classification on the test set for both Class “Pure” and Class “Adulterated”. As mentioned above, this method naturally provides information about the variables most relevant to the problem under study. The selected features are shown in Fig. 7, where the average MIR (Fig. 7a) and NIR (Fig. 7b) spectra are displayed in blue and the most relevant variables are circled in red. The most relevant MIR variable is the one at 1042 cm−1, attributable to the C–O ester bond. For the NIR spectra, the variables most relevant for classification are the one at 9149 cm−1, probably linked to the second overtone of C–H stretching; the one at 7338 cm−1, attributable to the first overtone of O–H stretching; and the one at 4017 cm−1, identifiable as the absorption of the combination bands of C–H or S–H stretching.
Table 7 Pure Grappa vs Adulterated. PLS-DA models on NIR spectra: number of latent variables (LVs) and correct classification rate (%) per class in cross-validation (CV).
Pretreatment | LVs | Class “Pure” (CV) | Class “Adulterated” (CV)
Mean centering | 3 | 100.0 | 100.0
1st Derivative (+MC) | 2 | 100.0 | 100.0
2nd Derivative (+MC) | 2 | 100.0 | 100.0
SNV (+MC) | 1 | 100.0 | 100.0
SNV + 1st Derivative (+MC) | 2 | 100.0 | 100.0
SNV + 2nd Derivative (+MC) | 3 | 100.0 | 100.0
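Covariance Selection is a greedy procedure, and its sequential-orthogonalized variant can be sketched in a few dozen lines. The sketch below follows the general CovSel scheme (Roger et al. [37]) as understood here: at each step, pick the variable whose squared covariance with the (dummy) responses is largest, then project X and Y onto the orthogonal complement of that variable; in the two-block version, the second block (and Y) is first orthogonalized with respect to the variables selected in the first block. It is a plain-Python illustration with hypothetical function names and toy data, not the implementation used in this study, and it omits the final LDA step.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples, columns = variables


def _center(m: Matrix) -> Matrix:
    """Column-wise mean centering."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[x - mu for x, mu in zip(row, means)] for row in m]


def _column(m: Matrix, j: int) -> List[float]:
    return [row[j] for row in m]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _deflate(m: Matrix, v: Sequence[float]) -> Matrix:
    """Project every column of m onto the orthogonal complement of v."""
    vv = _dot(v, v)
    if vv == 0.0:
        return [row[:] for row in m]
    coefs = [_dot(_column(m, j), v) / vv for j in range(len(m[0]))]
    return [[x - c * v[i] for x, c in zip(row, coefs)] for i, row in enumerate(m)]


def _criterion(X: Matrix, Y: Matrix, j: int) -> float:
    """Sum over responses of the squared (unnormalized) covariance with x_j."""
    xj = _column(X, j)
    return sum(_dot(xj, _column(Y, k)) ** 2 for k in range(len(Y[0])))


def covsel(X: Matrix, Y: Matrix, n_select: int) -> Tuple[List[int], List[List[float]]]:
    """Greedy Covariance Selection on a single block.

    Returns the selected column indices and the deflated selected columns,
    which form an orthogonal basis of the space spanned by the selection.
    """
    X, Y = _center(X), _center(Y)
    selected: List[int] = []
    basis: List[List[float]] = []
    for _ in range(n_select):
        candidates = [j for j in range(len(X[0])) if j not in selected]
        best = max(candidates, key=lambda j: _criterion(X, Y, j))
        v = _column(X, best)
        selected.append(best)
        basis.append(v)
        X, Y = _deflate(X, v), _deflate(Y, v)
    return selected, basis


def so_covsel(X1: Matrix, X2: Matrix, Y: Matrix, n1: int, n2: int) -> Tuple[List[int], List[int]]:
    """Select on block 1, orthogonalize block 2 (and Y) on that selection, select on block 2."""
    sel1, basis = covsel(X1, Y, n1)
    X2c, Yc = _center(X2), _center(Y)
    for v in basis:
        X2c, Yc = _deflate(X2c, v), _deflate(Yc, v)
    sel2, _ = covsel(X2c, Yc, n2)
    return sel1, sel2


# Toy data (hypothetical): Y is a one-column dummy response.
Y = [[1.0], [1.0], [-1.0], [-1.0]]
X1 = [[1.0], [0.0], [0.0], [-1.0]]
X2 = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0], [-1.0, 0.0, -2.0]]
print(covsel(X2, Y, 1)[0])         # [2]: largest raw covariance with Y
print(so_covsel(X1, X2, Y, 1, 1))  # ([0], [1]): column 2 is redundant with X1
```

In the toy example, CovSel on the second block alone picks column 2, whose information is already carried by the variable chosen in the first block; after orthogonalization that column vanishes and column 1 is selected instead. Avoiding redundant selections across blocks is what the sequential-orthogonalized scheme is designed to achieve.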
Fig. 6. SO-PLS-LDA analysis: score plot. Legend: red diamonds, Class Grappa; black triangles, Class “Adulterated”. Empty and filled symbols represent calibration and validation samples, respectively.
Fig. 7. Variable selection: a) mean MIR spectra, displayed in blue, with the variables selected by SO-CovSel-LDA circled in red; b) mean NIR spectra, displayed in blue, with the variables selected by SO-CovSel-LDA circled in red.
4. Conclusions Two different classification problems were addressed in the present study. In the first part, grappa and other fruit and cereal distillates were analyzed by NIR and MIR spectroscopy, combined with classification tools, to determine whether grappa samples could be discriminated from the other spirits. To this end, PLS-DA was applied to the NIR and MIR spectra separately, and then three multi-block strategies (MB-PLS-LDA, SO-PLS-LDA and SO-CovSel-LDA) were tested. To ease the comparison among the different approaches, the classification rates obtained on the external test set are summarized in Table 8. As expected, the data fusion approaches generally led to better predictions than the individual-block models (the exception being MB-PLS-LDA, which outperformed PLS-DA on NIR but not on MIR); in particular, SO-PLS-LDA and SO-CovSel-LDA provided identical and quite satisfactory results (76.9 % and 60.0 % correct classification for Grappa and Other spirits, respectively).
The second part of the study tested whether pure grappa could be distinguished from adulterated grappa. Pure and adulterated samples were analyzed by NIR and MIR, and both PLS-DA on the individual blocks and data fusion approaches were applied. The predictive ability of these models was extremely good (Table 8); in particular, PLS-DA on MIR and two of the multi-block strategies (SO-PLS-LDA and SO-CovSel-LDA) provided 100 % correct classification of all test samples, suggesting that these methodologies are suitable for detecting adulterated grappa samples.
Table 8 Comparison among the tested approaches for both classification problems: correct classification rate (%) on the test set.
Discrimination between Grappa and “Other spirits”
Method | Grappa | Other spirits
PLS-DA on MIR | 69.2 | 40.0
PLS-DA on NIR | 53.8 | 40.0
MB-PLS-LDA | 61.5 | 40.0
SO-PLS-LDA | 76.9 | 60.0
SO-CovSel-LDA | 76.9 | 60.0
Discrimination between Pure Grappa and Adulterated
Method | Pure Grappa | Adulterated
PLS-DA on MIR | 100.0 | 100.0
PLS-DA on NIR | 92.9 | 100.0
MB-PLS-LDA | 92.9 | 100.0
SO-PLS-LDA | 100.0 | 100.0
SO-CovSel-LDA | 100.0 | 100.0
References
[1] V. Giannetti, M. Boccacci Mariani, F. Marini, P. Torrelli, A. Biancolillo, Food Control 105 (2019) 123–130.
[2] C. Bauer-Christoph, H. Wachter, N. Christoph, A. Roßmann, L. Adam, Z. Lebensm. Unters. Forsch. A 204 (1997) 445–452.
[3] C. Fotakis, M. Zervou, Food Chem. 196 (2016) 760–768.
[4] C. Fotakis, D. Christodouleas, K. Kokkotou, M. Zervou, P. Zoumpoulakis, P. Moulos, M. Liouni, A. Calokerinos, Food Chem. 138 (2013) 1837–1846.
[5] M. Jakubíková, J. Sádecká, A. Kleinová, P. Májek, J. Food Sci. Technol. 53 (2016) 2797–2803.
[6] L.C. De Carvalho, C.D.L.M. De Morais, K.M.G. De Lima, L.C.C. Júnior, P.A.M. Nascimento, J.B. De Faria, G.H.D.A. Teixeira, Anal. Methods 8 (2016) 5658–5666.
[7] M. Gishen, R.G. Dambergs, Aust. N.Z. Grapegrow. Winemak. 414 (1998) 43–45.
[8] J. Tóthová, J. Sádecká, P. Májek, Czech J. Food Sci. 27 (2007) 425–432.
[9] M. Jakubíková, J. Sádecká, A. Kleinová, P. Májek, J. Food Sci. Technol. 53 (2016) 2797–2803, https://doi.org/10.1007/s13197-016-2254-4.
[10] O. Anjos, A.J. Santos, L.M. Estevinho, I. Caldeira, Food Chem. 205 (2016) 28–35.
[11] O.A. Kolomiets, D.W. Lachenmeier, U. Hoffmann, H.W. Siesler, J. Near Infrared Spectrosc. 18 (2010) 59–67.
[12] X. Capron, J. Smeyers-Verbeke, D.L. Massart, Food Chem. 101 (2007) 1585–1597.
[13] E. Mattarucchi, M. Stocchero, J.M. Moreno-Rojas, G. Giordano, F. Reniero, C. Guillou, J. Agric. Food Chem. 58 (2010) 12089–12095.
[14] A. Biancolillo, R. Bucci, A.L. Magrì, A.D. Magrì, F. Marini, Anal. Chim. Acta 820 (2014) 23–31.
[15] B. Brownfield, T. Lemos, J.H. Kalivas, Anal. Chem. 90 (2018) 4429–4437.
[16] M. Silvestri, A. Elia, D. Bertelli, E. Salvatore, C. Durante, M. Li Vigni, A. Marchetti, M. Cocchi, Chemom. Intell. Lab. Syst. 137 (2014) 181–189.
[17] G. Papotti, D. Bertelli, R. Graziosi, M. Silvestri, L. Bertacchini, C. Durante, M. Plessi, J. Agric. Food Chem. 61 (2013) 1741–1746.
[18] C. Durante, C. Baschieri, L. Bertacchini, D. Bertelli, M. Cocchi, A. Marchetti, S. Sighinolfi, Food Chem. 173 (2015) 557–563.
[19] L. Vera, L. Aceña, J. Guasch, R. Boqué, M. Mestres, O. Busto, Talanta 87 (2011) 136–142.
[20] M. Sjöström, S. Wold, B. Söderström, PLS discriminant plots, in: E.S. Gelsema, L.N. Kanal (Eds.), Pattern Recognition in Practice, Elsevier, Amsterdam, 1986, pp. 461–470.
[21] L. Ståhle, S. Wold, J. Chemom. 1 (1987) 185–196.
[22] I.E. Frank, B.R. Kowalski, Anal. Chim. Acta 162 (1984) 24–251.
[23] I.E. Frank, J. Feikema, N. Constantine, B.R. Kowalski, J. Chem. Inf. Comput. Sci. 24 (1984) 20–24.
[24] T. Næs, O. Tomic, B.H. Mevik, H. Martens, J. Chemom. 25 (2011) 28–40.
[25] A. Biancolillo, F. Marini, J.M. Roger, J. Chemom. (2019) e3120, https://doi.org/10.1002/cem.3120.
[26] A. Biancolillo, R. Boqué, M. Cocchi, F. Marini, Data fusion strategies in food analysis, in: M. Cocchi (Ed.), Data Handling in Science and Technology, Vol. 31, Chap. 10, Elsevier, Amsterdam, 2019, pp. 271–310, https://doi.org/10.1016/B978-0-444-63984-4.00010-7.
[27] S.J. Qin, S. Valle, M.J. Piovoso, J. Chemom. 15 (2001) 715–742.
[28] L.E. Wangen, B.R. Kowalski, J. Chemom. 3 (1988) 3–20.
[29] A. Biancolillo, F. Marini, A.A. D’Archivio, J. Food Compos. Anal. 86 (2020) 103351, https://doi.org/10.1016/j.jfca.2019.103351.
[30] P. Firmani, A. Nardecchia, F. Nocente, L. Gazza, F. Marini, A. Biancolillo, Food Chem. (2019) 125677, https://doi.org/10.1016/j.foodchem.2019.125677.
[31] M. Barker, W. Rayens, J. Chemom. 17 (2003) 166–173.
[32] U. Indahl, H. Martens, T. Naes, J. Chemom. 21 (2007) 529–536.
[33] H. Nocairi, E.M. Qannari, E. Vigneau, D. Bertrand, Comput. Stat. Data Anal. 48 (2004) 139–147.
[34] N.F. Perez, J. Ferré, R. Boqué, Chemom. Intell. Lab. Syst. 95 (2009) 122–128.
[35] A. Biancolillo, I. Måge, T. Næs, Chemom. Intell. Lab. Syst. 141 (2015) 58–67.
[36] A. Biancolillo, T. Naes, in: M. Cocchi (Ed.), Data Handling in Science and Technology, Vol. 31, Chap. 6, Elsevier, Amsterdam, 2019, pp. 157–177, https://doi.org/10.1016/B978-0-444-63984-4.00006-5.
[37] J.M. Roger, B. Palagos, D. Bertrand, E. Fernandez-Ahumada, Chemom. Intell. Lab. Syst. 106 (2011) 216–223.
[38] R.D. Snee, Technometrics 19 (1977) 415–428.
[39] A. Savitzky, M.J.E. Golay, Anal. Chem. 36 (1964) 1627–1639.
[40] J. Steinier, Y. Termonia, J. Deltour, Anal. Chem. 44 (1972) 1906–1909.
[41] H.H. Madden, Anal. Chem. 50 (1978) 1383–1386.
[42] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Appl. Spectrosc. 43 (1989) 772–777.
[43] M. Dhanoa, S. Lister, R. Sanderson, R.J. Barnes, J. Near Infrared Spectrosc. 2 (1994) 43–47.
[44] S. Wold, E. Johansson, M. Cocchi, in: H. Kubinyi (Ed.), 3D QSAR in Drug Design, ESCOM Science Publishers, Leiden, 1993, pp. 523–550.
[45] A. Biancolillo, K.H. Liland, I. Måge, T. Næs, R. Bro, Chemom. Intell. Lab. Syst. 156 (2016) 89–101.
Declaration of Competing Interest All the authors declare that they have no conflict of interest.
CRediT authorship contribution statement Stefano Schiavone: Formal analysis, Writing – original draft. Benedetta Marchionni: Formal analysis. Remo Bucci: Methodology. Federico Marini: Methodology, Conceptualization, Writing – review & editing. Alessandra Biancolillo: Methodology, Conceptualization, Writing – review & editing.
Acknowledgments We would like to thank the Italian Customs and Monopolies Agency (Agenzia delle Dogane e dei Monopoli) for providing the spirits.