Accepted Manuscript

Title: Synergistic effect of the simultaneous chemometric analysis of 1H NMR spectroscopic and stable isotope (SNIF-NMR, 18O, 13C) data: Application to wine analysis
Authors: Yulia B. Monakhova, Rolf Godelmann, Armin Hermann, Thomas Kuballa, Claire Cannet, Hartmut Schäfer, Manfred Spraul, Douglas N. Rutledge
PII: S0003-2670(14)00577-7
DOI: http://dx.doi.org/doi:10.1016/j.aca.2014.05.005
Reference: ACA 233249
To appear in: Analytica Chimica Acta
Received date: 23-1-2014
Revised date: 24-4-2014
Accepted date: 2-5-2014
Please cite this article as: Yulia B. Monakhova, Rolf Godelmann, Armin Hermann, Thomas Kuballa, Claire Cannet, Hartmut Schäfer, Manfred Spraul, Douglas N. Rutledge, Synergistic effect of the simultaneous chemometric analysis of 1H NMR spectroscopic and stable isotope (SNIF-NMR, 18O, 13C) data: Application to wine analysis, Analytica Chimica Acta http://dx.doi.org/10.1016/j.aca.2014.05.005

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Synergistic effect of the simultaneous chemometric analysis of 1H NMR spectroscopic and stable isotope (SNIF-NMR, 18O, 13C) data: application to wine analysis
Yulia B. Monakhova a,b,d,*, Rolf Godelmann a, Armin Hermann c, Thomas Kuballa a, Claire Cannet b, Hartmut Schäfer b, Manfred Spraul b, Douglas N. Rutledge e

a Chemisches und Veterinäruntersuchungsamt (CVUA) Karlsruhe, Weissenburger Strasse 3, 76187 Karlsruhe, Germany
b Bruker Biospin GmbH, Silbersteifen, 76287 Rheinstetten, Germany
c Landesuntersuchungsamt - Institut für Lebensmittelchemie und Arzneimittelprüfung, Emy-Roeder-Straße 1, 55129 Mainz, Germany
d Department of Chemistry, Saratov State University, Astrakhanskaya Street 83, 410012 Saratov, Russia
e AgroParisTech, UMR 1145, Ingénierie Procédés Aliments, 16 rue Claude Bernard, F-75005 Paris

* Corresponding author. Tel.: 0721-926-5453 Fax: 0721-926-5539. E-mail address: yul-[email protected] (Y.B. Monakhova)
Highlights

1H NMR profiles of 718 wines were fused with stable isotope analysis data (SNIF-NMR, 18O, 13C)
The best improvement was obtained for prediction of the geographical origin of wine
Certain enhancement was also obtained for the year of vintage (from 88-97% for 1H NMR to 99% for the fused data)
Independent component analysis was used as an alternative chemometric tool for classification
Abstract

It is known that 1H NMR spectroscopy represents a good tool for predicting the grape variety, the geographical origin, and the year of vintage of wine. In the present study we have shown that classification models can be improved when 1H NMR profiles are fused with stable isotope (SNIF-NMR, 18O, 13C) data. Variable selection based on clustering of latent variables was performed on 1H NMR data. Afterwards, the combined data of 718 wine samples from Germany were analyzed using Linear Discriminant Analysis (LDA), Partial Least Squares-Discriminant Analysis (PLS-DA), Factorial Discriminant Analysis (FDA) and Independent Components Analysis (ICA). Moreover, several specialized multiblock methods (Common Components and Specific Weights Analysis (ComDim), consensus PCA and consensus PLS-DA) were applied to the data.

The best improvement in comparison with 1H NMR data was obtained for prediction of the geographical origin (up to 100% for the fused data, whereas stable isotope data alone resulted in only 60-70% correct prediction and 1H NMR data alone in 82-89%). A certain enhancement was also obtained for the year of vintage (from 88-97% for 1H NMR to 99% for the fused data), whereas in the case of grape varieties improved models were not obtained.

The combination of 1H NMR data with stable isotope data improves the efficiency of classification models for geographical origin and vintage of wine and can potentially be used for other food products as well.
Keywords: wine authentication; German wine; nuclear magnetic resonance; stable isotope analysis; chemometrics; data fusion

1. Introduction
Despite the rich information generated by modern analytical platforms, the analysis of a dataset obtained from a single analytical technique may be too limited to provide a holistic picture of the phenomena under study. Therefore, during recent years, hyphenated analytical techniques have been increasingly used, and multiple measurements on the same samples are carried out by many laboratories using different techniques. Consequently, the development of specialized methods for the analysis of such data is of great necessity in analytical chemistry.

There are not many applications of data fusion in chemical analysis so far. However, the number of such studies is definitely growing [1-5]. For example, fusion of the data from mass spectrometry (MS) and diode-array detectors (DAD) was applied to solve coelution problems in liquid chromatography [4]. Mixtures of biocide compounds in model mixtures and in environmental samples were analyzed in this case. Data fusion of HPLC-DAD and MS was used for the evaluation of ketoprofen photodegradation processes [5]. Data fusion of three analytical techniques (MS e-nose, mid-IR optical-tongue and UV-visible spectroscopy) was used to predict the production factory of the same beer brand with 95% correct classification [6]. Chromatographic techniques (LC/MS and GC/MS) complemented 1H NMR profiling well in metabolomics studies of biological fluids such as rat urine or cerebrospinal fluid [7,8]. Fusion of 1H NMR and UV-VIS data was used to determine banned dyes (Sudan III and IV) in culinary spices with classification rates between 80% and 100% [9]. Data fusion of Raman and near infrared (FT-NIR) images of pharmaceutical formulations made it possible to detect possible problems during production [10].
Regarding wine analysis, data fusion of NIR and mid infrared (MIR) techniques was previously used to discriminate 96 red wines that were aged in oak barrels, in stainless steel tanks with oak chips and in stainless steel tanks alone [11]. However, the information provided by these methods is based on the same physical phenomenon (vibration of the bonds) and is, therefore, easier to fuse. In another study, the phenolic extracts of wines from three varieties were analyzed using two-dimensional 1H-13C heteronuclear NMR spectroscopy. The resulting multiway tensor can be considered as a multiblock dataset with a matrix of 81 observations (samples) and 625 variables (1H dimension) measured across 413 data blocks (13C dimension) [12]. The best model obtained showed 87.6% discrimination accuracy. To the best of our knowledge, the fusion of 1H NMR and stable isotope data has not been applied to wine analysis (or to any other food in general) so far.

Nowadays, the most powerful analytical methods available for wine analysis are
isotopic analysis and 1H NMR spectroscopy. The possibilities of 1H NMR spectroscopy in non-targeted wine analysis were discussed in several publications [13-17] as well as in our ongoing project aimed at the analysis of a large dataset (n=1383) of German wines (results to be published separately). The second technique, isotopic analysis, is now the official and standard method in Europe and North America for routine use in testing the authenticity of several food products and beverages, including wine [18-21]. The method is based on the measurement of the stable isotope content (2H, 13C, 18O) of the product or of a specific component such as an ingredient or target molecule. The determinations are carried out using site-specific natural isotope fractionation - nuclear magnetic resonance (SNIF-NMR) and/or Isotopic Ratio Mass Spectrometry (IRMS), and these values can provide information on the botanical and geographical origin of a food product [18-20,22]. The main idea behind these official methods is that each plant has its own characteristic range of naturally occurring stable isotopes of carbon (12C, 13C), hydrogen (1H, 2H), oxygen (16O, 18O) and nitrogen (14N, 15N), whose distribution has been influenced by a number of physical and/or biochemical properties.

The main application of stable isotope analysis in wine analysis is the assessment of
authenticity, i.e. the detection of illegal sugar, alcohol, and water addition in addition to the verification of false labeling of the geographical origin or vintage. Originally, 2H/1H ratios of fermentative ethanol in wine were shown to be useful for the detection of sugar addition [23]. Later, it was shown that stable isotope ratios of wines are affected by the geographical origin, the year of vintage and, to a certain extent, the grape variety [19,20,22,24-29]. Therefore, we have hypothesized that stable isotope data can improve our chemometric models based on 1H NMR profiling, and the investigation of this possibility is the main focus of our work.

Apart from classical classification methods (PCA, LDA, FDA, and PLS-DA), in this
article we have also applied independent components analysis (ICA) as an alternative chemometric tool for the same purpose. The primary goal of this method is to extract the pure signals from a data set of mixed signals by finding a transformation that minimizes dependences between "pure" sources (called ICs) [30,31]. This technique has previously been used extensively for multicomponent spectroscopic analysis of different matrices [32-35,35-38]. On the other hand, little is known about the applicability of ICA algorithms for solving classification problems [39].

It is worth mentioning that up to now classification of wines was mainly based on the analysis of either elements and isotopes [19,20,22,25-28,40,41] or organic constituents [14,42]. To the best of our knowledge, there is no study that combines the isotopic composition of wine (or of any food product in general) and information about organic composition obtained by 1H NMR.

2. Experimental

2.1. Samples
In total, 1383 wines were collected and analyzed by 1H NMR. For a subset of 718 samples, stable isotope values ((D/H)I, (D/H)II, 18O, 13C) were measured. Authentic samples of pure grape variety wines of vintages 2005 and 2010 were taken from wine research institutes in the Federal State Baden-Württemberg, Wine Research Institute Freiburg and Wine Research Institute Weinsberg. Wines, microvinified according to the protocol of EU regulation 2729/2000 for the EU Wine Data Base, were collected from official wine research institutes in Baden-Württemberg and Rheinland-Pfalz (EU Database Wines). All 13 German wine growing regions were considered: Baden (BAD), Württemberg (WT), Pfalz (PFL), Rheinhessen (RHH), Mosel-Saar-Ruwer (MSR), Franken, Nahe (NAH), Sachsen, Saale-Unstrut, Mittelrhein (MRH), Rheingau, Ahr and Hessische Bergstrasse. None of the samples had been blended with any other variety, other vintage or wine from other regions. An overview of the collected samples regarding vintage and geographical origin is shown in Table 1.

In total, wines of 37 grape varieties were analyzed. The main red grape varieties were Pinot noir (116/49), Dornfelder (86/69), Lemberger (26/4), Portugieser (23/20), Trollinger (18/6), Regent (14/5), and Pinot Meunier (12/2). Regarding white wine, the Riesling (342/247), Müller Thurgau (121/64), Pinot blanc (81/31), Kerner (63/45), Pinot gris (43/16), Chardonnay (16/8), and Gutedel (11/4) grape varieties were dominant. The numbers in brackets give the number of samples measured by NMR/stable isotope techniques. Other wines were from unknown grape varieties or from varieties with fewer than 10 samples.
2.2. 1H NMR Experiments

1H NMR measurements were performed under full automation for the whole process on an AVANCE III 400 at Bruker BioSpin GmbH, Rheinstetten, Germany, equipped with a 5 mm 1H/D-TXI probe-head with z-gradient, automated tuning and matching accessory and BTO-2000 for temperature control. Sample preparation and all acquisition parameters of 1H NMR experiments can be found in previous publications [14,43].

2.3. Stable isotope analysis
The (D/H) ratios were determined at the methyl (D/H)I and methylene (D/H)II sites of the ethanol molecule according to Resolution OIV/OENO 381/2009 [44]. (D/H)I mainly characterizes the plant species which synthesized the sugar and, to a lesser extent, the geographical location of the place of harvest (type of water used during photosynthesis). (D/H)II represents the climatology of the place of production of the grapes (type of rain and weather conditions) and, to a lesser extent, the sugar concentration of the original must. R = 2*(D/H)II/(D/H)I expresses the relative enrichment or depletion of the methylene site, the methyl site being arbitrarily given the statistical weight of 3. A random distribution of deuterium within the ethyl fragment (as is the case for petrochemically synthesized ethanol) would therefore be characterized by a value of R = 2. The value of R varies according to the biochemical pathway (C3, C4 or CAM) of the plant producing the sugar and, to a smaller extent, with the conditions of the fermentation process.

The 13C/12C isotope ratio was measured by IRMS according to Resolution OIV/OENO 381/2009 [45]. The 18O/16O isotopic ratios of the water from wine were determined by IRMS according to Resolution OIV/OENO 353/2009 [46]. In brief, the values were determined by IRMS using the ions m/z 46 (12C16O18O) and m/z 44 (12C16O2), which were obtained after equilibration of the isotope exchange between water and carbon dioxide. The exchange reaction 12C16O2 + H218O ⇌ 12C16O18O + H216O proceeds via the dissolved hydrogen carbonate and is temperature dependent. After cryogenic separation from water and ethanol, the carbon dioxide in the vapour phase was used for analysis. The 18O/16O isotope ratio of water can be calculated and expressed as the relative difference δ18O (‰) versus the standard "V-SMOW" (Vienna Standard Mean Ocean Water).
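As a small worked illustration of the two quantities defined in this section, the following Python sketch evaluates R = 2*(D/H)II/(D/H)I and a δ value in per mil. The (D/H) inputs are the mean values reported for the 718 wines in Section 3.2; the 18O/16O ratios, including the reference ratio, are assumed numbers chosen only to exercise the formula and are not measurements from this study. The function names are hypothetical.

```python
def r_value(dh_methyl: float, dh_methylene: float) -> float:
    """R = 2 * (D/H)II / (D/H)I, as defined in the text."""
    return 2.0 * dh_methylene / dh_methyl


def delta_permil(ratio_sample: float, ratio_standard: float) -> float:
    """Relative difference of an isotope ratio versus a standard, in per mil."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


# Mean (D/H)I and (D/H)II values reported for the 718 wines (Section 3.2).
r_mean = r_value(101.7, 127.4)

# Equal deuterium content at both sites (random distribution) gives R = 2.
r_random = r_value(100.0, 100.0)

# Assumed 18O/16O ratios for illustration only (sample and reference).
ASSUMED_STANDARD_RATIO = 0.0020052
d18o = delta_permil(0.0020040, ASSUMED_STANDARD_RATIO)

print(round(r_mean, 3), r_random, round(d18o, 2))
```

The ratio of the two mean (D/H) values is close to the reported mean R, although strictly the mean of per-sample ratios need not equal the ratio of the means.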
2.4. Spectral preprocessing and chemometrics

1H NMR spectra were preprocessed by bucketing using AMIX v.3.9.12 (Bruker BioSpin GmbH, Rheinstetten, Germany). Spectral intensities were scaled to total intensity (that is, each spectrum was set to unit total intensity by expressing each data point as a fraction of the total spectral integral) and reduced to integrated regions of equal width (0.01 ppm) within the spectral region of δ 9.5-0.5 ppm. The signals of the regions between the ethanol satellites, of water and of acetic acid were excluded from the analysis. The final pretreated data were converted to ASCII files and transferred for multivariate analysis.

MATLAB v. 7.0 (The Math Works, Natick, MA, USA) and the SAISIR package for MATLAB [47] were used for further statistical calculations. Multiway (n-way) analysis of variance (ANOVA) was used for testing the effects of multiple factors (vintage, grape variety, origin) on the whole stable isotope data set (n=718) as well as on subsets of 1H NMR data (i.e., 2009 vintage or Riesling wines). Statistical significance was assumed below the 0.05 probability level.
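The bucketing and total-intensity scaling described at the start of this section can be sketched in a few lines of Python. This is only an illustration of the idea, not the AMIX implementation: the function names, the toy spectrum and the excluded region are assumptions, and excluded points are simply skipped rather than removing their buckets.

```python
def bucket_spectrum(ppm, intensity, lo=0.5, hi=9.5, width=0.01, excluded=()):
    """Sum intensities into equal-width buckets over [lo, hi) ppm.

    Points falling inside any (start, stop) pair in `excluded` are skipped.
    """
    n_buckets = round((hi - lo) / width)
    buckets = [0.0] * n_buckets
    for shift, value in zip(ppm, intensity):
        if not lo <= shift < hi:
            continue
        if any(start <= shift <= stop for start, stop in excluded):
            continue
        index = min(int((shift - lo) / width), n_buckets - 1)
        buckets[index] += value
    return buckets


def scale_to_total(buckets):
    """Express every bucket as a fraction of the total spectral integral."""
    total = sum(buckets)
    return [b / total for b in buckets]


# Toy spectrum (synthetic values); the excluded range is a hypothetical
# stand-in for the water region.
ppm = [1.004, 1.006, 4.80, 9.20]
intensity = [1.0, 3.0, 100.0, 4.0]
raw = bucket_spectrum(ppm, intensity, excluded=[(4.7, 5.0)])
scaled = scale_to_total(raw)
```

After scaling, every spectrum sums to one, so differences in overall concentration or receiver gain do not dominate the subsequent multivariate models.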
The following methods (ICA, LDA, FDA, and PLS-DA) were applied to the concatenated 1H NMR and stable isotope data. In this study LDA and FDA were applied to the PCA scores. Furthermore, several specific multiblock methods were also used for the analysis: Common Component and Specific Weight Analysis (ComDim) [51,52], CPCA-W [53] and MB-PLS-DA [54-56]. For the evaluation of the stable isotope data alone (5 variables), Hotelling's T2 test, which uses the Mahalanobis distance, was additionally utilized for comparison (values were estimated for the 95% confidence interval) [57,58].
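To make the role of the Mahalanobis distance concrete, the sketch below checks whether a point lies inside a 95% ellipse in two dimensions. It is a simplified stand-in and not the procedure of [57,58]: it assumes a two-dimensional score plot, uses a chi-squared threshold (which for 2 degrees of freedom has the closed form -2 ln(1 - p)) and runs on synthetic points. All names are hypothetical.

```python
import math


def mean_and_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the explicit 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# 95% quantile of the chi-squared distribution with 2 degrees of freedom.
CHI2_95_2DOF = -2.0 * math.log(0.05)


def inside_95_ellipse(point, mean, cov):
    return mahalanobis_sq(point, mean, cov) <= CHI2_95_2DOF


# Synthetic group of scores, for illustration only.
group = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
center, cov = mean_and_cov_2d(group)
near = inside_95_ellipse((1.5, 1.5), center, cov)
far = inside_95_ellipse((4.0, 4.0), center, cov)
```

With five isotope variables the same distance would be computed with a 5x5 covariance matrix and a threshold for 5 degrees of freedom.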
The technique of cross-validation was applied to determine the optimal number of latent variables required to obtain robust models. During test set validation, cross-validation was once again applied to the reduced training set to check whether the optimal number of latent variables is the same.
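A rough sketch of leave-one-out cross-validation, together with the sensitivity and specificity rates used to report classification results in this work, is given below. A nearest-centroid classifier is used purely as a simple stand-in for the LDA, FDA and PLS-DA models of the study; the data and all function names are synthetic and hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose mean vector is closest."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [v / counts[label] for v in acc]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out(x, y):
    """Predict each sample from a model built on all the other samples."""
    predictions = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(nearest_centroid_predict(train_x, train_y, x[i]))
    return predictions


def sensitivity_specificity(y_true, y_pred, positive):
    """Rates derived from the 2x2 confusion matrix for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic two-class data; the fourth "A" sample sits inside the "B" cloud.
x = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [2.9, 2.8],
     [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
y = ["A", "A", "A", "A", "B", "B", "B"]
predicted = leave_one_out(x, y)
sens, spec = sensitivity_specificity(y, predicted, positive="A")
```

In practice the same loop would also be repeated for each candidate number of latent variables, keeping the number that gives the best cross-validated rate.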
Concatenation is a straightforward method of data fusion but might require appropriate block scaling to prevent one block of the data from being totally dominant. This is especially important in our case, where 1H NMR data should be fused with completely unrelated stable isotope data ((D/H)I, (D/H)II, R, 18O, 13C, with only 5 variables). Therefore, a number of different scaling methods (mean-centering, weighting, auto-scaling, inverse of the sum of squares, root square scaling, log scaling and second derivatives) were tried.
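The effect of block scaling before concatenation can be illustrated with the following sketch. It shows one possible combination (column auto-scaling followed by dividing each block by the square root of its total sum of squares, so that both blocks carry equal total variance); this is an assumption made for illustration and not necessarily the scaling finally retained in the study. The data are synthetic and the function names hypothetical.

```python
import math


def autoscale_columns(block):
    """Mean-centre each column and divide by its sample standard deviation."""
    n = len(block)
    scaled_cols = []
    for col in zip(*block):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def scale_block_to_unit_ss(block):
    """Divide a whole block by the square root of its total sum of squares."""
    ss = sum(v * v for row in block for v in row)
    factor = 1.0 / math.sqrt(ss)
    return [[v * factor for v in row] for row in block]


def fuse(nmr_block, isotope_block):
    """Scale both blocks to equal weight, then concatenate them row-wise."""
    nmr = scale_block_to_unit_ss(autoscale_columns(nmr_block))
    iso = scale_block_to_unit_ss(autoscale_columns(isotope_block))
    return [a + b for a, b in zip(nmr, iso)]


# Synthetic blocks: 3 samples, 4 "buckets" and 2 "isotope" variables.
nmr_block = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 5.0, 3.0], [4.0, 3.0, 4.0, 8.0]]
isotope_block = [[101.2, -0.5], [102.0, 0.3], [101.5, -1.2]]
fused = fuse(nmr_block, isotope_block)

ss_nmr = sum(v * v for row in fused for v in row[:4])
ss_iso = sum(v * v for row in fused for v in row[4:])
```

Without the second step, a block with several hundred buckets would contribute far more variance than a block of five isotope variables, even after auto-scaling.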
For variable selection of 1H NMR data we used the multiway analysis of variance (ANOVA) method and the clustering of latent variables (CLV) method [48,49]. In multiway ANOVA we considered a variable (bucket) to be significant when its p-value was less than 0.05. The CLV method involves two stages, namely a hierarchical clustering analysis followed by a partitioning algorithm. Partitioning is determined by the value of a quality criterion (T), the sum of the first eigenvalues of the data matrices of each cluster. The discriminant power of each group of latent variables is assessed by ANOVA [48,49].

An unsupervised technique, Independent Components Analysis (ICA), was also utilized in this study [30,33,35]. In the present paper, the Mutual Information Least Dependent Component Analysis (MILCA) ICA algorithm was applied. The MILCA algorithm has MATLAB interfaces and is available for free on the internet [50]. To get the 'scores' of a set of samples using ICA we used the following formula: Scores = X * Signal * inv(Signal' * Signal), where X is the matrix with the buckets of new samples and Signal is the set of the calculated IC vectors. A sample was considered to be correctly classified if its 'scores' were found within the 95% probability ellipsoid [39].
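The score formula above is a least-squares projection of the samples onto the IC vectors. A minimal pure-Python sketch is shown below; it assumes that Signal is arranged as a buckets-by-ICs matrix (IC vectors as columns) and uses small synthetic matrices. The helper names are hypothetical and the MILCA step that would produce Signal is not reproduced here.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def invert(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ica_scores(x, signal):
    """Scores = X * Signal * inv(Signal' * Signal)."""
    gram = matmul(transpose(signal), signal)
    return matmul(matmul(x, signal), invert(gram))


# Synthetic example: 3 buckets, 2 IC vectors stored as columns of `signal`.
signal = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_scores = [[2.0, 3.0], [0.5, -1.0]]
x = matmul(true_scores, transpose(signal))  # spectra built from known scores
scores = ica_scores(x, signal)
```

Because the toy spectra were built exactly from the IC vectors, the projection recovers the scores used to construct them.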
Different confusion matrices containing information about actual and predicted group memberships made by each classification algorithm were established. Based on the data obtained, the sensitivity and specificity rates were calculated.

All chemometric models were validated using leave-one-out cross validation as well as test set validation (approximately one-fourth of the complete data set).

3. Results and discussion
During our preliminary investigations, PCA was performed separately on the 1H NMR spectra and stable isotope data for the whole data set in an attempt to construct models for the discrimination of grape varieties (separately for red and white wine varieties), vintage and geographical origin. This would be the most desirable case, because it would allow the determination of all three main characteristics of a wine sample independently of any a priori knowledge about it. However, this first evaluation led us to the conclusion that the variations within the data sets (1H NMR spectra and stable isotope data) were too large to be considered in one scatter plot regarding these major parameters. The reason for this is that, apart from the main wine features (grape variety, vintage and geographical origin), each wine has its specific additional characteristics (for example, special conditions of grape growing or wine production and storage), which influence 1H NMR profiling and stable isotope values. Therefore, for the sake of simplicity and transparency, we tried to break the complex multicriterion problem down into individual problems with smaller numbers of samples. To do this, we constrained our database by setting one of the parameters (grape variety or year of vintage) to a constant. For example, we analyzed only wines from the 2009 vintage regarding geographical origin classification. For the same reason we considered separately models for the differentiation of white and red wine grape varieties. We selected subsets for chemometric analysis so that they included the largest possible number of samples, so as to provide acceptable validation (see Table 1), and also based on ANOVA calculations.
3.1. Variable selection for 1H NMR spectra of wines

Before data fusion, a reasonable set of variables of the 1H NMR data set has to be selected. After that we can regard our multivariate analysis as a higher level of fusion in comparison with simple concatenation, which usually gives inferior results [59]. It is also a reasonable way to reduce the number of variables for 1H NMR in relation to the small number of isotope variables (five). The detailed analysis of the 1H NMR data set without variable selection is to be published separately.
The analysis of ΔT (change in the sum of the first eigenvalues of the data matrices of each cluster) observed during hierarchical clustering showed that there are eight significant clusters of variables in the three sub data sets considered (vintage 2009, Riesling and red wines, see Table 2 for details). As an example, the graph for Riesling wines showed that the increase of the ΔT criterion is significant when passing from a partitioning of nine to eight groups (Fig. 1). ANOVA on the latent variables calculated by PCA on the eight groups of the retained partition was used to identify the significant components for discriminating wine regarding geographical origin, year of vintage and grape variety (Table 2). The variables belonging to groups 2 and 6 (geographical origin), 2-4 (year of vintage) and 1-2 (red wine grape variety) are highly significant for explaining the wine parameters and were subjected to further analysis.
The significant buckets (chemical shifts) are not all listed because of the relatively large number of resulting variables (the 896 buckets in the original data set were reduced to 713 buckets for geographical origin). However, we have found that, for example, for discrimination of the red wine varieties, the resonances of lactic acid (group 1), shikimic acid (group 2), citric acid (group 1), malic acid (group 1), acetic acid (group 2), and arginine (group 2) are the most responsible. Glycerol (group 2), succinic acid (group 6), lactic acid (group 6), proline (group 2), malic acid (group 2) and phenolic compounds (groups 2 and 6) have discriminant power for geographical origin. Differences in glucose (group 1), fumaric acid (group 2), tartaric acid (group 3), alanine (group 3), glycerol (group 2) and succinic acid (group 1) contents and in the phenolic profile (groups 1-3) correlated with differences between vintages.
The resulting NMR data were analyzed by different multivariate methods (LDA, FDA, PLS-DA, ICA) to evaluate the ability of the selected variables to classify the samples regarding their membership. Table 3 shows the classification results (leave-one-out cross validation was used) after applying the multiway ANOVA and CLV methods in comparison with the initial entire set of buckets. Examining the classification results in Table 3, it is obvious that the correct classification rate for the two methods has increased (for example, from 85% on average without variable selection to 86% with multiway ANOVA and 89% with CLV in the case of geographical origin). It can also be noted that variable selection reduces the optimal number of factors in the classification models and, therefore, makes them less complicated.

It is no surprise that variable selection has a positive influence on multivariate models, as only those variables associated with the underlying phenomenon that is most correlated with the wine groups are retained to perform classification [48,49]. In all three cases (geographical origin, vintage and red wine grape variety), CLV gave better results than multiway ANOVA. Therefore, the reduced 1H NMR datasets after CLV were later fused with stable isotope data.
3.2. Analysis of stable isotope data

An overview of the obtained stable isotope data for the 718 investigated wines is presented in Table 4. The (D/H)I and (D/H)II values were found to be 101.7±1.1 and 127.4±1.6, respectively. The R values (calculated from the (D/H)I and (D/H)II data as R = 2*(D/H)II/(D/H)I) were 2.505±0.039 on average. The average values of 18O and 13C were found to be -0.38±1.32‰ and -28.5±1.0‰, respectively (Table 4).
3.2.1. Geographical origin
One of the main applications of stable isotope analysis regarding wine authentication is the verification of the labeling of the geographical origin [20,26,28,40,58,60,61]. The biggest available study is based on 5220 Italian wine samples collected in the period 2000-2010 [57,58]. It was shown that geographical origin as well as illegal addition of water and sugar can be better determined using multivariate methods than using a univariate approach, which is similar to the official method [57,58]. Another interesting study was conducted for Slovenian wines (n=120) of three wine growing regions, where LDA and PCA based on the stable isotope data (2H, 13C, 18O) made it possible to obtain information on regional variability and vintage [28]. It was found that it is possible to discriminate between wines grown in distinct areas within the Piemont region in Italy based on D/H and 13C isotopic ratios [20].
We performed multiway ANOVA (influence of three parameters on stable isotope wine profiling). P-values for all 718 samples were found to be 0.007, 0.021 and 0.35 for geographical origin, vintage and grape variety, respectively, meaning that stable isotope data are significantly affected by origin and vintage. Therefore, the differentiation of wines according to geographical origin should result in the most promising models. For the reason mentioned above, we selected only the wines from the 2009 vintage as an example for further analysis. From Fig. 2 it can be seen that wines from Pfalz and Rheinhessen are clearly separated for this vintage. Two other groups, Nahe and Mosel, while being separated from the above-mentioned clusters, overlap with each other. In practice, however, the differentiation between NAH and MSR wines has more scientific than practical importance. These regions fit to a small overlapped region (Fig. 3), and even when an overlapping problem is present, producers would not discover this because isotopic values from the databank are confidential.

In this case the first two PCs explained 77.5% of total variability. Not taking into account the remaining 22.5% of variability could prevent reliable classification of new samples if sugaring and/or watering of those wines needs to be considered additionally. Therefore, PCA was not utilized as a classification tool but only for data visualization.
We also performed PCA on the example of 111 wines from the 2009 vintage using only three parameters (18O, (D/H)I and (D/H)II) and excluding R and 13C values from consideration. However, we did not observe any shrinkage of the 95% contour line on the PC plot. On the contrary, the performance of PCA decreased (probably due to the small number of evaluated variables).

Looking at the map of German wine regions (Fig. 3), we can see that although the overlapping regions (NAH and MSR) are not directly bordering, they are situated very close to each other. We analyzed the importance of each stable isotope parameter for the discrimination of the geographical origin of the wines by ANOVA. We observed that 18O, (D/H)I and (D/H)II have the biggest F-values and, thus, are significant parameters for the discrimination of geographical origin.
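The way an F-value ranks a single variable can be illustrated with a one-way ANOVA, which is a simplification of the multiway ANOVA used in this work. The sketch below computes the ratio of between-group to within-group mean squares for synthetic values grouped by a hypothetical region label; the function name and the numbers are assumptions.

```python
def one_way_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Synthetic values of one variable for three hypothetical regions.
separating = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
non_separating = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]

f_high = one_way_f(separating)
f_zero = one_way_f(non_separating)
```

A variable whose group means differ strongly relative to the scatter within groups receives a large F-value, which is the sense in which the parameters above are ranked.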
The next important factor to be considered is the harvesting time of the grapes, which in principle could affect the statistical discrimination of the geographical origin. Instead of performing separate PCAs for early and late harvested wines (which would be difficult, because this parameter is severely influenced by geographical origin and by grape variety), we analyzed the importance of the harvesting time for the discrimination of geographical origin by ANOVA. Although harvesting time has a significant influence on the model, the other four variables ((D/H)I, (D/H)II, 13C and especially 18O) have much higher F-values and thereby outweigh the much smaller effect of "harvesting time". We observed that 18O has the biggest F-value, meaning that it is the most important parameter for geographical origin discrimination. Therefore, we cannot exclude any of the stable isotope parameters from our models, and consequently all of them were used for data fusion.

Based on the circle of correlation (projections of the correlations of the initial variables with the Principal Components) we can examine the relationships between variables (Fig. 4). We concluded that 13C, 18O and also, to a smaller degree, (D/H)I are positively correlated with each other and, at the same time, are not correlated with (D/H)II and R. 13C and 18O have a similar influence on the discrimination model, contrary to the previous study of Brazilian wines, where it was shown that 18O isotopic ratios are more selective and efficient in differentiating cultivation subregions than 13C values from ethanol [25]. According to our experience, 18O and 13C values do not correlate strictly positively due to the relatively cold weather conditions in Germany, and the correlation can be observed only for certain vintages (as in our case for 2009) and could be completely absent for wines from late-harvested grapes.

A significant negative correlation was found between the buckets between 2.77 ppm and 2.91 ppm of the 1H NMR profiling and both the 13C and 18O values. The corresponding 1H signals are due to malic acid, which is degraded during regular grape maturation. At higher temperatures, this degradation is more pronounced. Elevated temperatures also lead to stronger evaporation and thereby to higher 18O and 13C values. Of course, this relation holds only as long as the initial malic acid of the wine has not been removed (e.g. by malo-lacto-fermentation).
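The kind of negative correlation described here, between a malic acid bucket and an isotope value, corresponds to a Pearson correlation coefficient below zero. The following sketch computes it on synthetic numbers that only mimic the reported trend; they are not data from this study and the names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic values: scaled malic acid bucket intensity and 18O value.
malic_bucket = [0.9, 0.7, 0.6, 0.4]
d18o_values = [-1.5, -0.8, -0.1, 0.6]

r = pearson_r(malic_bucket, d18o_values)
```

In this toy case the bucket intensity falls as the isotope value rises, so r is close to -1.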
3.2.2. Year of vintage
Moreover, to check whether the different years of vintage have any influence on the isotope ratios of German wines, the data of wines from different vintages of the Riesling variety were analyzed using PCA. This dataset was selected because it contains the largest number of samples (n=247) among the grape varieties. Moreover, vintage influences the 1H NMR profiling to a greater extent than origin or variety (see below), thus making the application of data fusion more effective. We observed inferior but still sufficient classification in this case (PC1/PC3): the 2005 and 2009 groups can barely be differentiated. Several literature sources support our observation that stable isotopes can be useful for vintage discrimination [19,28,60].
3.2.3. Grape variety
The third important wine parameter to be considered is grape variety. In some studies it was discussed that wines produced from different grape varieties showed different isotopic patterns [19,20,40,61]. Although it can be assumed that the grape variety could influence the stable isotope composition to a certain extent, we did not obtain any significant clustering for either red wine or white wine grape varieties using PCA. In this case, we did not test vintage or geographical origin because the number of samples of each particular grape variety for each subset would have been too small for classification and validation. The probable explanation for this is that harvesting time varies significantly within grape varieties [62]. Grapes are harvested in Germany over several months (for example, from 28.08 to 25.10 for the 2007 vintage in our case). Therefore, since our database included wines from "early" and "late" harvested grapes, grape variety discrimination seems to be almost impossible.

Nevertheless, we tried to discriminate four red wine grape varieties (Dornfelder, Portugieser, Regent, and Pinot noir) using reduced data sets: wines from Pfalz (n=49) and from the 2005 vintage (n=33). In both cases, multivariate analyses were not able to find significant clustering of our groups, although the results improved in comparison with those obtained for the whole data set.
3.2.4. Classification results
Several classification methods (LDA, FDA, and PLS-DA) as well as ICA were evaluated for predicting class membership with respect to geographical origin, red wine grape variety and vintage of wine samples from stable isotope data alone (Table 5). Table 5 also contains information about the optimal number of latent variables (LV). It should be mentioned that the number of LV for the whole data set and for the reduced calibration data set (without the samples that were used for independent test set validation) was the same in each case. Regarding the geographical origin of the wines from the 2009 vintage, we obtained only a 60-70% correct classification rate for the four areas (PFL, NAH, MSR, and RHH). Another approach for assessing wine authenticity was recently evaluated [58]. In that study all five variables (SNIF-NMR, R, 18O, 13C) were used for geographical origin discrimination instead of the optimum number of latent variables as in our case (Table 5). Our 95% confidence ellipsoids (for example, Fig.2) correspond to the 95% confidence level of the chi-squared distribution used in that approach [57,58]. Applying this approach to our data, we obtained an average specificity of 72%, which is slightly better than that obtained by the other chemometric methods (Table 5). As expected, the sensitivity was found to be equal to our predefined confidence level of 95%. It is clear, however, that NMR has more discrimination power than stable isotopes for wine analysis (Table 5).
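The distance-based test described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, only two variables are used so that the chi-squared quantile has the exact closed form -2*ln(1 - level), and all function names are hypothetical. A sample is accepted as belonging to the reference class when its squared Mahalanobis distance to the class mean does not exceed the 95% chi-squared quantile; sensitivity and specificity then follow from the accepted and rejected counts.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X to `mean` under `cov`."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def chi2_quantile_2d(level):
    """Exact chi-squared quantile for 2 degrees of freedom only."""
    return -2.0 * np.log(1.0 - level)

def sensitivity(tp, fn):
    """True positives / (true positives + false negatives)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negatives / (true negatives + false positives)."""
    return tn / (tn + fp)

rng = np.random.default_rng(0)
# Synthetic two-variable data standing in for two isotope ratios (assumption).
reference = rng.normal([0.0, 0.0], 1.0, size=(500, 2))   # authentic class, used for fitting
authentic = rng.normal([0.0, 0.0], 1.0, size=(2000, 2))  # new samples of the same class
foreign = rng.normal([6.0, 6.0], 1.0, size=(2000, 2))    # samples of a different class

mean = reference.mean(axis=0)
cov = np.cov(reference, rowvar=False)
threshold = chi2_quantile_2d(0.95)

accept_auth = mahalanobis_sq(authentic, mean, cov) <= threshold
accept_foreign = mahalanobis_sq(foreign, mean, cov) <= threshold

sens = sensitivity(tp=accept_auth.sum(), fn=(~accept_auth).sum())
spec = specificity(tn=(~accept_foreign).sum(), fp=accept_foreign.sum())
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

By construction the sensitivity lands close to the chosen confidence level, which matches the remark that it equals the predefined 95%. The specificity depends on how far the other classes lie from the reference class. For more than two variables the quantile would have to come from a chi-squared table or a statistics library.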
As mentioned in Section 3.2.1, a PCA plot is not a reliable tool for evaluating a multivariate approach for predicting wine parameters (Fig.2). As an alternative, to estimate the percentage of overlapping samples one can consider the confusion matrix obtained by any of the multivariate methods used (LDA, FDA, ICA or PLS-DA). As an example, the confusion matrix obtained by LDA for predicting the geographical origin of wines showed that, although the average classification rate is 61%, the greatest uncertainty in class membership prediction lies between the MSR and NAH regions (Table 6), which is in good agreement with the visual results of Fig.2. For the year of vintage all chemometric methods used showed comparable efficiencies of 61-62%. As expected, the worst results were obtained for the differentiation of red wine grape varieties, which ranged from 35% for FDA and LDA to 58% for PLS-DA (Table 5). Specificity based on the Mahalanobis distance test [57,58] was found to be 60% for the year of vintage and 30% for red wine grape variety, which is comparable with the classification rates obtained by the other four approaches (LDA, PLS-DA, FDA and ICA) (Table 5). It can be seen that in all cases considered, classification based on 1H NMR data is superior to that obtained with stable isotope data (Table 5). We expected, however, that wine authentication would greatly benefit if the discriminant powers of both techniques were combined, thereby improving classification rates (Table 5).
3.3. Fusion of 1H NMR and stable isotope data for wine analysis
The next logical step would be to try to combine the complementary information of 1H NMR data after variable selection and stable isotope profiling. It can be expected that this would result in better multivariate models reflecting the different sources of information. To avoid the possible problem that after variable selection significant variables (which were initially non-significant) will be omitted from the combined model (which of course would lead to inferior differentiation), we tried to fuse the initial buckets. However, the results were definitely better after variable selection.
3.3.1. Chemometric analysis of concatenated data
The percentages of correctly classified samples obtained only by 1H NMR, stable isotope and concatenated data using ICA, LDA, FDA and PLS-DA are summarized in Table 5. Moreover, we used a MB-PLS-DA method, which can be regarded as a multi-block extension of PLS-DA modelling [54-56]. In this study we utilized an algorithm which deflates the response block Y (either geographical origin or year of vintage or grape variety) with the super scores to build the MB-PLS-DA model [54].
The particular choices of the Riesling wine subset for vintage differentiation and of the 2009 vintage subset for geographical origin classification were motivated by the results of multiway ANOVA with respect to 1H NMR data. For the 2009 vintage data set the following values were found: p(origin)=0.010 and p(variety)=0.041, which means that origin influences the 1H NMR spectral profile of wine more than grape variety does. The p-values for Riesling wines (p(origin)=0.013 and p(vintage)=0.008) also motivated our choice to study this data set for vintage discrimination.
For the analysis of fused data, preprocessing is very important, especially in our case, where spectroscopic and discrete data are combined. The MB-PCA models are also prone to be dominated by large-variance variables, as is PCA, but are more robust against noise as they attempt to model the common trend among the different blocks. We have found that auto-scaling was not a good solution because it puts noise at the same variance level as the chemical signals. We observed the best classification models when the inverse of the sum of squares of each block was used as the block scaling factor (i.e. after applying the block scaling factor, the total variance of each block equals 1).
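The block scaling and concatenation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the matrices are random stand-ins for the bucketed 1H NMR data and the five isotope variables, the function name is hypothetical, and the scaling is implemented by dividing each mean-centered block by the square root of its sum of squares, which is the factor that makes the total sum of squares of the scaled block equal to 1.

```python
import numpy as np

def block_scale(block):
    """Mean-center a block and scale it to unit total sum of squares."""
    centered = block - block.mean(axis=0)
    frobenius_norm = np.sqrt((centered ** 2).sum())
    return centered / frobenius_norm

rng = np.random.default_rng(1)
# Hypothetical sizes: 20 wines, 300 NMR buckets, 5 isotope variables.
nmr = rng.normal(size=(20, 300)) * 50.0   # large intensities, many variables
iso = rng.normal(size=(20, 5))            # few variables on a different scale

nmr_scaled = block_scale(nmr)
iso_scaled = block_scale(iso)

# Low-level data fusion: samples stay in rows, variables are placed side by side.
fused = np.hstack([nmr_scaled, iso_scaled])
print(fused.shape)
```

With this choice neither block dominates the fused matrix merely because it has more variables or larger raw values, while the relative variances of variables inside a block are preserved (unlike auto-scaling).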
As expected, the best improvement in comparison to the separate analysis of both data sets was obtained for the prediction of geographical origin. Using leave-one-out cross validation we even achieved 100% correct classification for the fused data by PLS-DA, whereas stable isotope data alone resulted in just 60-70% correct prediction (multiclass model) and in 72% specificity according to the Mahalanobis distance test, and 1H NMR data alone in 84-94%. As the performance of the method can differ from year to year due to climatic impacts, we calculated the percentages of correct classification for the prediction of geographical origin for other vintages. We found that the concatenation with stable isotope data always increases the discrimination power of NMR spectroscopy: from 94% to 95% for 2005, from 94% to 99% for 2006, from 95% to 97% for 2007, and from 96% to 99% for 2010 (LDA and leave-one-out cross validation were used for evaluation). The specificity values (average for all regions) obtained with the Hotelling t-test based on the Mahalanobis distance for stable isotope data alone were 51% in 2005, 75% in 2006, 61% in 2007 and 63% in 2010. These values explain the biggest gain in efficiency of concatenation for the 2006 vintage.
A definite improvement was also obtained for the year of vintage: 99% for the fused data, compared to 93-98% for 1H NMR and, for stable isotope analysis, 61-62% for the multiclass model and 60% specificity (Hotelling t-test) (Table 5). In the case of grape varieties, data fusion has no benefits in comparison with 1H NMR models, probably due to the fact that stable isotope data do not contain much information about grape variety in our case and only introduce additional noise in the fused model (Table 5). Some improvement of 1H NMR models can be expected for specific vintages or geographical origins; however, additional samples need to be collected in this case.
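The leave-one-out evaluation used for the rates above can be sketched in a few lines. The example below is only an illustration: it uses synthetic data, equal class priors, and a plain LDA rule (assignment to the class whose mean is nearest in the Mahalanobis metric of the pooled within-class covariance) applied directly to the variables, whereas the study applied its classifiers to a selected number of latent variables. All function names are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    """Class means and inverse pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    residuals = X - means[np.searchsorted(classes, y)]
    pooled = residuals.T @ residuals / (len(y) - len(classes))
    return classes, means, np.linalg.pinv(pooled)

def lda_predict(model, x):
    """Assign x to the class with the smallest squared Mahalanobis distance."""
    classes, means, precision = model
    diff = means - x
    d2 = np.einsum("ij,jk,ik->i", diff, precision, diff)
    return classes[np.argmin(d2)]

def loo_accuracy(X, y):
    """Leave-one-out: refit without sample i, then predict sample i."""
    n = len(y)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = lda_fit(X[keep], y[keep])
        hits += int(lda_predict(model, X[i]) == y[i])
    return hits / n

rng = np.random.default_rng(2)
# Synthetic stand-in for fused data: 3 well-separated classes, 15 samples each.
centers = np.array([[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0]], dtype=float)
y = np.repeat(np.arange(3), 15)
X = centers[y] + rng.normal(size=(45, 4))

acc = loo_accuracy(X, y)
print(f"leave-one-out accuracy: {acc:.2f}")
```

Because every sample is predicted by a model that never saw it, the resulting rate is less optimistic than a resubstitution rate, although an independent test set remains the stricter check.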
In order to avoid over-optimistic results, test set validation has to be done with samples that were not used to build the calibration models. For this purpose, the independent test sets consisted of approximately one-fourth of randomly selected wine samples (Table 5). The rest of the available data was included in the calibration data sets. To provide more reliable test set validation, we repeated the training/test splitting ten times. Due to the considerable number of samples at our disposal, during independent set validation we identified the same significant factors in the initial and reduced training data sets for each method. The best results were achieved by PLS-DA for geographical origin and grape variety and by ICA for the year of vintage (Table 5).
We also calculated the sensitivity and specificity values for all methods (Table 5). The sensitivity is defined as the number of true positives divided by the number of true positives + false negatives. This value gives the probability that the test is positive when the wine sample actually belongs to the defined group. The specificity (true negatives / (true negatives + false positives)) describes the probability that the test is negative when the wine sample indeed does not belong to the defined group. The values of specificity were generally lower than those of sensitivity (Table 5). Satisfactory sensitivity results (88-96% for geographical origin, 89-97% for year of vintage) were found for all methods. The specificity values were slightly inferior: 85-90% for geographical origin and 87-94% for the year of vintage. As expected, unacceptable sensitivity/specificity rates were obtained for red wine grape varieties (Table 5). The same approach was found useful for evaluating wine authenticity based on stable isotope values [57,58] and in our previous 1H NMR study on rice [39].
It should be mentioned that in our study, ICA was applied separately to the 1H NMR and stable isotope data sets as well as to the fused data (Tables 3 and 5). Despite being an unsupervised method, in most cases ICA showed comparable or only slightly inferior performance compared to the classical supervised classification methods used (LDA, PLS-DA, and FDA), thus demonstrating its applicability for solving classification problems. Moreover, to the best of our knowledge, we are the first to apply ICA to the analysis of discrete data (stable isotopes) and concatenated data from different analytical techniques.
3.3.2. Multiblock (MB) methods - Consensus PCA and Common Component and Specific Weight Analysis (ComDim) - for the determination of geographical origin of wine
To continue illustrating the performance of specialized methods, we tried other multiblock methods for 111 wines from the 2009 vintage. First, the data were analyzed separately by standard PCA. The results obtained using stable isotope profiling have already been discussed (Fig.2). In this case two clusters (Mosel and Nahe) overlap with each other. Using 1H NMR data, better discrimination was observed, as expected. However, other clusters (Rheinhessen and Nahe) overlap and full differentiation of the clusters cannot be achieved (results not shown). Fortunately, full discrimination can be achieved when the two data sets are taken into account, for example, by using one of the multiblock PCA methods - CPCA-W [53] (Fig. 5). It is very easy to see that all four clusters can be completely separated (Fig. 5) and that MB methods are able to integrate the advantages of each analytical technique into a combined model. This result came somewhat as a surprise because all four wine production regions (RHH, PFL, MSR, and NAH) are very closely located and even border each other (for example, PFL-RHH or NAH-RHH) (Fig.3).
The same data were analyzed by another MB tool - Common Components and Specific Weights Analysis (ComDim) [51,52,64]. The method consists in determining a common space for several blocks (two in our case), with each matrix having a specific weighting (“salience”). The best separation for the four classes was found in the D1 and D3 plane at the 99% probability level (Fig. 6A). The saliences indicate that in the first two dimensions, stable isotope data play a more significant role, while 1H NMR is mostly responsible for the third dimension, resulting in the best clustering in D1-D3 (Fig. 6B).
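The idea of common components with block-specific saliences can be sketched with the iteration commonly described for ComDim. The code below is a simplified illustration under assumptions, not the implementation used in the study: the blocks are synthetic, each block is mean-centered and scaled to unit total sum of squares, and the function name, iteration limit and tolerance are hypothetical choices. For each dimension, a common component q is taken as the leading eigenvector of the salience-weighted sum of the sample association matrices W_k = X_k X_k^T, the saliences are updated as q^T W_k q, and every block is then deflated with respect to q.

```python
import numpy as np

def comdim(blocks, n_components=2, n_iter=200, tol=1e-10):
    """Return common components (columns of Q) and saliences per dimension and block."""
    scaled = []
    for block in blocks:
        centered = block - block.mean(axis=0)
        scaled.append(centered / np.linalg.norm(centered))  # unit Frobenius norm
    n_samples = scaled[0].shape[0]
    Q = np.zeros((n_samples, n_components))
    saliences = np.zeros((n_components, len(scaled)))
    for d in range(n_components):
        assoc = [X @ X.T for X in scaled]        # one n x n matrix per block
        lam = np.ones(len(scaled))
        q = None
        for _ in range(n_iter):
            weighted = sum(l * W for l, W in zip(lam, assoc))
            _, vecs = np.linalg.eigh(weighted)   # eigenvalues in ascending order
            q = vecs[:, -1]                      # leading eigenvector
            new_lam = np.array([q @ W @ q for W in assoc])
            converged = np.abs(new_lam - lam).max() < tol
            lam = new_lam
            if converged:
                break
        Q[:, d] = q
        saliences[d] = lam
        # Remove the part of each block explained by the common component.
        scaled = [X - np.outer(q, q @ X) for X in scaled]
    return Q, saliences

rng = np.random.default_rng(3)
t = rng.normal(size=30)                          # latent trend shared by both blocks
block_a = np.outer(t, rng.normal(size=50)) + 0.1 * rng.normal(size=(30, 50))
block_b = np.outer(t, rng.normal(size=5)) + 0.1 * rng.normal(size=(30, 5))

Q, sal = comdim([block_a, block_b], n_components=2)
print(np.round(sal, 3))
```

In this toy case both blocks carry the same latent trend, so both saliences are high in the first dimension. A dimension in which one block has a much larger salience than the other is driven mainly by that block, which is how a salience plot such as Fig. 6B is read.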
For the multiblock models we took the 111 samples from both the late and early harvesting season (harvesting time lay between 15.09.2009 and 11.11.2009); therefore, the models could be used for the entire 2009 season. However, the classification models cannot be used to predict the geographical origin of wines from other vintages. To verify this fact we predicted the geographical origin of 24 “new” wine samples from the 2005, 2006, 2007 and 2010 vintages by visualizing them on the consensus PCA plot. For each year two samples from each wine production area (RHH, MSR, PFL, and NAH) were selected. Of these samples, only four wines from PFL and RHH (all from the 2010 vintage) were correctly classified, being projected within the 95% confidence ellipsoids. The other samples were found outside the 95% confidence ellipsoids of the consensus PCA model. This result suggests that neither 1H NMR nor stable isotope data include vintage-invariant information for the differentiation of geographical origin. Practically speaking, an application of the presented technique will therefore have to be based on annually updated databanks.
Thus, both Consensus PCA and ComDim can be used to determine the geographical origin of wines. We did not apply these methods for the determination of year of vintage and grape variety, due to the low efficiency of data fusion in these cases (Table 5). We believe that such a combination is an important improvement for wine and food authentication.
4. Conclusions
The combination of data from different analytical platforms is a powerful strategy to enrich the final information content and thus improve the classification results. In this study several classification strategies with concatenated data have been applied to benefit from the synergistic effect of the information obtained from two techniques: 1H NMR and stable isotopes. 1H NMR spectroscopy of wine is becoming quite easy nowadays thanks to simple sample preparation and fast automated measurements. Moreover, we have shown that good classification models regarding important wine parameters (year of vintage and geographical origin) can be constructed. On the other hand, stable isotope ratio analysis has been recognized in recent years to provide a good source of chemical information for the authenticity assessment of food products. Although relatively time-consuming, these methods improve the classification rates of 1H NMR spectroscopy for the determination of geographical origin and vintage of wine. Therefore, we believe that it is worthwhile to use data fusion of these techniques in combination with multivariate analysis for wine control in cases where high levels of certainty are required.
Acknowledgments
We express our great thanks to all institutions providing wine samples, especially E. Annweiler and M. Metschies, Chemisches und Veterinäruntersuchungsamt Freiburg. We gratefully acknowledge S. Klein and J. Geisser, Chemisches und Veterinäruntersuchungsamt Karlsruhe, for accurate sample preparation. Finally, the authors are grateful to Yun Xu for useful suggestions about multiblock analysis.
References
[1] A.K. Smilde, M.J. van der Werf, S. Bijlsma, B.J.C. van der Werff-van der Vat, R.H. Jellema, Fusion of mass spectrometry-based metabolomics data, Anal.Chem. 77 (2005) 6729-6736.
[2] T.I. Dearing, W.J. Thompson, C.E. Rechsteiner, B.J. Marquardt, Characterization of crude oil products using data fusion of process Raman, infrared, and nuclear magnetic resonance (NMR) spectra, Appl.Spectrosc. 65 (2011) 181-186.
[3] Y. Xu, E. Correa, R. Goodacre, Integrating multiple analytical platforms and chemometrics for comprehensive metabolic profiling: application to meat spoilage detection, Anal.Bioanal.Chem. 405 (2013) 5063-5074.
[4] E. Pere-Trepat, R. Tauler, Chemometrics modelling of organic contaminants in fish and sediment river samples, J.Chromatogr.A 1131 (2006) 85-96.
[5] S. Mas, R. Tauler, A. de Juan, Chromatographic and spectroscopic data fusion analysis for interpretation of photodegradation processes, J.Chromatogr.A 1218 (2011) 9260-9268.
[6] L. Vera, L. Acena, J. Guasch, R. Boque, M. Mestres, O. Busto, Discrimination and sensory description of beers through data fusion, Talanta 87 (2011) 136-142.
[7] J. Forshed, H. Idborg, S.P. Jacobsson, Evaluation of different techniques for data fusion of LC/MS and 1H-NMR, Chem.Intell.Lab.Syst. 85 (2007) 102-109.
[8] A. Smolinska, L. Blanchet, L. Coulier, K.A.M. Ampt, T. Luider, R.Q. Hintzen, S.S. Wijmenga, L.M.C. Buydens, Interpretation and visualization of non-linear data fusion in kernel space: study on metabolomic characterization of progression of multiple sclerosis, PLoS One 7 (2012) e38163.
[9] C.V. Di Anibal, M.P. Callao, I. Ruisanchez, 1H NMR and UV-visible data fusion for determining Sudan dyes in culinary spices, Talanta 84 (2011) 829-833.
[10] F.C. Clarke, M.J. Jamieson, D.A. Clark, S.V. Hammond, R.D. Jee, A.C. Moffat, Chemical image fusion: the synergy of FT-NIR and Raman mapping microscopy to enable a more complete visualization of pharmaceutical formulations, Anal.Chem. 73 (2001) 2213-2220.
[11] S.J. Tao, J.M. Li, J.H. Li, J.B. Tang, J.R. Mi, L.L. Zhao, Discriminant analysis of red wines from different aging ways by information fusion of NIR and MIR spectra, in IFIP Advances in Information and Communication Technology, 369 (2012) 478-483.
[12] J. Boccard, D.N. Rutledge, A consensus orthogonal partial least squares discriminant analysis (OPLS-DA) strategy for multiblock Omics data fusion, Anal.Chim.Acta 769 (2013) 30-39.
[13] Y.S. Hong, NMR-based metabolomics in wine science, Magn.Reson.Chem. 49 (2011) S13-S21.
[14] R. Godelmann, F. Fang, E. Humpfer, B. Schütz, M. Bansbach, H. Schäfer, M. Spraul, Targeted and nontargeted wine analysis by 1H NMR spectroscopy combined with multivariate statistical analysis. Differentiation of important parameters: grape variety, geographical origin, year of vintage, J.Agric.Food Chem. 61 (2013) 5610-5619.
[15] M. Anastasiadi, A. Zira, P. Magiatis, S.A. Haroutounian, A.L. Skaltsounis, E. Mikros, 1H NMR-Based Metabonomics for the Classification of Greek Wines According to Variety, Region and Vintage - Comparison with HPLC Data, J.Agric.Food Chem. 57 (2009) 11067-11074.
[16] K. Ali, F. Maltese, R. Toepfer, Y.H. Choi, R. Verpoorte, Metabolic characterization of Palatinate German white wines according to sensory attributes, varieties, and vintages using NMR spectroscopy and multivariate data analyses, J.Biomol.NMR 49 (2011) 255-266.
[17] M. Koda, K. Furihata, F. Wei, T. Miyakawa, M. Tanokura, NMR-based metabolic profiling of rice wines by F2-selective TOCSY spectra, J.Agric.Food Chem. 60 (2012) 4818-4825.
[18] H.-L. Schmidt, Food quality control and studies on human nutrition by mass spectrometric and nuclear magnetic resonance isotope ratio determination, Fresenius J. Anal. Chem. 324 (1986) 760-766.
[19] D.A. Magdas, S. Cuna, G. Cristea, R.E. Ionete, D. Costinel, Stable isotopes determination in some Romanian wines, Isot.Environ.Health Stud. 48 (2012) 345-353.
[20] C. Aghemo, A. Albertino, R. Gobetto, F. Spanna, Correlation between isotopic and meteorological parameters in Italian wines: a local-scale approach, J.Sci.Food Agric. 91 (2011) 2088-2094.
[21] M. Perini, F. Camin, δ18O of ethanol in wine and spirits for authentication purposes, J.Food Sci. 78 (2013) C839-C844.
[22] D. Costinel, A. Tudorache, R.E. Ionete, R. Vremera, The impact of grape varieties to wine isotopic characterization, Anal.Lett. 44 (2011) 2856-2864.
[23] G.J. Martin, M.L. Martin, F. Mabor, M.J. Michon, A new method for the identification of the origin of ethanols in grain and fruit spirits: high-field quantitative deuterium nuclear magnetic resonance at the natural abundance level, J.Agric.Food Chem. 31 (1983) 311-315.
[24] A. Monetti, F. Reniero, G. Versini, Classification of Italian wines on a regional scale by means of a multi-isotopic analysis, Z.Lebensm.Unters.Forsch. 199 (1994) 311-316.
[25] L. Adami, S.V. Dutra, A.R. Marcon, G.J. Carnieli, C.A. Roani, R. Vanderlinde, Geographic origin of southern Brazilian wines by carbon and oxygen isotope analysis, Rapid Commun.Mass Spectrom. 24 (2010) 2943-2948.
[26] I.J. Košir, M. Kocjančič, N. Ogrinc, J. Kidrič, Use of SNIF-NMR and IRMS in combination with chemometric methods for the determination of chaptalisation and geographical origin of wines (the example of Slovenian wines), Anal.Chim.Acta 429 (2001) 195-206.
[27] F. Guyon, L. Gaillard, M.H. Salagoity, B. Medina, Intrinsic ratios of glucose, fructose, glycerol and ethanol 13C/12C isotopic ratio determined by HPLC-co-IRMS: toward determining constants for wine authentication, Anal.Bioanal.Chem. 401 (2011) 1551-1558.
[28] N. Ogrinc, I.J. Košir, M. Kocjančič, J. Kidrič, J.Agric.Food Chem. 49 (2001) 1432-1440.
[29] D.A. Magdas, S. Cuna, G. Cristea, R.E. Ionete, D. Costinel, Stable isotopes determination in some Romanian wines, Isot.Environ.Health Stud. 48 (2012) 345-353.
[30] A. Hyvärinen, J. Karhunen, E. Oja, Independent component analysis, Wiley, New York, 2001.
[31] A. Cichocki, S. Amari, Adaptive blind signal and image processing. Learning algorithms and applications, Wiley, New York, 2002.
[32] Y.B. Monakhova, S.S. Kolesnikova, S.P. Mushtakova, Independent component analysis algorithms for spectral decomposition in UV/VIS analysis of metal-containing mixtures including multimineral food supplements and platinum concentrates, Anal.Methods 5 (2013) 2761-2772.
[33] Y.B. Monakhova, S.A. Astakhov, A.V. Kraskov, S.P. Mushtakova, Independent components in spectroscopic analysis of complex mixtures, Chem.Intell.Lab.Syst. 103 (2010) 108-115.
[34] Y.B. Monakhova, S.P. Mushtakova, S.S. Kolesnikova, Chemometrics-assisted spectrophotometric method for simultaneous determination of vitamins in complex mixtures, Anal.Bioanal.Chem. 397 (2010) 1297-1306.
[35] D.N. Rutledge, D. Jouan-Rimbaud Bouveresse, Independent Components Analysis with the JADE algorithm, Trends Anal.Chem. 50 (2013) 22-32.
[36] I. Schelkanova, V. Toronov, Independent component analysis of broadband near-infrared spectroscopy data acquired on adult human head, Biomed.Opt.Express. 3 (2012) 64-74.
[37] M. Mecozzi, M. Pietroletti, M. Scarpiniti, R. Acquistucci, M.E. Conti, Monitoring of mucilage formation in Italian seas investigated by infrared spectroscopy and independent component analysis, Environ.Monit.Assess. 184 (2012) 6025-6036.
[38] D. Jouan-Rimbaud Bouveresse, A. Moya-González, F. Ammari, D.N. Rutledge, Two novel methods for the determination of the number of components in independent component analysis models, Chem.Intell.Lab.Syst. 112 (2012) 24-32.
[39] Y.B. Monakhova, D.N. Rutledge, A. Rossmann, H. Waiblinger, M. Mahler, M. Ilse, T. Kuballa, D.W. Lachenmeier, Determination of rice type by 1H NMR spectroscopy in combination with different chemometric tools, J.Chemometr. 28 (2013) 83-92.
[40] R.D. Di Paola-Naranjo, M.V. Baroni, N.S. Podio, H.R. Rubinstein, M.P. Fabani, R.G. Badini, M. Inga, H.A. Ostera, M. Cagnoni, E. Gallegos, E. Gautier, P. Peral-Garcia, J. Hoogewerff, D.A. Wunderlin, Fingerprints for main varieties of Argentinean wines: terroir differentiation by inorganic, organic, and stable isotopic analyses coupled to chemometrics, J.Agric.Food Chem. 59 (2011) 7854-7865.
[41] I. Geana, A. Iordache, R. Ionete, A. Marinescu, A. Ranca, M. Culea, Geographical origin identification of Romanian wines by ICP-MS elemental analysis, Food Chemistry 138 (2013) 1125-1134.
[42] C.J. Bevin, R.G. Dambergs, A.J. Fergusson, D. Cozzolino, Varietal discrimination of Australian wines by means of mid-infrared spectroscopy and multivariate analysis, Anal.Chim.Acta 621 (2008) 19-23.
[43] Y.B. Monakhova, H. Schäfer, E. Humpfer, M. Spraul, T. Kuballa, D.W. Lachenmeier, Application of automated eightfold suppression of water and ethanol signals in 1H NMR to provide sensitivity for analyzing alcoholic beverages, Magn.Reson.Chem. 49 (2011) 734-739.
[44] OIV 2009 Resolution OIV/OENO 381/2009. Bestimmung der Deuteriumverteilung im Ethanol aus alkoholischen Getränken aus Erzeugnissen der Weinrebe mittels magnetischer Kernresonanz des Deuteriums (SNIF-NMR), Zagreb, Republic of Croatia, 2009.
[45] OIV 2009 Resolution OIV/OENO 381/2009. Bestimmung des Isotopenverhältnisses 13C/12C von Ethanol aus alkoholischen Getränken aus Erzeugnissen der Weinrebe mittels Isotopenmassenspektrometrie, Zagreb, Republic of Croatia, 2009.
[46] OIV 2009 Resolution OIV/OENO 353/2009. Methode zur Bestimmung des Isotopenverhältnisses 18O/16O von Wasser in Wein und Most, Zagreb, Republic of Croatia, 2009.
[47] C.B.Y. Cordella, D. Bertrand, SAISIR: A new general chemometric toolbox, Trends Anal.Chem. 54 (2014) 75-82.
[48] E. Vigneau, E.M. Qannari, Clustering of variables around latent components, Commun.Stat.-Simulat. 32 (2003) 1131-1150.
[49] M. Cuny, E. Vigneau, G. Le Gall, I.J. Colquhoun, M. Lees, D.N. Rutledge, Fruit juice authentication by 1H NMR spectroscopy in combination with different chemometrics tools, Anal.Bioanal.Chem. 390 (2008) 419-427.
[50] A.V. Kraskov. http://www.ucl.ac.uk/ion/departments/sobell/Research/RLemon/MILCA/MILCA (accessed 8.08.2013).
[51] E.M. Qannari, I. Wakeling, P. Courcoux, H.J.H. MacFie, Defining the underlying sensory dimensions, Food Qual.Prefer. 11 (2000) 151-154.
[52] E.M. Qannari, I. Wakeling, H.J.H. MacFie, A hierarchy of models for analysing sensory data, Food Qual.Prefer. 6 (1995) 309-314.
[53] J.A. Westerhuis, T. Kourti, J.F. Macgregor, Analysis of multiblock and hierarchical PCA and PLS models, J.Chemometr. 12 (1998) 301-321.
[54] J.A. Westerhuis, A.K. Smilde, Deflation in multiblock PLS, J.Chemometr. 15 (2001) 485-493.
[55] J.A. Westerhuis, P.M.J. Coenegracht, Multivariate modeling of the pharmaceutical 2-step process of wet granulation and tableting with multiblock partial least-squares, J.Chemometr. 11 (1997) 379-392.
[56] B.R. Kowalski, L.E. Wangen, A multiblock partial least squares algorithm for investigating complex chemical systems, J.Chemometr. 3 (1989) 3-20.
[57] N. Dordevic, F. Camin, R.M. Marianella, G.J. Postma, L.M.C. Buydens, R. Wehrens, Detecting the addition of sugar and water to wine, Aust.J.Grape Wine Res. 19 (2013) 324-330.
[58] N. Dordevic, R. Wehrens, G.J. Postma, L.M.C. Buydens, F. Camin, Statistical methods for improving verification of claims of origin for Italian wines based on stable isotope ratios, Anal.Chim.Acta 757 (2012) 19-25.
[59] Y. Liu, S.D. Brown, Wavelet multiscale regression from the perspective of data fusion: new conceptual approaches, Anal.Bioanal.Chem. 380 (2004) 445-452.
[60] A. Pirnau, M. Bogdan, D.A. Magdas, D. Statescu, Food Biophysics 8 (2013) 24-28.
[61] J.E. Gimenez-Miralles, D.M. Salazar, I. Solona, Regional origin assignment of red wines from Valencia (Spain) by 2H NMR and 13C IRMS stable isotope analysis of fermentative ethanol, J.Agric.Food Chem. 47 (1999) 2645-2652.
[62] F. Camin, G. Versini, D. Depentori, M. Simoni, A. Tonon, L. Ziller, Variation of stable isotopes in several wine constituents inside limited areas and in relation to cultivar and altitude, Proceedings AlpeAdria Symposium, Dodroipo/Udine, 8-10 November 2000, ESRA, Gorizia, 613-628, 2000.
[63] M.E. Spitzke, C. Fauhl-Hassek, Determination of the 13C/12C ratios of ethanol and higher alcohols in wine by GC-C-IRMS analysis, Eur.Food Res.Technol. 231 (2010) 247-257.
[64] M. Hanafi, G. Mazerolles, E. Dufour, E.M. Qannari, Common components and specific weight analysis and multiple co-inertia analysis applied to the coupling of several measurement techniques, J.Chemometr. 20 (2006) 1-12.
Figure Captions
Fig. 1. Evolution of the criterion ΔT with the number of clusters for the Riesling wine dataset (n=247).
Fig. 2. Scatter plot of the PCA scores of stable isotope data regarding determination of geographical origin of wines from 2009 vintage (n=111, mean-centered data) (ellipsoids show 95% probability): MSR – Mosel; RHH – Rheinhessen; PFL – Pfalz; NAH – Nahe.
Fig. 3. The map of German wine production regions.
Fig. 4. Correlation circle of PCA of stable isotope data regarding determination of geographical origin of wines from 2009 vintage (Fig. 2).
Fig. 5. Application of the Consensus PCA multiblock method to the fused NMR and stable isotope data (both matrices were mean-centered) regarding geographical origin of wine from 2009 vintage (n=111) (ellipsoids show 95% probability): MSR – Mosel; RHH – Rheinhessen; PFL – Pfalz; NAH – Nahe.
Fig. 6. Application of ComDim multiblock method to the fused NMR and stable isotope data regarding geographical origin (n=111): scatter plot of D1-D3 dimensions (ellipsoids show 99% probability) (A) and salience values (influence) of NMR and SI values on the Common Components of the model (B).
Table 1. An overview of the sample set measured by 1H NMR/SNIF-NMR
Origin/vintage          2005      2006      2007      2008      2009      2010
Baden                   24/24     25/-      25/25     28/-      18/-      149/-
Pfalz                   33/33     32/32     31/31     -/-       32/32     33/33
Mosel                   38/38     37/37     38/38     -/-       32/32     36/36
Franken                 -/-       -/-       -/-       -/-       61/-      46/-
Hessische Bergstraße    -/-       -/-       -/-       -/-       5/-       5/-
Mittelrhein             4/4       5/-       5/4       -/-       5/5       3/3
Nahe                    15/15     16/16     16/16     -/-       16/16     17/14
Rheingau                -/-       -/-       -/-       -/-       10/-      10/-
Rheinhessen             40/40     38/38     37/37     -/-       38/38     37/37
Saale                   5/-       5/-       5/-       5/-       5/-       5/-
Sachsen                 6/-       7/-       7/-       6/-       5/-       4/-
Württemberg             14/13     13/-      15/14     6/-       43/-      62/-
Ahr                     2/2       4/4       4/4       -/-       4/4       -/-
Total (a)               181/169   182/131   183/169   45/0      274/127   407/122
(a) Additionally, a set of 111 wine samples of unknown origin and vintage was measured only by 1H NMR
Table 2. ANOVA results for wine parameters on the eight significant groups obtained by CLV.
Source         Sum of squares   Degrees of freedom   Mean squares   F      Prob>F
Geographical origin (2009 vintage, n=111, 4 groups (a))
Group 1 (d)    0.00027          3                    0              3.8    0.0131
Group 2        0.0346           3                    0.115          16.7   0
Group 3        0.00056          3                    0.00019        3.3    0.0057
Group 4        0                3                    0              0.2    0.0215
Group 5        0.01078          3                    0.00359        6.2    0.0006
Group 6        0.0362           3                    0.121          15.2   0
Group 7        0.00031          3                    0.00021        2.1    0.0054
Group 8        0.00079          3                    0.00795        3.4    0.0082
Year of vintage (Riesling, n=247, 5 groups (b))
Group 1        0.07646          4                    0.01912        6.8    0.0021
Group 2        0.19251          4                    0.4813         19.6   0
Group 3        0.21248          4                    0.5312         22.2   0
Group 4        0.22820          4                    0.5705         24.3   0
Group 5        0.08354          4                    0.02089        7.5    0.0087
Group 6        0.07384          4                    0.01846        6.6    0.0054
Group 7        0.02158          4                    0.01542        2.1    0.01534
Group 8        0.03217          4                    0.03482        2.3    0.05437
Red wine grape variety (n=154, 6 groups (c))
Group 1        0.16687          6                    0.21781        9.9    0
Group 2        0.10941          6                    0.1824         6.1    0
Group 3        0.03310          6                    0.00189        1.8    0.0127
Group 4        0.03682          6                    0.00614        1.2    0.0830
Group 5        0.03042          6                    0.00507        1.6    0.1624
Group 6        0.02223          6                    0.00371        1.1    0.3496
Group 7        0.03548          6                    0.00348        0.5    0.5432
Group 8        0.03458          6                    0.00642        0.8    0.4532
(a) the groups were: PFL, NAH, MSR, RHH
(b) the groups were: 2005, 2006, 2007, 2009, 2010
(c) the groups were: Pinot noir, Dornfelder, Lemberger, Portugieser, Trollinger, Regent
(d) “group” means the combination of buckets (chemical shifts) in the cluster
Table 3. Summary of classification results for the NMR data set with and without variable selection (leave-one-out cross validation)

Variable selection method (number of variables)   LDA        PLS-DA     FDA        ICA

Geographical origin (2009 vintage, n=111, 4 groupsᵃ)
Without variable selection (896)ᵉ                 89 (6)ᵈ    83 (5)     82 (7)     88 (4)
Multiway ANOVA (713)                              91 (5)     79 (4)     84 (6)     90 (4)
CLV (780)                                         94 (5)     85 (4)     85 (6)     90 (4)

Year of vintage (Riesling, n=247, 5 groupsᵇ)
Without variable selection (896)                  97 (6)     97 (6)     88 (6)     95 (6)
Multiway ANOVA (756)                              98 (5)     97 (6)     91 (5)     95 (5)
CLV (732)                                         98 (5)     97 (6)     93 (5)     96 (5)

Red wine grape variety (n=154, 6 groupsᶜ)
Without variable selection (896)                  83 (8)     97 (7)     82 (6)     87 (7)
Multiway ANOVA (798)                              84 (7)     98 (6)     84 (6)     89 (6)
CLV (767)                                         87 (7)     98 (6)     85 (6)     90 (6)

ᵃ The groups were: PFL, NAH, MSR, RHH.
ᵇ The groups were: 2005, 2006, 2007, 2009, 2010.
ᶜ The groups were: Pinot noir, Dornfelder, Lemberger, Portugieser, Trollinger, Regent.
ᵈ The optimal number of latent variables is given in brackets.
ᵉ The number of buckets in the data set is shown in brackets.
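The percentages in Table 3 come from leave-one-out cross validation: each sample is held out in turn, the model is fitted on the remaining samples, and the held-out sample is then predicted. The sketch below illustrates only that loop. It uses a nearest-centroid rule as a simple stand-in classifier (an assumption made for brevity, not one of the LDA, PLS-DA, FDA or ICA models of the manuscript), and all names and data are hypothetical.

```python
from collections import defaultdict


def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest (squared Euclidean distance)."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    best_label, best_dist = None, float("inf")
    for y, s in sums.items():
        centroid = [v / counts[y] for v in s]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = y, dist
    return best_label


def loocv_accuracy(xs, ys):
    """Percentage of samples classified correctly when each is held out once."""
    hits = 0
    for i in range(len(xs)):
        # Fit on everything except sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return 100.0 * hits / len(xs)


# Hypothetical two-variable data for two well-separated classes.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
ys = ["A", "A", "A", "B", "B", "B"]
accuracy = loocv_accuracy(xs, ys)
```

Any classifier with a fit-and-predict step could replace `nearest_centroid_predict` without changing the hold-out loop.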
Table 4. An overview of stable isotope analysis data

                          (D/H)I    (D/H)II    R        ¹⁸O      ¹³C
Average                   101.7     127.4      2.505    -0.38    -28.5
Standard deviation        1.1       1.6        0.039    1.32     1.0
Number of observations    718       718        718      718      718
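As a quick consistency check on Table 4, and assuming that R denotes the usual SNIF-NMR ratio of the two site-specific deuterium contents (the definition is not restated in the table), the tabulated averages reproduce the tabulated R to three decimals. Because R is averaged per sample, the ratio of the averages is only expected to match approximately.

```latex
R = \frac{2\,(D/H)_{II}}{(D/H)_{I}}, \qquad
\frac{2 \times 127.4}{101.7} \approx 2.505
```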
Table 5. Classification results for ¹H NMR, stable isotope (SI) and fused data of wine samples (percent of correctly classified samples). The optimal number of latent variables is given in brackets.

Data set                      LDA       PLS-DA     FDA       ICA       MB-PLSDA

Geographical origin (2009 vintage, n=111, 4 groupsᵃ)
Multiclass model
  ¹H NMR with CLVᵉ            94 (5)    85 (4)     84 (6)    90 (4)    -
  SIᵉ                         61 (3)    60 (4)     61 (4)    70 (3)    -
Sensitivity/specificity
¹H NMR (with CLV) + SI
  ᵈ                           99 (7)    100 (6)    96 (7)    92 (4)    100
  ᵉ                           92 (7)    93 (6)     91 (7)    89 (4)    95
  ᶠ                           90/88     91/85      92/90     88/86
b
Multiclass model
98 (5)
97 (6)
93 (5)
95 (5)
62 (5)
62 (5)
62 (5)
61 (5)
Sensitivity/specificity
d
99 (6)
-
-
(with CLV)
e
91 (6)
+SI
f
99 (6)
92 (6)
95 (6)
92 (6)
88 (6)
92 (6)
95/91
89/88
94/88
d
H NMR
1
96/90
95/60
(Hotelling t-test)
SI e
H NMR with CLV e
Year of vintage (Riesling, n=247, 5 groups ) 1
1
95/72
(Hotelling t-test)
92/87
99 96 97/94
Red wine grape variety (n=154, 6 groupsᶜ)

Data set                      LDA       PLS-DA     FDA       ICA       MB-PLSDA
Multiclass model
  ¹H NMR with CLVᵉ            83 (8)    97 (7)     82 (6)    87 (7)    -
  SIᵉ                         35 (4)    58 (3)     35 (3)    40 (4)    -
Sensitivity/specificity (Hotelling t-test): 95/30
¹H NMR (with CLV) + SI
  ᵈ                           87 (7)    98 (6)     85 (6)    90 (6)    85
  ᵉ                           71 (7)    83 (6)     79 (6)    81 (6)    75
  ᶠ                           75/69     84/82      81/78     85/76     78/70
ᵃ The groups were: PFL, NAH, MSR, RHH.
ᵇ The groups were: 2005, 2006, 2007, 2009, 2010.
ᶜ The groups were: Pinot noir, Dornfelder, Lemberger, Portugieser, Trollinger, Regent.
ᵈ Leave-one-out cross validation.
ᵉ Test set validation (approximately one fourth of the initial data set, average of ten random splits).
ᶠ Sensitivity/specificity values [%] for test set validation (average values for all groups).
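Footnotes e and f describe test set validation: roughly one fourth of the samples are set aside, the procedure is repeated for ten random splits, and sensitivity and specificity are averaged over all groups. The sketch below shows one way these two steps could be written in plain Python. The function names, the fixed seed, the one-vs-rest definition of sensitivity and specificity, and the toy labels are assumptions for illustration; the classifier itself is left out.

```python
import random


def random_splits(n_samples, test_fraction=0.25, n_splits=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random hold-out."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


def mean_sensitivity_specificity(y_true, y_pred):
    """One-vs-rest sensitivity and specificity in percent, averaged over classes.

    Assumes at least two classes occur in y_true, so no denominator is zero.
    """
    classes = sorted(set(y_true))
    sens, spec = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tn = sum(t != c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        sens.append(100.0 * tp / (tp + fn))
        spec.append(100.0 * tn / (tn + fp))
    return sum(sens) / len(classes), sum(spec) / len(classes)


# Ten random splits of a 111-sample data set, about one fourth held out each time.
splits = list(random_splits(n_samples=111))

# Hypothetical true and predicted labels for one test set.
y_true = ["PFL", "PFL", "NAH", "NAH", "MSR", "MSR"]
y_pred = ["PFL", "NAH", "NAH", "NAH", "MSR", "MSR"]
sensitivity, specificity = mean_sensitivity_specificity(y_true, y_pred)
```

In a full run, the two averages would be computed on the test set of each split and then averaged again over the ten splits.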
Table 6. Confusion matrix for geographical origin of wine (2009 vintage) using LDA (the percentages of correctly classified samples lie on the diagonal)

        NAH    PFL    RHH    MSR
NAH     52     33     5      10
PFL     33     53     13     0
RHH     5      14     67     14
MSR     4      14     11     71
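Because the entries of Table 6 are already percentages, the per-class rates can be read from the diagonal, and their unweighted mean gives a single summary figure. A minimal sketch, assuming that each run of four values belongs to the class label listed with it (the variable names are illustrative):

```python
labels = ["NAH", "PFL", "RHH", "MSR"]

# Percentages from Table 6, one row per class.
confusion = [
    [52, 33, 5, 10],
    [33, 53, 13, 0],
    [5, 14, 67, 14],
    [4, 14, 11, 71],
]

# Diagonal entries: percentage of correctly classified samples per class.
per_class = {label: confusion[i][i] for i, label in enumerate(labels)}
mean_correct = sum(per_class.values()) / len(labels)

# Largest off-diagonal entry: the most frequent confusion between two classes.
worst = max(
    ((labels[i], labels[j], confusion[i][j])
     for i in range(4) for j in range(4) if i != j),
    key=lambda t: t[2],
)
```

The mean of the diagonal is 60.75 %, and the largest off-diagonal entries (33 % in both directions) are between NAH and PFL.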
Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.
Fig. 6. (panels A and B)