Synergistic effect of the simultaneous chemometric analysis of 1H NMR spectroscopic and stable isotope (SNIF-NMR, 18O, 13C) data: Application to wine analysis


Accepted Manuscript

Authors: Yulia B. Monakhova, Rolf Godelmann, Armin Hermann, Thomas Kuballa, Claire Cannet, Hartmut Schäfer, Manfred Spraul, Douglas N. Rutledge

PII: S0003-2670(14)00577-7
DOI: http://dx.doi.org/doi:10.1016/j.aca.2014.05.005
Reference: ACA 233249

To appear in: Analytica Chimica Acta

Received date: 23-1-2014
Revised date: 24-4-2014
Accepted date: 2-5-2014

Please cite this article as: Yulia B. Monakhova, Rolf Godelmann, Armin Hermann, Thomas Kuballa, Claire Cannet, Hartmut Schäfer, Manfred Spraul, Douglas N. Rutledge, Synergistic effect of the simultaneous chemometric analysis of 1H NMR spectroscopic and stable isotope (SNIF-NMR, 18O, 13C) data: Application to wine analysis, Analytica Chimica Acta http://dx.doi.org/10.1016/j.aca.2014.05.005

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Yulia B. Monakhova a,b,d,*, Rolf Godelmann a, Armin Hermann c, Thomas Kuballa a, Claire Cannet b, Hartmut Schäfer b, Manfred Spraul b, Douglas N. Rutledge e

a Chemisches und Veterinäruntersuchungsamt (CVUA) Karlsruhe, Weissenburger Strasse 3, 76187 Karlsruhe, Germany
b Bruker Biospin GmbH, Silberstreifen, 76287 Rheinstetten, Germany
c Landesuntersuchungsamt - Institut für Lebensmittelchemie und Arzneimittelprüfung, Emy-Roeder-Straße 1, 55129 Mainz, Germany
d Department of Chemistry, Saratov State University, Astrakhanskaya Street 83, 410012 Saratov, Russia
e AgroParisTech, UMR 1145, Ingénierie Procédés Aliments, 16 rue Claude Bernard, F-75005 Paris

* Corresponding author. Tel.: 0721-926-5453; Fax: 0721-926-5539. E-mail address: yul-[email protected] (Y.B. Monakhova)


Highlights

• 1H NMR profiles of 718 wines were fused with stable isotope analysis data (SNIF-NMR, 18O, 13C)
• The best improvement was obtained for prediction of the geographical origin of wine
• Some enhancement was also obtained for the year of vintage (from 88-97% for 1H NMR to 99% for the fused data)
• Independent component analysis was used as an alternative chemometric tool for classification

Abstract

1H NMR spectroscopy is known to be a good tool for predicting the grape variety, the geographical origin and the year of vintage of wine. In the present study, we have shown that classification models can be improved when 1H NMR profiles are fused with stable isotope (SNIF-NMR, 18O, 13C) data. Variable selection based on clustering of latent variables was performed on the 1H NMR data. Afterwards, the combined data of 718 wine samples from Germany were analyzed using Linear Discriminant Analysis (LDA), Partial Least Squares - Discriminant Analysis (PLS-DA), Factorial Discriminant Analysis (FDA) and Independent Components Analysis (ICA). Moreover, several specialized multiblock methods (Common Components and Specific Weights Analysis (ComDim), consensus PCA and consensus PLS-DA) were applied to the data.

The greatest improvement over the 1H NMR data alone was obtained for prediction of the geographical origin (up to 100% for the fused data, compared with only 60-70% correct prediction for the stable isotope data and 82-89% for the 1H NMR data alone). Some improvement was also obtained for the year of vintage (from 88-97% for 1H NMR to 99% for the fused data), whereas no improved models were obtained for grape varieties.

The combination of 1H NMR data with stable isotope data improves the efficiency of classification models for the geographical origin and vintage of wine and could potentially be used for other food products as well.

Keywords: wine authentication; German wine; nuclear magnetic resonance; stable isotope analysis; chemometrics; data fusion

1. Introduction

Despite the rich information generated by modern analytical platforms, a dataset obtained from a single analytical technique may be too limited to provide a holistic picture of the phenomena under study. Hyphenated analytical techniques have therefore been increasingly used in recent years, and many laboratories now carry out multiple measurements on the same samples using different techniques. As a consequence, dedicated methods for analyzing such data are urgently needed in analytical chemistry. Applications of data fusion in chemical analysis are still few, but the number of such studies is clearly growing [1-5]. For example, fusion of data from mass spectrometry (MS) and diode-array detectors (DAD) was applied to solve coelution problems in liquid chromatography [4]; in that case, mixtures of biocides were analyzed both in model mixtures and in environmental samples. Data fusion of HPLC-DAD and MS was used to evaluate ketoprofen photodegradation processes [5]. Data fusion of three analytical techniques (MS e-nose, mid-IR optical tongue and UV-visible spectroscopy) was used to predict the production plant of a single beer brand with 95% correct classification [6]. Chromatographic techniques (LC/MS and GC/MS) complemented 1H NMR profiling well in metabolomics studies of biological fluids such as rat urine or cerebrospinal fluid [7,8]. Fusion of 1H NMR and UV-VIS data was used to detect the banned dyes Sudan III and IV in culinary spices, with classification rates between 80% and 100% [9]. Data fusion of Raman and near-infrared (FT-NIR) images of pharmaceutical formulations made it possible to detect possible problems during production [10].

Regarding wine analysis, data fusion of NIR and mid-infrared (MIR) spectroscopy was previously used to discriminate 96 red wines that were aged in oak barrels, in stainless steel tanks with oak chips, or in stainless steel tanks alone [11]. However, the information provided by these methods is based on the same physical phenomenon (bond vibrations) and is therefore easier to fuse. In another study, phenolic extracts of wines from three varieties were analyzed using two-dimensional 1H-13C heteronuclear NMR spectroscopy. The resulting multiway tensor can be considered as a multiblock dataset with a matrix of 81 observations (samples) and 625 variables (1H dimension) measured across 413 data blocks (13C dimension) [12]. The best model obtained showed 87.6% discrimination accuracy. To the best of our knowledge, the fusion of 1H NMR and stable isotope data has not so far been applied to wine analysis (or to any other food).

Nowadays, the most powerful analytical methods available for wine analysis are isotopic analysis and 1H NMR spectroscopy. The possibilities of 1H NMR spectroscopy in non-targeted wine analysis have been discussed in several publications [13-17], as well as in our ongoing project on a large dataset (n=1383) of German wines (results to be published separately). The second technique, isotopic analysis, is now the official and standard method in Europe and North America for routine testing of the authenticity of several food products and beverages, including wine [18-21]. The method is based on measuring the stable isotope content (2H, 13C, 18O) of the product or of a specific component such as an ingredient or target molecule. The determinations are carried out using site-specific natural isotope fractionation by nuclear magnetic resonance (SNIF-NMR) and/or isotope ratio mass spectrometry (IRMS), and these values can provide information on the botanical and geographical origin of a food product [18-20,22]. The main idea behind these official methods is that each plant has its own characteristic range of naturally occurring stable isotopes of carbon (12C, 13C), hydrogen (1H, 2H), oxygen (16O, 18O) and nitrogen (14N, 15N), whose distribution is influenced by a number of physical and/or biochemical properties.

The main application of stable isotope analysis to wine is the assessment of authenticity, i.e. the detection of illegal addition of sugar, alcohol and water, as well as the verification of false labeling of geographical origin or vintage. Originally, 2H/1H ratios of fermentative ethanol in wine were shown to be useful for detecting sugar addition [23]. Later, it was shown that stable isotope ratios of wines are affected by the geographical origin, the year of vintage and, to a certain extent, the grape variety [19,20,22,24-29]. We therefore hypothesized that stable isotope data can improve our chemometric models based on 1H NMR profiling, and investigating this possibility is the main focus of our work.

Apart from classical classification methods (PCA, LDA, FDA and PLS-DA), in this article we have also applied independent components analysis (ICA) as an alternative chemometric tool for the same purpose. The primary goal of this method is to extract the pure signals from a data set of mixed signals by finding a transformation that minimizes the dependence between the "pure" sources (called ICs) [30,31]. This technique has previously been used extensively for multicomponent spectroscopic analysis of different matrices [32-38]. On the other hand, little is known about the applicability of ICA algorithms to classification problems [39]. It is worth mentioning that, up to now, classification of wines has mainly been based on the analysis of either elements and isotopes [19,20,22,25-28,40,41] or organic constituents [14,42]. To the best of our knowledge, no study has combined the isotopic composition of wine (or of any food product) with information about its organic composition obtained by 1H NMR.

2. Experimental

2.1. Samples

In total, 1383 wines were collected and analyzed by 1H NMR. For a subset of 718 samples, stable isotope values ((D/H)I, (D/H)II, δ18O, δ13C) were also measured. Authentic samples of pure grape variety wines of vintages 2005 and 2010 were obtained from the wine research institutes of the Federal State of Baden-Württemberg (Wine Research Institute Freiburg and Wine Research Institute Weinsberg). Wines microvinified according to the protocol of EU regulation 2729/2000 for the EU Wine Data Base ("EU Database Wines") were collected from official wine research institutes in Baden-Württemberg and Rheinland-Pfalz. All 13 German wine-growing regions were considered: Baden (BAD), Württemberg (WT), Pfalz (PFL), Rheinhessen (RHH), Mosel-Saar-Ruwer (MSR), Franken, Nahe (NAH), Sachsen, Saale-Unstrut, Mittelrhein (MRH), Rheingau, Ahr and Hessische Bergstrasse. None of the samples had been blended with wine of another variety, vintage or region. An overview of the collected samples by vintage and geographical origin is given in Table 1.

Altogether, wines of 37 grape varieties were analyzed. The main red grape varieties were Pinot noir (116/49), Dornfelder (86/69), Lemberger (26/4), Portugieser (23/20), Trollinger (18/6), Regent (14/5) and Pinot Meunier (12/2). Among white wines, Riesling (342/247), Müller Thurgau (121/64), Pinot blanc (81/31), Kerner (63/45), Pinot gris (43/16), Chardonnay (16/8) and Gutedel (11/4) were dominant. The numbers in parentheses give the number of samples measured by 1H NMR / stable isotope techniques. The remaining wines were of unknown grape variety or of varieties represented by fewer than 10 samples.

2.2. 1H NMR experiments

1H NMR measurements were performed under full automation for the whole process on an AVANCE III 400 spectrometer at Bruker BioSpin GmbH, Rheinstetten, Germany, equipped with a 5 mm 1H/D-TXI probe head with z-gradient, an automated tuning and matching accessory and a BTO-2000 unit for temperature control. Sample preparation and all acquisition parameters of the 1H NMR experiments can be found in previous publications [14,43].

2.3. Stable isotope analysis

The (D/H) ratios were determined at the methyl (D/H)I and methylene (D/H)II sites of the ethanol molecule according to Resolution OIV/OENO 381/2009 [44]. (D/H)I mainly characterizes the plant species that synthesized the sugar and, to a lesser extent, the geographical location of the place of harvest (type of water used during photosynthesis). (D/H)II reflects the climatology of the place of production of the grapes (type of rain and weather conditions) and, to a lesser extent, the sugar concentration of the original must. The ratio

$$R = \frac{2\,(D/H)_{II}}{(D/H)_{I}}$$

expresses the relative enrichment or depletion of the methylene site, the methyl site being arbitrarily given the statistical weight of 3. A random distribution of deuterium within the ethyl fragment, as is the case for petrochemically synthesized ethanol, would therefore be characterized by a value of R = 2. The value of R varies according to the biochemical pathway (C3, C4 or CAM) of the plant producing the sugar and, to a smaller extent, with the conditions of the fermentation process.

The 13C/12C isotope ratio was measured by IRMS according to Resolution OIV/OENO 381/2009 [45]. The 18O/16O isotope ratios of the water from wine were determined by IRMS according to Resolution OIV/OENO 353/2009 [46]. In brief, the values were determined by IRMS using the ions m/z 46 (12C16O18O) and m/z 44 (12C16O2), which were obtained after the isotope exchange between water and carbon dioxide had reached equilibrium. The exchange reaction

$$^{12}\mathrm{C}^{16}\mathrm{O}_2 + \mathrm{H}_2{}^{18}\mathrm{O} \rightleftharpoons {}^{12}\mathrm{C}^{16}\mathrm{O}^{18}\mathrm{O} + \mathrm{H}_2{}^{16}\mathrm{O}$$

proceeds via the dissolved hydrogen carbonate and is temperature dependent. After cryogenic separation from water and ethanol, the carbon dioxide in the vapour phase was used for analysis. The 18O/16O isotope ratio of water can then be calculated and expressed as the relative difference δ18O (‰) versus the standard V-SMOW (Vienna Standard Mean Ocean Water).
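The δ notation can be illustrated with a short calculation. The following is a minimal Python sketch, not the OIV procedure: it assumes that the 18O/16O ratio of the wine water has already been derived from the equilibrated CO2 and corrected, and it uses 2005.20 × 10⁻⁶, the commonly cited 18O/16O ratio of V-SMOW, as the reference value. The function name and the sample ratio are hypothetical; the ratio was chosen only so that the result falls in the range reported in Section 3.2.

```python
# Commonly cited 18O/16O ratio of the V-SMOW standard (assumed reference value).
R_VSMOW_18O = 2005.20e-6


def delta_permil(r_sample: float, r_standard: float = R_VSMOW_18O) -> float:
    """Relative difference of an isotope ratio from a standard, in per mil."""
    return (r_sample / r_standard - 1.0) * 1000.0


# Hypothetical 18O/16O ratio of the water in a wine sample.
r_wine = 2004.44e-6
print(f"delta18O = {delta_permil(r_wine):.2f} per mil vs V-SMOW")
```

A negative δ18O means the sample is depleted in 18O relative to ocean water. The same function would apply to δ13C with the corresponding carbon reference ratio (V-PDB).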

2.4. Spectral preprocessing and chemometrics

1H NMR spectra were preprocessed by bucketing using AMIX v.3.9.12 (Bruker BioSpin GmbH, Rheinstetten, Germany). Spectral intensities were scaled to total intensity (i.e., each spectrum was normalized to unit total intensity by expressing each data point as a fraction of the total spectral integral) and reduced to integrated regions of equal width (0.01 ppm) within the spectral region δ 9.5-0.5 ppm. The regions between the ethanol satellites and the signals of water and acetic acid were excluded from the analysis. The final pretreated data were converted to ASCII files and transferred for multivariate analysis. MATLAB v. 7.0 (The MathWorks, Natick, MA, USA) and the SAISIR package for MATLAB [47] were used for further statistical calculations. Multiway (n-way) analysis of variance (ANOVA) was used to test the effects of multiple factors (vintage, grape variety, origin) on the whole stable isotope data set (n=718) as well as on subsets of the 1H NMR data (e.g., the 2009 vintage or Riesling wines). Statistical significance was assumed at p < 0.05.
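The bucketing and total-intensity scaling described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the AMIX implementation: it assumes a spectrum sampled on a ppm axis, integrates it into 0.01 ppm buckets between 0.5 and 9.5 ppm by simple summation, removes buckets inside exclusion windows and then scales the retained buckets to unit total. The function name and the exclusion window in the example are hypothetical placeholders, not the windows used in this study, and applying the scaling after exclusion is only one possible ordering.

```python
import numpy as np


def bucket_spectrum(ppm, intensity, lo=0.5, hi=9.5, width=0.01, exclude=()):
    """Integrate a 1D spectrum into equal-width buckets and scale them to unit total.

    ppm, intensity : 1D arrays of equal length (ppm may be in any order).
    exclude        : (start, stop) ppm windows whose buckets are removed.
    Returns (bucket_centres, bucket_values).
    """
    n_buckets = int(round((hi - lo) / width))
    edges = lo + width * np.arange(n_buckets + 1)
    # Sum of the intensities inside each bucket (rectangular integration).
    values, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centres = 0.5 * (edges[:-1] + edges[1:])
    keep = np.ones(n_buckets, dtype=bool)
    for start, stop in exclude:
        keep &= ~((centres >= start) & (centres <= stop))
    centres, values = centres[keep], values[keep]
    total = values.sum()
    # Total-intensity scaling: each bucket becomes a fraction of the retained integral.
    return centres, (values / total if total > 0 else values)


# Synthetic spectrum: a sharp peak at 1.33 ppm and a broad "water" peak at 4.70 ppm.
ppm = np.linspace(0.0, 10.0, 20001)
spec = 1.0 / (1 + ((ppm - 1.33) / 0.005) ** 2) + 0.5 / (1 + ((ppm - 4.70) / 0.01) ** 2)
centres, buckets = bucket_spectrum(ppm, spec, exclude=[(4.6, 4.9)])  # hypothetical window
print(len(buckets), buckets.sum())
```

With 900 buckets of 0.01 ppm and a 0.3 ppm exclusion window, 870 buckets remain in this toy example and they sum to one.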

The following methods (ICA, LDA, FDA and PLS-DA) were applied to the concatenated 1H NMR and stable isotope data. In this study, LDA and FDA were applied to the PCA scores. In addition, several dedicated multiblock methods were used: Common Components and Specific Weights Analysis (ComDim) [51,52], CPCA-W [53] and MB-PLS-DA [54-56]. For the evaluation of the stable isotope data alone (5 variables), Hotelling's T² test, which uses the Mahalanobis distance, was additionally used for comparison (values were estimated for the 95% confidence interval) [57,58].
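The idea behind the Hotelling test with the Mahalanobis distance mentioned above can be sketched as follows. This is a minimal NumPy example, assuming autoscaled isotope data and using the chi-square approximation of the 95% limit (11.07 for 5 variables); an exact small-sample limit would instead be based on an F quantile (for example from `scipy.stats.f.ppf`). The function names and the synthetic class are hypothetical.

```python
import numpy as np

# 95% quantile of the chi-square distribution with 5 degrees of freedom
# (one per isotope variable); a large-sample approximation of the T^2 limit.
CHI2_95_DF5 = 11.0705


def mahalanobis_sq(x, class_samples):
    """Squared Mahalanobis distance of sample x to the centre of class_samples (n x p)."""
    mean = class_samples.mean(axis=0)
    cov = np.cov(class_samples, rowvar=False)
    diff = x - mean
    return float(diff @ np.linalg.solve(cov, diff))


def within_class_region(x, class_samples, limit=CHI2_95_DF5):
    """True if x lies inside the approximate 95% region of the class."""
    return mahalanobis_sq(x, class_samples) <= limit


rng = np.random.default_rng(0)
region_a = rng.normal(size=(40, 5))  # hypothetical autoscaled isotope data of one region
print(within_class_region(np.zeros(5), region_a),
      within_class_region(np.full(5, 4.0), region_a))
```

Using the class covariance rather than Euclidean distance accounts for the correlations between isotope variables discussed in Section 3.2.1.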

Cross-validation was applied to determine the optimal number of latent variables required to obtain robust models. During test set validation, cross-validation was applied again on the reduced training set to check whether the optimal number of latent variables remained the same.
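Choosing the number of latent variables by leave-one-out cross-validation can be sketched for the PCA-LDA combination used in this study (LDA applied to PCA scores). The NumPy code below is a minimal illustration, not the SAISIR implementation: it uses a pooled-covariance LDA, refits the PCA inside every fold so that the left-out sample never influences the model, and runs on a hypothetical synthetic data set. All function names are illustrative.

```python
import numpy as np


def pca_fit(X, n_pc):
    """Mean-centre X and return (mean, loadings) of the first n_pc components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_pc].T


def lda_fit(T, y):
    """Linear discriminant analysis with a pooled within-class covariance matrix."""
    classes = np.unique(y)
    means = np.array([T[y == c].mean(axis=0) for c in classes])
    resid = np.vstack([T[y == c] - means[i] for i, c in enumerate(classes)])
    cov = resid.T @ resid / (len(T) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), np.log(priors)


def lda_predict(model, T):
    classes, means, icov, log_priors = model
    # Linear discriminant score of every sample for every class; the largest wins.
    scores = T @ icov @ means.T - 0.5 * np.sum(means @ icov * means, axis=1) + log_priors
    return classes[np.argmax(scores, axis=1)]


def loo_accuracy(X, y, n_pc):
    """Leave-one-out correct classification rate of PCA followed by LDA."""
    hits = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i
        mean, P = pca_fit(X[train], n_pc)
        model = lda_fit((X[train] - mean) @ P, y[train])
        hits += int(lda_predict(model, (X[i:i + 1] - mean) @ P)[0] == y[i])
    return hits / len(X)


rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 30))
X[:, 0] += 3.0 * y  # hypothetical class effect carried by one variable
rates = {k: loo_accuracy(X, y, k) for k in range(1, 6)}
best = max(rates, key=rates.get)
print(rates, "optimal number of PCs:", best)
```

Repeating the same search on the reduced training set only, as described above, would show whether the optimal number of components is stable.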

Concatenation is a straightforward method of data fusion, but it may require appropriate block scaling to prevent one block of data from dominating completely. This is especially important in our case, where the 1H NMR data have to be fused with a completely unrelated table of isotope data ((D/H)I, (D/H)II, R, δ18O and δ13C, i.e. only 5 variables). Therefore, a number of different methods (mean-centering, weighting, auto-scaling, inverse of the sum of squares, square-root scaling, log scaling and second derivatives) were tried.
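One of the block-scaling options listed above can be sketched as follows: each block is autoscaled and then divided by the square root of its total sum of squares, so that every block contributes the same total variance before concatenation. This is a minimal NumPy illustration of one possible variant, not necessarily the exact scaling used in the study; the function names and the random data are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Mean-centre each column and divide it by its standard deviation."""
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / sd


def fuse_blocks(*blocks):
    """Autoscale each block, give it unit total sum of squares, then concatenate."""
    scaled = []
    for X in blocks:
        Z = autoscale(X)
        scaled.append(Z / np.sqrt(np.sum(Z ** 2)))
    return np.hstack(scaled)


rng = np.random.default_rng(2)
nmr = rng.normal(size=(30, 700))  # hypothetical reduced 1H NMR buckets
iso = rng.normal(size=(30, 5))    # (D/H)I, (D/H)II, R, d18O, d13C
fused = fuse_blocks(nmr, iso)
ss_nmr = np.sum(fused[:, :700] ** 2)
ss_iso = np.sum(fused[:, 700:] ** 2)
print(fused.shape, round(ss_nmr, 6), round(ss_iso, 6))
```

Without the final division, the autoscaled NMR block would carry 700/5 = 140 times the total sum of squares of the isotope block, which illustrates the dominance problem described above.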

For variable selection in the 1H NMR data we used multiway analysis of variance (ANOVA) and clustering of latent variables (CLV) [48,49]. In multiway ANOVA, a variable (bucket) was considered significant when its p-value was less than 0.05. The CLV method involves two stages: a hierarchical cluster analysis followed by a partitioning algorithm. Partitioning is determined by the value of a quality criterion, T, defined as the sum over all clusters of the first eigenvalue of each cluster's data matrix. The discriminant power of each group of latent variables is assessed by ANOVA [48,49].

An unsupervised technique, independent components analysis (ICA), was also used in this study [30,33,35]. In the present paper, the Mutual Information Least Dependent Component Analysis (MILCA) algorithm was applied. MILCA has MATLAB interfaces and is freely available on the internet [50]. The ICA 'scores' of a set of samples were obtained as

$$\mathrm{Scores} = X\,S\,(S^{\mathsf{T}}S)^{-1},$$

where X is the matrix of buckets of the new samples and S ("Signal") is the matrix whose columns are the calculated IC vectors. A sample was considered correctly classified if its 'scores' were found within the 95% probability ellipsoid [39].
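The score formula for ICA, Scores = X * Signal * inv(Signal' * Signal), is an ordinary least-squares projection of the samples onto the IC vectors. A minimal NumPy sketch, assuming the IC vectors are stored as the columns of S (here random placeholders; in the study they come from MILCA), is shown below; the function name is hypothetical.

```python
import numpy as np


def ica_scores(X, S):
    """Least-squares 'scores' of samples X (n x p) on IC signals S (p x k).

    Same result as X @ S @ inv(S.T @ S), computed with a linear solve
    instead of an explicit matrix inverse.
    """
    return np.linalg.solve(S.T @ S, S.T @ X.T).T


rng = np.random.default_rng(7)
S = rng.normal(size=(200, 3))          # hypothetical IC vectors (200 buckets, 3 ICs)
true_scores = rng.normal(size=(4, 3))  # contribution of each IC to 4 new samples
X_new = true_scores @ S.T              # noise-free mixtures, shape (4, 200)
print(np.allclose(ica_scores(X_new, S), true_scores))
```

Solving the normal equations rather than forming the inverse gives the same scores but is numerically more stable when IC vectors are nearly collinear.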

Confusion matrices containing the actual and predicted group memberships were established for each classification algorithm, and the sensitivity and specificity rates were calculated from them. All chemometric models were validated using leave-one-out cross-validation as well as test set validation (with approximately one-fourth of the complete data set as the test set).

3. Results and discussion

During our preliminary investigations, PCA was performed separately on the 1H NMR spectra and on the stable isotope data for the whole data set, in an attempt to construct models for the discrimination of grape varieties (separately for red and white wine varieties), vintage and geographical origin. This would be the most desirable case, because it would allow all three main characteristics of a wine sample to be determined without any a priori knowledge about it. However, this first evaluation led us to conclude that the variations within the data sets (1H NMR spectra and stable isotope data) were too large for these major parameters to be resolved in a single scatter plot. The reason is that, apart from the main wine features (grape variety, vintage and geographical origin), each wine has its own additional characteristics (for example, special conditions of grape growing or of wine production and storage), which influence the 1H NMR profile and the stable isotope values. Therefore, for the sake of simplicity and transparency, we reduced the complex multicriterion problem to individual problems with smaller numbers of samples. To do this, we constrained our database by fixing one of the parameters (grape variety or year of vintage). For example, we analyzed only wines from the 2009 vintage for the classification of geographical origin. For the same reason, we built separate models for differentiating white and red wine grape varieties. The subsets for chemometric analysis were selected to include as many samples as possible, so as to provide acceptable validation (see Table 1), and also on the basis of ANOVA calculations.

3.1. Variable selection for 1H NMR spectra of wines

Before data fusion, a reasonable set of variables has to be selected from the 1H NMR data set. Our multivariate analysis can then be regarded as a higher level of fusion than simple concatenation, which usually gives inferior results [59]. Variable selection is also a reasonable way to reduce the number of 1H NMR variables relative to the small number of isotope variables (five). A detailed analysis of the 1H NMR data set without variable selection will be published separately.

The analysis of ΔT (the change in the sum of the first eigenvalues of the data matrices of the clusters) observed during hierarchical clustering showed that there are eight significant clusters of variables in the three data subsets considered (vintage 2009, Riesling and red wines; see Table 2 for details). For example, the graph for Riesling wines showed that the increase in the ΔT criterion is significant when passing from a partition into nine groups to a partition into eight groups (Fig. 1). ANOVA on the latent variables calculated by PCA on the eight groups of the retained partition was used to identify the components that are significant for discriminating wines by geographical origin, year of vintage and grape variety (Table 2). The variables belonging to groups 2 and 6 (geographical origin), groups 2-4 (year of vintage) and groups 1-2 (red wine grape variety) are highly significant for explaining these wine parameters and were retained for further analysis.
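The CLV quality criterion T and the ΔT used above to choose the number of clusters can be sketched in NumPy. This is a minimal illustration of the criterion only (the sum over clusters of the largest eigenvalue of each cluster's covariance matrix), not the complete hierarchical and partitioning CLV algorithm of [48,49]. The function names and the synthetic data with two latent signals are hypothetical.

```python
import numpy as np


def first_eigenvalue(Xk):
    """Largest eigenvalue of the covariance matrix of the variables in Xk (n x pk)."""
    Xc = Xk - Xk.mean(axis=0)
    return np.linalg.eigvalsh(Xc.T @ Xc / (len(Xk) - 1))[-1]


def clv_criterion(X, labels):
    """CLV quality criterion T: sum over variable clusters of the first eigenvalue."""
    return sum(first_eigenvalue(X[:, labels == g]) for g in np.unique(labels))


def merge_cost(X, labels, a, b):
    """Decrease in T (Delta T) caused by merging variable clusters a and b."""
    merged = np.where(labels == b, a, labels)
    return clv_criterion(X, labels) - clv_criterion(X, merged)


rng = np.random.default_rng(3)
n = 200
s1, s2 = rng.normal(size=(2, n))  # two hypothetical latent signals
X = np.column_stack([s1, s1, s1, s2, s2, s2]) + 0.3 * rng.normal(size=(n, 6))
good = np.array([0, 0, 0, 1, 1, 1])   # partition matching the latent signals
split = np.array([0, 0, 2, 1, 1, 1])  # first group artificially split in two
print(f"Delta T merging 0+2 in split: {merge_cost(X, split, 0, 2):.3f}")
print(f"Delta T merging 0+1 in good:  {merge_cost(X, good, 0, 1):.3f}")
```

Merging variables driven by the same latent signal barely lowers T, whereas merging unrelated groups causes a large ΔT; a sudden jump of this kind is what signals where to stop merging.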

Not all significant buckets (chemical shifts) are listed, because of the relatively large number of resulting variables (the 896 buckets in the original data set were reduced to 713 buckets for geographical origin). However, we found, for example, that the resonances of lactic acid (group 1), shikimic acid (group 2), citric acid (group 1), malic acid (group 1), acetic acid (group 2) and arginine (group 2) contribute most to discriminating the red wine varieties. Glycerol (group 2), succinic acid (group 6), lactic acid (group 6), proline (group 2), malic acid (group 2) and phenolic compounds (groups 2 and 6) have discriminant power for geographical origin. Differences in the contents of glucose (group 1), fumaric acid (group 2), tartaric acid (group 3), alanine (group 3), glycerol (group 2) and succinic acid (group 1), and in the phenolic profile (groups 1-3), correlated with differences between vintages.

The resulting NMR data were analyzed by different multivariate methods (LDA, FDA, PLS-DA, ICA) to evaluate the ability of the selected variables to classify the samples into their groups. Table 3 shows the classification results (leave-one-out cross-validation) obtained after applying multiway ANOVA and CLV, compared with the initial full set of buckets. The correct classification rate increased with both variable selection methods (for example, in the case of geographical origin, from 85% on average without variable selection to 86% with multiway ANOVA and 89% with CLV). Variable selection also reduces the optimal number of factors in the classification models and therefore makes them less complex.

It is not surprising that variable selection has a positive influence on the multivariate models, since only the variables associated with the underlying phenomenon that is most correlated with the wine groups are retained for classification [48,49]. In all three cases (geographical origin, vintage and red wine grape variety), CLV gave better results than multiway ANOVA. Therefore, the reduced 1H NMR datasets obtained after CLV were subsequently fused with the stable isotope data.
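The correct classification rates compared above, and the sensitivity and specificity mentioned in Section 2.4, are all derived from confusion matrices. A minimal NumPy sketch is given below, assuming rows are actual classes and columns predicted classes; the function name and the 3-class matrix are hypothetical and do not correspond to any table in this paper.

```python
import numpy as np


def sensitivity_specificity(cm):
    """Per-class sensitivity and specificity from a confusion matrix.

    cm[i, j] = number of samples of actual class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # actual class i, predicted as another class
    fp = cm.sum(axis=0) - tp  # predicted as class i, actually another class
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 3-class confusion matrix (e.g. three wine regions).
cm = [[18, 2, 0],
      [1, 15, 4],
      [0, 3, 17]]
sens, spec = sensitivity_specificity(cm)
ccr = np.trace(np.asarray(cm)) / np.sum(cm)
print(np.round(sens, 3), np.round(spec, 3), round(ccr, 3))
```

Here the sensitivities are 0.9, 0.75 and 0.85, the specificities 0.975, 0.875 and 0.9, and the overall correct classification rate is 50/60.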

3.2. Analysis of stable isotope data

An overview of the stable isotope data obtained for the 718 investigated wines is presented in Table 4. The (D/H)I and (D/H)II values were found to be 101.7±1.1 ppm and 127.4±1.6 ppm, respectively. The R values (calculated from the (D/H)I and (D/H)II data) were 2.505±0.039 on average. The average δ18O and δ13C values were -0.38±1.32‰ and -28.5±1.0‰, respectively (Table 4).
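As a quick consistency check, R as defined in Section 2.3 (R = 2(D/H)II/(D/H)I) can be computed from the mean (D/H) values reported above. This short Python sketch uses a hypothetical function name; note that the mean of individual R values is not exactly the ratio of the means, so only approximate agreement should be expected.

```python
def r_ratio(dh_i: float, dh_ii: float) -> float:
    """Intramolecular deuterium ratio R = 2 (D/H)II / (D/H)I (SNIF-NMR)."""
    return 2.0 * dh_ii / dh_i


# Mean (D/H) values reported above for the 718 wines (ppm).
r_from_means = r_ratio(101.7, 127.4)
print(round(r_from_means, 3))
```

The result, about 2.505, agrees with the reported average R, and equal site ratios give R = 2, the value expected for a random deuterium distribution.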

3.2.1. Geographical origin

One of the main applications of stable isotope analysis in wine authentication is the verification of the labeled geographical origin [20,26,28,40,58,60,61]. The largest available study is based on 5220 Italian wine samples collected in the period 2000-2010 [57,58]. It showed that geographical origin, as well as illegal addition of water and sugar, can be determined better with multivariate methods than with a univariate approach similar to the official method [57,58]. Another interesting study was conducted on Slovenian wines (n=120) from three wine-growing regions, where LDA and PCA based on stable isotope data (δ2H, δ13C, δ18O) provided information on regional variability and vintage [28]. It has also been shown that wines grown in distinct areas within the Piedmont region of Italy can be discriminated on the basis of D/H and 13C isotopic ratios [20].

We performed multiway ANOVA to assess the influence of the three parameters on the stable isotope profiles of the wines. For all 718 samples, the p-values were 0.007, 0.021 and 0.35 for geographical origin, vintage and grape variety, respectively, meaning that the stable isotope data are significantly affected by origin and vintage. The differentiation of wines according to geographical origin would therefore be expected to give the most promising models. For the reasons mentioned above, we selected only the wines from the 2009 vintage as an example for further analysis. Fig. 2 shows that wines from Pfalz and Rheinhessen are clearly separated for this vintage. Two other groups, Nahe and Mosel, are separated from these clusters but overlap with each other. In practice, however, the differentiation between NAH and MSR wines is of more scientific than practical importance. These regions occupy only a small overlapping area (Fig. 3), and even where the overlap is a problem, producers would not be aware of it, because the isotopic values in the databank are confidential. In this case, the first two PCs explained 77.5% of the total variability. Neglecting the remaining 22.5% of the variability could prevent reliable classification of new samples if sugaring and/or watering of those wines also has to be considered. Therefore, PCA was used only for data visualization and not as a classification tool.
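The share of total variability captured by each principal component, such as the 77.5% quoted above for the first two PCs, follows from the singular values of the centred data matrix. A minimal NumPy sketch is shown below; the function name is hypothetical, the toy matrix has a known answer (0.8 and 0.2), and the random table of 111 × 5 values only stands in for an autoscaled isotope table.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of the total variance captured by each principal component of X (n x p)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values in descending order
    return s ** 2 / np.sum(s ** 2)


toy = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(explained_variance_ratio(toy))

rng = np.random.default_rng(4)
iso = rng.normal(size=(111, 5))  # stand-in for an autoscaled 5-variable isotope table
evr = explained_variance_ratio(iso)
print(f"PC1+PC2 explain {100 * evr[:2].sum():.1f}% of the variance")
```

Whatever is not captured by the displayed components (22.5% in the case above) is invisible in a two-dimensional score plot, which is why such plots are better suited to visualization than to classification.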

We also performed PCA on the 111 wines from the 2009 vintage using only three parameters (δ18O, (D/H)I and (D/H)II), excluding the R and δ13C values. However, we did not observe any shrinkage of the 95% contour lines on the PC plot. On the contrary, the performance of PCA decreased (probably because of the small number of variables evaluated). The map of German wine regions (Fig. 3) shows that, although the overlapping regions (NAH and MSR) do not border each other directly, they are situated very close to each other. We analyzed the importance of each stable isotope parameter for discriminating the geographical origin of the wines by ANOVA. δ18O, (D/H)I and (D/H)II had the largest F-values and are thus significant parameters for the discrimination of geographical origin.

The next important factor to be considered is the harvesting time of the grapes, which in principle could affect the statistical discrimination of geographical origin. Instead of performing separate PCAs for early- and late-harvested wines (which would be difficult, because this parameter is strongly influenced by geographical origin and by grape variety), we analyzed the importance of harvesting time for the discrimination of geographical origin by ANOVA. Although harvesting time has a significant influence on the model, the other four variables ((D/H)I, (D/H)II, δ13C and especially δ18O) have much higher F-values and thereby outweigh the much smaller effect of harvesting time. δ18O had the largest F-value, making it the most important parameter for discriminating geographical origin.
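Ranking variables by their ANOVA F-values, as done above, can be illustrated with a one-way ANOVA computed per variable. This NumPy sketch is a simplification of the multiway ANOVA used in the study: it considers a single factor (region) at a time. The function name, region shifts and variable values are hypothetical; `scipy.stats.f_oneway` computes the same statistic together with a p-value.

```python
import numpy as np


def one_way_f(values, groups):
    """One-way ANOVA F statistic of a single variable across groups."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand = values.mean()
    ss_between = sum(np.sum(groups == g) * (values[groups == g].mean() - grand) ** 2
                     for g in labels)
    ss_within = sum(np.sum((values[groups == g] - values[groups == g].mean()) ** 2)
                    for g in labels)
    df_between, df_within = len(labels) - 1, len(values) - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)


rng = np.random.default_rng(5)
region = np.repeat(["PFL", "RHH", "NAH", "MSR"], 25)
shift = np.repeat([0.0, 1.0, 2.0, 2.2], 25)  # hypothetical region effect
table = {
    "d18O": shift + rng.normal(scale=0.5, size=100),           # strong region effect
    "(D/H)I": 0.3 * shift + rng.normal(scale=0.5, size=100),   # weaker region effect
    "d13C": rng.normal(scale=0.5, size=100),                   # no region effect
}
f_values = {name: one_way_f(v, region) for name, v in table.items()}
for name, f in sorted(f_values.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} F = {f:.1f}")
```

A larger F-value means that the between-group differences are large relative to the scatter within groups, which is the sense in which δ18O is described as the most important parameter above.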

We therefore cannot exclude any of the stable isotope parameters from our models, and consequently all of them were used for data fusion. The correlation circle (projections of the correlations of the initial variables with the principal components) allows the relationships between the variables to be examined (Fig. 4). We concluded that δ13C, δ18O and, to a smaller degree, (D/H)I are positively correlated with each other and, at the same time, are not correlated with (D/H)II and R. δ13C and δ18O have a similar influence on the discrimination model, contrary to a previous study of Brazilian wines, where δ18O isotopic ratios were shown to be more selective and efficient in differentiating cultivation subregions than δ13C values from ethanol [25]. In our experience, δ18O and δ13C values are not always positively correlated because of the relatively cold weather conditions in Germany; the correlation can be observed only for certain vintages (as in our case for 2009) and may be completely absent for wines from late-harvested grapes. A significant negative correlation was found between the buckets from 2.77 ppm to 2.91 ppm of the 1H NMR profile and both the δ13C and δ18O values. The corresponding 1H signals are due to malic acid, which is degraded during regular grape maturation. At higher temperatures, this degradation is more pronounced. Elevated temperatures also lead to stronger evaporation and thereby to higher δ18O and δ13C values. Of course, this relation holds only as long as the initial malic acid of the wine has not been removed (e.g. by malolactic fermentation).
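The coordinates plotted in a correlation circle are simply the correlations between each autoscaled variable and the PC scores. The NumPy sketch below is a minimal illustration with hypothetical data: two variables driven by a common factor (standing in for δ13C and δ18O in a warm vintage) and one independent variable (standing in for (D/H)II). The function name is illustrative.

```python
import numpy as np


def correlation_circle(X, n_pc=2):
    """Correlations between each original variable and the first n_pc PC scores.

    Returns a (p x n_pc) array; row j gives variable j's position on the circle.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_pc] * s[:n_pc]
    return np.array([[np.corrcoef(Z[:, j], scores[:, k])[0, 1] for k in range(n_pc)]
                     for j in range(Z.shape[1])])


rng = np.random.default_rng(6)
n = 111
t = rng.normal(size=n)  # hypothetical common factor
X = np.column_stack([t + 0.3 * rng.normal(size=n),   # "d13C"
                     t + 0.3 * rng.normal(size=n),   # "d18O"
                     rng.normal(size=n)])            # "(D/H)II"
coords = correlation_circle(X)
print(np.round(coords, 2))
```

Variables that point in the same direction on the circle are positively correlated, while roughly perpendicular arrows indicate little correlation; for autoscaled data no point can lie outside the unit circle.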

3.2.2. Year of vintage

To check whether the year of vintage influences the isotope ratios of German wines, the data of Riesling wines from different vintages were analyzed using PCA. This dataset was selected because it contains the largest number of samples (n=247) of any grape variety. Moreover, vintage influences the 1H NMR profile to a greater extent than origin or variety (see below), which makes the application of data fusion more effective. We observed poorer but still acceptable discrimination in this case (PC1/PC3), although the 2005 and 2009 groups can barely be differentiated. Several literature sources support our observation that stable isotopes can be useful for vintage discrimination [19,28,60].

3.2.3. Grape variety

The third important wine parameter to be considered is grape variety. Several studies have reported that wines produced from different grape varieties show different isotopic patterns [19,20,40,61]. Although grape variety can be assumed to influence the stable isotope composition to a certain extent, we did not obtain any significant clustering by PCA for either red or white wine grape varieties. In this case, we did not take vintage or geographical origin into account, because the number of samples of each grape variety in each subset would have been too small for classification and validation. The probable explanation for the lack of clustering is that harvesting time varies significantly within grape varieties [62]. In Germany, grapes are harvested over several months (for example, from 28 August to 25 October for the 2007 vintage in our data set). Therefore, since our database included wines from both "early" and "late" harvested grapes, discrimination of grape varieties seems to be almost impossible. Nevertheless, we tried to discriminate four red wine grape varieties (Dornfelder, Portugieser, Regent, and Pinot noir) using reduced data sets: wines from Pfalz (n=49) and wines from the 2005 vintage (n=33). In both cases, the multivariate analyses did not find significant clustering of the groups, although the results improved in comparison with those obtained for the whole data set.

3.2.4. Classification results

Several classification methods (LDA, FDA, and PLS-DA) as well as ICA were evaluated for predicting class membership with respect to geographical origin, red wine grape variety and year of vintage from stable isotope data alone (Table 5). Table 5 also gives the optimal number of latent variables (LV). The number of LV for the whole data set and for the reduced calibration data set (without the samples used for independent test set validation) was the same in each case. For the geographical origin of the wines from the 2009 vintage, we obtained a correct classification rate of only 60-70% for the four areas (PFL, NAH, MSR, and RHH). Another approach for assessing wine authenticity was recently evaluated [58]. In that study, all five variables (the SNIF-NMR parameters (D/H)I and (D/H)II, R, 18O and 13C) were used for geographical origin discrimination, instead of the optimum number of latent variables as in our case (Table 5). Our 95% confidence ellipsoids (e.g. Fig. 2) correspond to the 95% confidence level of the chi-squared distribution used in that study [57,58]. Applying this approach to our data, we obtained an average specificity of 72%, which is slightly better than that obtained by the other chemometric methods (Table 5). As expected, the sensitivity was equal to our predefined confidence level of 95%. It is clear, however, that NMR has more discrimination power than stable isotopes for wine analysis (Table 5).
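The class-membership test based on the Mahalanobis distance and the chi-squared distribution [57,58] can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in those studies or in ours: each class is modelled by the mean and covariance of its training samples, a sample is accepted when its squared Mahalanobis distance is below the chi-squared quantile for the chosen level (degrees of freedom = number of variables), and the five-variable synthetic data merely stand in for (D/H)I, (D/H)II, R, 18O and 13C. The function names are hypothetical.

```python
import math

import numpy as np


def chi2_quantile(p, df):
    """Chi-squared quantile found by bisection on the CDF (regularized lower incomplete gamma)."""
    a = df / 2.0

    def cdf(x):
        z = x / 2.0
        if z <= 0.0:
            return 0.0
        term = 1.0 / a  # n = 0 term of the power series for P(a, z)
        total = term
        n = 1
        while term > 1e-16 * total and n < 10_000:
            term *= z / (a + n)
            total += term
            n += 1
        return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

    lo, hi = 0.0, max(1.0, 2.0 * df)
    while cdf(hi) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_class(x_train):
    """Mean vector and covariance matrix of one class."""
    return x_train.mean(axis=0), np.cov(x_train, rowvar=False)


def accept(x, mean, cov, level=0.95):
    """True where the squared Mahalanobis distance is within the chi-squared limit."""
    diff = x - mean
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return d2 <= chi2_quantile(level, x.shape[1])


# Synthetic five-variable example: one modelled class and one "foreign" class.
rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, (500, 5))
same_class = rng.normal(0.0, 1.0, (5000, 5))
other_class = rng.normal(0.0, 1.0, (5000, 5)) + np.array([4.0, 0.0, 0.0, 0.0, 0.0])

mean, cov = fit_class(train)
sensitivity = accept(same_class, mean, cov).mean()         # accepted members
specificity = 1.0 - accept(other_class, mean, cov).mean()  # rejected non-members
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

Because the acceptance limit is the 95% quantile, roughly 95% of new members of the modelled class are accepted, which is why the sensitivity tracks the predefined confidence level; the specificity depends only on how far the other classes lie from the modelled one.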

As mentioned in Section 3.2.1, a PCA plot is not a reliable tool for evaluating multivariate approaches to predicting wine parameters (Fig. 2). As an alternative, the percentage of overlapping samples can be estimated from the confusion matrix obtained with any of the multivariate methods used (LDA, FDA, ICA or PLS-DA). For example, the confusion matrix obtained by LDA for predicting the geographical origin of wines showed that, with an average classification rate of 61%, the greatest uncertainty in class membership was between the MSR and NAH regions (Table 6), in good agreement with the visual impression from Fig. 2. For the year of vintage, all chemometric methods showed comparable efficiencies of 61-62%. As expected, the worst results were obtained for the differentiation of red wine grape varieties, ranging from 35% for FDA and LDA to 58% for PLS-DA (Table 5).
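Reading class overlap from a confusion matrix can be sketched in a few lines of Python. The example is illustrative only: the labels reuse the region codes from the text, but the predictions are invented and do not reproduce Table 6. Per class, sensitivity is computed as TP/(TP + FN) and specificity as TN/(TN + FP) in a one-versus-rest sense; the average classification rate is the trace of the matrix divided by the number of samples.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm


def rates(cm):
    """Average classification rate plus one-vs-rest sensitivity and specificity per class."""
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp  # members of the class predicted as something else
    fp = cm.sum(axis=0) - tp  # non-members predicted as the class
    tn = cm.sum() - tp - fn - fp
    return tp.sum() / cm.sum(), tp / (tp + fn), tn / (tn + fp)


labels = ["PFL", "NAH", "MSR", "RHH"]
y_true = ["PFL"] * 4 + ["NAH"] * 4 + ["MSR"] * 4 + ["RHH"] * 4
y_pred = ["PFL", "PFL", "PFL", "RHH",
          "NAH", "NAH", "MSR", "MSR",
          "MSR", "MSR", "MSR", "NAH",
          "RHH", "RHH", "RHH", "PFL"]

cm = confusion_matrix(y_true, y_pred, labels)
average, sensitivity, specificity = rates(cm)
print(cm)
print(f"average rate = {average:.4f}")  # 11 of 16 correct -> 0.6875
for name, se, sp in zip(labels, sensitivity, specificity):
    print(f"{name}: sensitivity {se:.2f}, specificity {sp:.2f}")
```

In this invented example the largest off-diagonal count (three samples) lies between NAH and MSR, which is the kind of pattern that reveals overlapping classes more reliably than a score plot.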

Specificity based on the Mahalanobis distance test [57,58] was 60% for the year of vintage and 30% for red wine grape variety, which is comparable with the classification rates obtained by the other four approaches (LDA, PLS-DA, FDA and ICA) (Table 5). In all cases considered, classification based on 1H NMR data is superior to that obtained with stable isotope data (Table 5). We expected, however, that wine authentication would benefit greatly if the discriminant powers of both techniques were combined, thereby improving classification rates (Table 5).

3.3. Fusion of 1H NMR and stable isotope data for wine analysis

The next logical step was to combine the complementary information of the 1H NMR data (after variable selection) with the stable isotope profiles. This can be expected to yield better multivariate models that reflect the different sources of information. Because variable selection on the NMR data alone might omit variables that were non-significant there but would become significant in the combined model (which would, of course, lead to inferior differentiation), we also tried fusing the initial buckets. However, the results were clearly better after variable selection.

3.3.1. Chemometric analysis of concatenated data

The percentages of correctly classified samples obtained with 1H NMR data alone, stable isotope data alone and the concatenated data using ICA, LDA, FDA and PLS-DA are summarized in Table 5. We also used MB-PLS-DA, which can be regarded as a multiblock extension of PLS-DA modelling [54-56]. In this study, we used an algorithm that deflates the response block Y (geographical origin, year of vintage or grape variety) with the super scores to build the MB-PLS-DA model [54]. The choice of the Riesling wine subset for vintage differentiation and of the 2009 vintage subset for geographical origin classification was motivated by the results of multiway ANOVA of the 1H NMR data. For the 2009 vintage data set, p(origin) = 0.010 and p(variety) = 0.041, indicating that origin influences the 1H NMR spectral profile of wine more than grape variety does. The p-values for the Riesling wines (p(origin) = 0.013 and p(vintage) = 0.008) likewise motivated our choice of this data set for vintage discrimination.

For the analysis of fused data, preprocessing is very important, especially in our case, where spectroscopic and discrete data are combined. Like PCA, MB-PCA models are prone to being dominated by variables with large variance, but they are more robust against noise because they attempt to model the common trend among the different blocks. We found that autoscaling was not a good solution, because it puts noise at the same variance level as the chemical signals. The best classification models were obtained when a block scaling factor based on the inverse of the sum of squares of each block was used (i.e. after applying the block scaling factor, the total variance of each block equals 1).
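The block scaling described above can be sketched as follows. As an assumption, each block is mean-centred and divided by the square root of its total sum of squares, which is one way to obtain the stated property that the total variance of each block equals 1; the exact factor used in this study may differ in detail. The block shapes (780 CLV-selected buckets as in Table 3 and five isotope variables for 111 wines) are borrowed only to show that, without block scaling, the block with many variables would dominate the fused matrix; the data are random placeholders.

```python
import numpy as np


def block_scale(block):
    """Mean-centre a block and scale it to a total sum of squares of 1."""
    centred = block - block.mean(axis=0)
    return centred / np.sqrt((centred ** 2).sum())


def fuse_blocks(*blocks):
    """Concatenate block-scaled matrices column-wise (samples must be in the same order)."""
    return np.hstack([block_scale(b) for b in blocks])


rng = np.random.default_rng(1)
nmr = rng.random((111, 780)) * 1e-3  # placeholder bucket intensities
isotopes = rng.normal([101.7, 127.4, 2.505, -0.38, -28.5],
                      [1.1, 1.6, 0.039, 1.32, 1.0], (111, 5))  # placeholder isotope values

fused = fuse_blocks(nmr, isotopes)
print(fused.shape)  # (111, 785)
print((fused[:, :780] ** 2).sum(), (fused[:, 780:] ** 2).sum())  # both 1 up to rounding
```

After scaling, the 780-variable NMR block and the five-variable isotope block each contribute the same total variance, so neither dominates a subsequent PCA or PLS model simply because of its size or units.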

As expected, the greatest improvement in comparison with separate analysis of the two data sets was obtained for the prediction of geographical origin. Using leave-one-out cross-validation, we even achieved 100% correct classification for the fused data by PLS-DA, whereas stable isotope data alone gave only 60-70% correct prediction (multiclass models) and 72% specificity according to the Mahalanobis distance test, and 1H NMR data alone gave 84-94%. As the performance of the method can differ from year to year due to climatic effects, we also calculated the correct classification rates for the prediction of geographical origin for other vintages. Concatenation with stable isotope data always increased the discrimination power of NMR spectroscopy: from 94% to 95% for 2005, from 94% to 99% for 2006, from 95% to 97% for 2007, and from 96% to 99% for 2010 (evaluated by LDA with leave-one-out cross-validation). The specificity values (averaged over all regions) of the Hotelling T² test based on the Mahalanobis distance for stable isotope data alone were 51% in 2005, 75% in 2006, 61% in 2007 and 63% in 2010. These values explain why concatenation gave the largest gain in efficiency for the 2006 vintage. A clear improvement was also obtained for the year of vintage: 99% for the fused data, compared with 93-98% for 1H NMR and, for stable isotope analysis, 61-62% (multiclass models) and 60% specificity (Hotelling T² test) (Table 5). In the case of grape varieties, data fusion has no benefit in comparison with the 1H NMR models, probably because in our case the stable isotope data do not contain much information about grape variety and only introduce additional noise into the fused model (Table 5). Some improvement of the 1H NMR models can be expected for specific vintages or geographical origins; however, additional samples would need to be collected in this case.
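Leave-one-out validation of PLS-DA on a fused matrix can be sketched with a compact NIPALS PLS2 implementation in NumPy. This is an illustrative stand-in, not the software used in the study: classes are coded as one-hot columns of Y, a sample is assigned to the class with the largest predicted response, the number of latent variables is fixed rather than optimized, and the two synthetic blocks are block-scaled once on all samples before cross-validation (a simplification; strictly, the scaling should be recomputed inside each fold).

```python
import numpy as np


def pls2_fit(X, Y, n_comp, tol=1e-10, max_iter=500):
    """NIPALS PLS2; returns the column means and the regression coefficients B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xr, Yr = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u = Yr[:, np.argmax((Yr ** 2).sum(axis=0))]
        for _ in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)
            t = Xr @ w
            q = Yr.T @ t / (t @ t)
            u_new = Yr @ q / (q @ q)
            done = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = Xr.T @ t / (t @ t)
        Xr = Xr - np.outer(t, p)  # deflate X
        Yr = Yr - np.outer(t, q)  # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return x_mean, y_mean, B


def plsda_loo_accuracy(X, y, n_comp):
    """Fraction of samples correctly classified in leave-one-out cross-validation."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot class coding
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        x_mean, y_mean, B = pls2_fit(X[keep], Y[keep], n_comp)
        y_hat = (X[i] - x_mean) @ B + y_mean
        correct += int(classes[np.argmax(y_hat)] == y[i])
    return correct / len(y)


def block_scale(block):
    centred = block - block.mean(axis=0)
    return centred / np.sqrt((centred ** 2).sum())


# Synthetic data: 4 classes x 10 samples, a 20-variable "NMR" block and a 5-variable "isotope" block.
rng = np.random.default_rng(7)
y = np.repeat(np.arange(4), 10)
nmr = rng.normal(0.0, 1.0, (40, 20))
iso = rng.normal(0.0, 1.0, (40, 5))
for c in range(4):
    nmr[y == c, c] += 4.0  # class-specific shift in one NMR variable
    iso[y == c, c] += 2.0  # weaker class-specific shift in one isotope variable
X = np.hstack([block_scale(nmr), block_scale(iso)])

accuracy = plsda_loo_accuracy(X, y, n_comp=3)
print(f"LOO accuracy: {accuracy:.2f}")
```

Three latent variables are used here because four classes span a three-dimensional between-class subspace; in practice the number of latent variables would be chosen by cross-validation, as done for Table 5.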

To avoid over-optimistic results, test set validation has to be done with samples that were not used to build the calibration models. For this purpose, the independent test sets consisted of approximately one-fourth of the wine samples, selected at random (Table 5). The rest of the available data was included in the calibration data sets. To make the test set validation more reliable, we repeated the training/test split ten times. Owing to the considerable number of samples at our disposal, the same significant factors were identified in the initial and reduced training data sets for each method during independent test set validation. The best results were achieved by PLS-DA for geographical origin and grape variety and by ICA for the year of vintage (Table 5). We also calculated the sensitivity and specificity values for all methods (Table 5). The

sensitivity is defined as the number of true positives divided by the number of true positives plus false negatives; it gives the probability that the test is positive when the wine sample actually belongs to the defined group. The specificity (true negatives / (true negatives + false positives)) gives the probability that the test is negative when the wine sample indeed does not belong to the defined group. The specificity values were generally lower than the sensitivity values (Table 5). Satisfactory sensitivity (88-96% for geographical origin, 89-97% for year of vintage) was found for all methods. The specificity values were slightly inferior: 85-90% for geographical origin and 87-94% for the year of vintage. As expected, unacceptable sensitivity/specificity rates were obtained for red wine grape varieties (Table 5). The same approach was found useful for evaluating wine authenticity based on stable isotope values [57,58] and in our previous 1H NMR study on rice [39].

It should be mentioned that in our study, ICA was applied separately to the 1H NMR and stable isotope data sets as well as to the fused data (Tables 3 and 5). Despite being an unsupervised method, ICA in most cases showed comparable or only slightly inferior performance compared with the classical supervised classification methods used (LDA, PLS-DA, and FDA), thus demonstrating its applicability to classification problems. Moreover, to the best of our knowledge, this is the first application of ICA to discrete data (stable isotopes) and to concatenated data from different analytical techniques.

3.3.2. Multiblock (MB) methods - Consensus PCA and Common Component and Specific Weight Analysis (ComDim) - for the determination of geographical origin of wine

To further illustrate the performance of specialized methods, we applied other multiblock methods to the 111 wines from the 2009 vintage. First, the data sets were analyzed separately by standard PCA. The results obtained with stable isotope profiling have already been discussed (Fig. 2); in this case the Mosel and Nahe clusters overlap. As expected, better discrimination was observed with the 1H NMR data; however, other clusters (Rheinhessen and Nahe) overlap, and full differentiation of the clusters cannot be achieved (results not shown). Full discrimination can be achieved when the two data sets are considered together, for example by using one of the multiblock PCA methods, CPCA-W [53] (Fig. 5). All four clusters are completely separated (Fig. 5), showing that MB methods are able to integrate the advantages of each analytical technique into a combined model. This result came as something of a surprise, because all four wine production regions (RHH, PFL, MSR, and NAH) are located very close together and some even border each other (for example, PFL-RHH or NAH-RHH) (Fig. 3).

The same data were analyzed with another MB tool, Common Components and Specific Weights Analysis (ComDim) [51,52,64]. The method determines a common space for several blocks (two in our case), with each block having a specific weight ("salience") on each common dimension. The best separation of the four classes was found in the D1-D3 plane at the 99% probability level (Fig. 6A). The saliences indicate that stable isotope data play a more significant role in the first two dimensions, while 1H NMR is mostly responsible for the third dimension, resulting in the best clustering in D1-D3 (Fig. 6B).
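The core of ComDim can be sketched in NumPy as below, following the usual description of the method: each block is mean-centred and normalized to unit Frobenius norm, a common dimension q is taken as the first eigenvector of the salience-weighted sum of the blocks' association matrices X_b X_b^T, each salience is updated as q^T X_b X_b^T q until convergence, and every block is deflated by q before the next dimension is extracted. This is an illustrative sketch, not the implementation behind Fig. 6; the synthetic "isotope" and "NMR" blocks are built so that each is dominated by a different latent variable.

```python
import numpy as np


def comdim(blocks, n_dims, tol=1e-12, max_iter=1000):
    """Return common scores (samples x n_dims) and saliences (n_dims x n_blocks)."""
    xs = []
    for block in blocks:
        centred = block - block.mean(axis=0)
        xs.append(centred / np.linalg.norm(centred))  # unit Frobenius norm per block
    n = xs[0].shape[0]
    scores = np.zeros((n, n_dims))
    saliences = np.zeros((n_dims, len(xs)))
    for d in range(n_dims):
        assoc = [x @ x.T for x in xs]  # X_b X_b^T for each block
        lam = np.ones(len(xs))
        for _ in range(max_iter):
            w = sum(lb * a for lb, a in zip(lam, assoc))
            q = np.linalg.eigh(w)[1][:, -1]  # eigenvector of the largest eigenvalue
            new_lam = np.array([q @ a @ q for a in assoc])
            converged = np.max(np.abs(new_lam - lam)) < tol
            lam = new_lam
            if converged:
                break
        scores[:, d] = q
        saliences[d] = lam
        xs = [x - np.outer(q, q @ x) for x in xs]  # remove the common dimension from each block
    return scores, saliences


rng = np.random.default_rng(3)
n = 60
z1 = rng.normal(size=n)
z1 -= z1.mean()
z2 = rng.normal(size=n)
z2 -= z2.mean()
z2 -= (z2 @ z1) / (z1 @ z1) * z1  # make the two latent variables orthogonal
iso = np.outer(z1, rng.normal(size=5)) + 0.1 * rng.normal(size=(n, 5))
nmr = np.outer(z2, rng.normal(size=40)) + 0.1 * rng.normal(size=(n, 40))

scores, saliences = comdim([iso, nmr], n_dims=3)
print(np.round(saliences, 3))  # rows: dimensions, columns: blocks (iso, nmr)
```

Reading the salience table row by row shows which block drives each common dimension, which is how the block contributions to D1, D2 and D3 are interpreted in Fig. 6B.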

For the multiblock models we used the 111 samples from both the early and the late harvesting season (harvest dates between 15 September and 11 November 2009); the models can therefore be used for the entire 2009 season. However, the classification models cannot be used to predict the geographical origin of wines from other vintages. To verify this, we predicted the geographical origin of 24 "new" wine samples from the 2005, 2006, 2007 and 2010 vintages by projecting them onto the consensus PCA plot. For each year, two samples from each wine production area (RHH, MSR, PFL, and NAH) were selected. Of these samples, only four wines from PFL and RHH (all from the 2010 vintage) were correctly classified, being projected within the 95% confidence ellipsoids. The other samples were found outside the 95% confidence ellipsoids of the consensus PCA model. This result suggests that neither the 1H NMR nor the stable isotope data include vintage-invariant information for the differentiation of geographical origin. In practice, an application of the presented technique will therefore have to be based on annually updated databanks. Within a given vintage, both Consensus PCA and ComDim can thus be used to determine the geographical origin of wines. We did not apply these methods to the determination of year of vintage and grape variety, owing to the low efficiency of data fusion in these cases (Table 5). We believe that such a combination is an important improvement for wine and food authentication.
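Projecting new wines onto a consensus PCA model and checking them against a class ellipse can be sketched as follows. The sketch relies on the equivalence reported for CPCA-W [53] between its super scores and the PCA scores of the concatenated, block-scaled data, so the model is fitted by an SVD of that matrix. The block scaling by the square root of each block's sum of squares, the two synthetic regions, and the 95% limit taken from the chi-squared distribution with two degrees of freedom (-2 ln 0.05, about 5.99) are illustrative choices rather than the exact procedure behind Fig. 5; the class and function names are hypothetical.

```python
import numpy as np


class ConsensusPCA:
    """CPCA-W super scores obtained as PCA scores of the block-scaled, concatenated data."""

    def fit(self, blocks, n_comp=2):
        self.means = [b.mean(axis=0) for b in blocks]
        self.norms = [np.sqrt(((b - m) ** 2).sum()) for b, m in zip(blocks, self.means)]
        x = self._prepare(blocks)
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        self.loadings = vt[:n_comp].T
        self.scores = x @ self.loadings
        return self

    def _prepare(self, blocks):
        # Apply the centring and scaling learned on the training blocks.
        return np.hstack([(b - m) / s for b, m, s in zip(blocks, self.means, self.norms)])

    def transform(self, blocks):
        return self._prepare(blocks) @ self.loadings


def inside_ellipse(new_scores, class_scores, level=0.95):
    """True where a projected sample lies inside the class's confidence ellipse (2 scores)."""
    mean = class_scores.mean(axis=0)
    cov = np.cov(class_scores, rowvar=False)
    diff = new_scores - mean
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return d2 <= -2.0 * np.log(1.0 - level)  # chi-squared quantile for 2 degrees of freedom


rng = np.random.default_rng(5)
n = 30  # hypothetical samples per region
nmr_a, nmr_b = rng.normal(0, 1, (n, 50)), rng.normal(0, 1, (n, 50))
iso_a, iso_b = rng.normal(0, 1, (n, 5)), rng.normal(0, 1, (n, 5))
nmr_b[:, :10] += 3.0  # region B differs in ten NMR variables
iso_b[:, :2] += 3.0   # ... and in two isotope variables

model = ConsensusPCA().fit([np.vstack([nmr_a, nmr_b]), np.vstack([iso_a, iso_b])])
scores_a = model.scores[:n]

typical_a = model.transform([nmr_a.mean(axis=0, keepdims=True), iso_a.mean(axis=0, keepdims=True)])
gap_nmr = nmr_b.mean(axis=0) - nmr_a.mean(axis=0)
gap_iso = iso_b.mean(axis=0) - iso_a.mean(axis=0)
shifted = model.transform([nmr_b.mean(axis=0, keepdims=True) + 10 * gap_nmr,
                           iso_b.mean(axis=0, keepdims=True) + 10 * gap_iso])
print(inside_ellipse(typical_a, scores_a), inside_ellipse(shifted, scores_a))
```

A sample whose profile resembles the training wines of a region falls inside that region's ellipse, whereas a profile far from every training class (as the text reports for most wines from other vintages) falls outside, which is why such models need to be rebuilt for each vintage.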

4. Conclusions

The combination of data from different analytical platforms is a powerful strategy to enrich the final information content and thus improve classification results. In this study, several classification strategies with concatenated data were applied to exploit the synergistic effect of the information obtained from two techniques: 1H NMR and stable isotope analysis. 1H NMR spectroscopy of wine has become quite easy thanks to simple sample preparation and fast automated measurements, and we have shown that good classification models can be constructed for important wine parameters (year of vintage and geographical origin). On the other hand, stable isotope ratio analysis has been recognized in recent years as a good source of chemical information for the authenticity assessment of food products. Although relatively time-consuming, these methods improve the classification rates of 1H NMR spectroscopy for the determination of the geographical origin and vintage of wine. We therefore believe that it is worthwhile to fuse the data from these techniques in combination with multivariate analysis for wine control in cases where high levels of certainty are required.

Acknowledgments

We are very grateful to all institutions that provided wine samples, especially E. Annweiler and M. Metschies (Chemisches und Veterinäruntersuchungsamt Freiburg). We gratefully acknowledge S. Klein and J. Geisser (Chemisches und Veterinäruntersuchungsamt Karlsruhe) for careful sample preparation. Finally, the authors are grateful to Yun Xu for useful suggestions about multiblock analysis.


References


[1] A.K. Smilde, M.J. van der Werf, S. Bijlsma, B.J.C. van der Werff-van der Vat, R.H. Jellema, Fusion of mass spectrometry-based metabolomics data, Anal.Chem. 77 (2005) 6729-6736.
[2] T.I. Dearing, W.J. Thompson, C.E. Rechsteiner, B.J. Marquardt, Characterization of crude oil products using data fusion of process Raman, infrared, and nuclear magnetic resonance (NMR) spectra, Appl.Spectrosc. 65 (2011) 181-186.
[3] Y. Xu, E. Correa, R. Goodacre, Integrating multiple analytical platforms and chemometrics for comprehensive metabolic profiling: application to meat spoilage detection, Anal.Bioanal.Chem. 405 (2013) 5063-5074.
[4] E. Pere-Trepat, R. Tauler, Chemometrics modelling of organic contaminants in fish and sediment river samples, J.Chromatogr.A 1131 (2006) 85-96.
[5] S. Mas, R. Tauler, A. de Juan, Chromatographic and spectroscopic data fusion analysis for interpretation of photodegradation processes, J.Chromatogr.A 1218 (2011) 9260-9268.
[6] L. Vera, L. Acena, J. Guasch, R. Boque, M. Mestres, O. Busto, Discrimination and sensory description of beers through data fusion, Talanta 87 (2011) 136-142.
[7] J. Forshed, H. Idborg, S.P. Jacobsson, Evaluation of different techniques for data fusion of LC/MS and 1H-NMR, Chem.Intell.Lab.Syst. 85 (2007) 102-109.
[8] A. Smolinska, L. Blanchet, L. Coulier, K.A.M. Ampt, T. Luider, R.Q. Hintzen, S.S. Wijmenga, L.M.C. Buydens, Interpretation and visualization of non-linear data fusion in kernel space: study on metabolomic characterization of progression of multiple sclerosis, PLoS One 7 (2012) e38163.
[9] C.V. Di Anibal, M.P. Callao, I. Ruisanchez, 1H NMR and UV-visible data fusion for determining Sudan dyes in culinary spices, Talanta 84 (2011) 829-833.
[10] F.C. Clarke, M.J. Jamieson, D.A. Clark, S.V. Hammond, R.D. Jee, A.C. Moffat, Chemical image fusion: the synergy of FT-NIR and Raman mapping microscopy to enable a more complete visualization of pharmaceutical formulations, Anal.Chem. 73 (2001) 2213-2220.
[11] S.J. Tao, J.M. Li, J.H. Li, J.B. Tang, J.R. Mi, L.L. Zhao, Discriminant analysis of red wines from different aging ways by information fusion of NIR and MIR spectra, in: IFIP Advances in Information and Communication Technology, vol. 369 (2012) 478-483.
[12] J. Boccard, D.N. Rutledge, A consensus orthogonal partial least squares discriminant analysis (OPLS-DA) strategy for multiblock Omics data fusion, Anal.Chim.Acta 769 (2013) 30-39.
[13] Y.S. Hong, NMR-based metabolomics in wine science, Magn.Reson.Chem. 49 (2011) S13-S21.
[14] R. Godelmann, F. Fang, E. Humpfer, B. Schütz, M. Bansbach, H. Schäfer, M. Spraul, Targeted and nontargeted wine analysis by 1H NMR spectroscopy combined with multivariate statistical analysis. Differentiation of important parameters: grape variety, geographical origin, year of vintage, J.Agric.Food Chem. 61 (2013) 5610-5619.
[15] M. Anastasiadi, A. Zira, P. Magiatis, S.A. Haroutounian, A.L. Skaltsounis, E. Mikros, 1H NMR-Based Metabonomics for the Classification of Greek Wines According to Variety, Region and Vintage. Comparison with HPLC Data, J.Agric.Food Chem. 57 (2009) 11067-11074.

[16] K. Ali, F. Maltese, R. Toepfer, Y.H. Choi, R. Verpoorte, Metabolic characterization of Palatinate German white wines according to sensory attributes, varieties, and vintages using NMR spectroscopy and multivariate data analyses, J.Biomol.NMR 49 (2011) 255-266.
[17] M. Koda, K. Furihata, F. Wei, T. Miyakawa, M. Tanokura, NMR-based metabolic profiling of rice wines by F2-selective TOCSY spectra, J.Agric.Food Chem. 60 (2012) 4818-4825.
[18] H.-L. Schmidt, Food quality control and studies on human nutrition by mass spectrometric and nuclear magnetic resonance isotope ratio determination, Fresenius J.Anal.Chem. 324 (1986) 760-766.
[19] D.A. Magdas, S. Cuna, G. Cristea, R.E. Ionete, D. Costinel, Stable isotopes determination in some Romanian wines, Isot.Environ.Health Stud. 48 (2012) 345-353.
[20] C. Aghemo, A. Albertino, R. Gobetto, F. Spanna, Correlation between isotopic and meteorological parameters in Italian wines: a local-scale approach, J.Sci.Food Agric. 91 (2011) 2088-2094.
[21] M. Perini, F. Camin, δ18O of ethanol in wine and spirits for authentication purposes, J.Food Sci. 78 (2013) C839-C844.
[22] D. Costinel, A. Tudorache, R.E. Ionete, R. Vremera, The impact of grape varieties to wine isotopic characterization, Anal.Lett. 44 (2011) 2856-2864.
[23] G.J. Martin, M.L. Martin, F. Mabon, M.J. Michon, A new method for the identification of the origin of ethanols in grain and fruit spirits: high-field quantitative deuterium nuclear magnetic resonance at the natural abundance level, J.Agric.Food Chem. 31 (1983) 311-315.

[24] A. Monetti, F. Reniero, G. Versini, Classification of Italian wines on a regional scale by means of a multi-isotopic analysis, Z.Lebensm.Unters.Forsch. 199 (1994) 311-316.
[25] L. Adami, S.V. Dutra, A.R. Marcon, G.J. Carnieli, C.A. Roani, R. Vanderlinde, Geographic origin of southern Brazilian wines by carbon and oxygen isotope analysis, Rapid Commun.Mass Spectrom. 24 (2010) 2943-2948.
[26] I.J. Košir, M. Kocjančič, N. Ogrinc, J. Kidrič, Use of SNIF-NMR and IRMS in combination with chemometric methods for the determination of chaptalisation and geographical origin of wines (the example of Slovenian wines), Anal.Chim.Acta 429 (2001) 195-206.
[27] F. Guyon, L. Gaillard, M.H. Salagoity, B. Medina, Intrinsic ratios of glucose, fructose, glycerol and ethanol 13C/12C isotopic ratio determined by HPLC-co-IRMS: toward determining constants for wine authentication, Anal.Bioanal.Chem. 401 (2011) 1551-1558.
[28] N. Ogrinc, I.J. Košir, M. Kocjančič, J. Kidrič, J.Agric.Food Chem. 49 (2001) 1432-1440.
[29] D.A. Magdas, S. Cuna, G. Cristea, R.E. Ionete, D. Costinel, Stable isotopes determination in some Romanian wines, Isot.Environ.Health Stud. 48 (2012) 345-353.
[30] A. Hyvärinen, J. Karhunen, E. Oja, Independent component analysis, Wiley, New York, 2001.
[31] A. Cichocki, S. Amari, Adaptive blind signal and image processing. Learning algorithms and applications, Wiley, New York, 2002.
[32] Y.B. Monakhova, S.S. Kolesnikova, S.P. Mushtakova, Independent component analysis algorithms for spectral decomposition in UV/VIS analysis of metal-containing mixtures including multimineral food supplements and platinum concentrates, Anal.Methods 5 (2013) 2761-2772.


[33] Y.B. Monakhova, S.A. Astakhov, A.V. Kraskov, S.P. Mushtakova, Independent components in spectroscopic analysis of complex mixtures, Chem.Intell.Lab.Syst. 103 (2010) 108-115.
[34] Y.B. Monakhova, S.P. Mushtakova, S.S. Kolesnikova, Chemometrics-assisted spectrophotometric method for simultaneous determination of vitamins in complex mixtures, Anal.Bioanal.Chem. 397 (2010) 1297-1306.
[35] D.N. Rutledge, D. Jouan-Rimbaud Bouveresse, Independent Components Analysis with the JADE algorithm, Trends Anal.Chem. 50 (2013) 22-32.
[36] I. Schelkanova, V. Toronov, Independent component analysis of broadband near-infrared spectroscopy data acquired on adult human head, Biomed.Opt.Express 3 (2012) 64-74.
[37] M. Mecozzi, M. Pietroletti, M. Scarpiniti, R. Acquistucci, M.E. Conti, Monitoring of mucilage formation in Italian seas investigated by infrared spectroscopy and independent component analysis, Environ.Monit.Assess. 184 (2012) 6025-6036.
[38] D. Jouan-Rimbaud Bouveresse, A. Moya-González, F. Ammari, D.N. Rutledge, Two novel methods for the determination of the number of components in independent component analysis models, Chem.Intell.Lab.Syst. 112 (2012) 24-32.
[39] Y.B. Monakhova, D.N. Rutledge, A. Rossmann, H. Waiblinger, M. Mahler, M. Ilse, T. Kuballa, D.W. Lachenmeier, Determination of rice type by 1H NMR spectroscopy in combination with different chemometric tools, J.Chemometr. 28 (2013) 83-92.
[40] R.D. Di Paola-Naranjo, M.V. Baroni, N.S. Podio, H.R. Rubinstein, M.P. Fabani, R.G. Badini, M. Inga, H.A. Ostera, M. Cagnoni, E. Gallegos, E. Gautier, P. Peral-Garcia, J. Hoogewerff, D.A. Wunderlin, Fingerprints for main varieties of argentinean wines: terroir differentiation by inorganic, organic, and stable isotopic analyses coupled to chemometrics, J.Agric.Food Chem. 59 (2011) 7854-7865.
[41] I. Geana, A. Iordache, R. Ionete, A. Marinescu, A. Ranca, M. Culea, Geographical origin identification of Romanian wines by ICP-MS elemental analysis, Food Chem. 138 (2013) 1125-1134.
[42] C.J. Bevin, R.G. Dambergs, A.J. Fergusson, D. Cozzolino, Varietal discrimination of Australian wines by means of mid-infrared spectroscopy and multivariate analysis, Anal.Chim.Acta 621 (2008) 19-23.
[43] Y.B. Monakhova, H. Schäfer, E. Humpfer, M. Spraul, T. Kuballa, D.W. Lachenmeier, Application of automated eightfold suppression of water and ethanol signals in 1H NMR to provide sensitivity for analyzing alcoholic beverages, Magn.Reson.Chem. 49 (2011) 734-739.
[44] OIV 2009, Resolution OIV/OENO 381/2009, Bestimmung der Deuteriumverteilung im Ethanol aus alkoholischen Getränken aus Erzeugnissen der Weinrebe mittels magnetischer Kernresonanz des Deuteriums (SNIF-NMR) [Determination of the deuterium distribution in ethanol from alcoholic beverages derived from grapevine products by deuterium nuclear magnetic resonance (SNIF-NMR)], Zagreb, Republic of Croatia, 2009.
[45] OIV 2009, Resolution OIV/OENO 381/2009, Bestimmung des Isotopenverhältnisses 13C/12C von Ethanol aus alkoholischen Getränken aus Erzeugnissen der Weinrebe mittels Isotopenmassenspektrometrie [Determination of the 13C/12C isotope ratio of ethanol from alcoholic beverages derived from grapevine products by isotope ratio mass spectrometry], Zagreb, Republic of Croatia, 2009.
[46] OIV 2009, Resolution OIV/OENO 353/2009, Methode zur Bestimmung des Isotopenverhältnisses 18O/16O von Wasser in Wein und Most [Method for the determination of the 18O/16O isotope ratio of water in wine and must], Zagreb, Republic of Croatia, 2009.
[47] C.B.Y. Cordella, D. Bertrand, SAISIR: A new general chemometric toolbox, Trends Anal.Chem. 54 (2014) 75-82.
[48] E. Vigneau, E.M. Qannari, Clustering of variables around latent components, Commun.Stat.-Simulat. 32 (2003) 1131-1150.


[49] M. Cuny, E. Vigneau, G. Le Gall, I.J. Colquhoun, M. Lees, D.N. Rutledge, Fruit juice authentication by 1H NMR spectroscopy in combination with different chemometrics tools, Anal.Bioanal.Chem. 390 (2008) 419-427.
[50] A.V. Kraskov, http://www.ucl.ac.uk/ion/departments/sobell/Research/RLemon/MILCA/MILCA (accessed 8.08.2013).
[51] E.M. Qannari, I. Wakeling, P. Courcoux, H.J.H. MacFie, Defining the underlying sensory dimensions, Food Qual.Prefer. 11 (2000) 151-154.
[52] E.M. Qannari, I. Wakeling, H.J.H. MacFie, A hierarchy of models for analysing sensory data, Food Qual.Prefer. 6 (1995) 309-314.
[53] J.A. Westerhuis, T. Kourti, J.F. MacGregor, Analysis of multiblock and hierarchical PCA and PLS models, J.Chemometr. 12 (1998) 301-321.
[54] J.A. Westerhuis, A.K. Smilde, Deflation in multiblock PLS, J.Chemometr. 15 (2001) 485-493.
[55] J.A. Westerhuis, P.M.J. Coenegracht, Multivariate modeling of the pharmaceutical two-step process of wet granulation and tableting with multiblock partial least-squares, J.Chemometr. 11 (1997) 379-392.
[56] B.R. Kowalski, L.E. Wangen, A multiblock partial least squares algorithm for investigating complex chemical systems, J.Chemometr. 3 (1989) 3-20.
[57] N. Dordevic, F. Camin, R.M. Marianella, G.J. Postma, L.M.C. Buydens, R. Wehrens, Detecting the addition of sugar and water to wine, Aust.J.Grape Wine Res. 19 (2013) 324-330.
[58] N. Dordevic, R. Wehrens, G.J. Postma, L.M.C. Buydens, F. Camin, Statistical methods for improving verification of claims of origin for Italian wines based on stable isotope ratios, Anal.Chim.Acta 757 (2012) 19-25.
[59] Y. Liu, S.D. Brown, Wavelet multiscale regression from the perspective of data fusion: new conceptual approaches, Anal.Bioanal.Chem. 380 (2004) 445-452.
[60] A. Pirnau, M. Bogdan, D.A. Magdas, D. Statescu, Food Biophysics 8 (2013) 24-28.
[61] J.E. Gimenez-Miralles, D.M. Salazar, I. Solona, Regional origin assignment of red wines from Valencia (Spain) by 2H NMR and 13C IRMS stable isotope analysis of fermentative ethanol, J.Agric.Food Chem. 47 (1999) 2645-2652.
[62] F. Camin, G. Versini, D. Depentori, M. Simoni, A. Tonon, L. Ziller, Variation of stable isotopes in several wine constituents inside limited areas and in relation to cultivar and altitude, Proceedings AlpeAdria Symposium, Codroipo/Udine, 8-10 November 2000, ESRA, Gorizia, 2000, pp. 613-628.
[63] M.E. Spitzke, C. Fauhl-Hassek, Determination of the 13C/12C ratios of ethanol and higher alcohols in wine by GC-C-IRMS analysis, Eur.Food Res.Technol. 231 (2010) 247-257.
[64] M. Hanafi, G. Mazerolles, E. Dufour, E.M. Qannari, Common components and specific weight analysis and multiple co-inertia analysis applied to the coupling of several measurement techniques, J.Chemometr. 20 (2006) 1-12.

Figure Captions

Fig. 1. Evolution of the criterion ΔT with the number of clusters for the Riesling wine data set (n=247).

Fig. 2. Scatter plot of the PCA scores of stable isotope data for the determination of geographical origin of wines from the 2009 vintage (n=111, mean-centered data; ellipsoids show 95% probability). MSR – Mosel; RHH – Rheinhessen; PFL – Pfalz; NAH – Nahe.

Fig. 3. Map of the German wine production regions.

Fig. 4. Correlation circle of the PCA of stable isotope data for the determination of geographical origin of wines from the 2009 vintage (see Fig. 2).

Fig. 5. Application of the Consensus PCA multiblock method to the fused NMR and stable isotope data (both matrices mean-centered) for the geographical origin of wines from the 2009 vintage (n=111; ellipsoids show 95% probability). MSR – Mosel; RHH – Rheinhessen; PFL – Pfalz; NAH – Nahe.

Fig. 6. Application of the ComDim multiblock method to the fused NMR and stable isotope (SI) data for geographical origin (n=111): (A) scatter plot of dimensions D1 and D3 (ellipsoids show 99% probability); (B) salience values (influence) of the NMR and SI blocks on the common components of the model.

Table 1. An overview of the sample set measured by 1H NMR/SNIF-NMR

Origin/vintage | 2005 | 2006 | 2007 | 2008 | 2009 | 2010
Baden | 24/24 | 25/- | 25/25 | 28/- | 18/- | 149/-
Pfalz | 33/33 | 32/32 | 31/31 | -/- | 32/32 | 33/33
Mosel | 38/38 | 37/37 | 38/38 | -/- | 32/32 | 36/36
Franken | -/- | -/- | -/- | -/- | 61/- | 46/-
Hessische Bergstraße | -/- | -/- | -/- | -/- | 5/- | 5/-
Mittelrhein | 4/4 | 5/- | 5/4 | -/- | 5/5 | 3/3
Nahe | 15/15 | 16/16 | 16/16 | -/- | 16/16 | 17/14
Rheingau | -/- | -/- | -/- | -/- | 10/- | 10/-
Rheinhessen | 40/40 | 38/38 | 37/37 | -/- | 38/38 | 37/37
Saale | 5/- | 5/- | 5/- | 5/- | 5/- | 5/-
Sachsen | 6/- | 7/- | 7/- | 6/- | 5/- | 4/-
Württemberg | 14/13 | 13/- | 15/14 | 6/- | 43/- | 62/-
Ahr | 2/2 | 4/4 | 4/4 | -/- | 4/4 | -/-
Total (a) | 181/169 | 182/131 | 183/169 | 45/0 | 274/127 | 407/122

(a) Additionally, a set of 111 wine samples of unknown origin and vintage was measured by 1H NMR only.

Table 2. ANOVA results for wine parameters on the eight significant groups obtained by CLV

Source | Sum of squares | Degrees of freedom | Mean squares | F | Prob>F
Geographical origin (2009 vintage, n=111, 4 groups (a))
Group 1 (d) | 0.00027 | 3 | 0 | 3.8 | 0.0131
Group 2 | 0.0346 | 3 | 0.115 | 16.7 | 0
Group 3 | 0.00056 | 3 | 0.00019 | 3.3 | 0.0057
Group 4 | 0 | 3 | 0 | 0.2 | 0.0215
Group 5 | 0.01078 | 3 | 0.00359 | 6.2 | 0.0006
Group 6 | 0.0362 | 3 | 0.121 | 15.2 | 0
Group 7 | 0.00031 | 3 | 0.00021 | 2.1 | 0.0054
Group 8 | 0.00079 | 3 | 0.00795 | 3.4 | 0.0082
Year of vintage (Riesling, n=247, 5 groups (b))
Group 1 | 0.07646 | 4 | 0.01912 | 6.8 | 0.0021
Group 2 | 0.19251 | 4 | 0.4813 | 19.6 | 0
Group 3 | 0.21248 | 4 | 0.5312 | 22.2 | 0
Group 4 | 0.22820 | 4 | 0.5705 | 24.3 | 0
Group 5 | 0.08354 | 4 | 0.02089 | 7.5 | 0.0087
Group 6 | 0.07384 | 4 | 0.01846 | 6.6 | 0.0054
Group 7 | 0.02158 | 4 | 0.01542 | 2.1 | 0.01534
Group 8 | 0.03217 | 4 | 0.03482 | 2.3 | 0.05437
Red wine grape variety (n=154, 6 groups (c))
Group 1 | 0.16687 | 6 | 0.21781 | 9.9 | 0
Group 2 | 0.10941 | 6 | 0.1824 | 6.1 | 0
Group 3 | 0.03310 | 6 | 0.00189 | 1.8 | 0.0127
Group 4 | 0.03682 | 6 | 0.00614 | 1.2 | 0.0830
Group 5 | 0.03042 | 6 | 0.00507 | 1.6 | 0.1624
Group 6 | 0.02223 | 6 | 0.00371 | 1.1 | 0.3496
Group 7 | 0.03548 | 6 | 0.00348 | 0.5 | 0.5432
Group 8 | 0.03458 | 6 | 0.00642 | 0.8 | 0.4532

(a) the groups were: PFL, NAH, MSR, RHH
(b) the groups were: 2005, 2006, 2007, 2009, 2010
(c) the groups were: Pinot noir, Dornfelder, Lemberger, Portugieser, Trollinger, Regent
(d) "group" means the combination of buckets (chemical shifts) in the cluster

Table 3. Summary of classification results for the NMR data set with and without variable selection (leave-one-out cross validation)

| Variable selection method (number of variables) | LDA | PLS-DA | FDA | ICA |
|---|---|---|---|---|
| Geographical origin (2009 vintage, n=111, 4 groups^a) | | | | |
| Without variable selection (896)^e | 89 (6)^d | 83 (5) | 82 (7) | 88 (4) |
| Multiway ANOVA (713) | 91 (5) | 79 (4) | 84 (6) | 90 (4) |
| CLV (780) | 94 (5) | 85 (4) | 85 (6) | 90 (4) |
| Year of vintage (Riesling, n=247, 5 groups^b) | | | | |
| Without variable selection (896) | 97 (6) | 97 (6) | 88 (6) | 95 (6) |
| Multiway ANOVA (756) | 98 (5) | 97 (6) | 91 (5) | 95 (5) |
| CLV (732) | 98 (5) | 97 (6) | 93 (5) | 96 (5) |
| Red wine grape variety (n=154, 6 groups^c) | | | | |
| Without variable selection (896) | 83 (8) | 97 (7) | 82 (6) | 87 (7) |
| Multiway ANOVA (798) | 84 (7) | 98 (6) | 84 (6) | 89 (6) |
| CLV (767) | 87 (7) | 98 (6) | 85 (6) | 90 (6) |

^a The groups were: PFL, NAH, MSR, RHH.
^b The groups were: 2005, 2006, 2007, 2009, 2010.
^c The groups were: Pinot noir, Dornfelder, Lemberger, Portugieser, Trollinger, Regent.
^d The optimal number of latent variables is given in brackets.
^e The number of buckets in the data set is shown in brackets.

Table 4. An overview of stable isotope analysis data

| | (D/H)I (ppm) | (D/H)II (ppm) | R | δ18O (‰) | δ13C (‰) |
|---|---|---|---|---|---|
| Average | 101.7 | 127.4 | 2.505 | -0.38 | -28.5 |
| Standard deviation | 1.1 | 1.6 | 0.039 | 1.32 | 1.0 |
| Number of observations | 718 | 718 | 718 | 718 | 718 |
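In SNIF-NMR, the parameter R is conventionally the relative deuterium distribution between the methylene and methyl sites of ethanol, R = 2(D/H)II/(D/H)I. The worked check below applies this formula to the Table 4 averages. It is only a sanity check: a ratio of averages need not equal the average of per-sample ratios, so close agreement with the tabulated R is expected but not guaranteed. The function name `snif_r` is hypothetical.

```python
def snif_r(dh_i, dh_ii):
    """Relative deuterium distribution R = 2 * (D/H)II / (D/H)I of ethanol."""
    return 2.0 * dh_ii / dh_i


# Table 4 averages: (D/H)I = 101.7 ppm, (D/H)II = 127.4 ppm
print(round(snif_r(101.7, 127.4), 3))  # 2.505
```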

Table 5. Classification results for 1H NMR, stable isotope (SI) and fused data of wine samples (percent of correctly classified samples). The optimal number of latent variables is given in brackets.

Geographical origin (2009 vintage, n=111, 4 groupsa)

Data set

1H NMR with CLV e Multiclass model

SI e

LDA

PLS-DA

FDA

ICA

MB-PLSDA

94 (5)

85 (4)

84 (6)

90 (4)

-

61 (3)

60 (4)

61 (4)

70 (3) -

Sensitivity/specificity

H NMR (with CLV)

d

+SI

f

e

99 (7)

100 (6)

96 (7)

92 (4)

100

92 (7)

93 (6)

91 (7)

89 (4)

95

90/88

91/85

92/90

88/86

b

Multiclass model

98 (5)

97 (6)

93 (5)

95 (5)

62 (5)

62 (5)

62 (5)

61 (5)

Sensitivity/specificity

d

99 (6)

-

-

(with CLV)

e

91 (6)

+SI

f

99 (6)

92 (6)

95 (6)

92 (6)

88 (6)

92 (6)

95/91

89/88

94/88

d

H NMR


1

96/90

95/60

(Hotelling t-test)


SI e


H NMR with CLV e


Year of vintage (Riesling, n=247, 5 groupsb) 1


1


95/72

(Hotelling t-test)

92/87

99 96 97/94

Red wine grape variety (n=154, 6 groupsc)

1H NMR with CLV e

Multiclass model

SI e

83 (8)

97 (7)

82 (6)

87 (7)

35 (4)

58 (3)

35 (3)

40 (4) -

Sensitivity/specificity (Hotelling t-test)

1

-

95/30

H NMR

d

87 (7)

98 (6)

85 (6)

90 (6)

85

(with CLV)

e

71 (7)

83 (6)

79 (6)

81 (6)

75

+SI

f

75/69

84/82

81/78

85/76

78/70

^a The groups were: PFL, NAH, MSR, RHH.
^b The groups were: 2005, 2006, 2007, 2009, 2010.
^c The groups were: Pinot noir, Dornfelder, Lemberger, Portugieser, Trollinger, Regent.
^d Leave-one-out cross validation.
^e Test set validation (approximately one fourth of the initial data set, average of ten random splits).
^f Sensitivity/specificity values [%] for test set validation (average values for all groups).

Table 6. Confusion matrix for geographical origin of wine (2009 vintage) using LDA (the percentages of correctly classified samples are highlighted in bold)

| | NAH | PFL | RHH | MSR |
|---|---|---|---|---|
| NAH | **52** | 33 | 5 | 10 |
| PFL | 33 | **53** | 13 | 0 |
| RHH | 5 | 14 | **67** | 14 |
| MSR | 4 | 14 | 11 | **71** |
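Per-class sensitivity and specificity can be read off a confusion matrix such as Table 6. The sketch below computes both and takes their average over all groups, in the spirit of footnote f of Table 5. Two assumptions are made here because the caption does not state them. First, rows are taken as true classes and columns as predicted classes. Second, the Table 6 percentages are used as if they were counts, which amounts to assuming equal class sizes. This matters for specificity, which otherwise depends on the real number of samples per region. The function name `per_class_sens_spec` is hypothetical.

```python
def per_class_sens_spec(cm):
    """Sensitivity and specificity per class of a square confusion matrix.

    Assumes rows = true classes, columns = predicted classes (hypothetical
    orientation; the table caption does not state it).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    results = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp          # class c predicted as something else
        fp = col_sums[c] - tp         # other classes predicted as c
        tn = total - tp - fn - fp
        results.append((tp / (tp + fn), tn / (tn + fp)))
    return results


labels = ["NAH", "PFL", "RHH", "MSR"]
# Table 6 percentages treated as counts (i.e. assuming equal class sizes)
table6 = [[52, 33, 5, 10],
          [33, 53, 13, 0],
          [5, 14, 67, 14],
          [4, 14, 11, 71]]
stats = per_class_sens_spec(table6)
for name, (sens, spec) in zip(labels, stats):
    print(f"{name}: sensitivity {sens:.2f}, specificity {spec:.2f}")
mean_sens = sum(s for s, _ in stats) / len(stats)
mean_spec = sum(p for _, p in stats) / len(stats)
print(f"average over groups: {mean_sens:.2f}/{mean_spec:.2f}")
```

With real sample counts in place of percentages, the same function gives specificities that reflect the actual class sizes.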

Fig. 1.

Fig. 2.

Fig. 3.

Fig. 4.

Fig. 5.

Fig. 6. Panels A and B.