Accepted Manuscript

Toward Automated Chromatographic Fingerprinting: A Non-Alignment Approach to Gas Chromatography Mass Spectrometry Data
Jochen Vestner, Gilles de Revel, Sibylle Krieger-Weber, Doris Rauhut, Maret du Toit, André de Villiers

PII: S0003-2670(16)30090-3
DOI: 10.1016/j.aca.2016.01.020
Reference: ACA 234360
To appear in: Analytica Chimica Acta
Received Date: 27 October 2015
Revised Date: 14 January 2016
Accepted Date: 19 January 2016

Please cite this article as: J. Vestner, G. de Revel, S. Krieger-Weber, D. Rauhut, M. du Toit, A. de Villiers, Toward Automated Chromatographic Fingerprinting: A Non-Alignment Approach to Gas Chromatography Mass Spectrometry Data, Analytica Chimica Acta (2016), doi: 10.1016/j.aca.2016.01.020. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Graphical abstract: segmented retention profiles (mass spectra × scans) of the samples, split into segments 1 to 5; transformations (XX^T); array of samples × samples × number of segments; PARAFAC.
Toward Automated Chromatographic Fingerprinting: A Non-Alignment Approach to Gas Chromatography Mass Spectrometry Data
Jochen Vestner (a, b, c, ∗), Gilles de Revel (a, b), Sibylle Krieger-Weber (d), Doris Rauhut (c), Maret du Toit (e), André de Villiers (f)
a Université de Bordeaux, ISVV, EA 4577, Unité de recherche Œnologie, 33882 Villenave d’Ornon, France.
b INRA, ISVV, USC 1366 Œnologie, 33882 Villenave d’Ornon, France.
c Department of Microbiology and Biochemistry, Hochschule Geisenheim University, Von-Lade-Straße 1, 65366 Geisenheim, Germany.
d Lallemand, In den Seiten 53, 70825 Korntal-Münchingen, Germany.
e Institute of Wine Biotechnology, Department of Viticulture and Oenology, Stellenbosch University, Private Bag X1, Matieland (Stellenbosch) 7602, South Africa.
f Department of Chemistry and Polymer Science, Stellenbosch University, Private Bag X1, Matieland (Stellenbosch) 7602, South Africa.
Abstract
In contrast to targeted analysis of volatile compounds, non-targeted approaches take information on both known and unknown compounds into account, are inherently more comprehensive and give a more holistic representation of the sample composition. Although several non-targeted approaches have been developed, there is still a demand for automated data processing tools, especially for complex multi-way data such as chromatographic data obtained from multichannel detectors. This work was therefore aimed at developing a data processing procedure for gas chromatography mass spectrometry (GC-MS) data obtained from non-targeted analysis of volatile compounds. The developed approach uses basic matrix manipulation of segmented GC-MS chromatograms and PARAFAC multi-way modelling. The approach takes retention time shifts and peak shape deformations between samples into account and can be implemented with the freely available N-way toolbox for MATLAB. A demonstration of the new fingerprinting approach is presented using an artificial GC-MS data set and an experimental full-scan GC-MS data set obtained for a set of experimental wines.
Keywords: non-targeted analysis, gas chromatography, fingerprinting, multi-way analysis, metabolomics, non-alignment

∗ Corresponding author. Tel.: +49 6722 502 346; fax: +49 6722 502 330 347. Email address: [email protected] (Jochen Vestner)
Preprint submitted to Analytica Chimica Acta, January 14, 2016
1. Introduction
Non-targeted analysis has increasingly gained importance in numerous domains of analytical chemistry, such as life science, food science and especially the ‘-omics’-related sciences. In contrast to conventional targeted analysis, non-targeted analysis aims to gather qualitative and quantitative information on as many compounds as possible in the analysed samples in a short period of time, and thus to provide the researcher with a more holistic view of the composition of samples [1]. Holistic strategies benefit from the vast amount of information obtained from modern analytical instrumentation. However, the main challenges are data handling and full exploitation of the dimensionality of the acquired data.
The data generated by hyphenated chromatographic techniques such as GC-MS or LC-MS are especially information-rich. Feature extraction, such as peak integration in single ion chromatograms, total ion chromatograms or deconvoluted signals, is the most common approach to extract information from chromatographic data and results in relatively small data tables that are straightforward to analyse [2, 3, 4, 5, 6, 7, 8]. Although various peak integration algorithms and software packages have been developed [9, 10, 11, 12], automated peak integration remains troublesome due to coelution and potentially erroneous peak integration and/or assignment. Time-consuming manual correction of the results is often necessary. Moreover, relevant information in the raw data can be lost through such feature extraction before modelling [13, 14]. Deconvoluting chromatographic signals can also be time-consuming in terms of model construction and evaluation of results [15, 16, 2, 17].
An alternative, more comprehensive approach, aimed at extracting more information and the underlying patterns in the data, is to use the entire two-dimensional raw data signal of each sample as a chromatographic fingerprint for modelling. Examples of holistic non-targeted analyses can be found in numerous reports [14, 18, 19, 20, 21, 22, 23, 24, 25], some of which also include the application of multi-way analysis methods such as TUCKER3, PARAFAC and N-PLS to hyphenated chromatographic data. When factor models are applied to chromatographic data, challenges arise from the increased size of the data and from retention shifts and peak shape deformations, which distort the bilinear/trilinear structure of the data. Several algorithms and software programs have been developed for peak alignment [26, 27, 28, 29, 30]. Depending on the data, however, shift correction can be difficult and time-consuming.
The problems of conventional data analysis approaches to non-targeted GC-MS analysis described above, in particular the challenges of automated peak integration and retention time alignment of chromatograms, were the main motivation for developing an alternative data analysis approach. The major consideration for overcoming the peak integration issue was the direct modelling of the chromatographic raw data (without feature selection), including a reduction of the data. The main idea for handling the distortion of the bilinear/trilinear structure of the data caused by shifting peaks was a mathematical transformation of pieces (segments) of the chromatograms using SSCP (sums of squares and cross products) matrices. SSCP matrices are square, symmetric and positive semi-definite, similar to variance-covariance matrices [31], which are utilised, for instance, in PARAFAC2, STATIS and the calculation of RV-coefficients [32, 19, 33, 34, 35]. In particular, the indirect fitting algorithm for PARAFAC2 [36] served as the major inspiration for the development of the new approach. Moreover, for the sake of simplicity, another aim was to use a single model for the entire set of chromatograms of all samples to find systematic differences among samples and to identify important regions of the chromatograms which, if desired, can be further deconvoluted and investigated using, e.g., PARAFAC2. A method using multiple PARAFAC2 models on segmented chromatograms has been reported recently [37]. This approach gives very detailed information on fully decomposed mass spectra and peak profiles, which are finally summarized using PCA. The new approach described here can be considered a ‘segment pre-selection tool’ for the subsequent deconvolution of only the important chromatogram segments. In this way, a significant amount of the time needed to construct and evaluate PARAFAC2 models can be saved.
This paper gives an overview of the algorithm of the new data analysis approach, including its theoretical background, such as the calculation of SSCP matrices and all other mathematical transformations used. The approach is explained and tested on an artificial, well-defined GC-MS data set with and without peak shifts. After the theoretical discussion, the approach is tested on a real GC-MS data set of experimental wines, and the results are confirmed using a reference data analysis method that comprises PARAFAC2 deconvolution of the entire segmented chromatograms, peak integration of the deconvoluted peak profiles and subsequent PCA of the obtained peak table.
2. Theory
2.1. Defined, artificial GC-MS data set
To demonstrate and verify the developed algorithm, a defined, artificial GC-MS data set was created using an in-house MATLAB script. The data set consists of 20 chromatograms, each containing 9 to 10 Gaussian peaks with different mass spectra (m/z 35 to m/z 318) and different degrees of overlap. The whole chromatogram can be divided into five segments. Segment one contains two peaks which perfectly overlap. Peaks three and four partially coelute in segment two, as do peaks five, six and seven in segment three. Peak eight is in segment four, and the last segment contains the last two peaks, nine and ten, which also partially coelute (Figure 14 in Supporting Information). Peak sizes vary among chromatograms as indicated in Table 1; consequently, the samples can be divided into four groups. Moreover, a small random variation was added to all peak sizes to simulate natural measurement deviation. To simulate baseline noise, normally distributed random noise was added to the whole data set. Each chromatogram can be considered as a matrix of dimensions 1100 scans × 283 masses; thus the entire data set can be considered as a three-way array (i × j × k) with the dimensions 20 samples × 1100 scans × 283 masses.
Table 1: Differing peaks among samples in the defined, artificial GC-MS data set.

segment | peak no. | size difference | sample no.
1       | 2        | only present in | 14 & 15
2       | 4        | 0.7× higher in  | 1 to 5
5       | 9        | 3× higher in    | 1 to 10
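The in-house MATLAB script used to create the artificial data set is not given in the paper. Purely as an illustration, the sketch below (Python with NumPy, an assumption of this example) shows one way such data could be simulated: each chromatogram is the product of Gaussian elution profiles and mass spectra, optionally plus normally distributed noise. The function name `simulate_chromatogram` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_chromatogram(spectra, centres, widths, heights, n_scans,
                          noise_sd=0.0, rng=None):
    """Hypothetical simulator of one GC-MS chromatogram.

    spectra: (P, M) array, one mass spectrum per peak (rows).
    centres, widths, heights: length-P sequences in scan units.
    Returns an (n_scans, M) matrix of scans x masses.
    """
    spectra = np.asarray(spectra, dtype=float)
    t = np.arange(n_scans)[:, None]                        # scan index as a column
    profiles = np.asarray(heights, dtype=float) * np.exp(
        -0.5 * ((t - np.asarray(centres)) / np.asarray(widths)) ** 2
    )                                                      # (n_scans, P) Gaussian profiles
    X = profiles @ spectra                                 # bilinear signal, (n_scans, M)
    if noise_sd > 0:
        rng = np.random.default_rng() if rng is None else rng
        X = X + rng.normal(0.0, noise_sd, size=X.shape)    # normally distributed noise
    return X

# Usage: three samples differing only in the height of the first peak,
# stacked into a samples x scans x masses array as in Section 2.1.
rng = np.random.default_rng(0)
spectra = rng.random((2, 5))                               # two hypothetical 5-channel spectra
data = np.stack([
    simulate_chromatogram(spectra, [40, 46], [3, 3], [h, 0.7], 100,
                          noise_sd=0.01, rng=rng)
    for h in (1.0, 1.5, 2.0)
])
print(data.shape)  # (3, 100, 5)
```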
2.2. A new non-alignment approach to non-targeted GC-MS data: Mathematical transformations of raw chromatograms
Using basic matrix algebra, an SSCP matrix XX^T is obtained by multiplying a matrix X by its transpose, as displayed in Equation 1.
$$
XX^T = \begin{pmatrix}
\sum_{j=1}^{C} x_{1j}^2 & \sum_{j=1}^{C} x_{1j}x_{2j} & \cdots & \sum_{j=1}^{C} x_{1j}x_{Rj} \\
\sum_{j=1}^{C} x_{2j}x_{1j} & \sum_{j=1}^{C} x_{2j}^2 & \cdots & \sum_{j=1}^{C} x_{2j}x_{Rj} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{C} x_{Rj}x_{1j} & \sum_{j=1}^{C} x_{Rj}x_{2j} & \cdots & \sum_{j=1}^{C} x_{Rj}^2
\end{pmatrix} \tag{1}
$$
where X is an R × C matrix of elements x_ij, i = 1, …, R, j = 1, …, C. The matrix product XX^T is the R × R matrix of sums of squares and cross products (SSCP matrix).
In detail, the diagonal of XX^T contains the sums of squares of a given row i of X, namely $\sum_{j=1}^{C} x_{ij}^2$. All off-diagonal elements represent cross products between two different rows i, k of X, namely $\sum_{j=1}^{C} x_{ij}x_{kj}$ for i ≠ k. Consequently, the sums of squares are a measure of variation within a row, whereas the cross products are a measure of covariation between two rows. Note the similarity to the variance-covariance matrix: the diagonal elements of the variance-covariance matrix are variances and all off-diagonal elements are covariances. For the sake of simplicity, the terms variation and variance, as well as covariation and covariance, are used interchangeably in the following (although this is not strictly correct mathematically).
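To make Equation 1 concrete, the short Python/NumPy sketch below (the language and the function name `sscp` are choices of this example, not of the paper) computes XX^T for a 2 × 3 matrix. It also checks the similarity to the variance-covariance matrix noted above: once each row is mean-centred, XX^T divided by C − 1 equals the covariance matrix of the rows.

```python
import numpy as np

def sscp(X):
    """Sums of squares and cross products X X^T of an R x C matrix X (Eq. 1)."""
    X = np.asarray(X, dtype=float)
    return X @ X.T

X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
S = sscp(X)
# Diagonal: row sums of squares (1 + 4 + 0 = 5 and 0 + 1 + 9 = 10).
# Off-diagonal: cross product of the two rows (1*0 + 2*1 + 0*3 = 2).
print(S)

# Relation to the variance-covariance matrix: centre each row, divide by C - 1.
Xc = X - X.mean(axis=1, keepdims=True)
print(np.allclose(sscp(Xc) / (X.shape[1] - 1), np.cov(X)))  # True
```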
PARAFAC2 is a powerful tool for the deconvolution of small chromatogram segments [35, 37, 38, 39]. The approach presented here is mainly inspired by the idea of the indirect fitting algorithm of the PARAFAC2 model, which, instead of directly modelling an array consisting of the matrices X_i (mass spectral profile × elution profile for each of the I samples), considers a model of an array consisting of the SSCP matrices X_i(X_i)^T [40, 36]. In this manner, PARAFAC2 is suitable for deconvoluting chromatographic peaks that shift along the retention axis among samples. A disadvantage of PARAFAC2 is that a separate model has to be constructed and evaluated for each segment of the chromatogram.
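The property that makes SSCP matrices useful for shifted data can be checked numerically. In this minimal Python/NumPy sketch (peak positions, widths and spectra are invented for illustration), a segment X of mass channels × scans is built from two coeluting Gaussian peaks and then shifted by seven scans. The raw matrices no longer match, but XX^T is unchanged, because XX^T sums over scans and a shift that keeps the peaks inside the window only reorders the scans. `np.roll` shifts cyclically, which here is effectively a translation because the Gaussian tails at the window edges are negligible. Note that the invariance is exact only for such a common shift of all signal in the segment; relative shifts between coeluting compounds or changes in peak shape generally alter XX^T.

```python
import numpy as np

def gaussian_profile(n_scans, centre, width):
    """Gaussian elution profile sampled at integer scan indices."""
    t = np.arange(n_scans)
    return np.exp(-0.5 * ((t - centre) / width) ** 2)

def shifted_segment_demo(shift=7, n_scans=60, n_masses=6, seed=0):
    """Return a hypothetical segment X (masses x scans) and a shifted copy."""
    rng = np.random.default_rng(seed)
    spectra = rng.random((n_masses, 2))                   # two made-up spectra, M x 2
    elution = np.vstack([gaussian_profile(n_scans, 25, 3),
                         gaussian_profile(n_scans, 30, 3)])  # 2 x N, coeluting peaks
    X = spectra @ elution                                  # M x N segment
    X_shifted = np.roll(X, shift, axis=1)                  # common retention shift
    return X, X_shifted

X, X_shifted = shifted_segment_demo()
print(np.allclose(X, X_shifted))                           # False: raw profiles differ
print(np.allclose(X @ X.T, X_shifted @ X_shifted.T))       # True: SSCP matrix unchanged
```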
The utilisation of SSCP matrices as a preprocessing step for multivariate modelling of whole chromatograms has also been reported before [19, 33]. If entire two-dimensional chromatograms are used for the construction of SSCP matrices, information on the retention time of compounds is lost, which complicates the identification of the peaks that contribute to the differentiation between samples by multivariate modelling.
However, by dividing all chromatograms along the retention axis into segments containing a small number of peaks and subsequently constructing SSCP matrices for each segment, information on the location of the peaks that contribute to the differentiation of samples can be preserved. The SSCP matrices for each segment and each sample have the dimensions number of mass channels × number of mass channels and contain information on the variation of each mass channel and the covariation between all mass channels in that segment for the corresponding sample. For each segment, the SSCP matrices of all samples are vectorised and compiled into a new matrix; because these matrices are symmetric, only their upper triangular part (including the diagonal) is retained. This step results in a compilation matrix for each segment with the dimensions number of samples × [(number of mass channels + 1) · number of mass channels / 2]. These compilation matrices are then also transformed into SSCP matrices with the dimensions number of samples × number of samples, which contain information on the variation of the content of the compilation matrix for each sample and on the covariation of that content between all samples in each segment. These SSCP matrices are finally compiled into a three-way array with the dimensions (number of samples × number of samples) × number of segments.
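As a worked example of these dimensions, consider the artificial data set of Section 2.1 (20 samples, 1100 scans, 283 mass channels, five segments). The individual segment lengths do not affect the dimensions of the result, and the arithmetic below follows directly from the description above:

$$
\begin{aligned}
\text{SSCP matrix per sample and segment:}\quad & 283 \times 283,\\
\text{retained upper-triangular entries:}\quad & \frac{283 \cdot 284}{2} = 40\,186,\\
\text{compilation matrix per segment:}\quad & 20 \times 40\,186,\\
\text{SSCP matrix of the compilation matrix:}\quad & 20 \times 20,\\
\text{final three-way array:}\quad & 20 \times 20 \times 5 .
\end{aligned}
$$

The raw array holds 20 · 1100 · 283 = 6 226 000 values, whereas the transformed array holds 20 · 20 · 5 = 2000, which illustrates the data reduction achieved by the transformation.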
The whole procedure is summarized in matrix notation in the following. Each two-dimensional chromatogram (sample) is characterized by M mass channels and N scan points. The N scans are divided into K segments, that is, $N = \sum_{k=1}^{K} N_k$, where N_k denotes the number of scans in the k-th segment. Altogether, there are I samples. First, we define the I × K block matrix X by
$$
X = (X^{ik})_{i=1,\dots,I;\;k=1,\dots,K} = \begin{pmatrix}
X^{11} & \cdots & X^{1K} \\
\vdots & \ddots & \vdots \\
X^{I1} & \cdots & X^{IK}
\end{pmatrix}, \tag{2}
$$
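where X^{ik} is an M × N_k matrix containing the data of the i-th sample and k-th segment, that is
$$
X^{ik} = (x^{ik}_{mn})_{m=1,\dots,M;\;n=1,\dots,N_k} = \begin{pmatrix}
x^{ik}_{11} & \cdots & x^{ik}_{1N_k} \\
\vdots & \ddots & \vdots \\
x^{ik}_{M1} & \cdots & x^{ik}_{MN_k}
\end{pmatrix}. \tag{3}
$$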
The SSCP matrix A^{ik} = X^{ik}(X^{ik})^T, containing information on the variation and covariation between all mass channels of the i-th sample and k-th segment, is defined by
$$
A^{ik} = (a^{ik}_{rt})_{r,t=1,\dots,M} \tag{4}
$$
with
$$
a^{ik}_{rt} = \sum_{s=1}^{N_k} x^{ik}_{rs}\, x^{ik}_{ts} \quad \forall\, r,t = 1,\dots,M \tag{5}
$$
and
$$
\dim(A^{ik}) = M \times M, \tag{6}
$$
for all i = 1, …, I and k = 1, …, K.
Subsequently, only the upper triangular part of the symmetric SSCP matrix A^{ik} is vectorised (unfolded) and concatenated into a new matrix Y^k. The vectorisation vec(A^{ik}) of the upper triangular part of A^{ik} is defined by¹
$$
\mathrm{vec}(A^{ik}) = \alpha^{ik}_{1} \frown \alpha^{ik}_{2} \frown \cdots \frown \alpha^{ik}_{M}, \tag{7}
$$
where
$$
\alpha^{ik}_{l} = \left(a^{ik}_{l,l},\; a^{ik}_{l,(l+1)},\; \dots,\; a^{ik}_{l,M}\right) \quad \forall\, l = 1,\dots,M,
$$
for all i = 1, …, I and k = 1, …, K.
Consequently, the vectorisation vec(A^{ik}) has
$$
J = \sum_{l=1}^{M} l = \frac{M(M+1)}{2} \tag{8}
$$
components. The I × J matrix Y^k is constructed from the above row vectors vec(A^{1k}), …, vec(A^{Ik}) as follows:
$$
Y^k = \begin{pmatrix}
\mathrm{vec}(A^{1k}) \\
\vdots \\
\mathrm{vec}(A^{Ik})
\end{pmatrix}, \tag{9}
$$
for all k = 1, …, K.
In the end, we form the SSCP matrices Z^k = Y^k(Y^k)^T, which contain information on the variation and covariation between all samples in the k-th segment with regard to the variation and covariation between all mass channels of each sample in that segment,
$$
Z^k = (Z^k_{rs})_{r,s=1,\dots,I} \tag{10}
$$
with
$$
Z^k_{rs} = \mathrm{vec}(A^{rk}) \cdot \left(\mathrm{vec}(A^{sk})\right)^T \quad \forall\, r,s = 1,\dots,I, \tag{11}
$$
for all k = 1, …, K. Finally, the matrices Z^k are rearranged into the (I × I) × K array Z:
$$
Z = \begin{pmatrix} Z^1 & \cdots & Z^K \end{pmatrix}. \tag{12}
$$

¹ The concatenation ⌢ of two arbitrary row vectors x = (x_1, …, x_n) and y = (y_1, …, y_p) is defined as x ⌢ y = (x_1, …, x_n, y_1, …, y_p).
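Equations 2 to 12 translate almost line by line into array code. The following is a minimal Python/NumPy sketch, not the authors' MATLAB implementation. Its data layout (samples × scans × masses, as in Section 2.1), the function names and the way segment boundaries are supplied (as a list of scan indices taken as given) are assumptions of this example.

```python
import numpy as np

def upper_tri_vec(A):
    """vec(A) of Eq. 7: upper triangle incl. diagonal, read row by row."""
    rows, cols = np.triu_indices(A.shape[0])
    return A[rows, cols]

def segment_sscp_array(data, boundaries):
    """Hypothetical implementation of Eqs. 2-12.

    data: array of shape (I, N, M), samples x scans x masses.
    boundaries: increasing scan indices [0, ..., N] delimiting K segments.
    Returns the array Z with shape (I, I, K).
    """
    data = np.asarray(data, dtype=float)
    I = data.shape[0]
    K = len(boundaries) - 1
    Z = np.empty((I, I, K))
    for k in range(K):
        start, stop = boundaries[k], boundaries[k + 1]
        rows = []
        for i in range(I):
            Xik = data[i, start:stop, :].T       # M x N_k segment, Eq. 3
            Aik = Xik @ Xik.T                    # M x M SSCP matrix, Eqs. 4-6
            rows.append(upper_tri_vec(Aik))      # J = M(M+1)/2 values, Eqs. 7-8
        Yk = np.vstack(rows)                     # I x J compilation matrix, Eq. 9
        Z[:, :, k] = Yk @ Yk.T                   # I x I SSCP matrix, Eqs. 10-11
    return Z                                     # (I x I) x K array, Eq. 12

# Usage with random data: 3 samples, 30 scans, 4 masses, two segments.
rng = np.random.default_rng(1)
data = rng.random((3, 30, 4))
Z = segment_sscp_array(data, [0, 10, 30])
print(Z.shape)  # (3, 3, 2)
```

Because each A^{ik} sums over the scans of its segment, reordering the scans of a sample inside one segment (for example a small retention shift that keeps the peaks within that segment) leaves the returned array unchanged.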
Prior to multi-way analysis, the three-way array Z is mean-centered across the first and second modes and scaled to unit variance within the third mode. The term mode here refers to a dimension of the array.
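A minimal sketch of this preprocessing in Python/NumPy, under our reading of the usual multi-way conventions: centring across a mode subtracts the mean taken over that mode's index, and scaling within mode three divides each segment slab by its standard deviation so that every segment receives the same weight. The N-way toolbox function nprocess.m, which was used for the actual analysis, may differ in details (for example in the scale measure), and the function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(Z):
    """Centre an I x I x K array across modes 1 and 2, then scale each
    mode-3 slab (one segment) to unit variance."""
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0, keepdims=True)     # centring across mode 1
    Zc = Zc - Zc.mean(axis=1, keepdims=True)   # centring across mode 2 (keeps mode-1 centring)
    sd = Zc.std(axis=(0, 1), keepdims=True)    # one value per segment, shape (1, 1, K)
    sd[sd == 0] = 1.0                          # leave constant slabs unscaled
    return Zc / sd                             # per-slab scaling keeps both centrings intact

rng = np.random.default_rng(3)
Z = rng.random((4, 4, 3))
Z = Z + Z.transpose(1, 0, 2)                   # symmetric in modes 1 and 2, like Eq. 12
P = preprocess(Z)
print(np.allclose(P.std(axis=(0, 1)), 1.0))    # True
```

Because the same centring is applied to modes one and two, the symmetry of each slab Z^k is preserved.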
2.3. PCA, TUCKER3 and PARAFAC
Principal component analysis (PCA) is a bilinear multivariate model that searches for common patterns in a two-dimensional data set. PCA can be understood as a projection method that finds directions (components) maximizing the variance in a data set. These directions, the loadings or latent variables, are constructed as linear combinations of the original variables. The projections of the samples onto these directions are the score values. PARAFAC and TUCKER3, which can be understood as extensions of PCA to multi-way data, are multi-linear decomposition methods that decompose a multi-way array into sets of loadings. Ideally, the loadings describe the data in a more condensed way, thereby facilitating the extraction of information. PARAFAC can be expressed as a constrained version of TUCKER3, and TUCKER3 as a constrained version of two-way PCA of the unfolded array [41]. For the matrix x_ij and the three-way array x_ijk, the PCA model (Equation 13), the TUCKER3 model (Equation 14) and the PARAFAC model (Equation 15) are described as follows:
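$$
x_{ij} = \sum_{f=1}^{F} a_{if}\, b_{jf} + e_{ij} \tag{13}
$$
$$
x_{ijk} = \sum_{f_1=1}^{F_1} \sum_{f_2=1}^{F_2} \sum_{f_3=1}^{F_3} a_{if_1}\, b_{jf_2}\, c_{kf_3}\, g_{f_1 f_2 f_3} + e_{ijk} \tag{14}
$$
$$
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk} \tag{15}
$$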
where F is the number of factors (components) and a_if, b_jf and c_kf are elements of the loading matrices A (I × F), B (J × F) and C (K × F); for TUCKER3, the loading matrices have F1, F2 and F3 columns, respectively. g_{f1f2f3} are the elements of the TUCKER3 core array, and e_ij and e_ijk are the elements of the residual matrix E (I × J) and the residual array E (I × J × K), respectively. Note that in PCA, A (I × F) and B (J × F) are called scores and loadings, while in multi-way analysis only the term loadings is used [15].
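Equation 15 is commonly fitted by alternating least squares (ALS): each loading matrix is updated in turn by a linear least-squares step while the other two are held fixed. The sketch below is a bare-bones ALS in Python/NumPy for illustration only; it is not a replacement for the N-way toolbox routine used in this work, it has no constraints or refined initialisation, and the helper names `khatri_rao` and `parafac_als` are hypothetical.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row (j, k) with k running fastest."""
    J, F = B.shape
    K = C.shape[0]
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, F)

def parafac_als(X, F, n_iter=500, tol=1e-12, seed=0):
    """Fit x_ijk ~ sum_f a_if b_jf c_kf (Eq. 15) by alternating least squares."""
    X = np.asarray(X, dtype=float)
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, F)) for n in (I, J, K))  # random start
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding, columns (j, k)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # mode-2 unfolding, columns (i, k)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # mode-3 unfolding, columns (i, j)
    prev = np.inf
    for _ in range(n_iter):
        # Each update solves X_n = (loading) @ khatri_rao(...)^T in the least-squares sense.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        err = np.linalg.norm(X - np.einsum('if,jf,kf->ijk', A, B, C))
        if abs(prev - err) < tol * max(1.0, prev):   # stop when the fit stagnates
            break
        prev = err
    return A, B, C

# Usage: recover an exact two-component array.
rng = np.random.default_rng(42)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)
A, B, C = parafac_als(X, 2, n_iter=3000)
fit = np.einsum('if,jf,kf->ijk', A, B, C)
print(np.linalg.norm(X - fit) / np.linalg.norm(X))  # small relative residual
```

When applied to the symmetric array Z of Eq. 12, one would expect the mode-one and mode-two loadings to coincide up to scaling, in line with the observation in Section 3 that modes one and two are identical.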
3. Application of the new non-alignment approach to an artificial GC-MS data set
The artificial GC-MS data set was analysed using the new approach to demonstrate its validity. To verify the theoretical considerations, the new approach was first tested on the artificial GC-MS data set without noise and without any shifting peaks. Subsequently, it was tested on the artificial GC-MS data set with noise and non-linear peak shifts to show that the new algorithm can accommodate peak shifts.
3.1. Artificial data set without retention shifts and noise
In the artificial GC-MS data set, each of the three differences among samples (see Table 1) is caused by varying peaks in different segments. After segmentation and mathematical transformation, the resulting three-way array contains information on the covariation among samples in terms of differences in their mass traces in each segment. The decomposition of this array using PARAFAC is therefore expected to give one component explaining each of the three differences among the four groups of samples. Noise was excluded from this artificial data set, as it is a source of random variation. Prior to multi-way analysis, the three-way array Z was mean-centered across the first and second modes to remove offsets in these modes and scaled to unit variance within the third mode to give each segment the same weight. Preprocessing was done using the nprocess.m function of the N-way toolbox [42].
In fact, a three-component PARAFAC model explains the segmented and transformed three-way array perfectly. The appropriate number of components was determined by evaluating residuals, core consistency and convergence speed, and by assessing the interpretability of the solution. As no noise was introduced into this artificial GC-MS data set, 100 % of the variation is explained, evenly distributed over the three components. The loadings of the first (sample) and third (segment) modes are shown in Figure 1. Note that, owing to the calculation of SSCP matrices in the mathematical transformation, modes one and two are identical. Component one explains the differences between samples one to five and the other samples, which are caused by peak four in segment two, as indicated by the mode-three loadings of this component. PARAFAC component two reflects the differences of samples 14 and 15, which are the only samples that contain peak number two in segment one. Finally, the differences between samples one to ten and eleven to 20 are shown by component three. Here, segment five, which contains peak nine, is responsible for this separation.

Figure 1 panels: (a) Mode 1: comp. 1 vs. comp. 2; (b) Mode 1: comp. 1 vs. comp. 3; (c) Mode 3: comp. 1 to comp. 3 (loading vs. segment). Each component explains 33.3 % of the variance.
Figure 1: Loadings of modes one and three of the PARAFAC model on the three-way array of the segmented and mathematically transformed artificial GC-MS data set without noise and without shifted peaks. Note that modes one and two are identical. Samples are coloured according to Table 1.
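Core consistency, one of the criteria used above to choose the number of components, compares the least-squares TUCKER3 core (Eq. 14) computed for fixed PARAFAC loadings with the superdiagonal core that a PARAFAC model implies. A minimal Python/NumPy sketch of the usual formula, 100 · (1 − Σ(g − t)² / F), follows; it is not taken from the paper, and the function name `core_consistency` is hypothetical. Values close to 100 % support the chosen number of components.

```python
import numpy as np

def core_consistency(X, A, B, C):
    """Core consistency (%) of PARAFAC loadings A, B, C for a three-way array X."""
    F = A.shape[1]
    # Least-squares TUCKER3 core for fixed loadings: X x1 A+ x2 B+ x3 C+.
    G = np.einsum('ijk,ai,bj,ck->abc', X,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    T = np.zeros((F, F, F))
    T[np.arange(F), np.arange(F), np.arange(F)] = 1.0   # superdiagonal target core
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)

# Usage: loadings of an exact PARAFAC model give 100 %.
rng = np.random.default_rng(7)
A, B, C = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
X = np.einsum('if,jf,kf->ijk', A, B, C)
print(round(core_consistency(X, A, B, C), 6))  # 100.0
```

The measure is insensitive to how the scale of each component is distributed over the three loading matrices, because the pseudo-inverses absorb any such rescaling.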
3.2. Artificial data set with retention shift and noise
To demonstrate the applicability of the new algorithm to shifted chromatograms, the artificial GC-MS data set with introduced peak shifts (Figure 15 in Supporting Information) was analysed. After segmentation and mathematical transformation, a four-component PARAFAC model explaining 83.8 % of the total variation in the data was obtained. The appropriate number of components was determined by evaluating residuals, core consistency and the number of iterations until convergence, and by assessing the interpretability of the solution. Component one, explaining 68.6 % of the total variation in the data, separates samples one to ten from samples eleven to 20 (Figure 2(a)). Segment five, which contains peak number 9, shows high loadings on this component (Figure 2(d)). Samples one to five differ from the other samples on component two, which explains 9.5 % of the variation. The loadings of the segment mode (mode three) reveal that segment two, containing peak four, is responsible for this difference. Samples 14 and 15, the only samples containing peak number 2, are differentiated from the other samples on component three, which explains 5.5 % of the variation (Figure 2(b)). Here, segment one shows high loadings on this component. Furthermore, component four, explaining 3.5 % of the variation, reflects unsystematic variation in the data (Figure 2(c)), which is related to noise, as PARAFAC on the transformed, shifted artificial GC-MS data set without noise resulted in a three-component model (model not shown). These results show that the developed approach extracts the same structural information on the differences among samples from both the non-shifted and the shifted artificial GC-MS data.
The three-way data array obtained after segmentation and mathematical transformation can also be seen as a ‘stack’ of matrices. It therefore seems reasonable to evaluate multi-block methods, besides multi-way methods, for the analysis of this data type. Different multi-block methods were applied to the three-way array in such a way that each slab of the array corresponds to a block. The following methods were tested: PCA on concatenated matrices, Multiple Factor Analysis (MFA) [43], Common Components and Specific Weights Analysis (CCSWA) [44], analysis of co-inertia with common components [45] and STATIS [34], using the SAISIR toolbox for MATLAB [46], kindly and freely available on www.chimiometrie.fr (July 2014). Of the tested methods, only CCSWA gave interpretable results, which are shown in Figure 3. A CCSWA model with four components revealed structural information comparable to the PARAFAC results (Figure 2). Common component one (90.8 % explained variance) separates samples one to ten from samples eleven to 20, and segment five has the strongest influence on this component. Common component two (4.1 % explained variance) explains the differences between samples 14 and 15 and the other samples (Figure 3(a)). Segment two shows the highest weight on this component. The differences between samples one to five and the other samples are explained by common component three (Figure 3(b)), on which segment two has a high salience value. Common component four (Figure 3(c)) reflects the same random variation (noise) in the data as component four of the PARAFAC model.

Figure 2 panels: (a) Mode 1: comp. 1 vs. comp. 2; (b) Mode 1: comp. 1 vs. comp. 3; (c) Mode 1: comp. 1 vs. comp. 4; (d) Mode 3: comp. 1 to comp. 4 (loading vs. segment). Explained variance: component 1, 68.6 %; component 2, 9.5 %; component 3, 5.5 %; component 4, 3.5 %.
Figure 2: Loadings of modes one and three of the PARAFAC model on the three-way array of the segmented and mathematically transformed artificial GC-MS data set with shifted peaks. Note that modes one and two are identical. Samples are coloured according to Table 1.

Figure 3 panels: (a) Scores: q1 vs. q2; (b) Scores: q1 vs. q3; (c) Scores: q1 vs. q4; (d) Saliences: q1 to q4 (per segment). Explained variance: common component 1, 90.8 %; common component 2, 4.1 %; common component 3, 1.4 %; common component 4, 0.9 %.
Figure 3: Scores and saliences (weights of blocks/segments) of CCSWA on the three-way array of the segmented and mathematically transformed artificial GC-MS data set with shifted peaks. Only common components one to four are shown. Samples are coloured according to Table 1.
4. Comparison of the new non-alignment approach and a reference method on experimental GC-MS data
Modern analytical instrumentation allows an enormous amount of data to be acquired in a short period of time. This is especially the case for chromatographic instrumentation coupled to multi-channel detectors. The extraction and full exploration of this abundance of information is still an important bottleneck in the workflows of non-targeted strategies. The work presented here was instigated by the need for new data processing approaches that take into account the most important limiting factors in the processing and multivariate modelling of chromatographic data, namely feature extraction, peak shifts and peak shape changes. The approach developed in this study is compared with PARAFAC2 deconvolution of all chromatogram segments followed by PCA of the deconvoluted peak area values, a very powerful deconvolution methodology previously described by Amigo et al. [37]. A brief summary of both approaches is provided in the Supporting Information.
4.1. Experimental
The data set explored in this study consists of solid phase microextraction (SPME) GC-MS analyses of Cabernet Sauvignon wines that were fermented with different combinations of yeast and lactic acid bacteria using sequential inoculation and co-inoculation strategies.
4.1.1. Wine Samples

All wines were produced from the same Cabernet Sauvignon grapes from California (2012 vintage). Fermentations were carried out using six combinations of yeast and lactic acid bacteria, which were selected according to the organoleptic properties indicated by the manufacturer. Three wines were made with the yeast Lalvin Clos and the lactic acid bacteria Enoferm Alpha, Enoferm Beta and Lalvin PN4, respectively; two wines were made with the yeast Uvaferm RBS and the lactic acid bacteria Lalvin VP41 and O-Mega, respectively; and one wine was made with the yeast Uvaferm VRB and the lactic acid bacteria Enoferm Alpha (all from Lallemand Inc., Canada). Moreover, for each of these six yeast/bacteria combinations, two different inoculation strategies were used: inoculation of lactic acid bacteria 24 hours after yeast inoculation (co-inoculation), and inoculation of lactic acid bacteria after the completion of alcoholic fermentation (sequential inoculation). In total, the volatile composition of 12 experimental wines was studied here.

4.1.2. SPME-GC-MS Analysis
Headspace solid phase microextraction (HS-SPME) sampling was carried out in randomized order using a 100 µm polydimethylsiloxane (PDMS) fibre and the following procedure: 5 mL of the wine sample was transferred to a 20 mL headspace crimp-top vial, two grams of sodium chloride (preheated to 250 °C and cooled to room temperature) were added, and the vial was immediately capped with a PTFE-lined septum and aluminium cap. Each wine sample was submitted to HS-SPME sampling with agitation at 500 rpm for 30 min. Fibre blank and column blank analyses were carried out regularly to confirm that no sample carry-over occurred. A standard 12 % hydro-alcoholic solution containing some esters and alcohols commonly present in wine was analysed regularly to monitor the performance of the system.
For GC-MS analysis, an Agilent 6890 GC (Agilent Technologies) coupled to an Agilent 5973 N quadrupole mass spectrometer (Agilent Technologies, Palo Alto, CA) was used with electron impact ionisation (EI) at 70 eV. Full mass spectra were acquired in the range 35 u to 300 u at four spectra per second. The ion source temperature was set to 230 °C, and the detector voltage was 2105 V. Separation was carried out on a 30 m HP-5 MS column with an internal diameter (i.d.) of 0.25 mm and a film thickness of 0.25 µm. The following oven temperature program was used: 40 °C, held for 5 min; ramped at 15 °C min−1 to 250 °C; and held for 5 min, resulting in a total run time of 25 min. Thermal desorption and injection were performed using a split/splitless injector operated at 250 °C in splitless mode, with a splitless time of 3 min. Helium was used as carrier gas at a constant flow of 1.0 mL min−1. Linear retention indices were calculated using a series of n-alkanes, and experimental retention indices were compared to literature values to confirm tentative peak identifications based on mass spectra. All chromatographic analyses were performed in triplicate.

4.1.3. Data Treatment
All raw chromatograms were exported from Agilent ChemStation version D.03.00.611 (Agilent Technologies) as netCDF files and imported into MATLAB version 8.0 (R2012b) (The MathWorks Inc., Natick, MA, USA) using built-in functions. All further data processing was done in MATLAB using the freely available N-way toolbox [42] and in-house written functions. Preprocessing of multi-way arrays was done using the nprocess.m function of the N-way toolbox [42]. Uninformative regions at the beginning and at the end of each chromatogram were removed. Each of the 36 GC-MS raw chromatograms was arranged as a matrix of size 3977 × 266 (elution profile × spectral profile). Deconvoluted mass spectra were exported as ASCII text files in NIST .msp format using an in-house written MATLAB function and imported into the NIST 08 spectral library [47].

4.2. Application of the new non-alignment approach to the experimental GC-MS data
The developed fingerprinting approach was applied to GC-MS data obtained for a set of twelve Cabernet Sauvignon wines fermented with different yeast/bacteria combinations using co-inoculation and sequential inoculation, in order to study the impact of these factors on the volatile composition of the wines. SPME was chosen for sample preparation because of its simplicity, full automation, speed and sensitivity for wine analysis [8, 48, 49]. A PDMS fibre was chosen because all PDMS degradation products contain silicon, which facilitates the differentiation of analytes from artefacts by means of the siloxane fragments present in the mass spectra of the latter. This is particularly important when performing non-targeted analysis. A fast temperature ramp was used in this study to provide relatively fast GC separation, at the cost of some resolution. However, the data analysis approach reported here takes the entire mass dimension into account, and complete separation of peaks is therefore not needed, provided that co-eluting compounds differ in their mass spectra. During the analysis of all samples, system stability was monitored using a hydro-alcoholic standard solution containing common wine volatiles.
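The statement that complete separation is unnecessary when co-eluting compounds differ in their mass spectra follows from the bilinear structure of a GC-MS segment, X ≈ C Sᵀ (elution profiles times spectra). The sketch below uses synthetic data only (peak positions, widths and spectra are invented; 266 m/z channels mirror the 35 u to 300 u range) to show that two almost fully co-eluting compounds still contribute two independent components when their spectra differ, and collapse to one when they do not. It only counts components via singular values; actually resolving them is the task of bilinear or trilinear models such as PARAFAC or PARAFAC2.

```python
import numpy as np

def synthetic_segment(n_scans=60, n_mz=266, same_spectrum=False, seed=0):
    """Noise-free segment (scans x m/z) holding two co-eluting compounds.

    Illustrative assumptions: Gaussian elution profiles whose apexes are only
    two scans apart, and random non-negative mass spectra.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_scans)
    c1 = np.exp(-0.5 * ((t - 29) / 5.0) ** 2)
    c2 = np.exp(-0.5 * ((t - 31) / 5.0) ** 2)
    C = np.column_stack([c1, c2])      # elution profiles, scans x 2
    S = rng.random((n_mz, 2))          # mass spectra, m/z x 2
    if same_spectrum:
        S[:, 1] = S[:, 0]              # both compounds share one spectrum
    return C @ S.T                     # bilinear model X = C S^T

def numerical_rank(X, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest singular value."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

print(numerical_rank(synthetic_segment()))                    # 2
print(numerical_rank(synthetic_segment(same_spectrum=True)))  # 1
```

With identical spectra the segment reduces to (c1 + c2) sᵀ, a single component, which is why the proviso on differing mass spectra matters.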
4.2.1. PARAFAC on transformed raw chromatograms

Initially, all chromatograms were divided into 84 small segments based on visual examination of overlays of the total ion chromatograms (TICs) of all samples and of overlays of the single ion chromatograms of all mass channels of single samples. Special attention was paid to avoid including too many peaks in one segment and splitting peaks across different segments. The latter is particularly important for segments containing peaks that shift between samples. In this way, as few peaks as possible (one to five) were included in each segment, and segment widths ranged between 22 and 114 scans. Segments 15, 58–62, 72, 76, 77, 80, 81 and 83 were excluded from the data set because they contained only baseline or chromatographic artefacts. In total, 71 small segments were kept for further analysis. To evaluate the effect of the number of segments, every two and every four neighbouring segments were combined, which resulted in 36 and 18 larger segments, respectively.

The mathematical transformation (see Section 2.2) of the segmented chromatographic raw data yields three-way arrays of size 36 × 36 × 71 (samples × samples × number of segments), 36 × 36 × 36 and 36 × 36 × 18, respectively. The array obtained from the smallest segments (71 segments in total) was analysed using CCSWA, TUCKER3 and PARAFAC. The TUCKER3 results were promising but, owing to the nature of the TUCKER3 model, difficult to interpret, and, contrary to expectation, CCSWA did not yield any interpretable results (not shown). The PARAFAC model, however, was much more informative and easier to interpret, revealing systematic differences among samples. The two other three-way arrays, with 36 and 18 segments, were therefore analysed using PARAFAC only. The number of components of each PARAFAC model was determined using the core consistency diagnostic [50], by examining the residuals, and by evaluating the captured variance and the number of iterations until the PARAFAC algorithm converged for models with one to 20 components. For the three-way array with 71 segments, an eleven-component PARAFAC model was chosen, explaining 73.0 % of the total variation in the data set. The best PARAFAC models for the three-way arrays with 36 and 18 segments were a ten-component model explaining 83.0 % and a nine-component model explaining 92.2 % of the total variation, respectively.
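The core consistency diagnostic [50] used for model selection can be sketched in NumPy as follows. This is a minimal illustration, not the N-way toolbox implementation: it assumes PARAFAC loadings A, B, C are already available (fitting is out of scope, so exact synthetic loadings are used), computes the least-squares Tucker3 core G for those fixed loadings, and compares it with the superdiagonal identity core T implied by a PARAFAC model via 100 (1 − Σ(g − t)² / F). Passing the same sample loadings twice mirrors the identical first and second modes of the 36 × 36 × 71 array.

```python
import numpy as np

def parafac_tensor(A, B, C):
    """Trilinear reconstruction: X[i, j, k] = sum_f A[i, f] B[j, f] C[k, f]."""
    return np.einsum("if,jf,kf->ijk", A, B, C)

def core_consistency(X, A, B, C):
    """Core consistency (in %) of PARAFAC loadings A, B, C for the array X.

    The least-squares Tucker3 core for fixed loadings is
    X x1 pinv(A) x2 pinv(B) x3 pinv(C); values near 100 % support the model.
    """
    Ap, Bp, Cp = (np.linalg.pinv(M) for M in (A, B, C))
    G = np.einsum("ai,bj,ck,ijk->abc", Ap, Bp, Cp, X, optimize=True)
    F = A.shape[1]
    T = np.zeros((F, F, F))
    idx = np.arange(F)
    T[idx, idx, idx] = 1.0            # superdiagonal identity core
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)

rng = np.random.default_rng(1)
A = rng.random((36, 3))   # sample loadings, shared by modes 1 and 2
C = rng.random((71, 3))   # segment loadings
X = parafac_tensor(A, A, C)
print(round(core_consistency(X, A, A, C), 6))   # 100.0 for an exactly trilinear array
```

On real data the core consistency is evaluated for models with an increasing number of components; a sharp drop usually signals that too many components have been extracted.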
In general, PARAFAC loadings can be interpreted in the same way as PCA scores and loadings; in multi-way terminology, however, only the term ‘loading’ is used. A loading matrix is obtained for each mode of the analysed multi-way array. In the approach presented here, the loadings of the first and second modes of the PARAFAC model are identical, because the SSCP matrices from Equation 11, which were compiled into a three-way array in Equation 12, are symmetric.
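Why symmetric SSCP slices give identical first and second modes can be seen from a small NumPy sketch that stacks one samples × samples cross-product matrix X_k X_kᵀ per segment into an I × I × K array, the layout of the 36 × 36 × 71 array used here. How each segment is turned into X_k is defined by Equation 11 (Section 2.2) and is not reproduced: the helper segment_matrix below is a hypothetical stand-in that simply sums every m/z channel over the scans of the segment, which happens to be insensitive to peak shifts that stay inside the segment. Random toy data replace the real 3977 × 266 chromatograms.

```python
import numpy as np

def segment_matrix(chroms, scans):
    """HYPOTHETICAL reduction of one segment to a samples x m/z matrix X_k.

    chroms: (samples, scans, m/z) array; scans: slice covering the segment.
    Summing over the segment's scans is an illustrative stand-in only,
    not the transformation of Equation 11.
    """
    return chroms[:, scans, :].sum(axis=1)

def sscp_three_way(chroms, segments):
    """Stack X_k X_k^T for every segment into a (samples, samples, K) array."""
    slabs = []
    for scans in segments:
        Xk = segment_matrix(chroms, scans)   # samples x m/z
        slabs.append(Xk @ Xk.T)              # samples x samples, symmetric
    return np.stack(slabs, axis=2)

rng = np.random.default_rng(0)
chroms = rng.random((36, 200, 266))          # toy data: 36 samples, 200 scans, 266 m/z
segments = [slice(0, 40), slice(40, 110), slice(110, 200)]
Z = sscp_three_way(chroms, segments)
print(Z.shape)                               # (36, 36, 3)
print(np.allclose(Z, Z.transpose(1, 0, 2)))  # True: every frontal slice is symmetric
```

Because Z[i, j, k] equals Z[j, i, k] for every segment, a PARAFAC model of Z has no reason to distinguish mode one from mode two, so their loadings coincide.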
Congruence loadings were calculated for the third mode (segment mode), and each segment with a congruence loading higher than 0.5 was considered as having a ‘medium to high’ correlation with the raw data. Depending on the aim of the study, a higher threshold (e.g. 0.75) can be chosen if only highly correlated segments are of interest. A rather conservative (i.e. inclusive) value of 0.5 was chosen here to ensure that the data set is fully explored.
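The thresholding step can be illustrated with Tucker's congruence coefficient, the uncentred cosine φ(x, y) = xᵀy / (‖x‖ ‖y‖), on which congruence loadings are assumed here to be based. The sketch assumes the congruence loadings of the segment mode are already available as a segments × components matrix (which vectors are compared to obtain them follows the method section and is not reproduced) and only shows how the 0.5 and 0.75 thresholds select segments; the congruence values in the usage lines are invented.

```python
import numpy as np

def tucker_congruence(x, y):
    """Tucker's congruence coefficient (uncentred cosine) of two vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def segments_above(congruence, segment_ids, threshold=0.5):
    """Per component, list the segments whose congruence loading exceeds threshold.

    congruence: (segments, components) array; segment_ids: one label per row.
    """
    congruence = np.asarray(congruence, dtype=float)
    ids = np.asarray(segment_ids)
    return [ids[congruence[:, f] > threshold].tolist()
            for f in range(congruence.shape[1])]

# Invented congruence loadings for three segments on two components
cong = [[0.90, 0.10],
        [0.60, 0.55],
        [0.30, 0.80]]
print(segments_above(cong, [9, 20, 28]))                  # [[9, 20], [20, 28]]
print(segments_above(cong, [9, 20, 28], threshold=0.75))  # [[9], [28]]
```

Raising the threshold shrinks each list, which is the trade-off between a focused and an exhaustive exploration described above.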
The information content of the three PARAFAC models is discussed and compared in the following. Examination of the loadings of the sample modes (first and second modes) of the PARAFAC model with 71 segments showed that five of the eleven components contained important information revealing systematic differences between wines made with different yeast starter cultures and inoculation scenarios (Figures 4, 5 and 6). The remaining six components mainly reflect unsystematic variation in the chromatograms, for instance component five shown in Figure 7. The congruence loadings of the segment mode of this component (Figure 7(b)) show that only one segment, segment 73, is responsible for the spread of samples on this component (Figure 7(a)). The overlay of the TICs of segment 73 of all samples in Figure 7(c) shows that component 5 reproduces the information in segment 73 very well: one injection each of the sequentially inoculated Lalvin Clos/Lalvin PN4 wine (clos PN4) and of the sequentially inoculated Uvaferm RBS/O-Mega wine (rbs 271) shows a much higher peak in this segment than all other samples. This pattern is exactly reproduced in the sample-mode loadings of component 5. The other components, which carry redundant information, are not discussed further here.

PARAFAC components three and eleven are displayed in Figure 4(a), showing the variation between wines fermented with different yeasts. Wines fermented with the yeast Uvaferm RBS (rbs) are separated from the wines fermented with the yeasts Lalvin Clos (clos) and Uvaferm VRB (vrb) on component three (7.8 % explained variation), whereas the wines fermented with the yeast Uvaferm VRB differ from the other wines on component eleven (2.3 % explained variation).
The impact of each segment on components three and eleven is shown in the congruence loading plots of the segment mode of these components in Figure 4(b). Considering segments with a congruence loading higher than 0.5, only segments 9 and 20 are responsible on component eleven for the differences between the wines made with the yeast Uvaferm VRB and the wines made with the other two yeast starter cultures. The differences between the wines fermented with the yeast starter culture Uvaferm RBS and all other wines, described by component three, are caused by segments 1, 4, 8, 11, 14, 22, 23, 24, 30, 31 and 38.
[Figure 4 panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings; axes: Component 3 (7.8 % expl. var.) vs. Component 11 (2.3 % expl. var.); legend: clos, vrb, rbs, co-inoculated, sequential.]
Figure 4: Loadings plots of PARAFAC components three vs. eleven (model with 71 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).

Figure 5 shows the PARAFAC results for components one and two. Component one (17.6 % explained variation) mainly explains the differences of the sequentially inoculated wine fermented with the yeast Uvaferm RBS and the lactic acid bacteria O-Mega (rbs 271), but this component also shows a difference between co-inoculated and sequentially inoculated wines. Component two (11.3 % explained variation) mainly describes the distinction between the sequentially inoculated wine fermented with the yeast/bacteria combination Lalvin Clos/Enoferm Beta (clos beta) and all other wines. Congruence loadings of the segment mode for components one and two are shown in Figure 5(b). Segments 4, 6, 11, 18, 28, 31, 33, 35, 36, 38, 41, 45, 46, 48, 49, 50, 53, 67, 74 and 75 had congruence loadings higher than 0.5 on component one, while segments 28, 64, 65, 68, 69, 71 and 78 are important on component two.

Component 4, explaining 6.9 % of the total variation in the data set, differentiates the co-inoculated wine fermented with the yeast Lalvin Clos and the lactic acid bacteria Lalvin PN4 (clos PN4) from the other wines (Figure 6(a)). Segments 41, 43, 51 and 63 are responsible for this difference, as shown in the congruence loading plot of the segment mode of this component (Figure 6(b)).
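Because several segments load on more than one component, the number of distinct segments that need closer inspection is smaller than the combined length of the five lists. A short bookkeeping sketch, using only the segment numbers given in the text for the 71-segment model: the union contains 38 segments, the number subsequently deconvoluted with PARAFAC2 (Section 4.2.2), and six segments are shared between components.

```python
from collections import Counter

# Segments with congruence loadings > 0.5 in the 71-segment PARAFAC model,
# copied from the text (component number -> segment numbers)
important = {
    1: [4, 6, 11, 18, 28, 31, 33, 35, 36, 38, 41, 45, 46, 48, 49, 50, 53, 67, 74, 75],
    2: [28, 64, 65, 68, 69, 71, 78],
    3: [1, 4, 8, 11, 14, 22, 23, 24, 30, 31, 38],
    4: [41, 43, 51, 63],
    11: [9, 20],
}

selected = sorted(set().union(*important.values()))          # distinct segments
counts = Counter(s for segs in important.values() for s in segs)
shared = sorted(s for s, n in counts.items() if n > 1)       # on >1 component

print(len(selected))   # 38
print(shared)          # [4, 11, 28, 31, 38, 41]
```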
[Figure 5 panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings; axes: Component 1 (17.6 % expl. var.) vs. Component 2 (11.3 % expl. var.).]
Figure 5: Loadings plots of PARAFAC components one vs. two (model with 71 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).

[Figure 6 panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings; axes: Component 1 (17.6 % expl. var.) vs. Component 4 (6.9 % expl. var.).]
Figure 6: Loadings plots of PARAFAC components one vs. four (model with 71 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).

[Figure 7 panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings; axes: Component 1 (17.6 % expl. var.) vs. Component 5 (5.9 % expl. var.); (c) TICs of segment 73 (Abundance vs. Retention time [min], approx. 22.7 to 22.9 min), one trace per injection, labelled by wine and inoculation strategy (seq, coin).]
Figure 7: Loadings plots of PARAFAC components one vs. five (model with 71 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).

The results of the PARAFAC model with only 36 segments (neighbouring segments combined) are very similar to those of the PARAFAC model with 71 segments and are discussed in the following. Component one of both PARAFAC models (Figure 5, and Figure 18 in the Supporting Information) reflects the same information, namely the differences of the sequentially inoculated wine fermented with the yeast Uvaferm RBS and the lactic acid bacteria O-Mega (rbs 271), and the difference between co-inoculated and sequentially inoculated wines. Moreover, components three and two (Figures 18 and 17 in the Supporting Information) of the PARAFAC model with 36 segments and components two and four (Figures 5 and 6) of the PARAFAC model with 71 segments show the same information on the differences of the sequentially inoculated Lalvin Clos/Enoferm Beta wine (clos beta) and of the co-inoculated Lalvin Clos/Lalvin PN4 wine (clos PN4), respectively. Components three and eleven (Figure 4) of the PARAFAC model with the smallest segments (71 segments) reveal the same systematic differences according to the yeast starter culture used as components five and ten (Figure 16 in the Supporting Information) of the PARAFAC model with 36 segments.
The results of the PARAFAC model in which four neighbouring segments were combined (18 segments in total) are, in contrast to those of the model with 36 segments, not fully comparable to the results of the PARAFAC model with the smallest segments (71 segments). Only three components are comparable between these models. Component one (Figure 19 in the Supporting Information) of the 18-segment PARAFAC model, which reflects the differences between the co-inoculated Lalvin Clos/Lalvin PN4 wine (clos PN4) and the other wines, shows the same information as component 4 of the PARAFAC model with 71 segments. Component two (Figure 19 in the Supporting Information) of the PARAFAC model with the largest segments (18 segments) is comparable with component one of the 71-segment PARAFAC model, mainly explaining the sequentially inoculated wine made with the yeast Uvaferm RBS and the lactic acid bacteria O-Mega, together with a trend separating co-inoculated and sequentially inoculated wines. Furthermore, component three of the PARAFAC model with 18 segments shows the differences of the sequentially inoculated wine made with the yeast Lalvin Clos and the lactic acid bacteria Enoferm Beta (clos beta) and is comparable with the information obtained from component two of the PARAFAC model with 71 segments. The systematic differences caused by the yeast strains, as obtained on components eleven and three (Figure 4) of the PARAFAC model with the smallest segments (71 segments) and on components ten and five (Figure 16 in the Supporting Information) of the PARAFAC model with 36 segments, could not be observed.
In conclusion, the comparison of the three PARAFAC models with different segment sizes shows that segment size clearly influences the information obtained from the PARAFAC model. While the models with small and medium-sized segments (71 and 36 segments, respectively) revealed the same systematic differences in the data, important information on systematic differences among the wines caused by the different yeast starter cultures could not be obtained from the PARAFAC model with the largest segments (18 segments). These results demonstrate that a smaller segment size is beneficial. Another advantage of smaller segments is that they are easier to investigate after PARAFAC modelling: peaks in segments found to be important for the differentiation of samples can be more easily deconvoluted and identified.
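The coarser segmentations compared above were obtained by combining every two or every four neighbouring segments of the 71 retained ones. A minimal sketch of that grouping, under two assumptions not stated in the text: a shorter final group keeps the remainder (which reproduces the reported counts, since ⌈71/2⌉ = 36 and ⌈71/4⌉ = 18), and merged segments are tracked as lists of original segments so that excluded regions lying between neighbours are not re-included. Positions 0 to 70 stand in for the actual segment numbers.

```python
import math

def merge_neighbours(segment_positions, group_size):
    """Group consecutive retained segments into larger ones.

    Each merged segment is returned as the list of original segments it
    contains (not as a scan range), so excluded segments between neighbours
    are not silently re-included. A shorter final group keeps any remainder
    (an assumption about how the remainder was handled).
    """
    return [segment_positions[i:i + group_size]
            for i in range(0, len(segment_positions), group_size)]

kept = list(range(71))                 # positions of the 71 retained segments
pairs = merge_neighbours(kept, 2)
quads = merge_neighbours(kept, 4)
print(len(pairs), len(quads))                 # 36 18
print(pairs[-1], quads[-1])                   # [70] [68, 69, 70]
print(math.ceil(71 / 2), math.ceil(71 / 4))   # 36 18
```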
4.2.2. Deconvolution and identification of compounds in important chromatogram segments
The discussion above shows that components one, two, three, four and eleven of the PARAFAC model with 71 segments are important for explaining the systematic differences between the wines. The segments with congruence loadings higher than 0.5, which can be considered as having a medium to high correlation with the data, are segments 4, 6, 11, 18, 28, 31, 33, 35, 36, 38, 41, 45, 46, 48, 49, 50, 53, 67, 74 and 75 for component one; 28, 64, 65, 68, 69, 71 and 78 for component two; 1, 4, 8, 11, 14, 22, 23, 24, 30, 31 and 38 for component three; 41, 43, 51 and 63 for component four; and 9 and 20 for component eleven. To confirm the results of PARAFAC modelling of the segmented and transformed GC-MS chromatograms and to study the important chromatogram segments in more detail, each of these 38 segments was deconvoluted using PARAFAC2. The number of factors for each PARAFAC2 model was first estimated as described in [39] using the autochrom.m MATLAB function, which is kindly and freely provided on www.models.life.ku.dk (July 2014). The number of components of each model was then verified manually using the freely available N-way toolbox [42] for MATLAB: the number of factors was checked and, if needed, corrected by examining the core consistency, the number of iterations until the algorithm converged, the residuals, and the interpretability of the loadings. Moreover, non-negativity constraints were applied in the spectral mode. After all deconvoluted mass spectra were exported using an in-house written MATLAB function, tentative identification of the deconvoluted peaks was performed by comparing the deconvoluted mass spectra with the NIST 08 spectral library. Furthermore, linear retention indices (LRI) were calculated using a homologous series of n-alkanes and compared with literature values to confirm tentative identifications. Details of the PARAFAC2 models and the identified compounds are summarized in Table 2.

4.2.3. PCA on deconvoluted peak areas
To visualize the results summarized and discussed above, three different PCA models were constructed. All compounds in the segments with high congruence loadings on PARAFAC components three and eleven, which distinguished all samples according to the yeast starter culture used, were included in the first PCA. A two-component PCA model was sufficient to separate the wines into three groups. The model was then improved by successively removing all compounds with low loadings on PC1 and PC2 (i.e. with little impact on these two components). The wines fermented with the yeast starter culture Uvaferm RBS were separated from the other wines by PC1, which explains 67.4 % of the total variance (Figure 8(a)). The loadings in Figure 8(b) reveal that ethyl 2-methylbutyrate (1), iso-amyl iso-butyrate (8), ethyl 2-hexenoate (15), the unknowns 46 and 49 (both with terpenoid-like mass spectra) and the two unknowns 48 and 65 are positively correlated with the wines made with the
Table 2: Summary of all segments showing high congruence loadings (> 0.5) on PARAFAC components one, two, three, four and eleven, with details of the PARAFAC2 model of each segment and the corresponding compounds.
of PARAFAC component
1
2
3
4
PARAFAC2
11
no.
compound name
segment
congruence loadings
LRIa
MS match
857
900
861
852
component no. 1
0.85
1
1
butanoic acid, 2-methyl-, ethyl ester (ethyl
2-methylbutyrate)
2
2
butanoic acid, 3-methyl-, ethyl ester (ethyl
0.51
0.69
3
-
baseline
1
7
acetic acid, hexyl ester (hexyl acetate)
1005
931
2
8
propanoic acid, 3-methyl-, ethyl ester
1003
812
4
3-methylbutanoate)
6
0.66
(iso-amyl iso-butyrate)
3
9
unknown
999
1
12
unknown
1022
2
-
baseline
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
3 8
0.97
1
compound name
component no.
of PARAFAC component
LRIa
MS match
13
eucalyptol (1,8-cineole)
1025
877
15
2-hexenoic acid, ethyl ester
1048
860
(ethyl-2-hexenoate)
2
31 11
0.62
0.67
0.59
-
baseline
-
baseline
1
16
2
-
artefact (bleeding)
3
-
baseline
9
3
4
-
unknown
1051
1
19
propanoic acid 2-hydroxy-, 3-methylbutyl
1068
871
1070
880
unknown
1048
ester (isoamyl lactate) 2
20
1-octanol
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
3 4 14
0.63
1 2
20
21
unknown
1069
22
acetophenone
1066
29
unknown
1106
-
MS match
920
baseline
unknown
1112
4
31
unknown
1111
1
36
octanoic acid ethyl ester (ethyl octanoate)
1200
931
2
-
1
39
6-octen-1-ol, 3,7-dimethyl- (citronellol)
1231
888
2
40
unknown
1233
3
-
30
0.51
0.98
18
LRIa
3
compound name
component no.
of PARAFAC component
baseline
baseline
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
22
0.63
1
compound name
component no.
of PARAFAC component
42
hexanoic acid, 3-methylbutyl ester
LRIa
MS match
1252
930
1255
868
1250
852
(isopentyl hexanoate)
2
43
hexanoic acid, 2-methylbutyl ester (2-methylbutyl hexanoate)
33 23
0.54
44
benzeneacetic acid, ethyl ester (ethyl benzeneacetate)
4
45
unknown
1248
5
46
unknown (terpenoid-like MS)
1246
3
47
acetic acid, 2-phenylethyl ester
1262
1
(phenylethyl acetate)
2
-
baseline
3
-
artefact (bleeding)
961
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
4 5 24
0.89
1 2
0.57
0.51
30
0.63
-
unknown
49
unknown (terpenoid-like MS)
-
50 -
1
59
LRIa
MS match
artefact (bleeding)
48
4
28
3
compound name
component no.
of PARAFAC component
1263
artefact (bleeding) nonanoic acid
1270
843
1297
892
baseline nonanoic acid, ethyl ester (ethyl nonanoate)
2
60
unknown
1295
3
61
propyl octanoate
1294
1
65
unknown (succinic acid ester)
1331
2
66
unknown
1333
841
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
3 4 31
0.67
0.85
1
compound name
component no.
of PARAFAC component
67 -
68
unknown
LRIa
MS match
1330
baseline
octanoic acid, 2-methylpropyl ester
1350
890
(isobutyl octanoate)
35 35
0.79
0.93
69
unknown
1352
3
-
1
72
2
-
33
2
3
73
unknown
1368
4
74
naphthalene, 1,2-dihydro-1,1,6-trimethyl-
1363
870
1389
873
baseline decanoic acid
1369
910
baseline
(TDN) 1
78
ethyl trans-4-decenoate
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
2 3 36
0.64
1 2
0.57
decanoic acid, ethyl ester (ethyl decanoate)
1397
80
unknown
1408
baseline
unknown
1406
4
82
unknown
1410
5
83
unknown
1404
6
84
unknown
1403
87
octanoic acid, 3-methylbutyl ester (isoamyl
1449
0.89
79
-
1
MS match
baseline
81
38
-
LRIa
3
compound name
component no.
of PARAFAC component
942
octanoate)
2
88
unknown
1450
3
89
octanoic acid, 2-methylbutyl ester
1451
921
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
41
0.52
0.57
1 2 3
compound name
component no.
of PARAFAC component
-
LRIa
MS match
baseline
96
unknown
1490
97
decanoic acid, propyl ester (propyl
1492
857
decanoate)
0.91
1493
5
99
unknown
1489
1
102
unknown
1515
2
103
butylated hydroxytoluene (BHT)
1520
3
-
4
104
unknown
1521
5
105
unknown
1523
1
109
unknown
1547
45
0.99
unknown
4
43
98
baseline
959
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
2 3 46
0.73
1
compound name
component no.
of PARAFAC component
110 -
111
unknown
LRIa
112
baseline
1,6,10-dodecatrien-3-ol, 3,7,11-trimethyl-
1570
49
0.56
0.82
1571
3
-
baseline
4
-
artefact (bleeding)
5
113
unknown
1574
1
115
unknown
1583
2
-
baseline
3
-
artefact (bleeding)
1
116
48
unknown
2
MS match
1550
(cis,trans-nerolidol)
unknown
1588
915
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
2 50
0.95
1
compound name
component no.
of PARAFAC component
-
117
LRIa
MS match
1595
971
baseline
dodecanoic acid, ethyl ester (ethyl
-
baseline
-
baseline
2
-
artefact (bleeding)
3
118
unknown
1610
4
119
unknown
1612
dodecanoate)
2
121
pentadecanoic acid, 3-methylbutyl ester
1647
39 53
0.5
0.94
1
51
1
(iso-amyl decanoate)
2
122
3
-
unknown (long chain fatty acid ester) baseline
1650
936
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
4 63
0.54
1 2
67
0.67
0.96
130 -
131
LRIa
MS match
artefact (bleeding)
unknown (long chain fatty acid ester)
1783
baseline
tetradecanoic acid, ethyl ester (ethyl
1794
925
tetradecanoate)
2
-
baseline
1
-
baseline
2
-
artefact (bleeding)
65
1
-
0.66
3
132
unknown
1820
4
133
unknown
1824
1
135
dodecanoic acid, 3-methylbutyl ester
1847
64
compound name
component no.
of PARAFAC component
(isoamyl laurate)
891
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
2 3 68
0.51
1 2
0.67
136
unknown
1841
137
unknown (long chain fatty acid ester)
1859
138
compname
1851
-
139
2
-
artefact (bleeding)
3
-
baseline
142
unknown (long chain fatty acid ester)
pentadecanoic acid, ethyl ester (ethyl
1866
1896
pentadecanoate)
2
143
3
-
MS match
baseline
1
1
LRIa
baseline
71
0.57
69
-
3
compound name
component no.
of PARAFAC component
unknown baseline
1890
874
Table 2 – continued congruence loadings
segment
1
2
3
4
PARAFAC2
11
no.
74
0.83
1 2
75
0.57
1
compound name
component no.
of PARAFAC component
146 -
147
ethyl 9-hexadecenoate
LRIa
MS match
1976
917
1995
911
baseline
hexadecanoic acid, ethyl ester (ethyl hexadecanoate)
a Experimentally determined linear retention indices.
0.79
-
baseline
1
148
2
-
baseline
3
-
baseline
78
2
unknown (long chain fatty acid ester)
2067
[Figure 8 graphic: (a) Scores, (b) Loadings; axes PC 1 (67.4 % expl. var.) and PC 2 (20.8 % expl. var.); legend: clos, vrb, rbs, co-inoculated, sequential]
Figure 8: Scores and loadings plots of the PCA of compounds in segments which had high congruence loadings on components three and eleven of the PARAFAC model with 71 segments; Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).
yeast Uvaferm RBS. Moreover, the grouping of the wines fermented with yeast Uvaferm VRB is explained by PC2 (20.8 % explained variance). Citronellol and the unknown compound 31 are positively correlated with these wines on PC2.
All compounds in the segments which had high congruence loadings on PARAFAC component one were included in the second PCA. A one-component model was sufficient to explain the differences between the co-inoculated and the sequentially inoculated wines. After successively removing all compounds with low loadings on PC1 (i.e. with a small impact on this component), a final one-component model explaining 59.7 % of the variance was obtained. The PC1 scores (Figure 9(a)) show that all samples are discriminated according to the inoculation scenario (co-inoculation vs. sequential inoculation). The branched esters isoamyl isobutyrate (8), isoamyl lactate (19), isoamyl octanoate (87), isoamyl decanoate (121) and isoamyl laurate (135), as well as isobutyl octanoate (68) and 2-methylbutyl octanoate (89), the straight-chain fatty acid esters ethyl octanoate (36), ethyl nonanoate (59), ethyl decanoate (79), ethyl dodecanoate (117) and propyl octanoate (61), the two unsaturated esters ethyl trans-4-decenoate (78) and ethyl 9-hexadecenoate (146), the fatty acid decanoic acid
[Figure 9 graphic: (a) Scores of PC 1 for each wine; (b) Loadings of PC 1 by compound number; legend: clos, vrb, rbs, co-inoculated, sequential]
Figure 9: Scores and loadings plots of the PCA of compounds in segments which had high congruence loadings on component one of the PARAFAC model with 71 segments; Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).
(72), the terpenoid nerolidol (111), the unknown long-chain fatty acid ester 122 and the unknown compounds 12, 60, 88, 109, 110, 115 and 116 all correlate positively with the co-inoculated wines.
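The successive removal of low-loading compounds described above can be sketched as a backward elimination on a one-component PCA. The following Python/NumPy sketch is illustrative only: the SVD-based PCA, autoscaling with sample standard deviations and the stopping rule (a hypothetical minimum absolute PC1 loading, `min_abs_loading`) are assumptions, since the exact removal criterion is not stated here; the compound names and data are synthetic.

```python
import numpy as np


def autoscale(X):
    """Centre each column and divide it by its sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pc1(X):
    """Return PC1 scores, loadings and explained-variance fraction via SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * s[0], Vt[0], s[0] ** 2 / np.sum(s ** 2)


def prune_low_loadings(X, names, min_abs_loading=0.1):
    """Hypothetical backward elimination: refit a one-component PCA and drop the
    variable with the smallest |PC1 loading| until all remaining |loadings|
    reach min_abs_loading (or only two variables are left)."""
    keep = list(range(X.shape[1]))
    while True:
        scores, loadings, explained = pc1(autoscale(X[:, keep]))
        weakest = int(np.argmin(np.abs(loadings)))
        if abs(loadings[weakest]) >= min_abs_loading or len(keep) <= 2:
            return [names[k] for k in keep], scores, loadings, explained
        del keep[weakest]  # position in `keep` matches position in `loadings`


# Synthetic data: 12 wines, 3 compounds tracking the inoculation mode
# (co-inoculated = +1, sequential = -1) and 3 uninformative compounds.
rng = np.random.default_rng(0)
mode = np.repeat([1.0, -1.0], 6)
informative = 5 * mode[:, None] + rng.normal(scale=0.1, size=(12, 3))
noise = rng.normal(size=(12, 3))
X = np.hstack([informative, noise])
names = ["ester_a", "ester_b", "ester_c", "noise_1", "noise_2", "noise_3"]

kept, scores, loadings, explained = prune_low_loadings(X, names)
print(kept, f"PC1 explains {100 * explained:.1f} % of the variance")
```

Because PCA loading vectors have unit length, a fixed loading cut-off implicitly depends on the number of variables; in practice such a threshold would be judged from the loadings plot. The sign of the PC1 scores is arbitrary (SVD sign indeterminacy).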
The third PCA included all compounds from segments which had high congruence loadings on components two and four (Figure 10). All compounds with low loadings (small impact on the model) were successively removed from the model. The wine made with the yeast Lalvin Clos and the lactic acid bacteria starter culture Enoferm Beta (sequentially inoculated) is separated from all other wines on PC1, which explains 52.9 % of the variance. Ethyl tetradecanoate (131) and the two unknown long-chain fatty acid esters 137 and 139 correlate positively with PC1, while ethyl nonanoate (59), propyl octanoate (61) and the unknown compound 60 correlate negatively with this component. Principal component two (26.4 % explained variance) differentiates the wine co-inoculated with the yeast Lalvin Clos and the lactic acid bacteria starter culture Lalvin PN4, as well as the wines made with the yeast/bacteria combinations Uvaferm RBS/O-Mega (co-inoculated), Lalvin Clos/Enoferm Alpha (sequentially inoculated) and Lalvin Clos/Enoferm Beta (sequentially inoculated). This difference is explained by propyl decanoate (97), BHT (103) and the unknown compound
[Figure 10 graphic: (a) Scores, (b) Loadings; axes PC 1 (52.9 % expl. var.) and PC 2 (26.4 % expl. var.); legend: clos, vrb, rbs, co-inoculated, sequential]
Figure 10: Scores and loadings plots of the PCA of compounds in segments which had high congruence loadings on components two and four of the PARAFAC model with 71 segments; Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).
(118). BHT (103) and the unknown compound 118 are very likely artefacts not originating from the wine.
Several studies on the impact of the malolactic fermentation inoculation mode and of the yeast/lactic acid bacteria combination on the volatile composition of wine have been conducted, but no clear systematic changes have been reported [51, 52, 53, 54]. Some authors have observed higher amounts of some esters in co-inoculated wines [53, 54]. However, higher levels of long-chain fatty acid esters, as well as of unsaturated and branched esters, as a function of the malolactic fermentation inoculation mode (as discussed above) have not yet been reported. This is most likely because long-chain fatty acid esters are normally not the focus of targeted methods for general wine aroma analysis. These compounds were nevertheless captured by the non-targeted approach used here, even though their relevance was not known a priori.
4.3. PARAFAC2 on all segments of the chromatogram of the experimental GC-MS data with subsequent PCA
As a reference method, PARAFAC2 was also applied to the remaining segments that had not been considered in the new approach discussed above, and the area values
[Figure 11 graphic: (a) Scores, (b) Loadings; axes PC 1 (25 % expl. var.) and PC 2 (12.7 % expl. var.); legend: clos, vrb, rbs, co-inoculated, sequential]
Figure 11: Scores and loadings plots of PC1 and PC2 of the PCA on all autoscaled compounds of all deconvoluted segments; Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).
of all integrated deconvoluted peak profiles were analysed using PCA, according to reference [37]. A total of 152 peak area values were obtained in this manner. Figures 11 and 12 show the scores and loadings plots of PC1 (25.0 % explained variance), PC2 (12.7 % explained variance) and PC3 (11.8 % explained variance) of the autoscaled peak table. Note that only a relatively small proportion of the variance is explained, even when compounds with low loadings were successively removed (not shown). The scores plots (Figures 11(a) and 12(a)) nevertheless reveal some structure, although the interpretation remains difficult.
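Why an autoscaled global peak table can leave much of the variance unexplained can be illustrated with synthetic data. The sketch below is not the computation used in the study; it assumes that random normal columns stand in for uninformative peaks, and the numbers of wines and peaks are hypothetical. It autoscales a peak table, derives the percentage of variance explained by each of the first three PCs from the squared singular values, and compares a table with few uninformative peaks to one with many.

```python
import numpy as np


def explained_variance_pct(X, n_components=3):
    """Percentage of total variance explained by each of the first PCs of
    the autoscaled matrix X (rows = samples, columns = peaks)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscaling
    s = np.linalg.svd(Xs, compute_uv=False)             # singular values, descending
    return 100 * s[:n_components] ** 2 / np.sum(s ** 2)


rng = np.random.default_rng(42)
mode = np.repeat([1.0, -1.0], 6)                              # 12 synthetic wines
signal = mode[:, None] + rng.normal(scale=0.3, size=(12, 5))  # 5 informative peaks
few = np.hstack([signal, rng.normal(size=(12, 5))])           # + 5 uninformative peaks
many = np.hstack([signal, rng.normal(size=(12, 100))])        # + 100 uninformative peaks

pct_few = explained_variance_pct(few)
pct_many = explained_variance_pct(many)
print("few noise peaks: ", pct_few.round(1))
print("many noise peaks:", pct_many.round(1))
```

Since autoscaling gives every peak unit variance, each uninformative peak adds as much total variance as an informative one, which dilutes the share captured by PC1.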
Like component one of the PARAFAC model with 71 segments (Figure 5), PC1 shows a difference between most of the co-inoculated and the sequentially inoculated wines. The co-inoculated wines fermented with the yeast starter culture Uvaferm RBS correlate most positively with this PC, while the wine made with the yeast starter culture Lalvin Clos sequentially inoculated with Enoferm Beta correlates most negatively. The compounds 8, 12, 19, 36, 59, 60, 61, 68, 72, 78, 79, 87, 88, 98, 109, 101, 111, 115, 116, 117, 121, 122, 135 and 146 show high positive loadings on PC1 (Figure 11(b)). These results are comparable to component one of the PARAFAC model with 71 segments (Figure
ACCEPTED MANUSCRIPT
10
clos PN4
clos vrb rbs co−inoculated sequential
0.2
124 97 119 126 7 112 103 118 127 108 20 115 132 104 12823 74 82 72 3513695 120 57 84 138 53 80 78 34 25 142 113 148 99 116 91145 114 1096 107 94 140 75 141 102 111 133 90 6 9 60146 22 10576 16 134 147 135 149 1936117 139 137 130 123 30151 59 122 79 13 81 129 51 21 121 63 106 40 73 1714 67 11 89 88 109 39 100 41 131 5 143 38 110 87 47101 24 69 56 62 12570 93 152 98 43 12 83 150 66 55 64 71 61 58 50 44 2 31 54 68 37 77 18 86 27 29 48 42 32 144 263 85 28 8 92 4 15 45 1 52 33 46 49
0.15
PC 3: 11.8% expl. var.
PC 3: 11.8% expl. var.
0.1
5
clos beta
clos beta clos alpha
0
clos PN4 vrb alpha rbs 271
−5
clos alpha vrb alpha
rbs 41
rbs 271
rbs 41
0.05 0 −0.05 −0.1
−10 −0.15
−15 −15
−10
−5
0 5 PC 1: 25% expl. var.
10
−0.2 −0.2
15
65
−0.15
−0.1
−0.05 0 0.05 PC 1: 25% expl. var.
0.1
0.15
0.2
(b) Loadings
SC
(a) Scores
RI PT
15
M AN U
Figure 12: Scores and loadings plots of PC1 and PC3 of the PCA on all autoscaled compounds of all deconvoluted segments; Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).
9). In contrast, the compounds 131, 137 and 139 correlate negatively with PC1, a pattern similar to that reflected in PARAFAC component two of the 71-segment model (Figure 10). Principal component two differentiates the wines fermented with the yeast starter culture Uvaferm RBS and the wine made with the yeast/lactic acid bacteria combination Lalvin Clos/Lalvin PN4 (co-inoculated). This separation is, however, not very clear, and no useful information can be extracted from the loadings plot (Figure 11(b)). A similar observation applies to PC3, which explains differences of the wine made with the yeast/lactic acid bacteria combination Lalvin Clos/Lalvin PN4 (co-inoculated) and of the wines fermented with the combination Uvaferm VRB/Enoferm Alpha (co-inoculated).
PCA on the autoscaled peak table was not suitable to detect the same patterns among the samples as those obtained with the new approach presented here. Class centroid centering and scaling to intra-class variance was therefore applied, with classes defined according to the three yeast starter cultures, in order to obtain information on the differences among the wines made with the three yeast starter cultures. Figure 13 shows the scores and loadings of PC1 (38.8 % explained variance) and PC3 (10.7 % explained variance)
[Figure 13 graphic: (a) Scores, (b) Loadings; axes PC 1 (38.8 % expl. var.) and PC 3 (10.7 % expl. var.); legend: clos, vrb, rbs, co-inoculated, sequential]
Figure 13: Scores and loadings plots of PC1 and PC3 of the PCA on all compounds of all deconvoluted segments, where class centroid centering and scaling was applied; Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271).
of this PCA. The three sample groups show a pattern very similar to that obtained for the PCA on the autoscaled compounds of segments with high congruence loadings on components three and eleven of the PARAFAC model with 71 segments (Figure 8). The PCA on the autoscaled peak table thus showed some systematic differences among the samples, but is not suitable to fully explore the data set without any pre-selection of variables. The same information on the differences between co-inoculated and sequentially inoculated wines was obtained as with the new approach presented here. The interpretation of the loadings, however, is complicated by the higher noise level in the data and the larger number of variables. The supervised preprocessing method (class centroid centering and scaling to intra-class variance) helped to differentiate the wines according to the yeast starter culture used, giving the same results as the new approach presented here. Overall, the results from the PCAs after PARAFAC2 deconvolution of the 38 important segments and the results from the PCA after PARAFAC2 modelling of all 71 segments are comparable, although the latter were more difficult to interpret and require more sophisticated methods than PCA with autoscaling, such as supervised methods or variable selection.
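Class centroid centering and scaling to intra-class variance is not given as a formula here. The sketch below assumes one common reading: each variable is centred on the unweighted mean of the class centroids (so that every yeast class contributes equally, even though the classes differ in size) and divided by the pooled within-class standard deviation. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def class_centroid_center_scale(X, y):
    """Centre each variable on the unweighted mean of the class centroids and
    divide it by the pooled within-class standard deviation (assumed definition)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    grand_centroid = centroids.mean(axis=0)  # every class weighted equally
    # Pooled within-class sum of squares, with n - g degrees of freedom
    ss_within = sum(((X[y == c] - m) ** 2).sum(axis=0) for c, m in zip(classes, centroids))
    pooled_sd = np.sqrt(ss_within / (X.shape[0] - len(classes)))
    return (X - grand_centroid) / pooled_sd


# Synthetic, unbalanced example: yeast classes with 6, 4 and 2 wines, 4 peaks.
rng = np.random.default_rng(3)
y = np.array(["clos"] * 6 + ["rbs"] * 4 + ["vrb"] * 2)
X = rng.normal(size=(12, 4)) + np.where(y == "vrb", 3.0, 0.0)[:, None]
Z = class_centroid_center_scale(X, y)
```

After this transformation the mean of the class centroids is zero and the pooled within-class variance of every variable is one, so a subsequent PCA emphasises between-class differences relative to within-class scatter. With unbalanced classes the overall column means of `Z` are generally not zero.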
The comparability of the results from the new approach (PARAFAC on segmented and mathematically transformed chromatograms, combined with PARAFAC2 deconvolution of the important segments and subsequent PCA) and from the deconvolution of all segments using PARAFAC2 with subsequent PCA modelling confirms the validity of the results of the new approach. Only 38 segments of the chromatogram turned out to be important for the differentiation of samples using the new approach, so almost half of the 71 segments (33) did not have to be deconvoluted using PARAFAC2, which is a considerable time saving. In this study, only segments with congruence loadings greater than 0.5 were considered as ‘medium to highly correlated’ with the raw data. If, depending on the aim of a study, a higher threshold such as 0.75 (‘highly correlated’) is chosen, even fewer PARAFAC2 models would have to be constructed and interpreted. The new approach can therefore be considered as a segment selection tool prior to the deconvolution of chromatogram segments. Furthermore, the information on systematic differences obtained from the PARAFAC model on the segmented and transformed chromatograms can be used to study the important segments separately: separate PCAs can be constructed using only compounds from segments which are responsible for a certain grouping of samples. Peak tables obtained in this manner are much smaller than a global peak table of all compounds and contain less redundant information, making them easier to explore, for instance using simple plots (e.g. boxplots) or, as shown here, PCA on autoscaled data. PCAs constructed on these smaller subsets of peak areas of the deconvoluted profiles are much easier to interpret, as shown above.
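The threshold-based segment selection can be written in a few lines. In this sketch the congruence loadings are assumed to be available as a segments × components matrix, with 1-based numbering as in Table 2; the function name and the loading values are made up for illustration and do not come from the study.

```python
import numpy as np


def select_segments(congruence, components, threshold=0.5):
    """Return the 1-based numbers of segments whose congruence loading on any of
    the listed (1-based) PARAFAC components exceeds `threshold`."""
    congruence = np.asarray(congruence, dtype=float)
    cols = [c - 1 for c in components]
    mask = (congruence[:, cols] > threshold).any(axis=1)
    return (np.flatnonzero(mask) + 1).tolist()


# Made-up congruence loadings: 6 segments x 3 PARAFAC components.
congruence = [
    [0.62, 0.10, 0.05],
    [0.20, 0.55, 0.30],
    [0.80, 0.60, 0.52],
    [0.12, 0.20, 0.40],
    [0.30, 0.76, 0.91],
    [0.45, 0.49, 0.02],
]

print(select_segments(congruence, [1, 2, 3]))        # [1, 2, 3, 5]  'medium to highly correlated'
print(select_segments(congruence, [1, 2, 3], 0.75))  # [3, 5]        'highly correlated'
print(select_segments(congruence, [1]))              # [1, 3]        segments driving component 1
```

Raising the threshold from 0.5 to 0.75 halves the number of segments that would need PARAFAC2 deconvolution in this toy example; selecting by component, as in the last call, gives the segment subsets on which separate PCAs can be built.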
5. Conclusions
In this study, the conversion of segmented two-dimensional GC-MS chromatograms (retention time × mass channels) into sums of squares and cross products (SSCP) matrices prior to PARAFAC modelling has been demonstrated to be a powerful data treatment technique for non-targeted GC-MS analysis. The presented approach consists of three steps. First, all chromatograms are segmented and an SSCP matrix is calculated for each segment and sample. This transformation summarizes the variation and covariation of all mass channels within a segment of the corresponding sample and makes alignment of peaks unnecessary. Second, the vectorized SSCP matrices of all samples are compiled into one matrix per segment, which is in turn transformed into an SSCP matrix; this gives information on the variation and covariation between samples in each segment as a function of the variation and covariation among the mass channels of each sample. In the final step, these SSCP matrices are merged into a three-way array, which is then analysed using PARAFAC. In essence, only the segmentation of the chromatograms and the construction of the PARAFAC model have to be done manually, which makes this approach a fast, holistic and semi-automated method for GC-MS fingerprinting. A set of 36 chromatograms derived from triplicate SPME-GC-MS analyses of twelve Cabernet Sauvignon wines was used to demonstrate the performance of the data treatment methodology. Wines could be differentiated according to the yeast starter culture used and the inoculation mode of yeast and lactic acid bacteria. Compounds responsible for this discrimination could be tentatively identified after deconvoluting peaks in the important segments using PARAFAC2. Separate PCAs on the integrated deconvoluted signals of segments which are responsible for a certain grouping of samples in the PARAFAC model provide in-depth insights into the observed phenomena. The advantage of the novel GC-MS fingerprinting approach presented herein was confirmed by comparing it with PCA on the deconvoluted peak profiles of all chromatogram segments: the final results from the new approach could not be summarized by a single PCA on the autoscaled peak table of all compounds. The new approach can, therefore, also be seen as a segment pre-selection tool prior to deconvolution of chromatogram segments.
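The three-step transformation summarised above can be sketched in NumPy. Assumptions not taken from the text: every chromatogram is a scans × mass channels intensity array on a common m/z axis, segments are fixed scan-index ranges shared by all samples, and vectorisation uses row-major order (any fixed order gives the same result). The function names are hypothetical, and the PARAFAC fit itself is omitted; the resulting samples × samples × segments array could, for example, be passed to a PARAFAC implementation such as `tensorly.decomposition.parafac`.

```python
import numpy as np


def segment_sscp(X_seg):
    """Mass-channel SSCP matrix X^T X of one chromatogram segment (J x J).
    It is a sum of outer products of individual scans, so it does not depend
    on the order of the scans within the segment."""
    return X_seg.T @ X_seg


def build_three_way_array(chromatograms, boundaries):
    """chromatograms: list of I arrays, each (scans x J mass channels).
    boundaries: list of K (start, stop) scan-index pairs (hypothetical fixed segments).
    Returns an I x I x K array; slab k is C_k C_k^T, where row i of C_k is
    the vectorised SSCP matrix of sample i in segment k."""
    n_samples, n_segments = len(chromatograms), len(boundaries)
    Z = np.empty((n_samples, n_samples, n_segments))
    for k, (start, stop) in enumerate(boundaries):
        # Step 1 + vectorisation: one row per sample
        C = np.vstack([segment_sscp(X[start:stop]).ravel() for X in chromatograms])
        # Step 2: sample-by-sample SSCP matrix for this segment
        Z[:, :, k] = C @ C.T
    # Step 3: slabs are stacked along the third (segment) mode
    return Z


rng = np.random.default_rng(1)
chromatograms = [rng.random((100, 5)) for _ in range(4)]  # 4 samples, 100 scans, 5 m/z
boundaries = [(0, 50), (50, 100)]                          # 2 segments
Z = build_three_way_array(chromatograms, boundaries)

# Reorder the scans inside segment 1 of sample 1 (np.roll permutes rows)
shifted = chromatograms[0].copy()
shifted[0:50] = np.roll(shifted[0:50], 7, axis=0)
Z_shifted = build_three_way_array([shifted] + chromatograms[1:], boundaries)
print(Z.shape, np.allclose(Z, Z_shifted))  # (4, 4, 2) True
```

Because X^T X sums outer products of individual scans, reordering scans within a segment leaves it unchanged; a peak that shifts but stays inside its segment (over a flat baseline) therefore gives the same matrix, which is why no peak alignment is needed.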
6. Acknowledgements
Lallemand is thanked for partial funding, and Lallemand North America for the donation of wine samples. JV is supported through the Initiative d’Excellence (IdEx) Université de Bordeaux and the Hochschule Geisenheim University. Marie-Claire Perello and Laurent Riquier are thanked for assistance in the laboratory, and Rasmus Bro for his suggestions and comments. Julius Witte and Kimmo Sirén are thanked for discussions on matrix algebra.
References
[1] C. De Vos, Y. Tikunov, A. Bovy, R. Hall, Flavour metabolomics: Holistic versus targeted approaches in flavour research, in: Expression of Multidisciplinary Flavour Science. Proceedings of the 12th Weurman Symposium. Interlaken, Switzerland: Zürcher Hochschule für Angewandte and Institut für Chemie und Biologische Chemie, 2008, pp. 573–580.
[2] V. Behrends, G. D. Tredwell, J. G. Bundy, A software complement to AMDIS for processing GC-MS metabolomic data, Analytical Biochemistry 415 (2) (2011) 206–208.
[3] S. E. Stein, An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data, Journal of the American Society for Mass Spectrometry 10 (8) (1999) 770–781.
[4] R. Aggio, S. G. Villas, K. Ruggiero, Metab: an R package for high-throughput analysis of metabolomics data generated by GC-MS, Bioinformatics 27 (16) (2011) 2316–2318.
[5] E. Want, P. Masson, Processing and Analysis of GC/LC-MS-Based Metabolomics Data, in: T. O. Metz (Ed.), Metabolic Profiling, Vol. 708 of Methods in Molecular Biology, Humana Press, 2011, pp. 277–298.
[6] A. Luedemann, K. Strassburg, A. Erban, J. Kopka, TagFinder for the quantitative analysis of gas chromatography-mass spectrometry (GC-MS)-based metabolite profiling experiments, Bioinformatics 24 (5) (2008) 732–737.
[7] C. A. Smith, E. J. Want, G. O’Maille, R. Abagyan, G. Siuzdak, XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification, Analytical Chemistry 78 (3) (2006) 779–787.
[8] J. Vestner, S. Malherbe, M. Du Toit, H. H. Nieuwoudt, A. Mostafa, T. Górecki, A. G. Tredoux, A. De Villiers, Investigation of the volatile composition of Pinotage wines fermented with different malolactic starter cultures using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GC×GC-TOF-MS), Journal of Agricultural and Food Chemistry 59 (24) (2011) 12732–12744.
[9] S. J. Dixon, R. G. Brereton, H. A. Soini, M. V. Novotny, D. J. Penn, An automated method for peak detection and matching in large gas chromatography-mass spectrometry data sets, Journal of Chemometrics 20 (8-10) (2006) 325–340.
[10] S. Furbo, J. H. Christensen, Automated peak extraction and quantification in chromatography with multichannel detectors, Analytical Chemistry 84 (5) (2012) 2211–2218.
[11] C. A. Hastings, S. M. Norton, S. Roy, New algorithms for processing and peak detection in liquid chromatography/mass spectrometry data, Rapid Communications in Mass Spectrometry 16 (5) (2002) 462–467.
[12] G. Vivó-Truyols, J. Torres-Lapasió, A. Van Nederkassel, Y. Vander Heyden, D. Massart, Automatic program for peak detection and deconvolution of multi-overlapped chromatographic signals: Part I: Peak detection, Journal of Chromatography A 1096 (1) (2005) 133–145.
[13] T. Skov, R. Bro, A new approach for modelling sensor based data, Sensors
775
and Actuators B: Chemical 106 (2) (2005) 719–729.
53
ACCEPTED MANUSCRIPT
[14] D. Ballabio, T. Skov, R. Leardi, R. Bro, Classification of GC-MS mea-
RI PT
surements of wines by combining data dimension reduction and variable selection techniques, Journal of chemometrics 22 (8) (2008) 457–463.
[15] R. Bro, PARAFAC. Tutorial and applications, Chemometrics and intelli780
gent laboratory systems 38 (2) (1997) 149–171.
SC
[16] M. C. Rodr´ıguez, G. H. S´anchez, M. S. Sobrero, A. V. Schenone, N. R. Marsili, Determination of mycotoxins (aflatoxins and ochratoxin A) using
M AN U
fluorescence emission-excitation matrices and multivariate calibration, Microchemical Journal 110 (2013) 480–484. 785
[17] R. Tauler, Multivariate curve resolution applied to second order data, Chemometrics and Intelligent Laboratory Systems 30 (1) (1995) 133–146. [18] N. A. Sinkov, J. J. Harynuk, Cluster resolution: A metric for automated, objective and optimized feature selection in chemometric modeling, Talanta
790
TE D
83 (4) (2011) 1079–1087.
[19] M. Daszykowski, R. Danielsson, B. Walczak, No-alignment-strategies for exploring a set of two-way data tables obtained from capillary
EP
electrophoresis–mass spectrometry, Journal of Chromatography A 1192 (1) (2008) 157–165.
[20] C. Durante, R. Bro, M. Cocchi, A classification tool for N-way array based
AC C
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
795
on SIMCA methodology, Chemometrics and Intelligent Laboratory Systems 106 (1) (2011) 73–85.
[21] M. Cocchi, C. Durante, M. Grandi, D. Manzini, A. Marchetti, Three-way principal component analysis of the volatile fraction by HS-SPME/GC of aceto balsamico tradizionale of modena, Talanta 74 (4) (2008) 547–554.
54
ACCEPTED MANUSCRIPT
800
[22] C. Durante, M. Cocchi, M. Grandi, A. Marchetti, R. Bro, Application of N-
RI PT
PLS to gas chromatographic and sensory data of traditional balsamic vinegars of Modena, Chemometrics and Intelligent Laboratory Systems 83 (1) (2006) 54–65.
[23] J. H. Christensen, J. Mortensen, A. B. Hansen, O. Andersen, Chromatographic preprocessing of GC–MS data for analysis of complex chemical
SC
805
mixtures, Journal of Chromatography A 1062 (1) (2005) 113–123.
M AN U
[24] J. H. Christensen, A. B. Hansen, U. Karlson, J. Mortensen, O. Andersen, Multivariate statistical methods for evaluating biodegradation of mineral oil, Journal of Chromatography A 1090 (1) (2005) 133–145. 810
[25] J. H. Christensen, G. Tomasi, Practical aspects of chemometrics for oil spill fingerprinting, Journal of Chromatography A 1169 (1) (2007) 1–22. [26] N.-P. V. Nielsen, J. M. Carstensen, J. Smedsgaard, Aligning of single and
TE D
multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping, Journal of Chromatography A 815
805 (1) (1998) 17–35.
EP
[27] T. Skov, F. van den Berg, G. Tomasi, R. Bro, Automated alignment of chromatographic data, Journal of Chemometrics 20 (11-12) (2006) 484– 497.
AC C
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
[28] G. Tomasi, F. van den Berg, C. Andersson, Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data, Journal of Chemometrics 18 (5) (2004) 231–241.
[29] E. Lange, C. Gröpl, O. Schulz-Trieglaff, A. Leinenbach, C. Huber, K. Reinert, A geometric approach for the alignment of liquid chromatography–mass spectrometry data, Bioinformatics 23 (13) (2007) i273–i281.
[30] N. A. Sinkov, B. M. Johnston, P. M. L. Sandercock, J. J. Harynuk, Automated optimization and construction of chemometric models based on highly variable raw chromatographic data, Analytica Chimica Acta 697 (1) (2011) 8–15.
[31] D. Lay, Linear Algebra and Its Applications, 3rd Edition, Addison-Wesley, 2002.
[32] R. Danielsson, D. Bäckström, S. Ullsten, Rapid multivariate analysis of LC/GC/CE data (single or multiple channel detection) without prior peak alignment, Chemometrics and Intelligent Laboratory Systems 84 (1) (2006) 33–39.
[33] M. Daszykowski, B. Walczak, Methods for the exploratory analysis of two-dimensional chromatographic signals, Talanta 83 (4) (2011) 1088–1097.
[34] I. Stanimirova, B. Walczak, D. Massart, V. Simeonov, C. Saby, E. Di Crescenzo, STATIS, a three-way method for data analysis. Application to environmental data, Chemometrics and Intelligent Laboratory Systems 73 (2) (2004) 219–233.
[35] R. Bro, C. A. Andersson, H. A. Kiers, PARAFAC2-Part II. Modeling chromatographic data with retention time shifts, Journal of Chemometrics 13 (3-4) (1999) 295–309.
[36] R. A. Harshman, PARAFAC2: Mathematical and technical notes, UCLA Working Papers in Phonetics 22 (1972) 30–44.
[37] J. M. Amigo, M. J. Popielarz, R. M. Callejón, M. L. Morales, A. M. Troncoso, M. A. Petersen, T. B. Toldam-Andersen, Comprehensive analysis of chromatographic data by using PARAFAC2 and principal components analysis, Journal of Chromatography A 1217 (26) (2010) 4422–4429.
[38] J. M. Amigo, T. Skov, R. Bro, J. Coello, S. Maspoch, Solving GC-MS problems with PARAFAC2, TrAC Trends in Analytical Chemistry 27 (8) (2008) 714–725.
[39] L. G. Johnsen, J. M. Amigo, T. Skov, R. Bro, Automated resolution of overlapping peaks in chromatographic data, Journal of Chemometrics 28 (2) (2014) 71–82.
[40] R. Bro, Multi-way analysis in the food industry: models, algorithms, and applications, Ph.D. thesis, Københavns Universitet, Department of Food Science, Quality & Technology (1998).
[41] H. A. Kiers, Hierarchical relations among three-way methods, Psychometrika 56 (3) (1991) 449–470.
[42] C. A. Andersson, R. Bro, The N-way Toolbox for MATLAB, Chemometrics and Intelligent Laboratory Systems 52 (1) (2000) 1–4.
[43] B. Escofier, J. Pagès, Analyses factorielles simples et multiples: objectifs, méthodes et interprétation, Dunod, 2008.
[44] G. Mazerolles, M. Hanafi, E. Dufour, D. Bertrand, E. Qannari, Common components and specific weights analysis: a chemometric method for dealing with complexity of food products, Chemometrics and Intelligent Laboratory Systems 81 (1) (2006) 41–49.
[45] D. Chessel, M. Hanafi, Analyses de la co-inertie de k nuages de points, Revue de Statistique Appliquée 44 (2) (1996) 35–60.
[46] C. B. Cordella, D. Bertrand, Saisir: a new general chemometric toolbox, TrAC Trends in Analytical Chemistry 54 (2014) 75–82.
[47] S. Stein, Y. Mirokhin, D. Tchekhovskoi, G. Mallard, NIST Mass Spectral Search Program, National Institute of Standards and Technology, Gaithersburg, MD (2008).
[48] S. Rocha, V. Ramalheira, A. Barros, I. Delgadillo, M. A. Coimbra, Headspace solid phase microextraction (SPME) analysis of flavor compounds in wines. Effect of the matrix volatile composition in the relative response factors in a wine model, Journal of Agricultural and Food Chemistry 49 (11) (2001) 5142–5151.
[49] G. Antalick, M.-C. Perello, G. de Revel, Development, validation and application of a specific method for the quantitative determination of wine esters by headspace-solid-phase microextraction-gas chromatography–mass spectrometry, Food Chemistry 121 (4) (2010) 1236–1245.
[50] R. Bro, H. A. Kiers, A new efficient method for determining the number of components in PARAFAC models, Journal of Chemometrics 17 (5) (2003) 274–286.
[51] G. Antalick, M. Perello, G. de Revel, Changes in wine secondary metabolite composition by the timing of inoculation with lactic acid bacteria: Impact on wine aroma, in: Proceedings of the 3rd International Symposium MACROWINE 2010 on Macromolecules and Secondary Metabolites in Grapevine and Wines, Università di Torino, Torino, Italy, 2010, pp. 143–148.
[52] M. Gammacurta, S. Marchand, W. Albertin, V. Moine, G. de Revel, Impact of yeast strain on ester levels and fruity aroma persistence during aging of Bordeaux red wines, Journal of Agricultural and Food Chemistry 62 (23) (2014) 5378–5389.
[53] C. E. Abrahamse, E. J. Bartowsky, Timing of malolactic fermentation inoculation in Shiraz grape must and wine: influence on chemical composition, World Journal of Microbiology and Biotechnology 28 (1) (2012) 255–265.
[54] C. Knoll, S. Fritsch, S. Schnell, M. Grossmann, S. Krieger-Weber, M. du Toit, D. Rauhut, Impact of different malolactic fermentation inoculation scenarios on Riesling wine aroma, World Journal of Microbiology and Biotechnology 28 (3) (2012) 1143–1153.
Supporting Information

The approach developed in this study can be summarized (in abbreviated form) as follows:
• Segmentation of chromatograms along the retention axis
  – Calculation of SSCP (sums of squares and cross-products) matrices for every segment and sample
  – Concatenation of all vectorized SSCP matrices (upper triangular part only) of each segment into a compilation matrix
  – Calculation of the SSCP matrix of each compilation matrix
  – Assembly of the SSCP matrices of all compilation matrices into a three-way array
  – PARAFAC on the three-way array
  – Visual examination of loadings and selection of important segments
• Deconvolution of only the important segments using PARAFAC2
• Integration of deconvoluted peak profiles and identification of compounds
• Multiple PCAs on selected compounds, taking the PARAFAC results into account
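The first group of steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: each sample is assumed to be a (scans × m/z channels) matrix, the SSCP of a segment X is assumed to be XᵀX, the SSCP of a compilation matrix Z (samples × features) is assumed to be ZZᵀ, and segment boundaries are given as scan indices. The function names are hypothetical, and the unconstrained alternating-least-squares PARAFAC is a simplified stand-in for a full routine such as the one in the N-way Toolbox [42]. Under the XᵀX assumption, a segment's SSCP is a sum of outer products of its mass spectra, so it does not change when a peak moves within the segment, which is consistent with the non-alignment idea.

```python
import numpy as np


def segment_sscp_features(chromatogram, boundaries):
    """For each retention segment of one sample, return the upper-triangular
    part (diagonal included) of its SSCP matrix.

    Assumption: chromatogram is (scans x m/z channels) and the SSCP of a
    segment X is X.T @ X, i.e. a sum of outer products of mass spectra.
    """
    features = []
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        X = chromatogram[start:stop, :]
        S = X.T @ X                          # m/z x m/z, independent of scan order
        iu = np.triu_indices(S.shape[0])     # keep only the upper triangle
        features.append(S[iu])
    return features


def build_three_way_array(samples, boundaries):
    """Stack one (samples x samples) SSCP slab per segment into an
    I x I x K array (assumed orientation: Z @ Z.T per compilation matrix Z)."""
    per_sample = [segment_sscp_features(c, boundaries) for c in samples]
    slabs = []
    for k in range(len(boundaries) - 1):
        Z = np.vstack([feats[k] for feats in per_sample])  # compilation matrix
        slabs.append(Z @ Z.T)
    return np.stack(slabs, axis=2)


def khatri_rao(A, B):
    """Column-wise Kronecker product; row index is a * len(B) + b."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])


def parafac_als(T, rank, n_iter=500, seed=0):
    """Plain ALS fit of T[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]
    (no constraints, no convergence check)."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    T1 = T.reshape(I, J * K)                     # mode-1 unfolding
    T2 = T.transpose(1, 0, 2).reshape(J, I * K)  # mode-2 unfolding
    T3 = T.transpose(2, 0, 1).reshape(K, I * J)  # mode-3 unfolding
    for _ in range(n_iter):
        A = T1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```

In this sketch the third-mode matrix C plays the role of the segment loadings; how these are turned into the congruence loadings shown in the figures is not reproduced here.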
Fitting multiple PARAFAC2 models to all segments of the chromatograms, followed by PCA [37], was used as the reference method in this study. This approach can be summarized as follows:
• Segmentation of the chromatograms along the retention axis
• Deconvolution of every segment using PARAFAC2
• Integration of deconvoluted peak profiles and identification of compounds
• PCA on peak area tables
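The final step of the reference method, PCA on a peak area table, can be sketched as follows. This is a minimal NumPy sketch, assuming a (samples × compounds) table of integrated peak areas and autoscaling (mean-centring and scaling each compound to unit variance); the source does not state which preprocessing was applied, and the function name is hypothetical.

```python
import numpy as np


def pca_peak_areas(areas, n_components=2):
    """PCA of a (samples x compounds) peak-area table via SVD.

    Assumption: columns are autoscaled; constant columns are only centred.
    Returns scores, loadings and the fraction of variance explained per component.
    """
    X = np.asarray(areas, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # avoid dividing constant columns by zero
    Xs = (X - X.mean(axis=0)) / std
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)          # singular values are sorted descending
    return scores, loadings, explained[:n_components]
```

Autoscaling keeps abundant compounds from dominating the components; if the peak areas were compared on their raw scale instead, only the centring line would remain.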
Figure 14: Overlay of all mass channels of one sample (sample no. 14) of the artificial GC-MS data set. Dotted lines show the segmentation of the chromatogram. [Figure: abundance vs. scan number for the full chromatogram, with enlarged views of peaks 1 & 2, 3 & 4, 5, 6 & 7, 8, and 9 & 10.]
Figure 15: Overlay of TICs of all samples of the artificial GC-MS data set with introduced shift. Dotted lines show the segmentation of the chromatogram. [Figure: abundance vs. scan number for the full chromatograms, with enlarged views of peaks 1 & 2, 3 & 4, 5, 6 & 7, 8, and 9 & 10.]
Figure 16: Loadings plots of PARAFAC components five vs. ten (model with 36 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271). [Panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings. Axes: Component 5 (8.5% expl. var.) vs. Component 10 (3.1% expl. var.). Legend: clos, vrb, rbs; co-inoculated, sequential.]
Figure 17: Loadings plots of PARAFAC components one vs. three (model with 36 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271). [Panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings. Axes: Component 1 (18% expl. var.) vs. Component 3 (10.4% expl. var.).]
Figure 18: Loadings plots of PARAFAC components one vs. two (model with 36 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271). [Panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings. Axes: Component 1 (18% expl. var.) vs. Component 2 (11.3% expl. var.).]
Figure 19: Loadings plots of PARAFAC components 2 vs. 1 (model with 18 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271). [Panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings. Axes: Component 2 (16.2% expl. var.) vs. Component 1 (19.1% expl. var.).]
Figure 20: Loadings plots of PARAFAC components 2 vs. 3 (model with 18 segments); Yeast starter cultures: Lalvin Clos (clos), Uvaferm RBS (rbs), Uvaferm VRB (vrb); Lactic acid bacteria starter cultures: Enoferm Alpha (alpha), Enoferm Beta (beta), Lalvin PN4 (PN4), Lalvin VP41 (41) and O-Mega (271). [Panels: (a) First mode (samples) loadings; (b) Third mode (segments) congruence loadings. Axes: Component 2 (16.2% expl. var.) vs. Component 3 (13.9% expl. var.).]
Highlights
October 27, 2015
• A novel data processing procedure for non-targeted gas chromatography mass spectrometry (GC-MS) data is proposed.
• Basic matrix manipulation of segmented GC-MS chromatograms and PARAFAC multi-way modelling are used.
• Retention time shifts and peak shape deformations between samples are taken into account.
• The procedure is demonstrated on an artificial and an experimental full-scan GC-MS data set.