Journal Pre-proof
Improved lipid mixtures profiling by 1H NMR using reference lineshape adjustment and deconvolution techniques
Ghina Hajjar, Noelle Merchak, Charbel Daniel, Toufic Rizk, Serge Akoka, Joseph Bejjani
PII: S0039-9140(19)31108-7
DOI: https://doi.org/10.1016/j.talanta.2019.120475
Reference: TAL 120475
To appear in: Talanta
Received Date: 22 May 2019
Revised Date: 19 September 2019
Accepted Date: 13 October 2019
Please cite this article as: G. Hajjar, N. Merchak, C. Daniel, T. Rizk, S. Akoka, J. Bejjani, Improved lipid mixtures profiling by 1H NMR using reference lineshape adjustment and deconvolution techniques, Talanta (2019), doi: https://doi.org/10.1016/j.talanta.2019.120475.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier B.V.
Improved lipid mixtures profiling by 1H NMR using reference lineshape adjustment and deconvolution techniques
Ghina Hajjar (a,b), Noelle Merchak (a,b), Charbel Daniel (a), Toufic Rizk (a), Serge Akoka (b), Joseph Bejjani (a,*)
(a) Laboratory of Metrology and Isotopic Fractionation, Research Unit: Technologies et Valorisation Agroalimentaire (TVA), Faculty of Science, Saint Joseph University of Beirut, P.O. Box 17-5208 Mar Mikhael, Beirut 1104 2020, Lebanon
(b) EBSI team, Interdisciplinary Chemistry: Synthesis, Analysis, Modelling (CEISAM), University of Nantes - CNRS UMR 6230, 2 rue de la Houssinière, BP 92208, F-44322 Nantes Cedex 3, France
(*) Corresponding author. Tel.: +961 1 421683; fax: +961 4 532657. E-mail:
[email protected]
Abstract
Analysis of one-dimensional 1H NMR spectra of complex mixtures, such as lipids from natural extracts, is hampered by the small spectral width, which leads to a great number of overlapped signals. Additional complications, including lineshape broadening and distortion, may occur due to magnetic field inhomogeneity. Quantitation of such spectra is therefore challenging. We present in this work a quantitation approach based on deconvolution after correction of spectra by means of reference lineshape adjustment (RLA), also known as reference deconvolution. Spectral fit and precision obtained on deconvoluted peaks were used as indicators to iteratively improve the deconvolution process. This approach was tested on 1H NMR spectra of olive oil samples and allowed extraction of 58 peaks (available as peak intensities or areas), whereas spectral integration afforded 5 variables when only well-resolved signals were considered and 29 variables when a bucket around each discernable peak was integrated. Deconvoluted peak intensities and areas were obtained with improved precision after RLA of raw spectra. The use of these spectral variables as predictors in multivariate statistical analysis enhanced the classification of olive oil samples according to the altitude of the olive field or to the color of the olive drupes. The same variables allowed quantitation of oleic, palmitoleic, and vaccenic acids within triacylglycerols, which was not previously possible by 1H NMR, and improved quantitation of linoleic and linolenic acids. These results demonstrate the high potential of the presented approach in the characterization and authentication of complex mixtures by 1H NMR spectroscopy.
Keywords: Reference lineshape adjustment – 1H NMR spectral deconvolution – Triacylglycerols – Olive oil authentication – Fatty acids quantitation – Multivariate models
1. Introduction
Metabolite profiling by 1H NMR provides powerful information used in different domains such as medicine, health, and food technology. Among the wide range of metabolites, lipids have gained considerable interest in the field of metabolomics. Lipidomics thus emerged in 2003 as a tool for the study of several cellular functions, for the improvement of the diagnosis and/or prognosis of diseases, and for the identification of new therapeutic targets [1,2]. Since lipids are essential for multiple biological pathways, lipid imbalance could lead to several diseases such as Alzheimer's disease [3], diabetes [4], and muscular dystrophy [5]. In addition, lipids are ubiquitous, being quasi-universal components of food products. Their isotopic composition and metabolic profile are closely related to the geographical origin, weather conditions, and agricultural practices. NMR can be used to study changes in the lipid isotopic and/or metabolomic profiles for food authentication and fraud detection [6–8].
One-dimensional (1D) 1H NMR spectra of complex mixtures usually suffer from a great number of overlapped signals, especially when dealing with biomolecules having different conformations. In this latter case, a switch between the different states can be noticeable in the NMR spectra, and deconvolution can give useful information for a better understanding of the exchange phenomenon [9]. 1H NMR spectra of lipids are commonly used for untargeted metabolomics based on profiling [10]. They have also been used for the quantitation of lipid groups within total lipid extracts, the determination of the percentage of saturated, mono- and poly-unsaturated fatty acids (SFA, MUFA and PUFA, respectively), and the quantitation of specific fatty acids having well-known and isolated chemical shifts such as omega-3 fatty acids [11,12]. However, the quantitation of individual fatty acids within triacylglycerol matrices usually requires more targeted techniques such as gas chromatography [13] or the development of advanced NMR techniques [14,15]. Therefore, in
order to improve the precision of results when dealing with 1H NMR spectra of complex mixtures, it is of great importance to take into account the signal processing steps and to favor signal deconvolution [16,17]. Tools dedicated to metabolomics that aim to facilitate the processing of 1D NMR spectra are still being developed [18–20]. Some of these tools include reference deconvolution in the processing script [12,21], which is a Reference Lineshape Adjustment (RLA) of the whole spectrum. RLA is a well-known technique used to compensate for lineshape distortions of Lorentzian resonances observed in a high-resolution NMR spectrum [22,23]. Many parameters affect the shape of NMR signals, such as magnetic field inhomogeneity and instability, which lead to broad and asymmetric peaks [24,25]. Lineshapes can even undergo more complicated distortions and can sometimes show splitting, thereby hiding useful information in the spectrum. However, an inhomogeneous magnetic field affects all peaks of a spectrum in the same way. Distorted lineshapes are thus the result of a biased Free Induction Decay (FID) expressed by the following equation:
$$\mathrm{FID}_{\mathrm{exp}}(t) = \mathrm{FID}_{\mathrm{perf}}(t) \times E(t) \qquad (1)$$
where $\mathrm{FID}_{\mathrm{exp}}$ is the experimental biased FID, $\mathrm{FID}_{\mathrm{perf}}$ is the so-called perfect FID, and $E$ is the error function resulting from the inhomogeneous field [26,27]. Reference deconvolution aims to overcome such distortions and to adjust lineshapes, thereby getting closer to the spectrum that would be obtained from the Fourier transform of the perfect FID. It consists in finding the error function and using it to adjust the experimental FID. This is done by selecting a singlet peak from the experimental spectrum and setting all the other points in the spectrum to zero. This spectrum with only the distorted reference signal is then inverse Fourier transformed to get the experimental FID of the reference signal, $\mathrm{FID}^{\mathrm{ref}}_{\mathrm{exp}}$. An adjusted synthetic FID, $\mathrm{FID}^{\mathrm{ref}}_{\mathrm{synth}}$, is constructed close to what is expected for a symmetrical Lorentzian peak. The error function $E^{\mathrm{ref}}$ for the reference signal is determined by Equation 2 [26,27].
$$E^{\mathrm{ref}}(t) = \frac{\mathrm{FID}^{\mathrm{ref}}_{\mathrm{exp}}(t)}{\mathrm{FID}^{\mathrm{ref}}_{\mathrm{synth}}(t)} \qquad (2)$$
Assuming that all the observed experimental signals are equally affected by the field inhomogeneity, $E^{\mathrm{ref}}$ and $E$ are supposed to have the same value. Therefore, the biased experimental FID is adjusted according to Equation 1 as follows:
$$\mathrm{FID}_{\mathrm{adj}}(t) = \frac{\mathrm{FID}_{\mathrm{exp}}(t)}{E^{\mathrm{ref}}(t)} \qquad (3)$$
where $\mathrm{FID}_{\mathrm{adj}}$ is the adjusted FID that henceforth replaces $\mathrm{FID}_{\mathrm{exp}}$. The Fourier transform of $\mathrm{FID}_{\mathrm{adj}}$ yields an adjusted spectrum that can be used for further spectral processing and analysis [26,27].
The objective of this work was to make the most of 1H NMR spectra in terms of data provided and to test whether characterization and classification of complex matrices, such as triacylglycerols, could be improved. Spectral deconvolution was applied to reference lineshape adjusted spectra of olive oil samples and the resulting peak areas and intensities were used as inputs in multivariate statistical analysis. The classification of samples by means of canonical discriminant analysis (CDA) and linear discriminant analysis (LDA) using the aforementioned variables was compared with the classifications obtained through two types of variables from spectral integration processes: first, percentages of SFA, MUFA, linoleic (L) and linolenic (Ln) acids, calculated from integrals of the well-resolved allylic, diallylic, methylene at position 2, and Ln methyl signals [11]; and second, integrals around all discernable peaks in the spectra. Moreover, the potential of our approach was evaluated in the quantitation of individual fatty acids using multivariate regression models.
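The reference deconvolution steps of Equations 1 to 3 can be sketched numerically as follows. This is a minimal illustration with NumPy, not the TOPSPIN implementation used in this work: the function names, the synthetic two-line FID, and the distortion function are all hypothetical. Practical implementations usually also damp the division where the reference FID has decayed into noise, which is omitted here.

```python
import numpy as np

def lorentzian_fid(t, freq_hz, fwhm_hz, amplitude=1.0):
    """Ideal FID of a single symmetrical Lorentzian line."""
    return amplitude * np.exp(2j * np.pi * freq_hz * t - np.pi * fwhm_hz * t)

def isolate_reference_fid(fid, ref_mask):
    """Zero every spectral point outside the reference singlet, then
    inverse Fourier transform back to the time domain.

    ref_mask is a boolean array (same length as fid), True on the
    spectral points that cover the reference signal.
    """
    spectrum = np.fft.fft(fid)
    spectrum_ref_only = np.where(ref_mask, spectrum, 0.0)
    return np.fft.ifft(spectrum_ref_only)

def adjust_fid(fid_exp, fid_ref_exp, fid_ref_synth):
    """Estimate the error function (Equation 2) and divide it out (Equation 3)."""
    error = fid_ref_exp / fid_ref_synth
    return fid_exp / error

# Hypothetical example: a reference singlet plus one analyte line.
t = np.arange(4096) * 1e-3                          # assumed 1 ms dwell time
fid_ref_perfect = lorentzian_fid(t, 100.0, 1.0)
fid_perfect = fid_ref_perfect + lorentzian_fid(t, 300.0, 1.0, amplitude=2.0)

# Assumed distortion: extra broadening with a slow modulation (Equation 1).
distortion = np.exp(-np.pi * 1.5 * t) * (1.0 + 0.2 * np.cos(2 * np.pi * 0.5 * t))
fid_exp = fid_perfect * distortion

# The reference FID is known exactly in this toy case; with real data it
# would come from isolate_reference_fid(fid_exp, ref_mask).
fid_adj = adjust_fid(fid_exp, fid_ref_perfect * distortion, fid_ref_perfect)
```

In this noise-free toy case the adjusted FID coincides with the perfect FID, because the same distortion multiplies the reference and the analyte signals, which is the assumption stated above.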
2. Materials and Methods
2.1. Olive oil extraction
Thirty-six olive samples were collected from different Lebanese regions and were grouped according to the color of the olive drupes (Black and Green) or to the altitude of the olive field [Low (below 350 m) and High (above 700 m)]. The olive oil extraction procedure involved solely mechanical methods. Olives were crushed, pitted, and then ground to obtain a paste. The paste was centrifuged; the resulting oil layer was decanted and then centrifuged to eliminate residual particles. The olive oil samples were stored at −18 °C. The same samples were used in previous studies [11,15].
2.2. 1H NMR data acquisition
Acquisition of 1H NMR spectra was previously performed according to the following procedure [11]. A volume of 180 µL of olive oil was dissolved in 560 µL of CDCl3 (purchased from Deutero GmbH) and the mixture was transferred to a 5 mm NMR tube [28]. One-dimensional 1H NMR spectra were recorded on a Bruker Avance II 400 spectrometer operating at 400.13 MHz using the following conditions: probe temperature 298 K, time domain size 64 K, pulse angle 30°, pulse width 27.4 µs, spectral width 9 ppm, acquisition time 9.1 s, relaxation delay 1 s, 4 dummy scans and 32 scans. The longest longitudinal relaxation time T1 (3.19 s) was observed for the methyl group of linolenic acid, as measured by the inversion-recovery method. For each sample, seven spectra were recorded, leading to a global experiment time of 43 min.
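The acquisition parameters above are mutually consistent, which can be checked with a short calculation. The sketch below assumes the usual Bruker convention that the time domain size counts real plus imaginary points, so that the acquisition time is TD / (2 × SW in Hz); the function names are illustrative only.

```python
def acquisition_time_s(td_points, spectral_width_ppm, spectrometer_mhz):
    """Acquisition time, assuming TD counts real + imaginary points."""
    sw_hz = spectral_width_ppm * spectrometer_mhz
    return td_points / (2.0 * sw_hz)

def experiment_time_min(aq_s, relaxation_delay_s, scans, dummy_scans, replicates):
    """Pure pulsing time; instrument overhead between spectra is ignored."""
    return (scans + dummy_scans) * (aq_s + relaxation_delay_s) * replicates / 60.0

aq = acquisition_time_s(64 * 1024, 9.0, 400.13)          # about 9.1 s
total_min = experiment_time_min(aq, 1.0, 32, 4, 7)       # about 42.4 min
repetition_over_t1 = (aq + 1.0) / 3.19                   # about 3.2 x longest T1
```

The computed pulsing time of about 42.4 min agrees with the reported 43 min once a small overhead is allowed for, and the repetition time is roughly three times the longest T1.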
2.3. Spectra processing
The recorded FID was zero-filled to 128 K. An exponential apodization function was applied with a line broadening of 0.3 Hz prior to Fourier transformation. The phase was corrected manually and all the spectra were calibrated by setting the peak with the highest chemical shift of the glycerol methylene signal at 4.299 ppm. The spectral resolution was expressed in percent as the intensity of the minimum between the signals at 4.288 and 4.299 ppm relative to the intensity of the signal at 4.299 ppm [11], so that a lower value indicates better resolution. Further processing parameters were adapted to signal integration or to spectral deconvolution as described in the sections below.
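The processing chain and the resolution indicator described above can be sketched with NumPy as follows. This is a simplified illustration, not the TOPSPIN processing itself: phasing, calibration, and the conversion from Hz to ppm are left out, and the function names are hypothetical.

```python
import numpy as np

def process_fid(fid, dwell_s, zero_fill_to, lb_hz):
    """Exponential apodization, zero-filling, and Fourier transformation."""
    t = np.arange(fid.size) * dwell_s
    apodized = fid * np.exp(-np.pi * lb_hz * t)       # adds lb_hz to each linewidth
    padded = np.zeros(zero_fill_to, dtype=complex)
    padded[:fid.size] = apodized
    spectrum = np.fft.fftshift(np.fft.fft(padded))
    freqs_hz = np.fft.fftshift(np.fft.fftfreq(zero_fill_to, d=dwell_s))
    return freqs_hz, spectrum

def resolution_percent(ppm, intensity, ppm_low=4.288, ppm_high=4.299):
    """Valley between the two signals relative to the signal at ppm_high, in %."""
    between = (ppm >= ppm_low) & (ppm <= ppm_high)
    valley = intensity[between].min()
    peak = intensity[np.argmin(np.abs(ppm - ppm_high))]
    return 100.0 * valley / peak
```

With two equally intense Lorentzian lines at 4.288 and 4.299 ppm and a half width of 0.002 ppm, for instance, this indicator evaluates to about 22.6%, within the range of values reported in this work.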
2.3.1. Signal integration
First, spectral integration, as previously described [11], was used to determine areas of relatively non-overlapping signals, namely vinylic, diallylic, methylene at position 2, allylic, aliphatic methylene, and Ln methyl signals. Briefly, the baseline correction was performed using the Cubic Spline technique in Bruker TOPSPIN software. The spectral resolution was systematically calculated as described above and never exceeded 25%. Signal areas were determined using fixed chemical shift intervals and their values were normalized relative to the area of the methylene at position 2 signal. Thus, this signal was not directly used in the statistical analyses. Besides, the information afforded by the vinylic signal was redundant with that obtainable from the allylic and diallylic integrals. Glycerol, methylene at position 3, and methyl (other than Ln methyl) signals were not considered since they afforded information equivalent to that of the methylene at position 2 signal. For each olive oil sample, integrated signals afforded four continuous variables related to percentages of fatty acids within triacylglycerols: SFA, MUFA, L, and Ln. Both signal integrals and percentages of fatty acids were tested as predictors in multivariate models.
Second, after an automatic fifth-order polynomial baseline correction, bucket boundaries defined by local minima around all discernable peaks within the vinylic, diallylic, allylic, aliphatic, and methylic regions of the spectra were set and the resulting buckets were integrated. When a spectral region showed a chemical shift discrepancy relative to a reference spectrum, integral limits within this region were shifted accordingly. Integral values were normalized relative to the area of the methylene at position 2 signal. Thus, 29 spectral variables were obtained and then tested as predictors in multivariate models. Limits of peak integrals and their repeatability are reported in Supplemental Table S-1.
2.3.2. Reference lineshape adjustment (RLA) and signal deconvolution
After manual phasing and automatic fifth-order polynomial baseline correction, RLA was applied to each spectrum using Bruker TOPSPIN software. The CHCl3 signal was used as reference to compute the error function (Equation 2). The peak shape was adjusted until a spectral resolution between 19 and 20% was obtained. Each region was then calibrated for chemical shift and deconvoluted by adding the minimum number of peaks allowing the best fit (see Section 3.1 for details regarding the procedure to optimize peak lists for deconvolution). This was done for each of the following regions related to fatty acid chains: methylene at position 2, aliphatic, allylic, diallylic, vinylic, and methyl of Ln (Supplemental Fig. S-1). Intensities of deconvoluted peaks were normalized relative to the sum of the methylene at position 2 peak intensities, which was set to 600. Corresponding areas were normalized similarly.
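The deconvolution and normalization steps can be illustrated by a least-squares fit of a sum of Lorentzian lines. The sketch below uses `scipy.optimize.curve_fit` as a stand-in for the TOPSPIN deconvolution routine; the function names, the synthetic two-peak region, and the initial guesses are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentzian_sum(x, *params):
    """Sum of Lorentzian lines; params holds (height, center, fwhm) per peak."""
    y = np.zeros_like(x, dtype=float)
    for height, center, fwhm in zip(params[0::3], params[1::3], params[2::3]):
        hwhm = fwhm / 2.0
        y += height * hwhm**2 / ((x - center)**2 + hwhm**2)
    return y

def deconvolute(ppm, intensity, initial_peaks):
    """Fit the region; initial_peaks is a list of (height, center, fwhm) guesses."""
    p0 = np.ravel(initial_peaks)
    popt, _ = curve_fit(lorentzian_sum, ppm, intensity, p0=p0)
    peaks = popt.reshape(-1, 3)
    heights, centers, fwhm = peaks[:, 0], peaks[:, 1], np.abs(peaks[:, 2])
    areas = np.pi / 2.0 * heights * fwhm        # area of a Lorentzian line
    return centers, heights, areas

def normalize(values, reference_values, reference_total=600.0):
    """Scale values so that the reference peaks sum to reference_total."""
    return np.asarray(values) * reference_total / np.sum(reference_values)

# Hypothetical overlapped region made of two lines.
ppm = np.linspace(1.22, 1.29, 701)
signal = lorentzian_sum(ppm, 10.0, 1.250, 0.008, 6.0, 1.260, 0.008)
centers, heights, areas = deconvolute(
    ppm, signal, [(9.0, 1.2505, 0.009), (5.0, 1.2595, 0.009)]
)
```

Because the area of a Lorentzian line is proportional to the product of its height and its width, intensities and areas carry the same information only when all widths vary identically between samples.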
2.4. Gas chromatographic analysis
Gas chromatography (GC) was used as a reference for the quantitation of fatty acids within triacylglycerols in olive oil samples. Samples were previously analyzed according to the following procedure [11,15]. Fatty acid methyl esters were prepared by shaking 200 mg of oil in 3 mL of hexane with 0.4 mL of 2 N methanolic potassium hydroxide [29,30]. A 1:10 dilution in hexane was performed. An Agilent Technologies chromatograph equipped with a flame ionization detector and a fused-silica capillary column (SGE-054616, 30 m × 0.32 mm i.d., BPX70 0.25 µm) was used in this study with helium as carrier gas at a flow of 1.0 mL/min. The oven temperature was programmed as follows: 160 °C for 15 min, to 200 °C at a rate of 10 °C/min, then 200 °C for 10 min. A volume of 1 µL was injected with a split ratio of 50:1 and an injector temperature of 200 °C. The detector temperature was set at 270 °C. Two injections were performed for each sample with a total elution time of 58 min. Standard fatty acid methyl esters, purchased from Sigma-Aldrich, were used to determine the retention time of each fatty acid. Molar percentages of palmitic, palmitoleic, margaric, margaroleic, hypogeic, stearic, oleic, vaccenic, linoleic, linolenic, arachidic, gondoic, behenic, and lignoceric acids in olive oils were determined relative to their total number of moles, after dividing each peak area by the molar weight of the corresponding fatty acid methyl ester.
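The conversion of GC peak areas into molar percentages described above amounts to dividing each area by the molar weight of the corresponding methyl ester and normalizing to the total. A minimal sketch follows; only four fatty acids are included, the peak areas are invented for illustration, and the detector response is assumed to be proportional to mass, as implied by the procedure.

```python
# Molar weights of fatty acid methyl esters, in g/mol.
FAME_MOLAR_WEIGHT = {
    "palmitic": 270.45,   # C17H34O2
    "stearic": 298.50,    # C19H38O2
    "oleic": 296.49,      # C19H36O2
    "linoleic": 294.47,   # C19H34O2
}

def molar_percentages(peak_areas):
    """Convert GC peak areas into molar percentages of the listed fatty acids."""
    moles = {name: area / FAME_MOLAR_WEIGHT[name] for name, area in peak_areas.items()}
    total = sum(moles.values())
    return {name: 100.0 * value / total for name, value in moles.items()}

# Hypothetical peak areas (arbitrary units), not measured values.
areas = {"palmitic": 12.0, "stearic": 3.0, "oleic": 72.0, "linoleic": 13.0}
percent = molar_percentages(areas)
```

In this example the lighter palmitic ester has a molar percentage (about 13.0%) slightly above its area percentage (12.0%), while the heavier oleic ester moves the other way.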
2.5. Multivariate analysis
Data obtained from 1H NMR spectra and molar percentages of fatty acids obtained from GC analysis of olive oil samples were used as independent and dependent variables, respectively, in multivariate regression analyses performed with TANAGRA data mining software [31]. CDA and LDA were used for the classification of olive oil samples. A backward elimination approach was used to select variables for classification models based on their Wilks' λ values. In each case, performance and robustness of the model were assessed using the LDA error rate (LDA-Er), the leave-one-out error rate (LOO-Er), and the k-fold cross-validation error rate (k-fold-CV-Er) as indicators.
Fatty acid quantitation models were constructed using Partial Least Squares Regression (PLSR). Spectral variables (peak intensities and/or areas) obtained by 1H NMR were used as predictors and molar percentages of fatty acids obtained by GC served as dependent variables. PLSR models were built by eliminating non-relevant variables according to their standardized regression coefficient while adjusting the number of components (h) based on the cross-validation test (8-fold). The maximum cumulative Q2 (Q2cum) and the Predicted Residual Error Sum of Squares (PRESS) were used as robustness indicators [32,33]. Q2cum and PRESS parameters are related by the following equation:
$$Q^2_{\mathrm{cum}} = 1 - \prod_{k=1}^{h} \frac{\mathrm{PRESS}_k}{\mathrm{RSS}_{k-1}} \qquad (4)$$
where h is the number of components in a PLSR model; $\mathrm{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_{-i})^2$; the Residual Sum of Squares $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$; n is the number of samples in the training set; $y_i$ is the fatty acid amount determined by GC; $\hat{y}_{-i}$ values are the predictions of $y_i$ for omitted samples with a model constructed after this omission from the training set and using the initial descriptors; and $\hat{y}_i$ is the prediction of $y_i$ while the corresponding sample is included in the training set when the model is constructed. The best compromise between the lowest PRESS and the highest Q2cum values was used to determine the number of components to retain. The coefficient of determination R2 was used to determine the proportion of the variance in the dependent variable that is predictable by the independent variables. In other words, it was used to assess the goodness-of-fit. Adjusted R2, given by Equation 5, was used to compare models constructed with different numbers of predictors [34].
$$\text{Adjusted } R^2 = 1 - \frac{n-1}{n-m-1}\,(1 - R^2) \qquad (5)$$
where n is the sample size in the training set and m is the number of predictors (the number of components in this case) taking into account the constant term. However, since internal validation parameters (Q2cum and PRESS) are not sufficient to evaluate the predictive power of models [35], external validation was applied using the coefficient of determination for the test set (Pred-R2) as indicator [36]. This parameter is given by the following equation:
$$\text{Pred-}R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (6)$$
where n is the sample size in the test set; $y_i$ is the fatty acid amount of the test samples determined by GC; $\hat{y}_i$ is the prediction of $y_i$ by the constructed regression model; and $\bar{y}$ is the mean of $y_i$ values for all the training set samples. The best model was retained by compromising between the goodness-of-fit (R2 and Adjusted R2) and validation (Q2cum, PRESS, and Pred-R2) parameters.
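The indicators of Equations 4 to 6 are simple to compute once the fitted, cross-validated, and test-set predictions are available. The sketch below is an illustration with NumPy, not the TANAGRA implementation; the function names and the numerical inputs are hypothetical.

```python
import numpy as np

def sum_of_squares(y, y_hat):
    """RSS when y_hat are fitted values, PRESS when they are left-out predictions."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sum((y - y_hat) ** 2))

def q2_cum(press_per_component, rss_previous_component):
    """Equation 4: inputs are PRESS_1..PRESS_h and RSS_0..RSS_(h-1)."""
    ratios = np.asarray(press_per_component) / np.asarray(rss_previous_component)
    return float(1.0 - np.prod(ratios))

def adjusted_r2(r2, n, m):
    """Equation 5, with n training samples and m predictors."""
    return 1.0 - (n - 1) / (n - m - 1) * (1.0 - r2)

def pred_r2(y_test, y_pred, y_train_mean):
    """Equation 6: the reference mean is taken over the training set."""
    y_test = np.asarray(y_test, dtype=float)
    denominator = float(np.sum((y_test - y_train_mean) ** 2))
    return 1.0 - sum_of_squares(y_test, y_pred) / denominator

# Hypothetical two-component model: PRESS_1, PRESS_2 and RSS_0, RSS_1.
q2 = q2_cum([20.0, 5.0], [100.0, 10.0])          # 1 - 0.2 * 0.5 = 0.9
adj = adjusted_r2(0.95, n=30, m=3)
ext = pred_r2([1.0, 2.0, 3.0], [1.1, 1.9, 3.0], y_train_mean=2.0)
```

Using the training-set mean in Equation 6 means that Pred-R2 compares the model against the simplest predictor that could have been built before seeing the test samples.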
3. Results and Discussion
3.1. Optimization of spectral deconvolution
Deconvolution was carried out on raw spectra of olive oil using a Lorentzian fitting function for each spectral region separately (see Section 2.3.2). An iterative approach was applied at this level, guided by the improvement of lineshape fitting and the precision of the quantitation of deconvoluted peaks. First, peak picking was directed by the visual aspect of the signal. For example, as shown in Fig. 1a, five peaks were used to fit the signal corresponding to aliphatic protons. The fitted lineshape did not perfectly match the experimental signal (Fig. 1a). Second, to overcome this issue, peaks were added (or removed) where the fitted and experimental lineshapes were not perfectly superposed. Two criteria were considered to judge the deconvolution result. First, any discrepancy between the fitted lineshape and the real signal should be undetectable by visual inspection after increasing spectral intensities by a factor of 8 starting from a full-spectrum view. Second, the repeatability (relative standard deviation based on 7 spectra) of deconvoluted peak intensities should be less than or equal to 10%. Wherever this precision was not reached but the added peaks were necessary for the required quality of fit, the first criterion prevailed. Therefore, in the case described in Fig. 1, another set of eight peaks led to a substantial improvement, as shown in Fig. 1b. As a result, the signals at 1.239, 1.250, and 1.284 ppm were each deconvoluted with two peaks instead of one (Fig. 1a & Fig. 1b). These peaks were deconvoluted with a high precision, as detailed later in this section (Table 1). In addition, intensities (or areas) of each pair of deconvoluted peaks were not highly correlated: R2 values were 0.233, 0.693, and 0.799 for A3i/A4i, A5i/A6i, and A7i/A8i, respectively (see Fig. 1b and Supplemental Table S-2 for labelling). Consequently, this processing approach allowed us to reveal hidden peaks, and thus uncover masked information.
It is worth mentioning that, for each spectral region, the peak list used for deconvolution was tested on 5 olive oil samples and seven spectra per sample before being adopted. In each case, the lineshape fitting and the precision on deconvoluted peaks were monitored (Supplemental Table S-4). Deconvoluted peaks of each spectral region and the reproducibility of their intensities and areas are reported in Supplemental Table S-2. The fits obtained are shown in Supplemental Fig. S-2. Eighty-one peaks were deconvoluted and quantitated (peaks of the methylene at position 2 region are not reported since they were not used as variables; the sum of their intensities or areas was only used to normalize other peaks). In order to determine repeatabilities (S) and within-lab reproducibilities (SRw) of peak areas and intensities, an olive oil sample was analyzed 7 times over a one-year period following the same experimental procedure described in Section 2.2 (7 spectra were recorded each time). S is the relative standard deviation of measurements from the 7 spectra recorded each time. SRw is the relative standard deviation of the mean values obtained from the 7 analyses performed over the one-year period. In general, peak intensities or areas determined with SRw lower than 10% were considered individually, whereas peak intensities or areas showing higher values were combined with their appropriate neighboring peaks, guided by the resulting SRw. As for the signal corresponding to methyl protons of Ln, 5 peaks were added to perfectly fit its lineshape; yet only the sum of their intensities and that of their areas were considered. As a result, 58 variables (available as intensities and areas) were obtained (Supplemental Tables S-2 & S-3).
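The two precision indicators defined above can be computed from a table of replicate measurements of one peak, with one row per analysis session. The sketch below is illustrative only: the use of the sample standard deviation (ddof=1), the function names, and the numbers are assumptions rather than details taken from this work.

```python
import numpy as np

def rsd_percent(values, axis=None):
    """Relative standard deviation in percent (sample standard deviation)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(axis=axis, ddof=1) / values.mean(axis=axis)

def repeatability_and_reproducibility(measurements):
    """measurements has shape (n_sessions, n_replicate_spectra).

    Returns S for each session and SRw computed on the session means.
    """
    measurements = np.asarray(measurements, dtype=float)
    s_per_session = rsd_percent(measurements, axis=1)
    s_rw = float(rsd_percent(measurements.mean(axis=1)))
    return s_per_session, s_rw

# Hypothetical intensities of two neighboring peaks (2 sessions x 3 spectra).
peak_a = np.array([[10.0, 12.0, 11.0], [14.0, 15.0, 16.0]])
peak_b = np.array([[20.0, 18.0, 19.0], [16.0, 15.0, 14.0]])

_, srw_a = repeatability_and_reproducibility(peak_a)
_, srw_sum = repeatability_and_reproducibility(peak_a + peak_b)
```

In this invented case the two peaks vary in opposite directions between sessions, so their sum is reproducible (SRw of 0%) although each peak alone is not, which is the situation in which neighboring peaks are combined into a single variable.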
3.2. Improvement of deconvolution through RLA
As mentioned above, series of 7 spectra were recorded for each olive oil sample. Although acquisition parameters were carefully calibrated at the beginning of each experiment, many factors may have induced lineshape deformities. Even though shim correction was done prior to acquisition, disruption of magnetic field homogeneity, among other factors, led to lineshape distortions and a lack of superposition throughout the 7 replicate spectra (Fig. 2a). Aiming to overcome distortions and to improve spectral superposition, RLA was applied using the CHCl3 signal as a reference. A "perfect" Lorentzian shape of this signal and an improved resolution of the other signals (Fig. 2c) were observed, leading to a better superposition of replicate spectra (Fig. 2b). In order to evaluate the impact of RLA on repeatabilities of peak intensities and areas, the region corresponding to aliphatic protons in raw spectra (RS) and in reference lineshape adjusted spectra (RLAS) was deconvoluted using the peak list shown in Fig. 1b. RLA resulted in a better repeatability of peak intensities and areas (Table 1). As shown in Table 1, intensities were obtained with better S and SRw than areas. In particular, when areas or intensities were determined with low precision in raw spectra (e.g., A7 and A8 in Table 1), a dramatic improvement of repeatability was noticeable after RLA. Similar behavior was observed with five other samples chosen randomly (Supplemental Table S-4). Precision high enough to conduct metabolomics was reached for most of the deconvoluted peaks of the triacylglycerol spectrum (Supplemental Table S-2). Therefore, RLA was applied to the whole set of samples considered in this study. Using this spectral processing approach, high-resolution metabolomics of the complex triacylglycerol matrices became possible by 1H NMR spectroscopy.
3.3. Effect of replicate spectra on reproducibility
The effect of replicate spectra on within-lab reproducibility was evaluated for all the deconvoluted regions. As shown in Supplemental Fig. S-3, the SRw of intensities and areas decreased significantly when 2 spectra were recorded each time instead of 1. Although SRw continued to decrease up to seven replicate spectra, four spectra were a good compromise between precision and acquisition/processing time.
3.4. Improved classification of olive oil samples
The large number of variables (intensities and areas of 58 peaks) obtained by applying the spectral deconvolution method described above (see Section 2.3.2) afforded a highly detailed profiling of triacylglycerol matrices compared with the information obtained by signal integration, either integration of the well-resolved signals or of the buckets around discernable peaks in the 1H NMR spectra (see Section 2.3.1 and Supplemental Table S-1). In order to test the potential of the presented approach in the authentication of such matrices, a comparative classification study was conducted using authentic olive oil samples. Oils from high and low altitudes were not well discriminated using variables from spectral integration of signals (Fig. 3a) or buckets (Fig. 3b). However, as shown in Fig. 3c, an excellent separation (LDA-Er = 0%) between oils from high and low altitudes was reached using 6 variables from deconvoluted RLAS. LOO and 10-fold-CV error rates of only 3.4% and 4.2%, respectively, pointed to the robustness of the model. Note that variations in peak width (W, width at half height) from one sample to another were not exactly the same for all peaks. For this reason, in the present multivariate model, both intensities and areas of peaks A7 (A7i and A7s = W × A7i) and A8 (A8i and A8s = W × A8i) were necessary in order to compensate for the inhomogeneous width variation occurring between samples (Fig. 3c).
A similar enhancement was observed in the classification of olive oils according to the color of the olive drupes. While weak classifications were observed using variables obtained by integration of signals (Fig. 4a) or buckets (Fig. 4b), a much better and more robust classification model (LDA-Er = 0%, LOO-Er = 0%, and 10-fold-CV-Er = 1.1%) was constructed using 6 variables from deconvoluted RLAS (Fig. 4c). These results highlight the contribution of our approach to the improvement of olive oil classification.
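The error rates used above to judge the classification models can be illustrated with a two-class Fisher linear discriminant and a leave-one-out loop. This is a minimal NumPy sketch and not the TANAGRA procedure: equal class priors are assumed, variable selection by Wilks' λ and k-fold cross-validation are omitted, and the data are invented.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class Fisher LDA; returns the weight vector and the decision threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    scatter = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(scatter, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0           # midpoint, assuming equal priors
    return w, threshold

def predict_lda(model, X):
    w, threshold = model
    return (X @ w > threshold).astype(int)

def resubstitution_error(X, y):
    """Error rate when the model is tested on its own training samples."""
    return float(np.mean(predict_lda(fit_lda(X, y), X) != y))

def loo_error(X, y):
    """Leave-one-out error rate: each sample is predicted by a model built without it."""
    errors = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit_lda(X[keep], y[keep])
        errors += int(predict_lda(model, X[i:i + 1])[0] != y[i])
    return errors / len(y)

# Hypothetical, well-separated data: two spectral variables, two classes.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [5, 5], [6, 5], [5, 6], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
```

A leave-one-out error rate close to the resubstitution error rate, as reported for the models above, indicates that the separation does not depend on any single sample.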
3.5. Quantitation of individual fatty acids through linear regression models
Classical processing of 1H NMR spectra of lipids, using signal integration, allows calculation of molar percentages of linoleic and linolenic acids in olive oil [11] and molar percentages of specific categories of fatty acids (FA) such as ω3 in fish oil [37]. Improved processing of 1H NMR spectra combined with bioinformatics tools allowed the quantitation of more FA categories such as ω6 and ω9, as well as the combined quantitation of arachidonic and eicosapentaenoic acids in serum [12]. Only a few studies report the quantitation of individual fatty acids by 1H NMR. Most cases dealt with the quantitation of fatty acids unique in their type in the studied matrix. In this respect, Willker and Leibfritz used the high resolving power of 800 MHz NMR to identify, by means of 2D NMR experiments, specific signals in 1H NMR spectra related to particular unsaturated fatty acids in tissues and body fluids. Based on the identified peaks, the authors quantitated the corresponding acids via 1H NMR using a 600 MHz spectrometer [38]. In order to test the potential of our spectral processing method in this field, areas and intensities of deconvoluted peaks were used as predictors (independent variables) in PLSR, and molar percentages of each FA obtained by the GC method were used as targets (dependent variables). Quantitation models were constructed based on a training set of 30 olive oil samples and were assessed using the cross-validation test (8-fold) as well as by external validation with 6 samples chosen randomly. In addition to linoleic and linolenic acids, this approach allowed the individual quantitation of three monounsaturated fatty acids having close 1H NMR profiles: oleic (C18:1, ω9), palmitoleic (C16:1, ω7), and vaccenic (C18:1, ω7) acids. Regression coefficients, intercepts, and p values of the corresponding models are summarized in Table 2. Correlations between FA percentages obtained by GC and those predicted using the 1H NMR method are shown in Fig. 5 along with the statistical parameters of the prediction models.
A robust model for quantitation of oleic acid was obtained using 3 PLS components based on 5 spectral variables (Table 2 & Fig. 5a). Q2cum and Pred-R2 of the constructed model were 0.975 and 0.988, respectively, whereas R2 between oleic acid and MUFA was 0.977 over the 6 test samples, which is lower than Pred-R2 for the same samples. Thus, the present model predicted oleic acid amounts more accurately than MUFA did. Only one of the variables used in the model, variable A6i, was highly correlated with oleic acid percentage (n = 36, R2 = 0.9398, F = 530.8, p < 0.00001). The other variables had negative coefficients in the model and showed moderate to high correlations with percentages of other fatty acids over the 36 samples (A1s with linolenic acid, R2 = 0.8713, F = 230.2, p < 0.00001; A5i with the sum of palmitoleic and vaccenic acids, R2 = 0.7975, F = 133.9, p < 0.00001; A7s with palmitic acid, R2 = 0.7478, F = 100.8, p < 0.00001; and A3i with linoleic acid, R2 = 0.5766, F = 46.3, p < 0.00001). Their role in the model was probably to subtract the contribution of acids other than oleic from A6i. On the other hand, variable A4i (not in the model) was highly specific to oleic acid (n = 36, R2 = 0.9748, F = 1315.2, p < 0.00001). It can be used to quantitate this acid with a higher accuracy than by means of MUFA.
For quantitation of palmitoleic acid, a robust model was constructed with 5 PLS components of 6 spectral variables each (Table 2 & Fig. 5b). Likewise, PLSR allowed construction of a model for quantitation of vaccenic acid (5 PLS components of 6 spectral variables each, Table 2 & Fig. 5c). Predictive powers of these models were assessed by Q2cum (0.968 and 0.932, respectively) and Pred-R2 (0.935 and 0.906, respectively). For the models used in quantitation of palmitoleic and vaccenic acids, both intensity and area of peak A2 (A2i and A2s, respectively, Table 2) were used. The interpretation reported in Section 3.4 to explain the simultaneous use of variables A7i, A7s, A8i, and A8s is also valid in this case.
Moreover, deconvolution allowed a direct quantitation of linoleic acid (L). Intensity AL3i (Supplemental Fig. S-4) at 2.037 ppm, obtained with a high reproducibility (0.54%), was highly correlated with L over the 36 studied samples (L = 0.394 × AL3i − 0.112, n = 36, R2 = 0.970, p < 0.00001). Similarly, quantitation of Ln was more precise using the sum of intensities of its methyl protons (Ln = 0.053 × SumLNi − 0.007, n = 36, R2 = 0.984, p < 0.00001) than using the integral of the corresponding signal (n = 36, R2 = 0.923, p < 0.00001). It is worth noting that it was not possible to quantitate oleic, palmitoleic, and vaccenic acids by using the 29 bucket integrals as predictors (see Section 2.3.1 and Supplemental Table S-1).
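The two single-variable calibrations reported above can be applied directly, and the F values quoted with each simple correlation can be cross-checked from R2 and n. The sketch below assumes that the F statistic is that of a simple linear regression with one predictor, F = R2 (n − 2) / (1 − R2); the function names and the example intensity are hypothetical.

```python
def linoleic_from_al3i(al3i):
    """L predicted from the normalized intensity AL3i (calibration reported above)."""
    return 0.394 * al3i - 0.112

def linolenic_from_sum_lni(sum_lni):
    """Ln predicted from the sum of normalized Ln methyl intensities."""
    return 0.053 * sum_lni - 0.007

def f_statistic(r2, n):
    """F of a one-predictor linear regression (assumed form of the reported F)."""
    return r2 / (1.0 - r2) * (n - 2)

example_l = linoleic_from_al3i(25.0)        # hypothetical intensity
f_a6i = f_statistic(0.9398, 36)             # reported as 530.8
f_a4i = f_statistic(0.9748, 36)             # reported as 1315.2
f_a1s = f_statistic(0.8713, 36)             # reported as 230.2
```

Under this assumption the recomputed values agree with the reported F statistics to the quoted precision, which supports reading each of them as a one-variable regression over the 36 samples.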
4. Conclusion
In this work, “high-resolution processing” was applied to 1H NMR spectra of a complex matrix with the aim of improving its compositional profiling. To test our approach, spectra of olive oil samples were first treated by means of the reference lineshape adjustment technique after manual phasing and automatic baseline correction. This step improved the superposition of replicate spectra by correcting distortion and resolution issues. Second, each spectral region was deconvoluted separately after iterative optimization of peaklists guided by the precision on intensities or areas of resulting peaks. As a result, 58 variables (available as intensities and areas) determined with high precision were obtained. Last, to assess the usefulness of the processing method applied, these variables were used as predictors in statistical multivariate analyses for classification of samples and for quantitation of individual fatty acids. Classifications of samples according to the altitude of olive fields and to the color of olive drupes were improved compared with those obtained with variables resulting from
integration of well-resolved signals or buckets around discernable peaks. Besides, this strategy allowed quantitation of oleic, palmitoleic, and vaccenic acids. As is commonly known, 1H NMR allows analysis of homogeneous mixtures without prior separation but lacks the resolution needed for complex matrices. The method presented herein has afforded a substantial improvement in the extraction of metabolomic information from 1H NMR spectra of triacylglycerols by partially overcoming spectral overlap and by improving the precision of spectral quantitation. Since triacylglycerols are present in most plant extracts, animal tissues, and fluids, this method could be of general use in metabolic studies as well as in characterization and authentication of food products.
Acknowledgment
The authors would like to acknowledge the National Council for Scientific Research of Lebanon (CNRS-L) and the Research Council of Saint-Joseph University of Beirut for granting a doctoral fellowship to G.H. The CORSAIRE platform from Biogenouest is also acknowledged.
Figures
Fig. 1. Fitted lineshape obtained from deconvolution of signal corresponding to aliphatic protons using a list of 5 peaks (a) or 8 peaks (b). In the latter case, the resulting lineshape perfectly fitted the experimental one.
Fig. 2. Signal corresponding to CHCl3 over 7 replicate spectra before (a) and after (b) RLA and that of methyl protons of fatty acids before and after RLA (c).
Fig. 3. Classification of olive oils according to the altitude of the olive field using variables obtained from integration of well-resolved signals (a), from integration of buckets around discernable peaks (b), and from deconvolution of RLAS (c). Variables used in classifications are reported in decreasing order of importance based on the value of their Wilks’ λ parameter.
Fig. 4. Classification of olive oils according to the color of olive drupes using variables obtained from integration of well-resolved signals (a), from integration of buckets around discernable peaks (b), and from deconvolution of RLAS (c). Variables used in classifications are reported in decreasing order of importance based on the value of their Wilks’ λ parameter.
Fig. 5. Correlations between fatty acid percentages obtained by GC (x-axis) and those predicted using the high-resolution 1H NMR spectra processing (y-axis). ◦ Training samples, • test samples for model validation, p = 0 means that it is less than 0.00001.
Table
Table 1 Precision of peak intensities and areas without and with RLA
Peak     Intensity                               Area
label^a  RS              RLAS                    RS              RLAS
         Mean^b  S%      Mean^b  S%      SRw %   Mean^b  S%      Mean^b  S%      SRw %
A1       28.8    7.86    26.1    2.06    1.08    84.8    8.99    95.1    9.10    2.93
A2       91.0    9.63    79.2    3.76    0.69    558.1   7.96    570.4   7.22    1.43
A3       395.7   7.93    337.9   1.08    0.83    2495.5  1.88    2487.1  1.39    1.10
A4       231.8   6.45    207.8   2.78    1.62    458.6   4.13    457.3   3.27    1.91
A5       133.0   6.59    129.1   3.26    0.73    579.2   5.82    650.6   5.32    2.56
A6       467.3   7.96    403.3   1.68    0.79    1507.1  4.08    1457.8  2.59    1.15
A7       311.3   24.35   232.8   4.77    2.60    472.5   19.45   386.1   2.31    2.03
A8       304.6   18.17   345.3   4.81    1.99    356.0   20.24   413.1   9.69    3.69
a See Fig. 1b for peak labelling.
b Mean values correspond to average of intensities or areas throughout 7 replicate spectra. Intensities and areas were normalized as described in Section 2.3.2.
Table 2 Models of individual fatty acid quantitation in Lebanese olive oil samples
Predictor^a    Intercept and regression coefficients corresponding to fatty acids regression models^b
               Oleic    Palmitoleic    Vaccenic
Intercept
98.2087
6.6823
A1s
-0.0504
-0.0155
5.6454
A2i
0.2780
0.0626
A2s
-0.0186
-0.0084
-0.0077
-0.0128
A3i
-0.0631
A4s A5i
-0.0829
A6i
0.0531
A7s
-0.0309
AL6s
0.0433 -0.0327 -0.0051 0.0044
DAL5i
-0.4380
a See Supplemental Table S-2 for labelling
b Models were trained using 30 olive oil samples and assessed by cross-validation (8-fold) and external validation (6 test samples)
Highlights
• Improved lipid profiling by reference lineshape adjustment of 1H NMR spectra
• High resolution profiling of Triacylglycerols using deconvolution of 1H NMR spectra
• Improved geographical and morphological classifications of olive oils
• Quantitation of individual fatty acids within triacylglycerols by 1H NMR