Classification and authentication of Iranian walnuts according to their geographical origin based on gas chromatographic fatty acid fingerprint analysis using pattern recognition methods


Chemometrics and Intelligent Laboratory Systems xxx (2017) 1–8

Contents lists available at ScienceDirect

Chemometrics and Intelligent Laboratory Systems journal homepage: www.elsevier.com/locate/chemometrics

Mahnaz Esteki a,*, Bahman Farajmand a, Setareh Amanifar b, Roghaye Barkhordari a, Zahra Ahadiyan a, Elham Dashtaki a, Mina Mohammadlou a, Yvan Vander Heyden c

a Department of Chemistry, University of Zanjan, Zanjan 45195-313, Iran
b Department of Agriculture, University of Zanjan, Zanjan 45195-313, Iran
c Department of Analytical Chemistry, Applied Chemometrics and Molecular Modelling, Center for Pharmaceutical Research (CePhaR), Vrije Universiteit Brussel (VUB), Laarbeeklaan 103, B-1090 Brussels, Belgium

ARTICLE INFO

Keywords: Classification; Chromatographic fingerprint; Iranian walnut; Multivariate data analysis; PCA-LDA

ABSTRACT

Recently, food authenticity has raised worldwide attention in food manufacturing and a growing concern about food qualification, based on a clear regional identity, is noticed. Therefore, the development of suitable methodologies allowing the characterization of different products, based on their geographical origin, is of great importance. In this study, the potential of gas chromatographic fatty acid fingerprints in combination with multivariate data analysis was examined to classify walnuts from different regions in Iran according to their geographical origins. Walnut samples were collected during the harvesting period 2013–2014 from six regions in Iran. Chromatographic fingerprints of the walnut oil were employed to discriminate the walnut origin. Principal component analysis-Linear discriminant analysis (PCA-LDA) results showed that the six regions of geographical origin can be identified based on the fatty acid fingerprints. Almost all samples were correctly classified by the PCA-LDA model using cross validation (99.2%). The average percent correct classification for the prediction set was 98.3%, indicating the satisfactory performance of the model. A high percentage of correct classifications for the training data demonstrates the strong relationship between the fatty acid profile and the origin, while a high percentage for the prediction set shows the ability to indicate the origin of an unknown sample based on its fatty acid chromatographic data.

1. Introduction

Walnuts form a significant source of nutrients and can also be identified as a profitable agricultural product. The walnut kernel is a high-quality source of fatty acids and tocopherols (vitamin-E homologues) that mainly contribute to reducing blood cholesterol levels, leading to a reduced risk of coronary heart disease [1,2]. The walnut kernel generally contains about 60% oil by weight; however, this may range from 52% to 70%, depending on the cultivar, availability of water and geographic location [3,4]. The major constituents of the oil are triglycerides. The triglycerides of walnut oil contain high levels of monounsaturated and polyunsaturated fatty acids; the main fatty acids are linoleic (57–62%), oleic (12–20%), linolenic (11–16%) and palmitic (6–8%) acids [4].

Currently, the three most important varieties of walnut that are commonly employed are Juglans regia L., Juglans cinerea L. and Juglans nigra L. The Persian walnut (Juglans regia L.), which is widely cultivated around the world, has the highest quality among the walnut varieties. It has a sweet taste, a relatively large kernel and a thin shell, which makes it easy to crack [5]. The Persian walnut, originating from Central Asia, mainly grows in mild and dry environments with low rainfall, such as the Middle East and Mediterranean climates. Iran is the third largest walnut producer worldwide, with an annual production of 150,000 tons (11% of the world's total walnut production) [6]. Walnut orchards are spread all around the country, since most regions fulfill the requirements for growing this native tree.

Considering geographic specifications of quality, research efforts have been focused on the classification of food products according to their geographical origins. Recent studies are based on the analysis of certain chemicals, such as fatty acids [7], aroma components and multi-element compositions [8–11]. This can be achieved using a wide range of instrumental techniques, such as gas chromatography (GC) [12], gas chromatography with mass spectrometry (GC–MS) [13], liquid chromatography with mass spectrometry (LC–MS) [14] and nuclear magnetic resonance (NMR) [15].

Fingerprint analysis has become one of the most powerful systematic approaches to determine authenticity. A fingerprint is a characteristic profile of a sample, which can be established through common techniques, such as chromatography and spectroscopy. In this regard, chromatographic fingerprints are appropriate records [16,17], reflecting the chemical composition, for quality control, discrimination or classification of numerous food products [18–23]. A chromatographic fingerprint constitutes highly complex multivariate data; consequently, small differences between similar chromatograms may be overlooked on visual inspection. Therefore, chemical pattern recognition methods, such as principal component analysis (PCA) and linear discriminant analysis (LDA), can be considered reasonable methods to classify the samples. Chromatographic fingerprinting combined with chemometric pattern recognition methods has been widely used in food product analysis [21,24–27].

In recent years, fatty acid profiling has become a promising approach to distinguish fat-rich foods by geographical origin or botanical identity [28–30]. However, classification of walnuts based on gas chromatographic fatty acid contents has not yet been reported in the literature. In this study, the potential of gas chromatographic fatty acid fingerprints in combination with multivariate data analysis was examined to classify walnuts from different regions in Iran according to their geographical origins.

To the best of our knowledge, no reports have been published on the classification of Iranian walnuts according to their geographical origin, nor were any reports found on the application of gas chromatographic fatty acid fingerprinting for the authentication of walnuts based on their geographical origin. However, walnut quality differs between the regions of Iran, and Iranian consumers are aware of these quality differences. This work is a first step towards traceability and authentication of this special product. Therefore, the main aim of this study was to establish a reliable model using pattern recognition methods and GC fingerprints for the classification and authentication of Iranian walnuts. For this purpose, the chromatographic fatty acid profiles of walnut samples produced in six Iranian regions were considered. Unsupervised (PCA) and supervised (SIMCA and PCA-LDA) pattern recognition techniques were both applied to analyze the obtained data. The predictive ability of each model was evaluated by the statistical parameters of a prediction set.

* Corresponding author.
E-mail addresses: [email protected] (M. Esteki), [email protected] (B. Farajmand), [email protected] (S. Amanifar), [email protected] (R. Barkhordari), z_ahadiyan@yahoo.com (Z. Ahadiyan), [email protected] (E. Dashtaki), [email protected] (M. Mohammadlou), [email protected] (Y. Vander Heyden).
https://doi.org/10.1016/j.chemolab.2017.10.014
Received 26 April 2017; Received in revised form 7 October 2017; Accepted 18 October 2017
Available online xxxx
0169-7439/© 2017 Elsevier B.V. All rights reserved.

2. Materials and methods

2.1. Chemicals and reagents

Methanol, sulfuric acid (98%) and sodium hydroxide (NaOH) were purchased from Merck (Darmstadt, Germany). Light petroleum ether (bp 40–60 °C, analytical grade) was supplied by Daejung (Shiheung, Gyeonggi-Do, Republic of Korea).

2.2. Sample collection

Walnut samples were collected from six geographical regions in Iran. The locations of these regions, Bavanat (Bav), Maragheh (Mar), Ramsar (Ram), Tuyserkan (Tuy), Saman (Sam) and Alamout (Ala), are shown on the map (Fig. 1). Six orchards were selected in each region for sampling during the harvesting period 2013–2014 (October 2013). An amount of 1 kg of walnuts was purchased from each orchard. The further sampling followed a random selection of 5 walnuts from every 1 kg bag. The walnuts were oven-dried at 30 °C for at least 5 days. Afterwards, the samples were stored in a refrigerator at 4 °C until sample preparation.

Fig. 1. Map of Iran: illustration of the six geographical origins of the walnut samples.

2.3. Preparation of walnut samples

The sample preparation process included two steps, crude oil extraction and methyl esterification. In the first step, finely ground kernels (2.0 g) and 10.0 mL of light petroleum ether were placed in a 25-mL round-bottom flask equipped with a magnetic stirring bar and a reflux condenser. The extraction was performed at 70 °C for 30 min, and the resulting solution was centrifuged for 20 min at 3000 rpm. The clear solvent was separated and transferred to another flask. Finally, the crude oil was collected after solvent evaporation by vacuum distillation. In the next step, the method proposed by Rogozinski [31] was implemented for saponification and esterification. Saponification of the extracted oil was performed in 10.0 mL of methanolic sodium hydroxide solution (10% w/v) under reflux for 30 min. Then approximately 1.0 mL of concentrated sulfuric acid was added to the mixture, which was refluxed for 1 h. The flask was cooled to room temperature and 2.0 mL of petroleum ether was added. The mixture was transferred quantitatively to a screw-cap test tube and centrifuged for 15 min. After centrifugation, the upper layer was collected in a screw-cap vial and stored at 4 °C until gas chromatographic analysis.

2.4. Chromatographic conditions

Fatty acid methyl ester (FAME) analysis was performed on a gas chromatograph (7890 N series, Agilent Technologies, Santa Clara, CA, USA) equipped with a split/splitless injector and a flame ionization detector (FID). Separation was achieved on a DB-WAX fused silica capillary column (30 m × 0.25 mm, 0.25 μm film thickness; Agilent J&W Scientific, Folsom, CA, USA). The flow rates of the nitrogen carrier gas and of the H2 gas for the FID were set at 1.6 mL min⁻¹ and 30 mL min⁻¹, respectively. An amount of 1.0 μL of FAMEs dissolved in petroleum ether was injected into the instrument using a split ratio of 30:1. The sample was injected using the "hot needle injection" technique in order to improve the repeatability [32]. The column was maintained at 50 °C for 1 min. Then, the temperature was increased to 150 °C at a rate of 20 °C/min and held at 150 °C for 5 min. In the next step, the temperature was increased again at a rate of 10 °C/min to 280 °C, which was maintained for 15 min. The temperatures of the injector and detector were set at 240 °C and 250 °C, respectively. The chromatograms were exported as CSV files by the ChemStation software and subjected to multivariate data analysis.

2.5. Multivariate data analysis

The chromatograms of the walnut samples were randomly divided into a calibration set (120 samples) and a prediction set (60 samples). The calibration samples were used for exploratory data analysis (PCA) and model building (SIMCA and PCA-LDA), while the prediction samples were used to evaluate the model performance. PCA, SIMCA and PCA-LDA were carried out using an in-house program implemented in Matlab (version 6.5; Mathworks, Natick, MA, USA).

2.5.1. Principal component analysis (PCA)

PCA is a widely used technique for visualization, exploration and dimension reduction of multivariate data. It is also an unsupervised pattern recognition method, which has been applied for data understanding and anomaly detection. PCA provides new latent variables (principal components) which are linear combinations of the original variables. The variation in the data is captured in decreasing amounts by the consecutive components.

2.5.2. Outlier detection

Outlier detection (leverage points and orthogonal outliers) is an important part of exploratory multivariate data analysis. PCA is a technique sensitive to outlying observations. Orthogonal outliers are objects with a large orthogonal distance (OD) to the PCA space. Some samples may also have a large score distance (SD), which means that their projection in the PCA space is far away from the center. Points with both high SD and high OD are called bad leverage points because they can lever the estimation of the PCA space. They should be removed from the data set prior to further data analysis. On the other hand, points with high SD and low OD are good leverage points, because outliers of this type even stabilize the estimation of the PCA space [33]. In order to diagnose outliers, SD and OD values were computed for all chromatograms belonging to each specific class. Then, the SD and OD values were plotted together with critical boundaries (Fig. 3). The SD values were computed as follows:

$$\mathrm{SD}_i = \left[ \sum_{k=1}^{a} \frac{t_{ik}^{2}}{v_k} \right]^{1/2}$$

where $a$ is the number of PCs forming the PCA space, $t_{ik}$ are the elements of the score matrix and $v_k$ is the variance of the kth PC. If the data are normally distributed, the squared score distances can be approximated by a chi-square distribution [33]. Therefore, a cutoff value for the score distance could be $\sqrt{\chi^{2}_{a,b}}$, with $a$ degrees of freedom and confidence level $b$. The degrees of freedom are equal to the number of PCs included in the calculation. OD values were calculated according to the following equation:

$$\mathrm{OD}_i = \left\lVert x_i - P\, t_i^{T} \right\rVert$$

where $x_i$ is the ith object of the centered data matrix, $P$ is the loading matrix using $a$ PCs and $t_i^{T}$ is the transposed score vector of object $i$ for $a$ PCs. The cutoff value $c_{\mathrm{OD},j}$ is then calculated as follows:

$$c_{\mathrm{OD},j} = \left[ \mathrm{median}\left(\mathrm{OD}^{2/3}\right) + \mathrm{MAD}\left(\mathrm{OD}^{2/3}\right) \cdot z_{0.975} \right]^{3/2}$$

In this equation, MAD returns the median absolute deviation of the $\mathrm{OD}^{2/3}$ values and $z_{0.975}$ is the 0.975 quantile of the standard normal distribution.

2.5.3. Soft independent modeling of class analogy (SIMCA)

SIMCA is a classification technique that applies PCA to each class of the calibration data and then calculates a residual parameter (a distance-to-the-model measurement) for new measurements to decide on their class membership. This determines which class is nearest to the new profile and classifies the sample according to a preset threshold. If the calculated distances to all classes are outside the threshold, the new sample is left unclassified. The orthogonal distance (OD) and score distance (SD) were calculated for each object in a class based on the above equations. Using the cutoff values $c_{\mathrm{SD},j} = \sqrt{\chi^{2}_{a_j,\,0.975}}$ for the score distance and $c_{\mathrm{OD},j}$ for the orthogonal distance, the distance measures were standardized and combined to result in a score value $d_j^{D}$ as follows:

$$d_j^{D}(x) = \gamma \left( \frac{\mathrm{OD}_j}{c_{\mathrm{OD},j}} \right) + (1 - \gamma) \left( \frac{\mathrm{SD}_j}{c_{\mathrm{SD},j}} \right) \quad \text{for } j = 1, \ldots, k$$

Here, γ is a tuning parameter taking values in the interval [0, 1] that adjusts the relative importance of the orthogonal and score distances for the classification. We used cross-validation to find the optimum value of γ. The "soft" classification rule of SIMCA defines that an object x is assigned to the groups for which the score value is smaller than 1. Moreover, objects that do not fit any of the groups (the standardized score and/or orthogonal distances are larger than 1 for all groups) are not assigned and are treated as outliers.

2.5.4. Linear discriminant analysis (LDA)

LDA is a well-known supervised pattern recognition technique, which is based on locating linear combinations of features [33]. These combinations are transformation functions that maximize the ratio of between-group variance to within-group variance. The transformation rotates the space in such a way that, when observations are projected onto the new space, the differences between the groups are maximized. For high-dimensional data, such as the chromatographic data set employed in this work, the use of appropriate variable selection procedures is usually required.

3. Results and discussions

3.1. Gas chromatographic fatty acid fingerprinting

Fig. 2 shows a typical FAME GC chromatogram of a walnut sample. Gas chromatography-mass spectrometry (GC-MS) was employed to identify the fatty acid profile of the walnut oil. Palmitic acid (C16:0) and stearic acid (C18:0) as saturated fatty acids, and oleic acid (C18:1), linoleic acid (C18:2) and linolenic acid (C18:3) as unsaturated fatty acids, are the major compounds in the chromatogram. Other fatty acids, such as hexadecenoic acid (C16:1), heptadecanoic acid (C17:0), heptadecadienoic acid (C17:2), eicosenoic acid (C20:1) and arachidic acid (C20:0), are found as minor components. In many studies, only the detected and integrated peaks in the chromatograms are evaluated during fingerprint analysis. This approach has two limitations. Finding all peaks can be time consuming, especially when a large number of samples is investigated. In addition, some details in the chromatogram may be ignored, which could affect the outcome of the study. Consequently, the whole chromatogram was employed in this work and the resulting two-dimensional matrix (180 × 3600; 180 samples and 3600 retention times) was used for subsequent chemometric analysis.
As a result, the differences between samples are not only found in the main, but also in the low-concentration compounds. In this way, many details are taken into account during chromatographic fingerprinting [34].
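Before the outlier results are discussed, the diagnostics of Section 2.5.2 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the authors' in-house Matlab program. The function names are hypothetical, the scaling constant 1.4826 applied to the MAD is an assumption (the text does not state whether the MAD is scaled), and the score-distance cutoff is passed in as a number because the Python standard library has no chi-square quantile function. The cutoffs 11.9 and 22.8 are the values reported for the Saman class in Section 3.2. The SD/OD pair (12.5, 10.0) is invented for illustration.

```python
import math
import statistics


def score_distance(scores, variances):
    """SD_i = sqrt(sum_k t_ik^2 / v_k) for the PCA scores of one object."""
    return math.sqrt(sum(t * t / v for t, v in zip(scores, variances)))


def od_cutoff(od_values, quantile=0.975, mad_scale=1.4826):
    """c_OD = (median(OD^(2/3)) + MAD(OD^(2/3)) * z_q)^(3/2).

    mad_scale=1.4826 makes the median absolute deviation consistent with the
    standard deviation for normal data; using it here is an assumption.
    """
    transformed = [od ** (2.0 / 3.0) for od in od_values]
    med = statistics.median(transformed)
    mad = mad_scale * statistics.median([abs(v - med) for v in transformed])
    z = statistics.NormalDist().inv_cdf(quantile)
    return (med + mad * z) ** 1.5


def diagnose(sd, od, sd_cut, od_cut):
    """Label one object from its score and orthogonal distances."""
    if sd > sd_cut and od > od_cut:
        return "bad leverage point"
    if sd > sd_cut:
        return "good leverage point"
    if od > od_cut:
        return "orthogonal outlier"
    return "regular observation"


# Hypothetical object: SD slightly above, OD below the reported Saman cutoffs.
print(diagnose(12.5, 10.0, sd_cut=11.9, od_cut=22.8))
```

With these inputs the object is labelled a good leverage point, which is the situation described for the two borderline Saman samples that were kept in the data set.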

Fig. 2. Representative fatty acid methyl esters GC chromatogram of an extract from walnut kernel.

3.2. Outlier detection

The calculated SD and OD values allow regular observations to be distinguished from outliers. Fig. 3 shows the SD and OD plots of a specific class (Saman). Two objects have SD values above the cutoff value (11.9). However, they are only borderline significant and, because both points have OD values lower than the cutoff value (22.8), they are good leverage points, which are kept in the data set. Applying this procedure revealed no outliers among the different chromatograms.

Fig. 3. The score distance and orthogonal distance plots of the Saman samples.

3.3. Unsupervised pattern recognition analysis using PCA

In order to provide an overview of the capacity of the fatty acid compounds to discriminate the walnut samples, PCA was applied to the 120 autoscaled chromatograms (calibration set). The score variations of the walnut profiles, based on the differences in their GC fingerprints, are illustrated in Fig. 4. The first three PCs (PC1, PC2 and PC3) described 81% of the sample variability and allowed appropriate visualization and differentiation. The score plot of the first three PCs is shown in Fig. 4a. The samples can be classified into five separate groups according to their origins. The samples from Mar, Ram, Bav, Tuy, Ala and Sam are clearly located at different places on the score plot (the Ala and Tuy samples seem to overlap in Fig. 4a, whereas they are separated along the PC1 axis). It is important to note that the scores of the principal components are weighted sums of the original variables (the signal at different measurement times).

PCA has the advantage of the visualization possibilities given by the score and loading plots, often combined in score-loading bi-plots. The loading plot of the fatty acid chromatograms, using three principal components, is depicted in Fig. 4b. The retention times of specific peaks are shown in the loading plot. The relationship between the chromatographic retention times and the walnut samples can be inferred from similar locations in the bi-plots. Each point shown in Fig. 4b represents a retention time of a given component. The numbers in parentheses indicate given peaks: numbers 1 to 5 correspond to C16:0, C18:0, C18:1, C18:2 and C18:3, respectively. Some points, shown in red, are superimposed on the plot. These points belong to the baseline of the chromatograms and have no informative value for discriminating the samples. On the other hand, the remaining points, which are distributed through the three dimensions of the plot, are effective variables in the classification of the samples. These points belong to peaks 1 to 5 and are shown in black. The higher frequency of points belonging to C18:1 (3) and C18:2 (4) indicates the effectiveness of these fatty acids in discriminating the samples.

The results of PCA as an unsupervised pattern recognition technique were used for exploration of the data relationships. However, authentication still requires predicting the origin of an unknown sample. For this purpose, supervised pattern recognition methods were further applied to model the walnut fingerprints from the different regions.

Fig. 4. a) PCA score plot, and b) loading plot of the gas chromatographic fatty acid fingerprints from the walnut samples.

Fig. 5. The probability density function of projected points of the gas chromatographic fatty acid fingerprints by linear discriminant analysis using the third projection vector. a) PC = 8; b) PC = 9; c) PC = 10; d) PC = 11; e) PC = 12; f) PC = 13; g) PC = 14; h) PC = 15; i) PC = 16; j) PC = 17.

Fig. 6. Scatter plot of the projected points in 3-D space defined by the three discriminant functions of the LDA model constructed to classify walnut samples according to their geographical origin.

3.4. Supervised pattern recognition analysis
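The supervised models in this section are compared through per-class sensitivity, TP/(TP + FN), and specificity, TN/(TN + FP). As a worked illustration, the following Python sketch computes both from a confusion matrix. The counts are the PCA-LDA prediction-set results reported in Table 1 (rows are the original groups, columns the predicted groups). The function and variable names are hypothetical and the code is not part of the authors' Matlab program.

```python
def per_class_metrics(confusion):
    """Return (sensitivity, specificity) in percent, one value per class.

    confusion[i][j] is the number of samples of true class i that were
    predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    sensitivity, specificity = [], []
    for c in range(n_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                # class c missed
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp   # others called c
        tn = total - tp - fn - fp
        sensitivity.append(100.0 * tp / (tp + fn))
        specificity.append(100.0 * tn / (tn + fp))
    return sensitivity, specificity


REGIONS = ["Bav", "Mar", "Ram", "Tuy", "Sam", "Ala"]

# Prediction-set counts for the PCA-LDA model, taken from Table 1.
LDA_PREDICTION = [
    [10, 0, 0, 0, 0, 0],
    [0, 10, 0, 0, 0, 0],
    [0, 0, 9, 0, 1, 0],   # one Ramsar sample predicted as Saman
    [0, 0, 0, 10, 0, 0],
    [0, 0, 0, 0, 10, 0],
    [0, 0, 0, 0, 0, 10],
]

sensitivity, specificity = per_class_metrics(LDA_PREDICTION)
for region, sens, spec in zip(REGIONS, sensitivity, specificity):
    print(f"{region}: sensitivity {sens:.0f}%, specificity {spec:.0f}%")
print(f"mean sensitivity {sum(sensitivity) / len(REGIONS):.1f}%")
print(f"mean specificity {sum(specificity) / len(REGIONS):.1f}%")
```

The single Ramsar sample assigned to Saman lowers the Ramsar sensitivity to 90% and the Saman specificity to 98% (49 of 50 non-Saman samples rejected), which reproduces the averages of 98.3% and 99.7% given in Table 1.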

3.4.1. PCA-LDA

The walnut samples from six geographical regions were subjected to LDA, providing a mathematical model to classify and identify them according to their origins. The LDA model was constructed using 120 samples as calibration set and 60 samples as prediction set to test the classification ability. The calibration data matrix consisted of 120 objects (walnut samples) and 3600 variables (measurement points for each chromatogram). LDA has some limitations if the number of available samples is smaller than the dimension of the sample space (small sample size problem). In that case, traditional LDA encounters two difficulties. Under such circumstances, the within-class scatter matrix becomes singular and the LDA algorithm cannot be used directly [35]. Additionally, the high-dimensional data vectors lead to computational limitations. In order to avoid these problems, a two-stage approach, PCA followed by LDA, is widely used. In this technique, PCA is first used for dimensionality reduction before LDA is applied [35]. Therefore, because of the high-dimensional chromatographic data, PCA scores obtained from the chromatograms were used as input variables to the LDA model. The categorical dependent variable is the origin of the walnuts and the independent variables are derived from the chromatograms. In this method, the walnut samples from the different regions were coded with the values 1 to 6. A probability density function of projected points [36], using the third projection vector, was used to optimize the number of PCs for LDA modeling (Fig. 5). Based on the graphs, the best performance in distinguishing the regions corresponds to 14 PCs (Fig. 5g). Therefore, 14 PCs were used for LDA model building. With this model, an excellent resolution between all classes was obtained (Fig. 6).

Subsequently, the constructed PCA-LDA model was employed to predict the origin of unknown samples (60 samples). The PCA-LDA classification results of all test samples are shown in Table 1. Almost all samples from the calibration set were correctly classified by the PCA-LDA model using leave-one-out cross validation (99.2%). Bootstrapping was also performed, using five samples for prediction in each step; the average accuracy for this process was 98.5%. The sensitivity of each class, calculated as TP/(TP + FN) (TP: true positive, FN: false negative), and the specificity, calculated as TN/(TN + FP) (TN: true negative, FP: false positive), are also reported in Table 1. The model provided an overall accuracy of 98.3% for the prediction set; the percentage of correct classification per class was 100.0%, with one exception at 90% (Ram; Table 1). A high percentage of correct classification of the training data represents a strong relationship between the fatty acid profile and the origin, while a high percentage of correct predictions for the prediction set indicates the ability to classify the origin of an unknown sample based on the fatty acid chromatographic data.

Table 1
Predicted group membership of Iranian walnuts according to their geographic origin using LDA (model: LDA).

Original group      Number of cases   Predicted group                  Accuracy (%)   Sensitivity   Specificity
                                      Bav  Mar  Ram  Tuy  Sam  Ala
Bav                 10                10   0    0    0    0    0       100            100           100
Mar                 10                0    10   0    0    0    0       100            100           100
Ram                 10                0    0    9    0    1    0       90             90            100
Tuy                 10                0    0    0    10   0    0       100            100           100
Sam                 10                0    0    0    0    10   0       100            100           98
Ala                 10                0    0    0    0    0    10      100            100           100
Predictive ability                                                     98.3           98.3          99.7

3.4.2. SIMCA

SIMCA, as an alternative class-modeling technique, was also used for model construction and prediction of the class membership of the samples in the test set. The optimum dimensions a1, …, ak of the PCA models were also selected using cross-validation, with the goal of minimizing the total probability of misclassification. The latter was obtained from the calibration set by computing the percentage of misclassified objects in each group, multiplied by the relative group size, and summed over all groups. The optimum value of γ and the number of PCs used for each class are tabulated in Table 2. The constructed model exhibited separated clusters, leading to discrimination of the walnut samples. The predictive ability was also evaluated using the predictive accuracy, the sensitivity and the specificity. The average sensitivity and specificity were found to be 88.3 and 97.7, respectively. The SIMCA prediction results are also provided in Table 2. They illustrate that the LDA model is more capable of predicting the sample origin than SIMCA. Using SIMCA, in several cases, 2 out of 10 test samples were misclassified, yielding a classification efficiency of 80.0%. The discrimination accuracy obtained for the calibration (94.5%) and prediction sets (88.3%) suggests a moderate predictive model to classify the geographical origins of walnut samples. In relation to the classification errors, the overall results show the superiority of the LDA model.

Table 2
Predicted group membership of Iranian walnuts according to their geographic origin using SIMCA (model: SIMCA).

Original group      Number of cases   Predicted group                  No. PCs   γ     Accuracy (%)   Sensitivity   Specificity
                                      Bav  Mar  Ram  Tuy  Sam  Ala
Bav                 10                8    0    0    0    2    0       6         0.8   80             80            96
Mar                 10                1    8    0    0    1    0       9         0.8   80             80            100
Ram                 10                1    0    9    0    0    0       6         0.9   90             90            100
Tuy                 10                0    0    0    10   0    0       10        0.8   100            100           96
Sam                 10                0    0    0    0    10   0       5         0.7   100            100           94
Ala                 10                0    0    0    2    0    8       7         0.9   80             80            100
Predictive ability                                                                     88.3           88.3          97.7

4. Conclusions

In this study, an approach based on the combination of fatty acid fingerprinting and multivariate data analysis was developed in order to obtain a model capable of classifying and authenticating Iranian walnuts. To the best of our knowledge, this geographical classification using the entire chromatograms and chemometric tools has not been reported yet. The GC fingerprints of the walnut oils were measured and analyzed by PCA as an exploratory data analysis. In addition, PCA-LDA and SIMCA were used as classification methods for geographical authentication of the walnuts. PCA-LDA displayed a more accurate classification than SIMCA. The results indicate that it is possible to establish a classification model of Iranian walnuts using gas chromatographic data representing their fatty acid profile. The proposed method provides a proper means of walnut classification according to the geographical origin and can be used as a technique to verify the labeling compliance of Iranian walnuts. The fact that only walnuts from one season were used removed some potential variability within each region. Therefore, to become more generally applicable, future models should include this source of variability by adding walnuts from other harvesting years.

Conflict of interest

We have no conflict of interest to declare.

Acknowledgement

The authors gratefully acknowledge the University of Zanjan for financial support of this work.

References

[1] L.S. Maguire, S.M. O'Sullivan, K. Galvin, T.P. O'Connor, N.M. O'Brien, Fatty acid profile, tocopherol, squalene and phytosterol content of walnuts, almonds, peanuts, hazelnuts and the macadamia nut, Int. J. Food Sci. Nutr. 55 (2004) 171–178.
[2] D. Zambón, J. Sabaté, S. Muñoz, B. Campero, E. Casals, M. Merlos, J.C. Laguna, E. Ros, Substituting walnuts for monounsaturated fat improves the serum lipid profile of hypercholesterolemic men and women, Ann. Intern. Med. 132 (2000) 538.
[3] B.R. Moser, Preparation of fatty acid methyl esters from hazelnut, high-oleic peanut and walnut oils and evaluation as biodiesel, Fuel 92 (2012) 231–238.
[4] G.P. Savage, P.C. Dutta, D.L. McNeil, Fatty acid and tocopherol contents and oxidative stability of walnut oils, J. Am. Oil Chem. Soc. 76 (1999) 1059–1063.
[5] S.M.T. Gharibzahedi, S.M. Mousavi, M. Hamedi, K. Rezaei, F. Khodaiyan, Evaluation of physicochemical properties and antioxidant activities of Persian walnut oil obtained by several extraction methods, Ind. Crops Prod. 45 (2013) 133–140.
[6] K.W.C. Sze-Tao, S.K. Sathe, Walnuts (Juglans regia L): proximate composition, protein solubility, protein amino acid composition and protein in vitro digestibility, J. Sci. Food Agric. 80 (2000) 1393–1401.
[7] Y. Yang, M.D. Ferro, I. Cavaco, Y. Liang, Detection and identification of extra virgin olive oil adulteration by GC-MS combined with chemometrics, J. Agric. Food Chem. 61 (2013) 3693–3702.
[8] E.I. Geana, A. Marinescu, A.M. Iordache, C. Sandru, R.E. Ionete, C. Bala, Differentiation of Romanian wines on geographical origin and wine variety by elemental composition and phenolic components, Food Anal. Methods 7 (2014) 2064–2074.
[9] C. Xiong, Y. Zheng, Y. Xing, S. Chen, Y. Zeng, G. Ruan, Discrimination of two kinds of geographical origin protected Chinese vinegars using the characteristics of aroma compounds and multivariate statistical analysis, Food Anal. Methods 9 (2016) 768–776.
[10] V. Uríckova, J. Sadecka, Determination of geographical origin of alcoholic beverages using ultraviolet, visible and infrared spectroscopy: a review, Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 148 (2015) 131–137.
[11] P.H.G.D. Diniz, M.F. Pistonesi, M.B. Alvarez, B.S.F. Band, M.C.U. de Araújo, Simplified tea classification based on a reduced chemical composition profile via successive projections algorithm linear discriminant analysis (SPA-LDA), J. Food Compos. Anal. 39 (2015) 103–110.
[12] J. Rutkowska, M. Bialek, A. Adamska, A. Zbikowska, Differentiation of geographical origin of cream products in Poland according to their fatty acid profile, Food Chem. 178 (2015) 26–31.
[13] F. Troya, M.J. Lerma-García, J.M. Herrero-Martínez, E.F. Simó-Alfonso, Classification of vegetable oils according to their botanical origin using n-alkane profiles established by GC–MS, Food Chem. 167 (2015) 36–39.
[14] K. Hori, T. Kiriyama, K. Tsumura, A liquid chromatography time-of-flight mass spectrometry-based metabolomics approach for the discrimination of cocoa beans from different growing regions, Food Anal. Methods 9 (2016) 738–743.
[15] X. Zheng, Y. Zhao, H. Wu, J. Dong, J. Feng, Origin identification and quantitative analysis of honeys by nuclear magnetic resonance and chemometric techniques, Food Anal. Methods 9 (2016) 1470–1479.
[16] L. Yi, H. Wu, Y. Liang, Chromatographic fingerprint and quality control of traditional Chinese medicines, Chin. J. Chromatogr. 26 (2008) 166–171.

