NMR-based metabolic profiling discriminates the geographical origin of raw sesame seeds

Journal Pre-proof

Seok-Young Kim, EunBi Kim, Byeung Kon Shin, Jeong-Ah Seo, Young-Suk Kim, Do Yup Lee, Hyung-Kyoon Choi

PII: S0956-7135(20)30029-3
DOI: https://doi.org/10.1016/j.foodcont.2020.107113
Reference: JFCO 107113
To appear in: Food Control
Received Date: 20 August 2019
Revised Date: 9 January 2020
Accepted Date: 12 January 2020

Please cite this article as: Kim S.-Y., Kim E., Shin B.K., Seo J.-A., Kim Y.-S., Lee D.Y. & Choi H.-K., NMR-based metabolic profiling discriminates the geographical origin of raw sesame seeds, Food Control (2020), doi: https://doi.org/10.1016/j.foodcont.2020.107113. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier Ltd.

Credit Author Statement

Seok-Young Kim: Formal analysis, Investigation, Writing - original draft, Methodology
EunBi Kim: Investigation, Resources
Byeung Kon Shin: Resources, Validation
Jeong-Ah Seo: Resources, Validation
Young-Suk Kim: Resources, Validation
Do Yup Lee: Conceptualization, Supervision, Validation, Writing - review & editing
Hyung-Kyoon Choi: Conceptualization, Funding acquisition, Project administration, Supervision, Writing - original draft, Writing - review & editing

[Original article]

Seok-Young Kim (a), EunBi Kim (a), Byeung Kon Shin (b), Jeong-Ah Seo (c), Young-Suk Kim (d), Do Yup Lee (e), Hyung-Kyoon Choi (a,*)

(a) College of Pharmacy, Chung-Ang University, Seoul, Republic of Korea
(b) National Agricultural Products Quality Management Service, Gimcheon, Republic of Korea
(c) School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
(d) Department of Food Science and Engineering, Ewha Woman’s University, Seoul, Republic of Korea
(e) Department of Agricultural Biotechnology, Center for Food and Bioconvergence, Research Institute for Agricultural and Life Sciences, Seoul National University, Seoul, Republic of Korea

* Corresponding author.
College of Pharmacy, Chung-Ang University, Seoul 06974, Republic of Korea
Phone: +82-2-820-5605
Fax: +82-2-812-3921
E-mail address: [email protected] (Hyung-Kyoon Choi)

Abstract

Sesame seeds are an oil crop mainly cultivated in Asian and African countries. Identifying the geographical origin of sesame seeds is important for preventing adulteration and for quality assurance. This study was performed to establish a discrimination model and investigate potential biomarkers for differentiating the geographical origin of sesame seeds from Korea, China, and other countries (India, Nigeria, and Ethiopia) by nuclear magnetic resonance (NMR) spectroscopy-based metabolic profiling. A total of 24 polar metabolites were identified in sesame seeds from 10, 6, and 4 samples from Korea, China, and other countries, respectively, and an orthogonal partial least squares-discriminant analysis model was established using total area normalization and unit variance scaling. Leave-one-out cross validation showed accuracies of 97.5, 90.0, and 100.0% in differentiating sesame seeds from Korea, China, and other countries, respectively. Acetate, phenylalanine, and tryptophan were suggested as potential biomarkers based on their variable influence on projection values (over 1.0) and area under the curve values (over 0.75). This study demonstrated that 1H-NMR analysis with multivariate and univariate statistical analyses of the polar metabolites in sesame seeds could be successfully applied to discriminate the geographical origin of sesame seeds. These results could be applied to develop a standard analytical process to verify seed origin and halt the global distribution of falsely labeled sesame seeds.

Keywords: sesame seeds, biomarker, geographical origin, NMR, metabolic profiling

1. Introduction

Asia and Africa account for over 96% of world sesame seed production, with India, China, and Sudan being the major producing countries (Dossa et al., 2016). The geographical origin of sesame seeds is an important factor influencing their price (Horacek et al., 2015). Because sesame seeds from different origins look alike, their origin cannot be determined visually. Therefore, a precise, objective, and standardized analytical process that can discriminate between imported and domestic sesame seeds is needed.

Metabolomics can be used to distinguish food products with similar chemical composition characteristics that may not be identifiable by appearance and flavor, as it enables the identification of food components for food adulteration and quality assessment (Oms-Oliu, Odriozola-Serrano, & Martín-Belloso, 2013). Therefore, profiling certain food resources based on the identification and quantification of characteristic metabolites that differ by geographical origin may be possible (Mazzei & Piccolo, 2012). Direct analysis by nuclear magnetic resonance (NMR) spectroscopy is well suited to high-throughput metabolomics applications, as it can detect a wide range of metabolites in an inherently quantitative and unbiased manner (Mahrous & Farag, 2015). Multiple studies have identified the geographical origins of diverse agricultural food resources using NMR-based metabolomics (Huo et al., 2017; Lamanna, Cattivelli, Miglietta, & Troccoli, 2011; Ritota et al., 2012).

Though sesame seeds are well known as a source of edible oil, they are also a good source of proteins high in sulfur-containing amino acids (3.8-5.5%), which are considered beneficial for supplementing soybean, peanut, and other vegetable proteins (Johnson, Suleiman, & Lucas, 1974; Bandyopadhyay & Ghosh, 2002). Previous studies have focused on discriminating the geographical origin of sesame oil using NMR spectroscopy (Jin et al., 2017), near infrared (NIR) spectroscopy (Kim, Scotter, Voyiagis, & Hall, 1998), isotope ratio mass spectrometry (Jeon et al., 2015), and gas chromatography-mass spectrometry (GC-MS) and high performance liquid chromatography (HPLC) analyses of fatty acids and lignans (Jeon et al., 2013). However, few studies have sought to determine the geographical origin of sesame seeds (as opposed to sesame oil) by analyzing the polar metabolites in the seeds. Kwon and Cho (1998) reported that defatted sesame seeds, which mainly contain proteins instead of oil, were more useful for discriminating the geographical origin of the seeds. Thus, it is anticipated that the polar metabolites in sesame seeds, including amino acids, sugars, and organic acids, will play an important role in characterizing sesame seeds cultivated in diverse regions. To the best of our knowledge, no previous studies have focused on discriminating the geographical origin of sesame seeds by NMR-based profiling of polar metabolites.

The objective of this research was to develop a novel model for discriminating the geographical origin of sesame seeds by NMR-based metabolic profiling, focusing mainly on polar metabolites. This differs from other studies, which have generally analyzed non-polar metabolites in sesame seed oil. This study hypothesized that the polar metabolites in sesame seeds play an important role in discriminating geographical origin. Therefore, polar metabolite profiling of sesame seeds originating from the Republic of Korea (Korea, hereafter), China, India, Nigeria, and Ethiopia was performed. A discrimination model was established to specify the different geographical origins of sesame seeds using multivariate statistical analysis. Further, potential biomarkers for discriminating the geographical origin of sesame seeds were suggested. The results of the present study can be applied to the development of a practical and trustworthy platform for preventing the deliberate mislabeling of the geographical origin of sesame seeds.

2. Materials and Methods

2.1. Sesame seed samples and reagents

Raw sesame seed samples were obtained from Korea (ten samples), China (six samples), India (one sample), Nigeria (two samples), and Ethiopia (one sample) (Fig. S1). The ten samples of Korean sesame seeds, harvested in 2017, were provided by the National Agricultural Products Quality Management Service of Korea. Samples (1.5-2 kg) were collected from each of the following provinces and cities: Gangwon (Chuncheon, Inje), Gyeonggi (Paju, Icheon), Chungcheong (Choongju, Dangjin), Gyeongsang (Angong, Jinju), and Jeolla (Iksan, Muan). Raw sesame seeds from China (500-600 g), harvested in 2017, were purchased from online suppliers in 2018. These samples were obtained from the following regions: northeast (Heilongjiang, Jilin), middle (Henan, Anhui), and south (Yunnan, Guangxi Zhuang Autonomous Region). Raw sesame seeds from India, Nigeria, and Ethiopia (500-600 g), harvested in 2017, were obtained from Ottogi Co., Ltd, CJ CheilJedang Co., Ltd, and Daesang Co., Ltd, which are the representative imported sesame suppliers in Korea designated by the Korea Agro-Fisheries Trade Corporation (http://www.at.or.kr/home/apen000000/index.action). Manufacturer and supplier information is listed in Supplementary Table S1. All samples were stored at -70 °C in a deep freezer (Ilshin, Gyeonggi-do, Korea) until use.

HPLC grade chloroform, methanol, and water were obtained from Fisher Chem Alert (Fair Lawn, NJ, USA). Methanol-d4 (CD3OD, 99.8% atom D) containing 0.05% 3-(trimethylsilyl)propionic acid-d4 sodium salt (TSP) was obtained from Cambridge Isotope Laboratories (Tewksbury, MA, USA). Phosphate buffer (90 mM) was prepared using potassium phosphate (KH2PO4, ≥99.0%, Sigma-Aldrich, St. Louis, MO, USA) and deuterium oxide (D2O, 99.9% atom D, Sigma-Aldrich). The pH was adjusted to 6.0 using sodium deuteroxide solution (NaOD, 99.5%, 40% in D2O, Cambridge Isotope Laboratories) (Kim, Choi, & Verpoorte, 2010).

2.2. Pre-preparation and extraction of sesame seeds

Pooled samples collected from each region were quick-frozen in liquid nitrogen, pulverized with a blender, and freeze-dried for 48 h. Samples were then stored in a deep freezer until NMR analysis. A modified Bligh & Dyer method was applied to prepare the polar metabolite extracts from sesame seeds (Bligh & Dyer, 1959; Mannina et al., 2008). Lyophilized sesame seeds (0.1 g) in a 2 mL centrifuge tube (Eppendorf, Hamburg, Germany) were extracted with 1600 µL of a cold chloroform and methanol mixture (1:1); the mixture was vortexed for 1 min, shaken on ice for 1 h (80 rpm), and then centrifuged (10 000 × g, 4 °C, 10 min). The supernatant was collected and 400 µL of water at 4 °C was added. The suspension was centrifuged (10 000 × g, 4 °C, 10 min) and the upper hydro-soluble phase was filtered through a 0.45 µm PTFE syringe filter (Whatman, Maidstone, UK). The filtered extract (600 µL) was dried under a gentle nitrogen stream and then resuspended in 600 µL of a 7:3 mixture of CD3OD and KH2PO4 buffer in D2O (pH 6.0). The resuspended solutions were transferred to 5 mm NMR tubes (Norell, Landisville, NJ, USA). Four replicates of the sesame seed samples from each region were extracted and analyzed. Quality control (QC) samples (n = 13; pooled sample containing equal portions of each sample) were also analyzed to verify instrumental conditions and data quality during the analyses.

2.3. Peak NMR spectra assignment

One- and two-dimensional NMR experiments were performed on a JEOL 600 MHz spectrometer (JNM-ECZ 600R, JEOL, Japan) and a Bruker 600 MHz spectrometer (Avance 600, Bruker, Germany), respectively. For the one-dimensional 1H NMR experiment, samples were measured at 25 °C, and spectra were acquired with 16 K data points, a relaxation delay of 5 s, and a spectral width of 9024.4 Hz. A scan number of 128 and an acquisition time of 1.45 s were used. Water suppression was conducted to exclude the region from δ 4.7 to 5.0. For the two-dimensional NMR spectra, 1H-1H correlation spectroscopy (COSY) spectra were acquired under the following conditions: 32 scans, a relaxation delay of 1.9 s, and a spectral width of 6068.0 Hz. 1H-13C heteronuclear single quantum correlation (HSQC) spectra were obtained with 32 scans, a 2.0 s relaxation delay, and spectral widths of 7183.9 and 36231.9 Hz. Baseline correction and peak identification for all 1H NMR spectra were performed using Chenomx NMR suite software (version 8.1, Chenomx, Edmonton, AB, Canada). Peak identification was performed with non-overlapping peaks. MestReNova (version 6.0.4, Mestrelab Research, Santiago de Compostela, Spain) was used to calculate the J values of the peaks and to identify the peaks in the 1H-1H COSY and 1H-13C HSQC spectra.

2.4. NMR data processing and statistical analyses

Binning and normalization of the 1H NMR spectral data were performed using Chenomx NMR suite software. NMR spectral data from 0.08 to 10.00 ppm were segmented into a series of bins (245 total) with 0.04 ppm widths, with the exception of the water suppression region at 4.68-4.88 ppm. To establish the optimal model, two normalization methods (total area and standardized area) and two scaling methods (unit variance (UV) and Pareto) were applied and their performances were compared. Relative intensities of the binned spectral data in total area and standardized area normalization were obtained by dividing the spectral data by the total area of all bins and by the area of the reference peak, respectively. Binned datasets were converted to Microsoft Office Excel (version 2010, Microsoft, Redmond, WA, USA) to create a compatible format for measuring each metabolite by its loading value. Binning values of metabolites with multiple non-overlapping peaks were summed. Then, the data were imported into SIMCA-P+ (version 14.0, Umetrics, Umeå, Sweden) for principal component analysis (PCA) and orthogonal partial least squares-discriminant analysis (OPLS-DA) of the sesame seed samples (n = 80). Optimal OPLS-DA models were determined by goodness of fit (R2Y), predictability (Q2Y), and the R2Y-intercept and Q2Y-intercept values obtained by permutation tests. The number of components was determined using the autofit function in SIMCA-P+ software, which selects the number of significant components. Leave-one-out cross validation was performed to detect and prevent model over-fitting. Sensitivity, specificity, and accuracy were calculated to evaluate the classification performance of the model based on the class prediction values of samples obtained from the leave-one-out cross validation (Y-predcv) using SIMCA-P+ software.
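
The two normalization and two scaling options compared above can be illustrated with a minimal Python sketch. The bin intensities, the position of the reference bin, and all function names are hypothetical and are not taken from this study. The scaling formulas are the commonly used definitions (mean-centering followed by division by the standard deviation for UV, or by its square root for Pareto), which are assumed here to correspond to the options offered by the software.

```python
from math import sqrt
from statistics import mean, stdev


def total_area_normalize(bins):
    """Total area normalization: divide each bin by the summed area of all bins."""
    total = sum(bins)
    return [value / total for value in bins]


def reference_normalize(bins, ref_index):
    """Standardized area normalization: divide each bin by the reference peak area."""
    return [value / bins[ref_index] for value in bins]


def scale(rows, method):
    """Mean-center each bin (column), then divide by SD ("uv") or sqrt(SD) ("pareto")."""
    scaled_columns = []
    for column in zip(*rows):
        center, sd = mean(column), stdev(column)  # sample SD (n - 1 denominator)
        divisor = sd if method == "uv" else sqrt(sd)
        # A constant bin carries no information; map it to zeros instead of dividing by 0.
        scaled_columns.append([(v - center) / divisor if sd > 0 else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical data: 3 spectra x 4 bins; the last bin stands in for a reference peak.
raw = [
    [10.0, 30.0, 40.0, 20.0],
    [18.0, 24.0, 36.0, 42.0],
    [5.0, 15.0, 20.0, 10.0],
]
by_total = [total_area_normalize(spectrum) for spectrum in raw]
by_reference = [reference_normalize(spectrum, ref_index=3) for spectrum in raw]
uv_scaled = scale(by_total, "uv")
pareto_scaled = scale(by_total, "pareto")
```

After UV scaling every bin has unit variance, so low-intensity bins weigh as much as intense ones; Pareto scaling only partly reduces the dominance of intense bins.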

The non-parametric Kruskal-Wallis test was performed to compare the relative intensities of metabolites in sesame seed samples from Korea, China, and other countries (India, Nigeria, and Ethiopia) using SPSS (version 25.0 for Windows, SPSS Inc., Chicago, IL, USA). A p-value < 0.05 was considered statistically significant.
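
As an illustration of the test described above, the following Python sketch computes the Kruskal-Wallis H statistic for three groups, with the usual tie correction and the chi-square approximation for the p-value. The intensity values and the function name are hypothetical; this is not the SPSS procedure itself, only the textbook form of the test. With three groups the statistic has 2 degrees of freedom, for which the chi-square survival function reduces to exp(-H/2).

```python
from math import exp


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three independent groups using the chi-square approximation."""
    groups = [a, b, c]
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Assign average ranks; tied values share the mean of the ranks they occupy.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j

    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)  # tie correction (equals 1 when there are no ties)
    p = exp(-h / 2)  # chi-square survival function for 2 degrees of freedom
    return h, p


# Hypothetical relative intensities of one metabolite (not data from this study).
korea = [1.0, 1.2, 1.1, 0.9]
china = [1.5, 1.6, 1.4, 1.7]
other = [2.3, 2.1, 2.5, 2.2]
h_stat, p_value = kruskal_wallis_three_groups(korea, china, other)
is_significant = p_value < 0.05
```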

To suggest potential biomarkers for discriminating the geographical origin of sesame seed samples from Korea, China, and other countries (India, Nigeria, and Ethiopia), OPLS-biplots and variable influence on projection (VIP) values were investigated. Metabolites with VIP values over 1.0, which are generally accepted as the most significant variables contributing to group separation, were selected. In addition, receiver operating characteristic (ROC) analysis was conducted to evaluate the predictive performance of the potential biomarkers using MetaboAnalyst 4.0 (http://www.metaboanalyst.ca/).
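
The area under the ROC curve (AUC) for a single metabolite can be sketched as follows, using its standard interpretation as the probability that a randomly chosen sample from one group has a higher intensity than a randomly chosen sample from the other group. The intensities and the function name are hypothetical, and the code is not a reproduction of the MetaboAnalyst workflow.

```python
def roc_auc(cases, controls):
    """AUC as the fraction of (case, control) pairs in which the case scores higher.

    Tied pairs count as half a win, which matches the trapezoidal ROC area.
    """
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical relative intensities of one candidate metabolite (not data from this study).
other_countries = [2.3, 2.1, 2.5, 1.4]
korea = [1.0, 1.2, 1.5, 0.9]
auc = roc_auc(other_countries, korea)  # 15 of 16 pairs are ordered correctly
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.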

3. Results

3.1. Identification of polar metabolites in sesame seeds by 1H NMR

Assigned peaks are listed in Table 1, and a representative NMR spectrum of the polar metabolite extract of sesame seeds is shown in Fig. 1. Twenty-four polar metabolites, including 12 amino acids, 4 sugars, 3 organic acids, and 5 others, were identified in sesame seeds by one-dimensional NMR analysis. Of the 12 amino acids, isoleucine, leucine, valine, phenylalanine, and tryptophan were identified as essential amino acids. Arabinose, glucose, sucrose, and xylose were the sugars identified in sesame seeds. The organic acids acetate, malate, and succinate were identified. Metabolite identification was further confirmed by two-dimensional NMR spectroscopy. As shown in Supplementary Fig. S2 and Table 1, isoleucine, threonine, asparagine, betaine, xylose, sucrose, uridine, tyrosine, phenylalanine, and tryptophan were assigned in COSY experiments, while glucose, xylose, sucrose, and arabinose were assigned in HSQC experiments.

3.2. PCA and OPLS-DA for discriminating the geographical origin of sesame seeds

In the PCA score plot (Supplementary Fig. S3a), sesame seed samples from the three groups, Korea, China, and other countries (India, Nigeria, and Ethiopia), were largely separated, with partial overlap. The 13 QC samples were also well clustered in the PCA score plot (Supplementary Fig. S3b), indicating the instrumental stability of the NMR spectroscopy and the reliability of the data analyses.

Further, a supervised classification method, OPLS-DA, was used to establish a discriminative and predictive model for determining the geographical origin of sesame seeds and to identify potential biomarkers involved in the separation of sesame seeds by region. The optimal OPLS-DA model was established by applying total area normalization, UV scaling, and six components (2 predictive + 4 orthogonal), which gave the highest R2Y (0.835) and Q2Y (0.769) values, as listed in Table 2. The separation of sesame seeds from Korea, China, and other countries (India, Nigeria, and Ethiopia) was clear in the two predictive components of the OPLS-DA score plot (Fig. 2a). Along the first predictive component, Korean and Chinese sesame seeds clustered together and were clearly separated from those from India, Nigeria, and Ethiopia. A permutation test with 999 permutations, performed to rule out separation arising from random group designations, gave satisfactory R2Y and Q2Y intercept values of 0.155 and -0.330, respectively, as shown in Fig. 2b.

Leave-one-out cross validation was used to evaluate the classification rates of sesame seeds from Korea, China, and other countries according to geographical origin. Three cases of leave-one-out cross validation, one for each of the three groups, were performed, and the results are shown in Fig. 2c-e. When Korean sesame seeds were used as the control group, only two samples (China and Korea) were misclassified at a threshold value of 0.5, which resulted in 97.5% sensitivity, specificity, and accuracy (Fig. 2c). When Chinese sesame seeds were designated as the control group, eight misclassified samples (two from Korea and six from China) were identified, giving a sensitivity, specificity, and accuracy of 96.4, 75.0, and 90.0%, respectively (Fig. 2d). Sesame seed samples from India, Nigeria, and Ethiopia were clearly separated from the other groups with 100% sensitivity, specificity, and accuracy (Fig. 2e).
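
The reported rates can be reproduced from the misclassification counts with a short Python sketch. It assumes 4 replicates per sample (40 Korean, 24 Chinese, and 16 other-country spectra; n = 80), treats the designated control group as the controls and all remaining spectra as the cases, and reads "two samples (China and Korea)" as one misclassified spectrum from each country. The function name is hypothetical.

```python
def classification_rates(n_cases, n_controls, cases_wrong, controls_wrong):
    """Return (sensitivity, specificity, accuracy) in percent, rounded to one decimal."""
    sensitivity = (n_cases - cases_wrong) / n_cases
    specificity = (n_controls - controls_wrong) / n_controls
    total = n_cases + n_controls
    accuracy = (total - cases_wrong - controls_wrong) / total
    return tuple(round(100 * rate, 1) for rate in (sensitivity, specificity, accuracy))


# Replicate spectra per group: 10 x 4, 6 x 4, and 4 x 4.
n_korea, n_china, n_other = 40, 24, 16

# Korea as control: one Korean (control) and one Chinese (case) spectrum misclassified.
korea_as_control = classification_rates(n_china + n_other, n_korea, 1, 1)
# China as control: two Korean (case) and six Chinese (control) spectra misclassified.
china_as_control = classification_rates(n_korea + n_other, n_china, 2, 6)
# Other countries as control: no misclassification.
other_as_control = classification_rates(n_korea + n_china, n_other, 0, 0)
```

Under these assumptions the three tuples are approximately (97.5, 97.5, 97.5), (96.4, 75.0, 90.0), and (100.0, 100.0, 100.0), in line with the values given for Fig. 2c-e.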

3.3. Potential biomarkers for discriminating sesame seed geographical origin

As shown in Fig. 3a, phenylalanine, tryptophan, and acetate were found to be the most relevant biomarkers, as their points lay between the circles of radius 0.75 and 1.0 and near the points representing samples from India, Nigeria, and Ethiopia. In this biplot, the X-variables (metabolites), Y-variables (group information), and observations are displayed simultaneously in a two-dimensional space to describe the grouping of observations and the correlation of the variables (Hesaka et al., 2019). Three ellipses drawn at the inner (0.5), middle (0.75), and outer (1.0) areas of the plot coordinates show explained variances of 50, 75, and 100%, respectively (Lutsiv, McGinley, Neil, & Thompson, 2019). Variables positioned near a sample group represent metabolites present at high levels in that group, while metabolites present at low levels lie opposite the sample group. The closer the variables are to the outer circle of the plot, the better they are described by the model components (Thompson et al., 2016).

Univariate statistical analysis showed that significantly higher levels of betaine, glycine, isoleucine, phenylalanine, tryptophan, acetate, malate, arabinose, glucose, sucrose, ascorbate, choline, gallate, and uridine were observed in sesame seeds from India, Nigeria, and Ethiopia. Of these, only four metabolites (phenylalanine, tryptophan, acetate, and uridine) showed significantly different levels among all three groups (Table 3). Variables with a VIP value over the cut-off of 1.0 were tyrosine (1.71), acetate (1.35), threonine (1.29), valerate (1.28), phenylalanine (1.27), tryptophan (1.23), glucose (1.20), glycine (1.13), proline (1.09), malate (1.05), and arabinose (1.02), as listed in Table 3. Of those metabolites, acetate, phenylalanine, and tryptophan were also suggested as potential biomarkers based on their VIP values above the cut-off. Subsequently, ROC analysis showed that AUC values ranged from 0.845 to 0.894 when separating Korean and Chinese sesame seeds, indicating acceptable accuracy of the potential biomarkers (Fig. 3b-d). When separating sesame seeds from Korea and other countries, AUC values over 0.9 (0.977-0.998) showed excellent accuracy (Fig. 3e-g), and AUC values ranging from 0.751 to 0.974 were obtained when separating sesame seeds from China and other countries (Fig. S4a-c).
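
One way to read the selection above is as the overlap between the metabolites with VIP values over 1.0 and the metabolites that differed significantly among all three groups. The Python sketch below uses the VIP values quoted in the text; the text does not state that the candidates were obtained by an explicit intersection, so this is an interpretation, and the variable names are hypothetical.

```python
# VIP values over the 1.0 cut-off, as quoted in the text (Table 3).
vip = {
    "tyrosine": 1.71, "acetate": 1.35, "threonine": 1.29, "valerate": 1.28,
    "phenylalanine": 1.27, "tryptophan": 1.23, "glucose": 1.20, "glycine": 1.13,
    "proline": 1.09, "malate": 1.05, "arabinose": 1.02,
}

# Metabolites reported as significantly different among all three groups.
significant_among_three_groups = {"phenylalanine", "tryptophan", "acetate", "uridine"}

# Keep metabolites that satisfy both criteria; uridine drops out because it is
# not among the variables listed with a VIP value over 1.0.
candidates = sorted(
    name for name in significant_among_three_groups if vip.get(name, 0.0) > 1.0
)
print(candidates)  # ['acetate', 'phenylalanine', 'tryptophan']
```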

4. Discussion

Published studies have analyzed the oil components in sesame seeds by NMR spectroscopy to discriminate different producing regions (Zhang, Zhao, Shen, Zhong, & Feng, 2018), detect adulterated seed oil (Vigli, Philippidis, Spyros, & Dais, 2003; Nam et al., 2014; Kim et al., 2015), and investigate oxidative stability and ozone reactivity (Guillén & Ruiz, 2004; Sega et al., 2010). However, no reports have analyzed the polar metabolites in sesame seeds by NMR-based metabolic profiling. This study used NMR-based metabolic profiling of the polar metabolites in sesame seeds to determine their use as potential biomarkers in discriminating the geographical origin of sesame seeds. Amino acids were the most numerous class of metabolites identified in sesame seeds by 1H NMR. Betaine, proline, threonine, and glycine showed higher relative intensities in sesame seeds (Table 3). Sesame seeds have a high protein content and are especially rich in the essential amino acids leucine, isoleucine, valine, and threonine (Kanu, 2011). Five essential amino acids in sesame seeds, leucine, isoleucine, threonine, tryptophan, and valine, were identified in our study (Table 1).

Sesame seeds contain 18-20% (w/w) carbohydrates, consisting mainly of glucose (3.2%), fructose (2.6%), and sucrose (0.2%) as free sugars (Namiki, 2007; Anilakumar, Pal, Khanum, & Bawa, 2010). Arabinose, xylose, galactose, and mannose have also been found in sesame seeds (Ghosh et al., 2005). Arabinose and glucose were the main sesame seed metabolites identified in our study (Table 3). Interestingly, acetate, malate, and succinate were identified as organic acids in sesame seeds by NMR spectroscopy for the first time in this study.

The quality and chemical composition of sesame seed oil have been reported to vary with diverse factors such as cultivar, land, processing, and cultivation region (Deng et al., 2012). In our study, the characteristics of polar metabolites in sesame seeds differed markedly depending on the cultivation conditions and growth region. Although cultivar information for the sesame seed samples could not be obtained in this study, these differences might derive from different landrace cultivars, considering that the unique genetic information of landraces varies with local adaptation (Kang et al., 2006). In addition, since 1978 and 1950, Korea and China, respectively, have developed and cultivated diverse modern sesame cultivars, replacing historical landraces (Kang et al., 2006; Zhang, Sun, Zhang, Wang, & Che, 2011). African countries, including Ethiopia and Nigeria, also initiated and performed sesame seed breeding research in the late 1960s and in 1965-1973, respectively, to develop conventional sesame seed cultivars. These cultivars are high-yield genotypes with improved nutritional quality and resistance to environmental stresses such as drought, heat, pests, and disease (Alegbejo, Iwo, Abo, & Idowu, 2003; Daniel, 2017). Owing to artificial selection, these modern sesame cultivars are assumed to have a lower environmental adaptation capability than landraces (Yu et al., 2019). In particular, modern sesame cultivars in China contain unique genes for energy metabolism (oxidative phosphorylation and photosynthesis), lipid metabolism (oil and fatty acid synthesis), and amino acid metabolism (cysteine and methionine metabolism), which are mainly relevant to seed quality-related traits rather than disease resistance and environmental adaptation (Yu et al., 2019). Accordingly, the different metabolite characteristics of sesame seed samples from Korea, China, and other countries (India, Nigeria, and Ethiopia) might be affected by the inherent genetic characteristics of modern cultivars, which have been developed, selected, and cultivated differently in those countries.

Some outliers in the PCA-derived score plots (Supplementary Fig. S3a) were China-derived samples, all of which were collected from Heilongjiang province in northeastern China. The extremely contrasting climate of Heilongjiang province compared to other regions might affect the metabolic profiles of sesame seed samples. In the middle and lower regions of the Yangtze River Valley, a heavy rainfall trend dominates in the summer and winter seasons, whereas a trend of drying and drought has appeared in north and northeast China (Hu, Yang, & Wu, 2003). Further, the largest warming trend has appeared in northern China, with an average temperature increase of 0.36 °C per decade, while southwest China has shown the smallest warming trend (0.15 °C increase per decade) (Hu et al., 2003; Piao et al., 2010). Based on those reports, the unique metabolite characteristics of sesame seeds cultivated in Heilongjiang province might result from the regional climate of northeastern China, which shows a warming and drying trend with low precipitation.

OPLS-DA was used in this study to discriminate and predict the geographical origins of sesame seeds sourced from Korea, China, and other countries (India, Nigeria, and Ethiopia). As shown in the OPLS-DA derived score plot (Fig. 2a), sesame seeds obtained from Korea, China, and other countries (India, Nigeria, and Ethiopia) were well separated by applying predetermined optimal data preprocessing methods. Total area normalization and UV scaling, which gave the highest R2Y and Q2Y values, were selected as the optimal conditions for OPLS-DA model development (Table 2). Data preprocessing steps in metabolomics studies are very important for minimizing the systematic intensity variations within all variables that could mask relevant biological differences. Thus, optimal normalization and scaling methods should be selected to reduce unwanted systematic errors in signal intensity measurements while retaining the meaningful biological information (Skov, Honore, Jensen, Næs, & Engelsen, 2014).

For further validation of the model, leave-one-out cross validation was performed (Fig. 2c-e). Sensitivity measures the ability of the model to correctly classify cases, whereas specificity measures the ability of the model to correctly classify controls (Szymanska, Saccenti, Smilde, & Westerhuis, 2012). As shown in Fig. 2c-e, accuracies of 90.0% or higher were obtained in the three cases of validation testing to differentiate sesame seed samples from Korea, China, and other countries (India, Nigeria, and Ethiopia). This indicates that the OPLS-DA model established in this study could be used to determine the geographical origin of sesame seeds with high sensitivity, specificity, and accuracy.

Phenylalanine, tryptophan, and acetate were the potential biomarkers identified by the OPLS-biplot that discriminated the geographical origin of sesame seeds from Korea, China, and other countries (India, Nigeria, and Ethiopia) in this study. These potential biomarkers were further validated using ROC curve analysis. The AUC values over 0.75 obtained for the three potential biomarkers (phenylalanine, tryptophan, and acetate) indicated that these metabolites contributed considerably to determining the geographical origin of sesame seeds from Korea, China, and other countries (India, Nigeria, and Ethiopia).

This study suggests that the polar metabolites in sesame seeds can be used to discriminate geographical origin, differentiating our study from others that focus on the non-polar lipid metabolites in sesame seeds to determine geographical origin.

5. Conclusions

16

359

This is the first study to develop a discriminatory predictive model to determine the geographic origin of sesame seeds from Korea, China, and other countries (India, Nigeria, and Ethiopia) by NMR-based metabolomics focusing on polar metabolites. Twenty-four polar metabolites were identified, of which amino acids made up the largest proportion. Total area normalization combined with UV scaling produced the optimal OPLS-DA model, with 97.5, 90.0, and 100.0% accuracy in leave-one-out cross validation testing, indicating a stable and usable model. As potential biomarkers, the relative levels of phenylalanine, tryptophan, and acetate differed significantly among the three groups. This study suggests an effective, novel strategy for developing a practical and trustworthy platform that can be used to remedy the emerging social problem of false sesame seed sourcing information. As this study was limited by the restricted number of samples, regions, and countries from which sesame seeds could be obtained, further studies should analyze additional samples cultivated in diverse countries to further evaluate the effectiveness of this method for practical application.

Declarations of interest: none.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) [NRF-2015R1A5A1008958], and the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, Forestry and Fisheries (IPET) through Advanced Production Technology Development Program, funded by the Ministry of Agriculture, Food, and Rural Affairs (MAFRA) [316081-04].

References

Alegbejo, M. D., Iwo, G. A., Abo, M. E., & Idowu, A. A. (2003). Sesame: a potential industrial and export oilseed crop in Nigeria. Journal of Sustainable Agriculture, 23, 59–76. https://doi.org/10.1300/J064v23n01_05.

Anilakumar, K. R., Pal, A., Khanum, F., & Bawa, A. S. (2010). Nutritional, medicinal and industrial uses of sesame (Sesamum indicum L.) seeds-an overview. Agriculturae Conspectus Scientificus, 75, 159–168.

Bandyopadhyay, K., & Ghosh, S. (2002). Preparation and characterization of papain-modified sesame (Sesamum indicum L.) protein isolates. Journal of Agricultural and Food Chemistry, 50, 6854–6857. https://doi.org/10.1021/jf020320x.

Bligh, E. G., & Dyer, W. J. (1959). A rapid method of total lipid extraction and purification. Canadian Journal of Biochemistry and Physiology, 37, 911–917. https://doi.org/10.1139/o59-099.

Daniel, E. G. (2017). Sesame (Sesamum indicum L.) breeding in Ethiopia. International Journal of Novel Research in Life Sciences, 4, 1–11.

Deng, D. H., Xu, L., Ye, Z. H., Cui, H. F., Cai, C. B., & Yu, X. P. (2012). FTIR spectroscopy and chemometric class modeling techniques for authentication of Chinese sesame oil. Journal of the American Oil Chemists' Society, 89, 1003–1009. https://doi.org/10.1007/s11746-011-2004-8.

Dossa, K., Wei, X., Zhang, Y., Fonceka, D., Yang, W., Diouf, D., et al. (2016). Analysis of genetic diversity and population structure of sesame accessions from Africa and Asia as major centers of its cultivation. Genes, 7, 1–14. https://doi.org/10.3390/genes7040014.

Ghosh, P., Ghosal, P., Thakur, S., Lerouge, P., Loutelier-Bourhis, C. L., Driouich, A., et al. (2005). Polysaccharides from Sesamum indicum meal: Isolation and structural features. Food Chemistry, 90, 719–726. https://doi.org/10.1016/j.foodchem.2004.04.032.

Guillén, M. D., & Ruiz, A. (2004). Formation of hydroperoxy- and hydroxyalkenals during thermal oxidative degradation of sesame oil monitored by proton NMR. European Journal of Lipid Science and Technology, 106, 680–687. https://doi.org/10.1002/ejlt.200401026.

Hesaka, A., Sakai, S., Hamase, K., Ikeda, T., Matsui, R., Mita, M., et al. (2019). D-Serine reflects kidney function and diseases. Scientific Reports, 9, 5104–5111. https://doi.org/10.1038/s41598-019-41608-0.

Horacek, M., Hansel-Hohl, K., Burg, K., Soja, G., Okello-Anyanga, W., & Fluch, S. (2015). Control of origin of sesame oil from various countries by stable isotope analysis and DNA based markers-A pilot study. PLoS One, 10, e0123020. https://doi.org/10.1371/journal.pone.0123020.

Hu, Z. Z., Yang, S., & Wu, R. G. (2003). Long-term climate variations in China and global warming signals. Journal of Geophysical Research, 108, 4614–4626. https://doi.org/10.1029/2003JD003651.

Huo, Y. Q., Kamal, G. M., Wang, J., Liu, H. L., Zhang, G. N., Hu, Z. Y., et al. (2017). 1H NMR-based metabolomics for discrimination of rice from different geographical origins of China. Journal of Cereal Science, 76, 243–252. https://doi.org/10.1016/j.jcs.2017.07.002.

Jeon, H., Kim, I. H., Lee, C., Choi, H. D., Kim, B. H., & Akoh, C. C. (2013). Discrimination of origin of sesame oils using fatty acid and lignan profiles in combination with canonical discriminant analysis. Journal of the American Oil Chemists' Society, 90, 337–347. https://doi.org/10.1007/s11746-012-2159-y.

Jeon, H., Lee, S. C., Cho, Y. J., Oh, J. H., Kwon, K., & Kim, B. H. (2015). A triple-isotope approach for discriminating the geographic origin of Asian sesame oils. Food Chemistry, 167, 363–369. https://doi.org/10.1016/j.foodchem.2014.07.032.

Jin, G., Kim, J., Lee, Y., Kim, J., Akoh, C. C., Chun, H. S., et al. (2017). A nuclear magnetic resonance spectroscopy approach to discriminate the geographic origin of roasted Asian sesame oils. Journal of Oleo Science, 66, 337–344. https://doi.org/10.5650/jos.ess16154.

Johnson, L. A., Suleiman, M., & Lucas, E. N. (1974). Sesame protein: a review and prospectus. Journal of the American Oil Chemists' Society, 56, 463–468. https://doi.org/10.1007/BF02671542.

Kang, C. W., Kim, S. Y., Lee, S. W., Mathur, P. N., Hodgkin, T., Zhou, M. D., et al. (2006). Selection of a core collection of Korean sesame germplasm by a stepwise clustering method. Breeding Science, 56, 85–91. https://doi.org/10.1270/jsbbs.56.85.

Kanu, P. J. (2011). Biochemical analysis of black and white sesame seeds from China. American Journal of Biochemistry and Molecular Biology, 1, 145–157. https://doi.org/10.3923/ajbmb.2011.145.157.

Kim, H. K., Choi, Y. H., & Verpoorte, R. (2010). NMR-based metabolomic analysis of plants. Nature Protocols, 5, 536–549. https://doi.org/10.1038/nprot.2009.237.

Kim, J., Jin, G., Lee, Y., Chun, H. S., Ahn, S., & Kim, B. H. (2015). Combined analysis of stable isotope, 1H NMR, and fatty acid to verify sesame oil authenticity. Journal of Agricultural and Food Chemistry, 63, 8955–8965. https://doi.org/10.1021/acs.jafc.5b04082.

Kim, Y., Scotter, M., Voyiagis, M., & Hall, M. (1998). Potential of NIR spectroscopy for discriminating the geographical origin of sesame oil. Food Science and Biotechnology, 7, 18–22.

Kwon, Y. S., & Cho, R. K. (1998). Identification of geographical origin of sesame seeds by near infrared spectroscopy. Applied Biological Chemistry, 41, 240–246.

Lamanna, R., Cattivelli, L., Miglietta, M. L., & Troccoli, A. (2011). Geographical origin of durum wheat studied by 1H-NMR profiling. Magnetic Resonance in Chemistry, 49, 1–5. https://doi.org/10.1002/mrc.2695.

Lutsiv, T., McGinley, J. N., Neil, E. S., & Thompson, H. J. (2019). Cell signaling pathways in mammary carcinoma induced in rats with low versus high inherent aerobic capacity. International Journal of Molecular Sciences, 20, 1506–1520. https://doi.org/10.3390/ijms20061506.

Mahrous, E. A., & Farag, M. (2015). Two dimensional NMR spectroscopic approaches for exploring plant metabolome: a review. Journal of Advanced Research, 6, 3–15. https://doi.org/10.1016/j.jare.2014.10.003.

Mannina, L., Sobolev, A. P., Capitani, D., Iaffaldano, N., Rosato, M. P., Ragni, P., et al. (2008). NMR metabolic profiling of organic and aqueous sea bass extracts: implications in the discrimination of wild and cultured sea bass. Talanta, 77, 433–444. https://doi.org/10.1016/j.talanta.2008.07.006.

Mazzei, P., & Piccolo, A. (2012). 1H HRMAS-NMR metabolomic to assess quality and traceability of mozzarella cheese from Campania buffalo milk. Food Chemistry, 132, 1620–1627. https://doi.org/10.1016/j.foodchem.2011.11.142.

Nam, Y. S., Noh, K. C., Roh, E. J., Keum, G., Lee, Y., & Lee, K. B. (2014). Determination of edible vegetable oil adulterants in sesame oil using 1H nuclear magnetic resonance spectroscopy. Analytical Letters, 47, 1190–1200. https://doi.org/10.1080/00032719.2013.865199.

Namiki, M. (2007). Nutraceutical functions of sesame: a review. Critical Reviews in Food Science and Nutrition, 47, 651–673. https://doi.org/10.1080/10408390600919114.

Oms-Oliu, G., Odriozola-Serrano, I., & Martín-Belloso, O. (2013). Metabolomics for assessing safety and quality of plant-derived food. Food Research International, 54, 1172–1183. https://doi.org/10.1016/j.foodres.2013.04.005.

Piao, S., Ciais, P., Huang, Y., Shen, Z., Peng, S., Li, J., et al. (2010). The impacts of climate change on water resources and agriculture in China. Nature, 467, 43–51. https://doi.org/10.1038/nature09364.

Ritota, M., Casciani, L., Han, B. Z., Cozzolino, S., Leita, L., Sequi, P., et al. (2012). Traceability of Italian garlic (Allium sativum L.) by means of HRMAS-NMR spectroscopy and multivariate data analysis. Food Chemistry, 135, 684–693. https://doi.org/10.1016/j.foodchem.2012.05.032.

Sega, A., Zanardi, I., Chiasserini, L., Gabbrielli, A., Bocci, V., & Travagli, V. (2010). Properties of sesame oil by detailed 1H and 13C NMR assignments before and after ozonation and their correlation with iodine value, peroxide value, and viscosity measurements. Chemistry and Physics of Lipids, 163, 148–156. https://doi.org/10.1016/j.chemphyslip.2009.10.010.

Skov, T., Honore, A. H., Jensen, H. M., Næs, T., & Engelsen, S. B. (2014). Chemometrics in foodomics: handling data structures from multiple analytical platforms. TrAC Trends in Analytical Chemistry, 60, 71–79. https://doi.org/10.1016/j.trac.2014.05.004.

Szymanska, E., Saccenti, E., Smilde, A. K., & Westerhuis, J. A. (2012). Double-check: Validation of diagnostic statistics for PLS-DA models in metabolomics studies. Metabolomics, 8(Suppl 1), 3–16. https://doi.org/10.1007/s11306-011-0330-3.

Thompson, H. J., Neuhouser, M. L., Lampe, J. W., McGinley, J. N., Neil, E. S., Schwartz, Y., et al. (2016). Effect of low or high glycemic load diets on experimentally induced mammary carcinogenesis in rats. Molecular Nutrition & Food Research, 60, 1416–1426. https://doi.org/10.1002/mnfr.201500864.

Vigli, G., Philippidis, A., Spyros, A., & Dais, P. (2003). Classification of edible oils by employing 31P and 1H NMR spectroscopy in combination with multivariate statistical analysis. A proposal for the detection of seed oil adulteration in virgin olive oils. Journal of Agricultural and Food Chemistry, 51, 5715–5722. https://doi.org/10.1021/jf030100z.

Yu, J., Golicz, A. A., Lu, K., Dossa, K., Zhang, Y., Chen, J., et al. (2019). Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars. Plant Biotechnology Journal, 17, 881–892. https://doi.org/10.1111/pbi.13022.

Zhang, Y., Zhao, Y., Shen, G., Zhong, S., & Feng, J. (2018). NMR spectroscopy in conjugation with multivariate statistical analysis for distinguishing plant origin of edible oils. Journal of Food Composition and Analysis, 69, 140–148. https://doi.org/10.1016/j.jfca.2018.03.006.

Zhang, Y. X., Sun, J. A., Zhang, X. R., Wang, L. H., & Che, Z. (2011). Analysis on genetic diversity and genetic basis of the main sesame cultivars released in China. Agricultural Sciences in China, 10, 509–518. https://doi.org/10.1016/S1671-2927(11)60031-X.

Figure captions

Fig. 1. Representative one-dimensional 1H-NMR spectra of sesame seed samples.

Fig. 2. Development of a multivariate statistical model for discriminating the geographical origin of sesame seeds from Korea, China, and other countries (India, Nigeria, and Ethiopia). (a) OPLS-DA score plot of sesame seeds from Korea (blue), China (red), and other countries (yellow) using six components (two predictive and four orthogonal components). (b) A test plot using 999 permutations for the OPLS-DA model. R2Y (green circle) and Q2Y (blue square) are shown in both original and permuted values with the R2Y and Q2Y intercept values. Leave-one-out cross validation plots for differentiating sesame seeds from Korea (c), China (d), and other countries (e) to test OPLS-DA model accuracy, showing calculated predicted Y values after cross validation. Misclassified samples are marked as a circle with a threshold value of 0.5 on the Y-axis. Calculated values for sensitivity, specificity, and accuracy are shown in the table.
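The leave-one-out procedure behind panels (c) to (e) can be sketched as follows. This is only an illustration of the resampling scheme: a one-variable least-squares regression on a 0/1 dummy Y stands in for the OPLS-DA model (a deliberate simplification, not the authors' method), and the function names and intensity values are hypothetical. The 0.5 threshold on the predicted Y value follows the caption above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(xs, ys):
    """Predict each sample's Y from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


# Hypothetical relative intensities of one metabolite; Y = 1 marks the target origin.
intensity = [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1]
is_target = [0, 0, 0, 0, 1, 1, 1, 1]

predicted_y = leave_one_out_predictions(intensity, is_target)
predicted_class = [1 if y >= 0.5 else 0 for y in predicted_y]
misclassified = [i for i, (p, t) in enumerate(zip(predicted_class, is_target)) if p != t]
loo_accuracy = 1 - len(misclassified) / len(is_target)
print(predicted_class, misclassified, loo_accuracy)
```

Because each sample is predicted by a model that never saw it, the resulting accuracy is a less optimistic estimate than one computed on the training data.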

Fig. 3. OPLS-biplot and receiver operating characteristic (ROC) curves for discriminating the geographical origin of sesame seeds using three individual metabolites. (a) OPLS biplot revealing the correlation of all metabolites (X-variables), sample clusters (observations), and group information (Y-variables). Loading vectors of pq (combination of p, X-variable and q, Y-variable vectors) and a score vector of t are displayed as correlation scaled pq(corr) and t(corr) with the first and second predictive component. ROC curves for acetate (b), phenylalanine (c), and tryptophan (d) for discriminating sesame seeds from Korea and China. ROC curves for acetate (e), tryptophan (f), and phenylalanine (g) for discriminating sesame seeds from Korea and other countries (India, Nigeria, and Ethiopia). The area under curve (AUC) values for each metabolite are presented with the true positive rate (sensitivity) against the false positive rate (1 − specificity). The best cut off value is determined as the nearest point to the upper left corner in the graph.
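The cut-off rule stated in the caption (the ROC point nearest to the upper left corner) can be sketched as below. The function names and intensity values are hypothetical, and classifying a sample as positive when its intensity is greater than or equal to the cut-off is an assumption made for the illustration.

```python
import math


def roc_points(positive_values, negative_values):
    """Return (cutoff, false_positive_rate, true_positive_rate) for every observed value."""
    cutoffs = sorted(set(positive_values) | set(negative_values))
    points = []
    for cutoff in cutoffs:
        tpr = sum(v >= cutoff for v in positive_values) / len(positive_values)
        fpr = sum(v >= cutoff for v in negative_values) / len(negative_values)
        points.append((cutoff, fpr, tpr))
    return points


def best_cutoff(points):
    """Pick the point with the smallest Euclidean distance to (FPR = 0, TPR = 1)."""
    return min(points, key=lambda p: math.hypot(p[1], p[2] - 1.0))


# Hypothetical relative intensities of one metabolite in two origin groups.
group_a = [0.52, 0.61, 0.47, 0.58, 0.66]  # treated as the positive class
group_b = [0.35, 0.41, 0.49, 0.30, 0.55]  # treated as the negative class
cutoff, fpr, tpr = best_cutoff(roc_points(group_a, group_b))
print(cutoff, fpr, tpr)
```

For these made-up values the selected cut-off is 0.52, where four of the five positive samples and one of the five negative samples lie at or above the cut-off.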

Table 1. Peak assignment of NMR spectra in sesame seeds.

| No. | Compounds | Chemical shift in ppm (multiplicity, J value in Hz) | Assignment method |
|---|---|---|---|
| 1 | Valerate | 0.87 (t, J = 7.2), 1.28-1.32 (m), 2.14 (t, J = 7.2) | 1D |
| 2 | Isoleucine | 0.94 (t, J = 7.2), 0.99 (d, J = 7.2) | 1D, COSY |
| 3 | Leucine | 0.96 (t, J = 6.0) | 1D |
| 4 | Valine | 0.96 (d, J = 6.6), 1.02 (d, J = 7.2), 3.60 (d, J = 4.2) | 1D |
| 5 | Threonine | 1.33 (d, J = 6.6), 3.60 (d, J = 4.8) | 1D, COSY |
| 6 | Alanine | 1.48 (d, J = 7.2), 3.79 (q, J = 7.2) | 1D |
| 7 | Acetate | 1.90 (s) | 1D |
| 8 | Proline | 2.01-2.07 (m), 2.00-2.07 (m), 2.29-2.37 (m) | 1D |
| 9 | Malate | 2.38 (dd, J = 14.4, 10.8), 2.67 (dd, J = 15.6, 3) | 1D |
| 10 | Succinate | 2.39 (s) | 1D |
| 11 | Asparagine | 2.92-2.98 (m) | 1D, COSY |
| 12 | Choline | 3.22 (s), 3.48-3.51 (m) | 1D |
| 13 | Betaine | 3.24 (s), 3.89 (s) | 1D, COSY |
| 14 | Glucose | 3.38-3.42 (m), 3.40-3.44 (m), 3.51-3.55 (dd, J = 10.2, 3.6), 3.82-3.86 (m) | 1D, HSQC |
| 15 | Xylose | 3.41 (t, J = 9.0) | 1D, COSY, HSQC |
| 16 | Glycine | 3.55 (s) | 1D |
| 17 | Ascorbate | 3.70-3.76 (m) | 1D |
| 18 | Sucrose | 3.75 (t, J = 9.6), 3.79-3.84 (m), 3.81-3.85 (m), 3.82-3.87 (m), 4.02 (t, J = 8.4), 5.40 (d, J = 3.6) | 1D, COSY, HSQC |
| 19 | Arabinose | 3.77 (dd, J = 11.4, 3.6), 3.80 (dd), 4.00-4.04 (m) | 1D, HSQC |
| 20 | Uridine | 5.88 (d, J = 9.0), 5.89 (d, J = 4.8), 7.84 (d, J = 7.8) | 1D, COSY |
| 21 | Tyrosine | 6.88-6.91 (m), 7.15-7.18 (m) | 1D, COSY |
| 22 | Gallate | 7.03 (s) | 1D |
| 23 | Phenylalanine | 7.33 (d, J = 7.2), 7.35-7.40 (m) | 1D, COSY |
| 24 | Tryptophan | 7.15-7.19 (m), 7.32 (s), 7.53 (d, J = 7.8), 7.71 (d, J = 7.8) | 1D, COSY |

s, singlet; d, doublet; dd, doublet of doublet; t, triplet; q, quartet; m, multiplet

Table 2. Selection of the optimal OPLS-DA model based on various normalization and scaling methods for discriminating the geographical origin of sesame seeds.

| Group No. | Normalization method | Scaling method | Component number | R2Y | Q2Y | R2Y intercept | Q2Y intercept |
|---|---|---|---|---|---|---|---|
| 1 | Standardized area | UV | 2+1 | 0.647 | 0.589 | 0.040 | -0.134 |
| 2 | Standardized area | Par | 2+2 | 0.690 | 0.575 | 0.068 | -0.167 |
| 3 | Total area | UV | 2+4 | 0.835 | 0.769 | 0.155 | -0.330 |
| 4 | Total area | Par | 2+4 | 0.810 | 0.753 | 0.114 | -0.272 |

UV, unit variance; Par, pareto
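The preprocessing options compared in Table 2 can be sketched under their usual chemometric definitions: total area normalization divides each spectrum by its summed intensity, UV scaling divides each mean-centered variable by its standard deviation, and Pareto scaling divides by the square root of the standard deviation. The function names and the bin values are hypothetical, the factor of 1,000 follows the footnote of Table 3, and the "standardized area" normalization is not sketched because its exact definition is not given in this section.

```python
import statistics


def total_area_normalize(spectrum, factor=1000.0):
    """Divide every bin by the summed intensity of the spectrum, then multiply by factor."""
    total = sum(spectrum)
    return [factor * value / total for value in spectrum]


def scale_columns(matrix, method="uv"):
    """Mean-center each variable (column) and divide by SD ("uv") or sqrt(SD) ("par")."""
    scaled_columns = []
    for column in zip(*matrix):
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # sample standard deviation
        divisor = sd if method == "uv" else sd ** 0.5
        scaled_columns.append([(value - mean) / divisor for value in column])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical binned spectra: 3 samples (rows) x 3 bins (columns).
raw = [[2.0, 3.0, 5.0], [1.0, 1.0, 2.0], [4.0, 4.0, 2.0]]
normalized = [total_area_normalize(row) for row in raw]
uv_scaled = scale_columns(normalized, method="uv")
pareto_scaled = scale_columns(normalized, method="par")
print(normalized[0])
```

UV scaling gives every variable equal weight, whereas Pareto scaling keeps part of the original intensity differences, which is one reason the two options can lead to models with different R2Y and Q2Y values.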

Table 3. Relative intensity of polar metabolites in sesame seed samples from Korea, China, and other countries and variable influence on projection (VIP) values derived from the OPLS-DA model for discriminating the geographical origin of sesame seeds.

| Compounds | Korea | China | Others | VIP value |
|---|---|---|---|---|
| Amino acids | | | | |
| alanine | 0.56 ± 0.17 NS | 0.62 ± 0.16 NS | 0.58 ± 0.10 NS | 0.60 |
| asparagine | 0.86 ± 0.24 NS | 0.84 ± 0.23 NS | 0.89 ± 0.09 NS | 0.15 |
| betaine | 10.18 ± 1.28 | 10.66 ± 2.95& | 12.34 ± 0.91# | 0.56 |
| glycine | 2.04 ± 0.84 | 2.07 ± 0.84& | 4.62 ± 1.17# | 1.13 |
| isoleucine | 0.37 ± 0.13 | 0.36 ± 0.09& | 0.48 ± 0.12# | 0.58 |
| leucine | 0.59 ± 0.27 NS | 0.52 ± 0.21 NS | 0.49 ± 0.09 NS | 0.34 |
| phenylalanine | 0.37 ± 0.09* | 0.53 ± 0.14& | 0.61 ± 0.08# | 1.27 |
| proline | 7.27 ± 1.68 | 6.54 ± 0.58& | 3.42 ± 0.47# | 1.09 |
| threonine | 6.53 ± 1.22 | 6.02 ± 1.17& | 1.20 ± 0.18# | 1.29 |
| tyrosine | 0.88 ± 0.18* | 1.65 ± 0.53 | 1.43 ± 0.16# | 1.71 |
| tryptophan | 0.59 ± 0.14* | 0.80 ± 0.18& | 1.05 ± 0.10# | 1.23 |
| valine | 0.59 ± 0.24 NS | 0.62 ± 0.18 NS | 0.61 ± 0.12 NS | 0.13 |
| Organic acids | | | | |
| acetate | 1.24 ± 0.38* | 1.88 ± 0.31& | 2.66 ± 0.28# | 1.35 |
| malate | 0.29 ± 0.08 | 0.35 ± 0.09& | 0.53 ± 0.08# | 1.05 |
| succinate | 3.60 ± 1.20 | 3.97 ± 1.90& | 1.17 ± 0.10# | 0.97 |
| Sugars | | | | |
| arabinose | 23.96 ± 5.26 | 21.97 ± 6.15& | 32.41 ± 2.62# | 1.02 |
| glucose | 25.20 ± 2.14 | 25.11 ± 5.90& | 40.62 ± 6.98# | 1.20 |
| sucrose | 15.04 ± 2.94 | 13.72 ± 3.94& | 16.85 ± 1.25# | 0.83 |
| xylose | 11.68 ± 1.63 | 10.93 ± 1.87& | 12.51 ± 1.22 | 0.67 |
| Others | | | | |
| ascorbate | 13.84 ± 2.49 | 13.39 ± 3.74& | 18.28 ± 2.31# | 0.84 |
| choline | 9.95 ± 1.15 | 9.94 ± 1.94& | 12.60 ± 1.01# | 0.97 |
| gallate | 0.31 ± 0.10 | 0.35 ± 0.08& | 0.53 ± 0.08# | 0.96 |
| uridine | 1.00 ± 0.15# | 0.63 ± 0.18* | 0.79 ± 0.19& | 0.99 |
| valerate | 21.42 ± 3.84 | 21.21 ± 6.73& | 2.80 ± 0.59# | 1.28 |

Relative intensities in the table represent the mean ± standard deviation of binning values obtained from peak intensity of NMR spectra with total area normalization multiplied by 1,000. *, #, and & indicate significant difference in the values between Korea and China, Korea and other countries, and China and other countries, respectively, determined by non-parametric Kruskal-Wallis test (p < 0.05). NS means not significant.
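The Kruskal-Wallis test named in the footnote of Table 3 can be sketched for three groups as follows. The function name and the intensity values are hypothetical. The p-value uses the chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2). The sketch covers only the overall three-group test; how the pairwise markers (*, #, &) were derived is not described in this section and is not reproduced here.

```python
import math
from collections import Counter


def kruskal_wallis_three(group_1, group_2, group_3):
    """Return (H, p) for three groups, with tie correction and chi-square (df = 2) p-value."""
    groups = [group_1, group_2, group_3]
    values = sorted(v for g in groups for v in g)
    n = len(values)

    # Assign average ranks so that tied values share the same rank.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j

    h = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)

    # Tie correction factor (equals 1 when there are no ties).
    tie_counts = Counter(values).values()
    h /= 1.0 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n)

    p_value = math.exp(-h / 2.0)  # chi-square survival function for df = 2
    return h, p_value


# Hypothetical relative intensities of one metabolite in the three origin groups.
korea = [0.31, 0.42, 0.37, 0.29, 0.45]
china = [0.50, 0.58, 0.47, 0.61, 0.53]
others = [0.66, 0.59, 0.70, 0.63, 0.72]
h_stat, p_value = kruskal_wallis_three(korea, china, others)
print(round(h_stat, 2), p_value < 0.05)
```

With these made-up values the rank sums of the three groups are 15, 41, and 64, which gives H = 12.02 and a p-value well below 0.05.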

Fig. 1

Fig. 2

Fig. 3

Highlights

• Polar metabolites evaluated by NMR spectroscopy differentiated sesame seed origins

• An OPLS-DA model successfully discriminated the geographic seed origin

• Acetate, phenylalanine, and tryptophan differentiated seed origin

• Seed origin determined with an accuracy of 97.5, 90.0, and 100.0%