Structure-based screening for discovery of sweet compounds


Journal Pre-proofs

Structure-based screening for discovery of sweet compounds

Yaron Ben Shoshan-Galeczki, Masha Y. Niv

PII: S0308-8146(20)30134-5
DOI: https://doi.org/10.1016/j.foodchem.2020.126286
Reference: FOCH 126286

To appear in: Food Chemistry

Received Date: 15 October 2019
Revised Date: 10 January 2020
Accepted Date: 21 January 2020

Please cite this article as: Ben Shoshan-Galeczki, Y., Niv, M.Y., Structure-based screening for discovery of sweet compounds, Food Chemistry (2020), doi: https://doi.org/10.1016/j.foodchem.2020.126286

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier Ltd.

Structure-based screening for discovery of sweet compounds

Yaron Ben Shoshan-Galeczki and Masha Y Niv*

The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot and The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem, 91904, Israel.

*correspondence to [email protected]

Abstract

Sweet taste is a cue for calorie-rich food and is innately attractive to animals, including humans. In the context of modern diets, attraction to sweetness presents a significant challenge to human health. Most known sugars and sweeteners bind to the Venus Fly Trap domain of the T1R2 subunit of the sweet taste heterodimer. Because the structure of the sweet taste receptor has not yet been solved experimentally, a possible approach to finding sweet molecules is virtual screening based on the compatibility of candidate molecules with homology models of the sugar-binding site. Here, the constructed structural models and the docking and scoring schemes were validated by their ability to rank known sweet-tasting compounds higher than property-matched random molecules. The best-performing models were then used in virtual screening, retrieving recently patented sweeteners and providing novel predictions.

Keywords: Sweet taste receptor, docking, modeling, sweeteners, drug discovery, GPCR, Tas1R2, T1R2

Introduction

Taste is one of the primary determinants of food preference and intake (Loper, La Sala, Dotson, & Steinle, 2015), and consequently has a major impact on health and well-being. The steady increase in daily sugar consumption over recent decades has contributed to the obesity crisis, the early onset of type 2 diabetes and other chronic diseases (Lustig, Schmidt, & Brindis, 2012). Some studies present advantages associated with using non-caloric sweeteners for weight loss, such as reduction of glucose intolerance and type 2 diabetes (Fitch & Keim, 2012). Others highlight safety issues and possible opposite outcomes, such as weight gain, increased risk of diabetes, modification of gut microbiota and even increased risk of neurodegenerative diseases (Pase, Himali, Beiser, Aparicio, Satizabal, Vasan, et al., 2017; Suez, Korem, Zeevi, Zilberman-Schapira, Thaiss, Maza, et al., 2014). Numerous low-calorie sweeteners have been identified in natural extracts or chemically synthesized (DuBois & Prakash, 2012). Notably, many non-sugar sweeteners elicit a bitter or metallic off-taste, or present a lingering after-taste (Di Pizio, Ben Shoshan-Galeczki, Hayes, & Niv, 2018). Thus, the quest for an optimal low-calorie sweetener persists, with particular focus on natural or food-derived compounds.

The major pathway of sweet taste recognition is mediated by the T1R2/T1R3 heterodimer, while recognition of umami taste is mediated by the T1R1/T1R3 heterodimer (Zhao, Zhang, Hoon, Chandrashekar, Erlenbach, Ryba, et al., 2003). Additional pathways for sweet taste recognition have also been suggested, involving glucose transporters and ATP-gated K+ channels (Damak, Rong, Yasumatsu, Kokrashvili, Varadarajan, Zou, et al., 2003; Yee, Sukumaran, Kotha, Gilbertson, & Margolskee, 2011).

The T1R2/T1R3 heterodimer consists of two Class C G Protein-Coupled Receptor (GPCR) subunits (Montmayeur, Liberles, Matsunami, & Buck, 2001). These receptors feature a Transmembrane Domain (TMD), a Cysteine Rich Domain (CRD) and an extracellular Venus Fly Trap (VFT) domain. The Class C GPCR group consists of approximately 20 members, including Metabotropic Glutamate Receptors (mGluRs), Calcium Sensing Receptors (CaSRs) (Moller, Moreno-Delgado, Pin, & Kniazeff, 2017), and the sweet and umami taste receptors (Matsunami, Montmayeur, & Buck, 2000).

A combination of experimental studies, in particular construction of chimeric receptors and site-directed mutagenesis (Maillet, Cui, Jiang, Mezei, Hecht, Quijada, et al., 2015; Zhang, Klebansky, Fine, Liu, Xu, Servant, et al., 2010), supported by in-silico modeling approaches (Temussi, 2011), led to the identification and characterization of the VFT domain of the T1R2 subunit as the main binding site for sweet compounds. Other binding sites were also identified, as recently reviewed (Cheron, Golebiowski, Antonczak, & Fiorucci, 2017).

Several machine learning methods have been developed to predict the sweetness of molecules (Cheron, Casciuc, Golebiowski, Antonczak, & Fiorucci, 2017; Zheng, Chang, Xu, Xu, & Lin, 2019). These methods typically rely on physicochemical properties and fingerprints of molecules and do not include direct information regarding the binding site of the receptor. Acevedo et al. (Acevedo, Ramirez-Sarmiento, & Agosin, 2018) reported a correlation between docking scores and experimental sweetness for selected groups of sweeteners.

Computational techniques that rely on homology modeling of the receptor and subsequent docking of ligands are useful for GPCRs in the absence of experimental structures (Lim, Du, Chen, & Fan, 2018), and have been successfully applied to several chemosensory receptors, e.g. (Di Pizio, Waterloo, Brox, Lober, Weikert, Behrens, et al., 2019; Spaggiari, Di Pizio, & Cozzini, 2020). However, to the best of our knowledge, structure-based methods have not yet been validated for the discovery of sweet-tasting compounds.

In the current study, we demonstrate the feasibility of structure-based virtual screening for sweet compounds using homology models of the extracellular VFT domain of the human T1R2 (hT1R2) receptor. We create several models of the orthosteric binding site of the sweet taste receptor in the hT1R2 VFT domain. The best model is chosen based on its ability to discriminate between known sweet compounds and decoys, quantified by ROC (receiver operating characteristic) curves (Irwin & Shoichet, 2016). Next, we apply it to the Generally Recognized As Safe (GRAS) data set, where success can be evaluated using the reported taste of GRAS compounds. Finally, we screen FooDB (Wishart, D. S. "FooDB: the food database. FooDB version 1.0." (2014)) to predict sweet compounds from food sources.

Methods

Modeling:

The sweet taste receptor sequence was obtained from the UniProt database (hT1R2, ID: Q8TE23). The 3D structures of the VFT domain of the human monomer were modeled using several servers, including I-Tasser, Modeller, and Phyre2. I-Tasser was chosen for further analysis based on preliminary performance of the models for known ligands and on CASP competition results for template-based modeling (Yang, Zhang, He, Walker, Zhang, Govindarajoo, et al., 2016). The I-Tasser server was used to create models of the hT1R2 VFT using default settings (multi-template) and using specified templates: PDB 5X2M chains A and B. The sequence identities with the templates used by I-Tasser (April 2018), conservation analysis and binding site residues are listed in the Supplementary Information. The main model used hereafter is the 5X2MB-based model, also referred to as the “fish-based model”. For analysis of compounds larger than 460 g/mol, an open-form homology model of the T1R2 VFT was obtained via I-Tasser using open-form class C GPCR templates. It was based on the calcium-sensing receptor (PDB ID: 5K5T) and on the metabotropic glutamate receptor (PDB ID: 1EWT). The top model from each of the I-Tasser runs was minimized and prepared for docking with the Protein Preparation Wizard tool in Maestro and Glide Grid Generation (Schrodinger tools 2017-2).

Ligand similarity

Ligand similarity was calculated using Tanimoto scores, where A and B are the sets of fingerprint features of the two molecules being compared:

$$\mathrm{Tanimoto}(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

The MOLPRINT2D fingerprint was used for the similarity calculations, as described in previous work (Nissim, Dagan-Wiener, & Niv, 2017). A comparison of different fingerprints showed that the MOLPRINT2D fingerprint had the best average enrichment across 11 targets, while being less sensitive to precise settings than other fingerprints (Sastry, Lowrie, Dixon, & Sherman, 2010). Commonly used similarity thresholds are between 0.75 and 0.85, regardless of the fingerprint used (Ripphausen, Nisius, & Bajorath, 2011). Here, a threshold of 0.75 was used.
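The similarity test can be illustrated with a minimal Python sketch. It treats each fingerprint as a plain set of feature labels; the feature names, the helper names `tanimoto` and `is_sweet_like`, and the inclusive `>=` comparison at the 0.75 threshold are assumptions made for illustration, and real MOLPRINT2D fingerprints would have to be generated by a cheminformatics toolkit.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient between two collections of fingerprint features."""
    a, b = set(features_a), set(features_b)
    shared = len(a & b)
    union = len(a) + len(b) - shared
    # Two empty fingerprints share nothing; treat them as dissimilar.
    return shared / union if union else 0.0


def is_sweet_like(candidate, known_sweet, threshold=0.75):
    """True if the candidate reaches the threshold against any reference compound."""
    return any(tanimoto(candidate, ref) >= threshold for ref in known_sweet)


# Hypothetical feature sets standing in for MOLPRINT2D fingerprints.
reference = {"f1", "f2", "f3", "f4"}
close_analogue = {"f1", "f2", "f3"}
unrelated = {"f3", "f4", "f5", "f6"}

print(tanimoto(reference, close_analogue))
print(is_sweet_like(unrelated, [reference]))
```

Counting shared features once and subtracting them from the summed sizes gives the size of the union, so the score always lies between 0 and 1.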

Data sets and decoy preparation for evaluating model performance:

The ligands were prepared for docking using LigPrep. Conformers, tautomers and protomers (different protonation states of ligands) were enumerated at pH 7.0 ± 1.0, retaining specified chiral centers (Maestro Version 10.4.018, MMshare Version 3.2.018, Release 2017-2, Platform Windows-x64).

Compounds reported as sweet by Rojas and coworkers (Rojas, Todeschini, Ballabio, Mauri, Consonni, Tripaldi, et al., 2017) comprised the true positive (TP) set, consisting of 435 compounds. After ligand preparation there were 465 conformers, tautomers and protomers. Since we are focusing on the sugar-binding site in the T1R2 VFT domain, 8 compounds that were reported to act via allosteric binding sites, and 5 compounds with at least 0.75 Tanimoto similarity to them, were removed from the set, leading to a final TP set of 452 compounds (conformers, tautomers and protomers). These 452 compounds were divided into two groups: 404 compounds up to 460 g/mol and 48 compounds over 460 g/mol.

Decoy compounds were obtained from the ZINC12 and ZINC15 databases (Sterling & Irwin, 2015) according to the distribution of physicochemical properties of the sweet compounds from Rojas et al. (Rojas, et al., 2017). The following molecular properties were used to select decoy compounds: AlogP (–4.4 to –0.65), number of hydrogen bond acceptors (6–11), number of hydrogen bond donors (5–8), polarizability (15–32), number of rotatable bonds (1–5), number of chiral centers (3–10), and molecular weight (180–460 g/mol or 460–1100 g/mol). The ~7000 resulting compounds were prepared for docking with LigPrep, enumerating protomers at pH 7.0 ± 1.0 and generating up to 32 stereoisomers per compound (Maestro Version 10.4.018, MMshare Version 3.2.018, Release 2017-2, Platform Windows-x64). Additional decoys were created using the DUD-E (Mysinger, Carchia, Irwin, & Shoichet, 2012) web server (http://dude.docking.org/) based on the 435 molecules in the TP dataset. For each true positive, up to 50 DUD-E decoys with similar physicochemical properties but dissimilar 2-D topology were generated, resulting in 2,619 DUD-E decoys. The final decoy set consisted of 22,125 entries within the 180–460 g/mol molecular weight range and 14,073 entries within the 460–1100 g/mol molecular weight range (including conformers, tautomers and protomers).
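The property windows listed above can be expressed as a simple filter, sketched here in Python. The dictionary keys, the function names and the example records are hypothetical, and the sketch assumes that each property has already been computed by some external tool. Treating the range limits as inclusive, and assigning exactly 460 g/mol to the smaller class, is also an assumption.

```python
# Property windows quoted in the text; the key names are illustrative only.
DECOY_WINDOWS = {
    "alogp": (-4.4, -0.65),
    "hb_acceptors": (6, 11),
    "hb_donors": (5, 8),
    "polarizability": (15, 32),
    "rotatable_bonds": (1, 5),
    "chiral_centers": (3, 10),
}


def mw_class(mw):
    """Assign a molecular weight (g/mol) to the small or large screening class."""
    if 180 <= mw <= 460:
        return "small"
    if 460 < mw <= 1100:
        return "large"
    return None  # outside both windows


def passes_decoy_filter(props):
    """True if every property lies inside its window and the MW is in range."""
    in_windows = all(low <= props[key] <= high
                     for key, (low, high) in DECOY_WINDOWS.items())
    return in_windows and mw_class(props["mw"]) is not None


# Hypothetical candidate records, not taken from ZINC.
candidate = {"alogp": -2.0, "hb_acceptors": 8, "hb_donors": 6,
             "polarizability": 20.0, "rotatable_bonds": 3,
             "chiral_centers": 5, "mw": 342.3}
too_flexible = dict(candidate, rotatable_bonds=9)

print(passes_decoy_filter(candidate), mw_class(candidate["mw"]))
print(passes_decoy_filter(too_flexible))
```

Keeping the windows in one table makes it easy to see which single property excludes a given candidate.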

Datasets used for virtual screening:

GRAS: A dataset of FDA-approved Generally Recognized as Safe (GRAS) compounds, downloaded in August 2016. The data set includes 1,877 compounds. Taste and odor descriptions of the GRAS compounds were obtained by data mining of annotations of taste thresholds of GRAS compounds, indexed by FEMA ID (https://www.femaflavor.org/flavor-library), in Fenaroli’s Handbook of Flavor Ingredients (fifth edition, Burdock 2015).

FooDB (http://foodb.ca/): A data set of food constituent compounds. 24,399 molecules were extracted from the FooDB SQL version.

Binding site analysis

Binding pockets of the two models were analyzed with SiteMap (Schrödinger, LLC, New York, NY, 2017). The binding site was defined as the region within 6 Å of the center of mass of a docked D-glucose ligand, which turned out to be large enough for all the docked ligands.

Docking protocol:

The binding site was defined as a 12 Å grid around the L-glutamate binding site in the class C GPCR mGluR1 (PDB ID: 1EWK). Superposition of the models onto the crystal structure was used to define the binding grid in the models. Two docking protocols were applied, Glide Standard Precision (SP) and Glide Extra Precision (XP), with flexible and rigid sampling options.

Initial testing (see Supplementary Figure 1) indicated that the screening protocol that obtained the best combination of sensitivity and specificity was Maestro Schrodinger 2017-2, Glide Extra Precision mode (XP), flexible ligand sampling, and Glide XP docking scores (Supplementary Figure 2). These settings were used in the rest of the study.

Enrichment:

ROC curves were prepared with Maestro Schrodinger 2017-2 using the enrichment calculator, and evaluated using the ROC AUC (Truchon & Bayly, 2007). The AUC value represents the total area below the ROC curve and can span values between 0 (minimum possible enrichment) and 1 (maximum possible enrichment); a value of 0.5 corresponds to random ranking. The horizontal axis of the ROC curve (100 − specificity, also called the false positive rate) shows the proportion of all decoys in the set that are retrieved as false positives during the screen. The vertical axis (sensitivity, also called the true positive rate) shows the proportion of true positives that are retrieved during the screen.

Sensitivity and specificity measures are defined in the following way:

$$\text{Sensitivity} = \text{True Positive Rate} = \frac{\text{True Positives}}{\text{Positives}}$$

$$\text{Specificity} = \text{True Negative Rate} = \frac{\text{True Negatives}}{\text{Negatives}}$$
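These definitions can be sketched in Python. This is not the Maestro enrichment calculator: the function names and the toy scores are hypothetical, and the sketch assumes that a lower (more negative) docking score means a better rank. The AUC is computed through its rank interpretation, namely the probability that a randomly chosen true positive scores better than a randomly chosen decoy, which equals the area under the ROC curve.

```python
def sensitivity_specificity(scores, is_active, cutoff):
    """Sensitivity and specificity when every score <= cutoff is called a hit."""
    pairs = list(zip(scores, is_active))
    positives = sum(1 for _, active in pairs if active)
    negatives = len(pairs) - positives
    true_pos = sum(1 for s, active in pairs if active and s <= cutoff)
    true_neg = sum(1 for s, active in pairs if not active and s > cutoff)
    return true_pos / positives, true_neg / negatives


def roc_auc(scores, is_active):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly."""
    actives = [s for s, active in zip(scores, is_active) if active]
    decoys = [s for s, active in zip(scores, is_active) if not active]
    # An active "wins" a pair when its score is lower; ties count as half.
    wins = sum(1.0 if a < d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Toy docking scores (illustrative values only) and their labels.
scores = [-9.0, -8.0, -7.0, -6.0, -5.0, -4.0]
labels = [True, True, False, True, False, False]

print(roc_auc(scores, labels))
print(sensitivity_specificity(scores, labels, cutoff=-6.5))
```

The pairwise form is quadratic in the number of compounds, which is acceptable for a small illustration; a sort-based calculation would be preferable for tens of thousands of decoys.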

Enrichment Factors (EFs) are the ratio of the proportion of true positives in a selected top-ranked subset to the proportion of true positives in the entire dataset (Huang, Shoichet, & Irwin, 2006). Enrichment factors provide additional information on the success of the scoring or ranking function in a selected subset size.

$$\text{Enrichment factor} = \frac{\text{Subset true positives} \,/\, \text{Subset compounds}}{\text{Total true positives} \,/\, \text{Total compounds}}$$
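The enrichment factor can be computed directly from a ranked list, as in the following Python sketch. The function name, the rounding of the subset size and the toy data are assumptions, as is the convention that lower docking scores rank first. Under this reading, EF2 and EF5 would correspond to `fraction=0.02` and `fraction=0.05`.

```python
def enrichment_factor(scores, is_active, fraction):
    """Enrichment factor in the top-ranked `fraction` of the screened set."""
    # Rank by score, best (lowest) first.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    total = len(ranked)
    total_actives = sum(1 for _, active in ranked if active)
    subset_size = max(1, round(total * fraction))
    subset_actives = sum(1 for _, active in ranked[:subset_size] if active)
    # (subset_actives / subset_size) / (total_actives / total), rearranged
    # so that the only division happens once at the end.
    return (subset_actives * total) / (subset_size * total_actives)


# Toy ranking: 20 compounds with scores -20..-1, four of them active.
toy_scores = [float(s) for s in range(-20, 0)]
toy_labels = [i in (0, 1, 3, 10) for i in range(20)]

print(enrichment_factor(toy_scores, toy_labels, fraction=0.25))
```

In the toy list, three of the four actives fall within the top five compounds, so the top quarter is three times richer in actives than the full set.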

Data Availability

The prospective predictions on FooDB and the datasets used in the current study are available in the supplementary download section of the Niv lab website (https://biochem-food-nutrition.agri.huji.ac.il/mashaniv). Requests for predictions on additional datasets can be submitted to the authors by email.

Results

Currently, 452 known sweet compounds constitute the TP set, consisting of 404 molecules with molecular weight up to 460 g/mol and 48 molecules over 460 g/mol (Figure 1). Decoys for these molecules were either filtered from ZINC using pre-determined property ranges derived from the TPs, or generated for each TP by DUD-E. The ZINC and DUD-E decoy datasets were used for validation. GRAS was used for retrospective screening of annotated compounds, and the FooDB dataset was screened to provide prospective predictions.

The “default” I-Tasser (Yang, et al., 2016) model was mainly based on human mGluR templates (PDB IDs: 2E4X and 2E4U). An additional model was created based on the Medaka fish crystal structure (PDB ID 5X2M, chain B) and termed the “fish-based model”. A model based on chain A was also evaluated but did not perform as well (see Supplementary Figures 3 and 4). The models were evaluated for their ability to discriminate between true positives and decoys using the Glide XP docking protocol, which was chosen after preliminary testing (Supplementary Figures 1 and 2).

Only compounds below 460 g/mol could dock into either of these models. Docking of larger compounds is discussed in the following section. The fish-based hT1R2 VFT model (Figure 2) had slightly higher enrichment than the default model (see Supplementary Table 2). The overall Area Under the Curve (AUC) of the fish-based model (0.83) was higher than that of the default model (0.75). Additionally, more compounds docked to the fish-based model than to the default model: 97% and 73% of the compounds in the TP dataset, respectively. The fish-based model also had somewhat higher sensitivity in the top results of the screening, compared to the default model (Supplementary Table 2).

The docking scores did not correlate with experimentally measured sweetness intensity. This may be due to shortcomings of the model and the simplistic scoring function, as well as to the complexity of the sensory system, which depends on multiple factors, including genetic variation in the taste receptors, the number of receptors expressed in the cells, temperature, salivary proteins, and the age and mood of the human subjects reporting the perceived sweetness.

To understand the reasons for the improved performance of the fish-based model, we compared its binding cavity to that of the default model. The binding site of the default model is narrower than that of the fish-based model (Table 1), possibly due to the side chain orientations of R383 and D142. In the fish-based model, these residues face away from the binding site, whereas in the default model both residues face inwards toward the proposed binding site, in a manner likely to interfere with ligand binding (Figure 3). In the default template model, the D142 and R383 side chains face towards the D-glucose ligand and are located 3.2 Å from the glucose 3-position hydroxyl.

| Area (Å²) | Default-based model binding site | Fish-based model binding site |
| H-bond acceptor | 103.1 | 102.6 |
| H-bond donor | 127.6 | 276.5 |
| Hydrophilic | 253.8 | 395 |
| Hydrophobic | 117.2 | 194.9 |

Table 1: Comparison of the binding sites of the default and fish-based models, within a 6 Å radius of the docked glucose.

Notably, larger ligands require more space and may clash with these residues. The importance and orientation of D142 and R383 have been suggested previously (Kumari, Choudhary, Arora, & Sharma, 2016). The D142A mutation led to a loss of sucrose or sucralose activity (Zhang, et al., 2010). Mutations of R383, namely R383A, R383D, R383Q, R383L, R383F, R383K and R383H, led to weak or no response to aspartame. Any charge-changing mutation led to weak or diminished activation by all tested ligands. Mutants that kept the positive charge (R383H and R383K) showed activation by sucralose and D-tryptophan similar to that of the WT. These results can be explained if R383 is not directly involved in binding the ligand, but rather stabilizes the VFT conformation through interaction with D449. This is in agreement with the fish-based template model (see Figure 4). Other mutagenesis data also support the involvement of the binding site residues I67, L71, Y103, D142, S165, E302, S303, W304, D307 and V384 (Cheron, Golebiowski, Antonczak, & Fiorucci, 2017).

The good performance of the fish-based model in prioritizing true positives indicates that the model and the docking protocol are suitable for virtual screening of molecules with MW below 460 g/mol.

Ligands in the 460–1100 g/mol range: Since some of the true positives, such as stevia glycosides (see Figure 1), are larger than the binding site could accommodate, an additional model was created based on an open conformation of a calcium receptor (human calcium-sensing receptor extracellular domain, 5K5T (Geng, Mosyak, Kurinov, Zuo, Sturchler, Cheng, et al., 2016)).

Although the open conformation is considered inactive, the open model of the sweet taste receptor does accommodate larger agonists that do not fit into the closed conformation. We hypothesized that there are several active conformations, depending on the size of the ligand, and that the open conformation may be used as an approximate model for the active conformation induced by the larger compounds. The 48 known sweeteners with MW above 460 g/mol and ~14,000 property-matched decoys were docked using the protocol described in Methods. For these compounds, the open-form model performs well, with an AUC of 0.85, EF2 of 4.17, and EF5 of 5. In this screening, all 48 TP compounds in this range docked successfully.

Similarity of structure-based hits to known ligands

To ensure that the structure-based method does not simply return results that could be trivially found using simple ligand-based similarity searches, the similarity of the docking results to any of the sweet compounds in the TP dataset was evaluated using 2D fingerprint similarity (MOLPRINT2D). For the smaller compounds (screened using the fish-based model), only 8 compounds had a Tanimoto similarity score equal to or higher than 0.75. For the larger compounds (screened using the open-form model), 163 compounds had a Tanimoto similarity score above 0.75. Hence, most structure-based screening results are chemically different from the known true positives. Interestingly, among these compounds we find Tubercidin (ZINC03873956), which was patented in 2006 as part of the application for “Fast dissolving composition with prolonged sweet taste” (US7122198B1).

GRAS data set

The 1,877 GRAS compounds are Generally Recognized As Safe for use in humans. The GRAS dataset is relevant for use in food products, has taste annotations, and is suitable for validating predictions made with the fish-based model: all but 7 GRAS compounds are below 460 g/mol. The top 5% of docked compounds comprised 100 compounds, of which 49 are annotated with sweet taste. For comparison, there are 37 compounds in GRAS that have at least 0.75 Tanimoto similarity to any of the molecules in the true positives set; 13 of these had a sweet taste annotation, and 9 of these appeared in the top docking results.

The percentage of sweet-tasting molecules in the top-scoring docked subset (49%) was higher than in the entire GRAS dataset (16%). As a control, we examined the percentage of sweet-smelling compounds, which act via odorant receptors and therefore should not be affected by the screen. Indeed, sweet-smelling compounds made up 7% of the original set and a similar 8% of the top-scoring 5% of compounds in the virtual screen. These results lend further support to the virtual screening protocol for sweet-tasting compounds.
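As a worked illustration, the rounded percentages quoted above can be turned into fold-enrichment values. The ratios below are derived only from those rounded figures, so they are approximate and are not reported by the study itself.

```latex
\[
\text{Fold enrichment (sweet taste)} \approx \frac{49\%}{16\%} \approx 3.1,
\qquad
\text{Fold enrichment (sweet smell, control)} \approx \frac{8\%}{7\%} \approx 1.1
\]
```

On these figures the top-scoring subset is roughly three times richer in sweet-tasting compounds than GRAS as a whole, while the control property stays close to a ratio of 1.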

FooDB data set

The docking protocol was next applied to the FooDB dataset, the majority of which is not yet annotated in terms of sensory properties. The FooDB subset with MW under 460 g/mol was docked to the fish-based model, while molecules above this threshold were docked to the open model. Compounds with Tanimoto similarity above 0.75 (MOLPRINT2D fingerprints) to any compound in the true positive set were considered “sweet-like” compounds. The full FooDB dataset contained 14,384 compounds under 460 g/mol (Figure 5), with 177 sweet-like compounds (0.007%). The docking yielded 10,897 scored compounds, 117 of which were sweet-like compounds (1%). In the top-scoring 5% (545 compounds), 47 sweet-like compounds (8.6%) were present. The top 200 (top 1.3% ranked) compounds were subjected to manual inspection for existing sensory information. After filtering out 20 (10%) sweet-like compounds, the IUPAC names and SMILES strings of the remaining 180 compounds were submitted to searches in literature and patent databases via Google Scholar and SciFinder-N (https://scifinder-n.cas.org/). Three of the hits turned out to be recently patented sweeteners: protocatechuic acid 4-glucoside (WO2013121264A1), morachalcone A (US20180132516A1) and galloyl glucose (EP3571933A1). An additional four compounds turned out to be sweet compounds: sakebiose (also known as nigerose), turanose, melibiitol, and inulobiose; none were found in the true positives dataset (see chemical structures in Supplementary Figure 6).

Since the 5X2MA-based model had a high EF2 value in the initial screening (see Supplemental data), it was applied as well. In the top 200 results of the FooDB screening, 62 compounds overlapped with the hits retrieved by the 5X2MB model. These 62 overlapping compounds included sakebiose, turanose and melibiitol. Among the remaining top hits was an additional patented compound, isopropyl apiosylglucoside (WO2012107207A1).

Out of ~9500 FooDB compounds within the range of 460 [...] docked to the open model. This portion of the dataset contained 104 sweet-like molecules, 8 in the top 5% of the structure-based screen. Five sweet or sweet-like molecules were in the top 2% of the structure-based screen, including rebaudioside A and rebaudioside C. In the top 200 ranked compounds (~2%), four additional patented compounds were found: mannan (US9012520B2), proanthocyanidin B2 3,3'-digallate (US9247758B2), maltotetraose (US20020025366A1) and narirutin (US9247758B2).

To the best of our knowledge, the remaining compounds have no reported sensory data and are therefore novel potential sweetener candidates.

Summary and Discussion

We found that structure-based methods are applicable for identifying sweet-tasting compounds. This study emphasizes the importance of the template used for homology modeling of the sweet taste receptor and the necessity of validating the resulting models. A model built via I-Tasser, using the fish monomer (Nuemket, Yasui, Kusakabe, Nomura, Atsumi, Akiyama, et al., 2017) as a template, performed better than a model built using default I-Tasser settings, which chose the mGluR2 structures as templates (PDB IDs: 2E4X and 2E4U). We tested the effect of different Glide protocols, sampling and scoring functions and found that the Glide XP docking protocol with flexible ligand sampling provides better ROC curves for virtual screening against the validation dataset. Additionally, flexible XP docking ranks known sweet compounds better than SP or rigid XP docking, and is able to dock more true positive compounds. Despite the heavier computational resources required by XP docking, in this system it was the most successful protocol, as shown by the AUC (Figure 1). In this virtual screening experiment, the model was able to detect sweeteners both among the known true positives and among the decoy compounds: interestingly, a recently patented sweetener compound found among the decoys was ranked better than some of the TP compounds (top 2%).

In comparing the binding sites of the two models, a major difference in the orientation of the R383 side chain was observed. R383 faces outward from the binding site in the fish-based model, but inward in the default template model. Analysis of previously reported mutagenesis data (Maillet, et al., 2015; Zhang, et al., 2010) supported the suggested orientations of D142 and R383 in the selected model and is in agreement with a potential interaction between R383 and D449. These differences between the models led to the more restricted binding site area in the default template model, which contributed to its poorer performance in retrieving true positives.

The selected fish-based model and docking protocol were applied to the GRAS dataset. The top 5% of docking results had a greater proportion of sweet compounds (~50%) than the entire compound list (16%), while, as expected, the percentage of sweet-smelling compounds was not affected. The docking campaign was more effective than a simple 2D similarity screening campaign: the top 5% of docking hits identified 49 of the sweet molecules in GRAS, while 2D similarity identified only 13 compounds.

Interestingly, the template that provided the best results was based on the VFT of the T1R3 monomer (5X2M chain B in the PDB). The fish-T1R2 (5X2MA) based model resulted in a lower EF2 than the fish-T1R3 or the default model (Supplementary Figure 3). Additionally, the top 5% of GRAS screening results were not significantly enriched with sweet compounds (Supplementary Figure 4) for the fish-T1R2 based model.

The Medaka fish taste receptor heterodimer recognizes L-amino acids (Gln, Ala, Arg, Glu and Gly) but not sugars or artificial sweeteners (Nuemket, et al., 2017), and the amino acids were shown to bind to T1R2 and (with lower affinity) to T1R3. Chickens do not recognize sweet taste, and when the VFT of chicken T1R3 was introduced into the T1R3 of hummingbird (a bird that does recognize sugars), the heterodimeric receptor was activated in vitro by amino acids rather than sugars. Reintroducing 109 amino acids of hummingbird T1R3 into the chicken T1R3 VFT restored sucrose responses (Baldwin, Toda, Nakagita, O'Connell, Klasing, Misaka, et al., 2014). This suggests that T1R3 might harbor a generalist binding site that can mutate into specialist recognition, and our results indicate that it can serve as a successful template for human T1R2 modeling.

When the docking screen was applied to the FooDB dataset, the abundance of sweet-like compounds in the final output (10% for the compounds up to 460 g/mol and 2.5% for compounds larger than 460 g/mol) increased relative to the initial dataset (0.007% for the smaller compounds and 0.004% for the larger compounds). Overall, the screen found 7 newly patented and 4 known-to-be-sweet compounds in the top hits, all of which had less than 0.75 similarity to the true positives used in this work, suggesting that additional novel sweeteners may be found among the rest of the top-scoring molecules.

The low sensitivity of this structure-based screen means that molecules that are not highly scored by this protocol cannot be claimed to be non-sweet. Potential avenues for improving sensitivity may be considered in the future, such as the inclusion of water molecules in the binding site and of ligand-induced conformational changes in the receptor. Molecular dynamics simulations may help to obtain enhanced sampling of receptor conformations that mimic ligand-induced conformational changes.
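The reasoning behind this caveat can be illustrated with a small Python sketch of sensitivity and specificity. All counts below are hypothetical and chosen only for illustration; they are not results from this work. The point is that a screen can reject almost all decoys (high specificity) while still missing most true sweet compounds (low sensitivity), so a low score carries little information about non-sweetness.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of known sweet compounds that the screen scores highly."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-sweet (decoy) compounds that the screen rejects."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumptions for illustration, not data from the paper).
known_sweet = 100           # known sweet compounds submitted to the screen
retrieved_sweet = 20        # of those, how many end up among the top hits
decoys = 5000               # property-matched presumed non-sweet molecules
decoys_scored_high = 50     # decoys that nevertheless reach the top hits

sens = sensitivity(retrieved_sweet, known_sweet - retrieved_sweet)
spec = specificity(decoys - decoys_scored_high, decoys_scored_high)

# Fraction of truly sweet compounds that receive a low score.
missed_fraction = 1.0 - sens

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"fraction of sweet compounds scored low = {missed_fraction:.2f}")
```

With these illustrative counts, four out of five sweet compounds would be scored low even though 99% of decoys are rejected, which is why unscored molecules cannot be labeled non-sweet.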

Importantly, this work focused on the sugar-binding site in the T1R2 VFT domain. Other sites may be of importance: sweet proteins bind in the CRD between the two subunits, T1R2 and T1R3, in a wedge model (Temussi, 2011). NHDC (Winnig, Bufe, Kratochwil, Slack, & Meyerhof, 2007) and cyclamate (Jiang, Cui, Zhao, Snyder, Benard, Osman, et al., 2005) bind the transmembrane domain of T1R3. Thus, some sweet compounds cannot be found with the suggested protocol. Compounds that interact with other sites can be modeled by QSAR approaches or machine learning techniques (Rojas, et al., 2017; Zheng, Chang, Xu, Xu, & Lin, 2019).

The recently published ligand-based or machine-learning methods, together with the structural screening presented in the current paper, can work in conjunction to maximize the diversity of novel sweeteners.


Conflict of interests


The authors declare no conflict of interests.


Acknowledgements

The authors thank Dr. Tamir Dingjan, Dr. Tali Yarnitzky and Mr. Ido Nissim for critical reading of the manuscript and Dr. Hillary Voet for helpful discussions. Funding from ISF grants #2463/16 and #1129/19 and from UHJ-France and the Foundation Scopus is gratefully acknowledged. MYN is a member of COST actions Mu.Ta.Lig (CA15135) and ERNEST (CA18133).

References

Acevedo, W., Ramirez-Sarmiento, C. A., & Agosin, E. (2018). Identifying the interactions between natural, non-caloric sweeteners and the human sweet receptor by molecular docking. Food Chemistry, 264, 164-171.
Baldwin, M. W., Toda, Y., Nakagita, T., O'Connell, M. J., Klasing, K. C., Misaka, T., Edwards, S. V., & Liberles, S. D. (2014). Sensory biology. Evolution of sweet taste perception in hummingbirds by transformation of the ancestral umami receptor. Science, 345(6199), 929-933.
Cheron, J. B., Casciuc, I., Golebiowski, J., Antonczak, S., & Fiorucci, S. (2017). Sweetness prediction of natural compounds. Food Chem, 221, 1421-1425.
Cheron, J. B., Golebiowski, J., Antonczak, S., & Fiorucci, S. (2017). The anatomy of mammalian sweet taste receptors. Proteins, 85(2), 332-341.
Damak, S., Rong, M., Yasumatsu, K., Kokrashvili, Z., Varadarajan, V., Zou, S., Jiang, P., Ninomiya, Y., & Margolskee, R. F. (2003). Detection of sweet and umami taste in the absence of taste receptor T1r3. Science, 301(5634), 850-853.
Di Pizio, A., Ben Shoshan-Galeczki, Y., Hayes, J. E., & Niv, M. Y. (2018). Bitter and sweet tasting molecules: It's complicated. Neurosci Lett.
Di Pizio, A., Waterloo, L. A. W., Brox, R., Lober, S., Weikert, D., Behrens, M., Gmeiner, P., & Niv, M. Y. (2019). Rational design of agonists for bitter taste receptor TAS2R14: from modeling to bench and back. Cell Mol Life Sci.
DuBois, G. E., & Prakash, I. (2012). Non-Caloric Sweeteners, Sweetness Modulators, and Sweetener Enhancers. Annual Review of Food Science and Technology, 3, 353-380.
Fitch, C., & Keim, K. S. (2012). Position of the Academy of Nutrition and Dietetics: Use of Nutritive and Nonnutritive Sweeteners. Journal of the Academy of Nutrition and Dietetics, 112(5), 739-758.
Geng, Y., Mosyak, L., Kurinov, I., Zuo, H., Sturchler, E., Cheng, T. C., Subramanyam, P., Brown, A. P., Brennan, S. C., Mun, H. C., Bush, M., Chen, Y., Nguyen, T. X., Cao, B., Chang, D. D., Quick, M., Conigrave, A. D., Colecraft, H. M., McDonald, P., & Fan, Q. R. (2016). Structural mechanism of ligand activation in human calcium-sensing receptor. eLife, 5.
Huang, N., Shoichet, B. K., & Irwin, J. J. (2006). Benchmarking sets for molecular docking. J Med Chem, 49(23), 6789-6801.
Irwin, J. J., & Shoichet, B. K. (2016). Docking Screens for Novel Ligands Conferring New Biology. J Med Chem, 59(9), 4103-4120.
Jiang, P. H., Cui, M., Zhao, B. H., Snyder, L. A., Benard, L. M. J., Osman, R., Max, M., & Margolskee, R. F. (2005). Identification of the cyclamate interaction site within the transmembrane domain of the human sweet taste receptor subunit T1R3. Journal of Biological Chemistry, 280(40), 34296-34305.
Kumari, A., Choudhary, S., Arora, S., & Sharma, V. (2016). Stability of aspartame and neotame in pasteurized and in-bottle sterilized flavoured milk. Food Chem, 196, 533-538.
Lim, V. J. Y., Du, W. N., Chen, Y. Z., & Fan, H. (2018). A benchmarking study on virtual ligand screening against homology models of human GPCRs. Proteins-Structure Function and Bioinformatics, 86(9), 978-989.
Loper, H. B., La Sala, M., Dotson, C., & Steinle, N. (2015). Taste perception, associated hormonal modulation, and nutrient intake. Nutr Rev, 73(2), 83-91.
Lustig, R. H., Schmidt, L. A., & Brindis, C. D. (2012). Public health: The toxic truth about sugar. Nature, 482(7383), 27-29.
Maillet, E. L., Cui, M., Jiang, P., Mezei, M., Hecht, E., Quijada, J., Margolskee, R. F., Osman, R., & Max, M. (2015). Characterization of the Binding Site of Aspartame in the Human Sweet Taste Receptor. Chem Senses, 40(8), 577-586.
Matsunami, H., Montmayeur, J. P., & Buck, L. B. (2000). A family of candidate taste receptors in human and mouse. Nature, 404(6778), 601-604.
Moller, T. C., Moreno-Delgado, D., Pin, J. P., & Kniazeff, J. (2017). Class C G protein-coupled receptors: reviving old couples with new partners. Biophys Rep, 3(4), 57-63.
Montmayeur, J. P., Liberles, S. D., Matsunami, H., & Buck, L. B. (2001). A candidate taste receptor gene near a sweet taste locus. Nat Neurosci, 4(5), 492-498.
Mysinger, M. M., Carchia, M., Irwin, J. J., & Shoichet, B. K. (2012). Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem, 55(14), 6582-6594.
Nissim, I., Dagan-Wiener, A., & Niv, M. Y. (2017). The taste of toxicity: A quantitative analysis of bitter and toxic molecules. IUBMB Life, 69(12), 938-946.
Nuemket, N., Yasui, N., Kusakabe, Y., Nomura, Y., Atsumi, N., Akiyama, S., Nango, E., Kato, Y., Kaneko, M. K., Takagi, J., Hosotani, M., & Yamashita, A. (2017). Structural basis for perception of diverse chemical substances by T1r taste receptors. Nature Communications, 8.
Pase, M. P., Himali, J. J., Beiser, A. S., Aparicio, H. J., Satizabal, C. L., Vasan, R. S., Seshadri, S., & Jacques, P. F. (2017). Sugar- and Artificially Sweetened Beverages and the Risks of Incident Stroke and Dementia: A Prospective Cohort Study. Stroke, 48(5), 1139-+.
Ripphausen, P., Nisius, B., & Bajorath, J. (2011). State-of-the-art in ligand-based virtual screening. Drug Discov Today, 16(9-10), 372-376.
Rojas, C., Todeschini, R., Ballabio, D., Mauri, A., Consonni, V., Tripaldi, P., & Grisoni, F. (2017). A QSTR-Based Expert System to Predict Sweetness of Molecules. Front Chem, 5, 53.
Sastry, M., Lowrie, J. F., Dixon, S. L., & Sherman, W. (2010). Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments. J Chem Inf Model, 50(5), 771-784.
Spaggiari, G., Di Pizio, A., & Cozzini, P. (2020). Sweet, umami and bitter taste receptors: State of the art of in silico molecular modeling approaches. Trends in Food Science & Technology, 96, 21-29.
Sterling, T., & Irwin, J. J. (2015). ZINC 15--Ligand Discovery for Everyone. J Chem Inf Model, 55(11), 2324-2337.
Suez, J., Korem, T., Zeevi, D., Zilberman-Schapira, G., Thaiss, C. A., Maza, O., Israeli, D., Zmora, N., Gilad, S., Weinberger, A., Kuperman, Y., Harmelin, A., Kolodkin-Gal, I., Shapiro, H., Halpern, Z., Segal, E., & Elinav, E. (2014). Artificial sweeteners induce glucose intolerance by altering the gut microbiota. Nature, 514(7521), 181-186.
Temussi, P. A. (2011). Determinants of sweetness in proteins: a topological approach. J Mol Recognit, 24(6), 1033-1042.
Truchon, J.-F., & Bayly, C. I. (2007). Evaluating Virtual Screening Methods: Good and Bad Metrics for the “Early Recognition” Problem. Journal of Chemical Information and Modeling, 47(2), 488-508.
Winnig, M., Bufe, B., Kratochwil, N. A., Slack, J. P., & Meyerhof, W. (2007). The binding site for neohesperidin dihydrochalcone at the human sweet taste receptor. BMC Structural Biology, 7.
Yang, J., Zhang, W., He, B., Walker, S. E., Zhang, H., Govindarajoo, B., Virtanen, J., Xue, Z., Shen, H. B., & Zhang, Y. (2016). Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade. Proteins, 84 Suppl 1, 233-246.
Yee, K. K., Sukumaran, S. K., Kotha, R., Gilbertson, T. A., & Margolskee, R. F. (2011). Glucose transporters and ATP-gated K+ (KATP) metabolic sensors are present in type 1 taste receptor 3 (T1r3)-expressing taste cells. Proc Natl Acad Sci U S A, 108(13), 5431-5436.
Zhang, F., Klebansky, B., Fine, R. M., Liu, H., Xu, H., Servant, G., Zoller, M., Tachdjian, C., & Li, X. (2010). Molecular mechanism of the sweet taste enhancers. Proc Natl Acad Sci U S A, 107(10), 4752-4757.
Zhao, G. Q., Zhang, Y., Hoon, M. A., Chandrashekar, J., Erlenbach, I., Ryba, N. J., & Zuker, C. S. (2003). The receptors for mammalian sweet and umami taste. Cell, 115(3), 255-266.
Zheng, S. Q., Chang, W. P., Xu, W. X., Xu, Y., & Lin, F. (2019). e-Sweet: A Machine-Learning Based Platform for the Prediction of Sweetener and Its Relative Sweetness. Frontiers in Chemistry, 7.

Figure 1. Examples of compounds of varying MW from the true positives dataset that were used for evaluation of enrichment and preparation of decoys.

Figure 2. A. ROC curves for hT1R2 models, for compounds with MW up to 460 g/mol, using the fish-based model (green curve) and the mGluR class-C GPCR template model (blue curve). The red dotted line indicates random enrichment performance. B. ROC curve for the hT1R2 open-form model (magenta curve), for compounds with MW of 460-1100 g/mol. The red dotted line indicates random enrichment performance.

Figure 3. Ribbon representation of superimposed hT1R2 models: fish-based (5XDMB) model residues are colored in green, the default model in blue, and the glucose ligand in purple.

Figure 4. A. Ribbon representation of hT1R2, fish-based model in green. B. 2D representation of the fish-based model binding site with docked glucose and residues within 5 Å of the docked glucose.

Figure 5. Virtual screening of the FooDB dataset against the fish-based model.


Yaron Ben Shoshan-Galeczki: Methodology, Data Curation, Formal Analysis, Resources, Writing, Visualization, Editing

Masha Y Niv: Conceptualization, Writing, Review and Editing, Supervision, Funding acquisition


Highlights

• Docking to homology models of VFT domain of human T1R2 was evaluated
• Medaka fish-based model performed well for compounds below 460 g/mol
• Model based on open form experimental structures was useful for larger compounds
• Screening of FooDB retrieved recently patented sweeteners and provides novel candidates