Journal Pre-proofs
Structure-based screening for discovery of sweet compounds
Yaron Ben Shoshan-Galeczki, Masha Y. Niv
PII: S0308-8146(20)30134-5
DOI: https://doi.org/10.1016/j.foodchem.2020.126286
Reference: FOCH 126286
To appear in: Food Chemistry
Received Date: 15 October 2019
Revised Date: 10 January 2020
Accepted Date: 21 January 2020
Please cite this article as: Ben Shoshan-Galeczki, Y., Niv, M.Y., Structure-based screening for discovery of sweet compounds, Food Chemistry (2020), doi: https://doi.org/10.1016/j.foodchem.2020.126286
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2020 Published by Elsevier Ltd.
Structure-based screening for discovery of sweet compounds

Yaron Ben Shoshan-Galeczki and Masha Y Niv*

The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot and The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem, 91904, Israel.

*Correspondence to: [email protected]
Abstract
Sweet taste is a cue for calorie-rich food and is innately attractive to animals, including humans. In the context of modern diets, attraction to sweetness presents a significant challenge to human health. Most known sugars and sweeteners bind to the Venus Fly Trap (VFT) domain of the T1R2 subunit of the sweet taste receptor heterodimer. Because the structure of the sweet taste receptor has not yet been solved experimentally, one possible approach to finding sweet molecules is virtual screening based on the compatibility of candidate molecules with homology models of the sugar-binding site. Here, the constructed structural models and the docking and scoring schemes were validated by their ability to rank known sweet-tasting compounds higher than property-matched random molecules. The best-performing models were then used in virtual screening, retrieving recently patented sweeteners and providing novel predictions.
Keywords: Sweet taste receptor, docking, modeling, sweeteners, drug discovery, GPCR, Tas1R2, T1R2
Introduction
Taste is one of the primary determinants of food preference and intake (Loper, La Sala, Dotson, & Steinle, 2015) and consequently has a major impact on health and well-being. The steady increase in daily sugar consumption over recent decades has contributed to the obesity crisis, the early onset of type 2 diabetes and other chronic diseases (Lustig, Schmidt, & Brindis, 2012). Some studies report advantages of using non-caloric sweeteners for weight loss, such as reduced glucose intolerance and type 2 diabetes (Fitch & Keim, 2012). Others highlight safety issues and possible opposite outcomes, such as weight gain, increased risk of diabetes, modification of the gut microbiota and even increased risk of neurodegenerative diseases (Pase, Himali, Beiser, Aparicio, Satizabal, Vasan, et al., 2017; Suez, Korem, Zeevi, Zilberman-Schapira, Thaiss, Maza, et al., 2014). Numerous low-calorie sweeteners have been identified in natural extracts or chemically synthesized (DuBois & Prakash, 2012). Notably, many non-sugar sweeteners elicit a bitter or metallic off-taste or have a lingering aftertaste (Di Pizio, Ben Shoshan-Galeczki, Hayes, & Niv, 2018). Thus, the quest for an optimal low-calorie sweetener persists, with particular focus on natural or food-derived compounds.
The major pathway of sweet taste recognition is mediated by the T1R2/T1R3 heterodimer, while umami taste recognition is mediated by the T1R1/T1R3 heterodimer (Zhao, Zhang, Hoon, Chandrashekar, Erlenbach, Ryba, et al., 2003). Additional pathways for sweet taste recognition, involving glucose transporters and ATP-gated K+ channels, have also been suggested (Damak, Rong, Yasumatsu, Kokrashvili, Varadarajan, Zou, et al., 2003; Yee, Sukumaran, Kotha, Gilbertson, & Margolskee, 2011).
The T1R2/T1R3 heterodimer consists of two Class C G Protein-Coupled Receptor (GPCR) subunits (Montmayeur, Liberles, Matsunami, & Buck, 2001). These receptors feature a Transmembrane Domain (TMD), a Cysteine Rich Domain (CRD) and an extracellular Venus Fly Trap (VFT) domain. The Class C GPCR group consists of approximately 20 members, including Metabotropic Glutamate Receptors (mGluRs), Calcium Sensing Receptors (CaSRs) (Moller, Moreno-Delgado, Pin, & Kniazeff, 2017), and the sweet and umami taste receptors (Matsunami, Montmayeur, & Buck, 2000).
A combination of experimental studies, in particular the construction of chimeric receptors and site-directed mutagenesis (Maillet, Cui, Jiang, Mezei, Hecht, Quijada, et al., 2015; Zhang, Klebansky, Fine, Liu, Xu, Servant, et al., 2010), supported by in-silico modeling approaches (Temussi, 2011), led to the identification and characterization of the VFT domain of the T1R2 subunit as the main binding site for sweet compounds. Other binding sites have also been identified, as recently reviewed (Cheron, Golebiowski, Antonczak, & Fiorucci, 2017).
Several machine learning methods have been developed to predict the sweetness of molecules (Cheron, Casciuc, Golebiowski, Antonczak, & Fiorucci, 2017; Zheng, Chang, Xu, Xu, & Lin, 2019). These methods typically rely on physicochemical properties and fingerprints of molecules and do not include direct information about the receptor binding site. Acevedo et al. (Acevedo, Ramirez-Sarmiento, & Agosin, 2018) reported a correlation between docking scores and experimental sweetness for selected groups of sweeteners.
Computational techniques that rely on homology modeling of the receptor and subsequent docking of ligands are useful for GPCRs in the absence of experimental structures (Lim, Du, Chen, & Fan, 2018) and have been successfully applied to several chemosensory receptors, e.g. (Di Pizio, Waterloo, Brox, Lober, Weikert, Behrens, et al., 2019; Spaggiari, Di Pizio, & Cozzini, 2020). However, to the best of our knowledge, structure-based methods have not yet been validated for the discovery of sweet-tasting compounds.
In the current study, we demonstrate the feasibility of structure-based virtual screening for sweet compounds using homology models of the extracellular VFT domain of the human T1R2 (hT1R2) receptor. We create several models of the orthosteric binding site of the sweet taste receptor in the hT1R2 VFT domain. The best model is chosen based on its ability to discriminate between known sweet compounds and decoys, quantified by ROC (receiver operating characteristic) curves (Irwin & Shoichet, 2016). Next, we apply it to the Generally Recognized As Safe (GRAS) dataset, where success can be evaluated using the reported taste of GRAS compounds. Finally, we screen FooDB (Wishart, D. S. "FooDB: the food database. FooDB version 1.0." (2014)) to predict sweet compounds from food sources.
Methods
Modeling:
The sweet taste receptor sequence was obtained from the UniProt database (hT1R2, ID: Q8TE23). The 3D structure of the VFT domain of the human monomer was modeled using several servers, including I-Tasser, Modeller and Phyre2. I-Tasser was chosen for further analysis based on the preliminary performance of its models for known ligands and on CASP results for template-based modeling (Yang, Zhang, He, Walker, Zhang, Govindarajoo, et al., 2016). The I-Tasser server was used to create models of the hT1R2 VFT both with default settings (multi-template) and with specified templates: PDB 5X2M chains A and B. The sequence identities with the templates used by I-Tasser (April 2018), the conservation analysis and the binding site residues are listed in the Supplementary Information. The main model used hereafter is the 5X2MB-based model, also referred to as the “fish-based model”. For the analysis of compounds larger than 460 g/mol, an open-form homology model of the T1R2 VFT was obtained via I-Tasser using open-form class C GPCR templates: the calcium-sensing receptor (PDB ID: 5K5T) and the metabotropic glutamate receptor (PDB ID: 1EWT). The top model from each I-Tasser run was minimized and prepared for docking with the Protein Preparation Wizard tool in Maestro and Glide Grid Generation (Schrodinger tools 2017-2).
Ligand similarity
Ligand similarity was calculated using Tanimoto scores:
$$T(A,B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} = \frac{|A \cap B|}{|A \cup B|}$$
where A and B are the sets of fingerprint features of the two compared molecules. The MOLPRINT2D fingerprint was used for the similarity calculations, as described in previous work (Nissim, Dagan-Wiener, & Niv, 2017). A comparison of different fingerprints showed that MOLPRINT2D had the best average enrichment across 11 targets, while being less sensitive to precise settings than other fingerprints (Sastry, Lowrie, Dixon, & Sherman, 2010). Commonly used similarity thresholds are between 0.75 and 0.85, regardless of the fingerprint used (Ripphausen, Nisius, & Bajorath, 2011). Here, a threshold of 0.75 was used.
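The similarity criterion above can be illustrated with a short Python sketch. MOLPRINT2D fingerprints are generated by Schrödinger software and are not reproduced here; as an assumption, each fingerprint is modeled as a plain set of hashable feature identifiers, and the helper names (`tanimoto`, `max_similarity`, `is_similar_to_any`) are hypothetical.

```python
from typing import Hashable, Iterable, Set

SIMILARITY_THRESHOLD = 0.75  # threshold used in this study


def tanimoto(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Tanimoto coefficient |A ∩ B| / (|A| + |B| - |A ∩ B|) of two feature sets."""
    shared = len(a & b)
    denominator = len(a) + len(b) - shared  # equals len(a | b)
    if denominator == 0:
        # Two empty fingerprints; treating them as dissimilar is an arbitrary convention.
        return 0.0
    return shared / denominator


def max_similarity(query: Set[Hashable], references: Iterable[Set[Hashable]]) -> float:
    """Highest Tanimoto score of `query` against any reference fingerprint (0.0 if none)."""
    return max((tanimoto(query, ref) for ref in references), default=0.0)


def is_similar_to_any(query, references, threshold=SIMILARITY_THRESHOLD) -> bool:
    """True if the query reaches the similarity threshold for at least one reference."""
    return max_similarity(query, references) >= threshold


if __name__ == "__main__":
    a = {"f1", "f2", "f3", "f4"}
    b = {"f1", "f2", "f3", "f5"}
    print(round(tanimoto(a, b), 2))   # 3 shared / 5 distinct features -> 0.6
    print(is_similar_to_any(a, [b]))  # 0.6 < 0.75 -> False
```

Swapping in a different fingerprint would only change how the feature sets are built; the coefficient and the 0.75 cut-off stay the same.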
Data sets and decoy preparation for evaluating model performance:
The ligands were prepared for docking using LigPrep. Conformers, tautomers and protomers (different protonation states of ligands) were enumerated at pH 7.0 ± 1.0, retaining specified chiral centers (Maestro Version 10.4.018, MMshare Version 3.2.018, Release 2017-2, Platform Windows-x64).
Compounds reported as sweet by Rojas and coworkers (Rojas, Todeschini, Ballabio, Mauri, Consonni, Tripaldi, et al., 2017) comprised the true positive (TP) set of 435 compounds. After ligand preparation, there were 465 conformers, tautomers and protomers. Since we focus on the sugar-binding site in the T1R2 VFT domain, 8 compounds reported to act via allosteric binding sites, and 5 compounds with a Tanimoto similarity of at least 0.75 to them, were removed from the set, leading to a final TP set of 452 compounds (conformers, tautomers and protomers). These 452 compounds were divided into two groups: 404 compounds up to 460 g/mol and 48 compounds over 460 g/mol.
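The curation of the TP set described above (removing allosteric binders and their close analogs, then splitting at 460 g/mol) can be sketched as follows. The `Ligand` record, its fields and the `curate` function are hypothetical, and fingerprints are again simplified to sets of feature identifiers rather than real MOLPRINT2D output.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

MW_CUTOFF = 460.0     # g/mol, boundary between the closed and open receptor models
SIM_THRESHOLD = 0.75  # Tanimoto threshold for a "close analog"


@dataclass(frozen=True)
class Ligand:
    name: str
    mw: float                    # molecular weight in g/mol
    fingerprint: FrozenSet[str]  # stand-in for a MOLPRINT2D feature set
    allosteric: bool = False     # reported to act outside the VFT sugar-binding site


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def curate(ligands: List[Ligand]) -> Tuple[List[Ligand], List[Ligand]]:
    """Drop allosteric binders and their analogs, then split by molecular weight."""
    allosteric = [lig for lig in ligands if lig.allosteric]
    kept = [
        lig for lig in ligands
        if not lig.allosteric
        # keep only ligands below the threshold for every allosteric binder
        and all(tanimoto(lig.fingerprint, a.fingerprint) < SIM_THRESHOLD for a in allosteric)
    ]
    small = [lig for lig in kept if lig.mw <= MW_CUTOFF]  # "up to 460 g/mol"
    large = [lig for lig in kept if lig.mw > MW_CUTOFF]   # "over 460 g/mol"
    return small, large
```

Applied to the real data, `small` and `large` would correspond to the 404 and 48 compounds reported above.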
Decoy compounds were obtained from the ZINC12 and ZINC15 databases (Sterling & Irwin, 2015) according to the physicochemical property distributions of the sweet compounds from Rojas et al. (Rojas, et al., 2017). The following molecular properties were used to select decoy compounds: AlogP (−4.4 to −0.65), number of hydrogen bond acceptors (6–11), number of hydrogen bond donors (5–8), polarizability (15–32), number of rotatable bonds (1–5), number of chiral centers (3–10), and molecular weight (180–460 g/mol or 460–1100 g/mol). The ~7000 resulting compounds were prepared for docking with LigPrep, enumerating protomers at pH 7.0 ± 1.0 and generating up to 32 stereoisomers per compound (Maestro Version 10.4.018, MMshare Version 3.2.018, Release 2017-2, Platform Windows-x64). Additional decoys were created using the DUD-E (Mysinger, Carchia, Irwin, & Shoichet, 2012) web server (http://dude.docking.org/) based on the 435 molecules in the TP dataset. For each true positive, up to 50 DUD-E decoys with similar physicochemical properties but dissimilar 2-D topology were generated, resulting in 2,619 DUD-E decoys. The final decoy set consisted of 22,125 entries in the 180–460 g/mol molecular weight range and 14,073 entries in the 460–1100 g/mol range (including conformers, tautomers and protomers).
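A minimal sketch of the ZINC property filter, assuming each candidate already has its descriptors computed by whichever tool produced AlogP and polarizability, and assuming (since the ranges are listed once) that the same windows apply to both molecular weight ranges. The descriptor key names and functions are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Inclusive property windows for decoy selection, taken from the text.
# The dictionary keys are hypothetical names for precomputed descriptors.
PROPERTY_WINDOWS: Dict[str, Tuple[float, float]] = {
    "alogp": (-4.4, -0.65),
    "hb_acceptors": (6, 11),
    "hb_donors": (5, 8),
    "polarizability": (15, 32),
    "rotatable_bonds": (1, 5),
    "chiral_centers": (3, 10),
}

# Molecular weight bins (g/mol). A compound at exactly 460 g/mol goes to the first
# matching bin ("small"); this is an assumption, the text does not say.
MW_BINS: Dict[str, Tuple[float, float]] = {
    "small": (180.0, 460.0),
    "large": (460.0, 1100.0),
}


def in_window(value: float, bounds: Tuple[float, float]) -> bool:
    low, high = bounds
    return low <= value <= high


def assign_decoy_bin(props: Dict[str, float]) -> Optional[str]:
    """Return the MW bin for a candidate decoy, or None if any property is out of range."""
    if not all(in_window(props[name], bounds) for name, bounds in PROPERTY_WINDOWS.items()):
        return None
    for label, bounds in MW_BINS.items():  # dicts preserve insertion order
        if in_window(props["mw"], bounds):
            return label
    return None


if __name__ == "__main__":
    candidate = {"alogp": -2.0, "hb_acceptors": 8, "hb_donors": 6,
                 "polarizability": 20, "rotatable_bonds": 3,
                 "chiral_centers": 5, "mw": 342.3}
    print(assign_decoy_bin(candidate))                    # small
    print(assign_decoy_bin({**candidate, "alogp": 1.2}))  # None (outside the AlogP window)
```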
Datasets used for virtual screening:
GRAS: A dataset of FDA Generally Recognized as Safe (GRAS) compounds, downloaded in August 2016. The dataset includes 1,877 compounds. Taste and odor descriptions of the GRAS compounds were obtained by data mining (taste-threshold annotations of GRAS compounds, matched by FEMA ID (https://www.femaflavor.org/flavor-library), from Fenaroli’s Handbook of Flavor Ingredients, fifth edition, Burdock 2015).
FooDB (http://foodb.ca/): a dataset of food constituent compounds. A total of 24,399 molecules were extracted from the FooDB SQL version.
Binding site analysis
The binding pockets of the two models were analyzed with SiteMap (Schrödinger, LLC, New York, NY, 2017). The binding site was defined as the region within 6 Å of the center of mass of a docked D-glucose ligand; this region turned out to be large enough for all the docked ligands.
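The region definition used for the SiteMap analysis (everything within 6 Å of the docked glucose's center of mass) can be sketched in plain Python. This only reproduces the geometric selection, not SiteMap's surface-area calculations; coordinates and masses are assumed to come from a structure parser of the reader's choice, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def center_of_mass(coords: Sequence[Coord], masses: Sequence[float]) -> Coord:
    """Mass-weighted centroid of a ligand pose."""
    total = sum(masses)
    return tuple(
        sum(m * xyz[axis] for xyz, m in zip(coords, masses)) / total
        for axis in range(3)
    )


def atoms_within(protein_coords: Sequence[Coord], center: Coord,
                 radius: float = 6.0) -> List[int]:
    """Indices of protein atoms no farther than `radius` (Å) from `center`."""
    return [i for i, xyz in enumerate(protein_coords) if math.dist(xyz, center) <= radius]


if __name__ == "__main__":
    # Toy two-atom "ligand" and three protein atoms on the x axis (illustrative only).
    ligand_pose = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    com = center_of_mass(ligand_pose, [12.0, 12.0])
    print(com)  # (1.0, 0.0, 0.0)
    print(atoms_within([(1.0, 0.0, 0.0), (7.0, 0.0, 0.0), (8.0, 0.0, 0.0)], com))  # [0, 1]
```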
Docking protocol:
The binding site was defined as a 12 Å grid around the L-glutamate binding site of the class C GPCR mGluR1 (PDB ID: 1EWK). Superposition of the models onto this crystal structure was used to define the binding grid in the models. Two docking protocols, Glide Standard Precision (SP) and Glide Extra Precision (XP), were applied, each with flexible and rigid ligand sampling options.
Initial testing (see Supplementary Figure 1) indicated that the screening protocol that obtained the best combination of sensitivity and specificity was Maestro Schrodinger 2017-2 Glide Extra Precision (XP) mode with flexible ligand sampling and Glide XP docking scores (Supplementary Figure 2). These settings were used in the rest of the study.
Enrichment:
ROC curves were prepared with the enrichment calculator in Maestro Schrodinger 2017-2 and evaluated using the ROC AUC (Truchon & Bayly, 2007). The AUC is the total area under the ROC curve and ranges from 0 (minimum possible enrichment) to 1 (maximum possible enrichment); a value of 0.5 corresponds to random ranking. The horizontal axis of the ROC curve (100% − specificity, also called the false positive rate) shows the fraction of all decoys in the set that are retrieved as false positives during the screen. The vertical axis (sensitivity, also called the true positive rate) shows the fraction of true positives retrieved during the screen.
Sensitivity and specificity are defined as follows:
$$\text{Sensitivity} = \text{True Positive Rate} = \frac{\text{True Positives}}{\text{Positives}}$$
$$\text{Specificity} = \text{True Negative Rate} = \frac{\text{True Negatives}}{\text{Negatives}}$$
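The quantities behind the ROC analysis can be sketched in Python. This is not the Maestro enrichment calculator: it computes the AUC with the equivalent Mann-Whitney rank formulation (the probability that a randomly chosen true positive outranks a randomly chosen decoy), and ranking non-docked compounds last is an assumption. Function names are hypothetical; Glide-style scores (lower is better) are assumed.

```python
import math
from typing import List, Sequence, Tuple


def sensitivity_specificity(tp_scores: Sequence[float], decoy_scores: Sequence[float],
                            cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when every compound scoring <= cutoff is called 'sweet'."""
    true_positives = sum(1 for s in tp_scores if s <= cutoff)
    true_negatives = sum(1 for s in decoy_scores if s > cutoff)
    return true_positives / len(tp_scores), true_negatives / len(decoy_scores)


def roc_auc(tp_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney rank statistic (tied pairs get half credit).

    Compounds that failed to dock can be passed as math.inf so that they rank last.
    """
    # Negate scores so that a larger value means a better-ranked compound.
    labeled = [(-s, True) for s in tp_scores] + [(-s, False) for s in decoy_scores]
    labeled.sort(key=lambda item: item[0])
    ranks: List[float] = [0.0] * len(labeled)
    i = 0
    while i < len(labeled):
        j = i
        while j + 1 < len(labeled) and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for a block of ties
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos, n_neg = len(tp_scores), len(decoy_scores)
    rank_sum = sum(r for r, (_, is_tp) in zip(ranks, labeled) if is_tp)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    tps = [-10.2, -8.7, -6.1, math.inf]  # the last true positive failed to dock
    decoys = [-7.5, -5.0, -4.2, -3.9]
    print(roc_auc(tps, decoys))                                # 0.6875 (11 of 16 pairs)
    print(sensitivity_specificity(tps, decoys, cutoff=-7.0))  # (0.5, 0.75)
```

Sweeping `cutoff` over all observed scores and plotting (1 − specificity, sensitivity) traces the ROC curve whose area `roc_auc` returns.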
Enrichment factors (EFs) compare the fraction of true positives within a selected top-ranked subset to the fraction of true positives in the entire dataset (Huang, Shoichet, & Irwin, 2006). Enrichment factors thus provide additional information on the success of the scoring or ranking function for a selected subset size:
$$\text{Enrichment factor} = \frac{\text{Subset true positives} / \text{Subset compounds}}{\text{Total true positives} / \text{Total compounds}}$$
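The enrichment factor formula can be sketched directly in Python. The function name and the tie-breaking rule at the subset boundary are assumptions; lower (more negative) docking scores are treated as better, as for Glide.

```python
from typing import Sequence


def enrichment_factor(tp_scores: Sequence[float], decoy_scores: Sequence[float],
                      fraction: float) -> float:
    """EF for the top `fraction` of a ranked list (lower docking score = better).

    EF = (subset true positives / subset compounds) / (total true positives / total compounds)
    """
    # Decoys are listed first so that, with Python's stable sort, ties at the
    # cut-off are resolved in favour of decoys (a conservative choice).
    ranked = sorted([(s, False) for s in decoy_scores] + [(s, True) for s in tp_scores],
                    key=lambda item: item[0])
    total = len(ranked)
    subset_size = max(1, round(fraction * total))
    subset_tp = sum(1 for _, is_tp in ranked[:subset_size] if is_tp)
    return (subset_tp / subset_size) / (len(tp_scores) / total)


if __name__ == "__main__":
    tps = [-10.0, -9.0, -1.0, -1.0]
    decoys = [-5.0] * 16
    print(enrichment_factor(tps, decoys, 0.10))  # top 2 of 20 are both TPs -> 5.0
```

When the subset is exactly a fraction x of the ranked list, the formula reduces to (fraction of all true positives recovered) / x; for example, recovering 4 of 48 true positives in the top 2% would give (4/48)/0.02 ≈ 4.17.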
Data Availability
The prospective predictions on FooDB and the datasets used in the current study are available in the supplementary download section of the Niv lab website (https://biochem-food-nutrition.agri.huji.ac.il/mashaniv). Predictions on additional datasets can be requested from the authors by email.
Results
The TP set consists of 452 known sweet compounds: 404 molecules with molecular weight up to 460 g/mol and 48 molecules over 460 g/mol (Figure 1). Decoys for these molecules were either filtered from ZINC using predetermined property ranges derived from the TPs or generated for each TP by DUD-E. The ZINC and DUD-E decoy datasets were used for validation. GRAS was used for retrospective screening of annotated compounds, and the FooDB dataset was screened to provide prospective predictions.
The “default” I-Tasser (Yang, et al., 2016) model was mainly based on human mGluR templates (PDB IDs: 2E4X and 2E4U). An additional model was created based on the Medaka fish crystal structure (PDB ID 5X2M, chain B) and termed the “fish-based model”. A model based on chain A was also evaluated but did not perform as well (see Supplementary Figures 3 and 4). The models were evaluated for their ability to discriminate between true positives and decoys using the Glide XP docking protocol, which was chosen after preliminary testing (Supplementary Figures 1 and 2).
Only compounds below 460 g/mol could dock into either of these models; docking of larger compounds is discussed below. The fish-based hT1R2 VFT model (Figure 2) had slightly higher enrichment than the default model (Supplementary Table 2). The overall Area Under the Curve (AUC) of the fish-based model (0.83) was higher than that of the default model (0.75). In addition, more TP compounds docked successfully into the fish-based model than into the default model (97% and 73% of the TP dataset, respectively). The fish-based model also had somewhat higher sensitivity among the top screening results than the default model (Supplementary Table 2).
The docking scores did not correlate with experimentally measured sweetness intensity. This may be due to shortcomings of the model and the simplistic scoring function, as well as to the complexity of the sensory system, which depends on multiple factors, including genetic variation in the taste receptors, the number of receptors expressed in the cells, temperature, salivary proteins, and the age and mood of the human subjects reporting the perceived sweetness.
To understand the reasons for the improved performance of the fish-based model, we compared its binding cavity to that of the default model. The binding site of the default model is narrower than that of the fish-based model (Table 1), possibly due to the side-chain orientations of R383 and D142. In the fish-based model, these residues face away from the binding site, whereas in the default model both residues face inwards toward the proposed binding site, in a manner likely to interfere with ligand binding (Figure 3). In the default template model, the D142 and R383 side chains face the D-glucose ligand and are located 3.2 Å from the glucose 3-position hydroxyl.
Area (Å²) | Default model binding site | Fish-based model binding site
H-bond acceptor | 103.1 | 102.6
H-bond donor | 127.6 | 276.5
Hydrophilic | 253.8 | 395
Hydrophobic | 117.2 | 194.9
Table 1 – Comparison of the binding site of the default and fish-based models, within a 6 Å radius of docked glucose.
Notably, larger ligands require more space and may clash with these residues. The importance and orientation of D142 and R383 have been suggested previously (Kumari, Choudhary, Arora, & Sharma, 2016). The D142A mutation led to a loss of response to sucrose or sucralose (Zhang, et al., 2010). Mutations of R383, namely R383A, R383D, R383Q, R383L, R383F, R383K and R383H, led to weak or no response to aspartame. Charge-changing mutations led to weak or diminished activation by all tested ligands, whereas mutations that kept the positive charge (R383H and R383K) showed activation by sucralose and D-tryptophan similar to the WT. These results can be explained if R383 is not directly involved in ligand binding but rather stabilizes the VFT conformation through an interaction with D449, in agreement with the fish-based template model (see Figure 4). Other mutagenesis data also support the involvement of the binding site residues I67, L71, Y103, D142, S165, E302, S303, W304, D307 and V384 (Cheron, Golebiowski, Antonczak, & Fiorucci, 2017).
The good performance of the fish-based model in prioritizing true positives indicates that the model and the docking protocol are suitable for virtual screening of molecules below 460 g/mol.
Ligands in the 460–1100 g/mol range: Since some of the true positives, such as steviol glycosides (see Figure 1), are larger than the binding site can accommodate, an additional model was created based on an open conformation of the calcium-sensing receptor (human calcium-sensing receptor extracellular domain, PDB ID: 5K5T (Geng, Mosyak, Kurinov, Zuo, Sturchler, Cheng, et al., 2016)).
Although the open conformation is considered inactive, the open model of the sweet taste receptor does accommodate larger agonists that do not fit into the closed conformation. We hypothesized that several active conformations exist that adapt to the size of the ligands, and that the open conformation may serve as an approximate model of the active conformation induced by larger compounds. Forty-eight known sweeteners with MW above 460 g/mol and ~14,000 property-matched decoys were docked using the protocol described in Methods. For these compounds, the open-form model performs well, with an AUC of 0.85, EF2 of 4.17 and EF5 of 5. In this screen, all 48 TP compounds in this range docked successfully.
Similarity of structure-based hits to known ligands
To ensure that the structure-based method does not simply return results that could be trivially found by ligand-based similarity searches, the similarity of the docking results to any of the sweet compounds in the TP dataset was evaluated using 2D fingerprints (MOLPRINT2D). For the smaller compounds (screened using the fish-based model), only 8 compounds had a Tanimoto similarity score equal to or higher than 0.75. For the larger compounds (screened using the open-form model), 163 compounds had a Tanimoto similarity score above 0.75. Hence, most structure-based screening results are chemically different from known true positives. Interestingly, among these compounds is tubercidin (ZINC03873956), which was patented in 2006 as part of the application “Fast dissolving composition with prolonged sweet taste” (US7122198B1).
GRAS data set
The 1,877 GRAS compounds are Generally Recognized As Safe for use in humans. The GRAS dataset is relevant for food products, carries taste annotations and is suitable for validating predictions of the fish-based model, since all but 7 GRAS compounds are below 460 g/mol. The top 5% of docked compounds comprised 100 compounds, of which 49 are annotated as sweet. For comparison, 37 GRAS compounds have a Tanimoto similarity of at least 0.75 to at least one molecule in the true positive set; 13 of these have a sweet taste annotation, and 9 of these appeared in the top docking results.
The percentage of sweet-tasting molecules in the top-scoring subset (49%) increased compared to the entire GRAS dataset (16%). As a control, we examined the percentage of sweet-smelling compounds, which act via odorant receptors and therefore should not be affected by the screen. Indeed, sweet-smelling compounds made up 7% of the original set and a similar 8% of the top 5% scoring compounds in the virtual screen. These results lend further support to the virtual screening protocol for sweet-tasting compounds.
FooDB data set
The docking protocol was next applied to the FooDB dataset, most of which is not yet annotated in terms of sensory properties. The FooDB subset with MW under 460 g/mol was docked to the fish-based model, while molecules above this threshold were docked to the open model. Compounds with a Tanimoto similarity above 0.75 (MOLPRINT2D fingerprints) to any compound in the true positive set were considered “sweet-like” compounds. The full FooDB dataset contained 14,384 compounds under 460 g/mol (Figure 5), including 177 sweet-like compounds (1.2%). Docking yielded 10,897 scored compounds, 117 of which were sweet-like (1%). The top-scoring 5% (545 compounds) contained 47 sweet-like compounds (8.6%). The top 200 ranked compounds (top 1.3%) were manually inspected for existing sensory information. After filtering out 20 (10%) sweet-like compounds, the IUPAC names and SMILES strings of the remaining 180 compounds were used to search the literature and patent databases via Google Scholar and SciFinder-N (https://scifinder-n.cas.org/). Three of the hits turned out to be recently patented sweeteners: protocatechuic acid 4-glucoside (WO2013121264A1), morachalcone A (US20180132516A1) and galloyl glucose (EP3571933A1). An additional four compounds turned out to be sweet: sakebiose (also known as nigerose), turanose, melibiitol and inulobiose; none of them were found in the true positives dataset (see chemical structures in Supplementary Figure 6).
Since the 5X2MA-based model had a high EF2 value in the initial screening (see Supplementary data), it was applied as well. In its top 200 results of the FooDB screen, 62 compounds overlapped with the hits retrieved by the 5X2MB model. These 62 overlapping compounds included sakebiose, turanose and melibiitol. Among the remaining top hits was an additional patented compound, isopropyl apiosylglucoside (WO2012107207A1).
The ~9,500 FooDB compounds with MW above 460 g/mol were docked to the open model. This portion of the dataset contained 104 sweet-like molecules, 8 of which were in the top 5% of the structure-based screen. Five sweet or sweet-like molecules were in the top 2% of the structure-based screen, including rebaudioside A and rebaudioside C. In the top 200 ranked compounds (~2%), four additional patented compounds were found: mannan (US9012520B2), proanthocyanidin B2 3,3'-digallate (US9247758B2), maltotetraose (US20020025366A1) and narirutin (US9247758B2).
To the best of our knowledge, the remaining compounds have no reported sensory data and are therefore novel candidate sweeteners.
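The bookkeeping in the FooDB screen (routing by molecular weight, flagging sweet-like compounds, and checking their share among the top-ranked hits before manual inspection) can be sketched as follows. The `Hit` record and all function names are hypothetical, docking scores and similarities are assumed to be precomputed, and treating the 460 g/mol and 0.75 boundaries as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

MW_CUTOFF = 460.0            # g/mol, boundary between the two receptor models
SWEET_LIKE_THRESHOLD = 0.75  # maximal Tanimoto to the TP set (treated as inclusive here)


@dataclass
class Hit:
    name: str
    mw: float             # molecular weight, g/mol
    max_sim_to_tp: float  # precomputed maximal Tanimoto similarity to any TP
    score: float          # docking score; lower is better


def route(mw: float) -> str:
    """Choose the receptor model for a compound (460 g/mol itself goes to the closed model)."""
    return "fish-based model" if mw <= MW_CUTOFF else "open model"


def is_sweet_like(hit: Hit) -> bool:
    return hit.max_sim_to_tp >= SWEET_LIKE_THRESHOLD


def summarize_top(hits: List[Hit], top_fraction: float = 0.05) -> Dict[str, float]:
    """Percentage of sweet-like compounds overall and within the top-ranked fraction."""
    ranked = sorted(hits, key=lambda h: h.score)
    n_top = max(1, round(top_fraction * len(ranked)))

    def pct(group: List[Hit]) -> float:
        return 100.0 * sum(is_sweet_like(h) for h in group) / len(group)

    return {"n_top": n_top, "pct_all": pct(ranked), "pct_top": pct(ranked[:n_top])}


def novel_candidates(hits: List[Hit], top_n: int = 200) -> List[str]:
    """Names of the top-n ranked compounds left after removing sweet-like ones."""
    ranked = sorted(hits, key=lambda h: h.score)
    return [h.name for h in ranked[:top_n] if not is_sweet_like(h)]


if __name__ == "__main__":
    hits = [Hit("analog", 342.3, 0.90, -12.0)] + [
        Hit(f"c{i}", 300.0, 0.10, -5.0 + 0.1 * i) for i in range(19)
    ]
    print(route(342.3), route(967.0))       # fish-based model open model
    print(summarize_top(hits))              # {'n_top': 1, 'pct_all': 5.0, 'pct_top': 100.0}
    print(novel_candidates(hits, top_n=3))  # ['c0', 'c1']
```

The list returned by `novel_candidates` corresponds to the compounds that would go on to literature and patent searches.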
Summary and Discussion
We found that structure-based methods are applicable for identifying sweet-tasting compounds. This study emphasizes the importance of the template used for homology modeling of the sweet taste receptor and the necessity of validating the resulting models. A model built via I-Tasser using the fish monomer (Nuemket, Yasui, Kusakabe, Nomura, Atsumi, Akiyama, et al., 2017) as a template performed better than a model built with default I-Tasser settings, which chose mGluR2 structures as templates (PDB IDs: 2E4X and 2E4U). We tested the effect of different Glide protocols, sampling and scoring functions and found that the Glide XP docking protocol with flexible ligand sampling provides better ROC curves for virtual screening against the validation dataset. Additionally, flexible XP docking ranks known sweet compounds better than SP or rigid XP docking and is able to dock more true positive compounds. Despite the heavier computational resources required by XP docking, it was the most successful protocol in this system, as shown by the AUC (Figure 1). In this virtual screening experiment, the model was able to detect sweeteners both among the known true positives and among the decoy compounds: interestingly, a recently patented sweetener found among the decoys was ranked better than some of the TP compounds (top 2%).
Comparing the binding sites of the two models revealed a major difference in the orientation of the R383 side chain: R383 faces outward from the binding site in the fish-based model, but inward in the default template model. Previously reported mutagenesis data (Maillet, et al., 2015; Zhang, et al., 2010) support the suggested orientations of D142 and R383 in the selected model and are in agreement with a potential interaction between R383 and D449. These differences between the models result in a more restricted binding site in the default template model, which contributed to its poorer performance in retrieving true positives.
The selected fish-based model and docking protocol were applied to the GRAS dataset. The top 5% of docking results had a higher proportion of sweet compounds (~50%) than the entire compound list (16%), while, as expected, the percentage of sweet-smelling compounds was not affected. The docking campaign was more effective than a simple 2D similarity screen: the top 5% of docking hits included 49 of the sweet molecules in GRAS, while 2D similarity identified only 13.
Interestingly, the template that provided the best results was the VFT of the T1R3 monomer (PDB 5X2M, chain B). The model based on fish T1R2 (5X2M, chain A) resulted in a lower EF2 than the fish-T1R3-based or the default model (Supplementary Figure 3). Additionally, for the fish-T1R2-based model, the top 5% of GRAS screening results were not significantly enriched with sweet compounds (Supplementary Figure 4).
The Medaka fish taste receptor heterodimer recognizes L-amino acids (Gln, Ala, Arg, Glu and Gly) but not sugars or artificial sweeteners (Nuemket, et al., 2017), and the amino acids were shown to bind to T1R2 and, with lower affinity, to T1R3. Chickens do not recognize sweet taste, and when the VFT of chicken T1R3 was introduced into the T1R3 of the hummingbird (a bird that does recognize sugars), the heterodimeric receptor was activated in vitro by amino acids rather than sugars. Reintroducing 109 amino acids of hummingbird T1R3 into the chicken T1R3 VFT restored sucrose responses (Baldwin, Toda, Nakagita, O'Connell, Klasing, Misaka, et al., 2014). This suggests that T1R3 might harbor a generalist binding site that can mutate into a specialist one, and our results indicate that it can serve as a successful template for modeling human T1R2.
When the docking screen was applied to the FooDB dataset, the abundance of sweet-like compounds in the final output (10% for compounds up to 460 g/mol and 2.5% for compounds larger than 460 g/mol) increased relative to the initial dataset (~1.2% for the smaller compounds and ~1.1% for the larger compounds). Overall, the screen found 7 newly patented and 4 known-to-be-sweet compounds among the top hits, all with less than 0.75 similarity to the true positives used in this work, suggesting that additional novel sweeteners may be found among the remaining top-scoring molecules.
The low sensitivity of this structure-based screen means that molecules that are not highly scored by this protocol cannot be claimed to be non-sweet. Parameters that could improve sensitivity, such as the inclusion of water molecules in the binding site and ligand-induced conformational changes of the receptor, may be considered in the future. Molecular dynamics simulations may provide enhanced sampling of receptor conformations that mimic ligand-induced conformational changes.
Importantly, this work focused on the sugar-binding site in the T1R2 VFT domain. Other sites may be of importance: sweet proteins bind in the CRD between the two subunits, T1R2 and T1R3, in a wedge model (Temussi, 2011). NHDC (Winnig, Bufe, Kratochwil, Slack, & Meyerhof, 2007) and cyclamate (Jiang, Cui, Zhao, Snyder, Benard, Osman, et al., 2005) bind the TMD of T1R3. Thus, some sweet compounds cannot be found with the suggested protocol. Compounds that interact with other sites can be modeled by QSAR approaches or machine learning techniques (Rojas, et al., 2017; Zheng, Chang, Xu, Xu, & Lin, 2019).
Recently published ligand-based and machine-learning methods, together with the structure-based screening presented in the current paper, can work in conjunction to maximize the diversity of novel sweeteners.
393
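One simple way to combine complementary screens is to pool their hit lists and record which method nominated each candidate, so that compounds contributed by only one approach, for example compounds acting at sites other than the T1R2 VFT sugar-binding site, remain visible. The sketch below is a hypothetical illustration of such pooling, not a published protocol; the method names and compound IDs are placeholders.

```python
# Hypothetical sketch: pooling candidates from complementary screening methods.
# Method names and compound IDs are placeholders, not real screening output.


def pool_candidates(hit_lists: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each compound ID to the set of methods that nominated it."""
    pooled: dict[str, set[str]] = {}
    for method, hits in hit_lists.items():
        for compound in hits:
            pooled.setdefault(compound, set()).add(method)
    return pooled


def unique_to(pooled: dict[str, set[str]], method: str) -> list[str]:
    """Compounds nominated by the given method and by no other method."""
    return sorted(c for c, methods in pooled.items() if methods == {method})


if __name__ == "__main__":
    hits = {
        "structure_based": ["C1", "C2", "C3"],
        "ligand_based": ["C2", "C4"],
        "machine_learning": ["C4", "C5"],
    }
    pooled = pool_candidates(hits)
    print(len(pooled))                             # 5 distinct candidates in total
    print(unique_to(pooled, "structure_based"))    # ['C1', 'C3']
    print(unique_to(pooled, "machine_learning"))   # ['C5']
```

Keeping the per-method provenance, rather than only the union, makes it easy to see how much each approach adds to the diversity of the final candidate set.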
Conflict of interest
The authors declare no conflict of interest.
Acknowledgements
The authors thank Dr. Tamir Dingjan, Dr. Tali Yarnitzky and Mr. Ido Nissim for critical reading of the manuscript and Dr. Hillary Voet for helpful discussions. Funding from ISF grants #2463/16 and #1129/19, from UHJ-France and from the Foundation Scopus is gratefully acknowledged. MYN is a member of COST Actions Mu.Ta.Lig (CA15135) and ERNEST (CA18133).
References
Acevedo, W., Ramirez-Sarmiento, C. A., & Agosin, E. (2018). Identifying the interactions between natural, non-caloric sweeteners and the human sweet receptor by molecular docking. Food Chemistry, 264, 164-171.
Baldwin, M. W., Toda, Y., Nakagita, T., O'Connell, M. J., Klasing, K. C., Misaka, T., Edwards, S. V., & Liberles, S. D. (2014). Evolution of sweet taste perception in hummingbirds by transformation of the ancestral umami receptor. Science, 345(6199), 929-933.
Cheron, J. B., Casciuc, I., Golebiowski, J., Antonczak, S., & Fiorucci, S. (2017). Sweetness prediction of natural compounds. Food Chemistry, 221, 1421-1425.
Cheron, J. B., Golebiowski, J., Antonczak, S., & Fiorucci, S. (2017). The anatomy of mammalian sweet taste receptors. Proteins: Structure, Function, and Bioinformatics, 85(2), 332-341.
Damak, S., Rong, M., Yasumatsu, K., Kokrashvili, Z., Varadarajan, V., Zou, S., Jiang, P., Ninomiya, Y., & Margolskee, R. F. (2003). Detection of sweet and umami taste in the absence of taste receptor T1r3. Science, 301(5634), 850-853.
Di Pizio, A., Ben Shoshan-Galeczki, Y., Hayes, J. E., & Niv, M. Y. (2018). Bitter and sweet tasting molecules: It's complicated. Neuroscience Letters.
Di Pizio, A., Waterloo, L. A. W., Brox, R., Lober, S., Weikert, D., Behrens, M., Gmeiner, P., & Niv, M. Y. (2019). Rational design of agonists for bitter taste receptor TAS2R14: From modeling to bench and back. Cellular and Molecular Life Sciences.
DuBois, G. E., & Prakash, I. (2012). Non-caloric sweeteners, sweetness modulators, and sweetener enhancers. Annual Review of Food Science and Technology, 3, 353-380.
Fitch, C., & Keim, K. S. (2012). Position of the Academy of Nutrition and Dietetics: Use of nutritive and nonnutritive sweeteners. Journal of the Academy of Nutrition and Dietetics, 112(5), 739-758.
Geng, Y., Mosyak, L., Kurinov, I., Zuo, H., Sturchler, E., Cheng, T. C., Subramanyam, P., Brown, A. P., Brennan, S. C., Mun, H. C., Bush, M., Chen, Y., Nguyen, T. X., Cao, B., Chang, D. D., Quick, M., Conigrave, A. D., Colecraft, H. M., McDonald, P., & Fan, Q. R. (2016). Structural mechanism of ligand activation in human calcium-sensing receptor. eLife, 5.
Huang, N., Shoichet, B. K., & Irwin, J. J. (2006). Benchmarking sets for molecular docking. Journal of Medicinal Chemistry, 49(23), 6789-6801.
Irwin, J. J., & Shoichet, B. K. (2016). Docking screens for novel ligands conferring new biology. Journal of Medicinal Chemistry, 59(9), 4103-4120.
Jiang, P. H., Cui, M., Zhao, B. H., Snyder, L. A., Benard, L. M. J., Osman, R., Max, M., & Margolskee, R. F. (2005). Identification of the cyclamate interaction site within the transmembrane domain of the human sweet taste receptor subunit T1R3. Journal of Biological Chemistry, 280(40), 34296-34305.
Kumari, A., Choudhary, S., Arora, S., & Sharma, V. (2016). Stability of aspartame and neotame in pasteurized and in-bottle sterilized flavoured milk. Food Chemistry, 196, 533-538.
Lim, V. J. Y., Du, W. N., Chen, Y. Z., & Fan, H. (2018). A benchmarking study on virtual ligand screening against homology models of human GPCRs. Proteins: Structure, Function, and Bioinformatics, 86(9), 978-989.
Loper, H. B., La Sala, M., Dotson, C., & Steinle, N. (2015). Taste perception, associated hormonal modulation, and nutrient intake. Nutrition Reviews, 73(2), 83-91.
Lustig, R. H., Schmidt, L. A., & Brindis, C. D. (2012). Public health: The toxic truth about sugar. Nature, 482(7383), 27-29.
Maillet, E. L., Cui, M., Jiang, P., Mezei, M., Hecht, E., Quijada, J., Margolskee, R. F., Osman, R., & Max, M. (2015). Characterization of the binding site of aspartame in the human sweet taste receptor. Chemical Senses, 40(8), 577-586.
Matsunami, H., Montmayeur, J. P., & Buck, L. B. (2000). A family of candidate taste receptors in human and mouse. Nature, 404(6778), 601-604.
Moller, T. C., Moreno-Delgado, D., Pin, J. P., & Kniazeff, J. (2017). Class C G protein-coupled receptors: Reviving old couples with new partners. Biophysics Reports, 3(4), 57-63.
Montmayeur, J. P., Liberles, S. D., Matsunami, H., & Buck, L. B. (2001). A candidate taste receptor gene near a sweet taste locus. Nature Neuroscience, 4(5), 492-498.
Mysinger, M. M., Carchia, M., Irwin, J. J., & Shoichet, B. K. (2012). Directory of useful decoys, enhanced (DUD-E): Better ligands and decoys for better benchmarking. Journal of Medicinal Chemistry, 55(14), 6582-6594.
Nissim, I., Dagan-Wiener, A., & Niv, M. Y. (2017). The taste of toxicity: A quantitative analysis of bitter and toxic molecules. IUBMB Life, 69(12), 938-946.
Nuemket, N., Yasui, N., Kusakabe, Y., Nomura, Y., Atsumi, N., Akiyama, S., Nango, E., Kato, Y., Kaneko, M. K., Takagi, J., Hosotani, M., & Yamashita, A. (2017). Structural basis for perception of diverse chemical substances by T1r taste receptors. Nature Communications, 8.
Pase, M. P., Himali, J. J., Beiser, A. S., Aparicio, H. J., Satizabal, C. L., Vasan, R. S., Seshadri, S., & Jacques, P. F. (2017). Sugar- and artificially sweetened beverages and the risks of incident stroke and dementia: A prospective cohort study. Stroke, 48(5), 1139.
Ripphausen, P., Nisius, B., & Bajorath, J. (2011). State-of-the-art in ligand-based virtual screening. Drug Discovery Today, 16(9-10), 372-376.
Rojas, C., Todeschini, R., Ballabio, D., Mauri, A., Consonni, V., Tripaldi, P., & Grisoni, F. (2017). A QSTR-based expert system to predict sweetness of molecules. Frontiers in Chemistry, 5, 53.
Sastry, M., Lowrie, J. F., Dixon, S. L., & Sherman, W. (2010). Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments. Journal of Chemical Information and Modeling, 50(5), 771-784.
Spaggiari, G., Di Pizio, A., & Cozzini, P. (2020). Sweet, umami and bitter taste receptors: State of the art of in silico molecular modeling approaches. Trends in Food Science & Technology, 96, 21-29.
Sterling, T., & Irwin, J. J. (2015). ZINC 15 – Ligand discovery for everyone. Journal of Chemical Information and Modeling, 55(11), 2324-2337.
Suez, J., Korem, T., Zeevi, D., Zilberman-Schapira, G., Thaiss, C. A., Maza, O., Israeli, D., Zmora, N., Gilad, S., Weinberger, A., Kuperman, Y., Harmelin, A., Kolodkin-Gal, I., Shapiro, H., Halpern, Z., Segal, E., & Elinav, E. (2014). Artificial sweeteners induce glucose intolerance by altering the gut microbiota. Nature, 514(7521), 181-186.
Temussi, P. A. (2011). Determinants of sweetness in proteins: A topological approach. Journal of Molecular Recognition, 24(6), 1033-1042.
Truchon, J.-F., & Bayly, C. I. (2007). Evaluating virtual screening methods: Good and bad metrics for the “early recognition” problem. Journal of Chemical Information and Modeling, 47(2), 488-508.
Winnig, M., Bufe, B., Kratochwil, N. A., Slack, J. P., & Meyerhof, W. (2007). The binding site for neohesperidin dihydrochalcone at the human sweet taste receptor. BMC Structural Biology, 7.
Yang, J., Zhang, W., He, B., Walker, S. E., Zhang, H., Govindarajoo, B., Virtanen, J., Xue, Z., Shen, H. B., & Zhang, Y. (2016). Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade. Proteins: Structure, Function, and Bioinformatics, 84(Suppl. 1), 233-246.
Yee, K. K., Sukumaran, S. K., Kotha, R., Gilbertson, T. A., & Margolskee, R. F. (2011). Glucose transporters and ATP-gated K+ (KATP) metabolic sensors are present in type 1 taste receptor 3 (T1r3)-expressing taste cells. Proceedings of the National Academy of Sciences of the United States of America, 108(13), 5431-5436.
Zhang, F., Klebansky, B., Fine, R. M., Liu, H., Xu, H., Servant, G., Zoller, M., Tachdjian, C., & Li, X. (2010). Molecular mechanism of the sweet taste enhancers. Proceedings of the National Academy of Sciences of the United States of America, 107(10), 4752-4757.
Zhao, G. Q., Zhang, Y., Hoon, M. A., Chandrashekar, J., Erlenbach, I., Ryba, N. J., & Zuker, C. S. (2003). The receptors for mammalian sweet and umami taste. Cell, 115(3), 255-266.
Zheng, S. Q., Chang, W. P., Xu, W. X., Xu, Y., & Lin, F. (2019). e-Sweet: A machine-learning based platform for the prediction of sweetener and its relative sweetness. Frontiers in Chemistry, 7.
Figure 1 – Examples of compounds of varying molecular weight (MW) from the true-positive dataset used to evaluate enrichment and to prepare decoys.
Figure 2 – A. ROC curves of the hT1R2 models for compounds with MW up to 460 g/mol: the fish-based model (green curve) and the model based on the mGluR class C GPCR template (blue curve). B. ROC curve of the hT1R2 open-form model (magenta) for compounds with MW of 460-1100 g/mol. In both panels, the red dotted line indicates random enrichment performance.
Figure 3 – Ribbon representation of the superimposed hT1R2 models: residues of the fish-based (5XDMB) model are colored green, the default model blue, and the glucose ligand purple.
Figure 4 – A. Ribbon representation of the hT1R2 fish-based model (green). B. 2D diagram of the fish-based model binding site, showing docked glucose and the residues within 5 Å of it.
Figure 5 – Virtual screening of the FooDB dataset against the fish-based model.
Yaron Ben Shoshan-Galeczki: Methodology, Data Curation, Formal Analysis, Resources, Writing, Visualization, Editing
Masha Y Niv: Conceptualization, Writing, Review and Editing, Supervision, Funding acquisition
Highlights
Docking to homology models of the human T1R2 VFT domain was evaluated
A medaka fish-based model performed well for compounds below 460 g/mol
A model based on open-form experimental structures was useful for larger compounds
Screening of FooDB retrieved recently patented sweeteners and provided novel candidates