Molecular interaction fingerprint approaches for GPCR drug discovery


Available online at www.sciencedirect.com

Márton Vass1, Albert J Kooistra1,2, Tina Ritschel1,2, Rob Leurs1, Iwan JP de Esch1 and Chris de Graaf1

Protein–ligand interaction fingerprints (IFPs) are binary 1D representations of the 3D structure of protein–ligand complexes encoding the presence or absence of specific interactions between the binding pocket amino acids and the ligand. Various implementations of IFPs have been developed and successfully applied for post-processing molecular docking results for G Protein-Coupled Receptor (GPCR) ligand binding mode prediction and virtual ligand screening. Novel interaction fingerprint methods enable structural chemogenomics and polypharmacology predictions by complementing the increasing amount of GPCR structural data. Machine learning methods are increasingly used to derive relationships between bioactivity data and fingerprint descriptors of chemical and structural information of binding sites, ligands, and protein–ligand interactions. Factors that influence the application of IFPs include structure preparation, binding site definition, fingerprint similarity assessment, and data processing; these factors pose challenges as well as possibilities to optimize interaction fingerprint methods for GPCR drug discovery.

Addresses
1 Division of Medicinal Chemistry, Faculty of Sciences, Amsterdam Institute for Molecules, Medicines and Systems, Vrije Universiteit Amsterdam, De Boelelaan 1108, 1081 HZ Amsterdam, The Netherlands
2 Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center, Geert Grooteplein-Zuid 26-28, 6525 GA Nijmegen, The Netherlands

Corresponding author: de Graaf, Chris ([email protected])

Current Opinion in Pharmacology 2016, 30:59–68 This review comes from a themed issue on New Technologies Edited by David E Gloriam and Christofer S Tautermann

http://dx.doi.org/10.1016/j.coph.2016.07.007
1471-4892/© 2016 Elsevier Ltd. All rights reserved.

Introduction

The recent surge of publicly available G protein-coupled receptor (GPCR) structures has opened the door to structure-based ligand discovery for GPCRs [1], one of the most important drug target families. Structural data can be combined with the wealth of available molecular pharmacology, protein–drug association, mutagenesis, gene expression, and other biochemical data in integrated chemogenomics approaches. This combination provides new possibilities to obtain insights into the details of GPCR function and to discover new protein targets and biologically active molecules [2,3,4,5]. Beyond case studies tailored to specific targets, more generally applicable structure-based methodologies have recently emerged for improved binding mode prediction and virtual screening of small molecule ligands, SAR elucidation, and functional activity prediction [6,7,8,9,10,11,12].

Protein–ligand interaction fingerprints (IFPs) are binary 1D representations of the 3D structure of protein–ligand complexes encoding the presence or absence of specific interactions between the binding pocket amino acids and the ligand. For example, if a ligand forms an H-bond with a specific amino acid of the binding pocket, the respective bit in the fingerprint is 1; if the interaction is missing, it is 0. Fingerprints derived from ligands, proteins, or protein–ligand complexes are computer-digestible representations of (bio)chemical structures and are particularly well suited for working with large amounts of data, allowing rapid processing and comparison [13,14,15,16,17,18,19]. IFPs can be used straightforwardly for machine learning applications in cheminformatics [20]. Binary fingerprints are most often compared using the Tanimoto coefficient (Tc), which is the number of bits set in both fingerprints divided by the number of bits set in at least one of the fingerprints. Tc ranges from 0 for dissimilar binding interactions to 1 for identical interactions [13]. A Tc of 0.6 is generally considered the minimum cutoff for binding mode similarity [15], and cutoffs of 0.7–0.75 are generally used for structure-based virtual screening [6,7,8,9].

A general workflow of fingerprint-based chemogenomics applications is presented in Figure 1. This workflow covers data collection from structural and bioactivity databases, fingerprint definitions based on ligand chemical structure, protein–ligand interactions, or pocket pharmacophores, and data analysis steps employing simple similarity calculation or fingerprint-based machine learning methods. In the present article we focus on recent advances and applications of protein–ligand interaction fingerprint-based methods to GPCR research and on machine learning methods trained on these fingerprints. We furthermore discuss the possibilities and pitfalls of fingerprint-based chemogenomics methods.
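The Tanimoto comparison described in the Introduction can be illustrated with a minimal sketch. It assumes fingerprints are stored as equal-length strings of '0' and '1' characters; the function name, the example fingerprints, and the convention of returning 0.0 for two empty fingerprints are illustrative assumptions, not part of any published IFP implementation.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient of two equal-length bit strings, e.g. '1010000'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints (intersection).
    common = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    # Bits set in at least one fingerprint (union).
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    # Assumption: two empty fingerprints are scored 0.0 rather than undefined.
    return common / either if either else 0.0


# Hypothetical three-residue fingerprints, 7 bits per residue.
reference_ifp = "1000101" + "1010000" + "1000000"
pose_ifp = "1000101" + "1000000" + "1000000"
score = tanimoto(reference_ifp, pose_ifp)
print(f"Tc = {score:.2f}")  # 5 shared bits / 6 bits in the union
```

In this toy case the pose misses one of the six reference interactions, so it would pass a 0.75 cutoff of the kind mentioned above.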


Figure 1

[Workflow diagram. Inputs: X-ray structure (binding pocket, ligand) and ChEMBL bioactivities. Encodings: chemical fingerprint (ligand fragments), interaction fingerprint (interactions such as ionic and H-bond), and binding pocket fingerprint (pharmacophore triplets/quadruplets). Analysis: comparison with a reference fingerprint using Tc = |A ∩ B| / (|A| + |B| − |A ∩ B|) for rescoring by similarity, interaction clustering, etc.; machine learning on active/inactive ligands, bioactivities, and related or unrelated protein binding sites for proteochemometrics, binding site comparison, etc. Applications: virtual screening, polypharmacology prediction, de novo drug design.]

General workflow of protein–ligand interaction fingerprint based chemogenomic methods. Publicly available databases can be mined for protein– ligand complexes and related ligands or proteins (e.g. PDB [21] for structural information and ChEMBL [22] for bioactivity data). This structural information can be encoded in digital fingerprint representations based on ligand chemical structure [13], protein–ligand interactions (e.g. [14,15,16,17]) or binding pocket pharmacophores (e.g. [18,19]). Fingerprints can be compared and clustered using similarity assessment metrics such as the Tanimoto coefficient [13], data from multiple sources can be integrated [23,24], and machine learning methods may be applied for bioactivity prediction [20], virtual screening, polypharmacology prediction, or de novo design. Examples of applications of the different methods are provided in Table 1 and Figures 2 and 3.


To illustrate the use of chemical and protein–ligand interaction fingerprints and the difference between them, two examples are shown in Figure 2 in which receptors belonging to different subfamilies share relatively high IFP similarity while the chemical similarities of the co-crystallized ligands are low. In the case of the histamine H1 receptor and the muscarinic acetylcholine M2 receptor, the high IFP similarity (0.63) stems from the local similarity of the orthosteric pockets. For the aminergic serotonin 5-HT1B receptor and peptidergic nociceptin (NOP) receptor pair, on the other hand, the high IFP similarity (0.63) stems from the overall similar shape and contacts in the binding pocket, since the larger ligands also extend to the extracellular vestibule of these receptors and have similar contacts with extracellular loop 2 (ECL2). These examples show how structurally dissimilar ligands can share similar interaction patterns and high IFP similarity.

Figure 2

[Panel (a): density-colored scatter plots of IFP similarity (y axis, 0.0–1.0) against ECFP-4 ligand similarity (x axis, 0.0–1.0) for Class A GPCRs and for aminergic GPCRs. Panels (b) and (c): binding site overlays of H1–doxepin with M2–QNB and of 5-HT1B–DHE with NOP–C-24. Panel (d): 7-bit interaction fingerprints for selected pocket residues (D3.32, W6.48, 6.52, 6.55, W4.56, F5.47, 2.63, T7.38). Bit legend: 1, apolar; 2, aromatic face-to-face; 3, aromatic edge-to-face; 4, HBD (protein); 5, HBD (ligand); 6, negative charge (ligand); 7, negative charge (protein).]

GPCR family-wise binding interaction comparison. (a) Class A GPCRs (excluding rhodopsin) and aminergic GPCRs pairwise comparison of binding pockets in terms of ligand chemical similarity (ECFP4 fingerprint) vs. interaction fingerprint (IFP) similarity using structure-based alignment of sequences and colored by scatter point density. (b) Example H1 (yellow, PDB: 3RZE) and M2 (cyan, PDB: 3UON), ligand similarity 0.11, IFP similarity 0.63. (c) Example 5-HT1B (green, PDB: 4IAQ) and NOP (magenta, PDB: 4EA3), ligand similarity 0.24, IFP similarity 0.63. Structures were aligned on the ligand contacting residues only. (d) Interaction fingerprints are shown for selected pocket residues and interactions are colored according to the interaction types.

Development of interaction fingerprints

The first structural interaction fingerprint (SIFt) algorithm [14] was developed by Deng et al. in 2004 for the clustering of kinase-inhibitor complexes. This fingerprint contained seven bits for each interacting amino acid, one per predefined interaction type (any, backbone, sidechain, polar, hydrophobic, H-bond donor/acceptor). A more recent implementation of SIFt was described by Mordalski et al., who extended the fingerprint by 2 bits to encode aromatic and charged interactions and introduced technical improvements [25]. This implementation highlighted crucial amino acids involved in interactions with antagonists docked into serotonin 5-HT7 receptor homology models. LIFt, a method similar to SIFt with 10 bits per amino acid, was described by Cao and Wang, who used it to predict kinase targets for ligands [26].

Another commonly used variant of the IFP was developed by Marcou and Rognan [15] in 2006 using a 7-bit fingerprint encoding hydrophobic, aromatic face-to-face and edge-to-face, H-bond donor/acceptor, and cationic/anionic interaction types. The geometric definitions are user-configurable, and less common interaction types can also be added to the fingerprint (i.e. weak H-bonds, cation-π, and metal complexation). The same group also developed triplet interaction fingerprints (TIFPs), in which triangles of interaction points are mapped to a fixed-length fingerprint of 210 bits [16]. This method was designed especially for binding site comparison, but also performed similarly to IFP in post-processing of docking results. An online tool, sc-PDB-frag, was created using TIFPs for the identification of bioisosteric replacements based on conserved interaction patterns [27].

Hirokawa and Sato used a simple implementation of protein–ligand interaction fingerprints (PLIFs) based on contact atom counts to improve homology modeling of GPCRs [28]. In their study, multi-template homology modeling was followed by retrospective virtual screening for the consensus PLIF of the active ligands, and the approach was tested on the modeling of the serotonin 5-HT2A receptor.

A substantially different approach, structural protein–ligand interaction fingerprints (SPLIFs), was introduced by Kireev and Da [17]. This method takes into account all protein–ligand atom contacts, then extends the neighborhoods of these atoms by two bonds in an ECFP2-like fashion and maps the resulting substructures onto the fingerprint. The advantage is that all possible protein–ligand interactions are accounted for and no predefined geometric criteria are needed to define interactions. The method gave early enrichment in virtual screening similar to or higher than that of traditional interaction fingerprints.

Currently, various implementations of interaction fingerprints are available as post-processing tools or integrated methods in commercial software packages and docking software, or as standalone software or scripts from the original authors, which facilitates their use. IFP-based methods and their applications are summarized in Table 1. PDB-wide protein–ligand interaction databases are also available, such as sc-PDB [29], GPCRdb [30], KLIFS [31], and PDEStrIAn [32], which contain interactions in the IFP format; CREDO [33], which contains interactions in the SIFt format; and Pocketome [34], which contains aligned binding sites with consensus contact residues and annotated interaction types. These databases also allow searching for binding site and interaction similarity to selected reference binding pockets.
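The per-residue encoding used by the 7-bit IFP described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the interaction detection itself (geometric rules applied to 3D coordinates) is not shown, the bit order and the names `INTERACTION_TYPES` and `encode_ifp` are hypothetical, and the residue labels and interactions in the example are invented for demonstration.

```python
# Assumed bit order for one residue; published implementations may order bits differently.
INTERACTION_TYPES = (
    "hydrophobic",
    "aromatic_face_to_face",
    "aromatic_edge_to_face",
    "hbond_donor",
    "hbond_acceptor",
    "cationic",
    "anionic",
)


def encode_ifp(pocket_residues, interactions):
    """Concatenate one 7-bit block per pocket residue, in a fixed residue order.

    pocket_residues: ordered residue labels defining the binding pocket.
    interactions: dict mapping a residue label to the set of interaction
        types detected between that residue and the ligand.
    """
    bits = []
    for residue in pocket_residues:
        observed = interactions.get(residue, set())
        unknown = observed - set(INTERACTION_TYPES)
        if unknown:
            raise ValueError(f"unknown interaction types: {sorted(unknown)}")
        # Residues without any detected interaction contribute seven zeros.
        bits.extend("1" if kind in observed else "0" for kind in INTERACTION_TYPES)
    return "".join(bits)


# Hypothetical pocket definition and detected interactions for one docking pose.
pocket = ["D3.32", "W6.48", "F6.52"]
detected = {
    "D3.32": {"hydrophobic", "hbond_acceptor", "anionic"},
    "W6.48": {"hydrophobic", "aromatic_edge_to_face"},
}
ifp = encode_ifp(pocket, detected)
print(ifp)
```

Because every fingerprint is built from the same ordered residue list, each bit position refers to the same residue and interaction type across poses, which is what makes bitwise comparison meaningful.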

Applications to GPCR virtual screening

IFPs have been used in various studies on GPCRs to increase binding mode prediction accuracy and to improve the ranking of docked compounds based on similarity to validated interaction patterns (Table 1) [6,7,8,9,10,11,12]. In the first crystal structure-based GPCR fragment screening, against the histamine H1 receptor, a customized hierarchical virtual screening workflow combining PLANTS docking and IFP scoring (similarity to the doxepin reference and a required interaction with D3.32, an important anchor for positively charged aminergic ligands) gave high retrospective enrichment of actives. The prospective application resulted in an unprecedented biologically confirmed hit rate of 73% (19 actives out of 26 tested), providing novel ligands with binding affinities ranging from 6 nM to 10 µM [6]. A follow-up study showed that the individual scoring methods also provided favorable experimentally validated hit rates (61% for IFP and 45% for PLANTS scoring alone) [7]; 24 of the 48 compounds tested in this study proved active, with binding affinities between 90 nM and 20 µM.

In a virtual fragment screening against the histamine H4 receptor, 2D and 3D ligand-based methods and docking into homology models built with β2 and H1 crystal structures as templates were found to provide complementary hit sets [8]. Prospective structure-based virtual screening using two homology models with two reference IFPs resulted in 37 virtual hits, of which 9 were experimentally validated as actives with binding affinities ranging from 144 nM to 7 µM. In another fragment screening campaign, against the histamine H3 receptor, PLANTS and GOLD docking combined with IFP post-processing gave satisfactory results, but ligand-based and structure-based FLAP (Fingerprints of Ligands and Proteins [18]) models were found to give even higher enrichments [10]. The FLAP method uses linear discriminant analysis (LDA) to build classification models on pharmacophore fingerprints derived from protein–ligand interactions. Prospective virtual screening of positively charged fragment-like molecules from ZINC [36] interacting with D3.32 using consensus FLAP models resulted in 18 experimentally confirmed hits out of 29 purchased compounds, with affinities between 527 nM and 10 µM. A general virtual screening workflow as well as examples of histamine receptor hits are shown in Figure 3.

The recent advances in crystallization of specific functional states of GPCRs have also facilitated structure-based prediction of GPCR ligand function. In a recent study [11], 31 adrenergic β1 and β2 receptor crystal structures were used for docking. Clustering separated the IFPs of these complexes by functional activity and also allowed prediction of the functional activities of fragment ligands with unreported activities. Analysis of enrichment factors indicated that reference IFPs of ligands of a specific functional activity type are more likely to select ligands of the same activity type. The predicted IFP of the full agonist norepinephrine gave the highest retrieval rate of agonists over antagonists. In a prospective study, virtual screening was also performed to identify full or partial agonists of the β2 receptor using an active state X-ray structure of β2 (PDB: 3P0G) and a combined scoring approach similar to that in refs. [6,7]. This provided a 53% experimentally validated hit rate (18 true agonists from 34 selected virtual hits). Additionally, 8 agonists were identified using the IFP or PLANTS scoring approach alone, with 44% and 39% hit rates, respectively. The EC50 values of the hits ranged from 40 nM to 30 µM, and the hits included novel, non-ethanolamine scaffolds (see Figure 3).
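The hierarchical selection used in these campaigns (a required anchor interaction, an IFP similarity cutoff, and a docking score cutoff) can be sketched as a simple filter. This is an illustrative sketch, not the published protocol: the `Pose` record, the position of the required bit, and the cutoff values in the example are assumptions, and the docking score is assumed to follow a lower-is-better convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    ligand_id: str
    ifp: str              # bit string, same residue and bit order as the reference
    docking_score: float  # assumption: more negative means better


def tanimoto(fp_a: str, fp_b: str) -> float:
    common = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    return common / either if either else 0.0


def select_hits(poses, reference_ifp, required_bits, min_tc, max_score):
    """Return ligand ids passing all three filters, in input order."""
    hits = []
    for pose in poses:
        # 1. Every required interaction bit (e.g. an anchor residue) must be set.
        if any(pose.ifp[i] != "1" for i in required_bits):
            continue
        # 2. Interaction pattern must resemble the reference binding mode.
        if tanimoto(reference_ifp, pose.ifp) < min_tc:
            continue
        # 3. Docking score must be at least as good as the cutoff.
        if pose.docking_score > max_score:
            continue
        hits.append(pose.ligand_id)
    return hits


# Hypothetical two-residue reference; bit 6 stands in for the anchor interaction.
reference = "1000101" + "1010000"
poses = [
    Pose("lig1", "10001011010000", -95.0),  # passes all filters
    Pose("lig2", "10000001010000", -100.0), # anchor bit missing
    Pose("lig3", "10001010000000", -98.0),  # Tc = 3/5 = 0.6, below cutoff
    Pose("lig4", "10001011010000", -80.0),  # docking score too poor
]
hits = select_hits(poses, reference, required_bits=(6,), min_tc=0.75, max_score=-90.0)
print(hits)
```

Applying the cheap bit check first avoids computing similarities for poses that lack the anchor interaction; the order of the remaining two filters does not affect the result.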

Table 1. Protein–ligand interaction fingerprint-based chemogenomics methods with retrospective validation and prospective applications to GPCR ligand discovery and design. The interaction fingerprint method, the descriptors, the evaluation method, and the GPCR target(s) are reported for all cases. Furthermore, details about the prospective virtual screening protocol, the hit rate, and the hits are reported for prospective ligand discovery applications.

Method | Descriptors (a) | Comparison/ML model (a) | Protein target (b) | Prospective screening database | Hit selection (a) | Active/Tested (hit rate) | PubChem CID of hit, reference ligand, most similar known ligand | pKi range of hits (c) | Ref.
IFP | 7 bits/aa | Tc | H1 | ZINC (fragment-like, basic) | D3.32 interaction; IFP Tc ≥ 0.75; PLANTS ≤ −90; novelty, visual | 19/26 (73%) | Hit: 23722970; Ref: 667477; Sim: 15723 | 5.6-8.2 | [6]
IFP | 7 bits/aa | Tc | β2, H1 | ZINC (fragment-like, basic) | D3.32 interaction; IFP Tc ≥ 0.75; PLANTS ≤ −90; novelty, visual | 18/34 (53%) | Hit: 1988029; Ref: 45483813; Sim: 9927453 | 4.5-7.4 (pEC50, ex. 5.4) | [7]
IFP | 7 bits/aa | Tc | H4 | ZINC (fragment, basic, non-reactive) | D3.32 interaction; IFP Tc ≥ 0.733 or 0.810 (2 ref. IFPs); novelty, visual | 9/37 (24%) | Hit: 3840250; Ref: 25147772; Sim: 12686479 | 5.2-6.8 | [8]
IFP | 7 bits/aa | Tc | GCGR | ZINC (similar physchem prop. actives) | K2.53/2.60b/E6.48/6.53b interaction; IFP Tc > 0.7; GoldScore > 30; clustering/visual | 2/23 (9%) | Hit: -; Ref: 10054055; Sim: - | 5.4-5.5 (pKB) | [9]
FLAP | Pharmacophore quadruplet bits | LDA | H3 | ZINC (fragment, non-reactive) | LB2 FLAP-LDA score > 0.5; SB1 FLAP-LDA score > 0.5; novelty, visual | 18/29 (62%) | Hit: 8834915; Ref: 11313837; Sim: 10177656 | 5.0-6.3 | [10]
PROFILER | ECFP4 FP, 7 bits/aa IFP | Tc, SVM, SVR | CLT1 and 4370 other targets | Integrity db clinical candidates | All positive classification results, availability of cpds and assays | 5/10 (50%) | Hit: 11581936; Ref: -; Sim: - | 4.9-6.8 (pIC50) | [35]
IFP | 7 bits/aa | Tc | β1/2 | – | – | – | – | – | [11]
FLAP | Pharmacophore quadruplet bits | LDA | A1/2A/2B/3 | – | – | – | – | – | [12]
SIFt | 9 bits/aa | Ensemble average SIFt | 5-HT7 | – | – | – | – | – | [25]
Short SIFt | Correlation filtered 9 bits/aa | Weighted average of NB, SMO, kNN, J48, RF | 5-HT6/7 | – | – | – | – | – | [24]
PLIF | 1 bit/aa (contact atom count) | Generalized Tc | 5-HT2A | – | – | – | – | – | [28]

(a) aa, amino acid; FP, fingerprint; ML, machine learning; Tc, Tanimoto coefficient; LDA, Linear Discriminant Analysis; SVM, Support Vector Machine; SVR, Support Vector Regression; NB, Naïve Bayes; SMO, Sequential Minimal Optimization; kNN, k-Nearest Neighbor; J48, Decision Tree J48; RF, Random Forest.
(b) The target for which experimental results are presented in the table shown in bold.
(c) Bioactivity of example hit indicated in brackets when not highest in the interval, non-pKi activity type indicated in brackets.

Ligand-, structure-, and pharmacophore-based FLAP models were used in retrospective virtual screening evaluation studies against the four adenosine receptor subtypes [12]. The general predictivity was in the order A2A > A2B > A3 > A1 for the different subtypes. The ligand-based FLAP method provided higher global enrichment than chemical similarity or shape-based similarity, although early enrichments were similar. In structure-based screening, FLAP outperformed PLANTS and performed as well as or better than GOLD. SIFts were used in analyzing virtual screening results of natural compounds against the recently solved protease-activated receptor 1 (PAR1) crystal structure [37].

IFPs were used in virtual screening against the glucagon receptor (GCGR) [9] before any class B crystal structures were available. A hierarchical screening cascade was used, including physicochemical property filters, a ROCS ligand-based similarity filter, GOLD docking, and IFP similarity to predicted reference compound binding modes. Of the 23 compounds tested, two proved to be negative allosteric modulators (NAMs) of GCGR with IC50 values of 24 and 65 µM, and, interestingly, one virtual hit was also found to be a positive allosteric modulator (PAM) of the related glucagon-like peptide 1 receptor (GLP-1R) with an EC50 of 1.3 µM.
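Retrospective evaluations such as those above compare methods by how strongly known actives are concentrated at the top of a ranked list. A minimal sketch of one common metric, the enrichment factor at a given fraction of the ranked database, is given below. The definition used here (active rate in the top fraction divided by the active rate in the whole set) is a widely used convention and an assumption on our part; the cited studies may report other measures, and the function name and example ranking are hypothetical.

```python
import math


def enrichment_factor(ranked_labels, fraction):
    """Enrichment factor for a ranked list of 1 (active) / 0 (inactive) labels.

    ranked_labels must be sorted best-scored first.
    fraction is the top share of the list to inspect, e.g. 0.01 for 1%.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    # Always inspect at least one compound.
    n_top = max(1, math.ceil(n_total * fraction))
    actives_in_top = sum(ranked_labels[:n_top])
    # Active rate in the selection relative to the active rate overall.
    return (actives_in_top / n_top) / (n_actives / n_total)


# Hypothetical ranking of 20 compounds containing 4 actives.
ranking = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0] + [0] * 10
ef_top_quarter = enrichment_factor(ranking, 0.25)
print(f"EF(25%) = {ef_top_quarter:.1f}")
```

A value of 1 corresponds to random selection, so early enrichment is usually judged at small fractions, where differences between scoring methods are most visible.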

Figure 3

[Panel (a): workflow starting from commercial compounds in ZINC and filtering on physicochemical properties. For H1, H4, and β2: PLANTS docking, IFP calculation, filtering on the D3.32 interaction, and consensus scoring (PLANTS score, IFP similarity). For H3: ligand-based and structure-based FLAP models with consensus scoring. Both branches end with novelty filtering and visual inspection. Panel (b): example hits in the H1R, H4R, β2R, and H3R binding sites.]

Application of interaction fingerprints in prospective virtual screening. (a) General workflow for prospective virtual screening studies using docking and protein–ligand interaction fingerprint or the FLAP method and (b) examples of resulting hits against the histamine H1, H3, H4 and β2 receptors [6,7,8,10]. For the H3 receptor the interaction field overlap from the FLAP models is shown (yellow: hydrophobic region, blue: protein acceptor region). Ligand structures are shown in Table 1 (H1R: CID 23722970, H4R: CID 3840250, β2R: CID 1988029, H3R: CID 8834915).

Machine learning based on interaction fingerprints

Simple regression models have been around since the late 19th century, but with the increasing amount of publicly available data and computing power, a new family of algorithms collectively called machine learning has emerged in the last decade. These algorithms are suitable for analyzing large amounts of data and for building predictive models, including models for bioactivity prediction. Machine learning will also be a useful tool for the integrated analysis of chemical, interaction, and binding pocket fingerprints. Marcou and Rognan showed the first application of this approach by training a simple linear Naïve Bayesian (NB) classifier on multiple reference and decoy pose IFPs, which clearly outperformed docking scoring functions and single-reference IFPs in fragment virtual screening against CDK2 kinase [15]. The LDA method implemented in FLAP is also a simple linear classifier, but does not assume variables to be independent and is therefore more flexible in chemical applications [10].

The Rognan group recently also used a supervised 3-layer neural network (NN) with a resilient propagation algorithm trained on ISIDA fragment descriptors to predict the IFPs of ligands [38]. The NN was tested on IFPs of CDK2 and p38α kinase and heat shock protein HSP90α structures, but as more GPCR X-ray structures become available, this approach can also be extended to this domain. Models were able to correctly predict about two-thirds of all interaction bits, and predicted IFPs of ligands afforded significant enrichment of actives against decoys when scored by Tanimoto similarity to X-ray IFPs. The authors note that no improvement was achieved for predicting biological activities, but the predicted IFPs could be used to select correct docking poses.

Smusz et al. combined a short version of the 9-bit SIFts (significant bits selected by correlation-based feature selection) with Spectrophores descriptors (a set of values calculated from atomic partial charges, lipophilicity, shape deviations, and electrophilicity), then trained Naïve Bayes (NB), Sequential Minimal Optimization (SMO), k-Nearest Neighbor (IBk), Decision Tree (J48), and Random Forest (RF) algorithms on these descriptors to predict active and inactive ligands [24]. Ten homology models of the serotonin 5-HT6 and 5-HT7 receptors based on various templates were used for docking known active and inactive ligands from ChEMBL and decoys from ZINC. All docking and machine learning results were combined using the weighted average method to give the active/inactive class assignment with high classification accuracies.

Meslamani et al. reported PROFILER, an automated pipeline for polypharmacology prediction on 4371 targets using a range of modeling tools with a predefined decision tree, which chooses ligand-based (SVM classification, SVR affinity prediction, nearest neighbors interpolation, shape similarity) or structure-based models (docking, interaction fingerprint similarity, pharmacophore) based on the quality and quantity of available data [35]. New target associations were discovered; for example, PF2545920, a claimed selective PDE10A inhibitor, was found to bind the cysteinyl leukotriene type 1 receptor (CLT1, IC50 = 142 nM).
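The idea of training a Naïve Bayes classifier on reference and decoy pose IFPs, mentioned earlier in this section, can be sketched with a small Bernoulli Naïve Bayes model written in plain Python. This is a generic textbook formulation with Laplace smoothing, offered as an assumption-laden illustration and not the implementation used in the cited work; the function names and the toy 7-bit fingerprints are hypothetical.

```python
import math


def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Fit per-class bit probabilities.

    fingerprints: equal-length bit strings; labels: 1 (reference) or 0 (decoy).
    alpha: Laplace smoothing constant, so no bit probability is exactly 0 or 1.
    """
    n_bits = len(fingerprints[0])
    model = {}
    for cls in (0, 1):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        if not rows:
            raise ValueError("both classes need at least one training example")
        log_prior = math.log(len(rows) / len(fingerprints))
        p_on = [
            (sum(fp[i] == "1" for fp in rows) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_bits)
        ]
        model[cls] = (log_prior, p_on)
    return model


def log_odds_active(model, fp):
    """log P(class 1 | fp) - log P(class 0 | fp); positive favors class 1."""
    scores = {}
    for cls, (log_prior, p_on) in model.items():
        total = log_prior
        # Naive assumption: bits contribute independently given the class.
        for bit, p in zip(fp, p_on):
            total += math.log(p if bit == "1" else 1.0 - p)
        scores[cls] = total
    return scores[1] - scores[0]


# Toy training set: three reference-like poses and three decoy poses.
train_fps = ["1000101", "1000101", "1010101", "1000000", "1010000", "0010000"]
train_labels = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(train_fps, train_labels)
good_pose = log_odds_active(model, "1000101")
poor_pose = log_odds_active(model, "1000000")
print(good_pose > 0, poor_pose < 0)
```

Unlike a single-reference Tanimoto score, the classifier weights each bit by how well it separates reference poses from decoys, which is one plausible reason multi-reference models can outperform single-reference IFP scoring.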

Challenges and possibilities of using interaction fingerprints for optimizing GPCR drug discovery

While the successful applications presented above underline the utility of chemogenomics approaches and protein–ligand interaction fingerprints in virtual screening against GPCRs, a few potential sources of error have to be addressed. These errors can originate from GPCR-specific issues such as correct sequence alignment and binding pocket definition, or from technical issues such as correct structure preparation, implementation differences, and fingerprint similarity assessment.

A potential source of error when comparing fingerprints derived from multiple (homologous) protein structures is the definition and alignment of the binding pocket, which must ensure that the fingerprints are comparable. Each bit in the fingerprints must correspond to structurally equivalent positions in the binding sites. As GPCR structures from multiple subfamilies were elucidated, kinks, bulges, and constrictions within the transmembrane helices were identified and the dogma of a gapless sequence alignment was contested [39]. The structure-based GPCRdb sequence numbering was proposed to overcome this issue, which also benefits interaction fingerprint approaches. For cross-family comparisons a generic definition of the GPCR transmembrane binding pocket is also important, e.g. for targeting allosteric pockets in class C receptors [40]. It has also become evident that the TM pocket is not the only site that may be targeted. For example, the muscarinic M2 receptor was co-crystallized with the allosteric modulator LY2119620 in the extracellular vestibule (PDB: 4MQT [41]), while the purinergic P2Y1 receptor (PDB: 4XNV, complex with BPTU [42]) and the glucagon receptor (PDB: 5EE7, complex with MK-0893 [43]) feature allosteric sites on the outside of the heptahelical bundle, for which customized fingerprint definitions will be required.

A prerequisite for all structure-based methods is the correct preparation of protein and ligand structures. As protein–ligand interaction fingerprint definitions depend on correct ligand topology and atom typing, special care should be taken to ensure their correctness [31]. Furthermore, fingerprints should only be compared when they were generated using the same feature definitions.

Comparison of fingerprints derived from molecules with large differences in size should also be done with care. This could, for example, be an issue when docking smaller ligands in the pocket of the serotonin 5-HT1B/2B receptors, for which the currently available crystal structures contain large ergotamine-like reference ligands. Defining a smaller subpocket, fragmenting the reference ligand, or using a different similarity coefficient (e.g. Tversky, McConnaughey) can alleviate this issue [13,23]. The customizable design of the IFP docking post-processing protocol furthermore offers possibilities to apply it to virtual fragment screening or the identification of ligands selectively targeting protein-specific binding sites [32] or stabilizing specific conformational states of a protein [11].

One way to consider multiple possible binding modes for a docked ligand is to use multiple reference IFPs, e.g. from multiple crystal structures or from validated modeled binding modes. Different consensus scoring schemes may then be used to rank docking poses (e.g. as demonstrated for ligand-based fingerprint screening against the histamine H1 and H4 receptors, and the ligand-gated ion channel 5-HT3AR [23]). Such methods are also used for data fusion in multiconformer docking from molecular dynamics trajectories. Finally, visual inspection of the top-ranking virtual hits is often carried out to eliminate unlikely poses not filtered out by other methods.
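Two of the remedies discussed in this section, an asymmetric similarity coefficient for ligands of different size and the use of multiple reference IFPs, can be sketched together. The Tversky index below follows its standard definition; the choice of weights, the maximum-over-references fusion rule, and all names and fingerprints are illustrative assumptions rather than a published protocol.

```python
def tversky(reference: str, query: str, alpha: float, beta: float) -> float:
    """Tversky index of two equal-length '0'/'1' strings.

    alpha weights bits set only in the reference, beta bits set only in the query.
    alpha = beta = 1 reproduces the Tanimoto coefficient.
    """
    if len(reference) != len(query):
        raise ValueError("fingerprints must have the same length")
    common = sum(r == "1" and q == "1" for r, q in zip(reference, query))
    only_ref = sum(r == "1" and q == "0" for r, q in zip(reference, query))
    only_query = sum(r == "0" and q == "1" for r, q in zip(reference, query))
    denominator = common + alpha * only_ref + beta * only_query
    return common / denominator if denominator else 0.0


def best_reference_score(references, query, alpha, beta):
    """Max fusion: score a pose by its most similar reference IFP."""
    return max(tversky(ref, query, alpha, beta) for ref in references)


# Hypothetical large reference ligand contacting four residues,
# and a fragment pose that reproduces a subset of those contacts.
large_ref = "1000101" + "1010000" + "1000000" + "1100000"
other_ref = "0000000" + "1010000" + "1000000" + "1100000"
fragment = "1000101" + "1000000" + "0000000" + "0000000"

symmetric = tversky(large_ref, fragment, alpha=1.0, beta=1.0)   # Tanimoto
asymmetric = tversky(large_ref, fragment, alpha=0.0, beta=1.0)  # ignore unmatched reference bits
fused = best_reference_score([large_ref, other_ref], fragment, alpha=0.0, beta=1.0)
print(symmetric, asymmetric, fused)
```

With symmetric weights the fragment is penalized for every reference contact it is too small to make, whereas down-weighting reference-only bits asks only whether the fragment's own interactions are found in the reference.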

Conclusions

Various protein–ligand interaction fingerprint-based approaches have been reported and the advances of G protein-coupled receptor crystallography have enabled the application of interaction fingerprints also for GPCR drug discovery. Fingerprint methods have proven useful for improved binding mode prediction, and multiple successful prospective virtual screening campaigns against GPCRs underline their impact on GPCR ligand discovery. Furthermore, IFPs have been used in functional activity prediction of GPCR ligands and can guide the modeling of GPCR structures that have not yet been solved experimentally. The highly customizable nature of IFPs makes them suitable tools for ligand discovery for novel binding sites or unexplored drug targets, but also requires careful setup for optimal application. Machine learning methods coupled to interaction fingerprints allow the combination of chemical, interaction and binding pocket fingerprints for ligand activity prediction. We can probably expect a rise of data-driven methods to predict GPCR-ligand interactions in the future, integrating ligand-based and structure-based models and data sets from diverse data sources.

Acknowledgements

This research was financially supported by the Netherlands eScience Center (NLeSC)/NWO (Enabling Technologies project: 3D-e-Chem, grant 027.014.201) to C.d.G., and The Netherlands Organization for Scientific Research (NWO CW TOP-PUNT grant 718.014.002, 7 ways to 7TMR modulation) to R.L. M.V., A.J.K., I.J.P.d.E., R.L., and C.d.G. participate in the European Cooperation in Science and Technology Action CM1207 [GPCR-Ligand Interactions, Structures, and Transmembrane Signaling: A European Research Network (GLISTEN)] and the GPCR Consortium (gpcrconsortium.org).

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Kooistra AJ, Roumen L, Leurs R, de Esch IJP, de Graaf C: From heptahelical bundle to hits from the haystack: structure-based virtual screening for GPCR ligands. Methods Enzymol 2013, 522:279-336.

2. Kooistra AJ, Kuhne S, de Esch IJP, Leurs R, de Graaf C: A structural chemogenomics analysis of aminergic GPCRs: lessons for histamine receptor ligand design. Br J Pharmacol 2013, 170:101-126.

3. Piscitelli CL, Kean J, de Graaf C, Deupi X: A molecular pharmacologist's guide to G protein-coupled receptor crystallography. Mol Pharmacol 2015, 88:536-551.


4. Tautermann CS: GPCR structures in drug design, emerging opportunities with new structures. Bioorg Med Chem Lett 2014, 24:4073-4079.

5. de Graaf C, Vischer HF, de Kloe GE, Kooistra AJ, Nijmeijer S, Kuijer M, Verheij MHP, England PJ, van Muijlwijk-Koezen JE, Leurs R et al.: Small and colorful stones make beautiful mosaics: fragment-based chemogenomics. Drug Discov Today 2013, 18:323-330.

6. de Graaf C, Kooistra AJ, Vischer HF, Katritch V, Kuijer M, Shiroishi M, Iwata S, Shimamura T, Stevens RC, de Esch IJP et al.: Crystal structure-based virtual screening for fragment-like ligands of the human histamine H1 receptor. J Med Chem 2011, 54:8195-8206.

7. Kooistra AJ, Vischer HF, McNaught-Flores D, Leurs R, de Esch IJP, de Graaf C: Function-specific virtual screening for GPCR ligands using a combined scoring method. Sci Rep 2016, 6:28288.
Combining an energy-based docking scoring function with a molecular interaction fingerprint method gives higher hit rates in prospective virtual screening studies for histamine H1 receptor antagonists/inverse agonists and β2 adrenoreceptor agonists than individual scoring approaches.

8. Istyastono EP, Kooistra AJ, Vischer HF, Kuijer M, Roumen L, Nijmeijer S, Smits RA, de Esch IJP, Leurs R, de Graaf C: Structure-based virtual screening for fragment-like ligands of the G protein-coupled histamine H4 receptor. Med Chem Commun 2015, 6:1003-1017.

9. de Graaf C, Rein C, Piwnica D, Giordanetto F, Rognan D: Structure-based discovery of allosteric modulators of two related class B G-protein-coupled receptors. ChemMedChem 2011, 6:2159-2169.

10. Sirci F, Istyastono EP, Vischer HF, Kooistra AJ, Nijmeijer S, Kuijer M, Wijtmans M, Mannhold R, Leurs R, de Esch IJP et al.: Virtual fragment screening: discovery of histamine H3 receptor ligands using ligand-based and protein-based molecular fingerprints. J Chem Inf Model 2012, 52:3308-3324.

11. Kooistra AJ, Leurs R, de Esch IJP, de Graaf C: Structure-based prediction of G-protein-coupled receptor ligand function: a β-adrenoceptor case study. J Chem Inf Model 2015, 55:1045-1061.
Detailed analysis of β-adrenoceptor crystal structures with ligands of different functional activities. It was shown that specific crystal structure/reference interaction fingerprint combinations perform particularly well in the discrimination of antagonists/inverse agonists and full/partial agonists from each other and from decoys.

12. Sirci F, Goracci L, Rodríguez D, van Muijlwijk-Koezen J, Gutiérrez-de-Terán H, Mannhold R: Ligand-, structure- and pharmacophore-based molecular fingerprints: a case study on adenosine A1, A2A, A2B, and A3 receptor antagonists. J Comput Aided Mol Des 2012, 26:1247-1266.

13. Cereto-Massagué A, Ojeda MJ, Valls C, Mulero M, Garcia-Vallvé S, Pujadas G: Molecular fingerprint similarity search in virtual screening. Methods 2015, 71:58-63.

14. Deng Z, Chuaqui C, Singh J: Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions. J Med Chem 2004, 47:337-344.

15. Marcou G, Rognan D: Optimizing fragment and scaffold docking by use of molecular interaction fingerprints. J Chem Inf Model 2007, 47:195-207.

16. Desaphy J, Raimbaud E, Ducrot P, Rognan D: Encoding protein-ligand interaction patterns in fingerprints and graphs. J Chem Inf Model 2013, 53:623-637.

17. Da C, Kireev D: Structural protein-ligand interaction fingerprints (SPLIF) for structure-based virtual screening: method and benchmark study. J Chem Inf Model 2014, 54:2555-2561.
Description of a novel ECFP2-like interaction fingerprint benchmarked on DUD datasets including adrenergic β1 receptor. SPLIF performed better or equally well to the classic PLIF interaction fingerprint implemented in MOE, especially in early enrichment.

18. Baroni M, Cruciani G, Sciabola S, Perruccio F, Mason JS: A common reference framework for analyzing/comparing proteins and ligands. Fingerprints for Ligands and Proteins (FLAP): theory and application. J Chem Inf Model 2007, 47:279-294.

19. Wood DJ, de Vlieg J, Wagener M, Ritschel T: Pharmacophore fingerprint-based approach to binding site subpocket similarity and its application to bioisostere replacement. J Chem Inf Model 2012, 52:2031-2043.

20. Lavecchia A: Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 2015, 20:318-331.

21. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28:235-242.

22. Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Krüger FA, Light Y, Mak L, McGlinchey S et al.: The ChEMBL bioactivity database: an update. Nucleic Acids Res 2014, 42:1083-1090.

23. Schultes S, Kooistra AJ, Vischer HF, Nijmeijer S, Haaksma EEJ, Leurs R, de Esch IJP, de Graaf C: Combinatorial consensus scoring for ligand-based virtual fragment screening: a comparative case study for serotonin 5-HT3A, Histamine H1, and Histamine H4 receptors. J Chem Inf Model 2015, 55:1030-1044.

24. Smusz S, Mordalski S, Witek J, Rataj K, Kafel R, Bojarski AJ: Multi step protocol for automatic evaluation of docking results based on machine learning methods - a case study of serotonin receptors 5-HT6 and 5-HT7. J Chem Inf Model 2015, 55:823-832.
A multi-step docking result analysis workflow is presented in this paper as an alternative to tedious visual inspection of docking poses utilizing interaction fingerprints, spectrophores descriptors and machine learning models demonstrated on 5-HT receptor data sets.

25. Mordalski S, Kosciolek T, Kristiansen K, Sylte I, Bojarski AJ: Protein binding site analysis by means of structural interaction fingerprint patterns. Bioorg Med Chem Lett 2011, 21:6816-6819.

26. Cao R, Wang Y: Predicting molecular targets for small-molecule drugs with a ligand-based interaction fingerprint approach. ChemMedChem 2015 http://dx.doi.org/10.1002/cmdc.201500228.

27. Desaphy J, Rognan D: sc-PDB-Frag: a database of protein-ligand interaction patterns for bioisosteric replacements. J Chem Inf Model 2014, 54:1908-1918.
Report of the creation of a bioisosteric replacement database utilizing the TIFP fingerprint binding site comparison method affording structurally supported bioisosteric replacements. The online tool is freely available and easy to use.

28. Sato M, Hirokawa T: Extended template-based modeling and evaluation method using consensus of binding mode of GPCRs for virtual screening. J Chem Inf Model 2014, 54:3153-3161.

29. Desaphy J, Bret G, Rognan D, Kellenberger E: sc-PDB: A 3D-database of ligandable binding sites–10 years on. Nucleic Acids Res 2015, 43:D399-D404.

30. Isberg V, Mordalski S, Munk C, Rataj K, Harpsoe K, Hauser AS, Vroling B, Bojarski AJ, Vriend G, Gloriam DE: GPCRdb: an information system for G protein-coupled receptors. Nucleic Acids Res 2015, 44:356-364.

31. Kooistra AJ, Kanev GK, van Linden OPJ, Leurs R, de Esch IJP, de Graaf C: KLIFS. A structural kinase-ligand interaction database. Nucleic Acids Res 2015, 44:D365-D371.

32. Jansen C, Kooistra AJ, Kanev GK, Leurs R, de Esch IJP, de Graaf C: PDEStrIAn: A phosphodiesterase structure and ligand interaction annotated database as a tool for structure-based drug design. J Med Chem 2016 http://dx.doi.org/10.1021/acs.jmedchem.5b01813.

33. Schreyer AM, Blundell TL: CREDO. A structural interactomics database for drug discovery. Database 2013 http://dx.doi.org/10.1093/database/bat049.

34. Kufareva I, Ilatovskiy AV, Abagyan R: Pocketome: An encyclopedia of small-molecule binding sites in 4D. Nucleic Acids Res 2012, 40:D535-D540.

35. Meslamani J, Bhajun R, Martz F, Rognan D: Computational profiling of bioactive compounds using a target-dependent composite workflow. J Chem Inf Model 2013, 53:2322-2333.

36. Sterling T, Irwin JJ: ZINC 15 - ligand discovery for everyone. J Chem Inf Model 2015, 55:2324-2337.

37. Kakarala KK, Jamil K: Screening of phytochemicals against protease activated receptor 1 (PAR1), a promising target for cancer. J Recept Signal Transduct Res 2015, 35:26-45.

38. Chupakhin V, Marcou G, Baskin I, Varnek A, Rognan D: Predicting ligand binding modes from neural networks trained on protein-ligand interaction fingerprints. J Chem Inf Model 2013, 53:763-772.

39. Isberg V, de Graaf C, Bortolato A, Cherezov V, Katritch V, Marshall FH, Mordalski S, Pin J-P, Stevens RC, Vriend G et al.: Generic GPCR residue numbers - aligning topology maps while minding the gaps. Trends Pharmacol Sci 2015, 36:22-31.
Important conceptual paper for GPCR modelling. Experimental structure-based alignments and gapped transmembrane helix alignments are proposed for all GPCRs facilitating the correct usage of interaction fingerprints also in cross-class chemogenomics methods.

40. Gloriam DE: Chemogenomics of allosteric binding sites in GPCRs. Drug Discov Today Technol 2013, 10:e307-e313.

41. Kruse AC, Ring AM, Manglik A, Hu J, Hu K, Eitel K, Hübner H, Pardon E, Valant C, Sexton PM et al.: Activation and allosteric modulation of a muscarinic acetylcholine receptor. Nature 2013, 504:101-106.

42. Zhang D, Gao Z-G, Zhang K, Kiselev E, Crane S, Wang J, Paoletta S, Yi C, Ma L, Zhang W et al.: Two disparate ligand-binding sites in the human P2Y(1) receptor. Nature 2015, 520:317+.

43. Jazayeri A, Doré AS, Lamb D, Krishnamurthy H, Southall SM, Baig AH, Bortolato A, Koglin M, Robertson NJ, Errey JC et al.: Extra-helical binding site of a glucagon receptor antagonist. Nature 2016, 533:274-277.

Current Opinion in Pharmacology 2016, 30:59–68
