Prediction of cytochrome P450 mediated metabolism

Prediction of cytochrome P450 mediated metabolism

    Prediction of cytochrome P450 mediated metabolism Lars Olsen, Chris Oostenbrink, Flemming Steen J¡rgensen PII: DOI: Reference: S016...

588KB Sizes 12 Downloads 141 Views

    Prediction of cytochrome P450 mediated metabolism Lars Olsen, Chris Oostenbrink, Flemming Steen J¡rgensen PII: DOI: Reference:

S0169-409X(15)00084-8 doi: 10.1016/j.addr.2015.04.020 ADR 12780

To appear in:

Advanced Drug Delivery Reviews

Please cite this article as: Lars Olsen, Chris Oostenbrink, Flemming Steen J¡ rgensen, Prediction of cytochrome P450 mediated metabolism, Advanced Drug Delivery Reviews (2015), doi: 10.1016/j.addr.2015.04.020

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Prediction of cytochrome P450 mediated metabolism Lars Olsena,*, Chris Oostenbrinkb, Flemming Steen Jørgensena

Department of Drug Design and Pharmacology, University of Copenhagen, Denmark

b

Institute of Molecular Modeling and Simulation, University of Natural Resources and Life

RI PT

a

Sciences, Austria

SC

* Corresponding author. Tel: +45 35336305, Fax: +45 35336040, e-mail address: [email protected]

NU

Abstract

Cytochrome P450 enzymes (CYPs) form one of the most important enzyme families involved in the

MA

metabolism of xenobiotics. CYPs comprise many isoforms, which catalyze a wide variety of reactions, and potentially, a large number of different metabolites can be formed. However, it is often hard to rationalize what metabolites these enzymes generate. In recent years, many different

PT ED

in silico approaches have been developed to predict binding or regioselective product formation for the different CYP isoforms. These comprise ligand-based methods that are trained on experimental CYP data and structure-based methods that consider how the substrate is oriented in the active site or/and how reactive the part of the substrate that is accessible to the heme group is. We will review

CE

key aspects for various approaches that are available to predict binding and site of metabolism (SOM), what outcome can be expected from the predictions, and how they could potentially be

AC

improved. 1. Introduction

Cytochrome P450 enzymes (CYPs) form a ubiquitous enzyme family involved in many different vital biotransformation functions. One important function is the metabolism of xenobiotics, including drug compounds, and for that reason CYPs have been studied extensively. There are 57 genes encoding for different CYP isoforms in humans of which about one fourth contributes to drug metabolism. Five of the isoforms are by far the most important ones and metabolize ~90 % of the marketed drug compounds [1]. All CYPs contain a heme group that carries out the reactions that are often oxidation reactions like aliphatic and aromatic oxidations, heteroatom oxidations and N- and O-dealkylations. These reactions will in most cases generate more soluble compounds that are more easily excreted. Often, more than one specific CYP isoform is involved in the metabolism of a drug compound, leading to different metabolites. Since there are many potential sites in a drug compound that can

1

ACCEPTED MANUSCRIPT react with the CYPs and moreover, different CYP isoforms that can do the reactions, it is not straightforward to predict what metabolites the CYPs generate. However, it is of interest to understand the metabolism in order to avoid toxic metabolites and to optimize drug dosing and administration.

RI PT

The catalytic cycle of the CYP-mediated transformation of a compound to its metabolite is depicted in Figure 1. Initially, the water molecule coordinating the iron (1) is displaced by the substrate molecule (2). Subsequently, the iron (III) is reduced to iron (II) (3) by transfer of an electron from NADPH-P450 reductase. Binding of molecular oxygen gives the iron (II)-O2

SC

intermediate (4), which by further electron transfer (5), addition of two protons (6) and production and release of one water molecule, is transferred to the activated oxygen intermediate, compound I

NU

(7). Compound I is responsible for the insertion of the iron-bound oxygen atom into the substrate (8), and after release of the product, the enzyme is regenerated in its resting state (1). Relevant for

MA

the models described in this review are the formation of the enzyme-ligand complex (2) and the oxidation by compound I (7) yielding the compound I-product complex (8) [2-4]. A prerequisite for generating a metabolite is that the compound has affinity for the CYP

PT ED

isoform. Thus, it should get access to the active site through some tunnels, which then leads to step (2). Next, it is important that the part of the compound that reaches the iron-bound oxygen of compound I can undergo a reaction. Whether that reaction occurs, depends on the orientation of the compound in the active site and how easily it reacts with the activated heme group (steps (7) – (8)).

CE

Simulations of access to the active site have previously been performed, but these are still not generally applicable and quantitative for these purposes due to the limited timescale in the

AC

simulations [5]. Therefore, most in silico methods to predict drug metabolism are either predicting the affinity for the CYP isoform, how the substrate is oriented in the active site of the CYP enzyme, how easily the reaction occurs or combinations of them [4, 6-14]. Although CYPs are present in nearly all tissues, they are most abundant in the liver, where the metabolism of xenobiotics primarily takes place. Ten different CYP isoforms are responsible for virtually all CYP-mediated metabolism in the liver, but to complicate the situation further, most xenobiotics are metabolized by more than one isoform. Generally, the five isoforms 1A2, 2C9, 2C19, 2D6 and 3A4 are considered the most important [1]. The selectivity of the different isoforms is a direct consequence of the nature of their active sites. CYP1A2 has a relatively small, hydrophobic and planar active site and metabolizes planar compounds. CYP2C9 has a preference for small, hydrophobic anions, whereas CYP2D6 has a preference for protonated nitrogencontaining cations. The crystal structure of CYP2C19 reveals that its active site differs significantly from those of CYP2C8 and CYP2C9, despite the high sequence similarity, explaining the differences in selectivity [15]. CYP3A4 is rather promiscuous and metabolizes a variety of 2

ACCEPTED MANUSCRIPT structurally different compounds [10, 16]. The situation is further complicated because CYPs display clinically important polymorphisms [10]. There is a multitude of methods of varying complexity to predict CYP-mediated drug metabolism. Some of the methods use the properties of the ligands to predict the binding affinity or

RI PT

the part of the ligand where the reaction is most likely to occur by correlating molecular descriptors to experimental data using machine learning methods or by performing quantum-mechanical calculations of the activation energy. Other methods use the structure of the enzyme-compound complex to do the same, by docking, molecular dynamics (MD) simulations or through QM/MM

SC

calculations. In this paper we will review the assumptions behind the many different approaches for conducting in silico studies of drug metabolism.

NU

The paper is organized as follows. In Section 2 we focus on ligand- and structure-based models that predict whether a compound binds to a CYP isoform. In Section 3, we consider ligand-

MA

based models to predict the site of metabolism (SOM), i.e. the position where a compound is metabolized. In Section 4 we describe methods that explicitly include properties of the binding pocket and the reactivity in order to predict the SOM. Finally, more advanced models that may

PT ED

improve the predictions are discussed in Section 5.

2. Prediction of binding to CYP enzymes

In ligand-based approaches the activity of each ligand is typically described by a number of

CE

descriptors reflecting different physicochemical properties of the ligand. The activity could refer to the KM of a substrate, the IC50 of an inhibitor, or to the site of metabolism. In this section, we will

AC

focus on models predicting whether a compound is likely to bind to a CYP isoform or not. These models can be used to identify compounds, which may interact with specific CYP isoforms, but often the models may not be able to distinguish between inhibitors and substrates, because the data forming the basis for the models have been derived from assays where only the IC50 values are measured and no details about the mode of action is determined (substrate, inhibitor, competitive ligand etc.). If the data consist of Ki values for inhibitors or Km and Vmax values for substrates the structure-activity relationships may give a much more detailed insight into the enzyme mechanism. Unfortunately, only few datasets contain this kind of experimental data for a reasonable number of compounds. Further, the quality of the data may be problematic. A priori, it is reasonable to assume that datasets obtained by high-throughput screening methods determining primarily binding affinity are less accurate and contain more false-positives and false-negatives than smaller datasets where proper equilibrium constants have been determined. Some datasets are also heavily biased (redundant) with an overweight of inactive compared to active compounds. Chen and Wild [17],

3

ACCEPTED MANUSCRIPT Xie [18] and Xie and Chen [19] have discussed limitations using large high-throughput screening datasets. The choice of atomic and molecular descriptors provides another challenge, because these should describe properties that are relevant for the activity in order to facilitate interpretation and

RI PT

subsequent application of the models. A clear - and with a few exceptions ignored - limitation of most ligand-based approaches is that most descriptors do not include information about the chirality of the compounds and consequently, the models do not distinguish between enantiomers. Attempts to include information, although indirectly, about the 3D nature of the target have led to the use of

SC

3D descriptors providing information about the shape and size of the ligands and thereby indirectly about the shape and size of the target site. The present trend is to include target-derived descriptors

NU

to add information about e.g. the shape or electrostatic characteristics of the target-site to the ligandbased approaches. This will be further discussed in Section 5.

MA

The methods used to derive the structure-activity relationships comprise linear methods like regression analysis, partial least squares (PLS) and non-linear methods like artificial neural networks (ANN) and support vector machines (SVM). It is beyond the scope of this review to

PT ED

describe the different methods. A limitation with several of the reported predictive methods is, as pointed out by Lapins et al. [20], that they are not generally available, i.e. the descriptors are calculated by commercial software packages and that the models are not implemented in publicly

CE

available programs or services.

2.1 Ligand-based models for prediction of binding.

AC

A substantial number of models to predict binding of molecules to a specific CYP isoform have been published. A typical example of this type of work is our recently developed model to predict whether a compound is an inhibitor of the CYP1A2 isoform. The model was developed using a dataset of 7,469 compounds tested for binding to CYP1A2 and classified as inhibitors or non-inhibitors (the PubChem dataset) [21]. The dataset was split in training and test sets of 411 and 7058 compounds, respectively, by D-optimal design. Descriptors were generated using the MOE and VolSurf software yielding 214 and 110 descriptors, respectively. Several machine-learning models were developed with the best model being able to predict 73-76 % of the external test set correctly. The use of descriptors derived from molecular interaction fields enabled us subsequently to identify and illustrate the molecular characteristics between inhibitors and non-inhibitors for CYP1A2. For example, inhibitors of CYP1A2 are generally hydrophobic with only small hydrophilic regions, in agreement with the properties of the CYP1A2 cavity described in the Introduction.

4

ACCEPTED MANUSCRIPT In recent years, models that not only predict whether a compound interacts with a specific CYP isoform, but rather predict the isoform selectivity have been developed. Four groups have developed models for prediction of isoform specificity based on the PubChem quantitative highthroughput screening dataset (PubChem AID: 1851) containing 17,143 compounds with binding

RI PT

data against the five major isoforms 1A2, 2C9, 2C19, 2D6 and 3A4 [22]. Compounds classified as active in the PubChem dataset comprised both inhibitors and substrates.

Cheng et al. [23] developed a model to identify inhibitors for the five isoforms using MACCS keys and FP4 fingerprints representing 166 and 307 substructure patterns, respectively, as

SC

descriptors. Initially they applied four different machine-learning (ML) methods and subsequently they combined these via a back-propagation artificial neural network (BP-ANN). The BP-ANN

NU

combinations yielded generally a better result than each of the ML methods individually or by just averaging the four methods. An additional, although small, improvement could be obtained by

MA

determining the applicability domain by a structural similarity analysis of the training set, and subsequently only predict the activity of compounds within this applicability domain. The best of the BP-ANN methods gave area under the receiver operating characteristic curve (AUC) for the

PT ED

external validation sets of 0.81 (1A2), 0.86 (2C9), 0.84 (2C19), 0.88 (2D6) and 0.79 (3A4) for the five isoforms.

Sun et al. [24] used a Support Vector Machines (SVM) approach to develop a classification model for the same five isoforms. For each isoform the PubChem dataset was randomly split into a

CE

training and a test set containing roughly an equal number of active compounds. They used primarily atom type descriptors (218 different atom types) supplemented with 26 molecular

AC

descriptors. The AUC values for the test sets were 0.93, 0.89, 0.89, 0.85, and 0.87 for 1A2, 2C9, 2C19, 2D6, and 3A4, respectively [24], which is significantly better than the model developed by Cheng et al. In a recent paper Sun et al. [25] applied their method to a library comprising 2870 environmental compounds provided by the National Toxicology Program and the U.S. Environmental Protection Agency. Not surprisingly, a large fraction of these compounds was falling outside the applicability domain of the model, as measured by k-nearest neighbour similarity, as they included industrial chemicals, food-additives, fragrances etc. Nevertheless, the performance was surprisingly good for 1A2, 2C9 and 3A4 with AUC values between 0.82 and 0.84, and acceptable for 2C19 and 2D6 with AUC values of 0.76. Rostkowski, Spjuth and Rydberg [26] also used a SVM approach to develop a classification model, WhichCyp, based on the PubChem dataset. They used molecular signatures, which are describing the topology of each atom, as atomic descriptors. Their best model was slightly better than the best model of Sun et al. with AUC values for the external test sets of 0.95, 0.90, 0.91, 0.88 and 0.92 for the five isoforms 1A2, 2C9, 2C19, 2D6, and 3A4, respectively. Lapins et al. [20] 5

ACCEPTED MANUSCRIPT extended the above approach one step further by describing the five enzymes by an alignmentindependent description of the composition and transition of amino acid properties in the primary protein sequence [27]. Using a proteochemometric approach, i.e. developing a model that is simultaneously based on the interactions between several targets and several molecules comprising

RI PT

a total of 63,391 interactions, they obtained one unified model for prediction of inhibition of each of the five CYP isoforms with AUC = 0.94 for the same external dataset as Cheng et al. [23] used. One distinct advantage with the latter two methods is that they are based on descriptors, which can be calculated with open-source programs/web-servers, and that the predictive models are

SC

available free-of-charge as a web-server (WhichCyp at http://drug.ku.dk/whichcyp) or implemented in an open-source program (the proteochemometric model at http://www.cyp450model.org). The

NU

above methods could be directly compared because they were derived from the same compounds, the PubChem dataset. In the following we briefly describe an additional number of methods

MA

available to external users either via web-servers or implemented in software packages. The Didziapetris et al. [28] model for prediction of CYP 3A4 metabolism is based on data extracted from literature and the PubChem dataset and claim an overall accuracy of 89 % for those

PT ED

compounds in the test set, which fell within the applicability domain. The authors also show that the accuracy can be further improved at the expense of the generality of the model. The model has been implemented in the ACD/Lab software package (http://www.acdlabs.com). The MetaPred server (http://crdd.osdd.net/raghava/metapred/) is an implementation of a model for prediction of CYP

CE

isoform specificity developed by Raghava and coworkers [29] using a SVM approach and 26 CDK descriptors on a dataset on 216 drug molecules only metabolized by one isoform. The model had an

AC

average accuracy of 71 % on an independent test set comprising 146 drug molecules metabolized by one or more isoforms. The Gasteiger group has developed isoCYP (https://www.molecularnetworks.com/products/isocyp), a method for predicting the isoform specificity of CYP 2C9, 2D6 and 3A4 substrates [30]. The authors develop several models, apply different algorithms, and use a variety of descriptors. The best model uses only nine descriptors and gives 83 % correct predictions on an external dataset of 233 compounds. This model has subsequently been extended from being based on 146 to 580 compounds and to predict the specificity of seven CYP isoforms [31]. The authors develop and assess several models, ranging from 78 % correct predictions for 3A4 to 97 % for 2E1. The ADMET Predictor available from Simulations Plus (http://www.simulationsplus.com/) predicts the CYP specificity for nine CYP isoforms with accuracy comparable to the above-mentioned methods. Unfortunately, the method used for the classification has to our knowledge not been described. It is not straightforward to directly compare the above-mentioned methods because they are based on different, and in some cases rather small, datasets and most likely cover different 6

ACCEPTED MANUSCRIPT applicability domains. Furthermore, the statistics reported in the papers are often not detailed enough to allow a proper comparison, e.g. not all report AUC values. A recent review discusses these problems in details [32]. For a comprehensive list of links to models and web-services, see

RI PT

http://www.click2drug.org/.

2.2 Structure-based models for prediction of binding.

In structure-based investigations, the three dimensional structure of the enzyme is used to determine how the compound binds in the enzyme cavity. Various three-dimensional structures

SC

have been determined over the past years and shed light on the properties of the binding cavities of the CYP enzymes. The Protein DataBank (http://www.rcsb.org) now contains more than 30

NU

structures of human CYP-ligand complexes covering examples of all the subfamilies involved in drug metabolism (CYP1A, 1B, 2A, 2B, 2C, 2E and 3A) and comprising a variety of ligands,

reviewed by Wu and coworkers [26].

MA

including several drug compounds. Structural characteristics of human CYPs have recently been

Structural information is important for determining the structural requirements for ligand

PT ED

recognition and binding to the various CYPs, a prerequisite for development of efficient structurebased models by e.g. docking, molecular dynamics (MD) and QM/MM methods. Typically, docking methods take into account interactions like e.g. hydrogen-bonding, electrostatic interactions and van der Waals interactions. In a recent review Chen provides a comprehensive list

CE

of available docking methods and provides evidence for the necessity of using more advanced methods to predict the optimal binding conformation and binding affinity, to supplement docking

AC

studies [33].

Vasanthanathan et al. used the experimentally determined CYP1A2 structure as target structure and 7,469 compounds from PubChem tested for binding to CYP1A2 as dataset to classify compounds as binders or non-binders, based on their predicted binding affinity. Using the docking program GOLD and the Chemscore/Goldscore scoring functions [34], it was observed that if a certain docking score was used as a cut-off value, a separation between the binders and non-binders was achieved (with 62 % true positives and 69 % true negatives). The model was subsequently used in a virtual screening study and showed acceptable enrichment factors [35]. Moreover, it was used to identify inhibitors of CYP1A2 with IC50 values around 20 nM [36]. In a subsequent study, Vasanthanathan et al. used the linear interaction energy (LIE) method to calculate the binding affinity for a series of CYP1A2 ligands. The ligands were docked into the active site of the enzyme and subsequently subjected to MD simulation, thus, allowing both ligand and enzyme to move. The interactions energies for the LIE calculation were derived from the MD trajectory. The root mean square error for the training set was 3.7 kJ/mol and 2.1 kJ/mol for an 7

ACCEPTED MANUSCRIPT external test set, which is remarkably good considering the simplicity of the model [37]. In a similar study on CYP2C9, free-energy estimates were used to predict binding modes for CYP2C9 [38]. A series of compounds was studied that contained both an imidazole moiety and a thiourea moiety, both of which are known to interact favorably with the heme iron. Twelve

RI PT

compounds were docked into the active site of CYP2C9 and up to four different poses, coordinating the heme with the imidazole or the thiourea moiety, were identified. MD simulations were performed for all docking poses and an iterative scheme using the LIE methodology [39] was developed. From this scheme, the individual simulations were weighted based on the free energy of

SC

the ligand in the active site. Apart from the overall binding affinity, the calculations estimate a probability that a given pose occurs [38]. Nine of the 12 compounds in this series prefer to bind in

NU

very similar orientations, while three bind in alternative poses. Moreover, for two of the compounds, multiple binding orientations contribute to the total free energy of binding, suggesting

MA

that they are equally likely to occur. Apparently, the active site of CYP2C9 is quite versatile in the binding modes that it accepts and a dynamic equilibrium between multiple binding modes needs to be taken into account to explain the experimental data [38]. Similar observations were more

PT ED

recently also made for CYP2D6 [40].The LIE method was also applied to describe the homotropic cooperativity of ketoconazole and of aflotoxin B1 binding to CYP3A4, quantifying the cooperative effect in binding to 5 kJ/mol [32, 41].

The examples mentioned above show that it is indeed possible to use structure-based models to

CE

predict binding affinities, but also that the simple models (docking and scoring) may yield less accurate data compared to models based on more advanced computational methods (MD

AC

simulations combined with free energy calculations).

3. Ligand-based models for prediction of the site of metabolism All the CYP enzymes contain a heme co-factor that catalyses the reactions. Various ligandbased methods include a reactivity measure for the prediction of the SOM. In this section, we first address the issues of reactivity which currently usually is treated with density functional theory (DFT) calculations [12, 13] (see Section 3.1). Since it is not only the reactivity that is important for the SOM prediction, we subsequently discuss ligand-based SOM models that include other descriptors as well (see Section 3.2).

3.1 Computational methods for analysing the reactivity Ideally, a DFT calculation of the activation barrier using the entire enzyme-substrate complex should be done. However, these types of calculations are computationally too demanding to be feasible. Therefore, the first calculations addressing this issue were done with the so-called 8

ACCEPTED MANUSCRIPT semi-empirical methods that are a lot faster. In these calculations, an oxygen-containing radical mimicking compound I, e.g. a methoxy radical, was used to determine the activation energy. Ignoring the protein may sound a bit crude, but to some extent these types of calculations capture the ease of forming the radical intermediate, and thus, how likely the reaction is to occur [42, 43].

RI PT

As the computing power increased, it became possible to use more realistic models of compound I. In particular Shaik and coworkers pioneered these types of calculations using DFT on a porphine compound as model for compound I [3, 44, 45] (cf. Figure 2 [46]). With calculated activation energy data, it became possible to benchmark the performance of the more simplified

SC

systems. For example, activation energies obtained for aromatic hydroxylation reactions studied with the simplified systems correlated well with DFT activation energies using the porphine model

NU

(with correlation coefficients in the range of 0.8 [47, 48]). A number of more simple systems have also been correlated with the transition state energies for the aliphatic hydroxylation reactions, such

MA

as the reaction with the methoxy radical, bond dissociation energies, etc. [49]. Moreover, QMderived properties and energies have been used on series of compounds and compared with experimentally determined SOMs [50, 51]. The fact that these more simple QM-based calculations

PT ED

can be used to estimate how prone a chemical group in a compound is to react with the CYP enzymes is quite promising, since these calculations are much faster because the molecular system is smaller and minima are optimized rather than first order saddle points. It is also interesting that the previously used energies derived from using the semi-empirical

CE

methods correlate with the more accurate DFT-derived energies [49]. In addition, Mayeno et al. [52] showed that with the newer semi-empirical methods, even the porphyrine models could be

AC

used to study the reactions. This opens the way for full electronic structure calculations on the entire enzyme-substrate system. However, this has not yet been done, probably because it would still be too time-consuming and it could be problematic to identify the transition states with semi-empirical methods for these complicated electronic configurations, i.e. open shell Fe-containing systems. Instead, efforts focus on the so-called QM/MM calculations (see Section 5) and also on pure DFT calculations on models of the heme interacting with the substrate, ignoring the protein.

3.2 Site of metabolism predictions There are several effects that are neglected in a DFT calculation of a model of the heme interacting with a substrate. For example, polarization from the surrounding amino acids is missing, possible strain of the substrate upon binding is not included, etc. Thus, there are several reasons why DFT alone could turn out not to be able to predict SOMs. Rydberg et al. used DFT to make general rules for the activation energies of drug-like fragments which is the basic idea behind SMARTCyp [53], assuming that substitutions a few single bonds away do not affect the activation 9

ACCEPTED MANUSCRIPT barrier significantly [49]. From 139 DFT calculations on drug-like fragments, 41 activation energy rules were derived. Using these activation energy rules on a set of 394 CYP3A4 substrates it was observed that the sites with low activation energies are often observed as experimental SOMs for the CYP3A4 isoform. With the approach, 76 % of the experimental SOMs were within the two

RI PT

most likely predicted sites of metabolism (referred to as the top-two rank). The reason for using the top-two rank is that there is often no significant difference in the score for the prediction of the first two most likely SOMs.

The reason why such a simple approach seems to capture the metabolites generated by

SC

CYP3A4 might be found in the flexible nature of this isoform allowing substrates to bind in various orientations [54]. Thus, the most reactive part of the substrate seems to be most likely to undergo

NU

metabolism. While this seems a plausible explanation of the CYP3A4 mediated metabolism, it is a bit more surprising that the reactivity is also the major determinant for most CYP isoforms like

MA

CYP1A2, 2A6, 2B6, and 2E1 [55]. These enzymes have previously been thought to have more well-defined binding cavities. The CYP2D6 and 2C9 enzymes on the other hand do not necessarily perform the reaction at the most reactive site of the substrate. Previously described pharmacophore

PT ED

models have already taken this into account [56]. In addition to this, the crystal structures of these enzymes show that the Glu216 and Asp301 residues in CYP2D6 can be involved in the binding of positively charged groups in substrates, e.g. amines [57, 58], and the Arg108 residue in CYP2C9 interacts favourably with negatively charged groups in substrates [59] (see Figure 3). Therefore it is

CE

not sufficient to only consider the reactivity for the prediction of the site of metabolism for these two CYP isoforms. A pragmatic approach to include this knowledge in the SMARTCyp reactivity

AC

model is to have a penalty for sites that are close to positively and negatively charged chemical groups, because these sites will be too distant from the heme. This simple adjustment of the activation energies improves the agreement with the experimental sites of metabolism, to the same level as for the other human CYP isoforms [53, 60, 61]. The SMARTCyp reactivities are determined from DFT calculations of the activation energies, and, accordingly, this approach is not dependent of fitting to experimental SOM data. The modified reactivities in the CYP2C9 and 2D6 models show that it is necessary to include other descriptors than reactivity to implicitly model the enzyme. There are various methods that include a reactivity measure and/or a number of other atomic and molecular descriptors. Typically, advanced machine-learning methods are used to relate the descriptors to the experimental data and thus the models need to be validated carefully afterwards. Moreover, it is not always straightforward to interpret the models afterwards, if many descriptors contribute to the model. Zaretzki and coworkers have developed models to predict SOMs (RS-Predictor and XenoSite) that are constructed on the basis of reactivities (from SMARTCyp) and hundreds of 10

ACCEPTED MANUSCRIPT molecular descriptors, e.g. topological and quantum chemical descriptors [62-64]. The models differ in how the models are constructed. The latest model, XenoSite, is trained on more descriptors than RS-Predictor, is based on machine learning-based neural network methods and is calibrated through leave-one-out cross validation. An improvement of a few percentages is reported (largest

RI PT

for CYP 3A4) compared to using the RS-predictor, SMARTCyp, StarDrop [65], and the method from Schrödinger [66]. Whether this is a significant improvement remains to be investigated using an external test set.

There are other methods that include a reactivity measure such as CypScore that uses AM1-

SC

derived atomic reactivity descriptors [67]. The CypScore models were validated on several external data sets, showing top-2 accuracies up to 85 %. StarDrop uses activation energies derived from

NU

AM1 calculations that are corrected to include steric effects. Accuracies of 74 – 77 % for the top-2 rank have been reported for CYP2C9, 2D6, and 3A4.

MA

Other models do not directly include knowledge about the calculated reactivities. In these ligand-based models, experimental sites of metabolism are related to descriptors (typically topological descriptors). MetaPrint2D (http://www-metaprint2d.ch.cam.ac.uk/metaprint2d/) uses

PT ED

fingerprints to predict the SOM of phase I drug metabolism (thus, not only CYP-mediated drug metabolism) [68], the GALAS-model for prediction of the SOM by Dapkunas (accessible from ACD/Labs http://www.acdlabs.com/) is based on fragment descriptors [69], as probably is the ADMET Predictor (http://www.simulations-plus.com/). Examples of recent ligand-based models

CE

for prediction of SOM have been developed by Tyzack et al. [70] using topological fingerprints (with top-2 accuracies of 85 – 91 % for CYP3A4, 2C9, and 2D6) and by Rudik using the PASS

AC

algorithm (with top-2 accuracies of 61 – 77 % for different models when excluding data from the training sets) [71]. The Meteor program [72] uses a different approach. It is a knowledge-based system of fragment rules to relate structure and drug metabolism [73-75].

4. Structure-based models to predict the site of metabolism The methods covered in Section 3 consider the reactivity of the compounds and only implicitly include the surrounding enzyme active site using molecular descriptors. Thus, the ligandbased methods obviously lack the explicit interaction between the substrate and the enzyme. With the (slowly) growing number of crystal structures of human CYP enzymes, it is possible to include these interactions in the models. In this section, we review work in which the actual binding mode of the compounds is used for the prediction of the site of metabolism (see Section 4.1). The assumption behind this is that the part of the substrate pointing towards and being close enough to the heme group can undergo an oxidation reaction because compound I is very reactive. In addition,

11

ACCEPTED MANUSCRIPT several studies combine the binding mode studies with a reactivity measure for the atoms or chemical groups that are in the vicinity of the heme group (see section 4.2).

4.1 Docking and molecular dynamics studies to determine the binding modes of substrates

RI PT

Apart from the catalytic function, the CYP enzymes influence the product formation by orienting the substrates in the active site, leading to regio- and stereospecific metabolism. The enzyme reaction that takes place should be both energetically feasible and atom(s) of the substrate should also be in the vicinity of the reactive centre [8, 9, 50, 76]. Automated docking approaches

SC

place putative substrate molecules in the active site of proteins and generally predict binding poses with a reasonable accuracy [77, 78]. The distance between the site of metabolism in the substrate

NU

and the heme Fe ion can be used as a very simple criterion to determine if a binding orientation agrees with the experimentally observed product formation. Typically, a pose is considered to agree

MA

with the observed metabolism if this distance is shorter than 6 Å, while larger distances indicate that the substrate orientation is incompatible with experiment [79, 80]. More elaborate criteria considering not only the distance but also the directionality have been proposed as well [81].

PT ED

In most docking algorithms, the protein structure is kept rigid and water molecules in the active site are ignored. Considering the plasticity or malleability of the CYP active sites [82, 83], including protein flexibility seems particularly important for these proteins, as they metabolize structurally very diverse compounds. In the following paragraphs, we will present three examples

CE

on docking studies illustrating aspects of flexibility and active site hydration. These aspects were shown to be relevant when trying to reproduce experimentally determined sites of metabolism by

AC

docking of substrates into the active sites of CYP1A2, CYP2D6 and CYP3A4.

4.1.1 CYP1A2

CYP1A2 metabolizes mostly flat aromatic compounds because the active site is relatively narrow and contains an aromatic cluster of three phenylalanine residues [84]. In addition to the nonpolar interactions between this aromatic cluster and the inhibitor -naphthoflavone, the CYP1A2 structure also reveals the presence of a structured water molecule, which is contributing to binding of the inhibitor by bridging a hydrogen bond to the active site [84] (see Figure 3). Using the docking program GOLD, 20 substrates were docked into the active site in the presence of the active site water [34]. For three substrates, no docking solutions were found in which the SOM was placed within 6 Å of the heme iron. For 13 substrates the top-ranked pose corresponded to a pose that was in agreement with an experimentally observed product (with the experimental SOM within 6 Å of the heme iron), while for the remaining four substrates the second ranked pose needed to be included to understand the metabolism [35]. 12

ACCEPTED MANUSCRIPT Subsequently, the presence of the water molecule was determined based on the scoring function, in the course of the docking experiment [85]. Overall, the accuracy of the predictions slightly worsened by including water molecules as exemplified for the substrate imipramine. Without any waters in the active site, the first ranked pose was in agreement with the major

RI PT

metabolite of this molecule. In the presence of the water molecule, alternative orientations were found, corresponding to the formation of the minor metabolite. When the presence or absence of the water molecule was determined by the docking program, a mixture of both binding orientations was found, but the water was always predicted to be absent, also for the poses that were in agreement

SC

with the minor metabolite. Obviously, active site water molecules have an effect on the resulting

NU

docking poses, but the predictions may not always be consistent [35].

4.1.2 CYP2D6

MA

For CYP2D6, a consistent set of experiments has been performed on 65 known substrates [79], using different protein models and docking strategies. Before the availability of any crystal structure, various homology models of CYP2D6 were described. In one of the most recent ones, the

PT ED

active site was modelled in the presence of the substrate codeine [86]. Docking into this homology model led to poses that agree with the experimentally observed metabolite formation for about 60 % of the substrates. Adding static water molecules to the active site further increased this value [79]. The X-ray structure of an apo form of CYP2D6 was subsequently solved [57], but only 20 % of the

CE

substrates could be docked into its active site in a conformation that would result in the experimentally observed metabolite. By using MD simulations to increase the volume of the active

AC

site and dock into an ensemble of conformations, the average quality of the SOM predictions was improved to 52 %. Subsequently the ensemble was reduced to only three structures, for which an overall accuracy of 90 % could be obtained [87]. After publication of the three protein structures that were used for the prediction of product formation, three additional X-ray structures of CYP2D6 with PDB codes: 3QM4 [58], 3TDA and 3TBG (unpublished) were released. An overlay of the four available X-ray structures (color) and the three MD-generated models (in grey) is shown in Figure 4. The spread in the position of the side chains of Glu216 and Phe483 from the MD simulations and docking experiments is also observed in the crystal structures. The availability of different conformations allows CYP2D6 to accommodate a wide range of substrates, offering a rationalization of its substrate diversity [88]. In a subsequent experiment on CYP2D6, water molecules were carefully selected based on hydration hotspots observed in MD simulations of eight diverse protein structures. As before, the presence or absence of these water molecules was determined by the docking program. Averaged

13

ACCEPTED MANUSCRIPT over the eight protein structures, the overall percentage of binding poses with an orientation in agreement with the experimental SOM only marginally increased from 52 to 56 % [89].

4.1.3 CYP3A4

RI PT

For CYP3A4, 16 substrates were docked into 125 MD-generated protein structures. On average 26 % of the docking experiments yielded a pose in agreement with experiments, and a single protein structure could be identified in which 62 % of the compounds were placed in the protein in a way that agreed with experimental observations [90]. CYP3A4 forms a particularly

SC

challenging target for molecular docking experiments, because of the flexibility and promiscuity of the target leading to a diversity of (experimentally observed) protein structures, and in some cases

NU

the binding of multiple inhibitors in the active site [54]. Hayes et al. recently addressed this challenge by using induced fit docking, where the protein is allowed to adapt to the ligand binding,

MA

and reached success rates as high as 89 % [91]. This study also came to the conclusion that the inclusion of crystallographic water molecules does not necessarily improve the performance of docking to predict SOMs.

PT ED

In summary, the above studies, as well as some not discussed in detail here on bacterial CYP102A1 (CYP BM3) [92] and the human isoforms 2C9 [93] and 2D6 [94], show that docking experiments on CYPs are highly sensitive to the exact protein structure that is used (example CYP2D6), and that water molecules do influence the outcome of the docking experiment (example

CE

CYP1A2). So far no general recommendations as to the inclusion of these can be given. It seems

AC

that water molecules need to be considered on a case-by-case basis.

4.2 Inclusion of reactivity in structure-based models Recently, various studies have attempted to include the reactivity of the atoms or chemical groups in the docking. Most of these studies consider the binding of a substrate to an enzyme by docking and correcting the score with a measure for the reactivity of the functional group that is in the proximity of either the oxygen or Fe ion in compound I [93-97]. Many of these studies have been done on CYP1A2, probably because this isoform has a small and confined active site, thus eliminating the sampling problem. These studies differ with respect to the docking method and activation energy estimate used. For example, Jung et al. [95] used AutoDock in combination with AM1-derived activation energies to rationalize the sites of metabolism by CYP1A2 on a small set of CYP1A2 substrates. This set of compounds was extended by Rydberg et al. [98] combining Chemscore with pre-computed activation energies (neighbouring atom model, NAT, which later was further developed to the SMARTCyp model). For this set of CYP1A2 compounds the combination of docking and activation energies improved the prediction rates compared to just 14

ACCEPTED MANUSCRIPT using the activation energies. Again, this emphasizes that the protein plays an important role orienting the substrate for this isoform. There is also a number of similar studies on CYP2C9, probably because this was one of the first isoforms for which a crystal structure became available. Danielson et al. combined pre-

RI PT

calculated DFT activation energies using the NAT model with docking scores from the AutoDock Vina program [93]. Like for CYP1A2, the combination of docking and pre-computed activation energies gives the best overall result, although it seems more difficult for CYP2C9 than CYP1A2. In addition, docking was done on an ensemble of 10 structures obtained from MD simulations,

SC

which further improves the prediction rates. This indicates that the flexible nature of the enzyme cavity is important to consider. This point was also central in Section 3.2 for CYP2D6. In line with

NU

this, Moors et al. [94] found that the most efficient prediction of SOMs for CYP2D6 came from docking into large ensembles, combining the docking scores from Glide and the activation energies

MA

from SMARTCyp.

As seen above, a number of different docking programs have been applied in combination with different reactivity measures. Thus, different prediction rates of the structure-based SOM

PT ED

models could stem from the docking approach or from the reactivity measure. Tyzack et al. used different scoring functions implemented in Gold (Chemscore, Goldscore, ASP, and PLP) in combination with a local ionization energy as reactivity measure, determined with DFT [96]. The performance of the method was somewhat dependent on the docking score used. This is an

CE

important issue to bear in mind. In addition to testing different scoring functions, Tyzack et al. also introduced a covalent docking approach, which improves the prediction rates. Overall, the best

AC

approach in this study predicts the experimental SOM within the two best ranked sites for 59 %, 70 %, and 70 % for CYP3A4, 2D6, and 2C9, respectively. The reason why the covalent docking approach seems to be the best one might be due to the fact that the binding mode of the substrate resembles the transition state structure. There are other methods that take into account that the potential SOM in the substrate should be close enough to the heme group to facilitate a reaction, while the orientation of the atoms that are directly involved in the reaction should also be positioned appropriately. For example, the hydrogen atom in a hydrogen atom abstraction reaction should point towards the oxygen atom in compound I. This is the idea behind using specialized transition state force fields, which capture both the distance and orientation of the ligand relative to the reaction centre [99-101]. With the IMPACTS program, Campagna-Slater combines the scoring from docking, transition state modeling and the reactivity from simplified (and faster) methods. This is particularly appealing, because it uses a model of the transition state in the presence of the surrounding protein. The performance of this approach is as good as that of the best ligand-based methods [99, 102]. 15

ACCEPTED MANUSCRIPT Finally, further methods that use information about the protein for SOM predictions, but do not directly combine docking with a reactivity measure, exist. MetaSite was one of the first approaches for predicting SOMs [76, 103]. In MetaSite, the accessibility of the potential SOM is determined by matching the molecular interaction fields of the ligand and the enzyme. This

RI PT

accessibility is corrected with a reactivity measure for the reaction. A recent approach for the same purpose is DR-Predictor from the Breneman group [104]. With this method, the ligand is docked into the CYP isoform. Then, a model is trained with a number of descriptors related to the binding and the reactivity. This approach was used on CYP1A2 and CYP2A6, some of the more rigid

SC

enzymes, and is of the same quality as the ligand-based RS-Predictor method.

When comparing the accuracy of the structure-based methods directly with the performance

NU

of the best ligand-based models, the latter seem to be as good or even better although the latter does not explicitly describe the direct interaction between the enzyme and the substrate. The reason for

MA

this could be that the prediction of the correct binding pose and determination of the free energy of binding remains very complicated for these flexible enzymes. In addition, many other factors affect the drug metabolism like binding of oxygen, potential side reactions [105], interaction with the

PT ED

redox partners [106], accommodation and dissociation of the product [107], etc. This is implicitly taken care of in the ligand-based methods where the models are fitted to the experimental data. Thus, if that data is accurate, it is possible to develop accurate models. On the other hand, for compounds that are not similar to those used to develop the model (i.e. outside the applicability

CE

domain), the structure-based models may be more appropriate because most of them do not use experimental data of SOMs for input. Furthermore, the structure-based models offer a 3D

AC

understanding that can be useful.

5. Future perspectives

Most of the SOM prediction methods described in this review are relatively fast which makes it realistic to use them in interdisciplinary academic or industrial projects. Moreover, the speed of the methods make benchmarks of large experimental SOM data sets possible. In these benchmarks, it is often the top-two rank that is given, probably because it is often difficult to distinguish the most likely predicted sites by the SOM model. The top-two rank is typically ~80 % for the best models [4, 13, 14, 55, 62, 63, 102, 104, 108]. There could be several reasons why these methods do not show a better performance. First of all, the experimental data may be problematic because they come from different laboratories using different protocols etc. Secondly, computational chemistry models that are fast most likely include approximations. Thus, theoretically they could be improved by using more precise methods

16

ACCEPTED MANUSCRIPT taking into account, for example, more advanced electronic structure descriptions, more sampling, or the protein environment. The protein environment can be included in the activation energy calculation using hybrid QM/MM methods. This way, effects like the polarization and the possible strain in the transition

RI PT

state caused by the surrounding amino acid residues are included in the calculation. There are now several studies on this [3, 109, 110] for the human CYP isoforms like 2A6 [111, 112], 2C9 [113], and 2D6 [114, 115]. However, there is still a need for selecting an appropriate conformation (or several), which is typically taken from ensembles generated by MD simulation, which may,

SC

however, not be trivial.

The QM/MM and free energy simulations described above are examples of methods that

NU

from a theoretical point of view are very accurate and give insight into many details. However, applying methods like these increases the CPU time significantly, treating only one or a few

MA

substrates in days/weeks instead of hundreds of substrates in seconds/minutes by the fastest SOM prediction methods. Thus, on the short term it is probably more realistic to improve the ligandbased models, because they implicitly include all the complicated steps in the conversion of the

PT ED

substrate. These models might be improved further by considering other molecular descriptors and methods for developing the model. However, since models are fitted to experimental data it is necessary that high quality data exist. For the SOM models, it is important to adequately cover the chemical space of the compounds to avoid extrapolations. For the inhibition data, the mode of

CE

binding would be important to know as well, since it is probably hard to include different types of inhibition in one model.

AC

A generally ignored problem in predicting metabolism based on 2D information is enantioselectivity. Hartman et al. [116] have provided the first attempt to predict the enantioselectivity as well as the catalytic parameters kcat, Km and kcat/Km for chiral drugs by an artificial neural network. The complexity of the task is illustrated by the size of the training set only containing 12 compounds and the test set only a single compound. More elaborate free-energy calculations have been applied to quantify subtle differences in stereoselective binding, e.g. the binding of R- and S-propranolol to CYP2D6 and its F483A mutant [117-119], or the stereoselective metabolism of drug-metabolizing mutants of the bacterial CYP102 [120]. However, these approaches are still far from generally applicable. The SOM models determine which sites in a compound would be likely to react with the CYP enzymes. However, it would also be relevant to consider how much of a metabolite that is generated by the enzyme, e.g. if it is a major or minor metabolite or rather how fast the metabolites are generated. That is a challenge for such a complex system to determine, because so many factors play a role for the interaction with a particular CYP isoform. For example, it is currently not easy to 17

ACCEPTED MANUSCRIPT determine the on-rates of the substrates, the off-rate of the products, the activation energies, the conformational changes of the protein and the desolvation of the enzyme cavity upon binding of the substrates with sufficient accuracy. The interplay with the other CYP isoforms and other drug metabolizing enzymes just complicates this further. The ligand-based models for prediction of

RI PT

general metabolism (phase I and II) could become very relevant for these purposes [121]. In our opinion, the ultimate goal of prediction of metabolism by computational methods is to be able to predict where substrates are metabolized (site of metabolism), which metabolites are formed, and the rate of metabolism for substrate to metabolite conversion (the reaction profile) in a setting that

SC

is representative for realistic in vivo situations. In spite of tremendous advances in the field

AC

CE

PT ED

MA

NU

described in this review, there are still many steps to take to reach this vision.

18

ACCEPTED MANUSCRIPT References

[1] F.P. Guengerich, Cytochrome p450 and chemical toxicology, Chem Res Toxicol, 21 (2008) 70-

RI PT

83. [2] I. Schlichting, J. Berendzen, K. Chu, A.M. Stock, S.A. Maves, D.E. Benson, R.M. Sweet, D. Ringe, G.A. Petsko, S.G. Sligar, The Catalytic Pathway of Cytochrome P450cam at Atomic

SC

Resolution, Science, 287 (2000) 1615-1622.

[3] S. Shaik, S. Cohen, Y. Wang, H. Chen, D. Kumar, W. Thiel, P450 enzymes: their structure,

NU

reactivity, and selectivity-modeled by QM/MM calculations, Chem Rev, 110 (2010) 949-1017. [4] A. Tarcsay, G.M. Keseru, In silico site of metabolism prediction of cytochrome P450-mediated

MA

biotransformations, Expert Opin Drug Metab Toxicol, 7 (2011) 299-312. [5] X. Yu, V. Cojocaru, R.C. Wade, Conformational diversity and ligand tunnels of mammalian

PT ED

cytochrome P450s, Biotechnol Appl Biochem, 60 (2013) 134-145. [6] N.P. Vermeulen, Prediction of drug metabolism: the case of cytochrome P450 2D6, Curr Top Med Chem, 3 (2003) 1227-1239.

[7] M.J. de Groot, S.B. Kirton, M.J. Sutcliffe, In silico methods for predicting ligand binding

CE

determinants of cytochromes P450, Curr Top Med Chem, 4 (2004) 1803-1824.

AC

[8] C. de Graaf, N.P. Vermeulen, K.A. Feenstra, Cytochrome p450 in silico: an integrative modeling approach, J Med Chem, 48 (2005) 2725-2755. [9] E. Stjernschantz, N.P. Vermeulen, C. Oostenbrink, Computational prediction of drug binding and rationalisation of selectivity towards cytochromes P450, Expert Opin Drug Metab Toxicol, 4 (2008) 513-527. [10] O. Pelkonen, M. Turpeinen, J. Hakkola, P. Honkakoski, J. Hukkanen, H. Raunio, Inhibition and induction of human cytochrome P450 enzymes: current status, Arch Toxicol, 82 (2008) 667715. [11] N.K. Mishra, Computational modeling of P450s for toxicity prediction, Expert Opin Drug Metab Toxicol, 7 (2011) 1211-1231. [12] P. Rydberg, L. Olsen, U. Ryde, Quantum-mechanical studies of reactions performed by cytochrome P450 enzymes, Curr Inorg Chem, 2 (2012) 292-315. 19

ACCEPTED MANUSCRIPT [13] P. Rydberg, F.S. Jorgensen, L. Olsen, Use of density functional theory in drug metabolism studies, Expert Opinion on Drug Metabolism & Toxicology, 10 (2014) 215-227. [14] J. Kirchmair, M.J. Williamson, J.D. Tyzack, L. Tan, P.J. Bond, A. Bender, R.C. Glen,

RI PT

Computational prediction of metabolism: sites, products, SAR, P450 enzyme dynamics, and mechanisms, J Chem Inf Model, 52 (2012) 617-648.

[15] R.L. Reynald, S. Sansen, C.D. Stout, E.F. Johnson, Structural characterization of human

SC

cytochrome P450 2C19: active site differences between P450s 2C8, 2C9, and 2C19, J Biol Chem, 287 (2012) 44581-44591.

NU

[16] D. Dong, B. Wu, D. Chow, M. Hu, Substrate selectivity of drug-metabolizing cytochrome P450s predicted from crystal structures and in silico modeling, Drug Metab Rev, 44 (2012) 192-

MA

208.

[17] B. Chen, D.J. Wild, PubChem BioAssays as a data source for predictive models, J Mol Graph

PT ED

Model, 28 (2010) 420-426.

[18] X.Q. Xie, Exploiting PubChem for Virtual Screening, Expert Opin Drug Discov, 5 (2010) 1205-1220.

[19] X.Q. Xie, J.Z. Chen, Data mining a small molecule drug screening representative subset from

CE

NIH PubChem, J Chem Inf Model, 48 (2008) 465-475.

AC

[20] M. Lapins, A. Worachartcheewan, O. Spjuth, V. Georgiev, V. Prachayasittikul, C. Nantasenamat, J.E. Wikberg, A unified proteochemometric model for prediction of inhibition of cytochrome p450 isoforms, PLoS One, 8 (2013) e66566. [21] P. Vasanthanathan, O. Taboureau, C. Oostenbrink, N.P. Vermeulen, L. Olsen, F.S. Jorgensen, Classification of cytochrome P450 1A2 inhibitors and noninhibitors by machine learning techniques, Drug Metab Dispos, 37 (2009) 658-664. [22] H. Veith, N. Southall, R. Huang, T. James, D. Fayne, N. Artemenko, M. Shen, J. Inglese, C.P. Austin, D.G. Lloyd, D.S. Auld, Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries, Nat Biotechnol, 27 (2009) 1050-1055. [23] F. Cheng, Y. Yu, J. Shen, L. Yang, W. Li, G. Liu, P.W. Lee, Y. Tang, Classification of cytochrome P450 inhibitors and noninhibitors using combined classifiers, J Chem Inf Model, 51 (2011) 996-1011. 20

ACCEPTED MANUSCRIPT [24] H. Sun, H. Veith, M. Xia, C.P. Austin, R. Huang, Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data, J Chem Inf Model, 51 (2011) 24742481.

RI PT

[25] H. Sun, H. Veith, M. Xia, C.P. Austin, R.R. Tice, R. Huang, Prediction of Cytochrome P450 Profiles of Environmental Chemicals with QSAR Models Built from Drug-like Molecules, Mol Inform, 31 (2012) 783-792.

SC

[26] M. Rostkowski, O. Spjuth, P. Rydberg, WhichCyp: prediction of cytochromes P450 inhibition, Bioinformatics, 29 (2013) 2051-2052.

NU

[27] Z.R. Li, H.H. Lin, L.Y. Han, L. Jiang, X. Chen, Y.Z. Chen, PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid

MA

sequence, Nucleic Acids Res, 34 (2006) W32-37.

[28] R. Didziapetris, J. Dapkunas, A. Sazonovas, P. Japertas, Trainable structure-activity relationship model for virtual screening of CYP3A4 inhibition, Journal of Computer-Aided

PT ED

Molecular Design, 24 (2010) 891-906.

[29] N. Mishra, S. Agarwal, G. Raghava, Prediction of cytochrome P450 isoform responsible for metabolizing a drug molecule, BMC Pharmacology, 10 (2010) 8.

CE

[30] L. Terfloth, B. Bienfait, J. Gasteiger, Ligand-based models for the isoform specificity of cytochrome P450 3A4, 2D6, and 2C9 substrates, Journal of Chemical Information and Modeling,

AC

47 (2007) 1688-1701.

[31] L. Michielan, L. Terfloth, J. Gasteiger, S. Moro, Comparison of Multilabel and Single-Label Classification Applied to the Prediction of the Isoform Specificity of Cytochrome P450 Substrates, Journal of Chemical Information and Modeling, 49 (2009) 2588-2605. [32] A. Cherkasov, E.N. Muratov, D. Fourches, A. Varnek, Baskin, II, M. Cronin, J. Dearden, P. Gramatica, Y.C. Martin, R. Todeschini, V. Consonni, V.E. Kuz'min, R. Cramer, R. Benigni, C. Yang, J. Rathman, L. Terfloth, J. Gasteiger, A. Richard, A. Tropsha, QSAR modeling: where have you been? Where are you going to?, J Med Chem, 57 (2014) 4977-5010. [33] Y.C. Chen, Beware of docking!, Trends Pharmacol Sci, (2014). [34] G. Jones, P. Willett, R.C. Glen, A.R. Leach, R. Taylor, Development and validation of a genetic algorithm for flexible docking, J Mol Biol, 267 (1997) 727-748. 21

ACCEPTED MANUSCRIPT [35] P. Vasanthanathan, J. Hritz, O. Taboureau, L. Olsen, F.S. Jorgensen, N.P. Vermeulen, C. Oostenbrink, Virtual screening and prediction of site of metabolism for cytochrome P450 1A2 ligands, J Chem Inf Model, 49 (2009) 43-52.

RI PT

[36] P. Vasanthanathan, J. Lastdrager, C. Oostenbrink, J.N.M. Commandeur, N.P.E. Vermeulen, F.S. Jorgensen, L. Olsen, Identification of CYP1A2 ligands by structure-based and ligand-based virtual screening, MedChemComm, 2 (2011) 853-859.

SC

[37] P. Vasanthanathan, L. Olsen, F.S. Jorgensen, N.P. Vermeulen, C. Oostenbrink, Computational prediction of binding affinity for CYP1A2-ligand complexes using empirical free energy

NU

calculations, Drug Metab Dispos, 38 (2010) 1347-1354.

[38] E. Stjernschantz, C. Oostenbrink, Improved ligand-protein binding affinity predictions using

MA

multiple binding modes, Biophys J, 98 (2010) 2682-2691.

[39] J. Åqvist, C. Medina, J.E. Samuelsson, A new method for predicting binding affinity in

PT ED

computer-aided drug design, Protein Eng, 7 (1994) 385-391. [40] L. Peric-Hassler, E. Stjernschantz, C. Oostenbrink, D.P. Geerke, CYP 2D6 Binding Affinity Predictions Using Multiple Ligand and Protein Conformations, Int J Mol Sci, 14 (2013) 2451424530.

CE

[41] U. Bren, C. Oostenbrink, Cytochrome P450 3A4 inhibition by ketoconazole: tackling the problem of ligand cooperativity using molecular dynamics simulations and free-energy calculations,

AC

J Chem Inf Model, 52 (2012) 1573-1582. [42] H. Yin, M.W. Anders, K.R. Korzekwa, L. Higgins, K.E. Thummel, E.D. Kharasch, J.P. Jones, Designing safer chemicals: predicting the rates of metabolism of halogenated alkanes, Proc Natl Acad Sci U S A, 92 (1995) 11076-11080. [43] L. Higgins, K.R. Korzekwa, S. Rao, M. Shou, J.P. Jones, An assessment of the reaction energetics for cytochrome P450-mediated reactions, Arch Biochem Biophys, 385 (2001) 220-230. [44] B. Meunier, S.P. de Visser, S. Shaik, Mechanism of oxidation reactions catalyzed by cytochrome p450 enzymes, Chem Rev, 104 (2004) 3947-3980. [45] S. Shaik, D. Kumar, S.P. de Visser, A. Altun, W. Thiel, Theoretical perspective on the structure and mechanism of cytochrome P450 enzymes, Chem Rev, 105 (2005) 2279-2328.

22

ACCEPTED MANUSCRIPT [46] P. Rydberg, R. Lonsdale, J.N. Harvey, A.J. Mulholland, L. Olsen, Trends in predicted chemoselectivity of cytochrome P450 oxidation: B3LYP barrier heights for epoxidation and hydroxylation reactions, J Mol Graph Model, 52 (2014) 30-35.

RI PT

[47] C.M. Bathelt, L. Ridder, A.J. Mulholland, J.N. Harvey, Mechanism and structure-reactivity relationships for aromatic hydroxylation by cytochrome P450, Org Biomol Chem, 2 (2004) 29983005.

SC

[48] P. Rydberg, U. Ryde, L. Olsen, Prediction of activation energies for aromatic oxidation by cytochrome P450, J Phys Chem A, 112 (2008) 13058-13065.

NU

[49] L. Olsen, P. Rydberg, T.H. Rod, U. Ryde, Prediction of activation energies for hydrogen abstraction by cytochrome p450, J Med Chem, 49 (2006) 6489-6499.

MA

[50] L. Afzelius, C.H. Arnby, A. Broo, L. Carlsson, C. Isaksson, U. Jurva, B. Kjellander, K. Kolmodin, K. Nilsson, F. Raubacher, L. Weidolf, State-of-the-art tools for computational site of metabolism predictions: Comparative analysis, mechanistical insights, and future applications, Drug

PT ED

Metab Rev, 39 (2007) 61-86.

[51] Y.W. Hsiao, C. Petersson, M.A. Svensson, U. Norinder, A pragmatic approach using firstprinciple methods to address site of metabolism with implications for reactive metabolite formation,

CE

J Chem Inf Model, 52 (2012) 686-695.

[52] A.N. Mayeno, J.L. Robinson, B. Reisfeld, Rapid estimation of activation enthalpies for

AC

cytochrome-P450-mediated hydroxylations, J Comput Chem, 32 (2011) 639-657. [53] P. Rydberg, D.E. Gloriam, J. Zaretzki, C. Breneman, L. Olsen, SMARTCyp: A 2D Method for Prediction of Cytochrome P450-Mediated Drug Metabolism, ACS Med Chem Lett, 1 (2010) 96100. [54] M. Ekroos, T. Sjogren, Structural basis for ligand promiscuity in cytochrome P450 3A4, Proc Natl Acad Sci U S A, 103 (2006) 13682-13687. [55] P. Rydberg, M. Rostkowski, D.E. Gloriam, L. Olsen, The contribution of atom accessibility to site of metabolism models for cytochromes P450, Mol Pharm, 10 (2013) 1216-1223. [56] M.J. de Groot, S. Ekins, Pharmacophore modeling of cytochromes P450, Adv Drug Deliv Rev, 54 (2002) 367-383.

23

ACCEPTED MANUSCRIPT [57] P. Rowland, F.E. Blaney, M.G. Smyth, J.J. Jones, V.R. Leydon, A.K. Oxbrow, C.J. Lewis, M.G. Tennant, S. Modi, D.S. Eggleston, R.J. Chenery, A.M. Bridges, Crystal structure of human cytochrome P450 2D6, J Biol Chem, 281 (2006) 7614-7622.

RI PT

[58] A. Wang, U. Savas, M.H. Hsu, C.D. Stout, E.F. Johnson, Crystal structure of human cytochrome P450 2D6 with prinomastat bound, J Biol Chem, 287 (2012) 10834-10843. [59] M.R. Wester, J.K. Yano, G.A. Schoch, C. Yang, K.J. Griffin, C.D. Stout, E.F. Johnson, The

SC

structure of human cytochrome P450 2C9 complexed with flurbiprofen at 2.0-A resolution, J Biol Chem, 279 (2004) 35630-35637.

NU

[60] P. Rydberg, L. Olsen, Predicting drug metabolism by cytochrome P450 2C9: comparison with the 2D6 and 3A4 isoforms, ChemMedChem, 7 (2012) 1202-1209.

MA

[61] R. Liu, J. Liu, G. Tawa, A. Wallqvist, 2D SMARTCyp reactivity-based site of metabolism prediction for major drug-metabolizing cytochrome P450 enzymes, J Chem Inf Model, 52 (2012)

PT ED

1698-1712.

[62] J. Zaretzki, C. Bergeron, P. Rydberg, T.W. Huang, K.P. Bennett, C.M. Breneman, RSpredictor: a new tool for predicting sites of cytochrome P450-mediated metabolism applied to CYP 3A4, J Chem Inf Model, 51 (2011) 1667-1689.

CE

[63] J. Zaretzki, P. Rydberg, C. Bergeron, K.P. Bennett, L. Olsen, C.M. Breneman, RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for

AC

nine CYP isozymes, J Chem Inf Model, 52 (2012) 1637-1659. [64] J. Zaretzki, M. Matlock, S.J. Swamidass, XenoSite: accurately predicting CYP-mediated sites of metabolism with neural networks, J Chem Inf Model, 53 (2013) 3373-3383. [65] Stardrop, Optibrium Ltd., Cambridge, United Kingdom, 2011. See http://www.optibrium.com/stardrop/ [66] Schrödinger, P450 Site of Metabolism module, Schrödinger, LLC, New York, 2012. See https://http://www.schrodinger.com [67] M. Hennemann, A. Friedl, M. Lobell, J. Keldenich, A. Hillisch, T. Clark, A.H. Goeller, CypScore: Quantitative Prediction of Reactivity toward Cytochrornes P450 Based on Semiempirical Molecular Orbital Theory, ChemMedChem, 4 (2009) 657-669.

24

ACCEPTED MANUSCRIPT [68] S. Boyer, C.H. Arnby, L. Carlsson, J. Smith, V. Stein, R.C. Glen, Reaction site mapping of xenobiotic biotransformations, Journal of Chemical Information and Modeling, 47 (2007) 583-590. [69] J. Dapkunas, A. Sazonovasay, P. Japertas, Probabilistic Prediction of the Human CYP3A4 and

RI PT

CYP2D6 Metabolism Sites, Chemistry & Biodiversity, 6 (2009) 2101-2106. [70] J.D. Tyzack, H.Y. Mussa, M.J. Williamson, J. Kirchmair, R.C. Glen, Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic

SC

classifiers, J Cheminform, 6 (2014) 29.

[71] A.V. Rudik, A.V. Dmitriev, A.A. Lagunin, D.A. Filimonov, V.V. Poroikov, Metabolism site

NU

prediction based on xenobiotic structural formulas and PASS prediction algorithm, J Chem Inf Model, 54 (2014) 498-507.

MA

[72] Meteor, Lhasa Ltd., Leeds, United Kingdom. See http://www.lhasalimited.org/ [73] B. Testa, A.L. Balmat, A. Long, Predicting drug metabolism: Concepts and challenges, Pure

PT ED

and Applied Chemistry, 76 (2004) 907-914.

[74] B. Testa, A.L. Balmat, A. Long, P. Judson, Predicting drug metabolism - An evaluation of the expert system METEOR, Chemistry & Biodiversity, 2 (2005) 872-885.

CE

[75] C.A. Marchant, K.A. Briggs, A. Long, In Silico Tools for Sharing Data and Knowledge on Toxicity and Metabolism: Derek for Windows, Meteor, and Vitic, Toxicology Mechanisms and

AC

Methods, 18 (2008) 177-187.

[76] G. Cruciani, E. Carosati, B. De Boeck, K. Ethirajulu, C. Mackie, T. Howe, R. Vianello, MetaSite: understanding metabolism in human cytochromes from the perspective of the chemist, J Med Chem, 48 (2005) 6970-6979. [77] P. Ferrara, H. Gohlke, D.J. Price, G. Klebe, C.L. Brooks, 3rd, Assessing scoring functions for protein-ligand interactions, J Med Chem, 47 (2004) 3032-3047. [78] G.L. Warren, C.W. Andrews, A.M. Capelli, B. Clarke, J. LaLonde, M.H. Lambert, M. Lindvall, N. Nevins, S.F. Semus, S. Senger, G. Tedesco, I.D. Wall, J.M. Woolven, C.E. Peishoff, M.S. Head, A critical assessment of docking programs and scoring functions, J Med Chem, 49 (2006) 5912-5931.

25

ACCEPTED MANUSCRIPT [79] C. de Graaf, C. Oostenbrink, P.H. Keizers, T. van der Wijst, A. Jongejan, N.P. Vermeulen, Catalytic site prediction and virtual screening of cytochrome P450 2D6 substrates by consideration of water and rescoring in automated docking, J Med Chem, 49 (2006) 2417-2430.

RI PT

[80] C. de Graaf, P. Pospisil, W. Pos, G. Folkers, N.P. Vermeulen, Binding mode prediction of cytochrome p450 and thymidine kinase protein-ligand complexes by consideration of water and rescoring in automated docking, J Med Chem, 48 (2005) 2308-2318.

SC

[81] J. Li, S.T. Schneebeli, J. Bylund, R. Farid, R.A. Friesner, IDSite: An accurate approach to predict P450-mediated drug metabolism, J Chem Theory Comput, 7 (2011) 3829-3845.

NU

[82] F.P. Guengerich, A malleable catalyst dominates the metabolism of drugs, Proc Natl Acad Sci U S A, 103 (2006) 13565-13566.

MA

[83] C. Oostenbrink, A. de Ruiter, J. Hritz, N. Vermeulen, Malleability and versatility of cytochrome P450 active sites studied by molecular simulations, Curr Drug Metab, 13 (2012) 190-

PT ED

196.

[84] S. Sansen, J.K. Yano, R.L. Reynald, G.A. Schoch, K.J. Griffin, C.D. Stout, E.F. Johnson, Adaptations for the oxidation of polycyclic aromatic hydrocarbons exhibited by the structure of human P450 1A2, J Biol Chem, 282 (2007) 14348-14355.

CE

[85] M.L. Verdonk, G. Chessari, J.C. Cole, M.J. Hartshorn, C.W. Murray, J.W. Nissink, R.D. Taylor, R. Taylor, Modeling water molecules in protein-ligand docking using GOLD, J Med Chem,

AC

48 (2005) 6504-6515.

[86] C. de Graaf, C. Oostenbrink, P.H. Keizers, B.M. van Vugt-Lussenburg, R.A. van Waterschoot, R.A. Tschirret-Guth, J.N. Commandeur, N.P. Vermeulen, Molecular modeling-guided site-directed mutagenesis of cytochrome P450 2D6, Curr Drug Metab, 8 (2007) 59-77. [87] J. Hritz, A. de Ruiter, C. Oostenbrink, Impact of plasticity and flexibility on docking results for cytochrome P450 2D6: a combined approach of molecular dynamics and ligand docking, J Med Chem, 51 (2008) 7469-7477. [88] C. Oostenbrink, Structure-based methods for predicting the sites and products of metabolism, in: J. Kirchmair (Ed.) Drug Metabolism Prediction, Wiley, Weinheim, 2014, pp. 243 - 263. [89] R. Santos, J. Hritz, C. Oostenbrink, Role of water in molecular docking simulations of cytochrome P450 2D6, J Chem Inf Model, 50 (2010) 146-154. 26

ACCEPTED MANUSCRIPT [90] V.H. Teixeira, V. Ribeiro, P.J. Martel, Analysis of binding modes of ligands to multiple conformations of CYP3A4, Biochim Biophys Acta, 1804 (2010) 2036-2045. [91] C. Hayes, D. Ansbro, M. Kontoyianni, Elucidating substrate promiscuity in the human

RI PT

cytochrome 3A4, J Chem Inf Model, 54 (2014) 857-869. [92] S.B. de Beer, L.A. van Bergen, K. Keijzer, V. Rea, H. Venkataraman, C.F. Guerra, F.M. Bickelhaupt, N.P. Vermeulen, J.N. Commandeur, D.P. Geerke, The role of protein plasticity in

SC

computational rationalization studies on regioselectivity in testosterone hydroxylation by cytochrome P450 BM3 mutants, Curr Drug Metab, 13 (2012) 155-166.

NU

[93] M.L. Danielson, P.V. Desai, M.A. Mohutsky, S.A. Wrighton, M.A. Lill, Potentially increasing the metabolic stability of drug candidates via computational site of metabolism prediction by

Chem, 46 (2011) 3953-3963.

MA

CYP2C9: The utility of incorporating protein flexibility via an ensemble of structures, Eur J Med

[94] S.L. Moors, A.M. Vos, M.D. Cummings, H. Van Vlijmen, A. Ceulemans, Structure-based site

PT ED

of metabolism prediction for cytochrome P450 2D6, J Med Chem, 54 (2011) 6098-6105. [95] J. Jung, N.D. Kim, S.Y. Kim, I. Choi, K.H. Cho, W.S. Oh, D.N. Kim, K.T. No, Regioselectivity prediction of CYP1A2-mediated phase I metabolism, J Chem Inf Model, 48 (2008)

CE

1074-1080.

[96] J.D. Tyzack, M.J. Williamson, R. Torella, R.C. Glen, Prediction of cytochrome P450

AC

xenobiotic metabolism: tethered docking and reactivity derived from ligand molecular orbital analysis, J Chem Inf Model, 53 (2013) 1294-1305. [97] W.S. Oh, D.N. Kim, J. Jung, K.H. Cho, K.T. No, New combined model for the prediction of regioselectivity in cytochrome P450/3A4 mediated metabolism, J Chem Inf Model, 48 (2008) 591601. [98] P. Rydberg, P. Vasanthanathan, C. Oostenbrink, L. Olsen, Fast prediction of cytochrome P450 mediated drug metabolism, ChemMedChem, 4 (2009) 2070-2079. [99] V. Campagna-Slater, J. Pottel, E. Therrien, L.D. Cantin, N. Moitessier, Development of a computational tool to rival experts in the prediction of sites of metabolism of xenobiotics by p450s, J Chem Inf Model, 52 (2012) 2471-2483.

27

ACCEPTED MANUSCRIPT [100] P. Rydberg, S.M. Hansen, J. Kongsted, P.-O. Norrby, L. Olsen, U. Ryde, Transition-state docking of flunitrazepam and progesterone in cytochrome P450, J Chem Theory Comput, 4 (2008) 673-681.

RI PT

[101] P. Rydberg, L. Olsen, P.-O. Norrby, U. Ryde, General transition-state force field for cytochrome p450 Hydroxylation, J Chem Theory Comput, 3 (2007) 1765-1773. [102] L.M. Nielsen, K. Linnet, L. Olsen, P. Rydberg, Prediction of cytochrome p450 mediated

SC

metabolism of designer drugs, Curr Top Med Chem, 14 (2014) 1365-1373.

[103] I. Zamora, L. Afzelius, G. Cruciani, Predicting drug metabolism: a site of metabolism

NU

prediction tool applied to the cytochrome P450 2C9, J Med Chem, 46 (2003) 2313-2324. [104] T.W. Huang, J. Zaretzki, C. Bergeron, K.P. Bennett, C.M. Breneman, DR-predictor:

MA

incorporating flexible docking with specialized electronic reactivity and machine learning techniques to predict CYP-mediated sites of metabolism, J Chem Inf Model, 53 (2013) 3352-3366.

PT ED

[105] H.L. Cooper, J.T. Groves, Molecular probes of the mechanism of cytochrome P450. Oxygen traps a substrate radical intermediate, Arch Biochem Biophys, 507 (2011) 111-118. [106] A. Sündermann, C. Oostenbrink, Molecular dynamics simulations give insight into the conformational change, complex formation, and electron transfer pathway for cytochrome P450

CE

reductase, Protein Sci, 22 (2013) 1183-1195.

AC

[107] A. Tarcsay, R. Kiss, G.M. Keseru, Site of metabolism prediction on cytochrome P450 2C9: a knowledge-based docking approach, J Comput Aided Mol Des, 24 (2010) 399-408. [108] J. Zaretzki, C. Bergeron, T.W. Huang, P. Rydberg, S.J. Swamidass, C.M. Breneman, RSWebPredictor: a server for predicting CYP-mediated sites of metabolism on drug-like molecules, Bioinformatics, 29 (2013) 497-498. [109] M.W. van der Kamp, A.J. Mulholland, Combined quantum mechanics/molecular mechanics (QM/MM) methods in computational enzymology, Biochemistry, 52 (2013) 2708-2728. [110] R. Lonsdale, A.J. Mulholland, QM/MM modelling of drug-metabolizing enzymes, Curr Top Med Chem, 14 (2014) 1339-1347. [111] R.A. Kwiecien, J.Y. Le Questel, J. Lebreton, M. Delaforge, F. Andre, E. Pihan, A. Roussel, A. Fournial, P. Paneth, R.J. Robins, Cytochrome P450-catalyzed degradation of nicotine:

28

ACCEPTED MANUSCRIPT fundamental parameters determining hydroxylation by cytochrome P450 2A6 at the 5'-carbon or the n-methyl carbon, J Phys Chem B, 116 (2012) 7827-7840. [112] D. Li, X. Huang, J. Lin, C.G. Zhan, Catalytic mechanism of cytochrome P450 for N-

RI PT

methylhydroxylation of nicotine: reaction pathways and regioselectivity of the enzymatic nicotine oxidation, Dalton Trans, 42 (2013) 3812-3820.

[113] R. Lonsdale, K.T. Houghton, J. Zurek, C.M. Bathelt, N. Foloppe, M.J. de Groot, J.N. Harvey,

SC

A.J. Mulholland, Quantum mechanics/molecular mechanics modeling of regioselectivity of drug metabolism in cytochrome P450 2C9, J Am Chem Soc, 135 (2013) 8001-8015.

NU

[114] J. Olah, A.J. Mulholland, J.N. Harvey, Understanding the determinants of selectivity in drug metabolism through modeling of dextromethorphan oxidation by cytochrome P450, Proc Natl Acad

MA

Sci U S A, 108 (2011) 6050-6055.

[115] P. Schyman, W. Lai, H. Chen, Y. Wang, S. Shaik, The directive of the protein: how does cytochrome P450 select the mechanism of dopamine formation?, J Am Chem Soc, 133 (2011)

PT ED

7977-7984.

[116] J.H. Hartman, S.D. Cothren, S.H. Park, C.H. Yun, J.A. Darsey, G.P. Miller, Predicting CYP2C19 catalytic parameters for enantioselective oxidations using artificial neural networks and a

CE

chirality code, Bioorg Med Chem, 21 (2013) 3749-3759. [117] C. de Graaf, C. Oostenbrink, P.H. Keizers, B.M. van Vugt-Lussenburg, J.N. Commandeur,

AC

N.P. Vermeulen, Free energies of binding of R- and S-propranolol to wild-type and F483A mutant cytochrome P450 2D6 from molecular dynamics simulations, Eur Biophys J, 36 (2007) 589-599. [118] G. Nagy, C. Oostenbrink, Rationalization of stereospecific binding of propranolol to cytochrome P450 2D6 by free energy calculations, Eur Biophys J, 41 (2012) 1065-1076. [119] B. Lai, G. Nagy, J.A. Garate, C. Oostenbrink, Entropic and enthalpic contributions to stereospecific ligand binding from enhanced sampling methods, J Chem Inf Model, 54 (2014) 151158. [120] S.B. de Beer, H. Venkataraman, D.P. Geerke, C. Oostenbrink, N.P. Vermeulen, Free energy calculations give insight into the stereoselective hydroxylation of alpha-ionones by engineered cytochrome P450 BM3 mutants, J Chem Inf Model, 52 (2012) 2139-2148.

29

ACCEPTED MANUSCRIPT [121] J. Kirchmair, M.J. Williamson, A.M. Afzal, J.D. Tyzack, A.P. Choy, A. Howlett, P. Rydberg, R.C. Glen, FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in

AC

CE

PT ED

MA

NU

SC

RI PT

multiple species by endogenous enzymes, J Chem Inf Model, 53 (2013) 2896-2907.

30

ACCEPTED MANUSCRIPT

AC

CE

PT ED

MA

NU

SC

RI PT

Figures

Figure. 1. Catalytic cycle of CYP. The structures relevant for prediction of SOM, the iron (III) complex (2), compound I (7), and the hydroxylated substrate product (8), are framed.

31

CE

PT ED

MA

NU

SC

RI PT

ACCEPTED MANUSCRIPT

AC

Figure 2. A typical model system for calculation of the activation energy at the DFT level. The substrate, caffeine, is interacting with a porphine ring that is a model for compound I.

32

PT ED

MA

NU

SC

RI PT

ACCEPTED MANUSCRIPT

Figure 3. CYP1A2 (left) and CYP2C9 (right). For clarity, only the substrates (-naphthoflavone in CYP1A2 and flurbiprofen in CYP2C9), the heme group and a water molecule (in CYP1A2) and the

AC

CE

polar residues (Arg108 and Asn204 in CYP2C9) are shown.

33

PT ED

MA

NU

SC

RI PT

ACCEPTED MANUSCRIPT

CE

Figure 4. Overlay of the available CYP2D6 crystal structures (2F9Q in green; 3QM4 in cyan; 3TBG in magenta and 3TDA in yellow) and MD derived models (in grey). For clarity only backbone and the heme group of 2G9Q is shown, together with stick models of Glu216 and Phe483

AC

of the other structures.

34

ACCEPTED MANUSCRIPT

AC

CE

PT ED

MA

NU

SC

RI PT

Graphical abstract

35