Exploring In Silico Prediction of the Unbound Brain-to-Plasma Drug Concentration Ratio: Model Validation, Renewal, and Interpretation

Exploring In Silico Prediction of the Unbound Brain-to-Plasma Drug Concentration Ratio: Model Validation, Renewal, and Interpretation

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism Exploring In Silico Prediction of the Unbound Brain-to-Plasma...

525KB Sizes 4 Downloads 43 Views

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

Exploring In Silico Prediction of the Unbound Brain-to-Plasma Drug Concentration Ratio: Model Validation, Renewal, and Interpretation SRINIDHI VARADHARAJAN,1,2 SUSANNE WINIWARTER,3 LARS CARLSSON,3 OLA ENGKVIST,2 AJAY ANANTHA,2,4 3 ˚ THIERRY KOGEJ,2 MARKUS FRIDE´ N,5 JONNA STALRING, HONGMING CHEN2 1

Department of Biology, Lund University, Lund SE-22100, Sweden ¨ ¨ Chemistry Innovation Centre, Discovery Sciences, AstraZeneca R&D Molndal, Molndal SE-43183, Sweden 3 ¨ ¨ Computational ADME and Safety, Drug Safety and Metabolism, AstraZeneca R&D Molndal, Molndal SE-43183, Sweden 4 ¨ Department of Chemistry and Molecular Biology, University of Gothenburg, Goteborg SE-40530, Sweden 5 ¨ ¨ RIA Innovative Medicines, AstraZeneca R&D Molndal, Molndal SE-43183, Sweden 2

Received 7 October 2014; revised 14 November 2014; accepted 18 November 2014 Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/jps.24301 ABSTRACT: Recently, we built an in silico model to predict the unbound brain-to-plasma concentration ratio (Kp,uu,brain ), a measure of the distribution of a compound between the blood plasma and the brain. Here, we validate the previous model with new additional data points expanding the chemical space and use that data also to renew the model. The model building process was similar to our previous approach; however, a new set of descriptors, molecular signatures, was included to facilitate the model interpretation from a structure perspective. The best consensus model shows better predictive power than the previous model (R2 = 0.6 vs. R2 = 0.53, when the same 99 compounds were used as test set). The two-class classification accuracy increased from 76% using the previous model to 81%. Furthermore, the atom-summarized gradient based on molecular signature descriptors was proposed as an interesting new approach to C 2014 Wiley interpret the Kp,uu,brain machine learning model and scrutinize structure Kp,uu,brain relationships for investigated compounds.  Periodicals, Inc. and the American Pharmacists Association J Pharm Sci Keywords: ADME; blood–brain barrier; computational ADME; drug transport; unbound brain-to-plasma concentration ratio; machine learning; in silico modeling

INTRODUCTION The blood–brain barrier (BBB) is a natural defense mechanism evolved to protect the delicate and sensitive brain. Consequently, it poses a major hurdle for drugs targeting the central nervous system (CNS).1 The BBB is characterized by the presence of tight junctions that impede paracellular permeation. Additionally, the transcellular transport of more lipophilic compounds is hindered by highly expressed efflux transporters, so that the CNS is well protected from potentially harmful xenobiotics. P-glycoprotein, breast cancer-resistance protein, and multidrug-resistance protein transporters are the most vital efflux transporters at the BBB relevant for drug disposition.2 On the contrary, influx transporters are present to ensure that compounds that are essential for the brain, for example, nutrients, can pass through the membrane. Therefore, it is the interplay of influx/efflux transporters at the BBB interface that regulates the transcellular movement of molecules across the membrane.3 A centrally acting drug has to cross the BBB in sufficient amount to elicit the required pharmacological effect in the CNS. For peripherally acting drugs, on the contrary, it may be advantageous to be kept out of the brain to avoid undesired side effects. Therefore, understanding the likely brain exposure for a compound in early discovery phase is crucial. Correspondence to: Hongming Chen (Telephone: +46-31-7065285; Fax: +4631-7763700; E-mail: [email protected]) This article contains supplementary material available from the authors upon request or via the Internet at http://onlinelibrary.wiley.com/. Journal of Pharmaceutical Sciences

 C 2014 Wiley Periodicals, Inc. and the American Pharmacists Association

Earlier, structure–brain exposure studies have largely focused on the total brain-to-plasma concentration ratio, denoted by Kp,brain or in its logarithmic form, log BB.4 Kp,brain 5 is described by the following equation (Eq. 1).

K p,brain =

Abrain Cp

(1)

Abrain is the total amount of drug in the brain per unit tissue weight and Cp is the total concentration of the drug molecule in the blood plasma. One of the first attempts toward QSAR modeling of log BB was published by Young et al.6 who correlated log BB with log P in a series of 20 antihistamine molecules. This was followed by several modeling attempts trying to use properties such as lipophilicity, polar surface area (PSA), hydrogen binding, and so on to predict log BB.7–11 However, the total amount of drug in the brain does not necessarily reflect the relevant drug concentration that is responsible for the efficacy.5 Following the free drug hypothesis, it has been proposed that it is the unbound or free drug concentration in the brain interstitial fluid (ISF) rather than the total drug concentration in the brain, which is driving the pharmacodynamic response.12,13 The unbound brain-to-plasma concentration ratio, Kp,uu,brain ,13–15 is a way to measure the free drug concentration in the brain in relation to the free concentration in the blood plasma. It presents information on the passive diffusion and active influx/efflux occurring at the BBB interface and is thus a relevant measure of brain uptake of drugs.12 Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

1

2

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

Kp,uu,brain is defined by Eq. 2.13 K p,uu,brain =

Cu,brainISF Cu,p

(2)

Cu,brainISF denotes the unbound concentration of drug in brain ISF and Cu,p denotes the unbound drug concentration in the blood plasma. Cu,brainISF can be determined directly in one in vivo experiment using microdialysis. However, this is a somewhat laborious method with technical limitations, for example, the method is not well suited to measure highly lipophilic compounds. Therefore, surrogate methods have been proposed to estimate the values of Kp,uu,brain indirectly. One alternative way of determining Kp,uu,brain involves combining the total brainto-plasma concentration ratio Kp,brain measured in vivo with the unbound volume of distribution (Vu,brain ) and the unbound fraction of drug in the plasma (fu,p ) measured in vitro using brain slices16 and equilibrium dialysis methods,17 respectively (Eq. (3)). K p,uu,brain =

K p,brain Vu,brain f u,p

(3)

Note that although Kp,uu,brain is assessed from the three directly measurable components Kp,brain , fu,p , and Vu,brain , it is mechanistically not dependent on any of them.15 The first QSAR modeling study using Kp,uu,brain data was described by Frid´en et al.,15 based on Kp,uu,brain measurements of 41 marketed drugs. In 2011, we built a predictive model18 by extending the Frid´en dataset with a set of in-house compounds. The models were validated and shown to have decent continuous value predictions along with a good classification performance on an external test set. Since 2011, 99 additional compounds have been measured in-house, which expands the chemical space covered in the earlier model. Because it was shown that validating and updating QSAR models regularly can improve the accuracy of a QSAR model,19 we found it was time to revisit the model. Additionally, a new set of descriptors, the signature descriptors, was found to be a powerful tool for building QSAR models, in combination with the support vector machine (SVM) algorithm.20 In this paper, we describe the validation of our earlier model, how the addition of the new data points improves the model and, more importantly, how the atom-based gradients of signature descriptors for SVM model can be used to identify functional groups that possibly influence Kp,uu,brain in individual compounds.

METHODS Experimental Data The exact procedure to determine Kp,uu,brain values was described earlier.15 In short, the data are derived from three different experiments, which are combined using Eq. 3: an in vivo determination of the brain–blood ratio (Kp,brain ) in rat and the in vitro assessment of binding properties in both brain (Vu,brain ) and plasma (fu,p ). The in vivo experiment comprises a 4-h constant rate infusion in Sprague–Dawley rats with up to three drugs administered. Terminal sampling of blood and brain tisVaradharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

sue was performed under isoflurane anesthesia. The total concentration in the brain, Abrain , was reduced by 0.8% of the drug plasma concentration to approximately correct for drug in the residual blood.21 Binding to brain tissue was determined as Vu,brain in vitro in brain slices16 and plasma protein binding measured by equilibrium dialysis.17 Dataset All experimental values were obtained from the corporate database. The datasets used in the present study are described as follows. Dataset 1 Dataset 1 comprises the data used for the Kp,uu,brain predictive model built in 2011. Dataset 1a consists of 247 compounds with Kp,uu,brain data (values for Kp,brain , Vu,brain , and fu,p are all available) used for the direct model. Datasets 1b, 1c, and 1d comprise 506, 473, and 3235 compounds with measured values of Kp,brain , Vu,brain , and fu,p , respectively. These datasets were employed for the indirect models. 73 compounds (30%) were randomly picked from dataset 1a as a test set (test set 1).18 Dataset 2 Since 2011, additional in-house data were accumulated for all the parameters. We collected 99 compounds for which the values of Kp,brain , Vu,brain , and fu,p were available. These 99 compounds comprise dataset 2, a temporal validation set for the old Kp,uu,brain model. Further, additional data were also available for each of Kp,brain , Vu,brain , and fu,p , with 215, 736, and 2520 new data points, respectively. Dataset 3 Dataset 3 contains the combined datasets with all available data. Dataset 3a consists of 346 compounds with known Kp,uu,brain data. Dataset 3b, 3c, and 3d consist of 721, 1209, and 5755 compounds with Kp,brain , Vu,brain , and fu,p values, respectively. Dataset 3a was randomly divided into training and test set in a 7:3 ratio and the procedure repeated 10 times. In each run, the test set consisted of 104 compounds and these 104 compounds were removed from the Kp,brain , Vu,brain , and fu,p sets (dataset 3b, 3c, and 3d) to obtain the respective training sets. Descriptor Sets AZ Descriptors This is a set of 196 physicochemical descriptors comprising important properties such as lipophilicity, hydrogen bonding, molecular weight, polar surface area, and so on, calculated using an in-house program.22–24 Signature Descriptors Signature descriptors refer to atom-based descriptors that describe the extended valence of the atoms in the molecules. In this case, a molecule is defined in terms of a set of canonical subgraphs that represent all the atoms that are at a predefined distance (often called height) from the central atom in consideration.25 Thus, each investigated molecule is associated with a vector whose components are the occurrence of the particular signature in the structure of the molecule. The current study generated signatures of height between 0 and 3, DOI 10.1002/jps.24301

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

leading to several hundred descriptors for each molecule, using the CDK toolkits.26,27 A principal component analysis (PCA) was carried out in SIMCA28 on compounds’ AZ descriptors. Modeling Workflow As in the previous study,18 two model building approaches were employed. Direct models are models built using Kp,uu,brain data directly, whereas an indirect model is based on models for each of the three measurements required for obtaining Kp,uu,brain (i.e., Kp,brain , Vu,brain , and fu,p ). The individual predictions from each of these models are combined to predict Kp,uu,brain according to Eq. 3. Considering the fact that we have more compounds with known Kp,brain , Vu,brain , or fu,p data as compared with Kp,uu,brain , indirect models enhance the utilization of the available data to the greatest extent. Both regression and classification models were built during the study. The resulting models were subsequently aggregated in several combinations to obtain consensus models. The average result of the considered base models was used as the consensus prediction result. Machine Learning Algorithms Two nonlinear machine learning algorithms were used in this study, namely, SVM and random forest (RF). The models using AZ descriptors were built with AZOrange,29 an in-house extension of the open source package Orange,30 whereas the LIBSVM31 software was employed to build SVM models with signature descriptors. Random Forests The RF method32,33 refers to an ensemble learning method where predictions from numerous decision tree learners are aggregated. During the decision tree building, a randomly chosen subset of input variables is used for the splitting. The number of variables in this subset is optimized specifically for each dataset. The number of trees was set to 100 in the current study. The final prediction is the average from all the trees. The RF algorithm is known to be highly predictive with a capability of handling large numbers of input variables. Support Vector Machines Support vector machine is a supervised learning algorithm based on statistical learning theory.34,35 The basic principle of SVM is to find an optimum separating hyperplane that can efficiently separate the data points into classes. This optimal hyperplane is constructed with maximal margin based on a subset of training samples called support vectors, which lie closest to the separating hyperplane. Here, the radial basis function kernel was employed during model building. Optimal Gamma and C parameters obtained from a grid search for the highest cross-validation accuracy were utilized in the final SVM models. Descriptor-Based Model Interpretation Variable Importance Measures for RF Model An attractive feature of the RF algorithm is the variable importance assessment. Variable importance measures suggest the contribution that a particular variable makes to the model performance. The current study uses the OpenCV36 DOI 10.1002/jps.24301

3

implementation through the AZOrange interface to perform the variable importance calculations. The random trees variable importance assessment method in OpenCV was used for global ranking of the descriptors. The OpenCV implementation randomly permutes the values of one variable within the outof-bag (OOB) set of examples of each tree. The OOB error of all trees, with and without permuted values, is used to quantify the importance of each variable.33 Atom-Summarized Gradients for Signature Descriptor-Based SVM Model QSAR models use a mathematical function to approximate the relationship between biological activity and compound characteristics (descriptors). Each gradient component37 of a QSAR model indicates how sensitive the prediction would be to a change in the corresponding descriptor. This can be interpreted as how significant the descriptor component is on the bioactivity prediction. Hence, it is a measure of the importance of each descriptor in the QSAR model. An SVM decision function gradient based on signature descriptors has been shown to be capable of providing interpretability of SVM models and the details of gradient calculation in an SVM model can be found in previous publications.20,38 In this study, we utilize the so-called atom-summarized gradient contributions, which is calculated with the following equation (Eq. 4). Ci =

k 

gj

(4)

j =1

Ci refers to the gradient contribution of atom i present in k different signatures and gj is the gradient value for signature j. Thus, all atoms in a given compound are assigned a distinct contribution value by summing up the gradient contributions from all signatures in which the atom appears. The atom-based gradients can subsequently be projected onto the structures to facilitate interpretation of the model by highlighting specific substructures in individual compounds, contributing either positively or negatively to the modeled endpoint value. Note that Ci does not represent the actual atomic contribution to the predicted value, but rather indicates where a change in the molecule would affect the prediction. Model Validation Internal and external validations of a QSAR model are vital to assess the model performance. For the internal validation, 10fold cross-validation correlation coefficient Q2 , and root-meansquare error (RMSE) were used as performance measures, whereas the correlation coefficient R2 and RMSE were used for estimating the model performance in the external validation. Two types of classification schemes (two class and three class) were investigated throughout the study. For the twoclass classification, a cut-off of −1 (log Kp,uu,brain ) was employed as in our earlier study.18 A compound with log Kp,uu,brain value greater than or equal to −1 is classified as BBB positive (Kp,uu,brain ≥ 0.1, that is, >10% of a compounds free plasma concentration will be found as free concentration in the brain), whereas a value less than −1 implies BBB negative (Kp,uu,brain < 0.1). We also defined a three-class classification with a somewhat larger region around the above limit as moderate class, using two limits (1.3 and −0.52): log Kp,uu,brain = −0.52 (Kp,uu,brain value Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

4

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

Figure 1. Score plot from the PCA of the Kp,uu,brain data in dataset 1 and 2. X and Y axes correspond to the two most important principal components from PCA analysis. The compounds in dataset 1 are labeled with blue circle and the green ones refer to dataset 2.

of 0.3, i.e., >30% of a compound’s free plasma concentration is freely available in the brain ISF) is defined as high class, log Kp,uu,brain < −1.3 (Kp,uu,brain value of 0.05) as low class and all compounds with a log Kp,uu,brain between −1.3 and −0.52 as moderate class. Thus, the possibility to predict low- and high-class compounds correctly should be enhanced, but for a compound in the moderate class, the potential to reach the CNS is not well defined. The classification performance was determined using some common measures such as accuracy, sensitivity, specificity, precision, and so on calculated based on the confusion matrix (see Supporting Information Tables S2 and S3 for a detailed description). The 2011 Kp,uu,brain Predictive Model The predictive model built in 2011 was based on a consensus model including direct and indirect models using the AZ descriptor set.18 The study reported R2 and RMSE of 0.58 and 0.46, respectively, on a test set consisting of 73 compounds (test set 1). The two-class classification accuracy of the model was determined to be 85%.

Figure 2. Performance of the of 2011 Kp,uu,brain model on the temporal test set.

RESULTS Validation of the 2011 Model The temporal test set (dataset 2) consists of 99 compounds measured since 2011. A PCA analysis was carried out for datasets 1 and 2 based on the AZdescriptors, to understand how the chemical space was expanded by dataset 2. Figure 1 shows that the new compounds actually fill gaps within the original dataset space. The temporal test set was tested against the 2011 model (see Fig. 2). An overall correlation coefficient R2 of 0.53 and an RMSE of 0.58 is obtained, which is somewhat lower than the reported values from 2011 (R2 and RMSE are 0.58 and 0.46, respectively). The classfication performance of the model on the temporal test set demonstrates a two-class accuracy of 76%, which is again lower than the reported value of 85%.18 The temporal test set result shows that the 2011 model is still valid, but inclusion of more data should lead to improvement. Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

The two-class system, with an absolute single point cut-off, may be somewhat arbitrary in distinguishing between compounds, especially in case of compounds with a Kp,uu,brain value slightly above or below the chosen limit. Therefore, we here also present a three class system, with the intermediate class (moderate class) essentially as “buffer zone” between the high and low classes. The three-class classification resulted in precision values of 92%, 40%, and 83% for the high, moderate, and low classes, respectively. As expected, the precision values for the high and low class are higher than the accuracy of the two-class classification. However, it is noted that the model predict more than 55% of the compounds to be in the moderate class, whereas experimentally only less than 30% of the compounds belong to this class. As the potential to reach the CNS is not well defined for a compound in the moderate class, this indicates DOI 10.1002/jps.24301

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

Table 1. Models

Internal Validation of Kp,uu,brain , Kp,brain , Vu,brain , and fu,p Model Type

Kp,uu,brain

Kp,brain

Vu,brain

fu,p

Q2

RF(A) SVM(A) SVM(S)

0.54 0.61 0.56

0.52 0.55 0.55

0.73 0.80 0.81

0.78 0.81 0.83

RMSEb

RF(A) SVM(A) SVM(S)

0.50 0.46 0.49

0.53 0.53 0.53

0.36 0.32 0.31

0.4 0.38 0.36

A, AZ descriptor set; S, signature descriptor set. RF(A) annotation refers to a RF model based on AZ descriptor set. The Q2 and RMSE values represented here are the average values of 10 runs.

that the model could not distinguish between high or low for a substantial number of compounds. It is thus important to ensure that a new model also yields a clear reduction of false positives in the moderate class. New Model Development Base Models New models were built using dataset 3, that is, the extension of the 2011 dataset with all data measured since then. Dataset 3a was randomly split into training and test set with a ratio of 7:3 and the procedure repeated 10 times. Cross-validated Q2 and RMSE values for internal validation are obtained for each of the 10 runs and the average value for each model is given in Table 1. Internal validation for the Kp,uu,brain direct model using AZ descriptors gives an average cross-validation Q2 of 0.54 with RF and 0.61 with SVM, whereas the SVM model with signature descriptors give a Q2 of 0.56. The SVM model using AZ descriptors thus seems to perform best on average. Regarding the Kp,brain , Vu,brain , and fu,p models, the SVM method seems to perform slightly better, irrespective of the used descriptor set, but the model quality is actually very similar for all three model types (Table 1; for individual values see Supplementary Material Fig. S1). The test sets derived from dataset 3a (104 compounds each) were used for the external validation of the models. R2 and RMSE values were calculated for each of the 10 runs separately and the average value derived. By varying descriptor sets (AZ descriptors and signature descriptors), machine learning algorithms (SVM and RF), and model workflows (direct and indirect), six different models were constructed. It is worthwhile to mention that the models with the signature descriptors always used the SVM algorithm. The averaged R2 values for the direct models are approximately in the same range as the Q2 values obtained from the internal validation (∼0.55–0.6). Overall, it seems that the RF and SVM indirect models (using AZ descriptors) performed best with an R2 and RMSE of 0.59 and 0.49, respectively, for both models (Table 2). Consensus Models In our previous study, we saw that consensus models helped to improve the model quality for Kp,uu,brain modeling. A similar consensus strategy was employed in the current study. The models to be used in the consensus models were selected and the average values were calculated as the consensus model prediction value. Various combinations of base models were assessed (Table 3). The R2 value is found to be above 0.6 for most DOI 10.1002/jps.24301

Table 2.

5

External Validation of the Six Kp,uu,brain Base Models

Model Type

R2

RMSE

SVM(A, d) RF(A, d) SVM(S, d) SVM(A, i) RF(A, i) SVM(S, i)

0.57 0.54 0.51 0.59 0.59 0.50

0.49 0.52 0.53 0.49 0.49 0.57

A, AZdescriptor set; S, signature descriptor set; d, direct model; i, indirect model. RF(A, d) annotation refers to a RF direct model based on AZdescriptor set. The R2 and RMSE values represented here are the average values of 10 runs.

combinations, that is, higher than that of any single models. Also, the RMSE is in general better (lower) than the RMSEs found in the single models. Models no. 15 (combining five base models) and no. 16 (combining six base models) showed the best average R2 and RMSE (0.65 and 0.45, respectively). The classification performance (average among the 10 runs) of the four best consensus models was investigated. Both two class and three class schemes were considered (Tables 4 and 5; definitions for the measures used can be found in the Supplementary Material Tables S2 and S3). In the two-class case, the classification accuracy is around 84% for the four best models. For the three-class models, the classification precision for the high and low classes are above 80% as well, but the moderate class shows an average classification precision of around 46%. About 50% of the compounds are classified as moderate.

DISCUSSIONS Validation of our earlier model using a temporal test set containing 99 molecules showed that the model has good predictivity despite somewhat reduced performance when compared with the original model test set. Nevertheless, as the dataset was increased substantially, it was regarded worthwhile to renew the model to ensure all available information to be included in future predictions. Comparison with the 2011 Model To clearly demonstrate the improvement of the new Kp,uu,brain model as compared with the 2011 model, we decided to compare results based on the exact same test set. We used the combination strategy for the best consensus model identified (model 15, based on five individual base models; see Tables 3–5), to rebuild a model using all the compounds available except the temporal test set for the 2011 model consisting of 99 compounds (dataset 2). The model performance measures derived from the temporal test set differed slightly from the averaged values for model 15, for example, showing a lower R2 (0.6 vs. 0.65) and lower accuracy for the two-class classification scheme (0.81 vs. 0.84). This is expected, as randomly selected test set compounds, though not included in the model itself, still have a good chance of belonging to the chemical space that is covered by the training set. The R2 and RMSE for the rebuilt consensus model are 0.6 and 0.57, respectively, an enhancement when compared with the values observed for the old model (R2 of 0.53 and RMSE of 0.58; see Table 6). An improvement is also seen for the Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

6

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

Table 3.

Consensus Model Performance

Model Number

Consensus Model Type AZ descriptor SVM(A,d) RF(A,d) SVM(A,i) RF (A,i) SVM(A,d) SVM(A,i) RF(A,d) RF (A,i) SVM(A,d) RF (A,i) SVM(A,i) RF(A,d) SVM(A,d) SVM(A,i) RF(A,d) SVM(A,d) SVM(A,i) RF(A,d) RF(A,i) SVM(A,d) RF(A,d) RF(A,i) Signature descriptor SVM(S,d) SVM(S,i) AZ descriptors and signature descriptor SVM(A,d) RF(A,d) SVM(S,d) SVM(A,d) RF(A,d) SVM(S,i) SVM(A,d) RF(A,d) SVM(S,d) SVM(S,i) SVM(A,d) RF(A,i) RF(A,d) SVM(S,d) SVM(A,d) SVM(A,i) RF(A,i) RF(A,d) SVM(S,d) SVM(A,d) SVM(A,i) RF(A,i) RF(A,d) SVM(S,d) SVM(S,i)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

R2

RMSE

0.60 0.63 0.62 0.60 0.63 0.62 0.63 0.63 0.64

0.48 0.46 0.46 0.48 0.46 0.47 0.46 0.47 0.46

0.56

0.51

0.61 0.62 0.62 0.63 0.65 0.65

0.48 0.46 0.47 0.46 0.45 0.45

Consensus models are composed by several base models. The base model annotations are similar to that represented in Table 2. The R2 and RMSE values represented here are the average values of 10 runs.

Table 4.

Two-Class Classification Performance of the New Best Four Consensus Models

Model Numbera

14 15 16

F-Score

Kappa

Matthews Correlation Coefficient

Accuracy

Sensitivity

Specificity

RF(A,i)

0.818

0.801

0.826

0.761

0.857

0.778

0.626

0.623

SVM(A,d) RF (A,d) RF(A,i) SVM(A,d) RF(A,i) RF (A,d) SVM(S,d) SVM(A,d) SVM(A,i) RF (A,i) RF(A,d) SVM(S,d) SVM(A,d) SVM(A,i) RF (A,i) RF (A,d) SVM(S,d) SVM(S,i)

0.836

0.802

0.858

0.799

0.861

0.8

0.66

0.659

0.836

0.814

0.848

0.791

0.867

0.8

0.662

0.66

0.843

0.823

0.852

0.8

0.874

0.809

0.677

0.675

0.839

0.817

0.849

0.796

0.869

0.804

0.668

0.666

Models

Base model 9

Negative Precision

Positive Precision

a

Model numbers are based on Table 3. The performance measures listed in the table are the average values among 10 runs.

Table 5.

Three-Class Classification Performance of the New Best Four Consensus Models

Model Numbera Base model 9 14 15 16

RF(A,i) SVM(A,d) SVM(A,d) SVM(A,d) SVM(A,d)

RF(A,d) RF(A,i) RF(A,i) RF(A,d) SVM(S,d) SVM(A,i) RF(A,i) RF(A,d) SVM(S,d) SVM(A,i) RF(A,i) RF(A,d) SVM(S,d) SVM(S,i)

Precision High

Precision Low

Precision Moderate

0.752 0.816 0.808 0.862 0.81

0.837 0.806 0.808 0.819 0.828

0.476 0.456 0.459 0.479 0.491

a

Model numbers are based on Table 3. The performance measures listed in the table are the average values among 10 runs.

classification performance (Table 7). Kappa increases from 0.51 to 0.61 and the two-class classification accuracy from 76% to 81%. Using the three-class classification scheme, we found that the accuracy for the high and low classes is similar to that obtained with the previous model, but the moderate class precision is improved from 40% to 48% (see Fig. 3 and Table 8). Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

Table 6.

Performance of the Models on the Temporal Test Set

2011 model New best consensus model

R2

RMSE

0.53 0.6

0.58 0.57

DOI 10.1002/jps.24301

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

Table 7.

7

Two-Class Classification Performance

Model 2011 model New best consensus model

Accuracy

Sensitivity

Specificity

Positive Precision

Negative Precision

F-Score

Kappa

Matthews Correlation Coefficient

0.76 0.81

0.74 0.72

0.77 0.89

0.74 0.85

0.77 0.78

0.74 0.78

0.51 0.61

0.51 0.62

Additionally, the number of compounds predicted as moderate is reduced from more than 55% to 46%. All in all, the new model shows some improvement of predictivity. However, the model complexity in terms of the number of individual models involved increase from three (2011 model) to five (present study).

Table 8.

Three-Class Classification Performance

2011 model New best consensus model

Precision High

Precision Moderate

Precision Low

0.92 0.92

0.40 0.48

0.83 0.82

Model Interpretation Understanding which variables are of importance for a model result is key to elucidate structure property relationships. In the present case, variable importance assessment within the RF method was used to determine the most important molecular properties affecting Kp,uu,brain . Table 9 shows the top 15 AZ descriptors for the RF Kp,uu,brain model [RF (A,d)]. Mainly, descriptors relating to hydrogen bonding, polar surface area, topology, and volume are present in the top 15 most influential descriptors, which is in agreement with earlier observations. Similar to our previous studies,15,18 lipophilicity descriptors do not appear in the list of the most important descriptors for

Kp,uu,brain in the present study. Lipophilicity is often considered a vital property for BBB permeation,39 possibly related to the fact that Kp,brain (log BB) is highly correlated with unspecific tissue binding where lipophilicity is a main driver. Kp,uu,brain , on the contrary, largely accounts for effects because of transporter interactions at the BBB,15 which are more dependent on hydrogen bonding properties than on lipophilicity. During drug discovery, it is also of great interest to identify characteristic functional groups in a compound that influence the BBB penetration. In contrast to bulk descriptors

Figure 3. Classification results (two class and three class) of 2011 model (a and b) and the new model (c and d) for the same test set of 99 compounds. The number of compounds within each class is indicated in the figure. DOI 10.1002/jps.24301

Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

8

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

Table 9. Random Forest Variable Importance Values for the Kp,uu,brain Model Descriptor

VIP

Meaning

MM SAS EP P SUM

0.023

HBAsum

0.021

Kappa2 VDW AREA

0.021 0.016

MM SAS EP P MEAN

0.016

MM VDW EP P SUM

0.015

Kappa1 CMR

0.014 0.014

HBAmax

0.012

OVAL NEW

0.012

HBD

0.012

MM VDW EP P AREA

0.012

AREA

0.012

Chi3p

0.011

VOL

0.011

Sum of positive electrostatic potentials on solvent accessible surface. Sum of acceptor free energies according to Raevsky (HYBOT). Topological index. van der Waals molecular surface area. Mean of positive electrostatic potentials on solvent-accessible surface. Sum of positive electrostatic potentials on van der Waals surface. Topological index. Calculated molar refractivity. Largely a volume descriptor highly correlated with molecular weight. Highest free energy factor for H-bond acceptors according to Raevsky (HYBOT). TSA/the area of a sphere with the volume given by MolVol2D. Lipinski number of HB donors = number of OH+NH. Area of van der Waals surface with positive electrostatic potential. van der Waals radius surface, summed over all atoms, with a 1–3 overlap correction. Sum of reciprocal square roots of valences over all 4-count linear atom paths. Gaussian volume. A measure of molecular volume.

that describe overall molecular properties, the fragment-based signature descriptors encode the structural information of a compound in detail and may serve well for this purpose. Previously, the model gradient based on signature descriptors was successfully employed to interpret SVM models.20,37 Here, this novel methodology was used to explore which molecular features within a query molecule actually influence the Kp,uu,brain SVM model result. As described in the Methods section, the gradient information from the signatures was mapped onto a compound’s atoms for a detailed SAR analysis. We examined the atom-summarized gradient values for eight drugs (see Fig. 4). The prediction and gradient values in the figure are derived from a signature SVM model trained on all available in-house Kp,uu,brain data except for these eight compounds. The molecules investigated are three beta-blockers, Metoprolol (1), Atenolol (2), and Nadolol (3), one analgetic, Morphine (4), one antitussive, Codeine (5), two antibiotics,

Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

Norfloxacin (6) and Levofloxacin (7), and one antiepileptic drug, Lamotrigine (8). Five of the compounds are BBB positive (1, 4, 5, 7, and 8), with compound 7 being very close to the limit (log Kp,uu,brain = −0.92). The $-blockers 1 (Metoprolol), 2 (Atenolol), and 3 (Nadolol) are a set of closely related molecules, with only Metoprolol being BBB positive. The model predicts Nadolol and Metoprolol’s log Kp,uu,brain as −0.72 and −0.67, respectively, whereas the prediction for Atenolol is below −1. For all three molecules, high-negative gradient values on the secondary amine nitrogen and the hydroxyl oxygen in the right hand side chain can be seen. Both Nadolol and Atenolol have additional substantially negative gradients on atoms in the other part of the molecule (two hydroxyl groups in Nadolol and the amide group in Atenolol), but Nadolol also shows some positive gradient values in the phenyl ring. The BBB-positive Metoprolol has only very small absolute gradient values (between −0.02 and 0.02) on the atoms around the ether side chain on the left hand side. This difference between the gradient values on the left hand side chains may explain the correctly predicted difference in BBB disposition for Metoprolol and Atenolol. The lower negative effect of the ether group in Metoprolol, when compared with the amide group in Atenolol, is consistent with our understanding that ether groups, in general, might favor permeation more than the hydroxyl or amide groups because of the differences in their hydrogen bonding potential. Although Nadolol was not predicted correctly, the gradient values show that the two hydroxyl groups in the bicyclic system are important for the predictions and changes in these substructures will influence Kp,uu,brain . Compound 4 and compound 5 are closely related compounds differing only in one methyl group. Both are BBB positive and predicted as such, but compound 5 has a much higher Kp,uu,brain value. In compound 4, two hydroxyl groups with a rather large negative contribution can be identified. Methylating one of them gives the expected effect: a higher Kp,uu,brain for compound 5, both experimental and predicted, and an overall less negative gradient value on the methoxy group. Note that both the oxygen and the carbon in this group show a negative gradient. Compound 6 and compound 7 are two fluoroquinolone antibiotics with distinct BBB permeability (−1.55 vs. −0.92 on log Kp,uu,brain ). Both compounds show many negative gradient values, especially in the carboxy and keto substructures important for the activity. Norfloxacin (6) shows additionally a high-negative gradient (−0.247) on the secondary amine in the piperazine ring suggesting that modification in this region will be able to change Kp,uu,brain . In Levofloxacin (7), the compound with higher BBB accessibility, one of the structural changes is methylation of the amine in the piperazine ring, which actually shows a largely decreased gradient value on the nitrogen atom and a positive gradient value on the carbon atom. For the BBB-positive compound 8, Lamotrigine, many atoms with positive gradients can be noticed. However, one of the Cl atoms actually has a negative gradient, indicating that removal of this Cl might lead to additional increase in Kp,uu,brain . Among these examples, atomic gradient values provide an interesting structural interpretation of the SVM model. We believe this approach to be very useful in guiding molecular design with regard to BBB penetration within a compound series explored in a drug discovery project.

DOI 10.1002/jps.24301

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

9

Figure 4. Structures of eight drug compounds with selected gradient values shown (at least five main gradients; green: positive; yellow: negative).

CONCLUSIONS Recently, improved awareness in drug industries that centrally mediated pharmacological effects are driven by the unbound drug concentration in the brain ISF has led to the development and increased application of experimental methodologies to determine the unbound drug concentration ratio between brain and plasma, Kp,uu,brain . The availability of such data has subsequently also enabled the development of in silico models for this property. In 2011, an in silico Kp,uu,brain model was built based on a set of AstraZeneca in-house data. Since then, new Kp,uu,brain data were continuously generated within the company covering a broader chemical space. In the current study, we have collected these additional data points as temporal test set to validate the previous model and to expand the training set for an update of the in-house Kp,uu,brain model. The validation results considering both the continuous result and a two-class

DOI 10.1002/jps.24301

classification demonstrate the relatively stable performance of the 2011 model. To improve the 2011 model further, various descriptors and machine learning algorithms were utilized for model building. The updated Kp,uu,brain model shows a better performance when compared with our previous model. A three-class classification system is proposed in order to diminish the risk of misclassification of compounds very close to the chosen limit, by adding an intermediate moderate class. Using the limits of −1.3 and −0.52 for log Kp,uu,brain , we found that the precision of high- and low-class predictions is 0.92 and 0.83, respectively, which is somewhat better than the precisions obtained using the two-class model (positive and negative precision are 0.85 and 0.78, respectively). However, approximately 50% of the compounds are classified as moderate, so their potential to reach the CNS is not really defined. Future models need to be able to handle compounds in the moderate class more efficiently.

Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

10

RESEARCH ARTICLE – Pharmacokinetics, Pharmacodynamics and Drug Transport and Metabolism

Examining the important descriptors defined by the RF VIP values resulted in observations similar to those reported in the 2011 study, further indicating the validity of the model. Additionally, the combination of molecular signature descriptors and SVM was explored to build an interpretable Kp,uu,brain model. Using atom-summarized gradient, we could visualize and identify some of the functional groups that influence Kp,uu,brain in the shown examples. We believe that this approach will be very useful for compound design.

REFERENCES 1. Reichel A. 2006. The role of blood–brain barrier studies in the pharmaceutical industry. Curr Drug Metab 7:183–203. 2. L¨oscher W, Potschka H. 2005. Blood–brain barrier active efflux transporters: ATP-binding cassette gene family. NeuroRx 2:86– 98. 3. Sanchez-Covarrubias L, Slosky LM, Thompson BJ, Davis TP, Ronaldson PT. 2014. Transporters at CNS barrier sites: Obstacles or opportunities for drug delivery? Curr Pharm Des 20:1422–1449. 4. Norinder U, Haeberlein M. 2002. Computational approaches to the prediction of the blood–brain distribution. Adv Drug Deliv Rev 54:291– 313. ¨ 5. Hammarlund-Udenaes M, Frid´en M, Syvanen S, Gupta A. 2008. On the rate and extent of drug delivery to the brain Pharm Res 25:1737– 1750. 6. Young RC, Mitchell RC, Brown TH, Ganellin CR, Griffiths R, Jones M, Rana KK, Saunders D, Smith IR, Sore NE. 1988. Development of a new physiochemical model for brain penetration and its application to the design of centrally acting H2 receptor histamine antagonists. J Med Chem 31:656–671. 7. Van de Waterbeemd H, Kansy M. 1992. Hydrogen-bonding capacity and brain penetration. Chimia 46:299–303. 8. Clark DE.1999. Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena 2. Prediction of blood–brain barrier penetration. J Pharm Sci 88:815–821. 9. Ooms F, Weber P, Carrupt PA, Testa B. 2002. A simple model to predict blood–brain barrier permeation from 3D molecular fields. Biochim Biophys Acta 1587:118–125. 10. Fan Y, Unwalla R, Denny RA, Di L, Kerns EH, Diller DJ, Humblet C. 2010. Insights for predicting blood–brain barrier penetration of CNS targeted molecules using QSPR approaches. J Chem Inf Model 50:1123–1133. 11. Engkvist O, Wrede P, Rester U. 2003. Prediction of CNS activity of compound libraries using substructure analysis. J Chem Inf Comput Sci 43:155–160. 12. Di L, Rong H, Feng B. 2013. Demystifying brain penetration in central nervous system drug discovery. J Med Chem 56:2–12. 13. Frid´en M, Gupta A, Antonsson M, Bredberg U, HammarlundUdenaes M. 2007. In vitro methods for estimating unbound drug concentration in the brain interstitial and intracellular fluids. Drug Metab Dispos 35:1711–1719. 14. Hammarlund-Udenaes M, Bredberg U, Frid´en M. 2009. Methodologies to assess brain drug delivery in lead optimization. Curr Topic Med Chem 9:148–162. 15. Frid´en M, Winiwarter S, Jerndal G, Bengtsson O, Wan H, Bredberg U, Hammarlund-Udenaes M, Antonsson M. 2009. Structure–brain exposure relationship in rat and human using a novel data set of unbound drug concentrations in brain interstitial and cerebrospinal fluids. J Med Chem 52:6233–6243. 16. Friden M, Ducrozet F, Middleton B, Antonsson M, Bredberg U, Hammarlund-Udenaes M. 2009. Development of a high-throughput brain slice method for studying drug distribution in the central nervous system. Drug Metab Dispos 37:1226–1233.

Varadharajan et al., JOURNAL OF PHARMACEUTICAL SCIENCES

17. Wan H, Bergstrom F. 2007. High throughput screening of drug– protein binding in drug discovery. J Liq Chromatogr Related Technol 30:681–700. 18. Chen H, Winiwarter S, Frid´en M, Antonsson M, Engkvist O. 2011. In silico prediction of unbound brain-to-plasma concentration ratio using machine learning algorithms. J Mol Graphics Modell 29:985–995. 19. Rodgers SL, Davis AM, van de Waterbeemd H. 2007. Time-series QSAR analysis of human plasma protein binding data. QSAR Comb Sci 26:511–521. 20. Chen H, Carlsson L, Eriksson M, Varkonyi P, Norinder U, Nilsson I. 2013. Beyond the scope of free-Wilson analysis: Building interpretable QSAR models with machine learning algorithms. J Chem Inf Model 53:1324–1336. 21. Frid´en M, Ljungqvist H, Middleton B, Bredberg U, HammarlundUdenaes M. 2010. Improved measurement of drug exposure in brain using drug-specific correction for residual blood J Cereb Blood Flow Metab 30:150–161. 22. Paine SW, Barton P, Bird J, Denton R, Menochet K, Smith A, Tomkinson NP, Chohan KK. 2010. A rapid computational filter for predicting the rate of human renal clearance. J Mol Graph Model 29:529– 537. 23. Bruneau P. 2001. Search for predictive generic model of aqueous solubility using Bayesian neural nets. J Chem Inf Comput Sci 41:1605– 1616. 24. Katritzky A, Wang Y, Sild S, Tamm T, Kalrelson M. 1998. QSPR studies on vapor pressure, aqueous solubility, and the prediction of water–air partition coefficient. J Chem Inf Comp Sci 38:720–725. 25. Faulon JL. 2003. The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J Chem Inf Comput Sci 43:707–720. 26. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. 2003. The chemistry development kit (CDK): An open-source java library for chemo- and bioinformatics. J Chem Inf Comput Sci 43:493–500. 27. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen E. 2006. Recent developments of the chemistry development kit (CDK)—An open-source java library for chemo- and bioinformatics. Curr Pharm Des 12:2111–2120. ˚ Sweden. 28. SIMCA P+ . Version 13.0. Umetrics: Umea, ˚ 29. Stalring J, Almeida P, Carlsson L, Boyer S. 2011. AZOrange—High performance open source machine learning for QSAR modeling in a graphical programming environment. J Chem Inf 3:28. 30. Orange official web site. http://www.ailab.si/orange/. (accessed Nov 29, 2014) 31. Chang CC, Lin CJ. 2011. LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2:1–27. 32. Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP. 2003. Random forest: A classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 43:1947– 1958. 33. Breiman L. 2001. Random forests. Machine Learning 45:5–32. 34. Cortes C, Vapnik V. 1995. Support-vector networks. Machine Learning 20:273–297. 35. Vapnik VN. 1995. The nature of statistical learning theory. New York: Springer. 36. OpenCV official web site, http://opencv.org/. (accessed Nov 29, 2014) 37. Carlsson L, Helgee EA, Boyer S. 2009. Interpretation of nonlinear QSAR models applied to Ames mutagenicity data. J Chem Inf Model 49:2551–2558. 38. Eriksson M, Chen H, Carlsson L, Nissink JWM, Cumming JG, Nilsson I. 2014. Beyond the scope of free-Wilson analysis. 2: Can distance encoded R-group fingerprints provide interpretable nonlinear models? J Chem Inf Model 54:1117–1128. 39. Pajouhesh H, Lenz GR. 2005. Medicinal chemical properties of successful central nervous system drugs. NeuroRx 2:541–553.

DOI 10.1002/jps.24301