G Model
ARTICLE IN PRESS
JOCS-276; No. of Pages 10
Journal of Computational Science xxx (2014) xxx–xxx
Contents lists available at ScienceDirect
Journal of Computational Science journal homepage: www.elsevier.com/locate/jocs
An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies Rita Kakkar ∗ , Richa Arora, Pragya Gahlot, Deepti Gupta Computational Chemistry Group, Department of Chemistry, University of Delhi, Delhi 110007, India
a r t i c l e
i n f o
Article history: Received 2 December 2013 Received in revised form 14 March 2014 Accepted 8 April 2014 Available online xxx Keywords: Pyruvate dehydrogenase kinase Principal Component Analysis Clustering Genetic Function Algorithm Pharmacophore 3D-QSAR
a b s t r a c t Quantitative-Structure-Activity-Relationship (QSAR) studies have been performed on PDHK inhibitors based on anilides of (R)-3,3,3-trifluoro-2-hydroxy-2-methylpropionic acid. A pharmacophore model has also been developed and a predictive atom based 3D-QSAR model for the studied data set has been derived. The obtained 3D-QSAR model scores high on all statistical parameters. The model suggests that a hydrophobic zone plays a crucial role in the activity of the ligands. This zone is occupied by a chlorine atom at the ortho position of the benzene ring in the active ligands. This is followed by statistical analysis of the data to elucidate the most important descriptors governing the inhibitory activity of the dataset. The descriptor set has been selected so as to capture important topological, geometric, electronic, structural and spatial features of the analogs. By using the Genetic Function Approximation (GFA), robust models have been generated. Principal Component Analysis (PCA) has been used to reduce the descriptors to a manageable set. The importance of hydrogen bonding, molecular flexibility with large number of rotatable bonds, hydrophobicity, and electron-withdrawing substituents at the para position of the phenyl ring contribute to the activity. Hierarchical Cluster Analysis (HCA) has been used to divide the dataset into three clusters on the basis of similarity. The same properties as deciphered from PCA are found to contribute to the activity of the compounds in the cluster containing the most active molecules. © 2014 Elsevier B.V. All rights reserved.
1. Introduction The pyruvate dehydrogenase complex (PDC) is one of the largest multi-enzyme complexes found in living cells. It catalyzes a key regulatory step in oxidative glycolysis, the irreversible decarboxylation of pyruvate, leading to the conversion of pyruvate to acetyl-coenzyme A (acetyl-CoA). In mammals, the activity of PDC is regulated by the enzyme pyruvate dehydrogenase kinase (PDHK), an integral part of PDC bound to the E2 component, which catalyzes the phosphorylation of the E1 component of PDC, causing inactivation of the entire complex [1]. Four PDHK isozymes (PDHK1, PDHK2, PDHK3, and PDHK4) are responsible for the short- and long-term regulation of mammalian PDC [2]. Since activation of PDC is important for medical conditions such as heart ischemia and insulin resistant diabetes, PDHKs have been targeted for development of specific non-toxic PDHK inhibitors, such as R-dihalogenated carbonyl compounds [3], anilides [4] and
∗ Corresponding author. Tel.: +91 1127666313. E-mail addresses:
[email protected],
[email protected] (R. Kakkar).
amides [5,6] of (R)-3,3,3-trifluoro-2-hydroxy-2-methylpropionic acid (Scheme 1). Structure–activity relationships of the secondary of (R)-3,3,3-trifluoro-2-hydroxy-2-methylpropionic amides acid derivatives showed that the (R)-3,3,3-trifluoro-2-hydroxy-2methyl moiety is an optimum group for PDHK inhibition [6]. Several compounds in this category, namely AZ12 (N-{4-[(ethylanilino) sulfonyl]-2-methylphenyl}-3,3,3-trifluoro-2-hydroxy-2-methylpropanide), Nov3r (4-{(2,5)-dimethyl-4-[3,3,3-trifluoro-2-hydroxy-2-methylpropanoyl]piperozinyl}carbonyl)benzonitrile), and AZD7545 ((2R)-N-{4-[4-(dimethylcarbamoyl)phenylsulfonyl]-2chlorophenyl}-3,3,3-trifluoro-2-hydroxy-2-methylpropanamide), have been evaluated for their PDHK inhibition activity, and their crystal structures in complexation with PDHK isoforms have also been determined [7–10]. It is found that the activities are dependent on the PDHK isoform. For example, AZD7545, on which the present work is based, is 5–15 times more active against PDHK2 than against PDHK1 and may not inhibit PDHK4 at all – rather in larger concentrations it stimulates it [10]. Docking studies on some analogs have also been performed [11–13]. Data on the inhibitory activities of various derivatives belonging to this class are available [4] in the literature, enabling us to attempt an investigation on the quantitative structure activity relationships
http://dx.doi.org/10.1016/j.jocs.2014.04.006 1877-7503/© 2014 Elsevier B.V. All rights reserved.
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model JOCS-276; No. of Pages 10
ARTICLE IN PRESS R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
2
(QSAR) that govern their biological activities. QSAR studies performed till date on PDHK inhibitors have been very limited but include comparative molecular field analysis (CoMFA) [14]. In the present work, we have first carried out pharmacaphore modeling of the active site of PDHK and 3D-QSAR of the inhibitors. These two are robust tools for finding new potential drugs. They are widely used in lead optimization and provide valuable insights about ligand–protein interactions [15,16]. Once data about the shape and features of the potent inhibitor is known by such a study, 3D databases can be searched for existing molecules as new leads, or entirely new molecules can be designed. This reduces the input cost and time for finding new lead molecules which have not been studied yet for the given target. Since sufficient data on the inhibitory activities of 100 anilides is already known, we have also carried out classical QSAR studies on these analogs, generating satisfactory models using the GFA, PCA and HCA approaches.
2. Computational details A total of hundred anilides of (R)-3,3,3-trifluoro-2-hydroxy2-methylpropionic acid with known IC50 inhibitory activities [4] formed our structural data set. Before using the activity data, univariate analysis of the data was performed. The original activity data in the form of IC50 values was found unsuitable due to lack of a normal distribution. It was therefore transformed to log(1/IC50 ) (i.e. pIC50 ). Pharmacophore modeling and 3D-QSAR were performed using the PHASE (Pharmacophore Alignment and Scoring Engine) module developed by Schrödinger, Inc. Pharmacophore models prove useful since they provide for alignment of molecules [17], which is a critical requirement for developing a good QSAR model. The pIC50 values of the ligands range from 3.66 to 8.15 (Tables S1a–d, Supplementary Information (SI)). In order to divide the ligands into actives and inactives, the following criteria were used. Ligands with pIC50 values above 7.01 were considered active, those with pIC50 values below 4.77 were considered inactive, and the rest were taken as moderately active. On applying these thresholds, we obtained 27 active molecules and 19 inactive molecules. The ligands were prepared using the LigPrep module of Schrodinger. The structures were refined and ionized at neutral pH. Because of the presence of several rotatable bonds in the molecules, the conformers for each molecule were generated by running a thorough mixed MCMM/LMOD search (Mixed Monte Carlo Multiple Minimum/Low Mode search) applying the OPLS-2005 force-field with distance-dependent dielectric solvent treatment. Conformers with maximum energy difference of 10 kcal mol−1 relative to the global energy minimum conformer were retained. Pharmacophores from all conformations of the ligands in the active set were examined. During this exercise, the original chirality (R) was retained, as it has been shown in amides of primary amines, secondary amines, and anilines that the activity of (R)3,3,3-trifluoro-2-hydroxy-2-methylpropionic derivatives is several times greater than that of the (S)-enantiomers [6]. Common pharmacophoric features (CPHs) were then identified from a set of variants – a set of feature types that define a possible pharmacophore. The final size of the pharmacophore box was kept at 1 A˚ to optimize the number of final CPHs. The three-featured hypothesis was rejected because it seems insufficient to completely describe the large binding space of the ligands under consideration. Only the four-variant list yielded reasonable CPHs. These CPHs were examined using a scoring function to yield the best alignment of the active ligands [18]. The scoring procedure provided a ranking of the different hypotheses, allowing rational choices about hypotheses that are most appropriate for further investigation. Those hypotheses that survived the scoring procedure were subsequently scored with respect to inactives using a weight of
1.0. The inactive molecules were scored in order to observe their alignment with respect to the pharmacophore hypothesis, to aid decision on the selection of the hypothesis, since larger the difference between the scores of actives and inactives, better is the hypothesis at distinguishing them from each other [18]. PHASE QSAR models are based on Partial Least Square (PLS) regression. The hydrogen bond acceptor, hydrogen bond donor and hydrophobic features were used to build a QSAR model derived from a regular grid of cubic volume elements that span the space occupied by the training set of ligands. The pharmacophore features that were present in the 3D cubic grid were scored for all compounds. To avoid over-fitting, a maximum of seven PLS factors were taken [18,19]. We performed atom-based QSAR analysis because it is more useful and takes into account the entire molecular structure [18,20,21]. The training (70%) and test set (30%) molecules were randomly selected. In both the training and test sets, active, inactive as well as moderately active ligands were incorporated. The evidence for a QSAR model’s appropriateness lies in its ability to predict for the test set. There should be a relationship between the internal (training set) and external (test set) predictivity [22]. Inconsistency between the internal and external predictivity has been observed in past QSAR studies. Good models should have internal and external predictivity in the favorable range of R2 = 0.6–1.0 [23–25]. The predictive ability of hypotheses that scored well with respect to actives and could also distinguish well from the inactives was evaluated by correlating the observed and estimated activities of the training and test set molecules. The hypotheses were also evaluated for their predictive ability on an external test set containing 188 ligands. In order to reduce the number of descriptors, the GFA algorithm was used to efficiently search the wide solution space to identify the most suitable descriptors for building robust QSAR models [26,27]. Principal Component Analysis (PCA) was used to extract principal components from the data matrix into a reducible few PCs. PCA is a technique of extracting some condensed principal components from the variables in the data [28]. Hierarchical cluster analysis (HCA) was used to classify the molecules into groups based on a similarity based index. 3. Results and discussion 3.1. Pharmacophore modeling Compound 1 (Scheme 1) contains the CF3 , OH, C O, and NH groups, and a phenyl ring attached to the nitrogen. Its analogs were built [4] by keeping the core up to the phenyl ring intact in order to retain the activity, and the structures were modified by introducing a variety of R groups into the phenyl ring (see SI: Fig. S1), resulting in the modification of charges on the carbons of the phenyl ring. These hundred anilides of (R)-3,3,3-trifluoro-2hydroxy-2-methylpropionic acid [4] were taken for the generation of common pharmacophore hypotheses (CPHs). This resulted in F
F
F
H N
C1
C2
HO O
C3
C5 C4
Scheme 1.
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model
ARTICLE IN PRESS
JOCS-276; No. of Pages 10
R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
3
Table 1 Atom-based QSAR statistics for four-featured hypotheses. Hypothesis* AADH.96 AADH.54 ADHR.309 DDHR.115 AADH.55 *
SD 0.1577 0.1637 0.2539 0.2611 0.1541
R2 0.9858 0.9847 0.9810 0.9799 0.9865
F 546.9 507.1 457.4 432.1 573.0
p −48
1.9 × 10 1.5 × 10−47 7.7 × 10−51 4.3 × 10−50 5.4 × 10−49
RMSE
Q2
Pearson-R
0.6221 0.6685 1.1868 1.2975 0.7455
0.7501 0.7114 0.4887 0.3888 0.6411
0.8758 0.8598 0.7523 0.6725 0.8492
A: Acceptor, D: Donor, H: Hydrophobic, R: Ring aromatic.
Fig. 1. Correlation plots for the predicted versus experimental pIC50 values for the AADH.96 model applied to the (a) training and (b) test set ligands.
782 three featured, 1906 four featured and 771 five featured probable CPHs. Among these, only the four featured hypotheses could survive the scoring process. The survival and “survival-inactive” scores of the best scoring hypotheses are given in Table S2. A good hypothesis is one which has high active features and low inactive features. For building the QSAR model, we chose five four-featured CPHs, on the basis of their high survival as well as “survival-inactive” scores (Table S2). The atom-based QSAR statistics are given in Table 1. A good QSAR model is one which has large R2 value (obtained for the training set molecules) and cross-validated Q2 value (obtained for the test set molecules). Also, a large value of F and small value of p indicates high degree of confidence [29]. ADHR.309 and DDHR.115 were rejected on the basis that, apart from the low Q2 values, both these hypotheses have low values of the Pearson-R coefficient. Amongst the remaining three CPHs, AADH.96 has good statistics with high R2 and Q2 values. Apart from this, it has the highest Pearson-R value. Its Root Mean Square Error (RMSE) is also the lowest. Apart from AADH.96, AADH.54 and AADH.55 also have good statistics. In order to confirm whether our proposed model (AADH.96) is the best QSAR model, all these hypotheses were tested on an external library of 188 known PDHK inhibitors. These were imported from the Binding DB database [30,31], along with their IC50 values, and their conformations were generated using the OPLS-2005 force-field. It may be stated here that we could not obtain exclusive data for inhibitors of a particular PDHK isoform. One set of data contained inhibitors of all isoforms, and the other contained data of PDHK1 and PDHK2 isoforms. Experimental evidence [10] suggests a large deviation in the activity of a particular inhibitor toward different PDHK isoforms. For example, analogs of AZD7545, on which the present work is based, are 5–15 more potent toward PDHK2 than toward PDHK1, and do not inhibit PDHK4 at all. Rather, at high concentrations, they even stimulate PDHK4. Moreover, the source of the experimental data [4] does not mention the PDHK isoform (presumably PDHK2) on which the data is based. Nevertheless, for the purpose of comparison of models, we proceeded to compare the experimental activities with the predicted ones using each of the models, and 163 ligands were obtained as hits for AADH.96, 141 for AADH.54 and 140
for AADH.55. The plot of the predicted activity versus the experimental activity of all these molecules for the four-featured QSAR model, AADH.96, is linear, with a calculated value of the square of the regression coefficient, R2 = 0.515 (Fig. S2). The internal data set (Table 1) and external data set predictions, therefore, unanimously point toward AADH.96 as the best QSAR model for PDHK inhibitors. The correlation plots of the predicted versus experimental activity of the training and test set molecules of the best QSAR model (AADH.96) have good statistical parameters (Fig. 1). The features represented in the best pharmacophore model AADH.96 are: two hydrogen bond acceptors, one hydrogen bond donor and one hydrophobic group. The hydrogen bond acceptors are the carbonyl and hydroxyl oxygens (Scheme 1). The hydrogen bond donor is the NH group. A pictorial representation of the pharmacophore model is presented in Fig. 2 and the angles between the various features are given in Table S3. On superimposing the actives among the studied molecules on our proposed model, a good overlap was obtained (Fig. 3).
Fig. 2. Model AADH.96 representing features and distances and angles between them. Acceptors “A3” and “A4” are represented as light red spheres with their lone pair vectors, donor “D6” is represented by a light blue sphere centered on the H-atom, with an arrow pointing in the direction of a potential H-bond, and the hydrophobic feature “H8” is represented as a green sphere.
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model JOCS-276; No. of Pages 10 4
ARTICLE IN PRESS R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
Fig. 3. Alignment of active ligands on AADH.96.
Fig. 4. Top: Alignment of the ligand with the best fitness score, 10b (left) and least fitness score, 4d (right) on the pharmacophore model AADH.96. Bottom: Respective 3D-QSAR models. Blue color indicates favorable regions, whereas red color indicates unfavorable regions. (For interpretation of the references to color in this text, the reader is referred to the web version of the article.)
The fitness scores and the experimental as well as predicted activities of various ligands in the test and training sets are given in Table S4. Ligand 10b has the highest possible fitness score of 3.00 and ligands 4d and 4h have the least fitness scores of 0.94. The most potent ligand (10e), with experimental pIC50 value of 8.16, has the next highest fitness score of 2.96. The alignment of the best scoring ligand (10b) and the least scoring ligand (4d) on the AADH.96 model is represented in Fig. 4 (top). Another picture with a white background and mesh-like features is provided in the Supplementary Information (Fig. S3). The combined effect of the hydrogen-bond donor, hydrophobic and electron-withdrawing groups on the activity of the best fitted ligand 10b (left) and the worst fit ligand 4d (right) is depicted in
the lower images. The presence of these groups in the blue colored regions enhances the activity of the ligand, whereas their presence in the red colored regions reduces their activity. We can see that most of the features of the best ligand (10b) lie in the blue colored region and the worst fit ligand (4d) has more features in the unfavorable red region. The hydrogen bond donor regions for the most active and least active ligand are depicted in Fig. 5 (top). The pale red regions are those where the presence of a hydrogen bond donor will increase the activity of the ligand and yellow regions are those where the presence of a hydrogen bond donor group is unfavorable for the activity of ligand against PDHK2. The NH group of the inhibitor molecule lies in the pale red region, indicating the importance of
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model JOCS-276; No. of Pages 10
ARTICLE IN PRESS R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
5
Fig. 5. Top: 3D-QSAR model based on 10b (left) and 4b (right) depicting the hydrogen bond donor feature. The pale red regions are those where the presence of hydrogen bond donors will increase the activity of the ligand and yellow regions are those where presence of hydrogen bond donor is unfavorable for the activity of ligand against PDHK. Middle: 3D-QSAR models, depicting the hydrophobic feature. The green regions are those where the presence of hydrophobic groups will increase the activity of the ligand and purple regions are those where their presence is unfavorable for the activity of the ligand against PDHK. Bottom: Respective 3D-QSAR models, depicting the electron withdrawing feature. The orange regions are those where the presence of electron withdrawing groups will increase the activity of the ligand and cyan regions are those where the presence of electron withdrawing groups is unfavorable for the activity of ligand against PDHK. (For interpretation of the references to color in this text, the reader is referred to the web version of the article.)
this group for the activity against PDHK2. The hydroxyl hydrogen also lies in this region. Similarly, Fig. 5 (middle) shows the favorable (green) and unfavorable (purple) regions for the hydrophobic group. The CF3 group of the ligand lies in the negative hydrophobic region. The carbonyl and hydroxyl oxygens, groups common to all the inhibitors studied, lie in the positive electron withdrawing (orange) region (Fig. 5 (bottom)). The presence of these groups is thus favorable for the activity of the ligands against PDHK2. An electron withdrawing group attached to the benzene ring at the para position to the amide group also enhances the activity. Our AADH.96 model thus confirms that, apart from the presence of the carbonyl and hydroxyl oxygens as hydrogen bond acceptors and the NH group as a hydrogen bond donor, common to all the ligands, an essential criterion for the exhibition of PDHK inhibitory activity by this class of compounds is a hydrophobic group at the C1 position. The CF3 group is, however, not essential. In fact, it is present in the hydrophobically unfavorable region. Moreover, Compounds 11a–d, in which this group is replaced by methyl, are moderately active compounds. Electron-withdrawing groups at the para position to the amide group also enhance the activity.
Table S4 indicates that most of the compounds exhibit high fitness to the pharmacophore model, but Compounds 1–4, except some of the compounds in the 2 series, show smaller values of the fitness parameter. These are also among the least active compounds. The residuals of the predicted pIC50 values using PLS with respect to the experimental values are small in all cases, barring three exceptions, 10g, 10q and 11d, all of which belong to the test set, but otherwise exhibit high fitness (≥2.5) to the pharmacophore model. This prompted us to examine the data more minutely in order to bring out further relationships accounting for the variation. 3.2. QSAR model development and validation For the 2D QSAR, more than 100 descriptors, chosen to capture important topological, geometric, electronic and structural features, were calculated for each of the compounds in order to clarify the effect of electronic, hydrophobic, and steric properties on their inhibitory activity. These include thermodynamic, topological and quantum mechanical parameters, such as the partial charges on the ring carbon atoms. As stated previously, the basic skeleton of
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model JOCS-276; No. of Pages 10
ARTICLE IN PRESS R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
6
Fig. 6. Loadings plot for PC1 and PC2, showing the descriptor pattern.
all the compounds is identical and only the substituents on the ring are varied in the various structures, which implies that the partial charges on the carbon atoms are likely to be important indicators of activity. For this reason, we calculated the electrostatic potential (ESP) fitted charges [32] using the AM1 Hamiltonian [33] instead of the usual Hückel charges in QSAR studies. The Mulliken charges were also avoided because of their inability to assign unambiguous charges. After calculating the descriptors for the compounds, GFA was performed. A distinctive feature of GFA is that, instead of generating a single model, as do most other statistical methods, it produces a population of models. The range of variation in this population gives added information on the quality of fit and importance of descriptors. For example, the frequency of use of a particular descriptor in the population of equations may indicate how relevant the descriptor is to the prediction of activity. Prior to performing GFA, the Pearson correlation matrix was examined. Though many of the variables are inter-correlated, the observed pIC50 is found to be strongly correlated with only the number of hydrogen bond acceptors. In fact, simple regression yields an equation pIC50 = 1.706 + 0.566X1 ,
(1)
where X1 stands for the number of hydrogen bond acceptors. The coefficient is highly significant (t-value = 10.93, p-value < 0.001). The R2 and Adj.R2 values are 0.5785 and 0.5737, respectively, showing that the activity is strongly dependent on the number of hydrogen bond acceptor groups. Other statistical parameters are also satisfactory: Friedman LOF [34] = 0.7265, Q2 = 0.5592, and the Fischer F value is also a high 119.4. Application of GFA yields the second equation pIC50 = 1.878 + 0.450X1 − 2.28X2 ,
(2)
and the coefficient for X1 has a t-value of 8.03 and that for X2 has a t-value of −3.98, and both the p values are highly significant (<0.001). The R2 and Adj.R2 values improve significantly to 0.6440 and 0.6357, respectively, and the cross-validated Q2 value is also high, 0.6204, along with the F value (77.8). In this equation, X2 stands for the ESP charge on C1 (q1 ), which in turn is highly correlated with the ESP charges on C2 (q2 ) (R = −0.8835) and C5 (q5 ) (R = 0.8037). This implies that compounds with higher negative charge on C1 and C5 (ortho positions), and higher positive charge on C2 (meta position) are better inhibitors. Among the various descriptors examined, a significant one is WPSA, standing for the weakly polar component of solvent accessible surface area. It is correlated with the pIC50 values with a correlation coefficient of 0.6069 and with the ESP charge on C1 with a correlation coefficient of −0.7593. This implies that an increase in the negative charge on C1 has the effect of increasing WPSA, thus increasing pIC50 . Since C1 is more highly correlated with pIC50 (−0.6720) than WPSA, we decided to retain the former in the regression analysis. No further improvement in either the R2 (0.7191) or adj.R2 (0.6654) or Q2 (0.6370) is achieved by adding a third factor (Structural Information Content (SIC)) in the equation. Moreover, the coefficient for SIC is less significant (p = 0.022). Therefore, we may conclude that the number of hydrogen bond acceptors and substituents that increase the negative charge at the ortho positions and the positive charge at the meta position, thereby increasing the weakly polar solvent accessible area, improve the inhibitory activity. Although we obtained a fairly good QSAR model, there is scope for improvement. A few large residual values (e.g. −1.35 for 10b and −1.34 for 10e) question the prediction capability of the model (Table S5). Moreover, the residual values are larger for the highscoring ligands, showing a variation with pIC50 , and that the model underestimates the PDHK2 inhibition of the best inhibitors, but overestimates that of the low-scoring ones. The residuals remain
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model JOCS-276; No. of Pages 10
ARTICLE IN PRESS R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
7
Fig. 7. (a) PCA score plot for the derivatives. (b) Biplot, showing the overlap of the ligands and variables.
high even when subsequent GFA equations containing more variables are taken into account. Clearly, some parameters are missing from the model. We performed Principal Component Analysis (PCA) in order to understand the missing parameters in the GFA equations, after removing highly correlated variables (|R| ≥ 0.7). Table S7 lists the computed molecular descriptors for the ligands. The linear combinations of variables in each PC, called Loadings, describe the relationship between the PCs and the variables [35]. The contributions of molecular descriptors to the first two PCs are shown in the loadings plot (Fig. 6). The first principal component, which accounts for 34% variation, contains high loadings from hydrogen bond acceptors and donors, and number of rotatable bonds. An opposite contribution is found from the ESP charge on C1 (as also found in the GFA equations), and C3 . Therefore, the contribution of the structural and quantum mechanical parameters is divided by PC1. The second component brings out the importance of the hydrophobic component, as it almost wholly depends on the partition coefficient, since SIC is also negatively correlated with it (−0.54). This component accounts for 16% of the variation, and between the two, the first two components account for almost half the variation. The PC score (the perpendicular projection of each sample to the PC) is considered a better criterion than the loadings [36]. The plot of the score vectors yields a score plot, an important feature of PCA, which permits an easy visualization of the PCA results, and helps in data analysis, as it is used to identify groups of similar objects or find patterns to explain chemical behavior in the samples. Fig. 7 shows the PC1 versus PC2 score plot, where the compounds are seen to be clustered in the region of negative values of PC1. These findings corroborate those of earlier studies, where the compounds are combined in a similar fashion according to their potency. Though almost all the ligands contribute to PC1, the contribution is in the negative region for 1–8, after which it becomes positive. Since these are also among the least active ligands, one may conclude that their activity is mostly governed by the quantum mechanical charges on the ring carbon atoms, and the remaining most active compounds owe their activities to the larger number of hydrogen bond acceptors and donors, as well as rotatable bonds. The classification is of particular interest since PCA does not explicitly consider the activity of the compounds, but is still able to divide the molecules in a correct ranking order of biological activity around the series of compounds.
Thus, the GFA equations failed to take into account the contributions of hydrogen bond donors and rotatable bonds. Another feature that was neglected in the GFA equations, but contributes most to the second component, is the octanol–water partition coefficient, bringing hydrophobicity to the fore. Overall, the conclusion from the PC analysis is that substituents containing a large number of electronegative atoms and hydrogen bond donor–acceptor groups, particularly the latter, with large alkyl chains, are conducive to inhibition. The alkyl groups contribute toward hydrophobicity and rotatable bonds, and, being weakly electron donating, also increase the negative charge at the ortho and para carbons. This difference in activities and chemical behavior of the first two groups of compounds, as classified by Bebernitz et al. [4], seems to be consistent with the hypothesis that different subgroups may have different deciding factors for their activity or inactivity. All the compounds are structurally different and the R groups had been attached in a random fashion. Therefore, some kind of grouping is required to explain the behavior of the compounds in order to streamline the information gathered from the QSAR models. As the data were originally divided into four subgroups [4], we decided to perform QSAR on these subgroups before applying any other criterion for classification. However, this did not lead to any improvement in the results, save the additional conclusion that activities of Group II compounds are dependent on the SIC factor, those of Group III on the number of hydrogen bond donors, charge on C4 and SIC, and for the last group, the activities depend upon AlogP and the charges on C3 and C4 . For compounds of Group I, no significant regression could be obtained. 3.3. Model construction and evaluation for compounds clustered into subsets As stated above, subgrouping could not improve the results. Rather, the statistical parameters became worse than before. The most important point is the basis of division. These groups had been prepared by Bebernitz et al. [4] on the basis of screening. The authors first prepared some diverse compounds, tested them for activity and the best one among these was chosen for further random side chain alteration. Subsequently, the next level was as random as the initial one. This methodology provided both good and bad results in terms of activity for the prepared compounds. The authors simply grouped the compounds on the basis of the number and type of substituents introduced into the phenyl ring.
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model
ARTICLE IN PRESS
JOCS-276; No. of Pages 10
R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
8
Fig. 8. Dendrograms showing the hierarchical clustering of the hundred anilides.
Therefore, we felt a need to classify the compounds on some other concrete ground. This prompted us to perform Hierarchical Cluster Analysis (HCA) on our dataset in order to use a similarity index for their classification. HCA is a statistical method for finding relatively homogeneous sample clusters on the basis of measured characteristics (variables) [37]. Before performing HCA, the dataset was scaled using the mean/SD standardization technique. The dendrogram shown in Fig. 8 is a plot of the dissimilarity index (y-axis) against the compounds (x-axis), based on the square of the Euclidean distance. It can be seen that the dissimilarity index at 86 divides the compounds into three clusters of significant size. The ligands in each cluster are listed in Table S6, and Table 2 gives the centroids for the variables in each cluster. Cluster 1 is characterized by compounds with smaller number of rotatable bonds, hydrogen bond donors and acceptors, less hydrophobicity and SIC, and high value for the Balaban JX parameter. The charges on the ortho and para carbons are high and that on the meta carbons is negative, all the parameters that reduce activity. It is no wonder then that this cluster, which comprises 42 ligands from nearly all groups, except 10 and 11, has the value of −0.62 for the average standardized pIC50 value. This group of ligands was also found to be clustered on the negative PC1 axis (Fig. 7) and is the most homogeneous cluster, though it has ligands ranging from the least active to active, with pIC50 values in the range 3.66–7.15. The next cluster is small, comprising only 10 ligands. They have small number of hydrogen bond acceptors (Table 2), chiral centers, SIC values, and high dipole moments and JX values. They have, however, large values of negative charges at the ortho and para positions, a requirement for good inhibitors. In spite of their good quantum mechanical parameters, they lose out (average pIC50 = −0.39) because of their poor structural parameters. These ligands are mostly found to contribute to the third principal component, which is dependent on the charge at the meta position C4 .
These ligands show large variance, but no ligand from the high scoring 9 and 10 series finds its way in this cluster. The pIC50 values range from 4.52 to 7.01. The last is another large group of 37 ligands. Except 6e and 7e, only the high scoring 9 and 10 series of ligands are present in this cluster. As seen from Table 2, these ligands score high on number of rotatable bonds, hydrogen bond donors and acceptors, AlogP and SIC. They also have the required negative charges at the ortho and para positions, and positive charge at the meta position. All this leads to an above average pIC50 value of 0.81, and the pIC50 values of individual ligands range from 5.44 to 8.15. Their dipole moment and JX values are also below average, and these are ligands found to score in the second principal component, which involves hydrophobicity. To sum up the conclusions of this section, the critical requirements for the bioactive ligands are high number of rotatable bonds, hydrogen bond donors and acceptors, chiral centers, hydrophobicity, SIC and charge at the meta carbon atom, and low JX, charges at the ortho and para positions and dipole moments. Although the CF3 group does not appear explicitly in any factor, the appearance of the 11 series, which lacks this group, in the second cluster indicates that it does not have any role to play. The substitution of this electronegative grouping by the mildly electron donating methyl group increases the negative charge on the ortho and para positions. Though this is one of the requirements of a good inhibitor, the present results indicate that too high negative charges at the ortho and para positions are also not conducive to good inhibition. Moreover, the absence of this grouping also increases the dipole moment, as the effect of the electron withdrawing group at the para position to the amide group is not balanced by any electron withdrawing substituent and, as we have seen, low dipole moments and consequent high hydrophobicity is one of the requirements of good inhibitors. Moreover, the chiral center is also lost on substitution of CF3 by CH3 .
Table 2 Centroid values for each cluster. Cluster
Rotatable bonds
HB donor
HB acceptor
Chiral centers
Alog P
JX
SIC
q1
q3
q4
1 2 3
−0.666 −0.232 0.819
−0.630 −0.133 0.751
−0.664 −0.582 0.911
−0.047 −1.110 0.353
−0.060 −0.046 0.080
0.405 0.380 −0.562
−0.333 −0.239 0.442
0.687 −1.078 −0.488
0.703 −1.495 −0.394
−0.552 2.185 0.036
0.113 0.396 −0.235
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model JOCS-276; No. of Pages 10
ARTICLE IN PRESS R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
4. Conclusions 3-D QSAR on analogs of the AZD7545 class of PDHK inhibitors has revealed that the most prominent common pharmacophore features two hydrogen bond acceptors (C O, and C O), one hydrogen bond donor ( NH) and a hydrophobic area at the ortho position of the phenyl ring. Since the first three atom groups are common to all the ligands studied, we emphasize in this study that the hydrophobic area is crucial for the activity of the given compound at a specific orientation from the H bond donors and acceptors. Electron withdrawing groups para to the amide group are highly favorable. These reduce the dipole moment, being opposite to the electron withdrawing CF3 group, contributing to the hydrophobicity of the ligand, which is one of the factors responsible for high activity. 2D QSAR further revealed that a specific group or type of descriptor is not sufficient to capture the true factors responsible for the activity in the set of inhibitor compounds. Using PCA, we were able to separate the contributions into hydrogen bonding and quantum mechanical ones on the PC1 axis, and hydrophobic and structural ones along the PC2 axis. This study also revealed that HCA, along with PCA, forms a powerful tool to improve a QSAR model. This study used HCA to investigate whether a similarity based set generation method would lead to better understanding of the QSAR models. Though the activity was not a parameter for hierarchical clustering, this methodology accurately divided the ligands into inactive, mildly inactive and active ligands. Based on our study, the following suggestions are made for designing new improved inhibitors on the basis of QSAR models. Electron withdrawing groups containing hydrogen bond acceptors should be attached to the phenyl ring at the para position in AZD7545 analogs. Aliphatic chains containing large number of rotatable bonds, preferably attached to the amine group, as in 9–11, are also desirable. The CF3 group should also be retained for good inhibition. This study should be thus helpful in designing new inhibitors for the target. Acknowledgements Three of the authors (P.G., R.A. and D.G.) thank the Council of Scientific and Industrial Research (CSIR), New Delhi, for Research Fellowships. A research grant from the University of Delhi’s “Scheme to Strengthen R&D Doctoral Research Program by Providing Funds to University Faculty” is gratefully acknowledged. Appendix A. Supplementary data Supplementary material related to this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jocs.2014.04.006. References [1] M.S. Patel, L.G. Korotchkina, Regulation of mammalian pyruvate dehydrogenase complex by phosphorylation: complexity of multiple phosphorylation sites and kinases, Exp. Mol. Med. 33 (2001) 191–197. [2] P.J. Randle, Metabolic fuel selection: general integration at the whole-body level, Proc. Nutr. Soc. 54 (1995) 317–327. [3] J. Espinal, T. Leesnitzer, A. Hassman, M. Beggs, J. Cobb, Inhibition of pyruvate dehydrogenase kinase by halogenated acetophones, Drug Dev. Res. 35 (1995) 130–136. [4] G.R. Bebernitz, T.D. Aicher, J.L. Stanton, J. Gao, S.S. Shetty, D.C. Knorr, R.J. Strohschein, J. Tan, L.J. Brand, C. Liu, W.H. Wang, C.C. Vinluan, E.L. Kaplan, C.J. Dragland, D. DelGrande, A. Islam, R.J. Lozito, X. Liu, W.M. Maniara, W.R. Mann, Anilides of (R)-trifluoro-2-hydroxy-2-methylpropionic acid as inhibitors of pyruvate dehydrogenase kinase, J. Med. Chem. 43 (2000) 2248–2257.
9
[5] T.D. Aicher, R.C. Anderson, G.R. Bebernitz, G.M. Coppola, C.F. Jewell, D.C. Knorr, C. Liu, D.M. Sperbeck, L.J. Brand, R.J. Strohschein, J. Gao, C.C. Vinluan, S.S. Shetty, C. Dragland, E.L. Kaplan, D. DelGrande, A. Islam, X. Liu, R.J. Lozito, W.M. Maniara, R.E. Walter, W.R. Mann, (R)-3,3,3-trifluoro-2-hydroxy2-methylpropionamides are orally active inhibitors of pyruvate dehydrogenase kinase, J. Med. Chem. 42 (1999) 2741–2746. [6] T.D. Aicher, R.C. Anderson, J. Gao, S.S. Shetty, G.M. Coppola, J.L. Stanton, D.C. Knorr, D.M. Sperbeck, L.J. Brand, C.C. Vinluan, E.L. Kaplan, C.J. Dragland, H.C. Tomaselli, A. Islam, R.J. Lozito, X. Liu, W.M. Maniara, W.S. Fillers, D. DelGrande, R.E. Walter, W.R. Mann, Secondary amides of (R)-3,3,3-trifluoro-2-hydroxy-2methylpropionic acid as inhibitors of pyruvate dehydrogenase kinase, J. Med. Chem. 43 (2000) 236–249. [7] M. Kato, J.L. Chuang, S.C. Tso, R.M. Wynn, D.T. Chuang, Crystal structure of pyruvate dehydrogenase kinase 3 bound to lipoyl domain 2 of human pyruvate dehydrogenase complex, EMBO J. 24 (2005) 1763–1774; M. Kato, J. Li, J.L. Chuang, D.T. Chuang, Distinct structural mechanisms for inhibition of pyruvate dehydrogenase kinase isoforms by AZD7545, dichloroacetate, and radicicol, Structure 15 (2007) 992–1004. [8] T.R. Knoechel, A.D. Tucker, C.M. Robinson, C. Phillips, W. Taylor, P.J. Bungay, S.A. Kasten, T.E. Roche, D.G. Brown, Regulatory roles of the N-terminal domain based on crystal structures of human pyruvate dehydrogenase kinase 2 containing physiological and synthetic ligands, Biochemistry 45 (2006) 402–415. [9] C.N. Steussy, K.M. Popov, M.M. Bowker-Kinley, R.B. Sloan Jr., R.A. Harris, J.A. Hamilton, Structure of pyruvate dehydrogenase kinase. Novel folding pattern for a serine protein kinase, J. Biol. Chem. 276 (2001) 37443–37450. [10] R.M. Mayers, B. Leighton, E. Kilgour, PDH kinase inhibitors: a novel therapy for Type II diabetes? Biochem. Soc. Trans. 33 (2005) 367–370. [11] P. Gahlot, R. Kakkar, Docking modes of Pfz3 and its analogues into the lipoamide binding site on PDHK2, Int. Res. J. Pharm. 1 (2011) 1–8. [12] R. Kakkar, Structure-based design of PDHK2 inhibitors from docking studies, Int. Res. J. Pharm. 1 (2011) 51–59. [13] R. Kakkar, N. Azad, P. Gahlot, AZ12 based design of PDK2 inhibitors, Int. Rev. Biophys. Chem. (IREBIC) 3 (2012) 163–167. [14] T.L. Aboye, M.E. Sobhia, P.V. Bharatam, 3D-QSAR studies of pyruvate dehydrogenase kinase inhibitors based on a divide and conquer strategy, Bioorg. Med. Chem. 12 (2004) 2709–2715. [15] M.N. Drwal, R. Griffith, Combination of ligand- and structure-based methods in virtual screening, Drug Discov. Today: Technol. (2013), http://dx.doi.org/10.1016/j.ddtec.2013.02.002. [16] U.A. Shah, H.S. Deokar, S.S. Kadam, V.M. Kulkarni, Pharmacophore generation and atom-based 3D-QSAR of novel 2-(4-methylsulfonylphenyl)pyrimidines as COX-2 inhibitors, Mol. Divers. 14 (2010) 559–568. [17] W. Tong, R. Perkins, R. Strelitz, E.R. Collantes, S. Keenan, W.J. Welsh, W.S. Branham, D.M. Sheehan, Quantitative structure-activity relationships (QSARs) for estrogen binding to estrogen receptor: predictions across species, Environ. Health Pers. 105 (1997) 1116–1124. [18] S.L. Dixon, A.M. Smondyrev, E.H. Knoll, S.N. Rao, D.E. Shaw, R.A. Freisner, PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening. 1. Methodology and preliminary results, J. Comput. Aided Mol. Des. 20 (2006) 647–671. [19] S.L. Dixon, A.M. Smondyrev, E.H. Knoll, S.N. Rao, PHASE: a novel approach to pharmacophore modeling and 3D database searching, Chem. Biol. Drug Des. 67 (2006) 370–372. [20] V. Lather, R. Kristam, J.S. Saini, R. Kristam, N.A. Karthikeyan, V.N. Balaji, QSAR models for prediction of glycogen synthase kinase-3 inhibitory activity derivatives, QSAR Comb. Sci. 27 (2008) 718–728. [21] U.A. Shah, H.S. Deokar, S.S. Kadam, V.M. Kulkarni, 3D-QSAR and pharmacophore identification of novel imidazolyl benzimidazoles and imidazo[4,5-b]pyridines as potent p38␣ mitogen activated protein kinase inhibitors, Int. J. ChemTech Res. 2 (2010) 194–204. [22] H. Kubinyi, Validation and predictivity of QSAR models, QSAR and molecular modeling in rational design of bioactive molecules, in: E. Aki Sener, I. Yalcin (Eds.), Proceedings of the 15th European Symposium on QSAR & Molecular Modeling, CADDD Society, Istanbul, Ankara, Turkey, 2006, pp. 30–33. [23] H. Kubinyi, A general view on similarity and QSAR studies, in: H. van de Waterbeemd, B. Testa, G. Folkers (Eds.), Computer-Assisted Lead Finding and Optimization, Proceedings of the 11th European Symposium on QSAR, Lausanne, VHChA and VCH, Basel, Weinheim, 1996, pp. 9–28. [24] U. Norinder, Single and domain variable selection in 3D QSAR applications, J. Chemometrics 10 (1996) 95–105. [25] E. Novellino, C. Fattorusso, G. Greco, Use of comparative molecular field analysis and cluster analysis in series design, Pharm. Acta Helvetiae 70 (1995) 149–154. [26] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, New York, 1989. [27] J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975. [28] S. Wold, K. Esbensen, P. Geladi, Principal Component Analysis, Chemom. Intell. Lab. Syst. 2 (1987) 37–52. [29] M.K. Teli, K. RG, Pharmacophore generation and atom-based 3D-QSAR of N-isopropyl pyrrole-based derivatives as HMG-CoA reductase inhibitors, Org. Med. Chem. Lett. 2 (2012) 1–10. [30] X. Chen, Y. Lin, M. Liu, M.K. Gilson, The binding database: data management and interface design, Bioinformatics 18 (2002) 130–139.
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006
G Model JOCS-276; No. of Pages 10 10
ARTICLE IN PRESS R. Kakkar et al. / Journal of Computational Science xxx (2014) xxx–xxx
[31] T. Liu, Y. Lin, X. Wen, R.N. Jorrisen, M.K. Gilson, Binding DB: a web-accessible database of experimentally determined protein–ligand binding affinities, Nucl. Acids Res. 35 (2007) D198–D201. [32] U.C. Singh, P.A. Kollman, An approach to computing electrostatic charges for molecules, J. Comp. Chem. 5 (1984) 129–145. [33] M.J.S. Dewar, E.G. Zoebisch, E.F. Healy, J.J.P. Stewart, Development and use of quantum mechanical molecular models. 76. AM1: a new general purpose quantum mechanical molecular model, J. Am. Chem. Soc. 107 (1985) 3902–3909, http://dx.doi.org/10.1021/ja00299a024. [34] J.H. Friedman, Multivariate adaptive regression splines, Ann. Stat. 19 (1991) 1–67. [35] K. Esbensen, S. Schonkopf, T. Midtgaard, Multivariate Analysis in Practice, CAMO AS, Trondheim, Norway, 1994. [36] P. Geladi, B.R. Kowalksi, Partial least-squares regression: a tutorial, Anal. Chim. Acta 185 (1986) 1–17. [37] S. Sharma, Applied Multivariate Techniques, John Wiley & Sons Inc., New York, 1996.
Rita Kakkar holds a PhD degree in chemistry from the University of Delhi, which she earned in 1982 for her research on the theoretical study of some reaction paths. She has been teaching Physical Chemistry in the University of Delhi for over three decades. Her major field of research is Computational Chemistry. She has over 95 research publications in reputed international journals in this field. Her research interests include investigation of mechanisms of chemical reactions, including tautomerism, enzyme reactions, and reactions on nanosurfaces. Prof. Rita Kakkar has successfully supervised the work of 35 PhD and 7 MPhil students. She has delivered several invited talks at scientific conferences. She acted as an International Advisory Member, First-Eighth Workshops on Computational Chemistry and its Applications, part of the International Conference on Computational Science, ICCS: 2006–2013. She regularly reviews manuscripts for many international journals, including those published by the American Chemical Society, Royal Society of Chemistry, Wiley and Elsevier.
Richa Arora (born 10 July, 1988) is presently pursuing PhD in the Department of Chemistry, University of Delhi, after clearing the CSIR National Eligibility Test. Her research interests include computational modeling, and she has presented her work related to the computational study of benzohydroxamic acid at various conferences.
Pragya Gahlot (born at New Delhi, India, on January 31, 1979) holds a PhD degree in chemistry from the University of Delhi, which she earned in 2009 for her research on protein ligand interactions study. She has been teaching Physical Chemistry at Sri Venkateswara College (University of Delhi) for the past five years. Her major field of research is Computational Chemistry. She has 7 research publications in reputed international journals in this field. Her research interests include docking, drug designing and QSAR.
Dr. Deepti Gupta (born in Faridabad, India, on 8th March, 1983) finished her Ph.D. (2010) under the supervision of Prof. Rita Kakkar, Dept. of Chemistry, University of Delhi. Her thesis is based on the DFT study of structural properties of maleic and fumaric acids, their isomerization and their adsorption and degradation on TiO2 . She is currently working as an Assistant Professor in Deshbandhu College, University of Delhi. She is also involved in developing e-content for undergraduate courses for Institute of Lifelong Learning (ILLL), Delhi University. Her research interests include QSAR and docking studies, and study of reaction mechanisms. In academic achievements, she has qualified the National Eligibility Test (CSIR/NET/JRF, 2007) and was selected for INDO-US RISE Internship (2009).
Please cite this article in press as: R. Kakkar, et al., An insight into pyruvate dehydrogenase kinase (PDHK) inhibition through pharmacophore modeling and QSAR studies, J. Comput. Sci. (2014), http://dx.doi.org/10.1016/j.jocs.2014.04.006