European Journal of Pharmaceutical Sciences 20 (2003) 63–71 www.elsevier.com / locate / ejps
Quantitative structure–activity relationships (QSAR): studies of inhibitors of tyrosine kinase
Qi Shen a, Qing-Zhang Lü a,b, Jian-Hui Jiang a, Guo-Li Shen a, Ru-Qin Yu a,*
a State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, PR China
b College of Chemistry and Environmental Science, Henan Normal University, Xinxiang 453002, PR China
Received 31 January 2003; received in revised form 10 June 2003; accepted 16 June 2003
Abstract
A quantitative structure–activity relationship (QSAR) study of 1-phenylbenzimidazoles as inhibitors of the platelet-derived growth factor receptor (PDGFR) was performed. New electronic parameters Q_o, Q_m and Q_p are proposed for characterizing the effect of substituents. Many other descriptors are also used, and variables are selected by an evolution algorithm (EA) that uses the modified Cp statistic proposed by the present authors as its objective function. The descriptor Q_m is shown to be an important variable for expressing the effect of substituents. The variable selection shows that spatial descriptors are the most important variables, revealing important properties of the inhibitors. Electron-releasing substituents at the 5-position and the absence of bulky groups at the 4- and 7-positions of the parent structure enhance inhibitory activity. Principal component analysis is performed to classify this series of compounds. © 2003 Elsevier B.V. All rights reserved.
Keywords: Platelet-derived growth factor receptor; Electronic parameters; Modified Cp statistic; QSAR; 1-Phenylbenzimidazoles
*Corresponding author. Tel.: +86-731-882-1577; fax: +86-731-8822577. E-mail address: [email protected] (R.Q. Yu).
0928-0987/03/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S0928-0987(03)00170-2

1. Introduction
Traditionally, anticancer drugs have been targeted at inhibiting DNA synthesis and function during mitosis. However, these drugs appear to be limited both in the degree of cell killing that they can induce and in their selectivity between tumor and normal cells, especially in organs that require rapid cellular proliferation for full potency. Abnormal activity of tyrosine kinases has been implicated in many cancers and in a large number of inflammatory responses (Kurup et al., 2001). Tyrosine kinases are important mediators of the cellular signal transduction through which growth factors and oncogenes affect cell proliferation, and their inhibitors represent a new kind of effective anticancer drug. The development of tyrosine kinase inhibitors has therefore become an active area of pharmaceutical science. The platelet-derived growth factor receptor (PDGFR), which plays a vital role as a regulator of cell growth, is one of the most intensely studied tyrosine kinase targets. Inhibitors of the PDGFR not only can prevent restenosis following vascular interventions but are also relevant to the development of tumor angiogenesis. Inhibitors of PDGFR are therefore of interest as potential anticancer drugs. Many tumors, particularly gliomas and sarcomas, undergo autocrine PDGFR activation that can be inhibited by PDGF antisera (Palmer et al., 1998, 1999). A large number of different classes of compounds (Maguire et al., 1994; Dolle et al., 1994) have been reported as selective inhibitors of PDGFR activity. 1-Phenylbenzimidazoles (Palmer et al., 1998, 1999) have been shown to be a new class of adenosine triphosphate (ATP) site inhibitors of PDGFR. A number of structure–activity studies involving 1-phenylbenzimidazoles as inhibitors of PDGFR have been published (Oblak et al., 2000; Zhu et al., 2001; Pierre et al., 2000; Naumann and Matter, 2002). There is, however, a lack of well-defined quantitative structure–activity relationships (QSARs) for this system. A QSAR study on a limited set of 22 1-phenylbenzimidazoles (Kurup et al., 2001) has been performed, but quite a few compounds could not be included in the regression because values of the electronic parameter σ_I were lacking. In this study, we propose electronic charge parameters to replace the electronic parameter σ_I for characterizing the effect of substituents. Many other descriptors are also used
in the QSAR study of PDGFR inhibitors, and variables are selected by an evolution algorithm (EA) that uses the modified Cp statistic (Shen et al., 2003) proposed by the present authors as the objective function. To rationalize the structure–activity relationship in greater depth, a classification based on principal component analysis (PCA) was obtained for comparison, and the main factors affecting the inhibitory activity of the 1-phenylbenzimidazoles considered have been identified.
2. Material and methods
2.1. Data sets
A group of 75 1-phenylbenzimidazole derivatives (Palmer et al., 1998, 1999) that have substituents only on the benzimidazole ring is used in this study. The molecular structure and the numbering of substituents in the series of 1-phenylbenzimidazole derivatives are shown in Fig. 1. The compounds studied are listed along with their inhibitory data in Table 1. Inactive 1-phenylbenzimidazoles are assigned a lg(1/IC50) value of 4.3 (IC50 = 50 μM). This data set of 75 1-phenylbenzimidazoles is randomly divided into two groups, with 55 compounds used as the training set for developing regression models and the remaining 20 compounds used only as the validation set for predicting biological activities.
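The activity scale and the random split described above can be illustrated with a minimal sketch. It assumes IC50 is given in micromolar and converted to mol/L before taking the logarithm; the function names and the random seed are hypothetical, so the split it produces is not the one reported in Table 1.

```python
import math
import random

def lg_inverse_ic50(ic50_micromolar):
    """Return lg(1/IC50), with IC50 converted from micromolar to mol/L."""
    return -math.log10(ic50_micromolar * 1e-6)

# Inactive compounds are treated as IC50 = 50 uM, which rounds to 4.3.
inactive_value = round(lg_inverse_ic50(50.0), 1)

def random_split(compound_ids, n_train, seed=0):
    """Shuffle the compound numbers and split them into training and validation lists."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for reproducibility
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])

train_ids, valid_ids = random_split(range(1, 76), n_train=55)
print(inactive_value, len(train_ids), len(valid_ids))
```

Keeping the validation compounds out of model fitting entirely, as the text specifies, is what makes the prediction-set correlation coefficients reported later a test of predictive ability rather than of fit.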
2.2. Descriptors
The charge densities of particular atoms in the molecule concerned, for example the charge of the nitrogen atom in position 3, Q_N-3, are calculated using the AM1 method. The changes in charge density of the atoms in the ortho- (q_o), meta- (q_m) and para- (q_p) positions of a benzene ring when a substituting group is connected to the ring are obtained from the corresponding charges with and without substitution; q_o and q_m are the average values over the two ortho- and the two meta-positions, respectively. The descriptors Q_i (i = o, m, p for ortho-, meta- and para-positions, respectively) are defined as the sums of the q_i values of all substituting groups connected to the benzene ring moiety of benzimidazole. An indicator variable for the 5-substituted derivatives, I_5, is assigned the value of 1 when a substituting group is present at the 5-position and 0 otherwise.

A series of other molecular descriptors is calculated for the 1-phenylbenzimidazole derivatives, including spatial, structural, electronic, quantum mechanical and thermodynamic descriptors and E-State indices. The spatial descriptors (Stanton and Jurs, 1990; Rohrbaugh and Jurs, 1987) comprise the radius of gyration (RadOfGyration), density, principal moment of inertia (PMI), molecular volume, Verloop's sterimol parameter (B5) and shadow indices. Structural descriptors include the molecular weight (Mw), the number of rotatable bonds (Rotbonds) and the numbers of hydrogen bond acceptors and donors (Hbond acceptor, Hbond donor). The electronic descriptors (Molecular Simulations, 1997) are superdelocalizability (Sr), atomic polarizabilities (Apol) and the dipole moment (Dipole). Quantum mechanical descriptors include the energy of the highest occupied molecular orbital (HOMO), the energy of the lowest unoccupied molecular orbital (LUMO) and the charge distribution-related descriptors Q_N-3, Q_o, Q_m and Q_p described above. The thermodynamic descriptors (Viswanadhan et al., 1989) describe the hydrophobic character (lg P, logarithm of the octanol–water partition coefficient), refractivity (MolRef, molar refractivity) and the desolvation free energies for water and octanol (Fh2o, desolvation free energy for H2O; Foct, desolvation free energy for octanol). The electrotopological-state indices (E-State indices) (Hall and Kier, 1991, 1995) used include S-aaaC, S-aaN, S-aaCH, etc. For example, in the symbol S-aaaC, S represents the electrotopological state of the atom, a stands for a bond in an aromatic ring, and C represents carbon.
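The construction of Q_o, Q_m and Q_p from per-substituent charge changes can be sketched as follows. This is only an illustration under stated assumptions: the change is taken as the charge with the substituent minus the charge without it, the position labels `o1`, `o2`, `m1`, `m2`, `p` are hypothetical, and the numerical charges are made up rather than AM1 output.

```python
def substituent_charge_changes(ring_charges, benzene_charges):
    """Return (q_o, q_m, q_p) for one substituent.

    Both arguments map the hypothetical position labels 'o1', 'o2', 'm1',
    'm2' and 'p' to ring-carbon charges, with and without the substituent.
    """
    delta = {pos: ring_charges[pos] - benzene_charges[pos] for pos in benzene_charges}
    q_o = (delta["o1"] + delta["o2"]) / 2.0  # average over the two ortho positions
    q_m = (delta["m1"] + delta["m2"]) / 2.0  # average over the two meta positions
    return q_o, q_m, delta["p"]

def ring_descriptors(per_substituent):
    """Sum q_o, q_m and q_p over all substituents to give (Q_o, Q_m, Q_p)."""
    return tuple(sum(values) for values in zip(*per_substituent))

# Made-up charges for illustration only (not AM1 results).
benzene = {"o1": -0.130, "o2": -0.130, "m1": -0.130, "m2": -0.130, "p": -0.130}
with_group_a = {"o1": -0.180, "o2": -0.170, "m1": -0.100, "m2": -0.110, "p": -0.160}
with_group_b = {"o1": -0.150, "o2": -0.150, "m1": -0.120, "m2": -0.120, "p": -0.140}

q_a = substituent_charge_changes(with_group_a, benzene)
q_b = substituent_charge_changes(with_group_b, benzene)
Q_o, Q_m, Q_p = ring_descriptors([q_a, q_b])
I_5 = 1  # indicator variable: 1 when a substituent is present at the 5-position
print(round(Q_o, 3), round(Q_m, 3), round(Q_p, 3), I_5)
```

Because the descriptors are sums over substituents, a molecule with no substituent on the benzene ring moiety has Q_o = Q_m = Q_p = 0 under this sketch.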
After elimination of zero-variance descriptors and descriptors that are difficult to interpret, 47 variables remain for describing the compounds. Table 2 summarizes all the molecular descriptors used in this QSAR study. Quantum mechanical descriptors are calculated with the AM1 semiempirical quantum chemistry method in the HYPERCHEM 6.0 software package on a PC. Other molecular descriptors are generated using the Cerius2 QSAR+ software system, version 3.5 (Molecular Simulations, 1997), on a Silicon Graphics R3000 workstation. The evolutionary algorithm was written in MATLAB 5.3 and run on a personal computer (Intel Pentium 4 processor, 1.5 GHz, 256 MB RAM).
2.3. Methods
Fig. 1. Structure of 1-phenylbenzimidazole.
Table 1. Summary of experimental and calculated biological activities of 1-phenylbenzimidazole derivatives, along with their structures, used in the QSAR study

| No. | Substituent | Observed Log(1/IC50) a | Calculated b, Eq. (1) | Eq. (2) | Eq. (3) | Eq. (4) | Series c |
|---|---|---|---|---|---|---|---|
| 1 | H | 5.03 | 5.1359 | 4.7599 | 4.9992 | 4.9329 | 1 |
| 2 | 4-OMe | 4.301 | 4.3768 | 4.4525 | 4.3627 | 4.4223 | 1 |
| 3 | 4-OH | 4.8539 | 4.739 | 4.5956 | 4.5467 | 4.6155 | 1 |
| 4 | 5-Me | 5.3565 | 5.3561 | 5.554 | 5.438 | 5.5506 | 1 |
| 5 | 5-OMe | 6.3665 | 5.6684 | 6.052 | 5.6253 | 5.7893 | 2 |
| 6 | 5-OH | 6.3565 | 5.2653 | 5.7457 | 5.3404 | 5.5533 | 1 |
| 7 | 5-Cl | 5.3979 | 5.3138 | 5.5814 | 5.3213 | 5.471 | 2 |
| 8 | 5-COOH | 5.03 | 5.4108 | 5.4938 | 5.396 | 5.4773 | 1 |
| 9 | 5-COOMe | 6.081 | 5.615 | 5.7474 | 5.6778 | 5.6999 | 1 |
| 10 | 5-CONH2 | 4.7959 | 5.5311 | 5.4773 | 5.2112 | 5.3022 | 1 |
| 11 | 5-NO2 | 4.7959 | 4.9008 | 5.4828 | 4.9566 | 5.135 | 2 |
| 12 | 5-COMe | 6.0655 | 5.4794 | 5.4554 | 5.4902 | 5.5481 | 2 |
| 13 | 5-CHO | 6.3665 | 5.3478 | 5.4554 | 5.4582 | 5.5225 | 1 |
| 14 | 5-OC3H7 | 6.6021 | 5.9993 | 6.0853 | 6.0759 | 6.1616 | 1 |
| 15 | 5-OEt | 6.6198 | 5.7107 | 6.0759 | 5.8525 | 5.9783 | 2 |
| 16 | 5-OCH(Me)2 | 5.5086 | 5.9601 | 6.0891 | 5.8629 | 5.9867 | 1 |
| 17 | 5-OC4H9 | 5.8861 | 6.3216 | 6.0911 | 6.2968 | 6.3428 | 1 |
| 18 | 5-OCH2CH=CH2 | 6.2147 | 6.196 | 6.0559 | 6.0734 | 6.1487 | 1 |
| 19 | 5-O(CH2)4OH | 6.3468 | 6.4967 | 6.0661 | 6.4462 | 6.4415 | 2 |
| 20 | 5-OCH2(oxiranyl) | 6.4969 | 5.9346 | 6.2982 | 6.0852 | 6.1119 | 1 |
| 21 | 5-OCH2CH(OH)CH2OH | 6.5086 | 6.2273 | 6.0603 | 6.1434 | 6.1803 | 1 |
| 22 | 5-O(CH2)2NH2 | 6.1871 | 5.9207 | 6.05 | 5.7096 | 5.8214 | 2 |
| 23 | 5-O(CH2)2N(Me)2 | 5.8239 | 6.2111 | 6.0655 | 6.3 | 6.3318 | 2 |
| 24 | 5-O(CH2)3N(Me)2 | 6.8239 | 6.6366 | 6.0745 | 6.5148 | 6.5109 | 1 |
| 25 | 5-O(CH2)4N(Me)2 | 6.7959 | 5.9882 | 6.0769 | 6.7395 | 6.6952 | 1 |
| 26 | 5-O(CH2)2Nmorph | 6.1367 | 5.9428 | 6.4571 | 5.9365 | 6.0292 | 1 |
| 27 | 5-O(CH2)3Nmorph | 6.7696 | 5.9131 | 6.4602 | 6.1577 | 6.2107 | 1 |
| 28 | 5-O(CH2)4Nmorph | 6.5686 | 6.7481 | 6.4623 | 6.3843 | 6.3965 | 1 |
| 29 | 5-SH | 5.4815 | 5.3269 | 5.6362 | 5.23 | 5.4106 | 1 |
| 30 | 5-SMe | 6.1308 | 5.7202 | 5.6307 | 5.5915 | 5.7161 | 1 |
| 31 | 5-OCSN(Me)2 | 5.3372 | 5.5928 | 6.0237 | 5.7367 | 5.8605 | 1 |
| 32 | 6-Me | 4.3979 | 5.1188 | 4.7763 | 4.9832 | 4.9283 | 1 |
| 33 | 6-OMe | 5.1938 | 5.2054 | 5.2776 | 5.2211 | 5.2074 | 2 |
| 34 | 6-OH | 5.6778 | 5.1311 | 4.986 | 4.9308 | 4.967 | 2 |
| 35 | 6-Cl | 5.2676 | 4.9202 | 4.8037 | 4.8421 | 4.829 | 1 |
| 36 | 6-COOH | 4.301 | 4.914 | 4.7161 | 4.8852 | 4.8102 | 1 |
| 37 | 6-COOMe | 4.8861 | 4.9164 | 4.972 | 5.1718 | 5.0367 | 2 |
| 38 | 6-CONH2 | 4.6021 | 4.9772 | 4.6996 | 4.6747 | 4.6145 | 1 |
| 39 | 6-NO2 | 4.301 | 4.5418 | 4.7051 | 4.5019 | 4.5127 | 1 |
| 40 | 6-NH2 | 4.6383 | 5.1693 | 4.9626 | 4.6819 | 4.7461 | 2 |
| 41 | 7-OMe | 4.4318 | 4.1068 | 4.458 | 4.4028 | 4.4544 | 1 |
| 42 | 4,5-diOH | 4.6021 | 4.8572 | 4.8694 | 4.8763 | 4.8731 | 1 |
| 43 | 4-OH,5-OMe | 5.1487 | 5.0766 | 5.1681 | 5.1854 | 5.1286 | 1 |
| 44 | 4-CH2CH(Me)O-5 | 4.5376 | 5.7176 | 5.1856 | 5.3796 | 5.1396 | 1 |
| 45 | 5,6-diOH | 5.6383 | 5.3239 | 5.2418 | 5.2784 | 5.239 | 1 |
| 46 | 5,6-diMe | 5.9208 | 5.9208 | 5.848 | 5.8486 | 5.7007 | 1 |
| 47 | 5,6-OCH2O | 5.6576 | 5.2181 | 5.4837 | 5.3865 | 5.137 | 2 |
| 48 | 5-OMe,6-Me | 6 | 5.6503 | 5.3646 | 5.6146 | 5.4355 | 2 |
| 49 | 5-OH,6-Me | 5.6021 | 5.3032 | 5.0501 | 5.3233 | 5.1943 | 1 |
| 50 | 5-OMe,6-COOH | 4.6778 | 5.3042 | 5.2905 | 5.5487 | 5.343 | 1 |
| 51 | 5-OH,6-COOH | 5.3665 | 5.0032 | 4.9898 | 5.2586 | 5.1028 | 1 |
| 52 | 5-OMe,6-COOMe | 6.0605 | 5.3765 | 5.5592 | 5.8769 | 5.6028 | 2 |
| 53 | 5-OMe,6-CH2OH | 6.4318 | 5.3541 | 5.3273 | 5.7366 | 5.5069 | 1 |
| 54 | 5-OMe,6-CHO | 6 | 5.3761 | 5.2571 | 5.5939 | 5.3746 | 1 |
| 55 | 5-S(CH2)3Nmorph | 4.3 | 5.6053 | 5.9695 | 5.9873 | 6.0146 | 1 |
| 56 | 4-Me | 4.3 | 4.7429 | 4.3599 | 4.544 | 4.5276 | 1 |
| 57 | 4-Cl | 4.3 | 4.7703 | 4.4834 | 4.462 | 4.487 | 2 |
| 58 | 4-COOH | 4.3 | 4.7617 | 4.3302 | 4.4387 | 4.4051 | 1 |
| 59 | 4-COOMe | 4.3 | 4.6087 | 4.1841 | 4.2508 | 4.2103 | 1 |
| 60 | 4-CONH2 | 4.3 | 4.6761 | 4.0935 | 4.0042 | 4.0041 | 1 |
| 61 | 4-NO2 | 4.3 | 4.0957 | 4.1285 | 3.8367 | 3.9121 | 1 |
| 62 | 4-NH2 | 4.3 | 4.9183 | 4.5742 | 4.2947 | 4.3903 | 2 |
| 63 | 7-Me | 4.3 | 4.6183 | 4.3599 | 4.5951 | 4.5683 | 1 |
| 64 | 7-OH | 4.3 | 4.6864 | 4.5792 | 4.5693 | 4.6254 | 2 |
| 65 | 7-Cl | 4.3 | 4.7499 | 4.4834 | 4.6885 | 4.6681 | 1 |
| 66 | 7-COOH | 4.3 | 4.1479 | 4.3302 | 4.1582 | 4.1808 | 2 |
| 67 | 7-COOMe | 4.3 | 3.4105 | 4.1875 | 3.8521 | 3.8915 | 1 |
| 68 | 7-CONH2 | 4.3 | 4.2433 | 4.0935 | 4.097 | 4.0783 | 1 |
| 69 | 7-NO2 | 4.3 | 3.9242 | 4.1285 | 4.0682 | 4.0973 | 1 |
| 70 | 7-NH2 | 4.3 | 4.7144 | 4.5742 | 4.3162 | 4.4075 | 1 |
| 71 | 4-OMe,5-OH | 4.3 | 4.2732 | 4.6913 | 4.6899 | 4.6644 | 1 |
| 72 | 4,5-diOMe | 4.3 | 4.5588 | 5.0228 | 4.9787 | 4.9063 | 2 |
| 73 | 4-Br,5-OH | 4.3 | 4.0942 | 4.6368 | 4.0001 | 4.0746 | 1 |
| 74 | 4-Br,5-OCH2CH=CH2 | 4.3 | 4.7409 | 4.9797 | 4.7361 | 4.6861 | 1 |
| 75 | 4-CH2CH=CH2,5-OH | 4.3 | 4.2661 | 3.915 | 4.5921 | 4.4754 | 1 |

a Logarithm of the inverse value of the concentration of inhibitor to reduce the level of 32P (from added [32P]-ATP) incorporated into the glutamate–tyrosine copolymer substrate, as reported by Palmer et al. (1998, 1999).
b Calculated using Eqs. (1)–(4) in Table 4.
c Randomly selected as a member of the training (1) or validation (2) set.

Evolution algorithm (EA) (Hasegawa et al., 1997; Luke, 1994; Kubingi, 1996) is employed as the search procedure in variable selection. A chromosome is formulated as a binary bit string in which each bit represents a descriptor: a value of 1 is given when the descriptor is selected, and a value of 0 otherwise. At first, a population of 100 models is generated by randomly choosing subsets of independent variables, i.e. by assigning 1 or 0 to the different variables. Then the value of the objective function (modified Cp, see below) is calculated for each model. The evolving process includes mutation and selection operations. Each model is allowed to create a new model through the mutation operation, the Cp (Nishii, 1984) of the new model is calculated, and all new models are added to form a 200-model population. According to the Cp values, the 100 models with the lowest Cp are selected from the set of 200 models. Mutation and selection operations are repeated until the convergence criterion is satisfied.

The modified Cp statistic is applied as the objective function for variable selection in this QSAR study of 1-phenylbenzimidazoles. The modified Cp in MLR is expressed as

$$C_p(p) = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2_{\mathrm{PLS}}} - (n - 2p) \qquad (1)$$

where n is the number of samples (observations of the dependent variable), p is the number of independent variables and RSS_p is the residual sum of squares of the p-variable model. σ̂²_PLS is a modification of σ̂² that takes advantage of the capability of PLS to deal with the multicollinearity problem and to provide a correct estimation of the model error. When the original data set is subjected to PLS analysis, σ̂²_PLS is defined as the value of RSS corresponding to the minimum number of principal components beyond which a further increase in the number of principal components does not cause a significant reduction in RSS. The details of the modified Cp have been described elsewhere (Shen et al., 2003).
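The mutation-and-selection loop and the Cp objective described above can be sketched in plain Python. This is a minimal illustration and not the authors' MATLAB implementation. Several details are assumptions made here: mutation flips one randomly chosen bit, a fixed number of generations stands in for the unspecified convergence criterion, the population is reduced to 20 for speed, the data are synthetic, and `sigma2` is supplied directly instead of being estimated by PLS.

```python
import random
from collections import Counter

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def rss_of_subset(X, y, subset):
    """Residual sum of squares of an MLR model (with intercept) on the chosen columns."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

def modified_cp(rss_p, sigma2, n, p):
    """Eq. (1): RSS_p / sigma2 - (n - 2p); sigma2 stands in for the PLS-based estimate."""
    return rss_p / sigma2 - (n - 2 * p)

def evolve(X, y, sigma2, pop_size=20, generations=30, seed=0):
    """Return the final population of bit strings, sorted by increasing Cp."""
    rng = random.Random(seed)
    n, m = len(y), len(X[0])

    def score(bits):
        subset = [j for j, bit in enumerate(bits) if bit]
        return modified_cp(rss_of_subset(X, y, subset), sigma2, n, len(subset))

    population = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):  # fixed count replaces the convergence criterion
        children = []
        for parent in population:
            child = parent[:]
            child[rng.randrange(m)] ^= 1  # assumed mutation: flip one random bit
            children.append(child)
        # Selection: keep the pop_size models with the lowest Cp.
        population = sorted(population + children, key=score)[:pop_size]
    return population

# Synthetic data: y depends on descriptors 0 and 2 only, noise variance 0.01.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [2.0 * r[0] - 1.5 * r[2] + data_rng.gauss(0, 0.1) for r in X]
final_population = evolve(X, y, sigma2=0.01)
best = final_population[0]
# Count how often each descriptor appears in the final population.
frequency = Counter(j for bits in final_population for j, bit in enumerate(bits) if bit)
print(best, frequency.most_common(3))
```

Counting descriptor occurrences in the final population, as in the last lines, mirrors the frequency ranking that the paper later reports in Table 5.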
3. Results and discussion
3.1. Definition of some descriptors
A QSAR study of some 1-phenylbenzimidazoles was performed by Kurup et al. (2001). Owing to the lack of values of the electronic parameter σ_I, only 22 compounds were included in the regression. It seems that σ_I is an important variable in the QSAR study concerned. The parameter is the sum of the σ_I values of all substituents at different positions of each molecule and is a measure of the field/inductive effect. The electric charge of each atom in the parent compound changes when substituents are attached at different positions. Such electronic parameters were initially developed from a consideration of substituent effects in aromatic compounds such as benzoic acids. The benzene ring is the most common structural element in all kinds of pharmaceutical compounds. To express the field/inductive effect of substituents, we propose the parameters q_o, q_m and q_p described above, which are the changes in the charge densities of the carbon atoms of a benzene ring when a substituting group is connected to the ring. Electron-releasing substituents have large values of q_m and small values of q_o and q_p. Conversely, electron-attracting substituents have small values of q_m and large values of q_o and q_p.
Table 2. List of molecular descriptors for the 1-phenylbenzimidazoles studied as candidate variables

| Functional family of descriptors | Descriptors |
|---|---|
| Spatial descriptors | RadOfGyration (radius of gyration); Shadow indices (surface area projections): Shadow-XY, Shadow-XZ, Shadow-YZ, Shadow-XYfrac, Shadow-XZfrac, Shadow-YZfrac, Shadow-nu, Shadow-Xlength, Shadow-Ylength, Shadow-Zlength; Vm (molecular volume); Density; Area (molecular surface area); PMI (principal moment of inertia): PMI-mag-X, PMI-mag-Y, PMI-mag-Z; B5_4,7 (Verloop's sterimol parameter) |
| Structural descriptors | Mw (molecular weight); Hbond acceptor (number of hydrogen bond acceptors); Hbond donor (number of hydrogen bond donors); Rotbonds (number of rotatable bonds) |
| Electronic descriptors | Apol (sum of atomic polarizabilities); Dipole (Dipole-mag, Dipole-X, Dipole-Y, Dipole-Z); Sr (superdelocalizability) |
| Quantum mechanical descriptors | HOMO (highest occupied molecular orbital energy); LUMO (lowest unoccupied molecular orbital energy); Q_N-3 (electronic charge of N-3 in the 1-phenylbenzimidazoles); Q_o, Q_m, Q_p (electronic effect of substituents) |
| Thermodynamic descriptors | AlogP, log P (the octanol–water partition coefficient); Fh2o (desolvation free energy for water); Foct (desolvation free energy for octanol); MR CM**23, MolRef (molar refractivity) |
| E-State index | S-aaCH, S-aasC, S-aaaC, S-aaN, S-aasN, S-ssO |
| Indicator variable | I_5 |
Q_N-3, the electric charge of N-3 in the benzimidazole, is also a descriptor defined in this study, as this nitrogen atom is suggested to form hydrogen bonds (Palmer et al., 1999).
3.2. Modified Cp statistic
The Cp statistic is modified for variable selection. The conventional Cp (denoted by Cp′ here) is expressed as

$$C_p' = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} - (n - 2p) \qquad (2)$$

where σ̂² is estimated from the RSS of the model involving all variables. Usually, the Cp′ statistic performs satisfactorily in well-conditioned situations where the sample size is large compared with the number of variables and the collinearity among variables is negligible. However, collinearity among variables is the rule rather than the exception in QSAR studies, and the number of variables representing the objects is often large compared with the sample size. When Cp′ is used as the objective function, even when an apparently optimum Cp′ is reached, the number of descriptors might still be too large compared with the sample size, which deteriorates the performance of QSAR modeling. Using σ̂² in Cp′ in such ill-conditioned systems would result in overfitting and underestimation of the model error. Because PLS has the capacity to deal with the multicollinearity problem and to provide a correct estimation of the model error, σ̂² in Cp′ is replaced by σ̂²_PLS as defined in Section 2.3. The experimental results show that the penalty on the number of independent variables in the modified Cp is moderate.
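The link between the error estimate and overfitting can be made explicit by comparing nested models with p and p + 1 variables. The short derivation below follows directly from the definition of Cp; the symbol Δ is introduced here only for illustration.

```latex
\begin{align*}
\Delta &= C_p(p+1) - C_p(p) \\
       &= \frac{\mathrm{RSS}_{p+1}}{\hat{\sigma}^2} - \bigl(n - 2(p+1)\bigr)
          - \frac{\mathrm{RSS}_{p}}{\hat{\sigma}^2} + (n - 2p) \\
       &= 2 - \frac{\mathrm{RSS}_{p} - \mathrm{RSS}_{p+1}}{\hat{\sigma}^2},
\qquad\text{so}\qquad
\Delta < 0 \iff \mathrm{RSS}_{p} - \mathrm{RSS}_{p+1} > 2\hat{\sigma}^2 .
\end{align*}
```

An added variable lowers Cp only if it reduces RSS by more than 2σ̂². If σ̂² is underestimated, as can happen when the full model is fitted to collinear data with few samples, this threshold becomes too small and extra variables are admitted too easily. Replacing σ̂² by σ̂²_PLS changes only this threshold.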
3.3. QSAR modeling for 1-phenylbenzimidazoles

Table 3. Descriptors and their correlation coefficients (R) in one-parameter regression models involving all 75 compounds

| Descriptor | R |
|---|---|
| B5_4,7 | −0.6387 |
| I_5 | 0.6325 |
| RadOfGyration | 0.6132 |
| Shadow-Xlength | 0.5909 |
| Shadow-XY | 0.58 |
| Area | 0.573 |
| PMI-Z | 0.577 |
| PMI-Y | 0.583 |
| PMI-mag | 0.578 |
| Vm | 0.567 |
| Rotbonds | 0.558 |
| Apol | 0.527 |

In the first step of the regression analysis, we calculate the correlation coefficient of a one-parameter regression model for each descriptor with respect to all 75 compounds. Descriptors highly correlated with inhibitory activity are listed in Table 3. The indicator variable I_5, the sterimol parameter B5_4,7, the radius of gyration (RadOfGyration) and Shadow-Xlength show high correlation with the inhibitory activity. On the basis of these results, we have assumed that most of the information concerning the structure–activity relationship of 1-phenylbenzimidazoles
is contained in the group of spatial descriptors. Among the 12 variables listed in Table 3, nine are spatial descriptors. This shows that steric effects play an important role in the inhibitory activity of 1-phenylbenzimidazoles. Among the shadow indices, it is worth noting that all surface area projections involving the x coordinate axis, such as Shadow-Xlength, Shadow-XZ and Shadow-XY, have higher correlation coefficients (R > 0.577) than the projections involving only the y or z axes. For example, the R values for Shadow-Zlength, Shadow-Ylength and Shadow-YZ are −0.15, 0.28 and 0.247, respectively. For the parameter Q_m, the correlation coefficient calculated with respect to all active and inactive compounds is 0.183, while an R value of 0.61 is obtained when only active compounds are involved in the regression model. Subsequent studies discussed below show that Q_m is an important descriptor for predicting the inhibitory activity, while Q_o and Q_p are not.

MLR is used with the modified Cp as the objective statistic in the EA. The best model, with the minimum Cp value among the final 100 combinations of the EA search, contains four variables. The best models involving 3, 4, 5 and 6 variables are shown in Table 4. In Eqs. (2) and (4), the large positive coefficient of Q_m implies that an increase in the value of Q_m is conducive to the activity of the molecule. Electron-releasing groups, which have higher Q_m values than electron-attracting groups, enhance the activity of the inhibitors, while electron-attracting groups reduce it. The negative coefficient of B5_4,7 in these equations shows that bulky groups reduce the activity of molecules when they are attached at positions 4 and 7. Substituents at the 5-position enhance the inhibitory activity, as shown by the positive coefficient of I_5. However, the substituent parameters Q_m, I_5 and B5_4,7 are not sufficient to describe the activity of an inhibitor, and parameters describing the whole molecule appear necessary.

The parent compound 1-phenylbenzimidazole is fairly rigid, with only one rotatable bond between the phenyl and benzimidazole rings. The more rotatable bonds there are in a molecule, the greater its conformational flexibility. A flexible ligand can easily adopt a favorable steric configuration for binding to a relatively narrow ATP site. This is in accordance with the positive coefficient of the descriptor Rotbonds shown in Table 4. The positive coefficient of the descriptor LUMO implies that molecules with high-energy LUMOs promote the inhibitory activity; that is to say, a low electrophilicity of the molecule is favorable for the activity. The negative coefficient of the descriptor Hbond donor suggests that an increase in the number of hydrogen bond donors in a molecule reduces its activity.

The correlation between the experimentally observed lg 1/IC50 values and those calculated by the best 5-variable model is shown in Fig. 2A. The correlation coefficient was 0.8529 for the training set and 0.8708 for the validation set. In Fig. 2A, compound 55 is an obvious outlier, with a rather large deviation of the calculated activity from the experimentally measured value. This compound is also an outlier in all equations shown in Table 4, no matter whether it is placed in the training or the prediction set.

When the EA search terminates, one may count the number of times a particular molecular descriptor appears in the 100 individual combinations. When the descriptors are listed in order of decreasing number of appearances, the top descriptors, i.e. the most frequently appearing features, are those shown in Table 5. Once again, spatial descriptors occupy an important position: one third of the top descriptors are of the spatial type. From the lg 1/IC50 values of this series of compounds, one notices that compounds with substituents at the 5-position are more active against PDGFR than those with substituents at other positions.
So spatial descriptors related to substituent position, such as B5_4,7 and Shadow-Xlength, are important descriptors.

Table 4. Results of variable selection by EA using Cp and MLR modeling involving 75 compounds

| No. | Equation | R a | S a | F a | R_p b | R_max b |
|---|---|---|---|---|---|---|
| (1) | lg 1/IC50 = −0.5244*B5_4,7 + 0.2592*LUMO + 0.2388*Shadow-Xlength + 1.9262 | 0.8105 | 0.5304 | 32.5538 | 0.8435 | 0.2455 |
| (2) | lg 1/IC50 = −0.4004*B5_4,7 + 5.4778*Q_m + 0.777*I_5 + 0.0623*S-ssO + 5.8374 | 0.8382 | 0.4987 | 29.5386 | 0.8742 | 0.4702 |
| (3) | lg 1/IC50 = −0.4125*B5_4,7 + 0.3027*LUMO + 0.2216*Rotbonds − 0.2928*Hbond donor + 0.4402*I_5 + 4.2138 | 0.8529 | 0.4822 | 26.1647 | 0.8708 | 0.4882 |
| (4) | lg 1/IC50 = −0.5244*B5_4,7 + 2.7188*Q_m + 0.2457*LUMO + 0.1818*Rotbonds − 0.2533*Hbond donor + 0.6170*I_5 + 4.7015 | 0.8603 | 0.4758 | 22.7837 | 0.8724 | 0.4882 |

a R, correlation coefficient; S, standard deviation; F, F statistic.
b R_p, correlation coefficient of the prediction set; R_max, the maximum correlation coefficient among the variables.

Fig. 2. (A) Calculated versus observed lg IC50 of a five-descriptor model of 75 compounds; (B) calculated versus observed lg IC50 of a three-descriptor model of 54 compounds.

Table 5. Most frequently appearing descriptors during the EA search

| Compounds | Preferred variables |
|---|---|
| Inactive and active compounds | I_5, Q_m, Rotbonds, LUMO, MolRef, Area, Hbond donor, RadOfGyration, Density, B5_4,7, Shadow-Xlength, S-ssO |
| Active compounds | I_5, Q_m, Rotbonds, LUMO, MolRef, Area, Hbond donor, Vm, S-aaaC, S-aaN, Dipole-Y |

The importance of the descriptor RadOfGyration (radius of gyration) indicates the significance of the steric hindrance caused by the size of functional groups. For the same reason, the indicator descriptor I_5 is an important variable. Molar refractivity (MolRef) is a combined measure of molecular size and polarizability, and it turned out to be one of the important variables. The nitrogen at position 3 (N-3) of the benzimidazole moiety was believed to form a hydrogen bond (Palmer et al., 1999). However, the descriptor Q_N-3 turned out not to be important during the EA search. As each molecule in this series contains the atom N-3, the electric charge on N-3 seems not to be essential with respect to inhibitory activity. Hydrophobicity, a factor usually much considered in the development of QSAR in biochemistry, seems to have a
small effect on the activity, as AlogP is excluded from these equations and from the top descriptors.

To explore further the effects influencing inhibitory activity, we also carried out a QSAR study solely on the 54 active compounds. The best equations involving three, four and five variables are listed in Table 6. The correlation coefficients for the prediction set given by Eqs. (2) and (3) are rather low, though the correlation coefficients for the training set obtained with these two equations are acceptable. This is a symptom of overfitting, which seems to be related to the relatively high correlation among the variables involved in these equations. The correlation between the calculated and observed lg 1/IC50 for Eq. (1) is shown in Fig. 2B.

Table 6. Results of variable selection by EA using Cp and MLR modeling involving 54 compounds

| No. | Equation | R a | S a | F a | R_p | R_max a |
|---|---|---|---|---|---|---|
| 1 | lg 1/IC50 = 0.2120*Rotbonds + 0.7796*I_5 + 0.9252*S-aaaS + 2.5225 | 0.8245 | 0.4763 | 24.7632 | 0.7288 | 0.4390 |
| 2 | lg 1/IC50 = 0.3081*Rotbonds + 0.8587*I_5 + 2.5908*S-aaaS1 24.9477*S-aaN 120.6713 | 0.8552 | 0.4420 | 23.3800 | 0.6673 | 0.8702 |
| 3 | lg 1/IC50 = 0.3682*Rotbonds + 0.8674*I_5 + 2.6171*S-aaaS1 25.9895*S-aaN – 0.2108*Hbond donor 125.1076 | 0.8695 | 0.4281 | 20.4531 | 0.6708 | 0.8702 |

a See footnote a of Table 4.

The most frequently appearing descriptors during the EA search when solely active compounds are involved are also shown in Table 5. As the data set is reduced, the relative importance of the different descriptors changes. For instance, compounds with substituting groups at the 4- and 7-positions are commonly inactive. As these compounds are excluded from the data set, B5_4,7 becomes unimportant and was not selected during the EA search. Because the 5-position appears to be an activating position, MLR was performed on the 28 compounds with 5-substituents, and the following model was obtained:

$$\log \mathrm{IC}_{50} = 17.8478\,Q_m + 0.1610\,\text{Hbond donor} - 0.4591\,\text{Rotbonds} - 0.5685\,A\lg P + 9.2649 \qquad (3)$$

n = 28, R = 0.7969, S = 0.4423, F = 10.8747

Here n is the number of observations, R is the correlation coefficient, S is the standard deviation and F is the F statistic. From the large coefficient of Q_m in this equation one can see that Q_m plays an essential role in the activity. Hydrophobicity (AlgP) exerts some negative influence on PDGFR inhibition.
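The prediction-set correlation coefficient quoted for the five-variable model, Eq. (3) of Table 4, can be checked against the values listed in Table 1. The sketch below transcribes the observed and Eq. (3) values of the 20 validation-set (series 2) compounds and computes the Pearson correlation coefficient; the function name `pearson_r` is a hypothetical helper, and the result should be close to the reported R_p of 0.8708.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Validation-set compounds of Table 1 (Nos. 5, 7, 11, 12, 15, 19, 22, 23, 33,
# 34, 37, 40, 47, 48, 52, 57, 62, 64, 66, 72): observed lg(1/IC50) values.
observed = [6.3665, 5.3979, 4.7959, 6.0655, 6.6198, 6.3468, 6.1871, 5.8239,
            5.1938, 5.6778, 4.8861, 4.6383, 5.6576, 6.0, 6.0605, 4.3, 4.3,
            4.3, 4.3, 4.3]
# Values calculated with Eq. (3) of Table 4 for the same compounds.
calculated_eq3 = [5.6253, 5.3213, 4.9566, 5.4902, 5.8525, 6.4462, 5.7096, 6.3,
                  5.2211, 4.9308, 5.1718, 4.6819, 5.3865, 5.6146, 5.8769, 4.462,
                  4.2947, 4.5693, 4.1582, 4.9787]

r_p = pearson_r(observed, calculated_eq3)
print(round(r_p, 3))
```

The same helper applied to one descriptor column and the activity column gives the one-parameter correlation coefficients of the kind listed in Table 3.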
3.4. Principal component analysis for classification of PDGFR inhibitors
Because 21 of the 75 compounds are inactive against PDGFR, it is interesting to use PCA as an unsupervised classification method to classify this set of compounds. PCA is a multivariate statistical analysis method that extracts the information contained in a data matrix and reduces the original number of variables to a few factors called principal components (PCs). All 47 descriptors listed in Table 2 are used, and PCA is used to project the high-dimensional patterns into the two-dimensional space of the first two PCs (Fig. 3). The distribution of the compounds active against PDGFR tends to differ from that of the inactive compounds, though a clear classification was not obtained.

Fig. 3. Score plot of the first two PCs of the whole set of 75 compounds.
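The projection onto the first two principal components can be sketched as follows. This is a minimal pure-Python illustration, not the software used in the paper. It assumes the descriptors are autoscaled before analysis (the text does not state the preprocessing), uses power iteration with deflation to obtain the two leading eigenvectors, and runs on a small made-up descriptor matrix.

```python
import math

def standardize(rows):
    """Autoscale each column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def covariance(Z):
    """Covariance matrix of column-centred data Z."""
    n, m = len(Z), len(Z[0])
    return [[sum(r[a] * r[b] for r in Z) / (n - 1) for b in range(m)] for a in range(m)]

def mat_vec(C, v):
    return [sum(c * x for c, x in zip(row, v)) for row in C]

def leading_eigenpair(C, iterations=500):
    """Power iteration for the largest eigenvalue and its eigenvector."""
    v = [1.0 / (i + 1) for i in range(len(C))]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = mat_vec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(C, v)))
    return eigenvalue, v

def first_two_pc_scores(rows):
    """Return the (PC1, PC2) score of each object and the two eigenvalues."""
    Z = standardize(rows)
    C = covariance(Z)
    l1, v1 = leading_eigenpair(C)
    m = len(C)
    # Deflation: remove the first component so that the second becomes dominant.
    deflated = [[C[a][b] - l1 * v1[a] * v1[b] for b in range(m)] for a in range(m)]
    l2, v2 = leading_eigenpair(deflated)
    scores = [(sum(z * a for z, a in zip(row, v1)),
               sum(z * a for z, a in zip(row, v2))) for row in Z]
    return scores, (l1, l2)

# Made-up descriptor matrix: 6 hypothetical compounds x 3 descriptors.
descriptors = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.3], [3.0, 6.2, 0.9],
               [4.0, 7.9, 0.1], [5.0, 10.1, 0.7], [6.0, 11.8, 0.4]]
scores, eigenvalues = first_two_pc_scores(descriptors)
for pc1, pc2 in scores:
    print(round(pc1, 3), round(pc2, 3))
```

Plotting the two score columns against each other and marking active and inactive compounds differently would give a score plot of the kind shown in Fig. 3. Zero-variance descriptors must be removed first, as the paper does, since autoscaling would otherwise divide by zero.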
4. Conclusions
New electronic parameters Q_o, Q_m and Q_p for substituents are proposed and, together with a series of established descriptors, are used to predict and classify the inhibitory activities of 75 1-phenylbenzimidazoles as inhibitors of PDGFR. The descriptor Q_m is shown to be an important variable for expressing the effect of substituents. Variable selection by EA using the modified Cp based on MLR modeling shows that spatial descriptors are the most important variables, revealing important properties of the inhibitors. Electron-releasing substituents at the 5-position and the absence of bulky groups at the 4- and 7-positions of the parent structure enhance inhibitory activity.
Acknowledgements
The work was financially supported by the National Natural Science Foundation of China (Grant Nos. 29735150, 20075006, 20105007).
References
Dolle, R.E., Dunn, J.A., Bobko, M., Singh, B., Kuster, J.E., Baizman, E., Harris, A.L., Sawutz, S.G., Miller, D., Wang, S., Faltynek, C.R., Xie, W., Sarup, J., Bode, C.E., Pagani, E.D., Silver, P.J., 1994. 5,7-Dimethoxy-3-(4-pyridinyl)quinoline is a potent and selective inhibitor of human vascular β-type platelet-derived growth factor receptor tyrosine kinase. J. Med. Chem. 37, 2627–2629.
Hall, L.H., Kier, L.B., 1991. The electrotopological state: structure information at the atomic level for molecular graphs. J. Chem. Inf. Comput. Sci. 31, 76–78.
Hall, L.H., Kier, L.B., 1995. Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information. J. Chem. Inf. Comput. Sci. 35, 1039–1045.
Hasegawa, K., Miyashita, Y., Funatsu, K., 1997. GA strategy for variable selection in QSAR studies: GA-based PLS analysis of calcium channel antagonists. J. Chem. Inf. Comput. Sci. 37, 306–310.
Kubingi, H., 1996. Evolutionary variable selection in regression and PLS analysis. J. Chemometrics 10, 119–133.
Kurup, A., Garg, R., Hansch, C., 2001. Comparative QSAR study of tyrosine kinase inhibitors. Chem. Rev. 101, 2573–2600.
Luke, B.T., 1994. Evolutionary programming applied to the development of quantitative structure–activity relationships and quantitative structure–property relationships. J. Chem. Inf. Comput. Sci. 34, 1279–1287.
Maguire, M.P., Sheets, K.R., McVety, K., Spada, A.P., Ziberstein, A.A., 1994. New series of PDGF receptor tyrosine kinase inhibitors: 3-substituted quinoline derivatives. J. Med. Chem. 37, 2129–2137.
Molecular Simulations, 1997. Cerius2 QSAR. Molecular Simulations, San Diego, CA.
Naumann, T., Matter, T., 2002. Structural classification of protein kinases using 3D molecular interaction field analysis of their ligand binding sites: target family landscapes. J. Med. Chem. 45, 2366–2378.
Nishii, R., 1984. Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Stat. 12, 758–765.
Oblak, M., Randic, M., Solmajer, T., 2000. Quantitative structure–activity relationship of flavonoid analogues. 3. Inhibition of P56 lck protein tyrosine kinase. J. Chem. Inf. Comput. Sci. 40, 994–1001.
Palmer, B.D., Kraker, A.J., Hartl, B.G., Panopoulos, A.D., Panek, R.L., Batley, B.L., Lu, G.H., Susanne, T.K., Showalter, H.D.H., Denny, W.A., 1999. Structure–activity relationships for 5-substituted phenylbenzimidazoles as selective ATP site inhibitors of the platelet-derived growth factor receptor. J. Med. Chem. 42, 2373–2382.
Palmer, B.D., Smaill, J.B., Boyd, M., Boschelli, D.H., Doherty, A.M., Hamby, J.M., Khatana, S.S., Kramer, J.B., Kraker, A.J., Panek, R.L., Lu, G.H., Dahring, T.K., Winters, R.T., Showalter, H.D.H., Denny, W.A., 1998. Structure–activity relationships for 1-phenylbenzimidazoles as selective ATP site inhibitors of the platelet-derived growth factor receptor. J. Med. Chem. 41, 5457–5465.
Pierre, D., Michel, L., David, S.G., 2000. 3D-QSAR CoMFA on cyclin-dependent kinase inhibitors. J. Med. Chem. 43, 4098–4108.
Rohrbaugh, R.H., Jurs, P.C., 1987. Descriptions of molecular shape applied in studies of structure/activity and structure/property relationships. Anal. Chim. Acta 199, 99–109.
Shen, Q., Jiang, J.H., Shen, G.L., Yu, R.Q., 2003. Variable selection by an evolution algorithm using modified Cp based on MLR and PLS modeling: QSAR studies of carcinogenicity of aromatic amines. Anal. Bioanal. Chem. 375, 248–254.
Stanton, D.T., Jurs, P.C., 1990. Development and use of charged partial surface area structural descriptors in computer-assisted quantitative structure–property relationship studies. Anal. Chem. 62, 2323–2329.
Zhu, L.L., Hou, T.J., Chen, L.R., Xu, X.J., 2001. 3D QSAR analyses of novel tyrosine kinase inhibitors based on pharmacophore alignment. J. Chem. Inf. Comput. Sci. 41, 1032–1040.
Viswanadhan, V.N., Ghose, A.K., Revankar, G.R., Robins, R.K., 1989. Atomic physicochemical parameters for three dimensional structure directed quantitative structure–activity relationships. 4. Additional parameters for hydrophobic and dispersive interactions and their application for an automated superposition of certain naturally occurring nucleoside antibiotics. J. Chem. Inf. Comput. Sci. 29, 163–172.