Chemometrics and Intelligent Laboratory Systems 114 (2012) 1–9
Study of indole derivative inhibitors of cytosolic phospholipase A2α based on Quantitative Structure Activity Relationship
Xiaoquan Lu ⁎, Dongqin Ji, Jing Chen, Xibin Zhou, Haicai Shi
Key Laboratory of Bioelectrochemistry & Environmental Analysis of Gansu Province, College of Chemistry & Chemical Engineering, Northwest Normal University, Lanzhou, 730070, PR China
Article info
Article history: Received 7 August 2011; received in revised form 11 October 2011; accepted 17 November 2011; available online 14 March 2012
Keywords: Quantitative Structure Activity Relationships; Cytosolic phospholipase A2α inhibitor; Partial Least Squares; Artificial Neural Networks; Support Vector Machine
Abstract
Cytosolic phospholipase A2α, one of the three subtypes of cytosolic phospholipase A2 (α, β and γ), plays an important role in the arachidonate pathway. As the rate-limiting provider of proinflammatory mediators, it is a particularly attractive target for drug development. Studies have revealed that indole derivatives can inhibit the activity of cytosolic phospholipase A2α. However, few papers have reported on the relationship between the molecular structure and the activity of these inhibitors. In this study, a Quantitative Structure Activity Relationship (QSAR) analysis of indole derivatives was performed on a dataset of 49 compounds. Using stepwise multiple linear regression, 5 descriptors were selected from 1777 molecular descriptors: GGI5 (topological charge index G5), TIE(dssC) (sum of the E-state values of atom type dssC, S(dssC)), RDF115a (weighted by the atomic Sanderson ALOGP), RDF100c (weighted by the atomic charge), and RDF065p (weighted by the atomic polarizability). Subsequently, Partial Least Squares (PLS), Artificial Neural Networks (ANN) and Support Vector Machine (SVM) were each used to build a QSAR model. The independent test set indicated that the SVM gives the best statistical results. The activity of indole derivative inhibitors might be related to global charge transfer, the type of carbon atom linked to the benzyl sulfonamide, and the distance distribution in the molecular geometry. Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved.
1. Introduction

The cPLA2α (cytosolic phospholipase A2α, Group IVA phospholipase A2), one of the three subtypes of cPLA2 (α, β and γ), is a central mediator of arachidonate release from cellular phospholipids for the biosynthesis of eicosanoids. It is an esterase that selectively cleaves the sn-2 position of arachidonoyl glycerophospholipids in biomembranes to yield free arachidonic acid and lysophospholipids. Activation of cPLA2α results in the production of numerous lipid mediators, including leukotrienes, prostanoids and platelet activating factor [1,2]. These mediators have diverse functions in promoting inflammation and contributing to its resolution. Several studies using cPLA2α-deficient mice have revealed that cPLA2α plays an important part in various inflammatory diseases [3–5]. In these reports, cPLA2α-deficient mice were shown to be resistant to a collagen-induced arthritis model of rheumatoid arthritis [5], adult respiratory distress syndrome (ARDS) [6], a model of colon cancer in the APC mouse [7], and a model of multiple sclerosis [8]. cPLA2α has also been implicated in autoimmune diabetes [9]. The experiments using cPLA2α-deficient mice demonstrate the contribution of cPLA2α to eicosanoid synthesis,
⁎ Corresponding author. Tel./fax: + 86-931-7971276. E-mail address:
[email protected] (X. Lu).
leading to the progression of inflammation, reperfusion injury and acute lung injury [3]. Thus, cPLA2α is believed to be a potential target for the treatment of such inflammatory diseases and of reperfusion injury, and cPLA2α inhibitors have aroused great interest among researchers. In recent years, in pursuit of more potent cPLA2α inhibitors, many different inhibitors have been synthesized, such as the pyrroxyphene inhibitor [10], pyrrolidine inhibitors [11] and inhibitors based on a 1,3-disubstituted propan-2-one skeleton [12]. Shimizu et al. [13] reviewed the biochemical properties and physiological roles of cytosolic phospholipase A2α.

Quantitative Structure Activity Relationship (QSAR) analysis is now widely used in both research and regulatory applications, such as the prediction of aquatic toxicity [14], capillary electrophoresis [15], chromatography [16] and the prediction of biological activity [17]. Mouchlis et al. [18] used 3D-QSAR CoMFA to study indole inhibitors of GIIA secreted phospholipase A2. Lin and Yu [19] used QSAR to study 1-acyloxy-3-N-n-octylcarbamylbenzene inhibitors of phospholipase A2. QSAR has thus been successfully applied to predict various important biopharmaceutical properties. Many statistical learning methods are used to build QSAR models, such as Multiple Linear Regression (MLR), Partial Least Squares (PLS), Artificial Neural Networks (ANN) and Support Vector Machine (SVM). MLR and PLS are in most cases linear statistical models, whereas ANN algorithms can be used to perform nonlinear statistical modeling. The advantages of ANN are
0169-7439/$ – see front matter. Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2011.11.011
Table 1 Indole derivative inhibitors of cPLA2α: structures, inhibitory activity data and predicted values. The R1 and R2 substituents are listed only where they are given as text; n.r. marks an entry that is not recovered here (drawn as a structure or absent in the source).

No.   R1/R2       R3         pIC50  Pred. (PLS)  Pred. (ANN)  Pred. (SVM)
1     n.r.        5-Cl     -0.8451      -0.8411      -0.6666      -0.7183
2     Me          5-Cl     -0.3010      -0.1770      -0.3868      -0.3803
3     Me          5-Cl      0.8239       1.0171       0.6586       0.7380
4     n.r.        5-Cl      0.2219       0.3538       0.5867       0.3199
5     n.r.        5-Cl      0.8239       0.7530       0.9859       0.8110
6     n.r.        5-Cl      0.2596       0.4678       0.4048       0.4192
7     n.r.        5-Cl      0.4949       0.2709       0.2993       0.5177
8     n.r.        5-Cl      0.7959       0.3206       0.4742       0.6861
9     n.r.        5-Cl      1.2219       0.9204       0.9709       1.0792
10    n.r.        5-Cl      0.4437       0.7718       0.5541       0.5021
11    n.r.        5-Cl      0.6778       0.9612       0.9677       0.7648
12    n.r.        5-Cl      1.0223       0.9340       0.9000       0.8630
13    n.r.        5-Cl     -0.3424      -0.7044      -0.6323      -0.4741
14    CH2OH       5-Cl      1.2219       0.9358       0.9108       1.0636
15    n.r.        5-Cl      0.6576       0.4365       0.7432       0.7250
16    n.r.        5-Cl      0.6576       0.5754       0.8554       0.7465
17    n.r.        5-Cl      0.6990       0.9374       0.8975       0.8591
18    n.r.        5-Cl      0.9208       0.8363       0.8838       0.9775
19    n.r.        5-Cl      1.0223       0.5481       0.7642       0.9534
20    n.r.        5-Cl      0.6990       0.9951       0.9205       0.8188
21    n.r.        5-Cl      0.6990       0.7086       0.9059       0.7548
22    n.r.        5-Cl      1.0000       1.1018       1.0246       0.9122
23    n.r.        5-Cl      1.1249       1.3787       1.0243       0.9663
24    n.r.        5-Cl      0.5686       0.3484       0.3838       0.4094
25    n.r.        5-Cl     -0.8062      -0.2627      -0.5221      -0.6467
26    n.r.        5-Cl     -0.3424      -0.4076      -0.4347      -0.3030
27    n.r.        5-Cl     -0.9031      -0.6619      -0.6257      -0.7432
28    CH2CH2OH    5-Cl     -0.0414       0.0964       0.1614      -0.1029
29    n.r.        5-Cl     -0.2553      -0.1898      -0.3286      -0.2858
30    n.r.        5-Cl     -0.3979      -0.3777      -0.5335      -0.4534
31    n.r.        5-Cl      0.0458      -0.0758       0.0436      -0.0622
32    n.r.        5-Cl      0.0000      -0.1070      -0.3290      -0.0502
33a   n.r.        5-Cl     -1.0000      -0.3553      -0.3831      -0.8410
34    n.r.        5-Cl      0.9208       0.5111       0.5700       0.7609
35    n.r.        5-Cl      0.2007      -0.2795      -0.3554       0.0407
36    n.r.        5-Cl     -0.6021       0.2444      -0.2354      -0.4428
37    n.r.        n.r.      0.6198       0.5304       0.5554       0.6805
38    n.r.        5-Cl      0.5850       0.7616       0.5249       0.4491
39    n.r.        5-Cl     -0.3979      -0.0422      -0.1053      -0.2397
40b   n.r.        5-Cl     -0.6990      -0.3293      -0.3288      -0.4698
41b   n.r.        5-Cl     -1.0000      -1.0930      -0.7799      -0.4298
42b   n.r.        5-NO2     1.0458       0.7303       0.6571       0.7984
43b   n.r.        6-Cl      0.7959       0.9630       0.8593       0.7360
44b   n.r.        5-Cl      1.2757       0.8450       0.7827       1.0018
45b   n.r.        5-Cl      0.7959       0.4614       0.6373       0.7729
46b   n.r.        5-Cl      0.8861       0.9506       0.9180       0.9567
47b   n.r.        5-Cl      1.0000       0.7017       0.7451       0.9711
48b   n.r.        5-Cl      0.8861       1.0073       0.9002       0.9622
49b   n.r.        5-Cl      0.9586       0.6606       0.7367       0.9419

a 7-H replaces 7-Ph in the template molecule.
b Test set.
requirement of less formal statistical training, the ability to implicitly detect complex nonlinear relationships between dependent and independent variables, the ability to detect all possible interactions between predictor variables, and the availability of multiple training algorithms. The disadvantages are a greater computational burden, proneness to overfitting, and the empirical nature of model development [20]. The SVM is a relatively recent approach introduced by Vapnik [21]. It makes use of the structural risk minimization inductive principle, which has been shown to be superior to the traditional empirical risk minimization inductive principle used in conventional neural networks.

In this work, three QSAR models for predicting the inhibitory activity of indole derivatives were built by PLS, ANN and SVM, respectively. A reliable and stable model based on SVM was obtained under strict evaluation criteria. Studying the relationship between the molecular structure of indole derivatives and their inhibitory activity might provide important information for the design of effective indole derivative inhibitors of cPLA2α.

2. Materials and methods

2.1. Experimental data

During the development of QSAR models, it is usually recommended, where possible, to (i) use experimental data from the same laboratory to avoid interlaboratory variation [22] and (ii) use data sets in which the ratio of the number of tested compounds to the number of descriptors used for modeling is at least five [23]. A dataset of 49 indole derivative inhibitors was assembled, with IC50 values taken from the rat whole blood assay reported by Lee and coworkers [24,25] (Table 1). The dataset was randomly divided into a training set (39 compounds, about 80% of the dataset) and a test set (10 compounds, the remaining 20%). The training set was used to develop the QSAR models and the test set was used to validate them.

2.2. Molecular descriptors calculation and choice

In the process of QSAR model construction, various rationally designed molecular descriptors are needed to characterize molecular structures. The molecular structures can be drawn or retrieved from databases such as PubChem and DrugBank. Some free on-line software can be employed to generate the molecular descriptors, for example PreADME/T [26], CDK (Chemistry Development Kit) [27] and E-DRAGON [28]. Herein, the structures of all molecules were drawn with the HyperChem software [29], and the geometries were optimized by the semi-empirical PM3 method until the root mean square gradient reached 0.001 kcal mol⁻¹ Å⁻¹. Then 1777 molecular descriptors were obtained through Molecular Descriptor Lab (MODEL) [30], which is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/model/model.cgi. After eliminating the descriptors with constant or almost constant values, a final set of 1430 descriptors was obtained.

To obtain a stable and interpretable model, the relevant descriptors should be selected in a QSAR analysis. In the present work, the stepwise multiple linear regression method was used to select the most relevant descriptors from the pool of 1430 descriptors. Permutation and correlation matrices were used during the selection in order to avoid collinearity problems. Five descriptors were finally selected.
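The pre-filtering step described above (dropping constant or almost constant descriptors and checking pairwise correlations) can be sketched as follows. This is a minimal illustration and not the authors' procedure: the thresholds `min_std` and `max_abs_r`, the greedy collinearity pass and the toy descriptor names `d1` to `d4` are all assumptions, and the stepwise regression itself is not reproduced.

```python
from statistics import mean, pstdev


def drop_near_constant(columns, min_std=1e-6):
    """Keep only descriptors whose standard deviation exceeds min_std (assumed threshold)."""
    return {name: vals for name, vals in columns.items() if pstdev(vals) > min_std}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def drop_collinear(columns, max_abs_r=0.9):
    """Greedy pass: keep a descriptor only if |r| with every kept one is below max_abs_r."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) < max_abs_r for other in kept.values()):
            kept[name] = vals
    return kept


# Hypothetical toy descriptors (not taken from the paper).
cols = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly collinear with d1
    "d3": [7.0, 7.0, 7.0, 7.0, 7.0],   # constant
    "d4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_collinear(drop_near_constant(cols))
```

The constant filter is applied first because the correlation coefficient is undefined for a constant column.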
2.3. Methods

In QSAR studies, both linear and nonlinear techniques can be applied to construct the model, such as Multiple Linear Regression (MLR), Partial Least Squares (PLS), Artificial Neural Networks (ANN) and Support Vector Machine (SVM).

PLS is a multivariate statistical technique and a generalization of Multiple Linear Regression (MLR). The theory of the PLS method has been reviewed elsewhere [31–33]. The five molecular descriptors were the input variables and pIC50 was the output variable. Each data record therefore had six components (x1, x2, x3, x4, x5; y), five of which were the input variables while the sixth was the output variable.

The concept of ANN was first proposed in 1943 by McCulloch and Pitts [34]. Recently, there has been a growing interest in the use of ANN for QSAR owing to its inherent ability to model nonlinear problems. ANN is especially useful when a rigid theoretical basis or mathematical relationship describing the phenomenon to be modeled is not available. Detailed introductions to the theory of ANN are available [35–39], so only a brief outline is presented here. ANN is a technique capable of modeling complex functions. The neuron (node) is the basic processing unit, and an ANN is composed of a number of neurons organized in layers. The ANN used in this study had a three-layer architecture: an input layer (molecular descriptor values), one hidden layer, and an output layer (the 50% inhibitory concentration, expressed as pIC50). Both the input and output variables were the same as in the PLS model.

SVM, introduced by Vapnik [21], is gaining popularity owing to many attractive features and promising empirical performance. It is based on the structural risk minimization inductive principle, which has been shown to be superior to the traditional empirical risk minimization inductive principle used in conventional neural networks, and it has proven very effective for general-purpose classification and regression problems. A detailed description of SVM theory can be found in several papers [40–43]. The input and output values were the same as in the PLS and ANN models.
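For orientation, the two ingredients of SVM regression that are tuned in Section 3.2.3, the RBF kernel and the ε-insensitive loss, can be sketched as below. The paper does not state how the RBF kernel is parametrized; the common form K(x, z) = exp(−γ‖x − z‖²) is assumed here, and `svr_predict` only evaluates an already trained model whose support vectors, coefficients and bias are hypothetical inputs.

```python
import math


def rbf_kernel(x, z, gamma):
    """Assumed RBF form: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias for a trained SVR (inputs are hypothetical)."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Parameter values reported in Section 3.2.3 of the paper.
GAMMA, EPS = 0.07, 0.25
small_error = eps_insensitive_loss(1.0, 1.2, EPS)   # inside the tube
large_error = eps_insensitive_loss(1.0, 1.5, EPS)   # outside the tube
```

Errors smaller than ε contribute nothing to the loss, which is why only part of the training set ends up as support vectors.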
3. Results and discussion

3.1. Results of molecular descriptor selection

An important step in building a QSAR model is to select robust and informative descriptors from a large pool of descriptors. We used stepwise multiple linear regression to select representative variables from the pool. The selection of the optimum number of descriptors is shown in Fig. 1: the R and q² values changed little beyond five descriptors, and the smallest RMSE value was obtained with five descriptors. Five descriptors were therefore selected to build the models. As can be seen from the correlation matrix (Table 2), there was no significant correlation among the selected descriptors. The contribution rates of the five descriptors are shown in Fig. 2.

Fig. 2. The contribution rate of the five descriptors.

The descriptors obtained from stepwise multiple linear regression were GGI5, TIE(dssC), RDF115a, RDF100c and RDF065p. They are interpreted as follows.

GGI5 is the topological charge index G5. Topological charge indices evaluate the charge transfers between pairs of atoms, and therefore the global charge transfer in the molecule [44–47]. The topological charge index G_k is defined as

$$ G_k = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} |CT_{ij}| \, \delta(k, D_{ij}) \qquad (1) $$

where N is the number of vertices in the graph, CT_ij is the charge term between vertices i and j, D_ij are the entries of the topological distance matrix, δ is the Kronecker delta and k is the order of the index. G_k therefore represents the sum of all the |CT_ij| terms with D_ij = k. These descriptors are graph invariants that evaluate the total charge transfer between atoms placed at topological distance k. For a linear molecule there are therefore (N − 1) G_k values (from G_1 to G_{N−1}). G_k is a strictly topological quantity that plausibly correlates with the charge distribution inside the molecule.

TIE(dssC) is the sum of the E-state values of atom type dssC, S(dssC) [48], i.e. the sum of the electrotopological state indices of all atoms of that type (carbon atoms with one double and two single bonds). The E-state index is computed as a graph invariant for each atom in the molecular graph. The index combines the electronic state of the bonded atom within the molecule with its topological nature in the context of the whole molecular skeleton.

RDF115a, RDF100c and RDF065p belong to the radial distribution function (RDF) descriptors [44,49,50]. These descriptors are based on the distance distribution in the 3D geometry of the molecule, as given by Eq. (2):

$$ g_w(R) = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} w_i \, w_j \, \exp\left[-\beta \,(R - r_{ij})^2\right] \qquad (2) $$

where N is the number of atoms, w_i and w_j are the weighting factors (any atomic property) of atoms i and j, r_ij is the interatomic distance between atoms i and j, and β is a smoothing parameter. R is scanned from 0.0 Å to 14.0 Å in steps of 0.50 Å. In RDF115a, RDF100c and RDF065p the weighting schemes are the atomic Sanderson ALOGP, the atomic charge and the atomic polarizability, respectively.

3.2. Model prediction

3.2.1. Partial Least Squares model

The PLS technique was applied to build the linear model. The models obtained were validated using the leave-one-out (LOO) cross-validation process, which is widely used for the internal validation of QSAR models. As the name suggests, LOO cross-validation uses a single observation from the original sample as the validation data and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data [51]. The equation of the PLS model is the following:

$$ \mathrm{pIC}_{50} = 4.8564 + 0.6771\,\mathrm{GGI5} - 1.5382\,\mathrm{TIE(dssC)} - 0.1528\,\mathrm{RDF115a} + 0.6901\,\mathrm{RDF100c} + 0.0123\,\mathrm{RDF065p} \qquad (3) $$
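The leave-one-out procedure described above, together with the q² and RMSE statistics of Eqs. (4) and (5), can be sketched in a few lines. This is only an illustration under stated assumptions: a one-descriptor ordinary least squares line (`fit_line`) stands in for the PLS model, and the sample values `x_demo` and `y_demo` are hypothetical, not data from Table 1.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a stand-in for the PLS model)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def loo_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        preds.append(a + b * x[i])
    return preds


def q2(y_exp, y_pre):
    """Eq. (4): 1 - sum (y_exp - y_pre)^2 / sum (y_exp - y_mean)^2."""
    y_mean = mean(y_exp)
    press = sum((e - p) ** 2 for e, p in zip(y_exp, y_pre))
    total = sum((e - y_mean) ** 2 for e in y_exp)
    return 1.0 - press / total


def rmse(y_exp, y_pre):
    """Eq. (5): square root of the mean squared prediction error."""
    return (sum((e - p) ** 2 for e, p in zip(y_exp, y_pre)) / len(y_exp)) ** 0.5


# Hypothetical one-descriptor data set.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [1.1, 2.9, 5.2, 6.8, 9.0]
y_loo = loo_predictions(x_demo, y_demo)
q2_loo, rmse_loo = q2(y_demo, y_loo), rmse(y_demo, y_loo)
```

Because every prediction comes from a model that never saw the predicted sample, q² computed this way is generally lower than the fitted R² of the same model.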
Fig. 1. Influence of the number of descriptors on R, q² and RMSE.

Table 2 The correlation coefficient matrix of the five selected descriptors.

            GGI5      TIE(dssC)  RDF115a   RDF100c   RDF065p
GGI5        1
TIE(dssC)   0.1175    1
RDF115a     0.0207    -0.1485    1
RDF100c     0.2244    0.1475     0.2136    1
RDF065p     0.5520    0.1408     0.0183    0.1965    1

Fig. 3. a. Distribution of the standardized residuals for the training set and test set using the PLS method. b. Experimental values and predicted values for the training set and test set using the PLS method.

Fig. 4. a. Distribution of the standardized residuals for the training set and test set using the ANN method. b. Experimental values and predicted values for the training set and test set using the ANN method.

The built model was used to predict the test set. The predicted results are given in Table 1. The plots of the residuals vs. the predicted values and of the experimental vs. the predicted biological activities (pIC50) are shown in Fig. 3a and b. The squared correlation coefficient R² was 0.7784 for the training set and 0.8835 for the test set, and the Root Mean Square Error (RMSE) was 0.1677 and 0.3312, respectively. The cross-validated correlation coefficient q² was 0.7767 for the training set and 0.8588 for the test set. This indicates that the regression model has good internal and external predictive power. The formulae used to calculate q² and RMSE are given below:

$$ q^2 = 1 - \frac{\sum_{i=1}^{n} (y_{exp} - y_{pre})^2}{\sum_{i=1}^{n} (y_{exp} - y_{mean})^2} \qquad (4) $$

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_{exp} - y_{pre})^2}{n}} \qquad (5) $$
In these equations, y_exp is the experimental (desired output) value, y_pre is the value predicted by the model, n is the number of molecules in the data set and y_mean is the mean of the experimental values.

3.2.2. Artificial Neural Networks model

The ANN inputs and outputs were normalized to values between 0.1 and 0.9. The network was trained with a target error of 0.0001%, a maximum of 10,000 training cycles and a learning rate of 0.01. The gradient-descent algorithm, which is commonly used to minimize the sum-of-squares error, was employed, and the network was optimized by leave-one-out cross-validation. The residual analysis is shown in Fig. 4a, and Fig. 4b compares the values predicted by the ANN with the experimental data. R² was 0.8485 for the training set and 0.9252 for the test set, q² was 0.8482 for the training set and 0.8666 for the test set, and the RMSE was 0.0383 and 0.2584, respectively.

3.2.3. Support Vector Machine model

The SVM regression model is affected by several parameters: the kernel type K, which determines the sample distribution in the mapping space, and its corresponding parameter γ; the capacity parameter C; and the ε of the ε-insensitive loss function. The kernel function parameter γ greatly affects the number of support vectors, which is closely related to the performance of the SVM and to the training time. In addition, γ controls the amplitude of the kernel function and therefore the generalization ability of the SVM, so it has to be optimized. C is the penalty parameter, a regularization constant that determines the trade-off between maximizing the margin and minimizing the training error. If C is too small, insufficient stress is placed on fitting the training data; if C is too large, the SVM model overfits the training set. The optimal value of ε depends on the type of noise present in the data. Even if enough knowledge of the noise is available to select an optimal value for ε, there remains the practical consideration of the number of resulting support vectors. The ε-insensitivity prevents the entire training set from meeting the boundary conditions and so allows for sparsity in the solution of the dual formulation. It is therefore critical to choose an appropriate value for ε.

In this study, the RBF kernel was used as the kernel function. We performed leave-one-out cross-validation to select the optimum values of the capacity parameter C, the ε of the ε-insensitive loss function and the parameter γ of the RBF kernel. The best choices for C, ε and γ were 100, 0.25 and 0.07, respectively. The built model was used to predict the test set. The RMSE was 0.0190 for the training set and 0.0937 for the test set, and R² was 0.9706 and 0.9637, respectively. For the optimal model, q² was 0.9672 and 0.9024. The distribution of the standardized residuals for the training set and test set with the SVM model is shown in Fig. 5a, and the experimental and predicted values are plotted in Fig. 5b.

3.3. Evaluation of models
Table 3 Statistical parameters for the three regression models.

Parameter         PLS model             ANN model             SVM model
                  Training   Test       Training   Test       Training   Test
R²                0.7784     0.8835     0.8485     0.9254     0.9706     0.9631
R0²               0.9994     0.9745     0.9999     0.9024     0.9967     0.9693
R0′²              0.9674     0.9476     0.9794     0.9165     0.9936     0.9642
k                 0.9789     1.1270     1.0076     1.2285     1.0512     1.1151
k′                0.8379     0.8218     0.8710     0.7750     0.9283     0.8528
(R² − R0²)/R²     -0.2839    -0.1030    -0.1784    0.0249     -0.0269    -0.0064
(R² − R0′²)/R²    -0.2428    -0.0726    -0.1543    0.0100     -0.0237    -0.0011
RMSE              0.1677     0.3312     0.0383     0.2584     0.0190     0.0937
q²                0.7767     0.8588     0.8482     0.8666     0.9672     0.9024

The predictive power of a QSAR model is conventionally estimated by an external q². A high value of q² is necessary but not sufficient for a model with high predictive power. In this paper, the QSAR models were validated by the following statistical criteria recommended by Tropsha [52]:

$$ q^2 > 0.5 $$
$$ R^2 > 0.6 $$
$$ \frac{R^2 - R_0^2}{R^2} < 0.1 \quad \text{or} \quad \frac{R^2 - R_0'^2}{R^2} < 0.1 $$
$$ 0.85 \le k \le 1.15 \quad \text{or} \quad 0.85 \le k' \le 1.15 $$
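The criteria listed above can be checked mechanically. The sketch below applies them literally, as printed, to the test-set columns of Table 3; the function name `tropsha_check` and the dictionary layout are illustrative assumptions, not part of the original work.

```python
def tropsha_check(q2, r2, r02, r02_prime, k, k_prime):
    """Evaluate the four acceptance criteria as printed in the text."""
    ratio = (r2 - r02) / r2
    ratio_prime = (r2 - r02_prime) / r2
    return {
        "q2 > 0.5": q2 > 0.5,
        "R2 > 0.6": r2 > 0.6,
        "ratio < 0.1": ratio < 0.1 or ratio_prime < 0.1,
        "0.85 <= k <= 1.15": 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15,
    }


# Test-set values taken from Table 3: (q2, R2, R0^2, R0'^2, k, k').
test_rows = {
    "PLS": (0.8588, 0.8835, 0.9745, 0.9476, 1.1270, 0.8218),
    "ANN": (0.8666, 0.9254, 0.9024, 0.9165, 1.2285, 0.7750),
    "SVM": (0.9024, 0.9631, 0.9693, 0.9642, 1.1151, 0.8528),
}
results = {name: tropsha_check(*vals) for name, vals in test_rows.items()}
passes_all = {name: all(checks.values()) for name, checks in results.items()}
```

Under this literal reading, the slope condition is the only one not met by a test-set column: for the ANN model both k (1.2285) and k′ (0.7750) fall outside the 0.85 to 1.15 window.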
For an ideal QSAR model, (R² − R0²)/R² and (R² − R0′²)/R² should be close to 0, while R², R0² (R0′²) and k (k′) should be close to 1; in addition, R² and R0² (R0′²) should have similar values. From Table 3, it can be seen that the SVM model showed good statistical performance for all these criteria on both the training and test sets. This suggests that the reliability of the SVM model is high compared with the other predictive models.

Fig. 5. a. Distribution of the standardized residuals for the training set and test set using the SVM method. b. Experimental values and predicted values for the training set and test set using the SVM method.

4. Conclusions

In this study, a comparison of the PLS, ANN and SVM methods showed that the SVM model was the best QSAR model for the prediction of the 49 indole derivatives. Furthermore, the selected descriptors were GGI5, TIE(dssC), RDF115a, RDF100c and RDF065p, indicating that inhibitor activity might be related to global charge transfer, the type of carbon atom linked to the benzyl sulfonamide, and the distance distribution in the molecular geometry. The proposed models might offer some guidance for the further synthesis of indole derivatives.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 20927004, 21005063, 21165016 and 21175108) and the Natural Science Foundation of Gansu (no. 096RJZA121).

References
[1] M.A. Gijón, D.M. Spencer, A.R. Siddiqi, J.V. Bonventre, C.C. Leslie, Cytosolic phospholipase A2 is required for macrophage arachidonic acid release by agonists that do and do not mobilize calcium, The Journal of Biological Chemistry 275 (2000) 20146–20156.
[2] B.B. Rubin, G.P. Downey, A. Koh, N. Degousee, F. Ghomashchi, L. Nallan, E. Stefanski, D.W. Harkin, C. Sun, B.P. Smart, T.F. Lindsay, V. Cherepanov, E. Vachon, D. Kelvin, M. Sadilek, G.E. Brown, M.B. Yaffe, J. Plumb, S. Grinstein, M. Glogauer, M.H. Gelb, Cytosolic phospholipase A2-α is necessary for platelet-activating factor biosynthesis, efficient neutrophil-mediated bacterial killing, and the innate immune response to pulmonary infection, The Journal of Biological Chemistry 280 (2005) 7519–7529.
[3] A. Sapirstein, J.V. Bonventre, Specific physiological roles of cytosolic phospholipase A2 as defined by gene knockouts, Biochimica et Biophysica Acta (BBA) - Molecular and Cell Biology of Lipids 1488 (2000) 139–148.
[4] C. Miyaura, M. Inada, C. Matsumoto, T. Ohshiba, N. Uozumi, T. Shimizu, A. Ito, An essential role of cytosolic phospholipase A2α in prostaglandin E2-mediated bone resorption associated with inflammation, The Journal of Experimental Medicine 197 (2003) 1303–1310.
[5] M. Hegen, L. Sun, N. Uozumi, K. Kume, M.E. Goad, C.L. Nickerson-Nutter, T. Shimizu, J.D. Clark, Cytosolic phospholipase A2α-deficient mice are resistant to collagen-induced arthritis, The Journal of Experimental Medicine 197 (2003) 1297–1302.
[6] T. Nagase, N. Uozumi, S. Ishii, K. Kume, T. Izumi, Y. Ouchi, T. Shimizu, Acute lung injury by sepsis and acid aspiration: a key role for cytosolic phospholipase A2, Nature Immunology 1 (2000) 42–46.
[7] K. Takaku, M. Sonoshita, N. Sasaki, N. Uozumi, Y. Doi, T. Shimizu, M.M. Taketo, Suppression of intestinal polyposis in Apc Δ716 knockout mice by an additional mutation in the cytosolic phospholipase A2 gene, The Journal of Biological Chemistry 275 (2000) 34013–34016.
[8] S. Marusic, M.W. Leach, J.W. Pelker, M.L. Azoitei, N. Uozumi, J. Cui, M.W.H. Shen, C.M. DeClercq, J.S. Miyashiro, B.A. Carito, P. Thakker, D.L. Simmons, J.P. Leonard, T. Shimizu, J.D. Clark, Cytosolic phospholipase A2α-deficient mice are resistant to experimental autoimmune encephalomyelitis, The Journal of Experimental Medicine 202 (2005) 841–851.
[9] Y. Oikawa, E. Yamato, F. Tashiro, M. Yamamoto, N. Uozumi, A. Shimada, T. Shimizu, J. Miyazaki, Protective role for cytosolic phospholipase A2α in autoimmune diabetes of mice, FEBS Letters 579 (2005) 3975–3978.
[10] N. Tai, K. Kuwabara, M. Kobayashi, K. Yamada, T. Ono, K. Seno, Y. Gahara, J. Ishizaki, Y. Hori, Cytosolic phospholipase A2 alpha inhibitor, pyrroxyphene, displays anti-arthritic and anti-bone destructive action in a murine arthritis model, Inflammation Research 59 (2010) 53–62.
[11] K. Seno, T. Okuno, K. Nishi, Y. Murakami, F. Watanabe, T. Matsuura, M. Wada, Y. Fujii, M. Yamada, T. Ogawa, T. Okada, H. Hashizume, M. Kii, S.-i. Hara, S. Hagishita, S. Nakamoto, K. Yamada, Y. Chikazawa, M. Ueno, I. Teshirogi, T. Ono, M. Ohtani, Pyrrolidine inhibitors of human cytosolic phospholipase A2, Journal of Medicinal Chemistry 43 (2000) 1041–1044.
[12] S. Connolly, C. Bennion, S. Botterell, P.J. Croshaw, C. Hallam, K. Hardy, P. Hartopp, C.G. Jackson, S.J. King, L. Lawrence, A. Mete, D. Murray, D.H. Robinson, G.M. Smith, L. Stein, I. Walters, E. Wells, W.J. Withnall, Design and synthesis of a novel and potent series of inhibitors of cytosolic phospholipase A2 based on a 1,3-disubstituted propan-2-one skeleton, Journal of Medicinal Chemistry 45 (2002) 1348–1362.
[13] T. Shimizu, T. Ohto, Y. Kita, Cytosolic phospholipase A2: biochemical properties and physiological roles, IUBMB Life 58 (2006) 328–333.
[14] P.G. Polishchuk, E.N. Muratov, A.G. Artemenko, O.G. Kolumbin, N.N. Muratov, V.E. Kuz'min, Application of random forest approach to QSAR prediction of aquatic toxicity, Journal of Chemical Information and Modeling 49 (2009) 2481–2488.
[15] C.X. Xue, R.S. Zhang, M.C. Liu, Z.D. Hu, B.T. Fan, Study of the quantitative structure-mobility relationship of carboxylic acids in capillary electrophoresis based on support vector machines, Journal of Chemical Information and Computer Sciences 44 (2004) 950–957.
[16] B.B. Lei, S. Li, L. Xi, J. Li, H. Liu, J.X. Yao, Novel approaches for retention time prediction of oligonucleotides in ion-pair reversed-phase high-performance liquid chromatography, Journal of Chromatography A 1216 (2009) 4434–4439.
[17] R. Guha, P.C. Jurs, Development of linear, ensemble, and nonlinear models for the prediction and interpretation of the biological activity of a set of PDGFR inhibitors, Journal of Chemical Information and Computer Sciences 44 (2004) 2179–2189.
[18] V.D. Mouchlis, T.M. Mavromoustakos, G. Kokotos, Molecular docking and 3D-QSAR CoMFA studies on indole inhibitors of GIIA secreted phospholipase A2, Journal of Chemical Information and Modeling 50 (2010) 1589–1601.
[19] G. Lin, G.-Y. Yu, QSAR for phospholipase A2 inhibitions by 1-acyloxy-3-N-n-octylcarbamyl-benzenes, Bioorganic & Medicinal Chemistry Letters 15 (2005) 2405–2408.
[20] J.V. Tu, Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes, Journal of Clinical Epidemiology 49 (1996) 1225–1231.
[21] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (1995) 273–297.
[22] M.T.D. Cronin, T.W. Schultz, Pitfalls in QSAR, Journal of Molecular Structure (THEOCHEM) 622 (2003) 39–51.
[23] A. Tropsha, P. Gramatica, V.K. Gombar, The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models, QSAR and Combinatorial Science 22 (2003) 69–77.
[24] K.L. Lee, M.L. Behnke, M.A. Foley, L. Chen, W. Wang, R. Vargas, J. Nunez, S. Tam, N. Mollova, X. Xu, M.W.H. Shen, M.K. Ramarao, D.G. Goodwin, C.L. Nickerson-Nutter, W.M. Abraham, C. Williams, J.D. Clark, J.C. McKew, Benzenesulfonamide indole inhibitors of cytosolic phospholipase A2α: optimization of in vitro potency and rat pharmacokinetics for oral efficacy, Bioorganic & Medicinal Chemistry 16 (2008) 1345–1358.
[25] L.K.L. Lee, M.A. Foley, L. Chen, M.L. Behnke, F.E. Lovering, S.J. Kirincich, W. Wang, J. Shim, S. Tam, M.W.H. Shen, S. Khor, X. Xu, D.G. Goodwin, M.K. Ramarao, C. Nickerson-Nutter, F. Donahue, M.S. Ku, J.D. Clark, J.C. McKew, Discovery of ecopladib, an indole inhibitor of cytosolic phospholipase A2α, Journal of Medicinal Chemistry 50 (2007) 1380–1400.
[26] PreADME, http://preadmet.bmdrc.org.
[27] C. Steinbeck, C. Hoppe, S. Kuhn, M. Floris, R. Guha, E.L. Willighagen, Recent developments of the chemistry development kit (CDK): an open-source Java library for chemo- and bioinformatics, Current Pharmaceutical Design 12 (2006) 2111–2120.
[28] I. Tetko, J. Gasteiger, R. Todeschini, A. Mauri, D. Livingstone, P. Ertl, V. Palyulin, E. Radchenko, N. Zefirov, A. Makarenko, V. Tanchuk, V. Prokopenko, Virtual computational chemistry laboratory: design and description, Journal of Computer-Aided Molecular Design 19 (2005) 453–463.
[29] HyperChem Release 7, Hypercube, Inc., [Online] available: http://www.hyper.com.
[30] Z.R. Li, L.Y. Han, Y. Xue, C.W. Yap, H. Li, L. Jiang, Y.Z. Chen, MODEL - molecular descriptor lab: a web-based server for computing structural and physicochemical features of compounds, Biotechnology and Bioengineering 97 (2007) 389–396.
[31] P. Bastien, V.E. Vinzi, M. Tenenhaus, PLS generalised linear regression, Computational Statistics and Data Analysis 48 (2005) 17–46.
[32] S. Wold, M. Sjöström, L. Eriksson, PLS-regression: a basic tool of chemometrics, Chemometrics and Intelligent Laboratory Systems 58 (2001) 109–130.
[33] N. Hernández, R. Kiralj, M.M.C. Ferreira, I. Talavera, Critical comparative analysis, validation and interpretation of SVM and PLS regression models in a QSAR study on HIV-1 protease inhibitors, Chemometrics and Intelligent Laboratory Systems 98 (2009) 65–77.
[34] W. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biology 5 (1943) 115–133.
[35] Y. Zhang, H. Li, A. Hou, J. Havel, Artificial neural networks based on principal component analysis input selection for quantification in overlapped capillary electrophoresis peaks, Chemometrics and Intelligent Laboratory Systems 82 (2006) 165–175.
[36] L.S. Anker, P.C. Jurs, Prediction of carbon-13 nuclear magnetic resonance chemical shifts by artificial neural networks, Analytical Chemistry 64 (1992) 1157–1164.
[37] G. Astray, A. Cid, J.A. Ferreiro-Lage, J.F. Gálvez, J.C. Mejuto, O. Nieto-Faza, Prediction of prop-2-enoate polymer and styrene polymer glass transition using artificial neural networks, Journal of Chemical and Engineering Data 55 (2010) 5340–5346.
[38] N.A. Darwish, Application and performance of neural networks in the correlation of thermophysical properties of long-chain n-alkanes, Industrial and Engineering Chemistry Research 46 (2007) 4717–4725.
[39] M. Jalali-Heravi, Z. 
Garkani-Nejad, Prediction of electrophoretic mobilities of alkyl- and alkenylpyridines in capillary electrophoresis using artificial neural networks, Journal of Chromatography. A 971 (2002) 207–215. J. Cheng, D. Yu, Y. Yang, Application of support vector regression machines to the processing of end effects of hilbert-huang transform, Mechanical Systems and Signal Processing 21 (2007) 1197–1211. H. Li, Y. Liang, Q. Xu, Support vector machines and its applications in chemistry, Chemometrics and Intelligent Laboratory Systems 95 (2009) 188–198. C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (1998) 121–167. F.E.H. Tay, L.J. Cao, Modified support vector machines in financial time series forecasting, Neurocomputing 48 (2002) 847–861. R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH Verlag GmbH, 2008, pp. 227–365. J. Galvez, R. Garcia, M.T. Salabert, R. Soler, Charge indexes. New topological descriptors, Journal of Chemical Information and Computer Sciences 34 (1994) 520–525. J. Galvez, R. Garcia-Domenech, J.V. de Julian-Ortiz, R. Soler, Topological approach to drug design, Journal of Chemical Information and Computer Sciences 35 (1995) 272–284. I. Ríos-Santamarina, R. García-Domenech, J. Gálvez, J. Cortijo, P. Santamaria, E. Morcillo, New bronchodilators selected by molecular topology, Bioorganic & Medicinal Chemistry Letters 8 (1998) 477–482. L.H. Hall, L.B. Kier, Electrotopological state indices for atom types: a novel combination of electronic, topological, and valence state information, Journal of Chemical Information and Computer Sciences 35 (1995) 1039–1045. M.C. Hemmer, V. Steinhauer, J. Gasteiger, Deriving the 3D structure of organic molecules from their infrared spectra, Vibrational Spectroscopy 19 (1995) 151–164. E. Papa, S. Kovarich, P. 
Gramatica, QSAR modeling and prediction of the endocrinedisrupting potencies of brominated flame retardants, Chemical Research in Toxicology 23 (2010) 946–954. R. Darnag, E.L. Mostapha Mazouz, A. Schmitzer, D. Villemin, A. Jarid, D. Cherqaoui, Support vector machines: development of QSAR models for predicting anti-HIV-1 activity of TIBO derivatives, European Journal of Medicinal Chemistry 45 (2010) 1590–1597. A. Golbraikh, A. Tropsha, Beware of q2! Journal of Molecular Graphics & Modelling 20 (2002) 269–276.