QSAR modeling of iNOS inhibitors based on a novel regression method: Multi-stage adaptive regression
Wei Long, Jian Xiang, Hongying Wu, Weicheng Hu, Xiaodong Zhang, Jin Jin, Xin He, Xiu Shen, Zewei Zhou, Saijun Fan
PII: S0169-7439(13)00143-3
DOI: 10.1016/j.chemolab.2013.07.011
Reference: CHEMOM 2689
To appear in: Chemometrics and Intelligent Laboratory Systems
Received date: 10 September 2012
Revised date: 10 July 2013
Accepted date: 20 July 2013
Please cite this article as: Wei Long, Jian Xiang, Hongying Wu, Weicheng Hu, Xiaodong Zhang, Jin Jin, Xin He, Xiu Shen, Zewei Zhou, Saijun Fan, QSAR modeling of iNOS inhibitors based on a novel regression method: Multi-stage adaptive regression, Chemometrics and Intelligent Laboratory Systems (2013), doi: 10.1016/j.chemolab.2013.07.011
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT QSAR modeling of iNOS inhibitors based on a novel regression method: Multi-stage adaptive regression
Wei Long1, Jian Xiang1, Hongying Wu1, Weicheng Hu2,3, Xiaodong Zhang1, Jin Jin1, Xin He1, Xiu Shen1, Zewei Zhou1, Saijun Fan1*
1 Institute of Radiation Medicine, Peking Union Medical College, Chinese Academy of Medical Sciences, Tianjin 300192, PR China
2 Jiangsu Key Laboratory for Eco-Agricultural Biotechnology around Hongze Lake, School of Life Sciences, Huaiyin Normal University, Huaian 223300, PR China
3 Jiangsu Key Laboratory for Biomass-based Energy and Enzyme Technology, Huaiyin Normal University, Huaian 223300, PR China
*Address for correspondence: Institute of Radiation Medicine, Peking Union Medical College, Chinese Academy of Medical Sciences, No. 238, Baidi Road, Nankai District, Tianjin 300192, China, E-mail:
[email protected] (S. Fan)
ABSTRACT
A novel regression method, multi-stage adaptive regression (MAR), was employed to build a quantitative structure-activity relationship (QSAR) model for predicting the activity of iNOS inhibitory compounds. The model is based on descriptors calculated from the molecular structure. Six descriptors were selected from the descriptor pool by the best multiple linear regression (BMLR) method. The MAR method produced a good model, with a squared correlation coefficient (R2) of 0.92 for the training set and 0.86 for the test set. A competing model was built using BMLR. The results show that the MAR model has better predictive ability and is more reliable than the BMLR model, which indicates that MAR could be a promising method in QSAR studies.
KEYWORDS
QSAR, regression method, iNOS, multi-stage adaptive regression
1. Introduction
Nitric oxide (NO), a biological signaling molecule, is produced from L-arginine by nitric oxide synthase (NOS) [1]. So far, three isoforms of this enzyme have been identified in mammals, each associated with different physiological functions: the neuronal NOS (nNOS) plays a role in neurotransmission and long-term potentiation, the endothelial NOS (eNOS) regulates smooth muscle relaxation and vascular tone, and the inducible NOS (iNOS) is expressed and activated during the immune response [2-4]. However, overproduction of NO by iNOS has been implicated in the pathogenesis of numerous diseases, including septic shock, asthma, inflammatory bowel disease, osteoarthritis, and rheumatoid arthritis [5]. Selective inhibition of iNOS would therefore be a useful therapy for such diseases, leading to reduction of inflammation, protection of the joint against erosion, possibly alleviation of the associated pain [6], and protection from radiation. The development of potent, selective and safe iNOS inhibitors is therefore highly desirable.
By relating the structure of molecules to their activity, QSAR has become popular in molecular design research. The advantage of this approach lies in the fact that the descriptors used can be calculated from the structure alone and do not depend on any experimental properties. Once the structure of a compound is known, any descriptor can be calculated, whether or not the compound has been synthesized. So once a reliable model is established, we can use it to predict the property of compounds and to see which structural factors play an important role in that property. The main steps involved in QSAR are data collection, molecular descriptor calculation and selection, correlation model development, and finally model evaluation. Among them, descriptor selection and correlation modeling are the most important. Therefore, it is crucial for a successful QSAR study to select suitable methods for these two steps [7].
In this study, the CODESSA program was used to calculate the descriptors. The best multiple linear regression (BMLR) method was then used to select the best set of descriptors. After that, multi-stage adaptive regression (MAR), a novel regression method, was employed to establish quantitative relationships between the EC50 values and the selected descriptors. To validate the advantage of the MAR method, we built a comparative model using BMLR. Finally, the results of the two models were compared to draw a conclusion. Besides applying the new methodology, this paper presents a significant model for predicting the activities of iNOS inhibitors, together with molecular structural modification information for improving the bioactivity of iNOS inhibitory compounds. It may help to illuminate the mechanism of the iNOS inhibitory effect of these compounds and could also facilitate the design of new drugs for iNOS-related diseases.
2. Methods
2.1 Dataset
In this study, all 52 compounds and their inhibitory activities towards iNOS were taken from the literature [8, 9]. A complete list of the compound structures and their corresponding experimental EC50 values is given in Table S1.
2.2 Descriptors generation
To obtain a QSAR model, the compounds are represented by molecular descriptors. The molecular descriptors were calculated as follows. All molecules were drawn and pre-optimized using the MM+ molecular mechanics force field in the HyperChem 7.0 program [10] to generate their 3D conformations. A more precise optimization was then carried out with the semi-empirical AM1 method in MOPAC 6.0 [11], which simultaneously calculated the properties of the compounds. The resulting geometries formed the input for the CODESSA software [12], which calculated constitutional, topological, geometrical, electrostatic and quantum chemical descriptors. Constitutional descriptors are related to the number of atoms and bonds in each molecule. Topological descriptors describe the atomic connectivity in the molecule; they include valence and non-valence molecular connectivity indices calculated from the hydrogen-suppressed formula of the molecule, encoding information about the size, composition, and degree of branching of the molecule. Geometrical descriptors describe the size of the molecule and require the 3D coordinates of its atoms. Electrostatic descriptors reflect characteristics of the charge distribution of the molecule. Quantum chemical descriptors offer information about binding and formation energies, partial atomic charges, dipole moment, and molecular orbital energy levels [13-15].
2.3 Descriptors selection
Successful QSAR depends on rational selection of descriptors. If molecular structures are represented by improper descriptors, they will not lead to reasonable predictions. The BMLR method in CODESSA is a very useful tool for searching for the best set of descriptors for multi-linear correlations. It has several advantages: it is easy to implement, the resulting equations are interpretable, and it offers a more systematic and thorough search of preferred descriptors. In BMLR, the number of orthogonal descriptors in the model is increased incrementally up to the optimum, as determined by the Fisher criterion at a given probability level and the cross-validated correlation coefficient. The model obtained with this procedure is expected to yield maximum predictive ability. A stepwise addition of further descriptor scales is performed to find the best multi-parameter regression models with the optimum values of the statistical criteria (highest values of the squared correlation coefficient (R2), the cross-validated Rcv2, and the F value). The influence of the dimension of the model on its prediction capability is tested by the leave-one-out cross-validation procedure. BMLR correlations are usually obtained much faster than with other methods, including the heuristic method, with comparable quality. The BMLR strategy of searching for the multi-parameter regression with the maximum predictive ability is described in detail in ref. [16]. In this study, a total of 587 descriptors were obtained, and the BMLR method was then used to find the best descriptor set for modeling with the optimum values of the statistical criteria (R2, mean error, and the F value).
2.4 Multi-stage adaptive regression (MAR)
Multi-stage adaptive regression is a novel regression method based on multiple regression, polynomial regression and an adaptive algorithm. The structure and design of this method are shown in Figure 1 (Figure 1 is here). The original descriptors (Xn) are adaptively transformed to generate a large number of new descriptors, such as Xn2, Xn3, sinXn, logXn, etc., so as to build a pool of input factors. These newly generated factors are selected by global searching to form the first-stage model, to which multiple regression is applied. Multiple regression [17], a classical regression method, attempts to model the relationship between two or more explanatory variables and a response variable by fitting an equation to observed data. In multiple regression, more than one variable is used to predict the criterion. Every set of values of the independent variables x is associated with a value of the dependent variable y. The population regression line for p explanatory variables x1, x2, ..., xp is defined as μy = β0 + β1x1 + β2x2 + ... + βpxp. This line describes how the mean response μy changes with the explanatory variables. After that, with the first-stage model treated as a single independent variable, the second-stage model is constructed by polynomial regression [18][19] in an adaptive optimization mode. Polynomial regression fits data to this equation:

$$Y = A + BX + CX^2 + DX^3 + EX^4 + \dots$$
You can include any number of terms. If you stop at the second (B) term, it is called a first-order polynomial equation, which is identical to the equation for a straight line. If you stop after the third (C) term, it is called a second-order, or quadratic, equation. If you stop after the fourth term, it is called a third-order, or cubic, equation. If you choose a second- or higher-order equation, the graph of Y vs. X will be curved (depending on your choice of A, B, C…). Nonetheless, the polynomial equation is not strictly a nonlinear equation. Holding X and the other parameters constant, a graph of any parameter (A, B, C…) vs. Y would be linear. From a mathematical point of view, the polynomial equation is linear. This means little to scientists, but it means a lot to a mathematician, because it is quite easy to write a program to fit data to linear equations. Because polynomial regression is related to linear regression, you don't have to enter any initial values. But there is a fundamental problem with polynomial regression: few biological or chemical models are described by polynomial equations. This means that the best-fit results can rarely be interpreted in terms of biology or chemistry. Polynomial regression can be useful to create a standard curve for interpolation, or to create a smooth curve for graphing.
In MAR, an adaptive algorithm [20] is used. An adaptive algorithm is an algorithm that changes its behavior based on the resources available. For example, stable partition using no additional memory is O(n lg n) in time, but given O(n) memory it can be O(n) in time. As implemented by the C++ Standard Library, stable partition is adaptive: it acquires as much memory as it can get (up to what it would need at most) and applies the algorithm using that available memory. To implement the design of MAR in a convenient toolbox, we created a piece of software in Visual C++, named Multi-stage Adaptive Regression, containing all the necessary steps and functions of MAR. The GUI of this software is shown in Figure S1.
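The two-stage idea described in this section can be illustrated with a minimal Python sketch. It is not the Visual C++ software mentioned above and makes several simplifying assumptions: only power transforms of the descriptors are generated (no sin or log terms), no global search over the factor pool is performed, and the second-stage polynomial degree is fixed rather than chosen adaptively. All function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (rows include the intercept column)."""
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)


def expand(sample, max_power):
    """Intercept plus powers 1..max_power of every descriptor (the pool of input factors)."""
    return [1.0] + [v ** p for v in sample for p in range(1, max_power + 1)]


def fit_two_stage(X, y, max_power=2, degree=2):
    # Stage 1: multiple regression on the expanded descriptors.
    stage1_rows = [expand(s, max_power) for s in X]
    beta = least_squares(stage1_rows, y)
    f = [sum(b * v for b, v in zip(beta, row)) for row in stage1_rows]
    # Stage 2: polynomial regression with the stage-1 output as the single variable.
    stage2_rows = [[fi ** d for d in range(degree + 1)] for fi in f]
    gamma = least_squares(stage2_rows, y)
    return beta, gamma


def predict(sample, beta, gamma, max_power=2):
    f = sum(b * v for b, v in zip(beta, expand(sample, max_power)))
    return sum(g * f ** d for d, g in enumerate(gamma))


# Toy data (hypothetical): one descriptor, response exactly quadratic in it.
X = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
y = [1.0 + 2.0 * s[0] - s[0] ** 2 for s in X]
beta, gamma = fit_two_stage(X, y)
preds = [predict(s, beta, gamma) for s in X]
```

The normal equations are used only to keep the sketch free of dependencies; with many correlated power terms they become ill-conditioned, and a QR-based solver would be the safer choice.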
2.5 Model validation
We carried out external validation by randomly splitting the dataset into a training set and a validation set in a proportion of 70:30. In the end, 15 compounds were selected to constitute the test set, and these compounds were not involved in any way in the training procedure. To assess the robustness, reliability and predictive ability of the model, the following statistical criteria were employed in this paper. The most common objective criteria used to assess the success of a QSAR model are the squared correlation coefficient R2 and the root mean squared error (RMSE) statistic, which are defined as follows [21]:
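In their usual textbook form, which is assumed here rather than taken from ref. [21], these two statistics can be computed as in the following Python sketch. The function names and the four sample values are illustrative only.

```python
import math


def r_squared(y_obs, y_pred):
    """R2 in its usual form: 1 - (residual sum of squares) / (total sum of squares)."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_pred):
    """Root mean squared error over n compounds."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


# Hypothetical measured and predicted activities.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y_obs, y_pred), rmse(y_obs, y_pred))
```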
Then, the efficiency and stability of the QSAR models were estimated using the leave-one-out (LOO) cross-validation correlation for the training set. In light of Tropsha's criteria [22], the predictive ability of a QSAR model should be tested on an external set of data that has not been taken into account during the process of developing the model. Aside from R2pred, the following statistical indices have been proposed to assess the predictive power of a QSAR model: R2ext, R20 and k. In their definitions, ntest is the number of compounds that constitute the validation data set, ȳtr is the average value of the dependent variable for the training set, yi and ỹi, i = 1, …, ntest, are the measured values and the QSAR model predictions of the dependent variable over the available validation set, and the mean prediction is the average over all ỹi, i = 1, …, ntest. A QSAR model is considered predictive if the following conditions are satisfied:
R2ext > 0.5; R2pred > 0.6; (R2pred - R20)/R2pred < 0.1; 0.85 ≤ k ≤ 1.15.
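As a hedged illustration, the four conditions can be checked with the Python sketch below. The formulas are the commonly used Golbraikh-Tropsha style definitions and are an assumption, since this section only names the indices: R2pred is taken as the squared Pearson correlation between measured and predicted values, R2ext uses the training-set mean in its denominator, k is the slope of the regression through the origin of measured on predicted values, and R20 is the corresponding coefficient of determination. The function names and sample values are hypothetical.

```python
def external_validation_stats(y_obs, y_pred, y_train_mean):
    """Compute R2pred, R2ext, k and R20 under the assumed definitions."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    mean_pred = sum(y_pred) / n
    s_oo = sum((o - mean_obs) ** 2 for o in y_obs)
    s_pp = sum((p - mean_pred) ** 2 for p in y_pred)
    s_op = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(y_obs, y_pred))
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    # Slope of the regression through the origin (measured vs. predicted).
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    return {
        "r2_pred": s_op ** 2 / (s_oo * s_pp),
        "r2_ext": 1.0 - ss_res / sum((o - y_train_mean) ** 2 for o in y_obs),
        "k": k,
        "r2_0": 1.0 - sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred)) / s_oo,
    }


def is_predictive(s):
    """Apply the four thresholds listed in the text."""
    return (s["r2_ext"] > 0.5
            and s["r2_pred"] > 0.6
            and (s["r2_pred"] - s["r2_0"]) / s["r2_pred"] < 0.1
            and 0.85 <= s["k"] <= 1.15)


# Hypothetical test-set values and training-set mean.
stats = external_validation_stats([5.2, 6.1, 6.8, 7.5], [5.0, 6.3, 6.9, 7.3], 6.5)
print(stats, is_predictive(stats))
```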
2.6 Y-randomization
The Y-randomization test, which checks the robustness and statistical significance of a QSAR study, was also used to validate the MAR model. It consists of repeating the calculation procedure several times after shuffling the Y vector randomly. The models derived after several repetitions are expected to have less significant correlation coefficient values than the original model. If all models obtained by the Y-randomization test have relatively high R2 values, this is due to chance correlation and implies that the current modeling method cannot lead to an acceptable model using the available data set [23].
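The procedure can be sketched in Python as follows. To keep the example short, a one-descriptor least-squares line stands in for the actual model, which is an assumption made purely for illustration; the data, the seed and the function names are hypothetical.

```python
import random


def fit_r2(x, y):
    """R2 of a one-descriptor least-squares line (a stand-in for the real model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def y_randomization(x, y, n_shuffles=10, seed=0):
    """Refit the model on randomly shuffled responses and collect the R2 values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        y_shuffled = y[:]
        rng.shuffle(y_shuffled)  # break the descriptor-response pairing
        scores.append(fit_r2(x, y_shuffled))
    return scores


# Hypothetical data with a perfect linear relationship.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
r2_true = fit_r2(x, y)
r2_random = y_randomization(x, y)
print(r2_true, max(r2_random))
```

If the shuffled fits scored about as well as the original one, the original correlation would have to be attributed to chance.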
2.7 Domain of application
QSAR models used for prediction purposes require an estimation of the applicability domain (AD). The Organisation for Economic Cooperation and Development (OECD) QSAR Validation Principles [24] state that: “The applicability domain (AD) of a (Q)SAR is the physico-chemical, structural, or biological space, knowledge or information on which the training set of the model has been developed, and for which it is applicable to make predictions for new compounds. The AD of a (Q)SAR should be described in terms of the most relevant parameters i.e. usually those that are descriptors of the model.” The goal of the AD approach is to estimate the prediction accuracy for a new unknown compound, regardless of our naive interpretation of its similarity to the molecules from the training and test sets used to construct and validate the model [25].
In this work, similarity measurements were used to define the domain of applicability of the models based on the Euclidean distances among all training compounds and the test compounds. The distance of a test compound to its nearest neighbor in the training set was compared with the predefined applicability domain (APD) threshold. The prediction was considered unreliable when the distance was higher than APD [26]. APD was calculated as follows:

$$\mathrm{APD} = \langle d \rangle + Z\sigma$$

$\langle d \rangle$ and σ were calculated as follows. First, the average of the Euclidean distances between all pairs of training compounds was calculated. Next, the set of distances lower than this average was formed. $\langle d \rangle$ and σ were finally calculated as the average and standard deviation of all distances included in this set. Z is an empirical cutoff value, and for this work it was chosen equal to 0.5 [21].
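The steps above can be sketched in Python as follows. The text does not say whether the population or the sample standard deviation is meant, so the population form is assumed here; the function names and the two-dimensional toy descriptors are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def apd_threshold(train, z=0.5):
    """APD = <d> + Z*sigma over the training-pair distances below the overall mean."""
    dists = [euclidean(a, b) for a, b in combinations(train, 2)]
    mean_all = sum(dists) / len(dists)
    low = [d for d in dists if d < mean_all]  # distances lower than the average
    d_mean = sum(low) / len(low)
    # Population standard deviation (an assumption, see the lead-in).
    sigma = math.sqrt(sum((d - d_mean) ** 2 for d in low) / len(low))
    return d_mean + z * sigma


def in_domain(compound, train, apd):
    """A prediction is treated as reliable if the nearest training compound is within APD."""
    nearest = min(euclidean(compound, t) for t in train)
    return nearest <= apd


# Hypothetical training descriptors.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
apd = apd_threshold(train)
print(apd, in_domain((0.5, 0.5), train, apd), in_domain((10.0, 10.0), train, apd))
```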
3. Results
3.1 Results of BMLR
Five hundred and eighty-seven descriptors for all the compound structures were calculated using the CODESSA program. To select the descriptors responsible for the iNOS inhibitory activity, stepwise regression was performed by BMLR. After the initial reduction, the pool of descriptors was reduced to 320. Then, a six-descriptor model was selected as the best parameter set. This optimum number of descriptors was determined using a simple “breaking point” rule. A variety of subset sizes were investigated to determine the optimum number of descriptors in the model. The optimum number, the so-called “breaking point”, is reached when the addition of a descriptor no longer improves the statistics of the model as significantly as before. In the plot of R2 versus the number of descriptors involved (Figure 2, Figure 2 is here), we can find two breaking points. Although the first one seems to be more significant, we finally chose the second one as the best breaking point, in light of the criterion that the point at which the increase in R2 falls to about 0.02 should be chosen as the best breaking point. Using the six-descriptor set as input, a linear QSAR model was built by BMLR. The predicted pEC50 values of the compounds are listed in Table S2. The selected descriptors with their chemical meaning and the details of the six-parameter model are presented in Table 1 and Table 2.
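One possible reading of the “breaking point” rule is sketched below in Python: keep adding descriptors until the next one raises R2 by less than a threshold of 0.02. This is a simplification and does not reproduce the choice between two candidate breaking points described above. The function name and the R2 sequence are hypothetical; only the value 0.83 for six descriptors corresponds to Table 2.

```python
def breaking_point(r2_by_size, min_gain=0.02):
    """Return the number of descriptors after which one more gains less than min_gain.

    r2_by_size[i] is the R2 of the best model with i + 1 descriptors.
    """
    for n in range(1, len(r2_by_size)):
        if r2_by_size[n] - r2_by_size[n - 1] < min_gain:
            return n
    return len(r2_by_size)


# Hypothetical R2 values for models with 1 to 8 descriptors.
r2_values = [0.45, 0.60, 0.70, 0.76, 0.80, 0.83, 0.84, 0.845]
print(breaking_point(r2_values))
```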
Table 1 Molecular descriptors involved in the BMLR model, with their chemical meaning, coefficients and t-test values.

Descriptor    Chemical meaning                        Coefficient   t-test
Constant      Intercept                               95.9          6.34
NF            Number of F atoms                       1.15          6.05
Etot(CO)      Max total interaction for a C-O bond    -1.55         -2.14
FNSA2         FNSA-2 fractional PNSA                  -4.77         -7.02
Ene,min(N)    Min e-n attraction for a N atom         -0.13         -5.07
Nocc          Number of occupied electronic levels    -8.45         -3.71
G             Gravitation index                       -0.0027       -2.98
Table 2 Comparison of R2, RLOO2 and RMSE for the BMLR and MAR models.

          Training set                Test set
Method    R2tr    RLOO2   RMSE        R2pr    RMSE
BMLR      0.83    0.72    0.33        0.79    0.40
MAR       0.92    0.84    0.23        0.86    0.34
3.2 Results of MAR
The six selected descriptors were put into the software “Multi-stage Adaptive Regression” to obtain a better model for predicting the iNOS inhibitory activity of the compounds. As a necessary step, a first-stage model was built. In this process, five power indexes were generated adaptively by the program to produce the best model. The details of these parameters are shown in Table 3.

Table 3 The condition of parameters in the first-stage model (column headers X1 to X6; row labels X, X2, X3, X4, X5). The cell values are listed in the order in which they appear, one group per line:
1.59, -0.185, -
0.0186, -3.77×10^-6, -
6.89×10^4, -6.31×10^4, 2.56×10^4, -3.90×10^4, -
-4.93×10^4, -5.65×10^4, -3.22×10^4, -9.20×10^4, -1.04×10^3
484, -3.17, 8.52×10^-3, -8.19×10^-6, -
-2.39, -
For the next step, the first-stage model obtained was input into the adaptive optimization module as a single whole factor x to build the second-stage model, which is as follows:

$$\ln(y) = -78 + 131x - 89.3x^2 + 31.8x^3 - 6.25x^4$$

where y is the pEC50 value. The predicted pEC50 values of the compounds are listed in Table S2, and the statistical results of the final model are given in Table 2.
4. Discussion
4.1 Comparison between the results of MAR and BMLR
To check the relative effectiveness of the two models, MAR and BMLR, we compared the prediction accuracy of the two methods for the training and test sets. Moreover, leave-one-out cross-validation was employed to estimate the reliability of the two models. As can be seen from Table 2, compared with the linear BMLR model, the prediction accuracy of the MAR model was improved, with the R2 value rising from 0.83 to 0.92 for the training set and from 0.79 to 0.86 for the test set. Furthermore, the LOO cross-validation results (0.84 for MAR compared with 0.72 for BMLR) show that the MAR method is considerably more reliable than BMLR. We plotted the predicted data of the two models in a scatter plot in Figure 3 (Figure 3 is here) to compare their goodness-of-fit and prediction accuracy. In the graph, the points of MAR are clearly closer to the diagonal line than those of BMLR, which means that MAR made better predictions. Based on the above, the new regression method MAR shows better predictive capability and reliability in this study, and the corresponding predicted results indicate an appropriate fit of the model.
4.2 Discussion of descriptors
The developed QSAR models should not only offer a reliable prediction capability but also give some insight into the factors that are likely to influence the iNOS inhibitory effect, through interpretation of the meaning of the selected descriptors.
(1) Number of occupied electronic levels
The number of occupied electronic levels [27], a quantum chemical descriptor, reflects the distribution of electron density within a molecule. It is generally deemed that the higher the number of the electronic shell, the more polarizable the molecule. In the BMLR model, this descriptor has the most significant negative regression coefficient, resulting in a decrease of the predicted pEC50 values. This indicates that the higher the number of occupied electronic levels of the compound, the lower the iNOS inhibitory activity. In other words, polarization of the molecule is not favorable for improving the iNOS inhibitory effect.
(2) FNSA-2 fractional PNSA
FNSA-2 fractional PNSA is a charged partial surface area descriptor [28, 29], which can be expressed as follows:

$$\mathrm{FNSA\text{-}2} = \frac{\mathrm{PNSA\text{-}2}}{\mathrm{TMSA}}$$

where FNSA-2 represents the fractional total charge weighted partial negative surface area, PNSA-2 represents the total charge weighted partial negatively charged surface area, and TMSA is the total molecular surface area. This descriptor has another significant negative coefficient in the BMLR model, which indicates that negative surface on the molecule will not benefit the inhibitory effect of a compound towards iNOS.
(3) Max total interaction for a C-O bond
This descriptor is the maximum total interaction energy between C and O atoms in the molecule, defined as follows:

$$E_{tot}(CO) = E_{c}(CO) + E_{exc}(CO)$$

where Ec(CO) is the electrostatic interaction energy and Eexc(CO) is the electronic exchange energy between C and O atoms in C-O bonds. A higher value of this descriptor leads to a decrease of the iNOS inhibitory effect. Calculating this descriptor will also be helpful for designing better inhibitors of iNOS.
(4) Number of F atoms
This is a simple constitutional descriptor, the number of F atoms in the compound. Moreover, it has the only positive coefficient in the BMLR model. Therefore, more F atoms in the molecule may result in higher activity for an iNOS inhibitory compound.
(5) Min e-n attraction for a N atom
This descriptor represents the minimum nuclear-electron attraction energy for a N atom in the molecule, calculated as follows [30]:

$$E_{ne,\min}(N) = \sum_{B} \sum_{\mu,\nu \in N} P_{\mu\nu} \left\langle \mu \left| \frac{Z_B}{R_{iB}} \right| \nu \right\rangle$$

The first summation is performed over all atomic nuclei in the molecule, and $\langle \mu | Z_B / R_{iB} | \nu \rangle$ denotes the nuclear-electron attraction integrals on the given atomic basis. This energy describes the nuclear-electron attraction driven processes in the molecule and may again be related to the conformational (rotational, inversional) changes or atomic reactivity in the molecule. This descriptor indicates that the N atom is another important factor that influences the iNOS inhibitory effect of a compound.
(6) Gravitation index
This is a geometrical descriptor, defined as follows [31]:

$$G = \sum_{i<j}^{N} \frac{m_i m_j}{r_{ij}^2}$$

where mi and mj are the atomic weights of atoms i and j, rij is the interatomic distance, and N is the number of atoms or bonds in the molecule. Although the coefficient of this descriptor is very small in the BMLR model, it was nevertheless selected as a necessary factor, which suggests that the geometrical property, gravitation, is also important for an iNOS inhibitor.
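The gravitation index formula can be illustrated with a short Python sketch. It assumes the all-atom-pairs variant (the sum runs over every pair i < j of atoms rather than over bonded pairs only); the function name, masses and coordinates are hypothetical and do not correspond to any compound in the dataset.

```python
from itertools import combinations


def gravitation_index(masses, coords):
    """Sum of m_i * m_j / r_ij^2 over all atom pairs i < j."""
    total = 0.0
    for i, j in combinations(range(len(masses)), 2):
        r_squared = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        total += masses[i] * masses[j] / r_squared
    return total


# Hypothetical three-atom system with made-up masses and 3D coordinates.
masses = [2.0, 3.0, 4.0]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
g_index = gravitation_index(masses, coords)
print(g_index)
```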
4.3 Validation of MAR methodology
To demonstrate the accuracy, significance and robustness of the produced MAR model, and the absence of chance correlations, the model was validated using the strategies described in the Methods section. The statistics are presented in Table 2. As can be seen, the MAR methodology results in a robust and accurate model that could be reliably used to predict the bioactivities of iNOS inhibitors. Figure 3 presents a plot of experimental versus predicted values of iNOS inhibition for the compounds in the training and test sets. The MAR model also passed Tropsha's recommended tests [32] for predictive ability:
R2pred = 0.86 > 0.6;
R2ext=>0.5;
(R2pred - R20)/R2pred = 0.0023 < 0.1;
(R2pred - R'20)/R2pred = 0.00035 < 0.1;
|R20 - R'20| = 0.0017 < 0.1;
k = 0.996 ≈ 1;
k' = 1.004 ≈ 1.
Apart from the concern of generalizability, the high internal validation performance of the MAR model might be a result of chance correlation. To address this problem, the model was validated by Y-randomization of the experimental activity values, which was performed to rule out the possibility of chance correlation. For 10 random shuffles of the Y vector, the correlation coefficient values were in the range of 0.02 to 0.28. These low values indicate that the results from the MAR model were not due to chance correlation.
The applicability domain was defined for all compounds that constituted the training set, as described in the Methods section. The APD value for the training compounds was equal to 4.72. In the similarity measurements, all compounds in the test set had values in the range of 0.02-3.7, which means that all the predictions for the compounds in the test set can be considered reliable.
5. Conclusion
This study focused on the development of QSAR models for predicting the iNOS inhibitory effect of a class of compounds using a novel method, MAR. At the same time, a comparative model was built using BMLR. The results show that the MAR model has better predictive ability and is more reliable than the BMLR model. This indicates that MAR is a promising method in QSAR studies. In addition, a significant model was presented for predicting the iNOS inhibitory effect, and molecular structural information was discussed for improving the activities of iNOS inhibitors. This work is expected to help the development of new drugs against diseases related to iNOS. Additionally, MAR is a new method, and more studies will be carried out to demonstrate its potential value in the field of QSAR/QSPR research.
Acknowledgement
This work was financially supported by the National Natural Science Foundation of China (No. 81202153), the PUMC Youth Fund and the Fundamental Research Funds for the Central Universities (No. 3332013104), the Doctoral Program of Higher Education of China (No. 20121106120042) and the Development Fund of the Institute of Radiation Medicine, Chinese Academy of Medical Sciences (No. SF1227, No. SZ1337).
References and notes
[1] B. Brune, Nitric oxide: a short lived molecule stays alive, Pharmacol Res, 61 (2010) 265-268.
[2] K. Bian, F. Murad, Nitric oxide (NO)--biogeneration, regulation, and relevance to human diseases, Front Biosci, 8 (2003) d264-278.
[3] T. Thippeswamy, J.S. McKay, J.P. Quinn, R. Morris, Nitric oxide, a biological double-faced janus--is this good or bad?, Histol Histopathol, 21 (2006) 445-458.
[4] S. Mariotto, M. Menegazzi, H. Suzuki, Biochemical aspects of nitric oxide, Curr Pharm Des, 10 (2004) 1627-1645.
[5] A.J. Duncan, S.J. Heales, Nitric oxide and neurological disorders, Mol Aspects Med, 26 (2005) 67-96.
[6] S. Connolly, A. Aberg, A. Arvai, H.G. Beaton, D.R. Cheshire, A.R. Cook, S. Cooper, D. Cox, P. Hamley, P. Mallinder, I. Millichip, D.J. Nicholls, R.J. Rosenfeld, S.A. St-Gallay, J. Tainer, A.C. Tinker, A.V. Wallace, 2-aminopyridines as highly selective inducible nitric oxide synthase inhibitors. Differential binding modes dependent on nitrogen substitution, J Med Chem, 47 (2004) 3320-3323.
[7] W. Long, P. Liu, Quantitative structure activity relationship modeling for predicting radiosensitization effectiveness of nitroimidazole compounds, J Radiat Res, 51 (2010) 563-572.
[8] C. Bonnefous, J.E. Payne, J. Roppe, H. Zhuang, X. Chen, K.T. Symons, P.M. Nguyen, M. Sablad, N. Rozenkrants, Y. Zhang, L. Wang, D. Severance, J.P. Walsh, N. Yazdani, A.K. Shiau, S.A. Noble, P. Rix, T.S. Rao, C.A. Hassig, N.D. Smith, Discovery of inducible nitric oxide synthase (iNOS) inhibitor development candidate KD7332, part 1: Identification of a novel, potent, and selective series of quinolinone iNOS dimerization inhibitors that are orally active in rodent pain models, J Med Chem, 52 (2009) 3047-3062.
[9] S.G. Duron, A. Lindstrom, C. Bonnefous, H. Zhang, X. Chen, K.T. Symons, M. Sablad, N. Rozenkrants, Y. Zhang, L. Wang, N. Yazdani, A.K. Shiau, S.A. Noble, P. Rix, T.S. Rao, C.A. Hassig, N.D. Smith, Heteroaromatic-aminomethyl quinolones: potent and selective iNOS inhibitors, Bioorg Med Chem Lett, 22 (2012) 1237-1241.
[10] HyperChem 7.0, Version 2.7.10.
[11] MOPAC 6.0, Quantum Chemistry Program Exchange, QCPE No. 455, Indiana University, Bloomington, IN, 1989.
[12] CODESSA. Comprehensive Descriptors for Structural and Statistical Analysis, Version 2.7.10.
[13] M. Karelson, Molecular Descriptors in QSAR/QSPR, John Wiley & Sons, New York, 2000.
[14] R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, Weinheim, Germany, 2000.
[15] J. Devillers, A.T. Balaban, Topological Indices and Related Descriptors in QSAR and QSPR, Gordon and Breach, Amsterdam, The Netherlands, 1999.
[16] A.R. Katritzky, V.S. Lobanov, M. Karelson, Comprehensive Descriptors for Structural and Statistical Analysis, Reference Manual, Version 2.7.10.
[17] J.M. Luco, F.H. Ferretti, QSAR based on multiple linear regression and PLS methods for the anti-HIV activity of a large group of HEPT derivatives, J Chem Inf Comput Sci, 37 (1997) 392-401.
[18] D.G. Kleinbaum, et al., Applied Regression Analysis and Other Multivariable Methods (3rd edition), Duxbury Press, 1998.
[19] P. Armitage, G. Berry, Statistical Methods in Medical Research (3rd edition), Blackwell, 1994.
[20] Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Adaptive_algorithm
[21] G. Melagraki, A. Afantitis, Enalos KNIME nodes: Exploring corrosion inhibition of steel in acidic medium, Chemometrics and Intelligent Laboratory Systems, (2013).
[22] A. Tropsha, Best practices for QSAR model development, validation, and exploitation, Molecular Informatics, 29 (2010) 476-488.
[23] A. Afantitis, G. Melagraki, H. Sarimveis, P.A. Koutentis, O. Igglessi-Markopoulou, G. Kollias, A combined LS-SVM & MLR QSAR workflow for predicting the inhibition of CXCR3 receptor by quinazolinone analogs, Molecular Diversity, 14 (2010) 225-235.
[24] OECD, Principles for the Validation, for Regulatory Purposes, of (Quantitative) Structure–Activity Relationship Models, OECD, Paris, France, 2004.
[25] Š. Župerl, S. Fornasaro, M. Novič, S. Passamonti, Experimental determination and prediction of bilitranslocase transport activity, Analytica Chimica Acta, 705 (2011) 322-333.
[26] A. Afantitis, G. Melagraki, P.A. Koutentis, H. Sarimveis, G. Kollias, Ligand-based virtual screening procedure for the prediction and the identification of novel β-amyloid aggregation inhibitors using Kohonen maps and Counterpropagation Artificial Neural Networks, European Journal of Medicinal Chemistry, 46 (2011) 497-508.
[27] A.R. Katritzky, A.A. Oliferenko, P.V. Oliferenko, R. Petrukhin, D.B. Tatham, U. Maran, A. Lomaka, W.E. Acree, Jr., A general treatment of solubility. 1. The QSPR correlation of solvation free energies of single solutes in series of solvents, J Chem Inf Comput Sci, 43 (2003) 1794-1805.
[28] A.R. Katritzky, A.A. Oliferenko, P.V. Oliferenko, R. Petrukhin, D.B. Tatham, U. Maran, A. Lomaka, W.E. Acree, Jr., A general treatment of solubility. 2. QSPR prediction of free energies of solvation of specified solutes in ranges of solvents, J Chem Inf Comput Sci, 43 (2003) 1806-1814.
[29] D.T. Stanton, P.C. Jurs, Development and use of charged partial surface area structural descriptors in computer assisted quantitative structure property relationship studies, Anal. Chem., 62 (1990) 2323-2322.
[30] E. Clementi, Computational Aspects of Large Chemical Systems, Springer Verlag, New York, 1980.
[31] A.R. Katritzky, L. Mu, V.S. Lobanov, Correlation of Boiling Points with Molecular Structure. 1. A Training Set of 298 Diverse Organics and a Test Set of 9 Simple Inorganics, J. Phys. Chem., 100 (1996) 10400-10407.
[32] A. Golbraikh, M. Shen, Z. Xiao, Y.-D. Xiao, K.-H. Lee, A. Tropsha, Rational selection of training and test sets for the development of validated QSAR models, Journal of Computer-Aided Molecular Design, 17 (2003) 241-253.
Figure 1
The workflow of the multi-stage adaptive regression
Figure 2
Influences of the number of descriptors on the correlation coefficient (R2)
Figure 3 Plots of predicted pEC50 versus experimental values for the compounds in the training and test set by BMLR and by MAR.
Highlights:
1. This paper presented a significant model for predicting the activities of iNOS inhibitors, and molecular structural modification information for improving the bioactivity of iNOS inhibitory compounds.
2. A novel regression method, Multi-stage adaptive regression (MAR), for QSAR was applied. MAR is an original method for the first use in QSAR study. The results showed that it is a promising method and worth spreading.