Ecotoxicology and Environmental Safety 153 (2018) 175–180
In silico study toward the identification of new and safe potential inhibitors of photosynthetic electron transport
Taisa Pereira Piacentini Ribeiro a, Flávia Giovana Manarin b, Eduardo Borges de Melo a,⁎
a Dept. of Pharmacy, Western Paraná State University, UNIOESTE, Cascavel, PR, Brazil
b Dept. of Chemistry, Western Paraná State University, UNIOESTE, Toledo, PR, Brazil
ARTICLE INFO
Keywords: Quantitative structure-activity relationship; Virtual screening; In silico toxicology; New herbicides; Quinolines; Naphthalenes
ABSTRACT
To address the rising global demand for food, new herbicides that can control resistant weeds are needed. We performed a 2D quantitative structure-activity relationship (QSAR) study to predict compounds with photosynthesis-inhibitory activity, using a data set of 44 compounds (quinolines and naphthalenes) described as photosynthetic electron transport (PET) inhibitors. The obtained model passed internal and external validation tests. A 2D similarity-based virtual screening was then performed, and 64 compounds were selected from the ZINC database. Using the VEGA QSAR software, 48 of these compounds were predicted to have potential toxic effects (mutagenicity and carcinogenicity). The model was therefore applied to the remaining 16 molecules, of which six showed good predicted inhibition of PET. The obtained model shows potential utility in the design of new PET inhibitors, and the hit compounds found by virtual screening are novel bicyclic scaffolds of this class.
1. Introduction
Agricultural pests and weeds reduce crop yields. When used in no-tillage practices, chemical agents with herbicidal activity help to reduce soil erosion, increase nutrient flow, and assist in water conservation. Chemical control also requires less labor than mechanical control methods. Consequently, the annual growth rate of the global herbicide market has increased (Green, 2014). However, the highly intensive use of these substances has resulted in widespread pesticide pollution of the environment. In addition, pesticides have been shown to cause various kinds of organ toxicity, including some types of cancer (Wasi et al., 2013).
Among herbicidal agents, those that act as photosystem II (PSII) inhibitors are the most commonly used in agriculture. These agents inhibit the photochemical phase of photosynthesis and consequently NADPH and ATP production, interrupting carbon fixation by plants. Because the electrons cannot store chemical energy, they form free radicals, which lead to lipid peroxidation of the membrane, resulting in necrosis and death of weeds (Oliveira et al., 2011; Hess, 2000). Several herbicides that act by this mechanism, for instance, atrazine, diuron, and metribuzin, are currently available. However, environmental and safety issues, which have resulted in the discontinuation of some herbicides, and the evolution of herbicide-resistant weeds, combined with the fact that no herbicides with new mechanisms of action have become available in the last few decades, have created the need to develop new chemical agents as herbicides (Duke, 2012).
In medicinal chemistry, the concept of privileged structures refers to the idea that certain structural features produce biological effects more often than others (Polanski et al., 2012). These structures include the quinoline (Hussaini, 2016) and naphthalene (Horton et al., 2003) scaffolds. Computer-aided molecular design tools are currently very important in the rational design of new biologically active chemicals and can be used to develop new molecules based on privileged structures. Among these tools, quantitative structure-activity relationship (QSAR) models describe how a given biological activity varies as a function of molecular structure in a set of chemical compounds (Csizmadia and Enriz, 2000). Although these tools are widely used in drug development and in several studies on environmental toxicology, they are still little explored in the development of new herbicidal agents. QSAR models can also be used to predict the response of new herbicide candidates (González et al., 2003; Zuo et al., 2016; Sharma, 2016). Therefore, we performed a multivariate QSAR study based on a set of 44 derivatives (Musiol et al., 2007; Gonec et al., 2013), with the objective of obtaining models that can serve as support tools for designing new PSII inhibitors. A 2D similarity-based virtual screening
Correspondence to: Theoretical Medicinal and Environmental Chemistry Laboratory (LQMAT), Dept. of Pharmacy, 2069 Universitária St, 85819110, Cascavel, PR, Brazil. E-mail address:
[email protected] (E. Borges de Melo).
https://doi.org/10.1016/j.ecoenv.2018.02.016 Received 5 November 2017; Received in revised form 31 January 2018; Accepted 2 February 2018 0147-6513/ © 2018 Elsevier Inc. All rights reserved.
value; and (v) descriptors with a correlation with another descriptor greater than or equal to 0.90. A manual reduction was also performed to remove variables that still showed minor variation. The final reduction step was performed using the QSAR modeling software LQTA-QSAR (Martins and Ferreira, 2009) (http://lqta.iqm.unicamp.br), in which descriptors with an absolute correlation with biological activity (|r|) below 0.2, which therefore carried no relevant information for model construction, were excluded. In the end, a matrix with 337 descriptors was obtained.
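The reduction filters described above can be sketched in a few lines. The following is a minimal illustration in plain Python, not the Dragon 6 or QSAR modeling implementation: the descriptor names and values are made up, the filters are applied in a single pass, and keeping the first-seen member of a highly intercorrelated pair is an assumption.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def reduce_descriptors(table, activity, pair_cut=0.90, activity_cut=0.2, min_sd=0.001):
    """Return the names of descriptors that survive three simple filters."""
    kept = []
    for name, values in table.items():
        if pstdev(values) < min_sd:
            continue  # constant or near-constant descriptor
        if abs(pearson(values, activity)) < activity_cut:
            continue  # |r| with biological activity below the cut-off
        if any(abs(pearson(values, table[k])) >= pair_cut for k in kept):
            continue  # highly correlated with a descriptor already kept
        kept.append(name)
    return kept


# Hypothetical data for five compounds (illustration only).
activity = [1.0, 2.0, 3.0, 4.0, 5.0]
table = {
    "d_const": [1.0, 1.0, 1.0, 1.0, 1.0],
    "d_a": [1.1, 1.9, 3.2, 3.9, 5.1],
    "d_b": [2.2, 3.8, 6.4, 7.8, 10.2],  # exactly 2 * d_a
    "d_noise": [3.0, 1.0, 2.0, 1.0, 3.0],  # uncorrelated with activity
}
kept = reduce_descriptors(table, activity)
print(kept)
```

In this toy table only `d_a` survives: `d_const` has no variance, `d_noise` is uncorrelated with the activity, and `d_b` duplicates the information in `d_a`.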
was also performed, with the objective of identifying a related scaffold for the future synthesis of new derivatives.
Fig. 1. Basic structure of quinolines and naphthalenes used as data set.
2.3. Variable selection and construction of models
In QSAR studies, the variable selection process is usually conducted in an automated manner owing to the large number of descriptors available. Here, it was carried out in the QSAR modeling software using the ordered predictors selection (OPS) method (Teófilo et al., 2009). This method uses partial least squares (PLS) regression (Liu and Long, 2009) to assign importance to each descriptor based on three possible informative vectors: the correlation vector, the regression vector, and the product of the two. The final models were also constructed using PLS regression.
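To make the role of latent variables in PLS regression concrete, the sketch below implements single-response PLS (PLS1) with the NIPALS algorithm in plain Python. It is an illustrative assumption about how such a model can be fitted, not the code of the QSAR modeling software, and the small data set is invented. Descriptors are only mean-centered here, whereas autoscaling is common in QSAR practice.

```python
def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; returns means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X of maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt == 0:
            break
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y loading
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict the response of one sample by projecting it on each latent variable."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


# Invented example: y depends exactly on two descriptors.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 7.0]]
y = [2.0 * a + 3.0 * b + 1.0 for a, b in X]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [6.0, 5.0])
print(round(prediction, 6))
```

With as many latent variables as independent descriptors, PLS1 coincides with ordinary least squares, which is why the noise-free toy data are reproduced exactly; in a real QSAR model fewer latent variables than descriptors are retained.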
2. Methods
2.1. Data set
The data set for this study consisted of 44 selected derivatives, including 26 quinolines and 18 naphthalenes (Fig. 1 and Supplementary material, Table S1), with reported concentrations causing 50% inhibition of photosynthesis in spinach chloroplasts (IC50, in µmol/L) (Musiol et al., 2007; Gonec et al., 2013). The observed values were converted into their corresponding –logIC50 (or pIC50). The activities spanned a range of 2.329 log units (pIC50 from 2.796 to 5.125). For the validation step, a training set consisting of 34 compounds and a test set consisting of ten compounds were used. The test set was selected to adequately represent the structural variability and biological activity range of the data set. The structures of the data-set compounds were built using HyperChem 7 (Hyper Co.) from crystallographic structures (CIF codes: 2201734, 7213893, 2208696, and 2218249) obtained from the Crystallography Open Database (Gražulis et al., 2012) (http://www.crystallography.net/cod). All structures were optimized using molecular mechanics and quantum mechanics strategies until the energy obtained no longer varied, indicating a possible minimum energy structure. The compounds were then optimized using quantum mechanics in Gaussian 09 (http://www.gaussian.com) by applying the Austin Model 1 (AM1) and Hartree-Fock (HF/6-31G(d,p)) methods. In the final step, density functional theory (DFT) (B3LYP/6-311++G(d,p)) was used.
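The conversion of IC50 into pIC50 can be illustrated as follows. The sketch assumes that the IC50 values, reported in µmol/L, are first converted to mol/L before taking the negative decimal logarithm; this unit convention is an assumption, although it is consistent with the reported pIC50 range. The function name is hypothetical.

```python
import math


def pic50_from_ic50_umol_per_l(ic50_umol_per_l):
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in µmol/L."""
    ic50_mol_per_l = ic50_umol_per_l * 1e-6  # assumed unit conversion
    return -math.log10(ic50_mol_per_l)


# A 10 µmol/L inhibitor corresponds to pIC50 = 5; 1000 µmol/L to pIC50 = 3.
print(pic50_from_ic50_umol_per_l(10.0), pic50_from_ic50_umol_per_l(1000.0))
```

Under this convention, higher pIC50 values correspond to more potent inhibitors, so a positive regression coefficient in a model means that the descriptor favors inhibition.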
2.4. Validation of models
Validation methods are used to check the quality of QSAR models, thus providing a measure of their capability to perform reliable predictions (Gaudio and Zandonade, 2001). The quality of the obtained model was tested through two validation steps: internal and external. To be internally validated, the model must present a good degree of fit, significance, and predictability. These criteria can be evaluated through: (i) the coefficient of determination (R2), which must be greater than 0.6 (i.e., the model must explain at least 60% of the variability of the observed values of biological activity); (ii) the F test, which relates the variability explained by the model (R2) to the variability that remains unexplained (root mean square error of calibration, RMSEC); the F value should be as high as possible relative to a tabulated critical value; (iii) leave-one-out (LOO) cross-validation, a procedure in which one compound at a time is excluded from the training set, the model is rebuilt, and the activity of the excluded compound is predicted; the resulting coefficient of determination of cross-validation (Q2LOO) must be greater than 0.5 (i.e., the model must predict at least 50% of the variability of the observed values of biological activity); (iv) the RmSquare metrics of cross-validation [average rm2(LOO)-scaled and Δrm2(LOO)-scaled], which help to confirm the predictability expressed by Q2LOO, because in some cases a large value of this parameter does not necessarily indicate good predictability. The rm2 metric is derived from the correlation between observed and predicted values calculated with an intercept (r2) and with the regression line forced through the origin (r20). The same criterion can be applied in the external validation step; (v) y-randomization, which evaluates whether the variability explained and predicted by the model is due to chance. In this process, the significance of the R2 and Q2LOO values is estimated by developing parallel models that keep the original descriptor values (matrix X) and scramble the values of the dependent variable (vector y) among the samples. These new models must be considerably worse than the original one; otherwise, the data fit may be mainly due to spurious correlations. The values of both parameters are expected to be considerably lower than the original values (without permutation), and this quality is expressed by the intercepts obtained for the new models (Q2 < 0.05 and R2 < 0.3) (Kiralj and Ferreira, 2009); and (vi) evaluation of the robustness of the model by leave-N-out (LNO) cross-validation, which verifies whether the model resists small and deliberate variations in the composition of the training set (Kiralj and Ferreira, 2009). The commonly used N value is 25–30% of the total number of training set samples (Ferreira et al., 2002; Lang et al., 2014; Roy and Mitra, 2012; Golbraikh and Tropsha, 2002).
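The relationship between R2 and Q2LOO can be illustrated with a small sketch. For brevity it swaps the PLS model for an ordinary least-squares line on a single, invented descriptor, and it assumes the common definition Q2LOO = 1 − PRESS/SStot, with SStot computed around the mean of the full training set; both choices are assumptions for illustration only.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def r2(x, y):
    """Coefficient of determination of the fitted line."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(x, y):
    """Leave-one-out cross-validated coefficient of determination."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)  # model rebuilt without i
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - press / ss_tot


# Invented descriptor values and activities for eight compounds.
x = [0.13, 0.15, 0.19, 0.22, 0.26, 0.33, 0.42, 0.51]
y = [4.8, 4.7, 4.6, 4.3, 4.2, 3.9, 3.5, 3.4]
r2_value, q2_value = r2(x, y), q2_loo(x, y)
print(round(r2_value, 3), round(q2_value, 3))
```

Because each left-out compound is predicted by a model that never saw it, Q2LOO cannot exceed R2 for a least-squares fit; a large gap between the two is a common warning sign of overfitting.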
2.2. Molecular descriptors
The following electronic descriptors were obtained in the GaussView 5 program (http://www.gaussian.com): Mulliken partial charges of the atoms of the structure common to all derivatives, total energy (ET), total dipole moment (D) and its components along the x (DX), y (DY), and z (DZ) axes, and the energies of the two highest occupied molecular orbitals (EHOMO-1 and EHOMO) and the two lowest unoccupied molecular orbitals (ELUMO and ELUMO+1). In addition, the electrophilicity index (ω), electrophilicity index in the ground state (ωgs), molecular electronegativity (χ), molecular hardness (η) and softness (S), ionization potential (IP), activation energy index (AEI), electron affinity (EA), difference between EHOMO and ELUMO (GAP), and the EHOMO/ELUMO energy ratio (f(H/L)) were calculated. These descriptors were obtained using the equations described by Todeschini and Consonni (2009). Moreover, 4855 molecular descriptors (divided into constitutional, topological, geometric, molecular, and mixed) were calculated in the Dragon 6 program (http://www.talete.mi.it/index.htm). Next, a matrix with all descriptors was treated with variable reduction filters (also in Dragon 6) to eliminate descriptors that did not present information relevant to the model. The filters were used to eliminate: (i) descriptors with constant values; (ii) descriptors with constant and near-constant variables; (iii) descriptors with a standard deviation of less than 0.001; (iv) descriptors with at least one missing
consists of five descriptors that generated two latent variables (LVs), with 57.323% of cumulated information (LV1: 33.586%; LV2: 22.737%). The values of the descriptors of each sample of the data set are presented in Table 1. The model explained 75.9% of the variability of the data (R2 = 0.759), which is above the recommended minimum (R2 > 0.6). The RMSEC represents the variability in activity left unexplained by the model and should therefore be as low as possible; its value was acceptable for this model. The significance of the equation was evaluated by the F test (95% confidence, α = 0.05). The F value obtained was greater than its critical value (cF = 3.316, for p = 2 and n − p − 1 = 30), classifying the model as significant (Gaudio and Zandonade, 2001; Ferreira et al., 2002; Golbraikh and Tropsha, 2002).
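The F value can be approximately recovered from the reported R2 and degrees of freedom. The sketch below assumes the usual regression F statistic, with p taken as the number of latent variables (p = 2) and n = 33 training compounds, as implied by the degrees of freedom quoted above; the function name is hypothetical.

```python
def f_statistic(r2, n, p):
    """F = (R2 / p) / ((1 - R2) / (n - p - 1)) for a regression model."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


f_value = f_statistic(r2=0.759, n=33, p=2)
critical_value = 3.316  # tabulated value quoted in the text (alpha = 0.05)
print(round(f_value, 2), f_value > critical_value)
```

The result, about 47.2, differs slightly from the reported F = 47.291 because R2 enters the calculation rounded to three decimals; either way it lies far above the critical value.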
In the external validation step, in which the predictive power of a model is evaluated for structures that were not used to build it, the following tests were performed: (i) the coefficient of determination of external validation (R2pred) and the associated error (root mean square error of prediction, RMSEP), which reflect the degree of correlation between the observed and predicted activities; (ii) the RmSquare metrics of external prediction (average rm2(pred)-scaled and Δrm2(pred)-scaled), which should be > 0.5 and < 0.2, respectively; (iii) the Golbraikh-Tropsha statistics (k, k’, and |R20-R´20|), which aim to confirm the external predictability of the model (Golbraikh et al., 2003; Roy et al., 2015; Ojha et al., 2011; Tropsha and Golbraikh, 2007); and (iv) the mean absolute error (MAE) and its standard deviation, used to obviate the influence of rarely occurring high prediction errors that may significantly affect the quality of predictions; an error of 10% of the training set range is acceptable, whereas an error of more than 20% of the training set range is very high (Roy et al., 2016). In this study, the statistical parameters adopted in the validation stage were calculated with the QSAR modeling software (Martins and Ferreira, 2009), an in-house Microsoft Excel spreadsheet, and the Xternal Validation Plus 1.0 software (https://sites.google.com/site/dtclabxvplus). The equations for calculating all parameters are given in the studies by Todeschini and Consonni (2009), Kiralj and Ferreira (2009), Golbraikh and Tropsha (2002), Ojha et al. (2011), and Roy et al. (2016).
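Several of the external validation statistics listed above can be re-derived from the observed and predicted pIC50 values of the ten test compounds reported in Table 2. The sketch below is an illustration, not the Xternal Validation Plus code: the slopes k and k' follow the usual regression-through-origin definitions, and dropping the largest 5% of absolute residuals (rounded up to one compound) before computing the MAE is an assumption that happens to reproduce the reported value.

```python
import math
from statistics import mean, stdev

# Observed and predicted pIC50 of the test set (compounds 3, 9, 11, 16, 20,
# 23, 31, 34, 36, 42), as listed in Table 2.
observed = [3.118, 3.845, 4.008, 3.831, 3.961, 3.073, 3.988, 3.585, 4.585, 3.836]
predicted = [3.270, 4.149, 4.185, 3.959, 3.811, 2.927, 3.808, 3.433, 4.154, 3.847]


def rmsep(obs, pred):
    """Root mean square error of prediction."""
    return math.sqrt(mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def origin_slopes(obs, pred):
    """Slopes k (observed vs. predicted) and k' (predicted vs. observed) through the origin."""
    cross = sum(o * p for o, p in zip(obs, pred))
    return cross / sum(p * p for p in pred), cross / sum(o * o for o in obs)


def mae_95(obs, pred):
    """MAE and sample SD of absolute residuals after dropping the largest 5% (assumed rule)."""
    abs_res = sorted(abs(o - p) for o, p in zip(obs, pred))
    n_drop = math.ceil(0.05 * len(abs_res))
    kept = abs_res[:len(abs_res) - n_drop]
    return mean(kept), stdev(kept)


rmsep_value = rmsep(observed, predicted)
k, k_prime = origin_slopes(observed, predicted)
mae, sd = mae_95(observed, predicted)
print(round(rmsep_value, 3), round(k, 3), round(k_prime, 3), round(mae, 3), round(mae + 3 * sd, 3))
```

The recomputed RMSEP, k, k', and MAE agree with Table 2 to three decimals, and MAE + 3·SD agrees to within rounding of the tabulated residuals. With a training set range of 2.329 log units, the MAE is well below the 10% limit mentioned above.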
\[
\mathrm{pIC_{50}} = 4.148 - 0.967\,(\mathrm{DISPp}) - 0.677\,(\mathrm{Mor30u}) - 0.043\,(\mathrm{RDF060u}) - 0.144\,(\mathrm{CATS2D\_09\_AL}) + 0.190\,(\mathrm{CATS2D\_03\_DL}) \tag{1}
\]
n = 33; R2 = 0.759; RMSEC = 0.252; F = 47.291; Q2LOO = 0.656; RMSECV = 0.301; R2 − Q2LOO = 0.103; average rm2(LOO)-scaled = 0.553; Δrm2(LOO)-scaled = 0.098
Internal predictability was tested by LOO cross-validation, which evaluates the amount of information or variability that a model can predict. The result showed that the model is capable of predicting 65.6% of the information, which is above the recommended minimum (Q2LOO > 0.5). The value of the associated RMSECV was 0.301. The difference between the values of R2 and Q2LOO was only 0.103, which indicates a low likelihood of overfitting (Golbraikh and Tropsha, 2002; Besalú and Vera, 2008). Thus, model 1 was significant and had good predictive capacity as well as a low tendency toward overfitting. The predicted values and corresponding residuals of each sample are shown in Table 1. The presence of chance correlation was evaluated by the y-randomization test. For the model to be considered free from spurious correlations, the obtained correlation lines must have intercepts less than 0.050 for Q2LOO and less than 0.300 for R2 (Kiralj and Ferreira, 2009). The plot presented in Fig. 2A shows that the model obtained has a low probability of being correlated by chance. The robustness of the model was evaluated by the LNO procedure. For the analysis of model 1, a leave-10-out process (30.3% of the training set) was performed, and for each "N," the procedure was repeated six times. Fig. 2B shows that the model can be considered robust, since it presents minor fluctuations in the values of Q2LNO (from 0.624 to 0.686) and a minor difference between the value of Q2LOO and the average Q2LNO (0.009). The highest oscillation was observed for Q2L8O, which showed a standard deviation of 0.051, expressed by the bar shown in the figure. Finally, after the model passed the internal validation tests, the external validation step was performed.
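The idea behind the y-randomization test can be sketched as follows. The example uses an ordinary least-squares line on one invented descriptor instead of the PLS model, and it only compares the R2 of scrambled models with the original R2; the intercept criterion of Kiralj and Ferreira (2009) is not reproduced. The data, the number of rounds, and the random seed are all assumptions.

```python
import random


def r_squared(x, y):
    """R2 of a one-descriptor least-squares line (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def y_randomization(x, y, n_rounds=50, seed=0):
    """R2 values of models rebuilt after scrambling y while keeping X fixed."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_scrambled = y[:]
        rng.shuffle(y_scrambled)
        scores.append(r_squared(x, y_scrambled))
    return scores


# Invented descriptor and activity values for twelve compounds.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
y = [3.21, 3.38, 3.65, 3.79, 4.02, 4.18, 4.41, 4.62, 4.77, 5.03, 5.18, 5.41]
original_r2 = r_squared(x, y)
scrambled = y_randomization(x, y)
mean_scrambled_r2 = sum(scrambled) / len(scrambled)
print(round(original_r2, 3), round(mean_scrambled_r2, 3))
```

A model that captures a real structure-activity relationship loses almost all of its explained variance once the activities are shuffled, which is the behavior summarized in Fig. 2A.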
Considering that only externally validated models may be considered adequate for prediction purposes, the results shown in Table 2 indicate that model 1 has good predictability. The result obtained for R2pred is comparable to that obtained in the LOO cross-validation step, while the RMSEP is better than its cross-validation equivalent (RMSECV). Although this result appears counter-intuitive, it is important to note that internal and external validation tests can be considered independent tests. Therefore, unlike the values of R2 and Q2LOO (R2 must always exceed Q2LOO), the values obtained in the external validation can be numerically better than those obtained in the internal validation. This is in line with the so-called Kubinyi's paradox (Kubinyi et al., 1998; van Drie, 2003). The QSAR approach is widely recognized for the prediction of biological properties of compounds. It is also useful to interpret the model mechanistically in relation to the mechanism of action of the data set, which increases the credibility of the result (OECD, 2007). Currently, several herbicides that act on PET are commercialized, with atrazine being the best known and most studied. Atrazine interacts with the binding site (called site QB) via
2.5. Virtual screening and in silico toxicological study
The QSAR model was also used to predict the activity of a set of related molecules obtained by 2D similarity-based virtual screening (Nandy et al., 2014; Passeri et al., 2018) of the ZINC database (http://zinc.docking.org) (Irwin et al., 2012). In accordance with previous tests, the searches were based on the chemical structures of compounds 6 (80% similarity) and 26 (70% similarity). These two compounds were selected because they best represent the structural variability of the quinolines and naphthalenes used as data set. The obtained set was evaluated in silico for its potential to exhibit toxic effects (mutagenicity in the Ames test, a widely employed method that uses bacteria to test whether a given chemical can cause mutations, and carcinogenicity) using the software VEGA QSAR (http://www.vega-qsar.eu). Each of these toxicological endpoints was tested using four different models, and compounds that were predicted as non-toxic in at least three models were selected. Finally, the remaining compounds had their geometric structures optimized and their descriptors calculated by the same approaches used for the original set. Their pIC50 values were predicted by model 1, and it was verified whether the predicted values fell within the linearity range of the data set. The Euclidean domain of applicability, which is based only on molecular descriptors, was used to evaluate whether the molecular structure of each selected compound is represented by the chemical space of the model (Nandy et al., 2014). This test was carried out using the Euclidean Applicability Domain 1.0 software (http://dtclab.webs.com/software-tools).
3. Results and discussion
After variable selection using the OPS method, an outlier was detected. Compound 1 showed a high Studentized residual and a leverage value above the limits, and was therefore considered an outlier and excluded from the data set to improve the statistical quality of the model (Golbraikh and Tropsha, 2002). A structural analysis corroborated the decision to exclude the compound, because it has a very long side chain (C12H25), which sets it apart from the rest of the data set. A value of Q2LOO > 0.5 was used as the classification criterion to achieve models with high predictive capacity, and several models were obtained. The best model obtained is presented in Eq. (1). This model
Table 1 Values of the descriptors of the compounds used in the construction of model 1, and the results obtained in the LOO cross-validation.
Compound  Mor30u  DISPp  CATS2D_09_AL  RDF060u  CATS2D_03_DL  pIC50 real  pIC50 predicted  Residuals
2         0.402   0.585  0             13.518   0             2.796       2.706            0.090
3*        0.415   0.628  0             12.972   3
4         0.381   0.643  0             15.836   3             2.982       3.196            −0.214
5         0.433   0.739  0             17.388   3             3.382       2.799            0.583
6         −0.150  0.132  0             9.494    3             4.070       4.304            −0.234
7         0.034   0.204  0             14.380   3             4.226       3.808            0.418
8         −0.216  0.190  0             11.466   5             4.788       4.486            0.302
9*        0.072   0.257  0             10.707   4
10        0.048   0.327  0             11.336   3             3.997       3.874            0.123
11*       −0.046  0.146  0             9.796    3
12        −0.068  0.159  0             9.268    2             4.061       4.017            0.044
13        −0.119  0.177  0             10.308   3             4.123       4.187            −0.064
14        −0.048  0.205  0             8.854    4             4.294       4.367            −0.073
15        −0.050  0.418  0             10.113   2             4.037       3.700            0.337
16*       −0.069  0.376  0             10.239   3
17        0.025   0.403  0             8.484    4             4.035       4.143            −0.108
18        −0.051  0.694  0             9.268    2             3.388       3.508            −0.120
19        −0.068  0.659  0             9.723    3             3.512       3.723            −0.211
20*       0.014   0.337  0             13.245   3
21        −0.006  0.419  0             12.931   3             3.482       3.776            −0.294
22        0.263   0.733  0             6.045    2             3.452       3.401            0.051
23*       0.156   1.254  1             7.615    3
24        0.274   0.728  0             12.475   4             3.665       3.441            0.224
25        0.412   0.746  0             12.688   4             3.162       3.405            −0.243
26        −0.079  0.162  0             7.081    4             4.684       4.467            0.217
27        −0.219  0.237  0             8.675    4             5.125       4.366            0.759
28        0.037   0.506  2             13.595   4             3.117       3.595            −0.478
29        −0.021  0.567  2             11.598   3             3.514       3.383            0.131
30        −0.014  0.386  0             6.770    3             4.092       4.058            0.034
31*       −0.045  0.663  0             6.930    3
32        0.030   0.508  0             12.225   4             3.509       3.903            −0.394
33        −0.064  0.337  6             10.469   3             3.198       2.798            0.400
34*       −0.034  0.328  6             7.395    4
35        −0.197  0.220  0             8.602    2             4.086       4.077            0.009
36*       −0.174  0.099  0             9.171    2
37        −0.012  0.260  0             9.578    3             3.733       4.080            −0.347
38        0.066   0.130  0             9.341    3             3.848       4.178            −0.330
39        −0.043  0.265  0             6.814    2             3.836       4.035            −0.199
40        −0.339  0.647  1             12.632   2             3.312       3.520            −0.208
41        −0.237  0.545  1             11.778   2             3.667       3.499            0.168
42*       −0.164  0.300  0             11.637   2
43        −0.162  0.499  0             10.739   2             3.349       3.726            −0.377
44        −0.188  0.766  0             9.817    2             3.870       3.419            0.451
* Test set.
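Eq. (1) can be applied directly to the descriptor values of Table 1. The sketch below does so for test compounds 3 and 9; the dictionary layout and function name are illustrative choices, while the coefficients and descriptor values are those reported in the text and table.

```python
# Coefficients of model 1 (Eq. (1)).
INTERCEPT = 4.148
COEFFICIENTS = {
    "DISPp": -0.967,
    "Mor30u": -0.677,
    "RDF060u": -0.043,
    "CATS2D_09_AL": -0.144,
    "CATS2D_03_DL": 0.190,
}


def predict_pic50(descriptors):
    """Predicted pIC50 for one compound from its five descriptor values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * descriptors[name] for name in COEFFICIENTS)


# Descriptor values of test compounds 3 and 9 as listed in Table 1.
compound_3 = {"Mor30u": 0.415, "DISPp": 0.628, "CATS2D_09_AL": 0, "RDF060u": 12.972, "CATS2D_03_DL": 3}
compound_9 = {"Mor30u": 0.072, "DISPp": 0.257, "CATS2D_09_AL": 0, "RDF060u": 10.707, "CATS2D_03_DL": 4}

pred_3 = predict_pic50(compound_3)
pred_9 = predict_pic50(compound_9)
print(round(pred_3, 3), round(pred_9, 3))
```

The results agree with the predicted values listed for these compounds in Table 2 (3.270 and 4.149) to within the rounding of the published coefficients.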
Fig. 2. Results of y-randomization test (A) and leave-N-out (LNO) cross-validation (B).
hydrogen bonding with the Phe 265 and Ser 264 residues and hydrophobic interactions with Phe 255 (Powles and Yu, 2010). A brief description of the function of the selected descriptors in model 1 is presented below. From the autoscaled coefficients of model 1, the order of importance of the selected descriptors was observed to be DISPp (−0.400) > CATS2D_03_DL (0.358) > CATS2D_09_AL (−0.314) > Mor30u (−0.257) > RDF060u (−0.215). DISPp, the most important descriptor, is a geometric descriptor of class COMMA2 (Silverman and Platt, 1996) weighted by atomic polarizability. Descriptors of this class are related to geometric structure and measure the displacement between the center of an atomic polarizability field and the centroid of the molecule. The negative regression coefficient of DISPp suggests that an increasing separation between the geometric and polarizability centers decreases activity. This may also be influenced by the third
Table 2 Results of external validation.
Test set  pIC50 real  pIC50 predicted  Residuals
3         3.118       3.270            −0.152
9         3.845       4.149            −0.304
11        4.008       4.185            −0.177
16        3.831       3.959            −0.128
20        3.961       3.811            0.150
23        3.073       2.927            0.146
31        3.988       3.808            0.180
34        3.585       3.433            0.152
36        4.585       4.154            0.431
42        3.836       3.847            −0.011
Parameter                 Value
R2pred                    0.746
RMSEP                     0.212
|R20-R´20|                0.024
k                         1.007
k’                        0.990
Δrm2(pred)-scaled         0.017
Average rm2(pred)-scaled  0.671
MAE                       0.156
MAE+3*SD                  0.381
Fig. 3. Plot of Euclidean applicability domain for training set and virtual screening data set.
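A Euclidean applicability domain of the kind plotted in Fig. 3 can be sketched as below. This is an assumption-laden illustration, not the Euclidean Applicability Domain 1.0 program: it takes the mean Euclidean distance of each compound to the training compounds in descriptor space, rescales it with the minimum and maximum mean distances found within the training set, and treats query compounds with a normalized value of at most 1 as inside the domain. Descriptor scaling is omitted and the coordinates are invented.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distance(point, others):
    """Mean Euclidean distance from one compound to a list of compounds."""
    return sum(euclidean(point, o) for o in others) / len(others)


def normalized_mean_distances(training, queries):
    """Mean distances of the queries, rescaled by the training-set minimum and maximum."""
    train_means = [
        mean_distance(t, training[:i] + training[i + 1:])
        for i, t in enumerate(training)
    ]
    lowest, highest = min(train_means), max(train_means)
    return [(mean_distance(q, training) - lowest) / (highest - lowest) for q in queries]


# Invented two-descriptor example: one query near the training set, one far away.
training = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
queries = [[0.5, 0.4], [5.0, 5.0]]
normalized = normalized_mean_distances(training, queries)
inside = [value <= 1.0 for value in normalized]
print(inside)
```

In this toy case the first query falls inside the domain and the second, which lies far from every training compound, falls outside it; predictions for compounds of the second kind would be extrapolations.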
most important descriptor, CATS2D_09_AL, which has a negative coefficient. This class of descriptors is based on topological distances between two points, which can range from 0 to 9 bonds. Five potential pharmacophoric sites are defined: D, hydrogen bond donor; A, hydrogen bond acceptor; P, positively charged groups; N, negatively charged groups; and L, lipophilic groups (Teófilo et al., 2009). The sign of CATS2D_09_AL may indicate that a large distance between the group responsible for hydrogen bond formation and that responsible for hydrophobic interactions is detrimental to activity. The second most important descriptor, CATS2D_03_DL, carries similar information, but has a positive coefficient and corresponds to a distance of only three chemical bonds between the pharmacophoric sites. Thus, it can be proposed that an increase in the size of the derivatives, which places polarizable groups (i.e., groups capable of forming hydrogen bonds) far from the geometric center of the molecule, is detrimental to activity. Considering the size of site QB, this makes sense, because such an increase could position these groups in a way that hampers interactions with the Phe 265, Ser 264, and Phe 255 residues. The geometrical descriptors RDF060u and Mor30u are not weighted by atomic properties, indicating that the geometry of the derivatives is more relevant for PET inhibition.
After obtaining the model, a 2D similarity-based virtual screening study was performed. The search resulted in 64 molecules. However, as the objective of this work was to obtain new herbicides, it was considered necessary to carry out an in silico toxicity study as a filtering criterion to select those scaffolds that can potentially lead to newer and safer molecular entities. Thus, the Simplified Molecular Input Line Entry System (SMILES) (Weininger, 1988) code of each molecule obtained in the screening process was downloaded and submitted to the prediction models for mutagenicity (Ames test) and carcinogenicity available in the VEGA QSAR software. Only these two endpoints were selected, because they have the largest number of models available in this program (four models each). Moreover, the use of other models led to complete elimination of the selected compounds. Notably, even if a compound that is synthesized or obtained by virtual screening exhibits undesired toxicity or biological activity profiles, its activity may be modulated by molecular modifications performed a posteriori. After this filtering, our final set comprised 16 compounds. Their pIC50 values were predicted by model 1, and all predicted values fell within the pIC50 linearity range of the complete data set (Table 3). Thus, none of the values obtained can be considered an extrapolation. The reliability of these predictions was evaluated using the Euclidean applicability domain, which is based on molecular descriptors only. The results obtained (Fig. 3) indicated that all the compounds fit into the chemical space represented by the obtained model, since all of them have mean normalized values less than 1. Based on these results, the predictions for all compounds can be considered reliable. Of these, six compounds showed pIC50 > 4 (Fig. 4). In addition, these six compounds were potentially safer than the other compounds in terms of risks of mutagenicity and carcinogenicity. Therefore, they were selected as the main candidates to be tested against PSII in future herbicide planning studies, and they comprise potential new scaffolds of new prototype compounds.
Table 3 Values of descriptors for selected virtual screening data set (identified by ZINC codes), and predicted pIC50.
ZINC code  DISPp  Mor30u  RDF060u  CATS2D_09_AL  CATS2D_03_DL  pIC50 predicted
158027     0.171  0.282   8.560    0             2             3.804
1233328    0.080  0.108   6.593    0             2             4.094
2583918    0.652  0.149   4.987    0             2             3.582
5384799    0.487  0.257   7.535    0             2             3.559
12341384   0.302  0.141   5.470    0             2             3.905
16697846   0.171  0.282   8.560    0             2             3.804
32220082   0.559  0.001   6.243    0             5             4.288
32220241   0.459  0.067   6.947    0             3             3.930
34540122   0.109  0.168   8.006    0             4             4.345
38220346   0.107  0.140   5.693    0             2             4.085
39060407   0.408  0.230   4.942    0             2             3.765
39111415   0.111  0.134   5.968    0             2             4.073
39346453   0.715  0.124   6.240    0             2             3.484
59432669   0.121  0.187   5.086    0             2             4.066
62567443   0.396  0.185   13.185   0             1             3.263
85503477   0.342  0.154   8.198    0             0             3.361
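The two selection steps of the screening, the toxicity consensus and the activity threshold, can be sketched as follows. The predicted pIC50 values are those of Table 3; the consensus function encodes the rule that a compound must be predicted non-toxic by at least three of the four models for an endpoint, and the example model calls passed to it are hypothetical, since the individual VEGA outputs are not reported here.

```python
# Predicted pIC50 of the 16 screened compounds (ZINC codes), from Table 3.
TABLE_3_PREDICTED = {
    "158027": 3.804, "1233328": 4.094, "2583918": 3.582, "5384799": 3.559,
    "12341384": 3.905, "16697846": 3.804, "32220082": 4.288, "32220241": 3.930,
    "34540122": 4.345, "38220346": 4.085, "39060407": 3.765, "39111415": 4.073,
    "39346453": 3.484, "59432669": 4.066, "62567443": 3.263, "85503477": 3.361,
}


def passes_consensus(model_calls, min_non_toxic=3):
    """True if at least `min_non_toxic` of the model calls are 'non-toxic'."""
    return sum(1 for call in model_calls if call == "non-toxic") >= min_non_toxic


def select_hits(predicted, threshold=4.0):
    """ZINC codes whose predicted pIC50 exceeds the threshold, in table order."""
    return [code for code, value in predicted.items() if value > threshold]


hits = select_hits(TABLE_3_PREDICTED)
print(hits)

# Hypothetical calls from four models of one toxicological endpoint.
print(passes_consensus(["non-toxic", "non-toxic", "toxic", "non-toxic"]))
```

Applied to Table 3, the threshold of pIC50 > 4 returns six ZINC codes, matching the number of hit compounds discussed in the text.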
P., Cizek, A., Kralova, K., Jampilek, J., 2013. Antimycobacterial and herbicidal activity of ring-substituted 1-hydroxynaphthalene-2-carboxanilides. Bioorg. Med. Chem. 21, 6531–6541.
González, M.P., Díaz, H.G., Ruiz, R.M., Cabrera, M.A., de Armas, R.R., 2003. TOPS-MODE based QSARs derived from heterogeneous series of compounds. Applications to the design of new herbicides. J. Chem. Inf. Comput. Sci. 43, 1192–1199.
Gražulis, S., Daškevic, A., Merkys, A., Chateigner, D., Lutterotti, L., Quirós, M., Serebryanaya, N.R., Moeck, P., Downs, R.T., Le Bail, A., 2012. A crystallography open database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Res. 40, D420–D427.
Green, J.M., 2014. Current state of herbicides in herbicide-resistant crops. Pest. Manag. Sci. 70, 1351–1357.
Hess, F.D., 2000. Light-dependent herbicides: an overview. Weed Sci. 48, 160–170.
Horton, D.A., Bourne, G.T., Smythe, M.L., 2003. The combinatorial synthesis of bicyclic privileged structures or privileged substructures. Chem. Rev. 103, 893–930.
Hussaini, S.M., 2016. Therapeutic significance of quinolines: a patent review (2013–2015). Expert. Opin. Ther. Pat. 26, 1201–1221.
Irwin, J.J., Sterling, T., Mysinger, M.M., Bolstad, E.S., Coleman, R.G., 2012. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 52, 1757–1768.
Kiralj, R., Ferreira, M.M.C., 2009. Basic validation procedures for regression models in QSAR and QSPR studies: theory and application. J. Braz. Chem. Soc. 20, 770–787.
Kubinyi, H., Hamprecht, F.A., Mietzner, T., 1998. Three-dimensional quantitative similarity–activity relationships (3D QSAR) from SEAL similarity matrices. J. Med. Chem. 41, 2553–2564.
Lang, K.L., Silva, I.T., Machado, V.R., Zimmermann, L.A., Caro, M.S.B., Simões, C.M.O., Schenkel, E.P., Durán, F.J., Bernardes, L.S.C., de Melo, E.B., 2014. Multivariate SAR and QSAR of cucurbitacin derivatives as cytotoxic compounds in a human lung adenocarcinoma cell line. J. Mol. Graph. Model. 48, 70–79.
Liu, P., Long, W., 2009. Current mathematical methods used in QSAR/QSPR studies. Int. J. Mol. Sci. 10, 1978–1998.
Martins, J.P.A., Ferreira, M.M.C., 2009. QSAR modeling: a new open source computational package to generate and validate QSAR models. Quim. Nova 26, 554–560.
Musiol, R., Jampilek, J., Kralova, K., Richardson, D.R., Kalinowski, D., Podeszwa, B., Finster, J., Niedbala, H., Palkaa, A., Polanski, J., 2007. Investigating biological activity spectrum for novel quinoline analogues. Bioorg. Med. Chem. 15, 1280–1288.
Nandy, A., Kar, S., Roy, K., 2014. Development of classification and regression based QSAR models and in silico screening of skin sensitization potential of diverse organic chemicals. Mol. Simul. 40, 261–274.
OECD, 2007. Guidance Document on the Validation of (Quantitative) Structure Activity Relationship [(Q)SAR] Models. Organization for Economic Co-Operation and Development OECD, Paris.
Ojha, P.K., Mitra, I., Das, R.N., Roy, K., 2011. Further exploring rm2 metrics for validation of QSPR models. Chemom. Intell. Lab. Syst. 107, 194–205.
Oliveira Jr, R.S., Constantin, J., Inoue, M.H., 2011. Biologia e manejo de plantas daninhas. Ominipax, Curitiba.
Passeri, G.I., Trisciuzzi, D., Alberga, D., Siragusa, L., Leonetti, F., Mangiatordi, G.F., Nicolotti, O., 2018. Strategies of virtual screening in medicinal chemistry. Int. J. Quant. Struct. Prop. Relat. 3, 134–160.
Polanski, J., Kurczyk, A., Bak, A., Musiol, R., 2012. Privileged structures - dream or reality: preferential organization of azanaphthalene scaffold. Curr. Med. Chem. 19, 1921–1945.
Powles, S.B., Yu, Q., 2010. Evolution in action: plants resistant to herbicides. Annu. Rev. Plant. Biol. 61, 317–347.
Roy, K., Mitra, I., 2012. On the use of the metric rm2 as an effective tool for validation of QSAR models in computational drug design and predictive toxicology. Mini Rev. Med. Chem. 12, 491–504.
Roy, K., Kar, S., Das, R.N., 2015. Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press, Kidlington, Oxford.
Roy, K., Das, R.N., Ambure, P., Aher, R.B., 2016. Be aware of error measures. Further studies on validation of predictive QSAR models. Chemom. Intell. Lab. Syst. 152, 18–33.
Sharma, M.C., 2016. Identification of 3-nitro-2,4,6-trihydroxybenzamide derivatives as photosynthetic electron transport inhibitors by QSAR and pharmacophore studies. Interdiscip. Sci. Comput. Life Sci. 8, 109–121.
Silverman, B.D., Platt, D.E., 1996. Comparative molecular moment analysis (CoMMA): 3D-QSAR without molecular superposition. J. Med. Chem. 39, 2129–2140.
Teófilo, R.F., Martins, J.P.A., Ferreira, M.M.C., 2009. Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression. J. Chemom. 23, 32–48.
Todeschini, R., Consonni, V., 2009. Molecular Descriptors for Chemoinformatics. Wiley-VCH, Weinheim.
Tropsha, A., Golbraikh, A., 2007. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr. Pharm. Des. 13, 3494–3504.
van Drie, J.H., 2003. Pharmacophore discovery – lessons learned. Curr. Pharm. Des. 9, 1649–1664.
Wasi, S., Tabrez, S., Ahmad, M., 2013. Toxicological effects of major environmental pollutants: an overview. Environ. Monit. Assess. 185, 2585–2593.
Weininger, D., 1988. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inform. Comp. Sci. 28, 31–36.
Zuo, Y., Wu, Q., Su, S.W., Niu, C.W., Xi, Z., Yang, G.F., 2016. Synthesis, herbicidal activity, and QSAR of novel N-benzothiazolyl-pyrimidine-2,4-diones as protoporphyrinogen oxidase inhibitors. J. Agric. Food Chem. 64, 552–562.
Fig. 4. Compounds obtained by 2D similarity virtual screening in the ZINC database with predicted pIC50 > 4 by Model 1.
4. Conclusion In this study, QSAR and virtual screening studies were performed based on a set of quinoline and naphthalene derivatives described as PET inhibitors, and they were found to have the potential for use as new herbicides. The internal and external validation metrics adopted indicated that the model is significant, robust, does not show chance correlation, and has good external predictability. The model interpretation may be related to the inhibition of PET, indicating that not only the ability to form hydrogen and hydrophobic bonds, but also the geometry that a molecule adopts when interacting with the binding site is important. The predictions for a set of 16 molecules obtained by 2D similarity virtual screening, which were shown to be potentially safe through in silico toxicity studies, may be considered reliable. Thus, the obtained model could be useful as a tool for designing new herbicides that act through the inhibition of PSII. Acknowledgements Funding: The work was supported by the Araucaria Foundation (Fundação Araucária) (grant 2010/7354), State of Paraná, and the National Council of Scientific and Technological Development (CNPq) (Universal 14/2014) and the Coordination for the Improvement of Higher Education Personnel (CAPES), Govt. of Brazil. T.P.P.R. and E.B.M.: Fundação Araucária, CAPES (PROAP program); F.G.M.: CNPq, CAPES. Appendix A. Supplementary material Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.ecoenv.2018.02.016. References Besalú, E., Vera, L., 2008. Internal test set (ITS) method: a new cross-validation technique to assess the predictive capability of QSAR models. Application to a benchmark set of steroids. J. Chil. Chem. Soc. 53, 1576–1580. Csizmadia, I.G., Enriz, R.D., 2000. The role of computational medicinal chemistry in the drug discovery process. J. Mol. Struct (THEOCHEM 504, ix-x). Duke, S.O., 2012. 
Why have no new herbicide modes of action appeared in recent years? Pest. Manag. Sci. 68, 505–512. Ferreira, M.M.C., Montanari, C.A., Gaudio, A.C., 2002. Seleção de variáveis em QSAR. Quim. Nova 25, 439–448. Gaudio, A.C., Zandonade, E., 2001. Proposition, validation and analysis of QSAR models. Quim. Nova 24, 658–671. Golbraikh, A., Tropsha, A., 2002. Beware ofq2!. J. Mol. Graph. Model. 20, 269–276. Golbraikh, A., Shen, M., Xiao, Z., Xiao, Y.D., Lee, K.H., Tropsha, A., 2003. Rational selection of training and test sets for the development of validated QSAR models. J. Comput. Aided Mol. Des. 17, 241–253. Gonec, T., Kos, J., Zadrazilova, I., Pesko, M., Keltosova, S., Tengler, J., Bobal, P., Kollar,