Current Topics in Computer-Aided Drug Design
CARLTON A. TAFT,1 VINICIUS BARRETO DA SILVA,2 CARLOS HENRIQUE TOMICH DE PAULA DA SILVA2
1 Centro Brasileiro de Pesquisas Físicas, Rua Dr. Xavier Sigaud, 150, Urca, 22290-180 Rio de Janeiro, Brazil
2 Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Av. de Café, s/n, Monte Alegre, 14040-903 Ribeirão Preto, SP, Brazil
Received 25 April 2007; revised 25 October 2007; accepted 23 November 2007 Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/jps.21293
ABSTRACT: The addition of computer-aided drug design (CADD) technologies to research and drug discovery approaches could reduce the cost of drug design by up to 50%. Designing a drug is the process of finding or creating a molecule that has a specific activity on a biological organism. Drug discovery and development is a time-consuming, expensive, and interdisciplinary process, and scientific advances during the past two decades have altered the way pharmaceutical research produces new bioactive molecules. Advances in computational techniques and hardware have enabled in silico methods to speed up lead identification and optimization. We review current topics in computer-aided molecular design, underscoring some of the most recent approaches and interdisciplinary processes, and discuss some of the most efficient pathways and designs. © 2008 Wiley-Liss, Inc. and the American Pharmacists Association J Pharm Sci 97:1089–1098, 2008
Keywords: drug design; drug interactions; drug resistance; ADME; antiinfectives
INTRODUCTION
The very large number of known compounds, with new ones discovered each year, calls for electronic information processing to store chemical information in databases and to obtain a better overview of known chemistry. The effect of a drug in the human body is a consequence of the molecular recognition between a ligand (the drug) and a macromolecule (the target). The pharmacological activity of the ligand at its site of action is related to the spatial arrangement and electronic nature of the atoms of the ligand and the way these atoms interact with their biological counterpart. Computational chemistry can characterize the dynamics, energetics, and structure of such interactions.1–50 Computer-assisted approaches based on pharmacophore, molecular, and dynamic modeling, quantum chemical (QM) and combined QM and molecular mechanics methods, docking, structural interaction fingerprints, grid technologies, virtual screening, and statistical learning methods have been used to identify new inhibitors. These approaches draw on large commercially available databases such as iResearch Library (ChemNavigator, San Diego, CA) and on highly selective and efficient programs such as InsightII and Catalyst (Accelrys, San Diego, CA), Sybyl (Tripos, St. Louis, MO), Gaussian (Gaussian, Pittsburgh, PA), GOLD (CCDC, Cambridge, UK), and others.1–8,45–50 In the following, we briefly discuss a number of current topics in CADD, addressing some of the most recent approaches and efficient pathways.1–50
Correspondence to: Carlton A. Taft (Telephone: 21-21417201; Fax: 21-2141-7201; E-mail: [email protected])
Journal of Pharmaceutical Sciences, Vol. 97, 1089–1098 (2008) © 2008 Wiley-Liss, Inc. and the American Pharmacists Association
JOURNAL OF PHARMACEUTICAL SCIENCES, VOL. 97, NO. 3, MARCH 2008
TAFT, DA SILVA, AND DA SILVA
We will also briefly discuss some of our recent work on the computer-aided drug design of novel potential inhibitors for cancer, AIDS, and Alzheimer's disease.1–8
TOPICS IN DRUG DESIGN
Drug Targets
During the last two decades, what were previously referred to by the general term drug receptors have come to be called drug targets and are classified into enzymes, receptors of various types, ion channels, transporters, and other targets. Most commonly used therapeutic drugs have a membrane-bound receptor or an enzyme as their site of action. The sequencing of the human and other genomes has made possible the identification of many previously unknown proteins that might serve as new drug targets. However, the detailed 3D structures of the majority of membrane proteins are still unknown. Computerized modeling of protein structures, based on experimentally determined structures of homologous proteins, may be a useful methodological alternative, especially for membrane proteins. In the future, collaborative structural genomics initiatives may aim at determining the 3D structure of all known proteins through a combination of experimental structure determination and molecular modeling. The development of powerful computer software and hardware will enable extensive studies of the structure and dynamics of new potential drug targets, raising, however, a new challenge in the validation and calibration of computerized biosimulation methods.1–8,21
There has been substantial recent progress in exploring support vector machine (SVM) approaches for predicting druggable proteins. Identification and validation of viable targets is an important first step in drug discovery, and new methods and integrated approaches are continuously explored to improve the discovery rate and the exploration of new drug targets. SVM, an in silico machine learning method, has been explored as a new way to predict druggable proteins from amino acid sequence independent of sequence similarity, thereby facilitating the prediction of druggable proteins that exhibit no or low homology to known targets. A review of recent progress in exploring SVM approaches for predicting druggable proteins has been published.29
Determining the biological function of a protein is essential for assessing its potential as a therapeutic target and its utility in structure-based drug design. Sequence homology has routinely been used as a rapid approach to assign biological function to hypothetical proteins or proteins of unknown function. Sequence and structural homology methods primarily determine "global" similarities between the compared proteins. The molecular function of a protein, however, is generally restricted to its active site, which may involve an interaction with small molecular-weight ligands, nucleic acids, or other proteins. Maintaining the core structural component of the active site is essential for preserving the functional activity of the protein. Given the completion of numerous genome sequencing projects and the rapidly expanding list of unannotated proteins, there is an important need for rapid and accurate functional assignment of novel proteins.12 Traditionally, putative functions have been determined by global primary-sequence and structure comparisons. These approaches, however, do not emphasize similarities in active site configurations, even though such similarities are fundamental to a protein's activity and are highly conserved relative to the global and more variable structural features.1–8,12 Alternative approaches should therefore emphasize them. Comparison of experimentally identified ligand-binding sites to infer biological function and aid drug discovery can be performed with databases and software such as CPASS (Comparison of Protein Active Site Structures).12 Ligand-defined active sites identified in the Protein Data Bank can be compared to determine sequence and structural similarity without maintaining sequence connectivity. Any set of ligand-defined protein active sites can be compared, irrespective of the identity of the bound ligand.1–8,12
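As a minimal sketch of how a protein sequence can be turned into a fixed-length numerical input for a statistical learning method such as SVM, the Python fragment below computes the amino acid composition of a sequence. The choice of composition as the descriptor, the function name, and the example sequence are illustrative assumptions; the SVM studies cited above may rely on different and richer sequence-derived descriptors.

```python
# Amino acid composition as a simple, alignment-free sequence descriptor.
# Assumption: this descriptor is chosen for illustration only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition_vector(sequence):
    """Return the fraction of each standard amino acid, in AMINO_ACIDS order."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


# Hypothetical sequence fragment, used only to show the call.
example_sequence = "MKTAYIAKQR"
vector = composition_vector(example_sequence)
print(len(vector), round(sum(vector), 6))
```

Because the vector has the same length for every protein, sequences with no detectable homology to known targets can still be compared in the same descriptor space.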
Statistical Methods
Quantitative structure-activity relationship (QSAR) methods represent an attempt to correlate structural and/or property descriptors of compounds with biological activities. These descriptors, which characterize the steric, topologic, electronic, and hydrophobic properties of a series of molecules, have traditionally been determined mainly empirically, and only more recently by computational methods. One of the most powerful 3D QSAR methods, comparative molecular field analysis (CoMFA), has now been widely and successfully applied to the design of active compounds. CoMFA correlates molecular properties with biological activity by calculating steric, electrostatic, and lipophilic potentials around the molecules and then applying the partial least squares method to the data set.35 Statistical learning methods are increasingly used for predicting compounds with a specific property and for evaluating the algorithms commonly used to represent the structural and physicochemical properties of compounds. More recently, other statistical learning methods such as neural networks and SVM have been explored for predicting compounds of higher structural diversity than those covered by QSAR and quantitative structure-property relationships (QSPR).28 In contrast to QSAR and QSPR, these statistical learning methods derive an implicit statistical model that classifies agents into two classes, one possessing and the other not possessing a specific property.36 Recent work reviews the strategies, current progress, and underlying difficulties in using statistical learning methods for predicting compounds with specific pharmacodynamic, pharmacokinetic, or toxicological properties, as well as for facilitating drug discovery and drug safety evaluation.28
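The basic QSAR idea of correlating a descriptor with activity can be sketched with a one-descriptor linear fit. The example below uses ordinary least squares rather than the partial least squares procedure of CoMFA, and the descriptor values and activities are invented for illustration; it is a sketch of the principle, not of any published model.

```python
# One-descriptor QSAR sketch: activity ~ slope * descriptor + intercept.
# Assumption: ordinary least squares stands in for the multivariate
# methods (e.g. partial least squares) used in real QSAR studies.
from statistics import mean


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    my = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical training set: a hydrophobicity descriptor and measured activity.
log_p = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(log_p, activity)
r2 = r_squared(log_p, activity, slope, intercept)
predicted = slope * 2.5 + intercept  # prediction for an untested analog
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

A classifier such as SVM would replace the fitted line with a decision boundary that separates compounds possessing a property from those that do not.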
Chemoinformatics
A new field with a long tradition, chemoinformatics is the application of informatics methods to solve chemical problems. Although the term was introduced only a few years ago, the field has a long history. Work on chemical structure representation and searching, QSAR, chemometrics, and molecular modeling, as well as computer-assisted structure elucidation and synthesis design, has merged into a discipline of its own that is in full bloom.1–8,22 All areas of chemistry, from analytical chemistry to drug design, can benefit from chemoinformatics methods, and there are still many challenging chemical problems waiting for solutions. An overview of chemoinformatics includes topics such as the representation of chemical compounds and chemical reactions; data, data sources, and databases; methods for calculating physical and chemical data; calculation of structure descriptors; and data analysis methods. The range of applications is very wide and should answer questions such as: What structure do I need to obtain the desired property? How can I synthesize this structure? What is the product of the reaction that I have performed?1–18,22 Typical applications of chemoinformatics in different areas of chemistry include storage and retrieval of chemical structures and associated data to manage the flood of data; prediction of the physical, chemical, or biological properties of compounds; analysis of data from analytical chemistry to make predictions on the quality, origin, and age of the investigated objects; elucidation of the structure of a compound based on spectroscopic data; prediction of the course and products of organic reactions; design of organic syntheses; identification of new lead structures; optimization of lead structures; establishment of QSAR; comparison of chemical libraries; definition and analysis of structural diversity; planning of chemical libraries; analysis of high-throughput data; docking of a ligand into a receptor; de novo design of ligands; modeling of ADME-Tox properties; prediction of the metabolism of xenobiotics; and analysis of biochemical pathways.1–8,22
Pharmacophore Modeling
Pharmacophore definition and 3D searches are an important part of drug discovery. A pharmacophore model is a hypothesis on the 3D arrangement of the structural properties, such as aromatic rings, hydrophobic groups, and hydrogen bond donor and acceptor groups, of compounds that bind to a biological target.
In the presence of the 3D structure of the target, or by comparison with inactive analogs, further geometric and/or steric constraints can be defined.10 Once a pharmacophore model is established, 3D searches in large databases can be performed, leading to a significant enrichment of active analogs. Pharmacophore modeling and 3D database searching have been successful tools for enriching screening experiments aimed at the discovery of novel bioactive compounds.1–8 A representative pharmacophore model is able to capture the key features common to a given set of active compounds, along with their spatial distribution in the respective molecular conformation. The pharmacophoric features constitute general information about the different ways in which ligands may interact with macromolecules, typically through ionic, hydrophobic, and/or hydrogen bonding interactions. Thus, pharmacophore searches are commonly used for lead finding, in order to find compounds that could bind to the receptor in the same way as the known actives but do not share significant substructural similarity.30
The anchor-GRIND methodology11 efficiently combines a priori chemical and biological knowledge about the studied compounds with alignment-independent molecular descriptors derived from molecular interaction fields. The anchor-GRIND approach is expected to fill the gap between standard 3D-QSAR and GRID-independent descriptors and can be useful when ligands share a common scaffold with diverse substituents.11 The "anchor point," a specific position in the molecular structure, is used to compare the molecular interaction fields of the substituents with regard to their spatial distribution. The descriptors used in the anchor-GRIND method avoid the bias introduced by alignment, discriminate between inhibitors of high and low affinity, and are statistically sound and easy to interpret.1–8,11
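A 3D pharmacophore search of the kind described above can be sketched as a distance-matching problem. In the fragment below, the query (three feature types and their pairwise distances), the tolerance, and the two candidate conformers are hypothetical; production tools additionally handle conformer generation, excluded volumes, and feature directionality.

```python
# Three-point pharmacophore matching by pairwise distances.
# Assumption: the query, tolerance, and coordinates are illustrative only.
from itertools import permutations
from math import dist

QUERY_TYPES = ("donor", "acceptor", "aromatic")
# Required distances (in angstroms) between query features, keyed by index pair.
QUERY_DISTANCES = {(0, 1): 3.0, (0, 2): 5.0, (1, 2): 4.0}


def matches_query(features, tolerance=0.5):
    """Return True if some assignment of conformer features fits the query.

    `features` is a list of (feature_type, (x, y, z)) tuples for one conformer.
    """
    for combo in permutations(features, len(QUERY_TYPES)):
        if any(f[0] != t for f, t in zip(combo, QUERY_TYPES)):
            continue  # feature types must line up with the query
        if all(
            abs(dist(combo[i][1], combo[j][1]) - d) <= tolerance
            for (i, j), d in QUERY_DISTANCES.items()
        ):
            return True
    return False


# Two hypothetical database conformers.
hit = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("aromatic", (3.0, 4.0, 0.0)),
]
miss = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("aromatic", (3.0, 8.0, 0.0)),
]
print(matches_query(hit), matches_query(miss))
```

Because only feature types and distances are compared, two compounds can match the same query without sharing a common substructure, which is the property that makes pharmacophore searches useful for lead finding.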
Combining Docking and Molecular Dynamics Simulations
In current drug design, the issue of how to hit a moving target involves protein flexibility.1–8,13 The most advanced methods for CADD and database mining incorporate protein flexibility. Such techniques are not only needed to obtain proper results; they are also critical for dealing with the growing body of information from structural genomics. Numerous docking programs are used extensively in the biotechnology and pharmaceutical industries. Because of the high computational demand that flexibility implies, most docking algorithms assume the protein to be rigid, although many current docking programs treat the ligand as a flexible molecule. Docking algorithms include force-field-based methods such as molecular dynamics or Monte Carlo simulations, which allow for movements of ligands and targets. Although molecular dynamics has been successfully applied to molecular docking, no molecular dynamics-based docking programs are in wide use. Monte Carlo-based docking programs include QXP38 and ICM.39 Evolutionary and related search methods for docking include genetic algorithms (GA), tabu search, and evolutionary programming (EP), which have been implemented in programs such as PROLEADS40; GOLD43 and AutoDock44 are among the best-known docking programs based on GA. DOCK 4.0,42 FlexX,41 and Surflex use fragment-based incremental construction, in which the ligand is split into pieces that are docked incrementally. Shape complementarity-based methods use Gaussian functions to fit the ligand shape to the negative image of the protein, whereas LigandFit uses grids to compare the shapes of the ligand and the target. In the past decade, a number of scoring functions have been developed to estimate the binding affinity of novel structures or fragments in a given position inside the receptor pocket. These include force-field-based scoring functions, which use classical molecular mechanics energy functions to approximate the binding free energy of ligand-protein complexes by a sum of electrostatic and van der Waals interactions.1–8
Understanding the principles by which protein receptors recognize, interact with, and associate with molecular substrates and inhibitors is of paramount importance in drug discovery efforts and must be kept in mind when analyzing the future challenges of protein-ligand docking. Protein-ligand docking aims to predict and rank the structure(s) arising from the association between a given ligand and a target protein of known 3D structure.
Despite the breathtaking advances in the field over the last decades and the widespread application of docking methods, several drawbacks still exist.13 In particular, protein flexibility, a critical aspect for a thorough understanding of the principles that guide ligand binding in proteins, is a major hurdle in current protein-ligand docking efforts and needs to be accounted for more efficiently. The key concepts of protein-ligand docking methods must be outlined, with emphasis on the general strengths and weaknesses that presently characterize this methodology. Despite the size of the field, the principal types of search algorithms and scoring functions, as well as the traditional limitations associated with molecular docking, must be addressed.1–8,13
Fast docking protocols can be combined with accurate but more costly molecular dynamics techniques to predict more reliable ligand-macromolecule complexes. The rationale for this combination lies in their complementary strengths. Docking simulations are used to explore the vast conformational space in a short period of time, allowing the scrutiny of large collections of drug-like compounds at a reasonable cost. Molecular dynamics can treat the flexibility of both ligand and protein; protein flexibility in particular is usually treated only to a limited extent in docking protocols. In molecular dynamics simulations, the effect of explicit water molecules can be investigated directly, and accurate binding free energies can be obtained.31
Another important current topic in drug design involves computational sampling of a cryptic drug binding site in a protein receptor using explicit-solvent molecular dynamics and inhibitor docking. Many structural studies reveal alternative binding sites in protein receptors. Because such sites only become apparent when an inhibitor binds, their correct prediction presents a significant challenge to CADD efforts.9 Side-chain movements can create new binding sites, which can be successfully and repeatedly identified in explicit-solvent molecular dynamics (MD) simulations of the protein. Ligand-docking calculations on different structural snapshots generated during molecular dynamics can indicate that the sampled conformations are often surprisingly competent to bind the inhibitor in the crystallographically correct position, with docked energies that are generally more favorable than those of other positions. Docking studies with binding site-directed inhibitors could suggest that it may be possible to develop hybrid inhibitors that target the regular and cryptic binding sites simultaneously. Molecular dynamics simulations demonstrate that existing computational methods may be very useful in predicting cryptic binding sites in protein receptors prior to their experimental discovery.
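The force-field-based scoring functions mentioned in this section approximate the binding energy as a sum of electrostatic and van der Waals terms over ligand-protein atom pairs. A minimal sketch is given below; the single Lennard-Jones parameter set, the constant dielectric, and the atom coordinates and charges are simplifying assumptions and do not correspond to any particular docking program or force field.

```python
# Pairwise Coulomb + Lennard-Jones (12-6) interaction score.
# Assumptions: one (well_depth, r_min) pair for all atom types and a
# constant dielectric; real force fields use per-atom-type parameters.
from math import dist

COULOMB_CONSTANT = 332.0637  # kcal*angstrom/(mol*e^2)


def pair_energy(r, q_i, q_j, well_depth=0.1, r_min=3.5, dielectric=4.0):
    """Interaction energy (kcal/mol) of two atoms separated by r angstroms."""
    electrostatic = COULOMB_CONSTANT * q_i * q_j / (dielectric * r)
    ratio6 = (r_min / r) ** 6
    van_der_waals = well_depth * (ratio6 ** 2 - 2.0 * ratio6)
    return electrostatic + van_der_waals


def interaction_score(ligand_atoms, protein_atoms):
    """Sum pair energies over all ligand-protein atom pairs.

    Each atom is a (position, partial_charge) tuple.
    """
    return sum(
        pair_energy(dist(l_pos, p_pos), l_q, p_q)
        for l_pos, l_q in ligand_atoms
        for p_pos, p_q in protein_atoms
    )


# Hypothetical two-atom ligand facing two pocket atoms.
ligand = [((0.0, 0.0, 0.0), 0.4), ((1.5, 0.0, 0.0), -0.4)]
pocket = [((0.0, 3.5, 0.0), -0.5), ((1.5, 3.5, 0.0), 0.5)]
print(round(interaction_score(ligand, pocket), 3))
```

A more negative score indicates a more favorable pose under these assumptions. Snapshots taken from a molecular dynamics trajectory can be rescored with the same function, which is one simple way to combine docking with molecular dynamics.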
Virtual Screening
Virtual database screening is an increasingly important strategy in the computer-aided search for novel lead compounds. There are, fundamentally, two approaches to this general topic: virtual screening (VS) by fast automated docking methods, which requires knowledge of the 3D structure of the target binding site; and similarity-based virtual screening, where no information on the target structure is needed (instead, compounds that are known to bind the target are used as structural queries).32 Virtual screening has been established as a powerful alternative and complement to high-throughput screening (HTS). When it is performed optimally, impressive hit rates have been reported, significantly higher than those for HTS.34
A promising model is the genome-to-drug-lead approach for the identification of small-molecule inhibitor leads. Even fast computational approaches to drug lead identification, such as virtual screening, are limited by the known difficulty of determining flexible regions of proteins crystallographically. Such approaches have not been able to identify many active inhibitors using crystal structures alone.15 A genome-to-drug-lead approach based on terascale computing can model flexible regions of proteins, identifying drug leads from genetic information. Small-molecule inhibitors exhibiting effective concentrations in cell-based assays can be identified through virtual screening against a computer-predicted model. Terascale computing can complement crystallography, broaden the scope of virtual screening, and accelerate the development of therapeutics to treat emerging infectious diseases.1–8
Molecular science and engineering increasingly use open computing grids, an emerging infrastructure for distributed computing that provides scalable and secure mechanisms for discovering and accessing remote data resources and software. This infrastructure has great potential for solving large-scale materials science, pharmaceutical, and chemical problems. As an example, OpenMolGRID16 is an open computing grid that can provide software for building QSPR/QSAR models and a data warehouse for chemical data.
Compounds with predefined chemical properties and biological activity can thus be generated.1–8,16
A novel approach to in silico screening and "rational" selection of new lead antibacterial agents is TOMOCOMD-CARDD, which is based on the calculation of the nonstochastic and stochastic linear indices of the atom-adjacency matrix of the molecular pseudograph representing molecular structures.17 These indices are used for the "rational" selection and computational (virtual) screening of lead antibacterial agents using linear discriminant analysis. This approach compares well with some of the most useful models for antimicrobial selection reported.1–8,17
Another area of interest is the application of fragment screening by high-throughput X-ray crystallography. Fragment-based approaches to lead discovery are becoming important as a complement to high-throughput screening. Fragments are compounds with low molecular weight and low binding affinities. Only small libraries of fragments need to be screened because molecules of low complexity have a much higher probability of being complementary to receptors than drug-like molecules of high complexity. Fragments can also be thought of as the building blocks of drug-like molecules. Despite their low affinities, fragments possess high ratios of free energy of binding to molecular size and form a small number of high-quality interactions, whereas hits from high-throughput screening often display low ligand efficiency, with potencies derived from a larger number of lower-quality interactions. It is thus possible to optimize fragments into high-quality leads of relatively low molecular weight that possess better drug-like properties.27 The optimization can be achieved by synthesizing a relatively small number of compounds using structure-based drug design and, owing to the simplicity of the initial fragment hits, the synthesis is often straightforward.
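The ratio of binding free energy to molecular size discussed above is commonly expressed as ligand efficiency, the binding free energy per heavy (non-hydrogen) atom. The sketch below estimates the free energy from a dissociation constant through the standard relation between free energy and equilibrium constant; the two compounds and their values are hypothetical and serve only to contrast a weak but efficient fragment with a potent but less efficient screening hit.

```python
# Ligand efficiency: binding free energy per heavy atom.
# Assumption: the dissociation constants and atom counts are invented.
from math import log

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol: dG = RT ln(Kd), with Kd in mol/L."""
    return GAS_CONSTANT * temperature * log(kd_molar)


def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """Return -dG / number of heavy atoms, in kcal/mol per atom."""
    return -binding_free_energy(kd_molar, temperature) / heavy_atoms


# Hypothetical fragment: millimolar binder with 10 heavy atoms.
fragment_le = ligand_efficiency(1e-3, 10)
# Hypothetical HTS hit: micromolar binder with 40 heavy atoms.
hts_hit_le = ligand_efficiency(1e-6, 40)
print(round(fragment_le, 2), round(hts_hit_le, 2))
```

In this example the fragment binds a thousand times more weakly than the screening hit yet is about twice as efficient per atom, which illustrates why fragments can be attractive starting points for optimization.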
Computer-Aided Drug Design Strategies
In general, the detailed 3D structure of the drug target is the first consideration before starting a CADD project. A ligand-based approach (pharmacophore, QSAR, CoMFA) or a structure-based approach (de novo ligand design, docking) can be undertaken to generate new lead compounds, which are evaluated; the best candidates are then selected, synthesized or purchased, tested for activity, and fed back into CADD in an iterative process. However, there are numerous drawbacks to the strict separation between ligand- and structure-based CADD methods.18 Most ligand-based methods try to conserve the 3D arrangement of functional groups on a scaffold believed to be important for the activity of existing ligands, which precludes the discovery of novel ligands that make different interactions with the target protein. Docking methods become computationally more expensive if the induced fit of both ligand and protein is evaluated, and large-scale changes in the protein upon ligand binding are often ignored. Structure-based methods are also limited by the availability of detailed structures of the target, ideally in different conformations, with and without bound ligands. An interesting approach is the integration of ligand- and structure-based CADD methodologies, which model separate facets of the natural system, so that all the information available in a particular drug design project is used in a quantitative and objective way.1–8,18 Integrated CADD methodologies use crystal structure complexes to produce structure-based pharmacophores representing the ligand features that are involved in interactions with the target protein, as well as the space around the ligand occupied by the protein,18 yielding information about the size of the binding cavity and all relevant interactions. The protein-ligand complexes can thus yield a "superligand," which can also be viewed as a pharmacophore. Various types of novel pharmacophores can be compared and investigated.
In structural biology, exploring energy landscapes is a major challenge for problems including protein dynamics, aggregation, and folding. For the last 30 years, conventional molecular dynamics in Cartesian coordinate space has provided a wealth of information on conformational changes occurring within the nanosecond time scale, which, however, is not sufficient to provide a complete picture of these problems. Two routes have been followed to accelerate conformational sampling and increase our understanding of the kinetics, structure, and thermodynamics involved: the development of more efficient Cartesian coordinate-based sampling techniques, and the reduction of the number of degrees of freedom by moving to internal coordinate space.
There is consequently considerable interest in activated methods in internal coordinate space for sampling protein energy landscapes.19 ARTIST is one of the first applications for sampling all-atom protein conformations using an activated method in internal coordinate space. The activation-relaxation technique for internal coordinate space trajectories differs from other internal coordinate-based studies aimed at folding or refining protein structures in that conformational changes result from identifying and crossing well-defined saddle points connecting energy minima. This method is efficient for exploring conformational space in both sparsely and densely packed environments and offers new perspectives for applications in CADD.1–8,19
Another important aspect of drug design is the computer prediction of drug-resistance mutations in proteins. Drug resistance is of increasing concern in the treatment of infectious diseases, cancer, and other diseases, as is the prediction of features that guide the design of new agents to counter resistant strains. For infectious diseases, resistance is primarily mediated by mutations in the genes of infectious organisms that alter the interaction of the drug with the target protein. The prediction of resistance mutations in proteins is therefore very valuable, as is the molecular dissection of drug resistance mechanisms. Numerous sequence- and protein structure-based computer methods have been explored for predicting resistance mutations and for mechanistic studies.1–8,20 The availability of the 3D structures of drug targets involved in disease enables the use of structure-based approaches to evaluate the molecular interactions, solvation, and dynamical properties of drug-protein binding and their correlation with resistance mutations. Structure-derived binding energies and binding-site volume-based fitness models have also been used. Since structural information is not always available, sequences are also used for predicting resistance mutations. Statistical learning methods such as neural networks, SVM, and decision trees are also promising for predicting resistance mutations.1–8,20
Simplified Molecular Input Line Entry Specification
Together with the accelerated growth of computational power, the advent of computer-aided drug design allows us to store and manipulate substantial amounts of data in different formats for substructure searching, preprocessing, 3D matching, and the creation of combinatorial libraries. The simplified molecular input line entry specification (SMILES)23 is a simple linear chemical language for specifying molecules or molecular fragments. Classic 2D structural formats and SMILES offer a convenient way to represent molecules as a simple connection table, which makes them easy to handle and store. In virtual screening, chemical databases are often initially represented by canonical SMILES strings. These can be processed and filtered in a number of ways, resulting in molecules that occupy regions of chemical space similar to those of the active compounds of a therapeutic target. A wide variety of software exists to convert molecules into SMILES format.23 The atoms of a SMILES string defining a molecule can be ordered differently depending on the algorithm used. Different permutations of a SMILES string can affect conformer generation and hence the repeatability and reliability of the results. Different SMILES representations can therefore affect drug design processes such as docking. This phenomenon can also be used, in some cases, to increase the rate of production of a diverse set of conformers, yielding a more effective sampling of conformational space.
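The point that one molecule can be written as several SMILES strings can be illustrated with a deliberately restricted toy case. The sketch below handles only unbranched, acyclic chains of the single-letter atoms C, N, O, and S with implicit single bonds, for which reading the string backward describes the same molecule; this is an assumption made to keep the example self-contained. Real toolkits canonicalize arbitrary molecules with graph-based algorithms, which this sketch does not attempt.

```python
# Toy canonicalization for unbranched chains written as SMILES.
# Assumption: input is limited to chains of C, N, O, S with implicit
# single bonds (no branches, rings, charges, or two-letter elements).

ALLOWED_ATOMS = set("CNOS")


def canonical_chain(smiles):
    """Return one agreed spelling for a chain and its reversed spelling."""
    if not smiles or any(ch not in ALLOWED_ATOMS for ch in smiles):
        raise ValueError("only unbranched chains of C, N, O, S are supported")
    return min(smiles, smiles[::-1])


def unique_molecules(smiles_list):
    """Collapse different spellings of the same chain into one entry."""
    return sorted({canonical_chain(s) for s in smiles_list})


# "CCO" and "OCC" both describe ethanol; "CCN" and "NCC" both describe ethylamine.
library = ["CCO", "OCC", "CCN", "NCC", "CCC"]
unique = unique_molecules(library)
print(unique)
```

Storing a canonical form removes duplicate entries from a screening library, while keeping the alternative spellings on purpose is one way to seed conformer generation with different atom orderings.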
ADMET Properties
Investigations into the causes of late-stage failures in drug development, performed in the 1990s, revealed that poor pharmacokinetics and toxicity are responsible for most of these failures. In silico ADMET (absorption, distribution, metabolism, excretion, and toxicity) models are important tools for reducing the time and expense of the drug discovery process. The data used to build ADMET models are often provided by high-throughput in vitro screens. Numerous techniques and descriptors for modeling categorical data have been shown to be useful, such as neural networks and Bayesian statistics.33 The tools used for developing in silico ADMET models vary in their throughput, prediction accuracy, and range of statistical methods and descriptors. Descriptors, for example, can be based on simple whole-molecule properties (polar surface area, hydrogen bond donors and acceptors, log P, among others) or can be derived from semi-empirical methods based on quantum theory. Molecular properties calculated from 3D molecular fields of interaction energies are a novel approach to correlating 3D molecular structures with pharmacokinetic properties. The descriptors used can characterize quantitatively the size, shape, polarity, and hydrophobicity of a molecule, and the balance between them.37
A topic of increasing importance is the computational approach to modeling drug transporters. Absorption into the systemic circulation is the first step for drugs to reach their targets, followed by distribution to tissues, where they can be metabolized into more readily excretable forms and eliminated from the systemic circulation. Transporter proteins mediate or influence all of the above-cited processes. Computational modeling has advanced our understanding of drug absorption, tissue distribution, excretion, and toxicity by providing direct and indirect knowledge of drug–transporter interactions that is unavailable from experimental methods.24 Substrate-based and transporter-based models are complementary approaches for modeling transporters. In the transporter-based method, the 3D structure of the transporter is predicted directly. Substrate-based models instead infer information about the drug transport process by studying a group of substrates or inhibitors with measured activities. With continuously improved modeling algorithms and increasing computational power, computational techniques can assist in understanding transporter–substrate interactions as well as in the optimization of transporter-directed drug design.1–8,24
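As an illustration of how whole-molecule descriptors can feed a Bayesian model of a categorical ADMET endpoint, the following is a minimal sketch of a Gaussian naive Bayes classifier. The choice of naive Bayes, the class labels, the function names, and the six training rows are all assumptions made for this example. The numbers are invented and are not measured data, and the sketch is not the approach of any cited study.

```python
import math
from collections import defaultdict

# Hypothetical training set: (polar surface area in A^2, H-bond donors, log P).
# The values and labels are invented for illustration, not experimental data.
TRAIN = [
    ((40.0, 1, 2.5), "absorbed"),
    ((60.0, 2, 3.0), "absorbed"),
    ((75.0, 2, 1.8), "absorbed"),
    ((150.0, 6, -0.5), "poorly_absorbed"),
    ((170.0, 7, 0.2), "poorly_absorbed"),
    ((190.0, 8, -1.0), "poorly_absorbed"),
]


def fit(samples):
    """Estimate a prior and per-descriptor (mean, variance) for each class."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        stats = []
        for col in zip(*rows):                       # one column per descriptor
            mean = sum(col) / n
            var = sum((v - mean) ** 2 for v in col) / n + 1e-6  # avoid zero variance
            stats.append((mean, var))
        model[label] = (n / len(samples), stats)
    return model


def log_posterior(model, x):
    """Unnormalised log posterior of each class, assuming independent Gaussians."""
    scores = {}
    for label, (prior, stats) in model.items():
        s = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            s += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        scores[label] = s
    return scores


def predict(model, x):
    """Return the class with the highest posterior score."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


model = fit(TRAIN)
print(predict(model, (55.0, 2, 2.7)))
print(predict(model, (165.0, 7, 0.0)))
```

A real model would be trained on screening data of the kind mentioned above and validated on held-out compounds. The independence assumption between descriptors is a simplification that a neural network or a more elaborate Bayesian model would relax.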
Structure-Based Drug Design
Of current interest is the nonbonded contacts analysis of protein–ligand complexes in crystal structures, in particular the propensities of polar and aromatic amino acids in noncanonical interactions. Detailed analyses of crystal structures of protein–ligand complexes have established in recent years the importance of weak interactions, which are directly relevant to rational drug design. Numerous crystal structures have been reported with weak nonbonded contacts such as CH–O, which contribute significantly to molecular recognition in protein–ligand interactions.26 However, there has not been a systematic analysis of the roles and frequencies of these interactions, despite the fact that their importance in biological systems is well established. An exhaustive analysis of the Protein Data Bank (PDB) may provide new insights into the patterns and preferences of protein–ligand interactions and aid structure-based design of more potent ligands. Nonbonded contacts analysis of polar and aromatic amino acid side chains in protein–ligand complexes derived from crystal structures in the PDB indicated high frequencies for the canonical hydrogen-bonding NH–O, OH–N, and OH–O interactions, while the preferences in noncanonical interactions such as CH–π interactions were not always consistent among side chains with similar characteristics. Understanding such amino acid side chain propensities may be important for improving the accuracy of structure-based drug design.
LEA3D is a computer-aided ligand design program25 for structure-based drug design. It is an improved version of the program LEA, which was developed to design organic molecules. Rational drug design involves finding solutions to large combinatorial problems for which an exhaustive search is impractical, and genetic algorithms provide tools for investigating such problems. LEA3D can conceive organic molecules by combining 3D fragments extracted from biological compounds and known drugs. The search process is guided by a fitness function that drives the molecules toward optimal values of the chosen properties.1–8,25
CONCLUSIONS
Computer-assisted approaches based on pharmacophore modeling, molecular modeling and molecular dynamics, QM and combined QM/molecular mechanics methods, docking, structural interaction fingerprints, grid technologies, virtual screening, and statistical learning methods have been used to identify new inhibitors using large commercially available databases and highly selective and efficient programs.1–8 Using a number of the techniques discussed above, the authors have proposed novel inhibitors for cancer, AIDS, Alzheimer's disease, and other diseases. The receptor targets used in our recent work1–8 included the retinoic acid receptor for cancer, integrase for AIDS, and acetylcholinesterase for Alzheimer's disease. We have used Gaussian0348 for density functional calculations to obtain initial geometries and charges for ligands, GOLD46,51 for docking ligands to the receptor sites, iResearch Library (ChemNavigator)50 for database screening, DEREK and METEOR49 for toxicity and metabolism predictions, SYBYL47 for molecular interaction field calculations, and InsightII45 for molecular dynamics calculations.
ACKNOWLEDGMENTS
We acknowledge financial assistance from Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (Faperj), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (Brasil).
REFERENCES
1. Taft CA, Silva CHTP, editors. 2007. Current methods in medicinal chemistry and biological physics, vol 1. Kerala: Research Signpost.
2. Taft CA, editor. 2006. Modern biotechnology in medicinal chemistry and industry. Kerala: Research Signpost.
3. Taft CA, Silva CHTP. 2006. Invited international review: Cancer and AIDS, new trends in drug design and chemotherapy. Curr Comput-Aided Drug Des 2:307.
4. Silva CHTP, Sanches SM, Taft CA. 2004. A molecular modeling and QSAR study of suppressors of the growth of Trypanosoma cruzi epimastigotes. J Mol Graph Model 23:89.
5. Silva CHTP, Del Ponte G, Neto AF, Taft CA. 2005. Rational design of novel diketoacid-containing ferrocene inhibitors of HIV-1 integrase. Bioorg Chem 33:4.
6. Silva CHTP, Taft CA. 2006. ADMET properties, database screening, molecular dynamics, density functional, and docking studies of novel potential anti-cancer compounds. J Biomol Struct Dyn 24:236.
7. Silva CHTP, Carvalho I, Taft CA. 2006. Molecular dynamics, docking, density functional, and ADMET studies of HIV-1 reverse transcriptase inhibitors. J Theoret Comput Chem 5:579.
8. Silva CHTP, Carvalho I, Taft CA. 2007. Virtual screening, molecular interaction field, molecular dynamics, docking, density functional, and ADMET properties of novel AChE inhibitors in Alzheimer's disease. J Biomol Struct Dyn 24:515.
9. Stewart KD, Shiroda M, James CA. 2006. Drug Guru: A computer software program for drug design using medicinal chemistry rules. Bioorg Med Chem 14:7011.
10. Langer T, Wolber G. 2004. Pharmacophore definition and 3D searches. Drug Discov Today Technol 1:203.
11. Fontaine F, Pastor M, Zamora I, Sanz F. 2005. Anchor-GRIND: Filling the gap between standard 3D QSAR and the GRid-INdependent Descriptors. J Med Chem 48:2867.
12. Powers R, Copeland JC, Germer K, Mercier KA, Ramanathan A. 2006. Comparison of protein active site structures for functional annotation of proteins and drug design. Proteins Struct Funct Bioinform 65:124.
13. Sousa SF, Fernandes PA, Ramos MJ. 2006. Protein-ligand docking: Current status and future challenges. Proteins Struct Funct Bioinform 65:115.
14. Steindl TM, Crump CE, Hayden FG, Langer T. 2005. Pharmacophore modeling, docking, and principal component analysis based clustering: Combined computer-assisted approaches to identify new inhibitors of the human rhinovirus coat protein. J Med Chem 48:6250.
15. Dooley AJ, Shindo S, Taggart B, Park J-G, Pang Y-P. 2006. From genome to drug lead: Identification of a small-molecule inhibitor of the SARS virus. Bioorg Med Chem Lett 16:830.
16. Sild S, Maran U, Lomaka A, Karelson M. 2006. Open computing grid for molecular science and engineering. J Chem Inform Model 46:953.
17. Marrero-Ponce Y, Marrero R, Torrens F, Martinez Y, Bernal MG, Zaldivar VR, Castro EA, Abalo RG. 2006. Non-stochastic and stochastic linear indices of the molecular pseudograph's atom-adjacency matrix: A novel approach for computational in silico screening and “rational” selection of new lead antibacterial agents. J Mol Model 12:255.
18. Griffith R, Luu TTT, Garner J, Keller PA. 2005. Combining structure-based drug design and pharmacophores. J Mol Graph Model 23:439.
19. Yun M-R, Lavery R, Mousseau N, Zakrzewska K, Derreumaux P. 2006. ARTIST: An activated method in internal coordinate space for sampling protein energy landscapes. Proteins Struct Funct Bioinform 63:967.
20. Cao ZW, Han LY, Zheng CJ, Chen X, Lin HH, Chen YZ. 2005. Computer prediction of drug resistance mutations in proteins. Drug Discov Today Biosilico 10:521.
21. Dahl SG, Sylte I. 2005. Molecular modeling of drug targets: The past, the present and the future. Basic Clin Pharmacol Toxicol 96:151.
22. Gasteiger J. 2006. Chemoinformatics: A new field with a long tradition. Anal Bioanal Chem 384:57.
23. Carta G, Onnis V, Knox AJS, Fayne D, Lloyd DG. 2006. Permuting input for more effective sampling of 3D conformer space. J Comput-Aided Mol Des 20:179.
24. Chang C, Swaan PW. 2006. Computational approaches to modeling drug transporters. Eur J Pharm Sci 27:511.
25. Douguet D, Munier-Lehmann H, Labesse G, Pochet S. 2005. LEA3D: A computer-aided ligand design for structure-based drug design. J Med Chem 48:2457.
26. Imai YN, Inoue Y, Yamamoto Y. 2007. Propensities of polar and aromatic amino acids in noncanonical interactions: Nonbonded contacts analysis of protein-ligand complexes in crystal structures. J Med Chem 50:1189.
27. Murray CW, Callaghan O, Chessari G, Cleasby A, Congreve M, Frederickson M, Hartshorn MJ, McMenamin R, Patel S, Wallis N. 2007. Application of fragment screening by X-ray crystallography to β-secretase. J Med Chem 50:1116.
28. Yap CW, Xue Y, Li H, Li ZR, Ung CY, Han LY, Zheng CJ, Cao ZW, Chen YZ. 2006. Prediction of compounds with specific pharmacodynamic, pharmacokinetic or toxicological property by statistical learning methods. Mini Rev Med Chem 6:449.
29. Han LY, Zheng CJ, Xie B, Jia J, Ma XH, Zhu F, Lin HH, Chen X, Chen YZ. 2007. Support vector machines approach for predicting druggable proteins: Recent progress in its exploration and investigation of its usefulness. Drug Discov Today 12:304.
30. Shepphird JK, Clark RD. 2006. A marriage made in torsional space: Using GALAHAD models to drive pharmacophore multiplet searches. J Comput Aided Mol Des 20:763.
31. Alonso H, Bliznyuk AA, Gready JE. 2006. Combining docking and molecular dynamic simulations in drug design. Med Res Rev 26:531.
32. Lengauer T, Lemmen C, Rarey M, Zimmermann M. 2004. Novel technologies for virtual screening. Drug Discov Today 9:27.
33. O'Brien SE, Groot MJ. 2005. Greater than the sum of its parts: Combining models for useful ADMET prediction. J Med Chem 48:1287.
34. Klebe G. 2006. Virtual ligand screening: Strategies, perspectives and limitations. Drug Discov Today 11:580.
35. Kövesdi I, Rodriguez MFD, Ôrfi L, Náray-Szabó G, Varró A, Papp JG, Mátyus P. 1999. Application of neural networks in structure-activity relationships. Med Res Rev 19:249.
36. Li H, Yap CW, Xue Y, Li ZR, Ung CY, Han LY, Chen YZ. 2006. Statistical learning approach for predicting specific pharmacodynamic, pharmacokinetic, or toxicological properties of pharmaceutical agents. Drug Dev Res 66:245.
37. van de Waterbeemd H, Lennernas H, Artursson P, editors. 2003. Drug bioavailability: Estimation of solubility, permeability, absorption and bioavailability. Weinheim: Wiley-VCH.
38. McMartin C, Bohacek RS. 1997. QXP: Powerful, rapid computer algorithms for structure-based drug design. J Comput Aided Mol Des 11:333–344.
39. Abagyan R, Totrov M, Kuznetzov D. 1994. ICM: A new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation. J Comput Chem 15:488–506.
40. Murray CW, Baxter CA, Frenkel AD. 1999. The sensitivity of the results of molecular docking to induced fit effects: Application to thrombin, thermolysin and neuraminidase. J Comput Aided Mol Des 13:547–562.
41. Rarey M, Kramer B, Lengauer T, Klebe G. 1996. A fast flexible docking method using an incremental construction algorithm. J Mol Biol 261:470–489.
42. Ewing TJA, Kuntz ID. 1997. Critical evaluation of search algorithms for automated molecular docking and database screening. J Comput Chem 18:1175.
43. Jones G, Willett P, Glen RC, Leach AR, Taylor R. 1995. Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. J Mol Biol 267:727.
44. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson AJ. 1998. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19:1639.
45. Insight II. 2005. Catalyst. San Diego, CA: Accelrys.
46. Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD. 2003. Improved protein-ligand docking using GOLD. Proteins Struct Funct Genet 52:609.
47. Tripos Inc. 2006. Sybyl user guide, version 7.3. San Diego, CA: Tripos, Inc.
48. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, Cheeseman JR, Montgomery JA Jr, Vreven T, Kudin KN, Burant JC, Millam JM, Iyengar SS, Tomasi J, Barone V, Mennucci B, Cossi M, Scalmani G, Rega N, Petersson GA, Nakatsuji H, Hada M, Ehara M, Toyota K, Fukuda R, Hasegawa J, Ishida M, Nakajima T, Honda Y, Kitao O, Nakai H, Klene M, Li X, Knox JE, Hratchian HP, Cross JB, Adamo C, Jaramillo J, Gomperts R, Stratmann RE, Yazyev O, Austin AJ, Cammi R, Pomelli C, Ochterski JW, Ayala PY, Morokuma K, Voth GA, Salvador P, Dannenberg JJ, Zakrzewski VG, Dapprich S, Daniels AD, Strain MC, Farkas O, Malick DK, Rabuck AD, Raghavachari K, Foresman JB, Ortiz JV, Cui Q, Baboul AG, Clifford S, Cioslowski J, Stefanov BB, Liu G, Liashenko A, Piskorz P, Komaromi I, Martin RL, Fox DJ, Keith T, Al-Laham MA, Peng CY, Nanayakkara A, Challacombe M, Gill PMW, Johnson B, Chen W, Wong MW, Gonzalez C, Pople JA. 2003. Gaussian 03, revision A.1. Pittsburgh, PA: Gaussian, Inc.
49. DEREK and METEOR 8.0. 2004. Leeds, UK: LHASA Limited.
50. ChemNavigator, Inc. 2007. San Diego, CA.
51. GOLD. Cambridge Crystallographic Data Centre (CCDC). Cambridge, UK.