Computer applications in the biomolecular sciences. Part 1: molecular modelling

Biochemical Education 26 (1998) 103-110

Clare E. Sansom (a), Christopher A. Smith (b)

(a) Department of Crystallography, Birkbeck College, University of London, London WC1E 7HX, UK
(b) Department of Biological Sciences, Manchester Metropolitan University, Manchester M1 5GD, UK

Abstract

This article describes the basic tenets of molecular modelling, a computer-based means of visualizing and investigating the structures and properties of molecules. Its emphasis is on the applications of molecular modelling to the study of biological molecules and its uses in teaching students in the life sciences. © 1998 IUBMB. Published by Elsevier Science Ltd. All rights reserved.

PII: S0307-4412(97)00155-6

1. Introduction

Computers have infiltrated all aspects of educational life: witness the computer-aided learning pages in this journal and, for example, the “Computer Corner” in Trends in Biochemical Sciences. This article on molecular modelling is based largely on our experiences and the material used in teaching and assessing students on the BSc (Hons) in Biological Sciences at the Manchester Metropolitan University (MMU), the BSc (Hons) in Biochemistry and Molecular Biology at the University of Leeds and on the Advanced Certificate in Principles of Protein Structure Using the Internet approved by Birkbeck College of the University of London.

Molecular modelling is a computer-based means of representing, visualizing and investigating the three-dimensional structures and related properties of molecules. Modern biochemical texts are resplendent with marvellous computer-generated pictures representing biological molecules, while most journals in the biological sciences manage to have a computer graphic on their front covers. Our experiences suggest that many lecturers and students in the biological sciences have little idea of how images of molecules are generated and how their structures and properties may be investigated in silico (a point recently emphasized by the editor of this journal [1]). Furthermore, these disciplines are relatively poorly supported by texts. None of the large biochemistry books (see, for example, refs [2-5]) explains the basics of bioinformatics or molecular modelling, and those texts that are available are rather detailed, require some relatively specialized knowledge of chemistry and/or mathematics, and are aimed at the research-level worker (for example, refs [6-13]). The aim of this article is to provide an easily accessible introduction to the basics of molecular modelling. In a subsequent article we will deal with some relatively restricted aspects of bioinformatics and genome projects.

The term bioinformatics means “the application of information technology to the biological sciences”. It is most often used to mean the storage and analysis of one-dimensional biological data, typically sequences (primary structures) of peptides, proteins and nucleic acids. The sheer volume of such data coming out of initiatives such as the Human Genome Project is responsible for the growth in importance of bioinformatics in recent years [14-17]. Thus, bioinformatics and molecular modelling are complementary and interrelated disciplines.

2. Molecular modelling

Using computers it is possible to generate scientifically meaningful pictures of molecules; to study their physical and chemical properties, for example shape, size and charge; to simulate the dynamic behaviour of atoms and molecules, such as their vibrational, twisting and rotational movements; to explore their interactions with other molecules; to design rationally molecules of biological and clinical interest; and, perhaps most importantly, to greatly improve scientific communication and the teaching of all aspects of biomolecular sciences. It is generally accepted that human brains are designed to receive visual information [18,19], hence molecular modelling has many advantages over the traditional approaches to examining the structures of biomolecules. Real physical models of molecules often lack visual appeal and are unwieldy and fragile, especially when the model is large. In comparison, computer-generated molecules are generally easy to build (although of course this depends to an extent on the molecule in question, the software available and what is known of the structure of the molecule) and are attractive and robust. Molecules can be “built” using an input of atomic coordinates imported from databases, or by using chemical templates stored in the modelling program, or by sketching a two-dimensional image directly using the visual display unit (VDU), or, indeed, a combination of all three. Further, once built, images on a VDU do not have a tendency to “denature” in one’s hands! On the negative side, the programs may run slowly, particularly with older types of personal computers.

Molecular modelling may be arbitrarily divided into molecular graphics and computational chemistry [20]. The former is the visualization of chemical structures and molecular properties, although the definition is often extended to include simple manipulations, such as modifying the torsion angles of chemical bonds, and basic geometric calculations, for example, estimating interatomic distances. Computational chemistry involves attempting to calculate numerical properties of molecules, the most common in the molecular/biochemical sciences being molecular energies. The primary outputs of such calculations are large amounts of numerical data. These are usually analysed using molecular graphics programs, and so the distinction between molecular graphics and computational chemistry is becoming increasingly blurred.
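The basic geometric calculations mentioned above can be illustrated with a short sketch. The Python below is not taken from any modelling package; the function names and the example coordinates in the usage note are assumptions chosen purely for illustration. It computes an interatomic distance and a signed torsion angle from Cartesian atomic coordinates.

```python
import math

def _sub(a, b):
    """Vector from point b to point a."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def distance(p, q):
    """Interatomic distance between two (x, y, z) coordinates."""
    d = _sub(q, p)
    return math.sqrt(_dot(d, d))

def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle in degrees about the p2-p3 bond.

    0 degrees corresponds to the cis arrangement of p1 and p4,
    +/-180 degrees to the trans arrangement.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)   # normal to the plane containing p1, p2, p3
    n2 = _cross(b2, b3)   # normal to the plane containing p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = b2_len * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

For example, `distance((0, 0, 0), (3, 4, 0))` gives 5.0, and four atoms placed in a planar cis arrangement give a torsion angle of 0°. A modelling program would apply the same arithmetic to coordinates read from a structure file.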

3. Molecular graphics

Molecular graphics offers a number of ways of viewing molecules, which can be exploited to advantage when examining or investigating them. Depth shading and smooth, real-time movement of the structures give realistic three-dimensional images. Models can be moved by rotation or translation. It is also possible to zoom in/out on particular regions of the structure and to clip sections through the model to gain clear views of internal features. Models can also be represented in one of a number of forms, e.g. stick, ball and stick, dot surface or space filled, either uniformly or in combinations to highlight specific features. “Realistic” three-dimensional structures and stereo images may be simulated (Fig. 1). Attention can be drawn to individual portions of the model by thickening lines, changing the density or colour of the dots or putting a “ribbon” through the feature of interest (Fig. 1). Molecules may be compared structurally by overlaying the models using a least-squares analysis of the atomic coordinates, while the rotation of bonds allows the conformational space available to the molecule to be investigated. Molecular graphics may be used to display the vast amounts of numerical data available from computational chemical exercises such as ab initio studies, or simulations such as those involved in molecular mechanics, molecular dynamics and “Boltzmann jumps” (Monte-Carlo simulations), and allows a rapid analysis using appropriate graphical representations.

It is an axiom of biology that function follows structure. This is as true of biomolecules as any other biological entity [21,22], hence the great interest in the structures of biological molecules. Biochemists who determine the structures of large biomolecules tend to spend a great deal of time and energy using X-ray crystallography or nuclear magnetic resonance spectroscopy, and consequently techniques such as computational chemistry, which bypass these efforts, are clearly very attractive.

4. Computational chemistry

In theory, quantum mechanical calculations should provide a complete description of the energy of any particular conformation (shape) of a molecule by solving the Schrödinger equation [6-13]. This is regarded as fundamental and the approach does not require the input of any experimental data. However, in practice, this so-called ab initio approach is only possible for very small molecules, and many biomolecules consist of extremely large numbers of atoms by any definition of the word “large”. It is possible, however, to extend this approach. In the semi-empirical technique, the input of some experimentally derived parameters allows approximate solutions to the Schrödinger equation to be calculated, but the technique is still limited to relatively small molecular structures.

In contrast, if a classical mechanical approach is adopted, then simulations of large molecular structures, their dynamics and properties become possible [6-13]. In general, this type of simulation is possible when a simplified description of the molecular structure is available as a basis for calculations. In computational chemical simulations, the simplified description is a calculated potential energy surface which represents the molecule of interest. This energy is a (complex) function of the atomic coordinates of the molecule.

Three major methods for conformational analysis are available to simulate molecular structures: molecular mechanics [6-13, 23-25], molecular dynamics [6-13, 26-28] and Monte-Carlo methods [6-13, 29]. Unfortunately, no method is sufficiently well developed to fold a macromolecule into its biologically active structure [30]. Molecular mechanics and dynamics simulations are of use only if the structure of a very similar molecule is already known and can act as a starting point for the simulation. If none is available, then a technique like homology modelling (not covered here) will have to be used, assuming the three-dimensional structure of a homologue is known.

5. Molecular mechanics

In molecular mechanics (MM) [6-13,23-25,28], the atoms of the molecule are assumed to be incompressible spheres and the (covalent) bonds to consist of elastic springs. The energy, $E$, of such a system (molecule) can then be described by breaking it into a number of constituent parts:

$$E = E_S + E_B + E_T + E_O + E_{NB} + E_{other}$$

Thus the total or steric energy, $E$, is considered to be the sum of a series of individual energies, where $E_S$ is the energy of the bonds on being stretched from their “ideal” values; $E_B$, the energy of bending bond angles away from their ideal values; $E_T$, the energy caused by twisting about a bond; $E_O$, the out of plane energy; $E_{NB}$, the through space or nonbonded energies; and $E_{other}$, other energy terms which individuals may wish to include. Each of these terms may be described by one or more relatively simple expressions.

Fig. 1. A variety of ways of representing molecular models. (a) A space fill model of a ferredoxin molecule. The hydrogen atoms have been omitted and the iron-sulphur electron transfer centres are shown in black for clarity. (b) In this representation of the ferredoxin molecule, the polypeptide backbone is shown as a ribbon to highlight its secondary structure and the iron-sulphur centres as sticks emphasizing their regular structure, while the van der Waals’ radii of the individual iron and sulphur atoms are shown as dotted spheres to indicate the overall shapes of the centres. (c) A molecular model of β-trypsin (multishaded) complexed with the pancreatic trypsin inhibitor (grey). Full colour (not available in Biochemical Education) is necessary to appreciate fully these models. All models were constructed using coordinates obtained from the Brookhaven Protein Data Bank.

Thus the simplest case for bond stretch energy assumes that Hooke’s Law describes the stretching of the bond:

$$E_S = \sum_{bonds} 0.5\,k_S (d - d_0)^2$$

where $k_S$ is the stretching constant associated with $d_0$, the optimal bond length, and $d$ is the deformed length. This gives a good approximation of the potential energy curve if $d$ is close to the equilibrium value (Fig. 2). A better description is given by the Morse function:

$$E_S = \sum_{bonds} k_S \left[1 - e^{-\alpha (d - d_0)}\right]^2$$

especially as $d$ exceeds $d_0$, since at an infinite distance, the energy of the stretched bond must equal the dissociation energy (Fig. 2). This expression requires more computation than Hooke’s Law and therefore longer computer time. Bond angle energy may be described by an expression similar to Hooke’s Law:

$$E_B = \sum_{angles} 0.5\,k_B (\theta - \theta_0)^2$$

Torsion angle energies require more complex functions, which incorporate cosine terms, since 0° and 360° must be equivalent, while 180° and -180° are equivalent positions, but reached by rotating in different directions. The commonest method used to describe the changes in energy associated with changes in the torsion angle, $\phi$, is by applying the Fourier series:

$$E_T = 0.5\,k_1 (1 + \cos\phi) + 0.5\,k_2 (1 - \cos 2\phi) + 0.5\,k_3 (1 + \cos 3\phi) + \ldots$$
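The bonded terms can be tried out numerically. The following Python is a minimal sketch under stated assumptions, not the code of any particular force field: the function and parameter names are illustrative, the units are arbitrary, and the Morse prefactor is called `depth` because it is the value the energy approaches at infinite separation (the dissociation energy).

```python
import math

def hooke_stretch(d, d0, k_s):
    """Hooke's Law bond stretch energy for one bond."""
    return 0.5 * k_s * (d - d0) ** 2

def morse_stretch(d, d0, depth, alpha):
    """Morse bond stretch energy for one bond.

    depth is the energy reached at infinite separation;
    alpha controls the width of the potential well.
    """
    return depth * (1.0 - math.exp(-alpha * (d - d0))) ** 2

def angle_bend(theta, theta0, k_b):
    """Hooke-like bond angle energy for one angle."""
    return 0.5 * k_b * (theta - theta0) ** 2

def torsion_energy(phi_deg, k1, k2, k3):
    """First three terms of the Fourier series for one torsion angle (degrees)."""
    phi = math.radians(phi_deg)
    return (0.5 * k1 * (1.0 + math.cos(phi))
            + 0.5 * k2 * (1.0 - math.cos(2.0 * phi))
            + 0.5 * k3 * (1.0 + math.cos(3.0 * phi)))
```

If the Hooke constant is chosen as `k_s = 2 * depth * alpha**2` (an assumption made here so that the two curves have the same curvature at `d0`), the two stretch functions agree closely near the optimum length and diverge for large stretches, where the Morse energy levels off at `depth` while the Hooke energy keeps rising. The cosine terms make `torsion_energy(0, ...)` equal to `torsion_energy(360, ...)`, as the text requires.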

Out of plane and other nonbonded energies, for example, van der Waals, salt links and hydrogen bonds, are extremely important in determining the conformations of biologically active macromolecules. They may be described by various expressions. The out of plane energies are described by an expression for $E_O$ in terms of $P$, the distance out of plane; a harmonic form, $E_O = \sum 0.5\,k_O P^2$, is a common choice. This expression is necessary to stop, for example, sp² carbon atoms distorting out of plane, and is often essential in simulations involving peptides and proteins.

When two atoms approach one another there is a mutual attraction due to London dispersion forces, but a net van der Waals’ repulsion as they get too close; hence at one particular distance, the van der Waals’ distance, the attraction is maximal (Fig. 3). This situation is most simply described by the Lennard-Jones potential:

$$E_{vdW} = A/r^{12} - B/r^{6}$$

where $r$ is the interatomic distance and $A$ and $B$ are constants for the pair of atoms concerned; the attraction is described by the $r^{-6}$ term and the repulsion by the $r^{-12}$ term. Electrostatic interactions are covered by the familiar function:

$$E_{elec} = \sum q_1 q_2 / (\varepsilon r)$$

where $q_1$ and $q_2$ are the charges on two ions separated by a distance $r$ in a medium of dielectric constant $\varepsilon$. Both expressions contribute to $E_{NB}$. Some force fields (see below) include hydrogen bonds within this general electrostatic function; others have a separate hydrogen bonding descriptor.
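The Lennard-Jones and electrostatic expressions can also be sketched in a few lines. The Python below is an illustration only: the function names are assumptions, the units are arbitrary, and the electrostatic term omits the physical constants a real program would include. Setting the derivative of the Lennard-Jones expression to zero gives the optimum distance $r_{min} = (2A/B)^{1/6}$, which the second function returns.

```python
def lennard_jones(r, a, b):
    """Lennard-Jones energy: r**-12 repulsion minus r**-6 attraction."""
    return a / r ** 12 - b / r ** 6

def lj_minimum_distance(a, b):
    """Separation at which the Lennard-Jones energy is lowest."""
    return (2.0 * a / b) ** (1.0 / 6.0)

def coulomb(q1, q2, r, epsilon):
    """Electrostatic energy of two charges in a medium of dielectric constant epsilon."""
    return q1 * q2 / (epsilon * r)
```

With the illustrative values `a = 1.0` and `b = 2.0`, the optimum distance is 1.0 and the energy there is -1.0; at shorter distances the repulsive term dominates and the energy rises steeply. Raising `epsilon` in `coulomb` weakens the interaction, which is how a solvent such as water is often approximated.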

Fig. 2. The relationships between bond energy and bond length for (a) Hooke’s law and (b) a Morse function. Note that for small deviations from the optimum bond length (that is, the minimum energy state), the two graphs mimic each other.

6. Minimization, force fields and algorithms

The combination of specific functions and their associated individual parameters is known as a force field [6-13,23-25]. The force field listed above is a very simple one, and attempts to improve the sophistication and predictive accuracy of force fields have been, and are, extensive. For example, extra cross terms are often added to account for interdependent effects which occur as one function is changed: if a bond length is changed, then this affects the energies of adjacent bonds. Alternatively, in Urey-Bradley force fields, “natural” distances are assigned to the distances between two atoms bonded to a common atom.

Fig. 3. A Lennard-Jones description of van der Waals’ interactions (energy against interatomic distance). Note that the energy is minimal at an optimum distance.

The parameters included in a particular force field are derived from quite specific atomic interactions. Thus, bond stretch parameters for an aromatic C-C bond are not applicable to C to C bonds in ethane or ethene, or the C to O bonds in acidic amino acid side chains. Strictly, a force field should only be used with the compound for which the parameters were determined. However, it is of course not feasible to measure parameters for every chemical environment in which an atom type is found and it is usual to represent each element by a restricted number of atom types, each type having a specific number of parameters. A typical force field may have 5-10 carbon atom types, but fewer types of hydrogens, etc. The exact number is always a compromise. Too few and the field is poor as a predictor; too many and the parameters may not be easily determined and the computational time to solve the field will increase markedly. It is important to realize that the restricted set of atom types within a force field means that the field can only be applied to restricted sets of molecules. Some widely used force fields are MM+ (general use with hydrocarbons), AMBER, CHARMM and OPLS (proteins, peptides, nucleic acids, oligosaccharides). Note the preponderance of force fields for treating proteins; of all macromolecules, proteins are the most accurately modelled by current force fields [6-13,23-25,28,31-35].

Force fields are solved by one or a combination of algorithms which optimize the structure by reducing the energy of the molecule in some systematic way until a minimum energy conformer is found. In general terms, all work by calculating the energy of the system, adjusting the coordinates, then re-calculating the energy, followed by more perturbations of the atomic coordinates to gradually lower the energy of the system (i.e. molecule). An important limitation in identifying optimal structures is that, because the system cannot jump energy barriers, the optimized structure found is not necessarily the one with the global minimum energy, but may be one with a local energy minimum (Fig. 4). Most algorithms require that derivatives of the energy be continuous. Some require only the first derivative of the energy (first order minimizers); others also require the derivative of the gradient of the energy with respect to the conformer structure, that is, the second derivative (second order minimizers). Algorithms in general use include steepest descents (a first-order minimizer), conjugate gradients (first-order), Newton-Raphson and modified Newton-Raphson (second-order minimizers).

Minimization allows the energy of a given molecular configuration to be calculated quickly and models of molecules to be produced rapidly. It also has the advantage of being conceptually easy to understand. A major use of the technique is in removing “bad” van der Waals’ contacts between atoms in a molecule or between contact points in the interactions of molecules.

Fig. 4. A very schematic representation of a minimization profile in one dimension (horizontal axis: conformations). L represents examples of local energy minima; G is the global energy minimum.
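The trapping of a minimizer in a local minimum can be demonstrated on a one-dimensional profile like that of Fig. 4. The Python below is a sketch under stated assumptions: the energy function is an invented double-well polynomial rather than a molecular force field, and the step size, tolerance and function names are illustrative choices. It implements steepest descents, a first-order minimizer that repeatedly moves the coordinate a small distance down the gradient.

```python
def energy(x):
    """Hypothetical one-dimensional energy profile with two minima."""
    return x ** 4 - 2.0 * x ** 2 + 0.5 * x

def gradient(x):
    """First derivative of the energy with respect to the coordinate."""
    return 4.0 * x ** 3 - 4.0 * x + 0.5

def steepest_descent(x, step=0.01, tolerance=1e-8, max_iterations=10000):
    """Move downhill along the gradient until it is (almost) zero."""
    for _ in range(max_iterations):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x = x - step * g
    return x

local_minimum = steepest_descent(1.5)    # starts on the right-hand slope
global_minimum = steepest_descent(-1.5)  # starts on the left-hand slope
```

Starting at `x = 1.5` the minimizer settles near `x = 0.93`, a local minimum, whereas starting at `x = -1.5` it reaches the deeper minimum near `x = -1.06`. The algorithm itself is identical in both runs; only the starting structure differs, which is why the choice of starting point matters so much in minimization.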

Fig. 5. (a) Energy profile against time (time/fs) for a small molecule, in this case 1-cyclohexyl-2-methylbutane, during a short molecular dynamics run (starting after 1 ps). (b-d) Comparisons of that molecule at the local minima A, B and C (molecular graphics using Rasmol).

7. Molecular dynamics

Molecular dynamics (MD) uses the standard types of molecular mechanics force fields and solves the Newtonian equations of motion in a systematic, iterative manner to simulate the motions of the atoms within a molecule and so predict the conformations the molecule may adopt over a set time period (Fig. 5). It has the great advantage over molecular mechanics of accounting for the thermal energy of atoms/molecules and so may be used to overcome energy barriers between the current and other (lower) energy conformers.

Molecular dynamics has three major uses. It allows for conformational analysis, constrained dynamics investigations, and analysis of trajectories. The addition of thermal energy to the model of interest allows the simulation to jump energy barriers and so overcomes one of the major limitations of molecular mechanics. In theory, MD should eventually (that is, after a very long time!) predict the minimum global energy conformation. However, it should be noted that many molecules of biological interest, such as peptides and proteins, are metastable, and so perhaps do not occur in their lowest energy state anyway! In practice a great deal of any in silico experiment will explore higher energy conformations and so, for all but the smallest of molecules, MD will be too time consuming to yield a full range of low energy conformations.

The input of structural data, for example from NMR experiments or the binding of inhibitors or substrate analogues, allows for constrained dynamics. These constraints limit the conformations the molecule is allowed to explore and so reduce considerably the computer time involved in the simulation. They also have the considerable advantage of adding some “reality” to the simulation!

Analysis of trajectories is essentially a playback of the motions of the atoms within the molecule over time. This illustrates the relative flexibility of different parts of the structure and can give insights into some of its properties, such as intramolecular strains.
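The iterative solution of the Newtonian equations of motion can be sketched for the simplest possible case. The Python below is an illustration under stated assumptions, not a description of any MD package: it treats a single coordinate held by a Hooke's Law spring, uses arbitrary reduced units, and integrates with the velocity Verlet scheme, which is one commonly used integrator among several. All names are illustrative.

```python
def harmonic_force(x, k, x0):
    """Force from a Hooke's Law spring: minus the gradient of 0.5*k*(x - x0)**2."""
    return -k * (x - x0)

def total_energy(x, v, mass, k, x0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2

def velocity_verlet(x, v, mass, k, x0, dt, n_steps):
    """Integrate Newton's equations of motion; returns a list of (time, x, v)."""
    trajectory = [(0.0, x, v)]
    f = harmonic_force(x, k, x0)
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        x = x + dt * v_half                   # full-step position update
        f = harmonic_force(x, k, x0)          # force at the new position
        v = v_half + 0.5 * dt * f / mass      # complete the velocity update
        trajectory.append((step * dt, x, v))
    return trajectory
```

The returned list is a trajectory in the sense used above: replaying it shows the coordinate oscillating about `x0`, and the total energy stays very nearly constant if the time step `dt` is small compared with the period of the vibration. A real simulation applies the same update to every atom, with forces taken from the full force field.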

8. Monte-Carlo methods

Monte-Carlo methods offer a stochastic approach, that is, one subject to the laws of probability, to circumvent the inherent disadvantages of MM and MD methods. Essentially a random conformation is assigned to the molecule of interest, which is then minimized to remove any poor geometries within the structure. The algorithm then randomly changes all the torsion angles within a small set range and recalculates the energy of the molecule. If this new conformation has a lower energy than the original, it is accepted and used as a basis for further conformational analysis. However, if it has a higher energy then it must be accepted or dismissed on some rational basis. The decision is made by calculating the Boltzmann factor, $e^{-\Delta E/kT}$ (see Appendix A), and comparing this value to a randomly generated number between 0 and 1. If the Boltzmann factor is less than the random number, the new conformation is rejected; if it is greater, then the new structure is accepted as the basis for further structural explorations. These methods can determine a minimum global energy within a reasonable computer time, because mainly low energy conformations are explored and energy barriers are overcome by the random nature of the search patterns.
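The acceptance rule described above can be written down compactly. The Python below is a sketch under stated assumptions: a single coordinate `x` stands in for the set of torsion angles, the energy function is an invented double-well polynomial, `kt` is the thermal energy kT expressed in the same arbitrary units as the energy, and all names and parameter values are illustrative.

```python
import math
import random

def energy(x):
    """Hypothetical one-dimensional energy profile with two minima."""
    return x ** 4 - 2.0 * x ** 2 + 0.5 * x

def boltzmann_factor(delta_e, kt):
    """exp(-dE/kT): the relative probability of a state higher in energy by dE."""
    return math.exp(-delta_e / kt)

def metropolis(x, kt, max_step, n_steps, rng):
    """Random search with Boltzmann acceptance; returns the best (x, energy) seen."""
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial_x = x + rng.uniform(-max_step, max_step)  # random change in a set range
        trial_e = energy(trial_x)
        delta_e = trial_e - e
        # Lower energy: always accept. Higher energy: accept only if the
        # Boltzmann factor is greater than a random number between 0 and 1.
        if delta_e <= 0.0 or boltzmann_factor(delta_e, kt) > rng.random():
            x, e = trial_x, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

Called as `metropolis(0.93, 0.3, 0.5, 20000, random.Random(0))`, the search starts in the shallower well of this test function and, because uphill moves are sometimes accepted, is able to cross the barrier into the deeper well. Lowering `kt` makes uphill moves rarer and the search behaves more like a simple minimizer.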

9. Concluding comments

The visualization and investigation of molecules using computers and advanced graphic systems has revolutionized our perception of biomolecular structures and functions, particularly over the last 10 or so years. The use of computers to teach relevant portions of biomolecular sciences has been advocated by the authors and others [25,36-44]. We feel the teaching of basic aspects of molecular modelling is educationally essential to life sciences students, and can be applied fruitfully to underpin, illustrate, explain and expand many of the general features of the molecular and cellular life sciences. Furthermore, the use of molecular modelling offers educational advantages, since students appreciate the immense visual appeal of current computer graphics. These images are particularly useful in attracting students whose main areas of interest are in the biological rather than the chemical sciences (see Appendix B). Biochemistry students are moving increasingly from the hard (chemistry) towards the soft (biology) end of the molecular spectrum, while students of biological sciences, an increasingly popular degree subject in the UK, tend generally to have a greater interest in the biological, as opposed to chemical, topics. Even biochemical students often perceive the chemical side as “hard” [1].

This article has been an attempt to summarize some of the essential underlying theory of molecular modelling for general biology readers. Naturally the authors would be grateful for any criticisms, comments or advice that readers can offer on the topic and its teaching.

References

[1] E.J. Wood, Biochem. Educ. 24 (1996) 68-69.
[2] C.K. Matthews, K.E. van Holde, Biochemistry, Benjamin/Cummings, CA, 1990.
[3] G. Zubay, Biochemistry, 3rd ed., W.C. Brown, IA, 1993.
[4] L. Stryer, Biochemistry, 4th ed., W.H. Freeman, New York, 1995.
[5] D. Voet, J.G. Voet, Biochemistry, 2nd ed., Wiley, New York, 1995.
[6] T. Clark, A Handbook of Computational Chemistry, Wiley, New York, 1985.
[7] D.M. Hirst, A Computational Approach to Chemistry, Blackwell, Oxford, 1990.
[8] J.M. Goodfellow (Ed.), Computer Modelling in Molecular Biology, VCH, Weinheim, Germany, 1994.
[9] S. Fraga, J.M.R. Parker, J.M. Pocock, Computer Simulations of Protein Structures and Interactions, Springer, Berlin, 1995.
[10] G.H. Grant, W.G. Richards, Computational Chemistry, OUP, Oxford, 1995.
[11] A. Hinchliffe, Modelling Molecular Structures, Wiley, Chichester, 1996.
[12] A.R. Leach, Molecular Modelling: Principles and Applications, Addison-Wesley Longman, Harlow, 1996.
[13] H.J. Holtje, G. Folkers, Molecular Modeling: Basic Principles and Applications, VCH, Weinheim, Germany, 1996.
[14] R. Fuchs, P. Rice and G.N. Cameron, TIBTECH 10 (1992) 61-63.
[15] S.J.M. Jones, Curr. Opin. Gen. Dev. 5 (1995) 349-353.
[16] J. Kuzio, TIG 12 (1996) 321-323.
[17] C. Sansom, The Biochemist 18 (1996) 32-33.
[18] Z. Semir, Sci. Am. 267(3) (1992) 42-50.
[19] A.J. Olsen and D.S. Goodsell, Sci. Am. 267(5) (1992) 44-51.
[20] J. Petts, Lab. Prac. 40(7) (1991) 9-13.
[21] J.B. Harborne, Introduction to Ecological Biochemistry, Academic Press, London, 1993.
[22] D.S. Goodsell, Our Molecular Nature, Springer, New York, 1996.
[23] J. Petts, Lab. Prac. 40(8) (1991) 21-25.
[24] P.J. Cox, J. Chem. Educ. 59 (1992) 275-277.
[25] D.B. Boyd and K.B. Lipkowitz, J. Chem. Educ. 59 (1992) 269-274.
[26] M. Karplus and J.A. McCammon, Sci. Am. 254(4) (1986) 30-39.
[27] W.F. Van Gunsteren, Curr. Opin. Struct. Biol. 3 (1993) 272-281.
[28] T.A. Helgren, Curr. Opin. Struct. Biol. 5 (1995) 205-210.
[29] J. Skolnik and A. Kolinski, Ann. Rev. Phys. Chem. 40 (1989) 207-235.
[30] F.M. Richards, Sci. Am. 264(1) (1991) 34-41.
[31] B.R. Brooks, R.E. Broccoleri, B.D. Olafson, D.J. States, S. Swaminathan and M. Karplus, J. Comput. Chem. 4 (1983) 187-217.
[32] D.A. Pearlman, D.A. Case, J.W. Caldwell, W.S. Ross, T.E. Cheatham, S. Debolt, R. Ferguson, G. Seibel and P. Kollman, Comp. Phys. Comm. 91 (1995) 1-41.
[33] J. Moult, Curr. Opin. Struct. Biol. 7 (1997) 194-199.
[34] S. Vajda, M. Sippl and J. Novotny, Curr. Opin. Struct. Biol. 7 (1997) 222-228.
[35] H. Senderowitz and W.C. Still, J. Org. Chem. 62 (1997) 1427-1438.
[36] R.R. Sauers, J. Chem. Educ. 10 (1991) 816-818.
[37] H. Dugas, J. Chem. Educ. 7 (1992) 533-535.
[38] C.A. Smith, A.H. Fielding, G.A. Nicholas and D.A.R. Williams, Binary 4 (1992) 156-161.
[39] S.C. Harvey and R.K-Z. Tan, Biophys. J. 63 (1992) 1683-1688.
[40] E.J. Milner-White, Binary 4 (1992) 9-10.
[41] C.J. Cramer, Biochem. Educ. 22 (1994) 140-143.
[42] D.C. Richardson and J.S. Richardson, TIBS 19 (1994) 135-138.
[43] C.E. Sansom, D.A. Waller and A.J. Geddes, Biochem. Educ. 24 (1996) 32-35.
[44] C.A. Smith, A.H. Fielding, M.O. McCormick, J. Murrey and C.E. Sansom, Biochem. Soc. Trans. 24 (1996) 123S.

Appendix A: The Boltzmann Constant

The thermal energy of individual particles at a temperature, $T$, is of the order $kT$, where $k$ is the Boltzmann constant, $1.38054 \times 10^{-23}$ J K$^{-1}$. This allows the approximate value of the kinetic energy of a particle to be calculated. In a true chemical system, containing large numbers of molecules, the distribution of energy is described by the Boltzmann Distribution Law. If there are $N$ atoms or molecules in the system which are distributed among energy levels 0, 1, etc., such that $N_0$ occupy level 0, $N_1$ level 1, etc., then

$$N_1/N_0 = e^{-\Delta E/kT}$$

where, for example, $\Delta E = E_1 - E_0$, the difference between the energy levels represented by 0 and 1.

Appendix B: Things to do with students

Create attractive graphics to illustrate projects and dissertations. Explore molecular structures with graphics routines. Predict secondary structures of peptides using MM and MD programs. “Dock” drugs and inhibitor molecules into the active sites of enzymes and into receptors. Explore the structural effects of point mutations in protein molecules. Compare the theoretical minimum energy of molecules with their crystal structures. Calculate and display molecular surfaces and electrostatic potentials.