Normal mode analysis for proteins

Normal mode analysis for proteins

Journal of Molecular Structure: THEOCHEM 898 (2009) 42–48 Contents lists available at ScienceDirect Journal of Molecular Structure: THEOCHEM journal...

1MB Sizes 0 Downloads 117 Views

Journal of Molecular Structure: THEOCHEM 898 (2009) 42–48

Contents lists available at ScienceDirect

Journal of Molecular Structure: THEOCHEM journal homepage: www.elsevier.com/locate/theochem

Normal mode analysis for proteins Lars Skjaerven a,b, Siv M. Hollup c, Nathalie Reuter a,* a b c

Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Thormohlensgt.55, N-5008 Bergen, Norway Department of Biomedicine, University of Bergen, Jonas Lies vei 91, N-5009 Bergen, Norway Department of Informatics, University of Bergen, PB. 7803, N-5020 Bergen, Norway

a r t i c l e

i n f o

Article history: Received 1 May 2008 Received in revised form 27 August 2008 Accepted 15 September 2008 Available online 25 September 2008 Keywords: Normal modes analysis Slow dynamics Proteins Coarse-grained models

a b s t r a c t Proteins are dynamical objects; their structure fluctuations are often the key to their function and keen attention need therefore be paid to it. Molecular dynamics (MD) simulations are a widely used technique to study dynamics of both proteins and nucleic acids. However, simulating molecular machines containing several thousands of amino acids, on long enough time scales to observe relevant structural deformation, remains challenging. Normal mode analysis (NMA) is better suited to study the slow dynamics of proteins. We briefly describe the theory underlying NMA and the simplifications used to render it tractable for (large) proteins. We also describe different kinds of analyses that can be performed on the eigenvectors to characterize the dynamical properties of the system. A number of validation studies are then summarized. Finally we describe NMA servers available on the Internet. Ó 2008 Elsevier B.V. All rights reserved.

1. Introduction Proteins are dynamical objects; their structure fluctuations are often the key to their function and keen attention need therefore be paid to it. The importance of NMR and X-ray diffraction methods in that respect cannot be overstated as they provide essential information about structure. Yet, their value as information sources on dynamical processes has so far been limited. In contrast, molecular modelling methods are very well suited for providing both structural and dynamical information, and thus nicely complement experimental techniques. The procedures are often easier to establish and faster to perform than experiments. Moreover they provide a finer resolution in space and time, which is necessary to understand and describe protein dynamics. The fluctuations of protein structures occur at very different time scales and have different amplitudes. Local motions happen on relatively short time scales (0.01–5 Å, 1015 to 101 s) while rigid body motions such as helix, domain or subunit displacements involve larger amplitudes (1–10 Å) and require significantly longer time scales (109 to 1 s). Events such as folding or helix coil transitions involve even larger displacements (>5 Å) and can take up to a few hours [1]. Molecular dynamics (MD) simulation are a widely used technique and has, since the first MD simulation of a protein 30 years ago, been used increasingly to study dynamics of both proteins and nucleic acids [2]. In 1977 the first MD simulation was reported. It was the simulation of the bovine pancreatic trypsin inhibitor * Corresponding author. Tel.: +47 555 84040; fax: +47 555 84295. E-mail addresses: [email protected] (L. Skjaerven), [email protected] (S.M. Hollup), [email protected] (N. Reuter). 0166-1280/$ - see front matter Ó 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.theochem.2008.09.024

(BPTI), a 58-amino acids protein, and covered a time-frame of 9.2 ps [3]. Since then the increase in computer power has enabled the simulation of systems of increasing size, such as membrane proteins [4,5], and increasing simulation times. In 1998, Duan and Kollman reported a 1 ls simulation (villin headpiece subdomain, 36 aa) [6] performed on 256 cores of a Cray T3E supercomputer. More recently, Freddolino et al. simulated a protein of similar size (Pin1 WW domain, 35 aa), with slightly more explicit water molecules, for 10 ls [7]. This was done using a modified version of NAMD [8], and running on 329 cores of a Linux cluster. Yet, simulating molecular machines containing several thousands of amino acids, long enough to observe relevant structural deformation remains challenging. Normal mode analysis (NMA) is better suited to study large structural rearrangements than MD and much less demanding in terms of CPU time (although more expensive in terms of memory). NMA of proteins is based on the hypothesis that the vibrational normal modes exhibiting the lowest frequencies (also named soft modes) describe the largest movements in a protein and are the ones that are functionally relevant. Several tools have been developed [9–19] and successfully applied to predict the collective, large amplitude motions of proteins and protein assemblies, from small proteins to molecular machines: e.g. lysozyme [20–23], HIV1-protease [24], aspartate transcarbamylase [25], myosin [26], integrins [27], Ca-ATPase [15,28,29], F1-ATPase [30,31], chaperonin GroEL [32], virus capsids [33–35], ribosome [36]. In recent years, the use of normal mode analysis for the study of the slow dynamics of biomolecules has become increasingly popular. The low computer cost of most of the simplified approaches have certainly contributed to the development of web servers performing normal

L. Skjaerven et al. / Journal of Molecular Structure: THEOCHEM 898 (2009) 42–48

modes calculations and analyses starting from a simple structure file submitted by the user through a web browser. We first briefly describe the theory underlying simplified NMA. The second section focuses on the methods used to analyze the calculated normal modes and what information can be obtained from it, while the third section describes a couple of examples of the application of NMA. Section 4 focuses on validation studies. Finally, we describe NMA servers available on the Internet. 2. Normal modes calculation Normal mode analysis (NMA) is a technique to investigate the vibrational motion of a harmonic oscillating system in the immediate vicinity of its equilibrium [37]. The motions studied are of small amplitude in a potential well and they cannot cross energy barriers. A system is defined to be in equilibrium, or in the bottom of the well, when the generalized forces acting on the system are equal to zero. At this minimum q0, the potential energy can be expanded in a Taylor series, yielding a quadratic approximation V, to the potential energy E, with respect to the generalized coordinates qi:

1 @2V V¼ 2 @qi @qj

!

1 2

gi gj ¼ V ij gi gj 0

ð1Þ

where g is the deviation from the equilibrium (qi = q0i+gi). Similarly the kinetic energy, T, is also approximated as a quadratic function. The Lagrangian is given by L = TV, which leads to the n linear differential equations of motion:

€ i þ V ij gj ¼ 0 Tig

ð2Þ

By assuming an oscillatory solution, gi ¼ aik cosðxk t þ dk Þ and substituting it in Eq. (2), one obtains an eigenvalue problem:

AT VA ¼ k

ð3Þ

where A is the matrix of the amplitudes aik , and V is the matrix of the second derivatives of the potential energy and is referred to as the Hessian. k is a diagonal matrix, and ATA = I. The pattern of motions is fully specified by the vibrational normal modes, i.e. the eigenvectors (Ak) and their associated eigenvalues (kk). The normal mode vectors describe in which direction each particle moves, and how far it moves relative to the other particles. Hence, it does not give an absolute amount of displacement for each particle. All particles in each normal mode vibrate with the same frequency. Normal mode analysis has been used extensively in chemistry. Already in the 1950s the NM theory was applied to small molecules, and its predictive power shown by reproducing vibrational spectra [38–40]. NMA was first applied to macromolecules such as proteins in the beginning of the 1980s. The first studies at the atomic level of detail relied on existing semi-empirical potential functions used for molecular dynamics (MD) simulations, which commonly have the following form:

1 X 1 X Ep ¼ K b ðb  b0 Þ2 þ K h ðh  h0 Þ2 2 bonds 2 angles 1 X K / ½1 þ cosðn/  dÞ 2 torsions  X A B q1 q2  þ þ r12 r 6 Dr nonbonded

þ

ð4Þ

In this expression the first three terms describe the internal degrees of freedom and the last term describes the van der Waals and electrostatic interactions. Go [11] and Levitt [20] studied the trypsin inhibitor (BPTI) using only torsion angles as dynamic variables. Bond lengths and bond angles were fixed

43

since they were not expected to affect the delocalized, low frequency vibrational modes. This reduction of the degrees of freedom has the advantage of reducing the size of the Hessian matrix that holds the second derivatives of the potential energy. In 1983 Brooks and Karplus investigated the same protein [10] with the Charmm molecular mechanics force field [41], thus including all degrees of freedom. Because of their size, proteins require special methods to build and diagonalize the Hessian. The limiting phase is the diagonalization of this 3N  3 N matrix, where N is the number of atoms. Several algorithms have been developed to overcome this limitation. Marques and Sanejouand [16] used an implementation of the Block Lanczos algorithm [42] intended for the solution of eigenvalue problems involving large, sparse, symmetric matrices. This approach converges rapidly towards the exact modes and was used to calculate the normal modes of citrate synthase, a protein consisting of about 400 amino acids [16]. Perahia and Mouawad developed a method called ‘‘Diagonalization In a Mixed Basis” (DIMB) [17,43]. In this method the eigenvectors of sub-blocks of the Hessian are first calculated. Combining them to Cartesian coordinate displacements then iteratively refines these ‘‘guess vectors”. This approach takes into account the coupling between low and high frequency vibrations. It has been applied to obtain a few normal modes of some very large proteins, such as the aspartate transcarbamylase (ATCase), a protein consisting of a little less than 3000 amino acids [44]. Sanejouand and co-workers introduced a low-resolution method [45,46] based on partitioning the protein chain into rigid blocks of one or more residues, having only six rotation-translation degrees of freedom. This is the so-called rotations-translations of blocks (RTB) method. Normal modes calculated on the model yield a set of low-resolution normal modes, which are linear combinations of the rotations and translations of the blocks. They can then be perturbed by modes calculated in each block; this refinement procedure gives the exact normal modes. Yet another interesting approach is to model the molecule as an elastic network model (ENM); where atoms, or nodes, are interconnected with springs. In this approach, the detailed empirical potential is replaced by a simple Hookean potential between all nodes a and b closer than a certain cut off distance. Tirion was the first to show that such a potential reproduced the slow dynamics in proteins [19]. The potential between atoms a and b is defined by,

Eðr a ; rb Þ ¼

C ðjr a;b j  jr 0a;b jÞ2 2

ð5Þ

where C is a spring constant assumed to be the same for all interacting pairs. ra;b and r 0a;b are the instantaneous and equilibrium distance, respectively, between atoms a and b. The total potential energy within a molecule is then given by

Ep ¼

X

Eðr a ; r b Þ

ð6Þ

ða;bÞ

The sum is restricted to pairs of atoms separated by a distance shorter than a given cutoff. An important consequence of this potential is that the harmonic model is directly constructed around a given experimental conformation. Thus, the expensive minimization of the potential energy is eliminated. However, only protein configurations that are known to be physically relevant will give meaningful results. Hinsen used the ENM approach to develop a coarse grained method in which only the carbon alpha atoms were considered; one point mass per residue located at the carbon alpha position [12]. The benefit of such a simplified representation is the reduction of the Hessian matrix, allowing the calculation of normal modes of large proteins containing several thousands residues. The harmonic pair potential is similar to what Tirion proposed (Eq. (5)):

44

L. Skjaerven et al. / Journal of Molecular Structure: THEOCHEM 898 (2009) 42–48

Eðr a ; rb Þ ¼

1X kðjr 0a;b jÞðjr a;b j  jr 0a;b jÞ2 2 ði;jÞ

ð7Þ

where the pair force constant k is a function of the atomic pair distance. After having first used a Gaussian function, Hinsen proposed a more sophisticated spring force constant (obtained by fitting to a local minimum of the Amber94 force field [47]) [48]:

( kðrÞ ¼

8:6  105 molkJnm3  r  2:39  105 molkJnm2

r < 0:4 nm

128 kJ nm4 mol r6

r  0:4 nm ð8Þ

The Gaussian Network Model (GNM) [9], also inspired by the EN model, uses an N  N Kirchhoff matrix instead of the 3N  3N force constant (Hessian) matrix. The Kirchhoff matrix describes the inter-residue contact topology. The benefit of this approach is the small size of the Kirchhoff matrix. In addition, the mean-square fluctuations of each Ca atom can be directly evaluated from the diagonal elements of the inverse of the matrix. Similarly, the offdiagonal elements give the cross-correlations between the fluctuations of the Ca atoms directly. The extension of GNM, namely the anisotropic network model (ANM), adds the details of the directions of individual residue fluctuations [49]. ANM is equivalent to the potentials developed by Tirion [19] and Hinsen [12,48] but with a different force constant. 3. Analysis of the modes 3.1. Characterizing and visualizing the displacements associated with the modes Calculations of squared atomic displacements and vector field representation are most often used for characterizing a particular mode. Normalized squared atomic displacements (Di) are calculated as follows:

, Di ¼

n X

2 di

! 2 di

3 Bj 8p2

ð10Þ

where Bj represents the B-factor of atom j. The atomic displacements can also be interpreted as the values of a vector field, defined everywhere in space [25]. The vector field, defined on a regular lattice at the center of each cube, is the massweighted average of the displacements of the atoms in the cube, and is called the displacement field. It provides a graphical representation of the displacement associated to each mode, which is, for example, very useful to identify rigid body displacements of domains. 3.2. Identifying correlated motions It is often desirable to identify potentially correlated movements of distant regions of the protein. This can be done by calculating the covariance matrix and plotting its off-diagonal elements [10,50]:

PN

h DR i DR j i 1

Distance fluctuation maps measure the change in distances between all pairs of atoms, residues or blocks in a mode or a collection of modes [25,51]. They help in identifying the most displaced regions and gives information about the degree of flexibility of the domains constituting the protein. Another way to evaluate the degree of collectivity (j) of a protein motion has been proposed by Bruschweiler [52] and has been used to obtain the number of atoms (residues) which are significantly displaced in a normal mode [51,53]. The expression of j is: N X 1 j ¼ exp  aDR2i :logaDR2i N

! ð12Þ

where DRi is the displacement associated with atom i and a is a PN aDR2i ¼ 1. The value of normalization factor chosen such that j lies in the interval [1/N, N] for a system containing N particles. j will be close to 1/N if the movement involves only a few atoms and will get closer to 1 as the displacements of all the atoms become more equivalent. The deformation energy of a mode, as implemented in MMTK [13,54], is a measure of the collectivity of the movements of this mode. It is a measure of the energy associated with the deformation as a function of the position of each atom. The lower the deformation energy, the higher the degree of collectivity. A high degree of collectivity means that large regions of the protein, possibly domains, are displaced and behave as rigid bodies. Although the calculated deformation energies only have a qualitative meaning, values obtained on different proteins can be compared. They can further be used to identify the so-called dynamical domains [13]; rigid regions moving as rigid bodies in the same direction. Using a set of normal modes, the rigid regions are then sorted according to their global motion; the dynamical domains are composed of ‘‘rigid” residues having similar motions. 3.4. How many modes are needed to describe a transconformation?

where di is the component of the eigenvector corresponding to the ith Ca. Atomic fluctuations can also conveniently be compared to crystallographic B-factors using:

C ij ¼

3.3. Measuring the degree of collectivity of the displacements

ð9Þ

resid¼1

Di ¼

where DRi and DRj are displacement vectors of any two atoms, amino acids or domains i and j.

1

hDRi DRi i2 hDRj DRj i2

¼

l¼1

PN

Aim Ajm m¼1 km

Ail Ajl kl

12 P

Ajn Ajn N n¼1 kn

12

ð11Þ

The calculation of the dot product (overlap) between the difference vector (calculated for a pair of structures, e.g. open and closed conformations) and the full set of normal modes identifies the modes that contribute most to the structural difference. By definition, the cumulative sum of the squared overlap is equal to one. It allows the identification of a few modes that can be used to describe the transconformations between the two structures (see e.g. [55,56]). 3.5. Simulating ‘‘transition paths” between two conformations Simulating the transition between two conformations, e.g. an open and a closed state, of a protein is of tremendous importance, even more so because describing such a transition experimentally remains extremely challenging. Once the normal modes contributing most to a conformational difference have been identified, it is possible to use these modes to make a ‘‘movie” of the transition. One possible approach is to generate conformations along a linear combination of the normal modes by using the result of an overlap analysis to define the coefficients. The overlap and the coefficients can be re-calculated for each conformation generated. However, obtaining information about the energetics of the transition path and the structure of true intermediates along the path is less straightforward. For that purpose, more sophisticated methods, not necessarily

L. Skjaerven et al. / Journal of Molecular Structure: THEOCHEM 898 (2009) 42–48

45

relying on normal modes, are being used [57–61]. Unfortunately the current limitations of the experimental techniques do not allow rigorous validation of these methods. 4. Applications We summarize two studies using NMA to investigate the dynamics of (a) a large multimeric protein (8000 amino acids) and (b) a medium-sized proteic domain (200 amino acids), using a coarse-grained and an all-atom model, respectively. 4.1. GroEL–GroES The Escherichia coli chaperonin GroEL–GroES (Fig. 1) is a molecular machine that assists in the folding of a wide range of substrate proteins. GroEL consists of two heptameric rings (cis and trans) stacked back to back. Together with the co-chaperonin GroES, the GroEL complex provides a folding chamber for non-native substrate proteins. Each ring is formed by 7 flexible subunits, each consisting of 3 domains: apical, intermediate, and equatorial domain (Fig. 2). In the apo state, where both rings are closed and GroES is not bound, the apical domains of the GroEL subunits have

Fig. 2. The GroEL subunit in the closed (T) state (a), and the open (R0 0 ) state (b). The subunit is composed of three domains: apical (cyan), intermediate (yellow), and equatorial (orange). ATP binding induces the conformational changes (a) to (b). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this paper.)

a high affinity for non-native substrate proteins. The binding of ATP to the cis ring triggers large conformational changes of the subunits, needed for the formation of the folding chamber and binding of GroES. Binding of ATP to the opposite (trans) ring triggers the release of GroES and the newly folded protein substrate. Observing the motion of the conformational changes as a consequence of ATP binding in this 8000 amino acid machinery is not accessible with standard molecular dynamics simulations (the time-scale of the reaction-cycle is approximately 15 seconds), nor with all-atom NMA (because of memory requirements). A more suitable approach is to employ a coarse-grained normal mode analysis. A recent study [32] employing such a method provides further insight into the conformational changes and allosteric communication within the chaperonin GroEL-(ADP)7-GroES structure [62]. Using an overlap measure to compare the set of normal modes with the observed conformational changes, the authors of this study reveal one specific, rather high frequency, normal mode that describes the conformational change (mode 18): highly concerted motions within each ring, and negatively correlated motions between the two rings. In the cis-ring subunits, mode 18 consists of a downward and clockwise rotation of the apical domain, while the equatorial domain moves slightly upwards towards the apical domain. In the subunits of the opposite (trans) ring, the apical domains rotate in the opposite direction compared to in the cis ring; an upward and counter clockwise rotation. In comparison to the other normal modes, mode 18 is the most robust to sequence variations, thus validating its functional importance. Making use of an approach called structural perturbation method (SPM) [63], the dynamical importance of a given residue i is predicted by utilizing the fact that the sequence perturbation (mimicking the mutation of an amino acid) will give small changes to the force constants of the springs connecting residue i to its neighbours. Utilizing SPM, the authors construct a network of key residues that encode the interactions between the subunits. Several of these residues had been identified as important in earlier experimental and computational studies. The good agreement with previous results, computational and experimental, provides further evidence that NMA is indeed able to predict the biologically important motions. Fig. 1. The chaperonin GroEL-(ADP)7-GroES complex [62] in the R0 0 T conformation, with GroES (red) and ADP bound to the cis-ring (blue, R0 0 conformation), and the GroEL trans-ring (green, T conformation). Each subunit is in contact with 2 subunits of the opposite ring. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this paper.)

4.2. I-domain of the aLb2 integrin The use of all-atom force fields limits the size of the system one can study but it does provide the opportunity to investigate

46

L. Skjaerven et al. / Journal of Molecular Structure: THEOCHEM 898 (2009) 42–48

dynamical events at a much finer structural level and address problems that cannot be satisfactorily addressed when using coarse-grained force fields. One example is the description of the network of atomic interactions involved in the transmission of signals governing allosteric regulation. Another example is the influence ligand binding has on the receptor dynamics. Both have direct implications in inhibitor and drug design. Recently, Gaillard et al. [27] reported an NMA study of the I-domain of the lymphocyteassociated antigen-1, LFA-1, also known as aLb2 integrin. Integrins are cell-surface receptors that mediate various signals. They are involved in a number of crucial biological processes such as cell adhesion, diapedesis, signal transduction to and from the extracellular matrix and binding of viruses. Integrins are dimeric transmembrane proteins (an a and a b subunit); they contain one transmembrane a-helix per subunit, a large extra-cellular domain (about 1000 and 700 aa for a and b subunits, respectively) and a small cytoplasmic domain (approximately 50 aa). The extra-cellular domains are themselves constituted of several domains. The inserted domain (I-domain) is a ligand binding domain (about 200 aa), which is found in about 50% of the a subunits. To understand the details of the transition between the conformations described as high- and low-affinity for the ligand, Gaillard et al. performed normal mode analysis of several conformations (with and without ligands) of the I-domain of LFA-1; they used an all-atom representation and, among other analyses, characterized the protein dynamics in terms of atomic fluctuations and cross-correlations. They describe movements (‘‘downward shift” and ‘‘swinging-out”) of the a7 helix located in the C-terminal region. They also predict the coupling of the displacements of a7 with the distant metal ion-dependent adhesion site (MIDAS). Moreover, they can show that the binding of an allosteric inhibitor limits the fluctuations of the a7 helix. The binding of a natural ligand is shown to affect the dynamics of the domain in a different way; the dynamics of several regions (not only a7) is affected and the allosteric communication is reinforced.

5. Validation The application of NMA to study the slow dynamics of large biomolecular systems can appear hazardous. In particular using the harmonic approximation while investigating movements of large amplitude can be disturbing. Moreover, coarse-grained force fields do not include coupling between low and high frequency displacements and normal modes of proteins are most often calculated in the absence of solvent or with simple solvent models, as opposed to explicit solvent molecules in molecular dynamics simulations. All studies comparing NMA and experimental structural data, or molecular dynamics simulations, do validate the use of NMA even with coarse-grained models. Several studies have evaluated the number of low-frequency modes necessary to describe the structural difference between two different X-ray structures (e.g. one opened and one closed) of the same protein. They showed that a few low-frequency normal modes account for most of the structure difference (in terms of difference vector) [13,16,53,65]. Krebs et al. [65] showed that more than half of a set of 3800 protein motions could be described by only two of the lowest frequency normal modes. These studies used the overlap between the set of modes and the structure difference vector as a quality measure. More recently Petrone and Pande studied four proteins using more stringent estimators [66]. For instance, given two conformations; O (opened) and C (closed), they try to map C by applying a linear combination of the normal modes to O. When using only the lower frequency normal modes, they cannot bring the opened conformation closer than 50% to the closed one, in terms of RMS difference. The authors conclude that the normal modes can indeed be useful

to capture functional movements of proteins, but that they may not necessarily be found in the lower-frequency normal modes. Van Wynsberghe et al. [50] demonstrated that the analysis of more than only a few low-frequency normal modes is needed to correctly describe correlated motions. Two studies, one using GNM [67] and the other an ANM [68] showed that including the effect of the crystal packing in the normal modes calculation significantly improved the agreement between calculated and experimental B factors. As mentioned earlier, a common approximation is the absence of (explicit) solvent. Studies have shown that explicitly present water yields an increase in the density of local minima compared to vacuum simulations [69]. In addition, water is found to increase the friction and damping on the atomic fluctuations [70,71]. A recent study employs a newly developed approach that allows the incorporation of the solvent effects into coarse-grained models [72]. Results showed a shift to higher frequencies, and better agreement with experimental results and MD simulations. However, a systematic comparison between an ENM coarse-grained normal mode calculations and essential dynamics [73] obtained from 30 ns molecular dynamics simulations of 33 proteins provides evidence that the combination of several low frequency eigenvectors obtained from NMA do indeed give a correct picture of the flexibility of proteins in aqueous solution [74]. Yang et al [75] applied principal component analysis (PCA) on more than 150 X-ray structures of the HIV-1 protease, and on the trajectory of a 10 ns MD simulation of the same protein. Comparing the results with coarse-grained ENM normal modes, they report significant similarities of the slow protein motion. Yet another study, making use of a 400 ns MD simulation of Protein G provides further evidence of the robustness of the slow modes obtained with coarse-grained ENM [64]. However, comparing the MD simulations of the solvated and unsolvated lysozyme, Hinsen provides evidence that the solvent effect is negligible on high-frequency internal dynamics but is significant for a few slow motions [76]. 6. Web servers and databases of movements Over the years, numerous web applications and databases with varying levels of atomic detail have been made available over the Internet (Cf. Table 1). The ProMode database collects all-atom NMA results which can be viewed as animations [77]. Another all-atom approach is used to produce the MoViES database, where modified AMBER parameters are used to compute vibrational modes for all-atom models [78]. Both ProMode and MoViES use

Table 1 List of applications presented in part V and their Internet address (URL, as of August 2008), sorted alphabetically Application

Internet address (URL)

AD-ENM DC-ENM DFProt ElNémo HingeProt iGNM MinActionPath MolMovDB MoViES database NOMAD-Ref NORMA oGNM Path-ENM ProMode database TMM@ WEBnm@

http://enm.lobos.nih.gov/start.html http://enm.lobos.nih.gov/start_dc.html http://sbg.cib.csic.es/Software/DFprot/ http://igs-server.cnrs-mrs.fr/elnemo/index.html http://www.prc.boun.edu.tr/appserv/prc/hingeprot/ http://ignm.ccbb.pitt.edu/Index.htm http://lorentz.immstr.pasteur.fr/maxL_traj.php http://molmovdb.org/ http://ang.cz3.nus.edu.sg/cgi-bin/prog/norm.pl http://lorentz.immstr.pasteur.fr/nomad-ref.php http://www.igs.cnrs-mrs.fr/elnemo/NORMA/ http://ignm.ccbb.pitt.edu/GNM_Online_Calculation.htm http://enm.lobos.nih.gov/start_path.html http://promode.socs.waseda.ac.jp/pages/jsp/index.jsp http://services.cbu.uib.no/tools/tmma a http://services.cbu.uib.no/tools/normalmodes a

a

These applications were previously found at http://www.bioinfo.no/tools/.

L. Skjaerven et al. / Journal of Molecular Structure: THEOCHEM 898 (2009) 42–48

all-atom models, and cannot handle large systems. Most other services use only Ca coordinates in their calculations. MolMovDB [79–81] and iGNM [82] are examples of databases collecting results from normal mode analyses on a large amount of structures from the PDB. Motions in proteins where different conformations have been determined are collected in the MolMovDB database. Morphs from one conformation to the other have been generated by interpolating the Cartesian coordinates. In addition, normal mode analysis can be performed on other structures, and residues that are likely to be most flexible (hinge or key residues) are indicated. The selection of these residues is based on either experimental B-factors or multiple structure alignments for a given protein family [83]. The iGNM database uses the Gaussian Network Model (GNM) rather than the Anisotropic Network Model (ANM). A large fraction of the PDB has been analysed and the results deposited in iGNM. For each protein, the user can review possible key residues (in terms of mechanical role or in terms of potential folding mechanism), correlation between residue fluctuations and domain motions, and comparison of experimental and computed B-factors [82]. oGNM is an online server that performs the same type of analyses on user supplied structure files or structures not pre-analysed in iGNM. This server handles protein complexes as well as DNA and RNA structures. HingeProt uses a combination of GNM and ANM to automatically predict hinge residues in protein structures and is available as a web server [84]. DFprot [85] is a web server that allows the user to analyse their protein structures (also including DNA/RNA) using conformal vector field theory [86]. This server uses the normal modes to compute deformability and mobility. WEBnm@ [87] uses the molecular modeling toolkit (MMTK) [54], and bases its analysis on a large set of the lowest frequency normal modes (up to 50% of all the modes). The server offers a wide range of analyses of the calculated normal modes such as visualization of displacements associated with the lowest frequency normal modes, vector fields and atomic displacement. WEBnm@ can also identify the dynamical domains based on the normal modes [12,13]. Finally, using the overlap procedure described earlier, WEBnm@ can identify which normal mode(s) contributes the most to a given conformational change (e.g. opening and closing). TMM@ [88] (TransMembrane alpha-helical Mobility analyzer) is another web server using MMTK for calculating the normal modes. It is dedicated to the investigation of the mobility of the transmembrane (TM) a-helices in TM proteins. More specifically, it identifies which a-helices are the most mobile in terms of rotation, translation, slide, tilt and kink within the TM bundle by using the cumulative overlap between the respective displacement vectors and the normal mode vectors. The same approach has earlier been used to predict the a-helices involved in the transport of calcium ions through the SERCA1 Ca-ATPase [28]. The ElNémo web server [89] uses the RTB approximation (rotations-translations of blocks), and can therefore handle large structures while using all atoms in their normal mode computations. Through ElNémo the user can review the degree of collectivity of movements, measured as the fraction of residues affected by a particular mode, residue mean square displacement, distance fluctuation maps and correlation between experimental and computed B-factors. Overlap analysis is used to find the mode that contributes the most to a conformational change. Decoys based on the modes can also be generated and used for molecular replacement for X-ray structures. Another web application that can calculate the transition path between two conformations is MinActionPath [58]. For each conformation the normal mode analysis is solved analytically based on the Ca coordinates, and by finding the crossing points of the

47

solutions for the two conformations, a transition path can be found. The software suite NORMA allows a user to model conformational changes of high-resolution crystal structures under the constraint of density maps from low-resolution electron-microscopy (EM) [90]. This is done by minimizing the differences in electron density between the models and the EM map and can be used on complexes as well as single chain proteins. The NOMAD-Ref server [91] uses the RTB approach for computation of all-atom normal mode analysis of large structures (up to 100,000 atoms). It can perform overlap analysis and create decoy structures with corrected stereochemistry. Finally the server provides the opportunity to use normal mode analysis for structure refinement under the restraint of X-ray or cryo-EM maps. The AD-ENM server handles proteins, DNA and RNA, and has several tools to investigate pocket residues and how they affect the structure [92]. If two conformations are known, the overlap can be computed. More interestingly, if alternative coordinates for a few key residues are known, these can be used to predict how the rest of the structure will deform due to the new placement of these residues. Possible key residues can also be identified. If the user supplies distance constraints from NMR or FRET, this can be used through the related DC-ENM to generate models perturbed by the lowest normal modes [93]. Finally, PATH-ENM can be used to compute a transition path between conformations using the mixed elastic network model (MENM) [94]. 7. Conclusion Normal mode analysis has proven to be a useful and reliable approach to study the dynamics of biomolecules. Moreover, it is the method of choice to address their slow dynamics, or large amplitude conformational changes. The numerous user-friendly web servers available allow any non-expert to calculate and perform analysis of the normal modes of a protein. However, molecular modeling packages or NMA programs will remain the tools of choice for thorough NMA studies. They allow for more advanced and customized analysis and they are well suited for high-throughput calculations. MMTK [54] in particular is well suited for that purpose as it is a collection of library modules and can therefore easily be used in other codes. Several high-throughput analyses of proteins have already been reported and with the steadily increasing number of structures solved experimentally, researchers will be able to extend the datasets to learn more, for example about conservation of dynamics in protein families. NMA is emerging as a tool for the refinement of low-resolution structures (experimental or predicted) and an increasing number of tools and applications have been reported (e.g. [95–99]). The large number of publications describing new approaches for performing normal mode analysis of biomolecules, reporting NMA studies and web servers, or reporting studies challenging NMA makes it difficult to summarize it all in an article. For more information about NMA, the reader is referred to the many excellent review articles [100–106] and books [107–109] treating that subject. References [1] C.L. Brooks, 3rd, M. Karplus, B.M. Pettitt (Eds.), Proteins A Theoretical Perspective of Dynamics Structure and Thermodynamics, John Wiley & Sons, 1988. [2] M. Karplus, J.A. McCammon, Nat. Struct. Biol. 9 (2002) 646. [3] J.A. McCammon, B.R. Gelin, M. Karplus, Nature 267 (1977) 585. [4] W.L. Ash, M.R. Zlomislic, E.O. Oloo, D.P. Tieleman, Biochim. Biophys. Acta 1666 (2004) 158. [5] O. Beckstein, P.C. Biggin, P. Bond, J.N. Bright, C. Domene, A. Grottesi, J. Holyoake, M.S. Sansom, FEBS Lett. 555 (2003) 85.

48

L. Skjaerven et al. / Journal of Molecular Structure: THEOCHEM 898 (2009) 42–48

[6] Y. Duan, P.A. Kollman, Science 282 (1998) 740. [7] P.L. Freddolino, F. Liu, M. Gruebele, K. Schulten, Biophys. J. 94 (2008) L75. [8] L. Kalé, R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan, K. Schulten, J. Comput. Phys. 151 (1999) 283. [9] I. Bahar, A.R. Atilgan, B. Erman, Fold. Des. 2 (1997) 173. [10] B.R. Brooks, M. Karplus, Proc. Natl. Acad. Sci. USA 80 (1983) 6571. [11] N. Go, T. Noguti, T. Nishikawa, Proc. Natl. Acad. Sci. USA 80 (1983) 3696. [12] K. Hinsen, Proteins 33 (1998) 417. [13] K. Hinsen, A. Thomas, M.J. Field, Proteins 34 (1999) 369. [14] O. Kurkcuoglu, R.L. Jernigan, P. Doruker, Polymer 45 (2004) 649. [15] G. Li, Q. Cui, Biophys. J. 83 (2002) 2457. [16] O. Marques, Y.H. Sanejouand, Proteins 23 (1995) 557. [17] L. Mouawad, D. Perahia, Biopolymers 33 (1993) 599. [18] F. Tama, F. Gadea, O. Marques, Y. Sanejouand, Proteins 41 (2000) 1. [19] M. Tirion, Phys. Rev. Lett. 77 (1996) 1905. [20] M. Levitt, C. Sander, P.S. Stern, J. Mol. Biol 181 (1985) 423. [21] B. Brooks, M. Karplus, Proc. Natl. Acad. Sci. USA 82 (1985) 4995. [22] J.F. Gibrat, N. Go, Proteins 8 (1990) 258. [23] S. Hayward, A. Kitao, H.J. Berendsen, Proteins 27 (1997) 425. [24] V. Zoete, O. Michielin, M. Karplus, J. Mol. Biol. 315 (2002) 21. [25] A. Thomas, K. Hinsen, M.J. Field, D. Perahia, Proteins 34 (1999) 96. [26] I. Adamovic, S.M. Mijailovich, M. Karplus, Biophys. J. 94 (2008) 3779. [27] T. Gaillard, E. Martin, E. San Sebastian, F.P. Cossio, X. Lopez, A. Dejaegere, R.H. Stote, J. Mol. Biol. 374 (2007) 231. [28] N. Reuter, K. Hinsen, J.J. Lacapere, Biophys. J. 85 (2003) 2186. [29] N. Reuter, K. Hinsen, J.J. Lacapère, Ann. NY Acad. Sci. 986 (2003) 344. [30] W. Zheng, S. Doniach, Proc. Natl. Acad. Sci. USA 100 (2003) 13253. [31] Q. Cui, G. Li, J. Ma, M. Karplus, J. Mol. Biol. 340 (2004) 345. [32] W. Zheng, B.R. Brooks, D. Thirumalai, Biophys. J. 93 (2007) 2289. [33] M.K. Kim, R.L. Jernigan, G.S. Chirikjian, J. Struct. Biol. 143 (2003) 107. [34] F. Tama, C.L. Brooks 3rd, J. Mol. Biol. 345 (2005) 299. [35] H.W. van Vlijmen, M. Karplus, F. Tama, C.L. Brooks 3rd, M.K. Kim, R.L. Jernigan, G.S. Chirikjian, J. Mol. Biol. 350 (2005) 528. [36] F. Tama, M. Valle, J. Frank, C.L. Brooks 3rd, Proc. Natl. Acad. Sci. USA 100 (2003) 9319. [37] H. Goldstein, C. Poole, J. Safko (Eds.), Classical Mechanics, Addison Wesley, San Francisco, 2001. [38] E.B. Wilson Jr., J.G. Decius, P.G. Cross, R.T. Lagemann, Am. J. Phys. 23 (1955) 550. [39] T. Shimanouchi, Dis. Faraday Soc. 49 (1970) 60. [40] K. Itoh, T. Shimanouchi, Biopolymers 9 (1970) 383. [41] B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4 (1983) 187. [42] B.N. Parlett (Ed.), The Symmetric Eigenvalue Problem, Prentice-Hall, 1980. [43] D. Perahia, L. Mouawad, Comput. Chem. 19 (1995) 2. [44] A. Thomas, M.J. Field, D. Perahia, J. Mol. Biol. 261 (1996) 490. [45] P. Durand, G. Trinquier, Y.H. Sanejouand, Biopolymers 34 (1994) 759. [46] F. Tama, F.X. Gadea, O. Marques, Y.H. Sanejouand, Proteins 41 (2000) 1. [47] W. Cornell et al., J. Am. Chem. Soc. 117 (1995) 5179. [48] K. Hinsen, A. Petrescu, S. Dellerue, M. Bellissent-Funel, G. Kneller, Chem. Phys. 261 (2000) 25. [49] A.R. Atilgan, S.R. Durell, R.L. Jernigan, M.C. Demirel, O. Keskin, I. Bahar, Biophys. J. 80 (2001) 505. [50] A.W. Van Wynsberghe, Q. Cui, Structure 14 (2006) 1647. [51] H. Valadie, J.J. Lacapere, Y.H. Sanejouand, C. Etchebest, J. Mol. Biol. 332 (2003) 657. [52] R. Bruschweiler, J. Chem. Phys. 102 (1995) 3396. [53] F. Tama, Y.H. Sanejouand, Protein Eng. 14 (2001) 1. [54] K. Hinsen, J. Comput. Chem. 21 (2000) 79. [55] K. Hinsen, G.R. Kneller, Mol. Sim. 23 (2000) 275. [56] G. Song, R.L. Jernigan, Proteins 63 (2006) 197. [57] P. Maragakis, M. Karplus, J. Mol. Biol. 352 (2005) 807. [58] J. Franklin, P. Koehl, S. Doniach, M. Delarue, Nucleic Acids Res. 35 (2007) W477. [59] O. Miyashita, J.N. Onuchic, P.G. Wolynes, Proc. Natl. Acad. Sci. USA 100 (2003) 12570.

[60] [61] [62] [63] [64] [65] [66] [67] [68] [69] [70] [71] [72] [73] [74] [75] [76] [77] [78] [79] [80] [81] [82] [83] [84] [85] [86] [87] [88] [89] [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] [100] [101] [102] [103] [104] [105] [106] [107] [108] [109]

O. Miyashita, P.G. Wolynes, J.N. Onuchic, J. Phys. Chem. B 109 (2005) 1959. R.B. Best, Y.G. Chen, G. Hummer, Structure 13 (2005) 1755. Z. Xu, A.L. Horwich, P.B. Sigler, Nature 388 (1997) 741. W. Zheng, B.R. Brooks, S. Doniach, D. Thirumalai, Structure 13 (2005) 565. F. Pontiggia, G. Colombo, C. Micheletti, H. Orland, Phys. Rev. Lett. 98 (2007) 048102. W.G. Krebs, V. Alexandrov, C.A. Wilson, N. Echols, H. Yu, M. Gerstein, Proteins 48 (2002) 682. P. Petrone, V.S. Pande, Biophys. J. 90 (2006) 1583. S. Kundu, J.S. Melton, D.C. Sorensen, G.N. Phillips Jr., Biophys. J. 83 (2002) 723. K. Hinsen, Bioinformatics 24 (2008) 521. S. Hayward, A. Kitao, F. Hirata, N. Go, J. Mol. Biol. (1993). J.P. Ma, M. Karplus, J. Mol. Biol. 274 (1997) 114. J.P. Ma, Structure 13 (2005) 373. L. Zhou, S.A. Siegelbaum, Biophys. J. 94 (2008) 3461. M. Rueda, C. Ferrer-Costa, T. Meyer, A. Perez, J. Camps, A. Hospital, J.L. Gelpi, M. Orozco, Proc. Natl. Acad. Sci. USA 104 (2007) 796. M. Rueda, P. Chacon, M. Orozco, Structure 15 (2007) 565. L. Yang, G. Song, A. Carriquiry, R.L. Jernigan, Structure 16 (2008) 321. K. Hinsen, G.R. Kneller, Proteins 70 (2008) 1235. H. Wako, M. Kato, S. Endo, Bioinformatics 20 (2004) 2035. Z.W. Cao, Y. Xue, L.Y. Han, B. Xie, H. Zhou, C.J. Zheng, H.H. Lin, Y.Z. Chen, Nucleic Acids Res. 32 (2004) W679. M. Gerstein, W. Krebs, Nucleic Acids Res. 26 (1998) 4280. N. Echols, D. Milburn, M. Gerstein, Nucleic Acids Res. 31 (2003) 478. S. Flores et al., Nucleic Acids Res. 34 (2006) D296. L.W. Yang, X. Liu, C.J. Jursa, M. Holliman, A.J. Rader, H.A. Karimi, I. Bahar, Bioinformatics 21 (2005) 2978. V. Alexandrov, U. Lehnert, N. Echols, D. Milburn, D. Engelman, M. Gerstein, Protein Sci. 14 (2005) 633. U. Emekli, D. Schneidman-Duhovny, H.J. Wolfson, R. Nussinov, T. Haliloglu, Proteins 70 (2008) 1219. J.I. Garzon, J. Kovacs, R. Abagyan, P. Chacon, Bioinformatics 23 (2007) 901. J.A. Kovacs, P. Chacón, R. Abagyan, Proteins 56 (2004) 661. S.M. Hollup, G. Salensminde, N. Reuter, BMC Bioinform. 6 (2005) 52. L. Skjaerven, I. Jonassen, N. Reuter, BMC Bioinform. 8 (2007) 232. K. Suhre, Y.H. Sanejouand, Nucleic Acids Res. 32 (2004) W610. K. Suhre, J. Navaza, Y.H. Sanejouand, Acta Crystallogr. D Biol. Crystallogr. 62 (2006) 1098. E. Lindahl, C. Azuara, P. Koehl, M. Delarue, Nucleic Acids Res. 34 (2006) W52. W. Zheng, B.R. Brooks, Biophys. J. 89 (2005) 167. W. Zheng, B.R. Brooks, Biophys. J. 90 (2006) 4327. W. Zheng, B.R. Brooks, G. Hummer, Proteins 69 (2007) 43. S. Falke, F. Tama, C.L. Brooks 3rd, E.P. Gogol, M.T. Fisher, J. Mol. Biol. 348 (2005) 219. K. Suhre, Y.H. Sanejouand, Acta Crystallogr. D Biol. Crystallogr. 60 (2004) 796. K. Hinsen, N. Reuter, J. Navaza, D.L. Stokes, J.J. Lacapere, Biophys. J. 88 (2005) 818. E. Lindahl, M. Delarue, Nucleic Acids Res. 33 (2005) 4496. A.J. Mccoy, R.W. Grosse-Kunstleve, P.D. Adams, M.D. Winn, L.C. Storoni, R.J. Read, J. Appl. Crystallogr. 40 (2007) 658. S. Hayward, N. Go, Annu. Rev. Phys. Chem. 46 (1995) 223. I. Bahar, Rev. Chem. Eng. 15 (1999) 319. H.J. Berendsen, S. Hayward, Curr. Opin. Struct. Biol. 10 (2000) 165. F. Tama, Protein Pept. Lett. 10 (2003) 119. J. Ma, Structure (Camb) 13 (2005) 373. I. Bahar, A.J. Rader, Curr. Opin. Struct. Biol. 15 (2005) 586. F. Tama, C.L. Brooks, Annu. Rev. Biophys. Biomol. Struct. 35 (2006) 115. Q. Cui, I. Bahar (Eds.), Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems, CRC Press, 2005. W.D. Cornell, S. Louise-May, Normal Mode AnalysisI Normal Mode Analysis, John Wiley & Sons, Chichester UK, 1998. pp. 1904. S. Hayward, Normal mode analysis of biological moleculesI, Normal mode analysis of biological molecules, Marcel Dekker Inc., New York, 2001. pp. 153.