Biased coarse-grained molecular dynamics simulation approach for flexible fitting of X-ray structure into cryo electron microscopy maps


Journal of Structural Biology 169 (2010) 95–105 Contents lists available at ScienceDirect Journal of Structural Biology journal homepage: www.elsevi...


Journal of Structural Biology 169 (2010) 95–105

Contents lists available at ScienceDirect

Journal of Structural Biology journal homepage: www.elsevier.com/locate/yjsbi

Ivan Grubisic, Maxim N. Shokhirev, Marek Orzechowski, Osamu Miyashita, Florence Tama *

Department of Biochemistry and Molecular Biophysics, The University of Arizona, 1041 E. Lowell Street, Tucson, AZ 85721, USA

Article info

Article history: Received 1 June 2009; Received in revised form 1 September 2009; Accepted 15 September 2009; Available online 2 October 2009

Keywords: Cryo-EM; Flexible fitting; Go-model; Targeted MD; Refinement

Abstract

Several approaches have been introduced to interpret low-resolution structural data, such as those obtained from cryo-EM, in terms of high-resolution structure. As conformational changes are often observed in biological molecules, these techniques need to take into account the flexibility of proteins. Flexibility has been described in terms of movement between rigid domains and between rigid secondary structure elements, descriptions that present some limitations for studying dynamical properties. Normal mode analysis has also been used, but is limited to medium-resolution data. All-atom molecular dynamics fitting techniques are more appropriate for fitting structures into higher-resolution data because full protein flexibility is considered, but they are computationally expensive. Here, we introduce a coarse-grained approach in which a Go-model is used to represent biological molecules, combined with biased molecular dynamics to accurately reproduce conformational transitions. Illustrative examples on simulated data are shown. Accurate fittings can be obtained for resolutions ranging from 5 to 20 Å. The approach was also tested on experimental data of Elongation Factor G and Escherichia coli RNA polymerase, where its validity is compared to previous models obtained from different techniques. This comparison demonstrates that quantitative flexible techniques, as opposed to manual docking, need to be considered to interpret low-resolution data.

© 2009 Elsevier Inc. All rights reserved.

* Corresponding author. E-mail address: [email protected] (F. Tama).
1047-8477/$ - see front matter © 2009 Elsevier Inc. All rights reserved. doi:10.1016/j.jsb.2009.09.010

1. Introduction

Conformational dynamics of biological molecules are essential for their function. As it is often difficult to characterize different conformational states of large biological molecules by X-ray crystallography, medium- to low-resolution techniques are often used to study their dynamical properties. In particular, cryo electron microscopy (cryo-EM) has played a key role in identifying conformational states of macromolecular assemblies (Saibil, 2000) such as the ribosome (Frank and Agrawal, 2000; Valle et al., 2003a,b), GroEL, RNA polymerase (Darst et al., 2002), myosin (Wendt et al., 2001) and viruses (Conway et al., 2001; Lee and Johnson, 2003), amongst others.

Because the cryo-EM technique only provides medium- to low-resolution data, its interpretation often requires the fitting of known high-resolution structures (obtained from X-ray or NMR measurements) of the same biological molecule into the map. For objective and reproducible fitting, several algorithms have been developed to replace manual fitting. In the first quantitative approaches introduced, only rigid-body motions of the molecules were considered (Wriggers et al., 1999; Volkmann and Hanein, 1999; Rossmann, 2000; Rossmann et al., 2001; Jiang et al., 2001; Chacon and Wriggers, 2002). However, as the resolution of cryo-EM data improves, distinct conformational states can now be observed. Therefore, in order to interpret the experimental data at a near-atomic level, approaches that include protein flexibility have been introduced.

The first approaches for flexible fitting considered biological systems as a collection of domains that could be fitted independently as individual rigid bodies. These approaches have revealed conformational changes of several important biological systems (Volkmann et al., 2000; Wendt et al., 2001; Rawat et al., 2003; Gao et al., 2003; Gao and Frank, 2005). However, such methods rely on a subjective partitioning of the system and ignore the concerted motions between domains that occur in biological molecules during conformational changes, resulting in faulty models.

Another type of approach to simulating the flexibility of biological molecules is to consider coarse-grained models. Some earlier works used reduced models in which a few points represent the biological molecule when fitting the cryo-EM data (Wriggers et al., 1999; Wriggers and Birmanns, 2001). However, this method might introduce ambiguity by embedding the full atomic structure into a reduced model (a few points) to perform the flexible fitting.

A more objective representation of the dynamics of a biological molecule is obtained by partitioning the protein at a finer level. Such a partition can be based on secondary structure elements (SSE), with only movements between those rigid SSE being considered. Such an approach was implemented in optimization methods based on Monte Carlo, simulated annealing and coarse-grained models (Mears et al., 2006). In methods that use iterative comparative modeling for fitting into cryo-EM data (Topf et al., 2005, 2006), multilevel subdivisions of the structure from domains to secondary structure elements (Topf et al., 2008) have also been considered. Movements of SSE have also been used in a flexible fitting technique that compares the structural variability of domains within a given family (Velazquez-Muriel et al., 2006). Finally, partitioning can also be determined by using graph theory to identify the elements of the biological molecule that are rigid (Jacobs and Thorpe, 1995; Jacobs et al., 2001). Flexibility between these elements can then be used for flexible fitting (Jolley et al., 2008).

In each of these methods, some level of protein rigidity is assumed, which could impair the interpretation of the mechanical properties of biological molecules. In particular, rigid-block approximations, even at the SSE level, would limit the interpretation of the smaller-scale conformational changes that can now be observed with higher-resolution data.

Coarse-grained representations of the molecules at the non-hydrogen atom or Cα level, based on the elastic network model, have also been considered in order to incorporate full protein flexibility during the fitting process. Schröder et al. combined the elastic network model with random-walk displacements and distance restraints, an approach that was successful in predicting conformational changes of the ribose-binding protein (Schröder et al., 2007). Similarly, Tan et al. modeled the high-resolution structure using an elastic network model combined with a soft-sphere potential, which represents interactions that are not included in the elastic network model. A global optimization method using simulated annealing is then used to optimally fit the structure into the cryo-EM map (Tan et al., 2008).
Coarse-grained elastic network normal mode analysis (NMA) has also been adapted to flexibly fit high-resolution structures into low-resolution data (Delarue and Dumas, 2004; Tama et al., 2004a,b, 2006; Hinsen et al., 2005; Suhre et al., 2006; Mitra et al., 2005; Falke et al., 2005).

Due to developments in cryo-EM techniques, the resolution of cryo-EM data can now reach 4 Å (for symmetric structures). Such higher-resolution data provide a more detailed definition of the structure and reveal smaller-scale rearrangements compared to known high-resolution structures (Stahlberg and Walz, 2008). While an elastic network model can be used to study large conformational changes of biological molecules, it does not adequately describe smaller-scale conformational changes (Tama and Sanejouand, 2001). Methods that keep parts of the protein rigid would also limit the interpretation of such data. Therefore, methods allowing full protein flexibility during fitting are needed.

The most rigorous way to describe protein flexibility is by molecular dynamics (MD) simulation, a well-established technique for investigating the dynamics of biological molecules. In earlier work, molecular dynamics simulations were used for real space refinement (RSRef), which considers all atoms and optimizes both the fit to the data and the stereochemical properties of the molecule (Chapman, 1995; Chen et al., 2003; Fabiola and Chapman, 2005). However, this implementation assumes that certain units of the molecule, the domains, are rigid. Recently, several MD-based approaches that consider full protein flexibility have been introduced for flexible fitting. Overall, these methods bias the MD simulation toward a conformation that fits the cryo-EM data. They differ in the form of the biasing potential that is used.
In an approach introduced by Caulfield et al., the molecular dynamics is steered using a minimum biasing function (Maxwell’s demon molecular dynamics) (Caulfield and Harvey, 2007). In a different approach, atoms are steered by a potential map created from the cryo-EM data (Noda et al., 2006; Trabuco et al., 2008). In order to maintain the stereochemical quality of the structure, restraints are applied to coordinates relevant to secondary structural elements (Trabuco et al., 2008), as such an approach may otherwise lead to artifacts (Orzechowski and Tama, 2008). Biased molecular dynamics using the correlation coefficient as the biasing potential has also been introduced, with no restraint imposed on the secondary structure (Orzechowski and Tama, 2008). These approaches have been successful; however, because they use an all-atom representation of the protein, they can be computationally expensive, especially for large systems, and because the simulations are run in vacuum there might be undesirable effects on the protein structure.

In this paper, we describe an optimization method based on molecular dynamics simulation with full protein flexibility but with a reduced protein representation, which lowers the computational cost of fitting. In the coarse-grained models used in molecular dynamics simulations, not all of the atoms in the system are considered explicitly; rather, each residue is reduced to a few points. This considerably reduces the computational complexity of the system and therefore the simulation time, while still representing the flexibility and stereochemistry of the protein structure (Onuchic and Wolynes, 2004; Tozzini, 2005; Tozzini and McCammon, 2005). Such coarse-grained models should be sufficient for modeling conformational changes based on cryo-EM data because, with such low-resolution data, it is not possible to define the exact position of any atoms beyond the backbone atoms. Nevertheless, all-atom models can be reconstructed using modeling programs (Gront et al., 2007; Rotkiewicz and Skolnick, 2008). Given an appropriate potential for a coarse-grained model, conformational changes observed in experimental data could be naturally reproduced by MD simulation to construct structural models consistent with the experimental data.
Illustrative results of our studies on simulated EM data from several proteins with large conformational changes are presented. We demonstrate that this flexible fitting method yields structures that agree remarkably well with the error-free simulated EM maps. Finally, we discuss the results of our method applied to experimental data of Elongation Factor G and RNA polymerase.

2. Methods

2.1. Biasing potential

“Biased” molecular dynamics simulations, such as targeted MD (Schlitter et al., 1994; Ma et al., 2000) and steered MD (Isralewitz et al., 2001), in which external forces are added to guide the system into a certain region of conformational space, have been successfully employed to study important conformational transitions of biological systems (Krammer et al., 1999; Sanbonmatsu et al., 2005). Here we employ a classical molecular dynamics technique with a modified force-field potential $V$, calculated as the sum of the classical potential from the standard molecular dynamics force field, $V_{\mathrm{ff}}$, and a new effective potential, $V_{\mathrm{Fit}}$, that fits the high-resolution structure into the low-resolution data:

$$V = V_{\mathrm{ff}} + V_{\mathrm{Fit}}$$

The additional effective potential is calculated according to the following equation:

$$V_{\mathrm{Fit}} = k\,(1 - \mathrm{c.c.}) \qquad (1)$$

The c.c. is the correlation coefficient, which measures the similarity (overlap) between a target cryo-EM map and a map synthetically generated from the X-ray structure being fitted. It is defined as:

$$\mathrm{c.c.} = \frac{\sum_{ijk} \rho^{\mathrm{exp}}(i,j,k)\,\rho^{\mathrm{sim}}(i,j,k)}{\sqrt{\sum_{ijk} \rho^{\mathrm{exp}}(i,j,k)^{2}\,\sum_{ijk} \rho^{\mathrm{sim}}(i,j,k)^{2}}}$$

where $\rho^{\mathrm{exp}}(i,j,k)$ and $\rho^{\mathrm{sim}}(i,j,k)$ represent the experimental and synthetically simulated densities of voxel $(i,j,k)$. The constant $k$ in Eq. (1) regulates the magnitude of the effective potential and needs to be calibrated. It is the only arbitrary parameter introduced. In the Discussion section, we discuss the value of $k$ that should be used to obtain optimal results from the fitting. The biasing potential described here is not the only possible choice; however, other biasing potentials that are simply proportional to the density (Noda et al., 2006; Trabuco et al., 2008) (C.C. Jolley and M.F. Thorpe, personal communication) could lead to artifacts, and $V_{\mathrm{Fit}}$ was therefore preferred (Orzechowski and Tama, 2008).

2.2. Go-model

In order to reduce the computational cost associated with an all-atom description of the protein in the MD simulation while still reproducing full protein flexibility, a coarse-grained model was used for flexible fitting. While several coarse-grained models have been developed, we employ a well-established one: the Go-model (Taketomi et al., 1975; Ueda et al., 1978), which has been extensively used to study protein folding (Tozzini, 2005; Clementi et al., 2000) and, more recently, to describe conformational transitions of biological molecules (Best et al., 2005; Koga and Takada, 2006; Okazaki et al., 2006; Whitford et al., 2007). A Go-potential takes into account only native interactions, and each of these interactions enters into the energy balance with the same weight. Residues in the protein are represented as single beads centered at their Cα positions (Clementi et al., 2000). Adjacent beads are connected into a polymer chain by bond and angle interactions, while the geometry of the native state is encoded in the dihedral angle potential and a non-local bead–bead potential. Because of the bond, angle and dihedral terms between consecutive Cα atoms, good stereochemistry can be conserved.
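As an illustration of the non-local bead–bead term, the sketch below uses the 12-10 form that is commonly associated with the Cα Go-model of Clementi et al. (2000). The functional form, the function names and the default values (`eps`, `sigma0`) are assumptions made for this example; the parameters actually used in this work are those given in the supplementary information.

```python
import numpy as np

def native_contact_energy(r, r_native, eps=1.0):
    """12-10 attraction for a native pair; minimum of -eps at r = r_native.

    Assumed form: eps * [5 (r_native/r)^12 - 6 (r_native/r)^10].
    """
    ratio = r_native / np.asarray(r, dtype=float)
    return eps * (5.0 * ratio ** 12 - 6.0 * ratio ** 10)

def nonnative_repulsion(r, sigma0=4.0, eps=1.0):
    """Purely repulsive term for non-native pairs (sigma0 is an assumed value)."""
    return eps * (sigma0 / np.asarray(r, dtype=float)) ** 12

# A native pair separated by 6.0 A in the reference structure.
e_min = native_contact_energy(6.0, 6.0)
e_compressed = native_contact_energy(5.5, 6.0)
e_stretched = native_contact_energy(7.0, 6.0)
```

With this form every native contact has the same well depth, which mirrors the statement that each native interaction enters the energy balance with the same weight.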
The parameters for the potential $V_{\mathrm{ff}}$ used in this study were taken from Clementi et al. (2000) (see supplementary information).

2.3. Creating the cryo-EM map

Synthetic maps are created by placing a three-dimensional Gaussian function on each atom and integrating these functions over each voxel. For a given set of atomic coordinates $(x_n, y_n, z_n)$:

$$\rho^{\mathrm{sim}}(i,j,k) = \sum_{n=1}^{N} \int_{V_{ijk}} dx\,dy\,dz\; g(x, y, z; x_n, y_n, z_n)$$

where $n$ denotes the $n$th atom from a set of $N$ atoms, $(i,j,k)$ denotes a given voxel with volume $V_{ijk}$, and $g(x, y, z; x_n, y_n, z_n)$ is the three-dimensional Gaussian function

$$g(x, y, z; x_n, y_n, z_n) = \exp\left[-\frac{3}{2\sigma^{2}}\left\{(x - x_n)^{2} + (y - y_n)^{2} + (z - z_n)^{2}\right\}\right]$$

where $\sigma$ is a resolution parameter. The resolution of a synthetic map is equal to $2\sigma$ (Wriggers and Birmanns, 2001). We note that the resolution parameter is set as a rough estimate and may not exactly coincide with the resolution reported for the experimental data, because the resolution of experimental data is often defined as a Fourier filter. However, previous studies with coarse-grained models have shown that the precise details of the simulated map do not affect the fitting performance (Wriggers and Birmanns, 2001; Tama et al., 2004a,b).
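The map construction of Section 2.3 and the biasing energy of Eq. (1) can be sketched together in a few lines of NumPy. This is a minimal illustration rather than the authors' software: the function names, the grid layout and the random test coordinates are assumptions, and the Gaussian is sampled at voxel centres instead of being integrated over each voxel.

```python
import numpy as np

def simulate_map(coords, sigma, origin, voxel, shape):
    """Approximate rho_sim by summing each atom's Gaussian at voxel centres.

    Simplification (assumption): the paper integrates g over each voxel;
    here g is only evaluated at the voxel centre.
    """
    axes = [origin[d] + voxel * (np.arange(shape[d]) + 0.5) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    rho = np.zeros(shape)
    for x, y, z in coords:
        r2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        rho += np.exp(-3.0 * r2 / (2.0 * sigma ** 2))
    return rho

def correlation_coefficient(rho_exp, rho_sim):
    """c.c. between two density maps defined on the same grid."""
    num = np.sum(rho_exp * rho_sim)
    den = np.sqrt(np.sum(rho_exp ** 2) * np.sum(rho_sim ** 2))
    return num / den

def v_fit(rho_exp, rho_sim, k):
    """Biasing energy V_Fit = k (1 - c.c.) from Eq. (1)."""
    return k * (1.0 - correlation_coefficient(rho_exp, rho_sim))

# Made-up coordinates: a "target" and the same points displaced by 3 A.
rng = np.random.default_rng(0)
target = rng.uniform(5.0, 25.0, size=(20, 3))
shifted = target + np.array([3.0, 0.0, 0.0])

# sigma = 2.5 corresponds to a nominal resolution of 2 * sigma = 5 A.
grid = dict(sigma=2.5, origin=np.zeros(3), voxel=1.0, shape=(30, 30, 30))
rho_target = simulate_map(target, **grid)
rho_shifted = simulate_map(shifted, **grid)

cc_self = correlation_coefficient(rho_target, rho_target)
cc_shifted = correlation_coefficient(rho_target, rho_shifted)
energy_shifted = v_fit(rho_target, rho_shifted, k=1000.0)
```

A structure that reproduces the target map exactly gives c.c. = 1 and a vanishing biasing energy, whereas any displaced structure gives a positive energy that grows with `k`.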

2.4. MD simulation

We developed our own software to run biased molecular dynamics simulations using a Go-model. Temperature was maintained using the Berendsen algorithm to couple the system to a thermal bath. Biasing forces to fit the structure into the cryo-EM map were implemented following Orzechowski and Tama (2008).

2.5. Determining the folding temperature

To determine the folding temperature of the protein structure, several simulations without the biasing potential for fitting into the cryo-EM map were run at different temperatures. To assess the degree of nativeness of the structures, the fraction of native contacts (Q-value) can be estimated for each structure sampled during the simulation:



$$Q = \frac{\text{current number of contacts}}{\text{total native contacts}}$$
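A minimal sketch of this Q-value calculation on Cα coordinates is given below. The list of native pairs and their native distances are assumed to have been computed beforehand from the folded structure, and the rule that a contact counts as formed when the current distance is within `tol` times the native distance is an assumption of this example, not a detail stated in the text.

```python
import numpy as np

def fraction_native_contacts(coords, native_pairs, native_dist, tol=1.2):
    """Q = (native contacts currently formed) / (total native contacts).

    coords:       (N, 3) array of current bead positions
    native_pairs: (M, 2) integer array of bead indices in contact natively
    native_dist:  (M,) distances of those pairs in the native structure
    tol:          hypothetical tolerance factor for calling a contact formed
    """
    i, j = native_pairs[:, 0], native_pairs[:, 1]
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    return float(np.mean(d <= tol * native_dist))

# Toy four-bead "native" structure with three native contacts.
native = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0],
                   [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]])
pairs = np.array([[0, 2], [1, 3], [0, 3]])
d0 = np.linalg.norm(native[pairs[:, 0]] - native[pairs[:, 1]], axis=1)

q_native = fraction_native_contacts(native, pairs, d0)

# Pull bead 3 far away: the two contacts involving it are lost.
stretched = native.copy()
stretched[3] += np.array([0.0, 10.0, 0.0])
q_stretched = fraction_native_contacts(stretched, pairs, d0)
```

Tracking this quantity over a set of unbiased runs at increasing temperature gives an estimate of where the structure stops being stable.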

The total number of native contacts is calculated for the initial folded protein. If the Q-value is equal to one, the structure is fully folded, since all the native contacts are present. On the other hand, if the Q-value is close to 0, all the native contacts have disappeared, indicating that the protein is mostly unfolded or non-native-like. Several simulations with a gradual increase in temperature (increments of 10) were performed, and we estimated the temperature at which the protein is no longer stable. For the fitting process, we used a temperature well below the folding temperature to ensure that the protein remains folded during the simulation. Native contacts in the starting structure were defined as follows: two residues are in contact when at least one non-hydrogen atom of the ith amino acid is within 6.5 Å of any non-hydrogen atom of the jth amino acid (Okazaki et al., 2006).

2.6. Rigid-body fitting

We employed the Situs package (Wriggers et al., 1999) to perform the initial rigid-body fitting of all-atom structures to the experimental cryo-EM maps of Elongation Factor G and RNA polymerase. In this work we used 10 codebook vectors to represent a PDB structure and a cryo-EM map. The highest-ranked model was used to start the flexible fitting.

3. Results

We first tested our procedure on simulated EM data. Several proteins have been used in the past for developing flexible fitting approaches (Tama et al., 2004a,b; Velazquez-Muriel et al., 2006; Jolley et al., 2008; Topf et al., 2008; Trabuco et al., 2008). Here we used a subset of the proteins from these studies, covering a variety of sizes. The proteins that we chose undergo a conformational change that has been observed experimentally, and high-resolution structures of the two states are available.
The proteins used for the simulations are listed in Table 1, along with the PDB codes for the structure being fitted (initial) and for the structure used to simulate the cryo-EM data (target). The initial RMSD between these two structures is also shown. A synthetic density map was constructed for each structure by convolution with a Gaussian kernel of σ = 2.5, 4, 5, 7.5 and 10 Å. The corresponding resolutions, 5, 8, 10, 15 and 20 Å, were considered because this method is intended to target both the high-resolution end of the cryo-EM spectrum and the lower end. The fitting was done from the X-ray structures into the calculated error-free low-resolution maps. During the fitting, simulated maps from the deformed structures were created with the same resolution as the target map in order to evaluate the c.c.

Table 1. A list of the studied proteins.

| Protein | Initial PDB | Target PDB | Number of residues | RMSD (Å)¹ |
|---|---|---|---|---|
| Adenylate kinase | 1AKE | 4AKE | 214 | 7.2 |
| Lactoferrin | 1LFG | 1LFH | 691 | 6.4 |
| Elongation Factor 2 | 1N0V | 1N0U | 819 | 14.5 |
| Ca2+-ATPase | 1IWO | 1SU4 | 994 | 14.0 |
| Acetyl CoA Synthase | 1OAO (Chain C) | 1OAO (Chain D) | 728 | 7.0 |
| Transglutaminase | 1KV3 | 2Q3Z | 651 | 28.8 |
| Elongation Factor G | 1FNM | N/A | 655 | / |
| E. coli RNA Polymerase | 1HQM | N/A | 2650 | / |

¹ The initial RMSD value (Cα atoms) between the two conformations was calculated by aligning the two PDB files in VMD.

The purpose of the method presented here is to fit the X-ray structure into a raw experimental (in this case simulated) EM map of a different conformation of the same molecule. To assess the agreement between the atomic structure and the experimental map during the simulation, we examine the c.c. between the structure and the target EM map. In the following we also examine the root mean square deviation (RMSD) between the deformed structure and the structure from which the target EM map was created. Ideally, when the c.c. reaches a maximum, the RMSD should reach a minimum. However, two structures with different RMSD values can have the same c.c., because detailed information about the atomic coordinates is lost in the EM map and because the EM map and the atomic coordinates are not simply related by a linear transformation. Thus, to characterize the performance of our method, we examine both values.

In order to perform accurate fitting, flexible fitting techniques should be able to tolerate noise in the map. Jolley et al. analyzed the effect of noise on the result of fitting with a correlation coefficient optimization (Jolley et al., 2008). Although a high level of noise (S/N = 1) is detrimental to fitting, little impact is observed at higher signal-to-noise ratios (S/N > 2). Similarly, Schröder et al. compared fitting with and without noise and observed that their method is robust against noise (Schröder et al., 2007). We have also tested the applicability of our approach to experimental data, which contain noise. Elongation Factor G (EF-G), an 844-residue protein, and RNA polymerase were taken as examples.
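For readers who want to reproduce the RMSD side of this comparison without VMD, the following sketch computes the Cα RMSD after optimal rigid-body superposition using the Kabsch algorithm. It is a stand-in for the VMD alignment mentioned in Table 1, not the procedure used by the authors, and the function name and test coordinates are assumptions.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P0 = P - P.mean(axis=0)                 # remove translation
    Q0 = Q - Q.mean(axis=0)
    H = P0.T @ Q0                           # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T # rotation taking P0 onto Q0
    diff = (R @ P0.T).T - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Made-up reference coordinates, then a rotated and translated copy.
rng = np.random.default_rng(1)
ref = rng.normal(size=(50, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
moved = ref @ Rz.T + np.array([5.0, -2.0, 1.0])

rmsd_rigid = kabsch_rmsd(moved, ref)        # rigid motion only
deformed = ref.copy()
deformed[:10] += 2.0                        # displace part of the structure
rmsd_deformed = kabsch_rmsd(deformed, ref)
```

A purely rigid displacement gives an RMSD of essentially zero, so any remaining RMSD reflects an internal conformational difference, which is the quantity monitored alongside the c.c.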

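Table 3. The correlation coefficient between maps generated from a Cα representation and maps generated from a full-atom representation. Rows give the resolution of the simulated map generated from the Cα-atom structure; columns give the resolution of the simulated map generated from the full-atom structure. Resolution in Å.

| Cα-atom map resolution | Full-atom 5 | Full-atom 8 | Full-atom 10 | Full-atom 15 | Full-atom 20 |
|---|---|---|---|---|---|
| 5 | 0.91 | 0.86 | 0.82 | 0.75 | 0.69 |
| 8 | 0.96 | 0.97 | 0.95 | 0.89 | 0.85 |
| 10 | 0.95 | 0.98 | 0.98 | 0.95 | 0.91 |
| 15 | 0.90 | 0.97 | 0.99 | 0.99 | 0.98 |
| 20 | 0.84 | 0.92 | 0.95 | 0.99 | 0.99 |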

3.1. Folding temperature

The folding temperature varies from one protein to another. In most Go-model studies, the primary aim is to study protein folding; simulations are therefore run at the folding temperature in order to observe multiple folding/unfolding events (Clementi et al., 2000). In our simulations, the protein does not need to unfold; instead, it needs to undergo its conformational change while remaining folded. We measured the folding temperature for each protein. Because of the nature of the Go-model, the folding temperature differs between proteins of various sizes, but overall it remains in the same range (see Table 2). Running simulations at a temperature that is too close to the folding temperature runs the risk of unfolding the protein. Very low temperatures, on the other hand, will freeze the protein. The temperature lacks specific units because the simulation has not been calibrated to the Kelvin scale, but we found that T = 500, with the definition of the potential obtained from Clementi et al. (2000), worked well for all our test systems.

Table 2. The folding temperatures of the proteins.

| Protein | Folding temperature |
|---|---|
| 1AKE | 800 |
| 1N0V | 960 |
| 1LFG | 960 |
| 1IWO | 940 |
| 1KV3 | 920 |
| 1OAO.C | 1040 |

3.2. Validity of coarse graining

In our approach, a coarse-grained representation is used for the biological molecule. To check the validity of this choice, we computed the correlation coefficient between a given simulated cryo-EM map generated from an all-atom structure and several maps generated from a Cα-atom model at different resolutions. Results for the case of adenylate kinase (PDB: 4AKE) are shown in Table 3. For all of the target resolutions, the map obtained from a Cα representation correlates well with the full-atom map at the same resolution. Using a coarse-grained model therefore seems appropriate, and it has been shown to be successful in several cases (Wriggers et al., 2000; Tama et al., 2004a,b).

3.3. Simulated data

In our implementation, the weight k is a parameter that adjusts how strongly the system is biased to fit into the density. k needs to be calibrated so that sampling of conformations is enhanced while the structural integrity of the protein is maintained. It is therefore necessary to determine which force constant, or sequence of force constants, is capable of fitting the initial structure to the target map. Simulations with weights k ranging from 1000 to 100,000 were run for each protein.

Initially a force constant k = 1000 was used. Fig. 1a shows the result for acetyl CoA synthase (PDB: 1OAO Chain C). At 5 Å resolution, the RMSD decreased from 7 Å to close to 1 Å; however, at lower resolutions, this force constant is only capable of causing the initial structure to oscillate around its initial position. A stronger force constant, k = 10,000, as illustrated in Fig. 1b, was capable of reducing the RMSD for all resolutions. The RMSD decreases to approximately 1 Å for the 5 Å resolution map. For a lower-resolution map such as 20 Å, the RMSD decreases to 1.8 Å. In such cases, the forces resulting from the gradient of the c.c. are strong enough to overcome the energetic barriers separating the two structures.

Fig. 1. Simulation results for acetyl CoA synthase. RMSD (open symbols) and correlation coefficient (solid symbols) as a function of MD steps at T = 500, using (a) a single force iteration with k = 1000 and (b) a single force iteration with k = 10,000. Several resolutions were tested: 5 Å (circles), 8 Å (squares), 10 Å (diamonds), 15 Å (triangles) and 20 Å (inverted triangles).

With a weight of 1000, the potential from the correlation coefficient is generally not strong enough, which leads to a poor fit, especially for lower-resolution (15 and 20 Å) data (see Table 4). However, as also shown in Table 4, while a higher weight of 10,000 leads to more successful simulations overall (lactoferrin, Ca2+-ATPase, acetyl CoA synthase), we still observed high RMSD values in several cases. In the case of Elongation Factor 2, higher RMSD values are observed for the highest-resolution maps (5–10 Å). For Ca2+-ATPase (1IWO) (see Fig. 2a), the RMSD initially decreases to less than 10 Å, but for the highest-resolution maps it seems to remain trapped at higher final values (8 Å), i.e. in a local minimum, than for their lower-resolution counterparts, which converge to lower final RMSD values (see Table 4).

Table 4. The best correlation coefficient and corresponding RMSD value for Go-model simulations that used a single force iteration at k = 1000 and k = 10,000.

| PDB | Resolution (Å) | Best c.c. (k = 1000) | RMSD (Å) (k = 1000) | Best c.c. (k = 10,000) | RMSD (Å) (k = 10,000) |
|---|---|---|---|---|---|
| 1AKE | 5 | 0.87 | 1.4 | 0.91 | 1.6 |
| 1AKE | 8 | 0.95 | 1.8 | 0.97 | 5.0 |
| 1AKE | 10 | 0.96 | 2.0 | 0.98 | 4.3 |
| 1AKE | 15 | 0.98 | 2.5 | 0.99 | 5.1 |
| 1AKE | 20 | 0.99 | 3.0 | 0.99 | 1.8 |
| 1N0V | 5 | 0.87 | 1.4 | 0.89 | 5.0 |
| 1N0V | 8 | 0.94 | 1.7 | 0.97 | 4.5 |
| 1N0V | 10 | 0.96 | 1.9 | 0.98 | 4.9 |
| 1N0V | 15 | 0.96 | 5.2 | 0.99 | 5.7 |
| 1N0V | 20 | 0.97 | 5.8 | 0.99 | 6.8 |
| 1LFG | 5 | 0.87 | 1.4 | 0.90 | 1.6 |
| 1LFG | 8 | 0.89 | 4.5 | 0.96 | 1.1 |
| 1LFG | 10 | 0.94 | 3.6 | 0.97 | 9.1 |
| 1LFG | 15 | 0.97 | 4.3 | 0.98 | 1.9 |
| 1LFG | 20 | 0.98 | 4.7 | 0.99 | 2.0 |
| 1IWO | 5 | 0.87 | 14.0 | 0.85 | 8.3 |
| 1IWO | 8 | 0.95 | 13.2 | 0.95 | 7.1 |
| 1IWO | 10 | 0.96 | 8.9 | 0.96 | 8.1 |
| 1IWO | 15 | 0.98 | 9.8 | 0.99 | 3.0 |
| 1IWO | 20 | 0.99 | 11.4 | 0.99 | 3.6 |
| 1OAO.C | 5 | 0.87 | 1.1 | 0.89 | 0.8 |
| 1OAO.C | 8 | 0.94 | 1.4 | 0.96 | 1.1 |
| 1OAO.C | 10 | 0.96 | 2.1 | 0.98 | 1.2 |
| 1OAO.C | 15 | 0.97 | 4.8 | 0.99 | 1.4 |
| 1OAO.C | 20 | 0.92 | 4.9 | 0.95 | 1.4 |

Fig. 2. Simulation results for Ca2+-ATPase. RMSD (open symbols) and correlation coefficient (solid symbols) as a function of MD steps at T = 500, using (a) a single force iteration with k = 10,000 and (b) a three-force iteration: k = 1000 for the first 5000 steps, k = 10,000 for steps 5000–10,000 and k = 100,000 for steps 10,000–15,000. Several resolutions were tested: 5 Å (circles), 8 Å (squares), 10 Å (diamonds), 15 Å (triangles) and 20 Å (inverted triangles).

We originally anticipated that the force constant might be dependent on the size of the protein. However, it appears that the force constant cannot be calibrated simply on the basis of the protein size and the resolution of the map. For example, the fitting of 1OAO to a 5 Å resolution map is the only case in which k = 1000 works, yet 1OAO is not the smallest protein we tested. We speculate that the best force constant depends on how the potential energy of the Go-model changes from the initial conformation to the fitted model, as well as on the change in the correlation coefficient during the fitting. However, those values cannot be estimated beforehand, and it may thus be difficult to determine the best force constant from the initial structure and the target map.

Our aim is to develop a program that is applicable to any system and any resolution. Therefore, to establish general conditions for obtaining low RMSD values, an iterative procedure was implemented. The simulation starts with a force constant of k = 1000 for 5000 steps. With k = 1000, it can be seen in Fig. 2b that there are only relatively small changes: the protein slightly alters its conformation to fit the experimental data. The simulation was then continued from the final position and velocity vectors with a force constant of k = 10,000, at which point the structure began to move towards the target structure, causing the RMSD to decrease. Using this iterative procedure, low RMSD values are consistently observed for both high- and low-resolution maps (see Table 5), and the final RMSDs were lower for the higher-resolution maps, as would be expected.

The k = 10,000 force constant still allows for some oscillations, and it was of interest to see whether we could minimize them and in turn obtain an even better final structure. A larger force constant, k = 100,000, was therefore applied for an additional 5000 steps, which reduced the RMSD even further (see Table 5), in particular for the lowest-resolution maps (15 and 20 Å) (see Fig. 2b for Ca2+-ATPase). In each of these cases, the final structures agree well with the simulated cryo-EM map, as indicated by a high c.c., and are very close to the conformation from which those simulated maps were derived.
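The staged force-constant protocol described above can be written as a simple lookup from MD step to k. The sketch below encodes the step boundaries quoted in the text (5000 steps per stage); the function name and the tuple layout are illustrative assumptions and are not taken from the authors' software.

```python
def force_constant(step, stages=((5_000, 1_000.0),
                                 (10_000, 10_000.0),
                                 (15_000, 100_000.0))):
    """Return the biasing weight k to use at a given MD step.

    Each stage is (first step that is NOT part of the stage, k).
    Steps beyond the last stage keep the last force constant.
    """
    for end_step, k in stages:
        if step < end_step:
            return k
    return stages[-1][1]

# Three-force iteration: k = 1000, then 10,000, then 100,000.
three_force = [force_constant(s) for s in (0, 4_999, 5_000, 9_999, 10_000, 14_999)]

# Two-force iteration: stop after the k = 10,000 stage.
two_stage = ((5_000, 1_000.0), (10_000, 10_000.0))
two_force = [force_constant(s, two_stage) for s in (0, 5_000, 9_999)]
```

In a biased MD loop, the value returned for the current step would multiply the gradient of the c.c. when the fitting force is evaluated, with positions and velocities carried over unchanged between stages.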

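Table 5. Results for multi-step fitting with different force constants.

| PDB | Resolution (Å) | Best c.c. (three-forcesᵃ) | RMSD (Å) (three-forcesᵃ) | Best c.c. (two-forcesᵇ) | RMSD (Å) (two-forcesᵇ) |
|---|---|---|---|---|---|
| 1AKE | 5 | 0.93 | 0.9 | 0.91 | 0.9 |
| 1AKE | 8 | 0.99 | 3.9 | 0.97 | 1.5 |
| 1AKE | 10 | 0.99 | 3.4 | 0.98 | 1.8 |
| 1AKE | 15 | 1.00 | 1.9 | 1.00 | 1.7 |
| 1AKE | 20 | 1.00 | 1.7 | 1.00 | 1.9 |
| 1N0V | 5 | 0.92 | 3.3 | 0.89 | 3.3 |
| 1N0V | 8 | 0.97 | 1.1 | 0.96 | 1.2 |
| 1N0V | 10 | 0.99 | 1.2 | 0.98 | 1.3 |
| 1N0V | 15 | 1.00 | 1.4 | 0.99 | 1.5 |
| 1N0V | 20 | 1.00 | 1.4 | 1.00 | 1.9 |
| 1LFG | 5 | 0.92 | 0.7 | 0.90 | 0.9 |
| 1LFG | 8 | 0.98 | 1.2 | 0.96 | 1.1 |
| 1LFG | 10 | 0.99 | 1.4 | 0.98 | 1.3 |
| 1LFG | 15 | 1.00 | 1.2 | 0.99 | 1.4 |
| 1LFG | 20 | 1.00 | 1.3 | 1.00 | 1.7 |
| 1IWO | 5 | 0.92 | 2.0 | 0.89 | 2.2 |
| 1IWO | 8 | 0.97 | 2.1 | 0.96 | 2.2 |
| 1IWO | 10 | 0.99 | 2.1 | 0.97 | 2.8 |
| 1IWO | 15 | 1.00 | 2.5 | 0.99 | 3.0 |
| 1IWO | 20 | 1.00 | 2.6 | 0.99 | 3.9 |
| 1OAO.C | 5 | 0.92 | 0.7 | 0.90 | 0.8 |
| 1OAO.C | 8 | 0.97 | 1.1 | 0.96 | 1.0 |
| 1OAO.C | 10 | 0.99 | 1.1 | 0.98 | 1.2 |
| 1OAO.C | 15 | 1.00 | 1.1 | 0.99 | 1.3 |
| 1OAO.C | 20 | 0.95 | 1.1 | 0.95 | 1.4 |
| EFGᶜ | 10.8 | 0.94 | 11.2 | 0.88 | 8.6 |
| RNAPᶜ | 15 | 0.96 | 10.8 | 0.89 | 5.1 |

ᵃ Three-force iteration (k = 1000, k = 10,000 and k = 100,000).
ᵇ Two-force iteration (k = 1000 and k = 10,000).
ᶜ Experimental data; the RMSD is therefore calculated relative to the initial X-ray structure.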

Fig. 3. Simulation results for adenylate kinase. RMSD (open symbols) and correlation coefficient (solid symbols) at T = 500 as a function of MD steps when using three-force iterations: k = 1000 for the first 5000 steps, k = 10,000 for steps 5000–10,000 and k = 100,000 for steps 10,000–15,000. Several resolutions were tested: 5 Å (circles), 8 Å (squares), 10 Å (diamonds), 15 Å (triangles) and 20 Å (inverted triangles).

In some cases, however, the additional step improved the fit for only a few of the resolutions. In particular, for adenylate kinase, additional fitting with k = 100,000 worsens the final model in two cases (see Fig. 3). One noticeable difference between adenylate kinase and the other proteins is its small size: it has 214 residues, versus 691 residues for the next smallest tested protein. The large k = 100,000 force constant applied during the last 5000 steps of the simulation is capable of distorting smaller proteins out of a native structure to force them into the density. Since the correlation coefficient continued to increase while the RMSD failed to decrease, this is a situation in which over-fitting occurred.

Of the proteins tested with our approach (see Table 1), transglutaminase (PDB: 1KV3) is the only case for which the fitting procedure failed using simulated data. The initial RMSD was 28 Å and the final RMSD was 30 Å, which indicates that the refinement did not converge toward a structure in agreement with the target, even though an increase in the c.c. was observed. Transglutaminase has four domains. Fig. 4a and b show a trace of the initial and fitted structures for the two domains of interest. The arrangements of these two domains in the target structure are also shown. These structures were superimposed to minimize the RMSD of the entire system. While the structure appears to fit well inside the simulated cryo-EM map (Fig. 4c), the final RMSD value is high due to an incorrect domain placement. Fig. 4a and b illustrate that domain III moved into the proximal area of domain IV. The final structure is in fact an energy minimum: the c.c. is improved, but atoms occupy regions where they should not be. For this particular case, our approach fails to predict the correct structure.

Methods using a potential map created from the cryo-EM data (Noda et al., 2006; Trabuco et al., 2008) would lead to the same results due to the limitations associated with this potential (Orzechowski and Tama, 2008). It is also reasonable to think that the NMA-based approach would fail in such cases. Methods based on the structural variability of domains within a given family might provide a better avenue for such cases (Velazquez-Muriel et al., 2006); however, enough structural information across the family would be needed for the approach to be successful. It is also important to note that the results in Fig. 4 were obtained from simulations at T = 700. At T = 500, domains III and IV oscillate around their initial positions without any significant conformational change. This seems to indicate that examining the temperature dependence of the fitting results could help to identify models with such potential problems.
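The over-fitting pattern noted above for adenylate kinase (the c.c. keeps rising while the RMSD to the known target gets worse) can be checked mechanically when the target structure is available, which is the case only for simulated maps. The sketch below uses the 1AKE values from Table 5; the function name and the exact criterion (c.c. strictly higher and RMSD strictly higher after the extra k = 100,000 stage) are illustrative assumptions, not the authors' procedure.

```python
# Values copied from Table 5 for adenylate kinase (1AKE), simulated maps.
# Each entry maps resolution (in Angstrom) -> (best c.c., RMSD in Angstrom).
TWO_FORCE = {5: (0.91, 0.9), 8: (0.97, 1.5), 10: (0.98, 1.8),
             15: (1.00, 1.7), 20: (1.00, 1.9)}
THREE_FORCE = {5: (0.93, 0.9), 8: (0.99, 3.9), 10: (0.99, 3.4),
               15: (1.00, 1.9), 20: (1.00, 1.7)}


def overfit_resolutions(before, after):
    """List resolutions where the c.c. rose but the RMSD to the target also rose."""
    flagged = []
    for res in sorted(before):
        cc_before, rmsd_before = before[res]
        cc_after, rmsd_after = after[res]
        if cc_after > cc_before and rmsd_after > rmsd_before:
            flagged.append(res)
    return flagged


if __name__ == "__main__":
    print(overfit_resolutions(TWO_FORCE, THREE_FORCE))
```

With these inputs the check flags the 8 and 10 Å maps. For experimental maps no target RMSD exists, so a check of this kind cannot be applied directly.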

I. Grubisic et al. / Journal of Structural Biology 169 (2010) 95–105

Fig. 4. Transglutaminase is composed of four domains. (a) Domains III and IV of the target PDB structure from which the simulated EM map was created are shown in red and blue. The domains in the initial conformation (pink and cyan) are superimposed by minimizing the RMSD of the whole system (4 domains). (b) Domains III and IV in the final structure (pink and cyan), after the Go-model simulation at T = 700 and 8 Å resolution using three-force iterations, are superimposed on the target structure. (c) The final Cα backbone of the whole transglutaminase (domains I and II are shown in ice-blue) as it fits into the cryo-EM map simulated from the target structure after the Go-model simulation.

3.4. Fitting to experimental data

We have also applied our approach to experimental data: the cryo-EM map of Elongation Factor G (EF-G) bound to the ribosome at 11.8 Å resolution (Valle et al., 2003a,b) and the cryo-EM map of the Escherichia coli RNA polymerase (RNAP) at 15 Å resolution (Darst et al., 2002). EF-G has been studied with NMFF (Tama et al., 2004a,b), i.e. normal mode based refinement, by all-atom molecular dynamics simulations (Orzechowski and Tama, 2008), and by a method based on the structural variability of protein domains within the same family (Velazquez-Muriel et al., 2006). The RNA polymerase has been studied by coarse-grained fitting with Situs and by NMFF as well (Darst et al., 2002; Tama et al., 2004a,b). Since the high-resolution structure corresponding to the cryo-EM map is not available, one can compare our modeled structure to the ones that have been obtained by other approaches.

The structure of an “open” form of the E. coli core RNAP has been determined at 15 Å resolution using single-particle cryo-EM techniques (Darst et al., 2002). Because of the high sequence similarity between the E. coli and Thermus aquaticus (Taq) core RNAP, it is possible to annotate the low-resolution map from the E. coli system with the high-resolution structure of Taq (Campbell et al., 2001; Zhang et al., 1999). Previous studies found that a large conformational change was necessary to accommodate the high-resolution structure into the cryo-EM map (Darst et al., 2002; Tama et al., 2004a,b). The Go-model flexible fitting was performed using the three-force sequence. The RMSD between the model and the initial structure is 5.1 Å with the two-force fitting and 10.8 Å with the three-force fitting, while the correlation coefficient increases to 0.89 and 0.96, respectively. Fig. 5 shows the initial and final structures of RNAP for the two-force iteration. A much better fit to the data is observed; in particular, the jaw is now open, which is in agreement with previous fits using coarse-grained dynamics (Darst et al., 2002; Tama et al., 2004a,b). To determine whether a two- or three-force sequence should be used, the resulting models were compared to models obtained from other fittings. The model obtained from NMA deformed the initial structure by 7.3 Å RMSD, which is more comparable to the deformation obtained from the two-force fitting. The three-force model seems to produce too large a deformation. We have seen with simulated data that a three-force sequence can in some cases lead to over-fitting. Considering that experimental data contain noise, it is preferable to use a two-force sequence for fitting.

Fig. 5. Cα backbone of E. coli RNA polymerase fit into the cryo-EM map. (a) Initial structure (PDB: 1HQM). (b) Model produced by the two-force iteration at T = 500.

We also performed flexible fitting into the isolated EF-G map using the X-ray structure of a mutant factor (Laurberg et al., 2000), PDB entry 1FNM. The Cα RMSD between the starting and the fitted structure is 11.2 Å with the three-force iterations and 8.6 Å with the two-force iterations. Fig. 6a shows the original rigid-body fitting, for which some regions of the density remain unaccounted for. Fig. 6b illustrates our model with the two-force fitting, in which a significant improvement of the fit to the density is observed.
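The two quantities reported throughout this section, the correlation coefficient between maps and the Cα RMSD between structures, can be illustrated with a short sketch. The c.c. is not defined in this part of the text, so the normalized (uncentered) cross-correlation used here is an assumption; the RMSD helper assumes the two coordinate sets have already been superimposed. Function names and the flat-list map representation are hypothetical.

```python
import math


def correlation_coefficient(rho_a, rho_b):
    """Normalized cross-correlation of two density maps given as flat lists.

    Assumed form: sum(a*b) / sqrt(sum(a*a) * sum(b*b)).
    """
    if len(rho_a) != len(rho_b):
        raise ValueError("maps must be sampled on the same grid")
    numerator = sum(a * b for a, b in zip(rho_a, rho_b))
    denominator = math.sqrt(sum(a * a for a in rho_a) * sum(b * b for b in rho_b))
    if denominator == 0.0:
        raise ValueError("at least one map is identically zero")
    return numerator / denominator


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    No superposition is performed; the structures are assumed to be aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    density = [0.0, 1.0, 2.0, 1.0]
    print(correlation_coefficient(density, [2.0 * v for v in density]))
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 0, 3)]))
```

With this assumed definition the c.c. is insensitive to an overall scaling of either map, which is why a model can be compared with an experimental map whose absolute density scale is arbitrary.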
Results from this fitting can be compared to models obtained from other techniques such as NMFF (Tama et al., 2004a,b), all-atom biased MD (Orzechowski and Tama, 2008) and manual rigid-body fitting (Valle et al., 2003a,b) (see Fig. 7). The RMSD obtained with the two-force sequence is more comparable with existing NMA results. The final structure had the lowest RMSD value, 2.5 Å, when compared to the NMA-fitted structure. This observation reemphasizes the point that a two-force sequence would be a better choice for experimental data. We should also note that a visual comparison with the results from Velazquez-Muriel et al. (2006) reveals a similar structural rearrangement. In this study, for EF-G, we observe that different flexible fitting approaches, whether based on normal mode analysis or molecular dynamics simulation, and whether using an all-atom or a coarse-grained protein representation, converge toward the same structural model with similar domain arrangements (see Fig. 7). These flexible fittings reveal large rearrangements between domains II, IV and V. In particular, a large displacement (up to 20 Å) of domain IV, which is correlated with rotations of domains II and V, is observed. In addition, models obtained from computational approaches can be compared to a model that was proposed based on manual docking (PDB: 1PN6) (Valle et al., 2003a,b). Fig. 7 shows the models from the fitting methods superimposed with the manually docked structure, together with a closer view of domain II. The orientation of this domain in the manually docked structure differs considerably from the ones observed in the computational models. The rotation between domain II in the manual docking and in the Go-model fit is 88.7 degrees. Similarly, rotations of 86.9 degrees and 89 degrees are observed with the all-atom MD fitted model and with the normal mode based model, respectively.
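A rotation angle such as those quoted above for domain II can be extracted from a rotation matrix through its trace. How the authors computed these angles is not stated here, so the sketch below is only one plausible route: it assumes a 3×3 rotation matrix R that superimposes the domain in one model onto the same domain in another (obtained elsewhere, for example from a least-squares superposition), and the helper names are hypothetical.

```python
import math


def rotation_about_z(angle_deg):
    """Build a 3x3 rotation matrix for a rotation about the z axis (test helper)."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def rotation_angle_deg(rotation):
    """Rotation angle in degrees, from trace(R) = 1 + 2 cos(theta)."""
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    # Clamp to [-1, 1] to guard against round-off before taking acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    print(rotation_angle_deg(rotation_about_z(88.7)))
```

The trace formula returns an angle between 0 and 180 degrees and does not depend on the direction of the rotation axis, so it suits a single summary number of the kind reported for domain II.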
The case of Elongation Factor G is a prime example that illustrates the need for quantitative computational approaches for flexible fitting. Indeed, computational approaches based on different techniques (molecular dynamics simulation, normal mode analysis and structural variability within the same family), using either all-atom protein representations or coarse-grained models, converge toward structural models that display an overall identical domain arrangement. In comparison, the model based on manual docking leads to a different domain arrangement. The main difference between these results comes from the fact that the domain rearrangements involve correlated motions between domains, which are ignored by simple manual docking. Methods based on the physical properties of biological molecules naturally take those correlated motions into account and provide a more accurate model.

Fig. 6. Cα backbone of Elongation Factor G fit into the cryo-EM map. (a) Initial structure (PDB: 1FNM). (b) Model produced by the two-force iteration at T = 500.

Fig. 7. A comparison of multiple final structures for Elongation Factor G. The Go-model MD structure, as fit to the experimental data using two-force iterations, is colored cyan. The all-atom MD simulation is colored ice blue, the NMA-fitted EF-G is colored silver and the manually docked structure (PDB: 1PN6) is shown in red. The domain of interest is zoomed in on and rotated to show the rotation of the manually docked structure with respect to the others. It is clear that the position of the two alpha helices differs between the manual and computational models. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this paper.)

3.5. Performance comparison to other methods

In terms of computational time, this approach is relatively fast. The time needed to complete a single simulation depends on two factors: the size of the protein, and the resolution and grid size (see Table 6). Adenylate kinase (214 residues) at 5 Å was the fastest simulation, taking 7 min on a single processor. As the resolution decreases, the computational time increases: the 20 Å simulation for the same protein took 23 min. For larger systems the computational time increases (see Table 6), as forces on more atoms are calculated. The proposed Go-model approach is considerably faster than all-atom MD fitting; however, it is still slower than NMFF, which is based on normal mode displacements (see Table 7). While NMFF is faster than the Go-model fitting, it is limited to larger-scale conformational changes, whereas MD simulation can describe smaller-scale rearrangements. In addition, the overall accuracy in terms of fit is better using the Go-model approach even at larger scale. In Table 8, we compare the fitting for simulated data using adenylate kinase. In this example, both transitions (open to closed and closed to open) were modeled using either NMFF or Go-model fitting. It is clear that for the closed-to-open transition NMFF is less accurate than the Go-model approach. This is not surprising, as it has been shown that conformational changes are better described by NMA starting from the open conformation than from the closed conformation. Tan et al. have also noted a similar behavior (Tan et al., 2008). We should also note that even though native contacts between domains are present in the Go-model, this does not preclude the protein from undergoing a closed-to-open transition, owing to the form of its potential. Therefore, in this particular case, starting from a closed conformation, the Go-model is a better approach to fit the data than NMFF.

Table 6
Computational time (min) as a function of the resolution.

| Resolution (Å) | Adenylate kinase (214 residues) | Ca2+-ATPase (994 residues) | RNA polymerase* (2650 residues) |
|---|---|---|---|
| 5 | 7 | 45 | N/A |
| 10 | 9 | 57 | N/A |
| 15 | 9 | 85 | 201 |
| 20 | 23 | 122 | N/A |

* Experimental data.

Table 7
Flexible fitting for adenylate kinase at 8 Å resolution.

| | NMFF | Go-model | All-atom MD |
|---|---|---|---|
| Computational time (min) | 3 | 7 | 156 |

Table 8
NMFF versus Go-model flexible fitting. Final RMSD (Å).

| Transition | NMFF | Go-model |
|---|---|---|
| 1AKE (closed) → 4AKE (open) | 2.9 | 1.5 |
| 4AKE (open) → 1AKE (closed) | 1.2 | 1.3 |

* Initial RMSD = 7.2 Å.

4. Conclusions

We introduced a method for the flexible fitting of high-resolution structures into low-resolution electron density maps from cryo-EM, based on molecular dynamics simulations and a coarse-grained representation (Go-model) of the molecule. The only adjustable parameter is the force constant that controls the strength of the biasing potential. Through testing, we found that a two-step iterative procedure with k = 1000 and k = 10,000 produces good results for most systems, including experimental data. This method can be applied to higher-resolution cryo-EM maps, for which more structural details are available (8 Å and better), but it is also applicable to lower-resolution data. The results on simulated data have shown that it is a robust algorithm, because it works on a multitude of systems with the same parameters and a very short computational time. Results from this method are in agreement with previous works using different approaches. We also show its success in predicting conformational changes from experimental data. The models obtained for the RNA polymerase and EF-G are in agreement with other computational studies. This paper also demonstrates the necessity of employing computational tools to interpret experimental data. Comparisons of several computational models with a manually built model reveal significant differences. Indeed, computational tools in general provide robust ways to flexibly fit proteins while maintaining the overall architectural integrity. Because MD simulation or normal mode analysis is used for fitting, with an all-atom or coarse-grained representation of the protein, only feasible deformations are possible, i.e. correlated motions between distant parts of the biological system are taken into account. Such correlated motions need to be taken into account for accurate flexible fitting.


Acknowledgments

We acknowledge Professors S. Darst and J. Frank for sharing their experimental data with us. Financial support from National Science Foundation Grant No. 0744732 (Molecular and Cellular Biosciences) is greatly appreciated.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jsb.2009.09.010.

References

Best, R.B., Chen, Y.G., Hummer, G., 2005. Slow protein conformational dynamics from multiple experimental structures: the helix/sheet transition of arc repressor. Structure 13, 1755–1763.
Campbell, E.A., Korzheva, N., Mustaev, A., Murakami, K., Nair, S., Goldfarb, A., Darst, S.A., 2001. Structural mechanism for rifampicin inhibition of bacterial RNA polymerase. Cell 104, 901–912.
Caulfield, T.R., Harvey, S.C., 2007. Conformational fitting of atomic models to cryogenic-electron microscopy maps using Maxwell’s demon molecular dynamics. Biophys. J. 368A.
Chacon, P., Wriggers, W., 2002. Multi-resolution contour-based fitting of macromolecular structures. J. Mol. Biol. 317, 375–384.
Chapman, M.S., 1995. Restrained real-space macromolecular atomic refinement using a new resolution-dependent electron-density function. Acta Cryst. A 51, 69–80.
Chen, J.Z., Fürst, J., Chapman, M.S., Grigorieff, N., 2003. Low-resolution structure refinement in electron microscopy. J. Struct. Biol. 144, 144–151.
Clementi, C., Nymeyer, H., Onuchic, J.N., 2000. Topological and energetic factors: what determines the structural details of the transition state ensemble and “en-route” intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 298, 937–953.
Conway, J.F., Wikoff, W.R., Cheng, N., Duda, R.L., Hendrix, R.W., Johnson, J.E., Steven, A.C., 2001. Virus maturation involving large subunit rotations and local refolding. Science 292, 744–748.
Darst, S.A., Opalka, N., Chacon, P., Polyakov, A., Richter, C., Zhang, G.Y., Wriggers, W., 2002. Conformational flexibility of bacterial RNA polymerase. Proc. Natl. Acad. Sci. USA 99, 4296–4301.
Delarue, M., Dumas, P., 2004. On the use of low-frequency normal modes to enforce collective movements in refining macromolecular structural models. Proc. Natl. Acad. Sci. USA 101, 6957–6962.
Fabiola, F., Chapman, M.S., 2005. Fitting of high-resolution structures into electron microscopy reconstruction images. Structure 13, 389–400.
Falke, S., Tama, F., Brooks, C.L., Gogol, E.P., Fisher, M.T., 2005. The 13 angstrom structure of a chaperonin GroEL-protein substrate complex by cryo-electron microscopy. J. Mol. Biol. 348, 219–230.
Frank, J., Agrawal, R.K., 2000. A ratchet-like inter-subunit reorganization of the ribosome during translocation. Nature 406, 318–322.
Gao, H., Frank, J., 2005. Molding atomic structures into intermediate-resolution cryo-EM density maps of ribosomal complexes using real-space refinement. Structure 13, 401–406.
Gao, H., Sengupta, J., Valle, M., Korostelev, A., Eswar, N., Stagg, S.M., Van Roey, P., Agrawal, R.K., Harvey, S.C., Sali, A., Chapman, M.S., Frank, J., 2003. Study of the structural dynamics of the E. coli 70S ribosome using real-space refinement. Cell 113, 789–801.
Gront, D., Kmiecik, S., Kolinski, A., 2007. Backbone building from quadrilaterals: a fast and accurate algorithm for protein backbone reconstruction from alpha carbon coordinates. J. Comput. Chem. 28, 1593–1597.
Hinsen, K., Reuter, N., Navaza, J., Stokes, D.L., Lacapere, J.J., 2005. Normal mode-based fitting of atomic structure into electron density maps: application to sarcoplasmic reticulum Ca-ATPase. Biophys. J. 88, 818–827.
Isralewitz, B., Gao, M., Schulten, K., 2001. Steered molecular dynamics and mechanical functions of proteins. Curr. Opin. Struct. Biol. 11, 224–230.
Jacobs, D.J., Rader, A.J., Kuhn, L.A., Thorpe, M.F., 2001. Protein flexibility predictions using graph theory. Proteins 44, 150–165.
Jacobs, D.J., Thorpe, M.F., 1995. Generic rigidity percolation: the pebble game. Phys. Rev. Lett. 75, 4051–4054.
Jiang, W., Baker, M.L., Ludtke, S.J., Chiu, W., 2001. Bridging the information gap: computational tools for intermediate resolution structure interpretation. J. Mol. Biol. 308, 1033–1044.
Jolley, C.C., Wells, S.A., Fromme, P., Thorpe, M.F., 2008. Fitting low-resolution cryo-EM maps of proteins using constrained geometric simulations. Biophys. J. 94, 1613–1621.
Koga, N., Takada, S., 2006. Folding-based molecular simulations reveal mechanisms of the rotary motor F-1-ATPase. Proc. Natl. Acad. Sci. USA 103, 5367–5372.
Krammer, A., Lu, H., Isralewitz, B., Schulten, K., Vogel, V., 1999. Forced unfolding of the fibronectin type III module reveals a tensile molecular recognition switch. Proc. Natl. Acad. Sci. USA 96, 1351–1356.
Laurberg, M., Kristensen, O., Martemyanov, K., Gudkov, A.T., Nagaev, I., Hughes, D., Liljas, A., 2000. Structure of a mutant EF-G reveals domain III and possibly the fusidic acid binding site. J. Mol. Biol. 303, 593–603.
Lee, K.K., Johnson, J.E., 2003. Complementary approaches to structure determination of icosahedral viruses. Curr. Opin. Struct. Biol. 13, 558–569.
Ma, J., Sigler, P.B., Xu, Z., Karplus, M., 2000. A dynamic model for the allosteric mechanism of GroEL. J. Mol. Biol. 302, 303–313.
Mears, J.A., Sharma, M.R., Gutell, R.R., McCook, A.S., Richardson, P.E., Caulfield, T.R., Agrawal, R.K., Harvey, S.C., 2006. A structural model for the large subunit of the mammalian mitochondrial ribosome. J. Mol. Biol. 358, 193–212.
Mitra, K., Schaffitzel, C., Shaikh, T., Tama, F., Jenni, S., Brooks III, C.L., Ban, N., Frank, J., 2005. Structure of the E. coli protein-conducting channel bound to a translating ribosome. Nature 438, 318–324.
Noda, K., Nakamura, M., Nishida, R., Yoneda, Y., Yamaguchi, Y., Tamura, Y., Nakamura, H., Yasunaga, T., 2006. Atomic model construction of protein complexes from electron micrographs and visualization of their 3D structure using a virtual reality system. J. Plasma Phys. 72, 1037–1040.
Okazaki, K., Koga, N., Takada, S., Onuchic, J.N., Wolynes, P.G., 2006. Multiple-basin energy landscapes for large-amplitude conformational motions of proteins: structure-based molecular dynamics simulations. Proc. Natl. Acad. Sci. USA 103, 11844–11849.
Onuchic, J.N., Wolynes, P.G., 2004. Theory of protein folding. Curr. Opin. Struct. Biol. 14, 70–75.
Orzechowski, M., Tama, F., 2008. Flexible fitting of high-resolution X-ray structures into cryoelectron microscopy maps using biased molecular dynamics simulations. Biophys. J. 95, 5692–5705.
Rawat, U.B.S., Zavialov, A.V., Sengupta, J., Valle, M., Grassucci, R.A., Linde, J., Vestergaard, B., Ehrenberg, M., Frank, J., 2003. A cryo-electron microscopic study of ribosome-bound termination factor RF2. Nature 421, 87–90.
Rossmann, M.G., 2000. Fitting atomic models into electron-microscopy maps. Acta Cryst. D 56, 1341–1349.
Rossmann, M.G., Bernal, R., Pletnev, S.V., 2001. Combining electron microscopic with X-ray crystallographic structures. J. Struct. Biol. 136, 190–200.
Rotkiewicz, P., Skolnick, J., 2008. Fast procedure for reconstruction of full-atom protein models from reduced representations. J. Comput. Chem. 29, 1460–1465.
Saibil, H.R., 2000. Conformational changes studied by cryo-electron microscopy. Nat. Struct. Biol. 7, 711–714.
Sanbonmatsu, K.Y., Joseph, S., Tung, C.S., 2005. Simulating movement of tRNA into the ribosome during decoding. Proc. Natl. Acad. Sci. USA 102, 15854–15859.
Schlitter, J., Engels, M., Krüger, P., 1994. Targeted molecular dynamics: a new approach for searching pathways of conformational transitions. J. Mol. Graph. 12, 84–89.
Schröder, G.F., Brunger, A.T., Levitt, M., 2007. Combining efficient conformational sampling with a deformable elastic network model facilitates structure refinement at low resolution. Structure 15, 1630–1641.
Stahlberg, H., Walz, T., 2008. Molecular electron microscopy: state of the art and current challenges. ACS Chem. Biol. 3, 268–281.
Suhre, K., Navaza, J., Sanejouand, Y.H., 2006. NORMA: a tool for flexible fitting of high-resolution protein structures into low-resolution electron-microscopy-derived density maps. Acta Crystallogr. D Biol. Crystallogr. 62, 1098–1100.
Taketomi, H., Ueda, Y., Go, N., 1975. Studies on protein folding, unfolding and fluctuations by computer-simulation. 1. Effect of specific amino-acid sequence represented by specific inter-unit interactions. Int. J. Peptide Protein Res. 7, 445–459.
Tama, F., Miyashita, O., Brooks III, C.L., 2004a. Flexible multi-scale fitting of atomic structures into low-resolution electron density maps with elastic network normal mode analysis. J. Mol. Biol. 337, 985–999.
Tama, F., Miyashita, O., Brooks III, C.L., 2004b. NMFF: flexible high-resolution annotation of low-resolution experimental data from cryo-EM maps using normal mode analysis. J. Struct. Biol. 147, 315–326.
Tama, F., Ren, G., Brooks III, C.L., Mitra, A.K., 2006. Model of the toxic complex of anthrax: responsive conformational changes in both the lethal factor and the protective antigen heptamer. Protein Sci. 15, 2190–2200.
Tama, F., Sanejouand, Y.H., 2001. Conformational change of proteins arising from normal mode calculations. Protein Eng. 14, 1–6.
Tan, R.K., Devkota, B., Harvey, S.C., 2008. YUP.SCX: coaxing atomic models into medium resolution electron density maps. J. Struct. Biol. 163, 163–174.
Topf, M., Baker, M.L., John, B., Chiu, W., Sali, A., 2005. Structural characterization of components of protein assemblies by comparative modeling and electron cryomicroscopy. J. Struct. Biol. 149, 191–203.
Topf, M., Baker, M.L., Marti-Renom, M.A., Chiu, W., Sali, A., 2006. Refinement of protein structures by iterative comparative modeling and cryoEM density fitting. J. Mol. Biol. 357, 1655–1668.
Topf, M., Lasker, K., Webb, B., Wolfson, H., Chiu, W., Sali, A., 2008. Protein structure fitting and refinement guided by cryo-EM density. Structure 16, 295–307.
Tozzini, V., 2005. Coarse-grained models for proteins. Curr. Opin. Struct. Biol. 15, 144–150.
Tozzini, V., McCammon, J.A., 2005. A coarse grained model for the dynamics of flap opening in HIV-1 protease. Chem. Phys. Lett. 413, 123–128.
Trabuco, L.G., Villa, E., Mitra, K., Frank, J., Schulten, K., 2008. Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics. Structure 16, 673–683.
Ueda, Y., Taketomi, H., Go, N., 1978. Studies on protein folding, unfolding, and fluctuations by computer-simulation. 2. Three-dimensional lattice model of lysozyme. Biopolymers 17, 1531–1548.
Valle, M., Gillet, R., Kaur, S., Henne, A., Ramakrishnan, V., Frank, J., 2003a. Visualizing tmRNA entry into a stalled ribosome. Science 300, 127–130.
Valle, M., Zavialov, A., Sengupta, J., Rawat, U., Ehrenberg, M., Frank, J., 2003b. Locking and unlocking of ribosomal motions. Cell 114, 123–134.
Velazquez-Muriel, J.A., Valle, M., Santamaría-Pang, A., Kakadiaris, I.A., Carazo, J.M., 2006. Flexible fitting in 3D-EM guided by the structural variability of protein superfamilies. Structure 14, 1115–1126.
Volkmann, N., Hanein, D., 1999. Quantitative fitting of atomic models into observed densities derived by electron microscopy. J. Struct. Biol. 125, 176–184.
Volkmann, N., Hanein, D., Ouyang, G., Trybus, K.M., DeRosier, D.J., Lowey, S., 2000. Evidence for cleft closure in actomyosin upon ADP release. Nat. Struct. Biol. 7, 1147–1155.
Wendt, T., Taylor, D., Trybus, K.M., Taylor, K., 2001. Three-dimensional image reconstruction of dephosphorylated smooth muscle heavy meromyosin reveals asymmetry in the interaction between myosin heads and placement of subfragment 2. Proc. Natl. Acad. Sci. USA 98, 4361–4366.
Whitford, P.C., Miyashita, O., Levy, Y., Onuchic, J.N., 2007. Conformational transitions of adenylate kinase: switching by cracking. J. Mol. Biol. 366, 1661–1671.
Wriggers, W., Agrawal, R.K., Drew, D.L., McCammon, A., Frank, J., 2000. Domain motions of EF-G bound to the 70S ribosome: insights from a hand-shaking between multi-resolution structures. Biophys. J. 79, 1670–1678.
Wriggers, W., Birmanns, S., 2001. Using Situs for flexible and rigid-body fitting of multiresolution single-molecule data. J. Struct. Biol. 133, 193–202.
Wriggers, W., Milligan, R.A., McCammon, J.A., 1999. Situs: a package for docking crystal structures into low-resolution maps from electron microscopy. J. Struct. Biol. 125, 185–195.
Zhang, G.Y., Campbell, E.A., Minakhin, L., Richter, C., Severinov, K., Darst, S.A., 1999. Crystal structure of Thermus aquaticus core RNA polymerase at 3.3 angstrom resolution. Cell 98, 811–824.