Detection and separation of heterogeneity in molecular complexes by statistical analysis of their two-dimensional projections


Journal of Structural Biology 162 (2008) 108–120 (available online at www.sciencedirect.com; www.elsevier.com/locate/yjsbi)



Nadav Elad, Daniel K. Clare, Helen R. Saibil, Elena V. Orlova *

School of Crystallography, Birkbeck College and Institute of Structural Molecular Biology, Malet Street, London WC1E 7HX, UK

Received 20 April 2007; received in revised form 8 November 2007; accepted 9 November 2007; available online 22 November 2007

Abstract

Progress in molecular structure determination by cryo electron microscopy and single particle analysis has led to improvements in the achievable resolution. However, in many cases the limiting factor is structural heterogeneity of the sample. To address this problem, we have developed a method based on statistical analysis of the two-dimensional images to detect and sort localised structural variations caused, for example, by variable occupancy of a ligand. Images are sorted by two consecutive stages of multivariate statistical analysis (MSA) to dissect out the two main sources of variation, namely out-of-plane orientation and local structural changes. Heterogeneity caused by local changes is detected by MSA as significant peaks in the higher-order eigenimages. The eigenimages revealing local peaks are used for automated classification. Evaluation of differences between classes allows discrimination of molecular images with and without ligand. This method is very rapid, independent of any initial three-dimensional model, and can detect even minor subpopulations in an image ensemble. A strategy for using this technique was developed on model data sets. Here, we demonstrate the successful application of this method to both model and real EM data on chaperonin–substrate and ribosome–ligand complexes.

© 2007 Elsevier Inc. All rights reserved.

Keywords: Heterogeneity; Electron microscopy; Single particle analysis; Multivariate statistical analysis; Structural variations

* Corresponding author (E.V. Orlova). Fax: +44 (0)20 7631 6803.
1047-8477/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jsb.2007.11.007

1. Introduction

Single particle electron microscopy (EM) has achieved remarkable results in the analysis of macromolecular complexes. Structures of viruses, ribosomes, and smaller molecules are now routinely solved to a resolution of 7–10 Å (Zhou and Chiu, 2003; Orlova and Saibil, 2004; Cheng et al., 2006; Frank, 2006; Ranson et al., 2006). By using single particle EM, isolated macromolecular complexes can be captured in their native state and analysed without crystallisation. Therefore, this approach can be applied to samples containing macromolecules in multiple conformations, which would be intractable for crystallography. This advantage of the technique, however, imposes a limitation on the achievable resolution, since information from different molecular states (a heterogeneous ensemble) is combined in the same reconstruction. Finding a way to separate images of molecules in different states has therefore become increasingly important for EM structural analysis.

Heterogeneity in a biological sample can arise from several sources: (1) partial occupancy of a ligand in a molecular complex (Halic et al., 2004), (2) structural dynamics, reflected either in a few distinct reaction states or in a gradual transformation through several intermediate states (Saibil, 2000; Heymann et al., 2003; Valle et al., 2003b; Klaholz et al., 2004), and (3) different oligomeric states that result in changes of symmetry and/or size (Tilley et al., 2005; White et al., 2006). Ideally, distinct conformations should be trapped biochemically before imaging in the electron microscope (Stark et al., 2000; Valle et al., 2003a), but this is not always experimentally possible. In these cases, image separation has to be done computationally. Several approaches have been suggested to solve this problem; they can be divided into three major categories according to the procedures used to identify and sort molecular heterogeneity.

In the first category of methods, recognition of heterogeneity and initial sorting are done in 2D only, prior to any 3D reconstruction. This 'a priori' group of methods is based primarily on multivariate statistical analysis of features in the 2D images to detect variations between images. These variations reflect changes in the molecule and are revealed in the eigenimages produced by this analysis (van Heel and Frank, 1981; van Heel, 1984; Frank, 1990). The separation procedure sorts images according to their major variations, which are reflected in the low-order eigenimages (White et al., 2004; Falke et al., 2005). White et al. (2004) described a method that separates particle images according to size variation, with subsequent 3D analysis. The technique was shown to work with overall size variations as small as 5% (White et al., 2006).

In the second category of methods, an initial 3D map is required in order to separate the images into subsets in which the molecular complex is presented in similar orientations. Analysis of heterogeneity is then done in 2D for each image subset. This minimises orientation variation within classes and, as a result, facilitates recognition of conformational variations. The approach was applied both to the reconstruction of heterogeneous ribosome complexes (Klaholz et al., 2004; Fu et al., 2007) and to icosahedral viruses with symmetry mismatches or partial occupancy (Briggs et al., 2005).

The third category of methods is based on a posteriori analysis of 3D reconstructions, which considers a population of as many 3D reconstructions as possible in order to examine variations between the maps. A technique suggested by Ohi and colleagues (Ohi et al., 2004) is based on random conical tilt. First, pairs of tilted and untilted images of the particles are recorded. The untilted images are classified, and several 3D maps are reconstructed using the corresponding tilted images.
Maps can then be analysed for differences, and representative ones are used as initial models for refinement. In the so-called 'bootstrap' technique (Penczek et al., 2006a, b), several hundred 3D maps are calculated from randomly selected subsets of images whose spatial orientations were determined by projection matching to a starting 3D map. Evaluating the variability of the resulting 3D maps and localising areas of high variance allows the level of heterogeneity to be assessed, and estimating the covariance in the population enables classification of the 3D maps. Once the region of major variation is localised in the 3D maps and in the corresponding 2D projections, images are sorted into subgroups according to the average pixel density in the high-variance region (Penczek et al., 2006a, b). A related approach is maximum-likelihood-based classification of 3D maps, which identifies conformational variability within the maps and then separates the different molecular states (Scheres et al., 2007).

The aim of the work described here is to detect and analyse heterogeneity, and to separate mixed populations with conformational changes triggered by ligand binding, using 2D image analysis prior to 3D reconstruction. The approach is based on multivariate statistical analysis (MSA) and comprises two successive classifications. The first is based on eigenimages showing global variance of structural features due to different orientations, and the second is based on eigenimages showing variance due to changes induced by ligand binding. These steps do not require determination of angular orientations, so the technique is not biased towards any 3D model. We demonstrate the feasibility of the approach in successful tests on heterogeneous model data sets and on real EM data of GroEL–GroES–ADP–rhodanese and ribosome–EF-G complexes.

2. Results

Analysis of heterogeneity using multivariate statistical analysis is based on the information presented in eigenimages, or eigenvectors. Multivariate statistical analysis, as implemented in IMAGIC-5, is a version of principal component analysis that reduces the many parameters (variables) describing the data to a small number of components capturing the main variances in the data set (Borland and van Heel, 1990; van Heel et al., 2000). The direction of largest variance becomes the first coordinate, the direction of second-greatest variance becomes the second, and so on. The eigenvectors with the largest eigenvalues correspond to the most significant characteristic features of the data set (low-order components) (Jolliffe, 2002). Typically, the eigenvectors are presented in decreasing order of significance (eigenvalue), so that the lower-order eigenimages correspond to large-scale image variations, and fine variations in the images are described by higher-order eigenvectors (eigenimages). To test our approach, we generated model data sets and then applied the strategy developed on them to real EM data.

2.1. Statistical analysis of the model data

Projection images were generated from a model structure based on the atomic coordinates of the 800 kDa GroEL oligomer, and from a second model structure in which GroEL contains an additional 33 kDa substrate protein in the binding sites of one ring (Fig. 1a and b, see Methods). In GroEL–substrate complexes the substrate is unfolded or partially folded; for modelling purposes, the native structure of malate dehydrogenase (MDH) was used. Gaussian noise with a 2:1 signal-to-noise ratio (SNR) was added to the images (Fig. 1). The images were arranged in three data sets: 100% pure GroEL, 100% ligand-bound, and an equal mixture (named here 'evenly mixed'). After alignment, the images in the three data sets were classified, with 15 images per class, on the basis of 40 eigenimages. Multi-reference alignment (MRA) was subsequently performed against three representative classes: top, side, and tilted views (with the side and tilted view references rotated to a vertical position). Alignment of the evenly mixed set was performed independently using either ligand-bound or unbound class averages as references (Figs. 2 and 3). These alignments produced differences in the orientations of side and tilted views. In the evenly mixed set aligned to unbound references (Fig. 3a), images were aligned with the ligand density located in either the upper or the lower ring, because of the dihedral symmetry of the main molecule. In the mixed set aligned to the ligand-bound references (Fig. 3b), images were aligned mostly with the ligand density in the upper ring, owing to the asymmetry of the references. This difference in alignment influenced the classification of the two sets, as described below. The four aligned data sets were subjected to statistical analysis and their eigenimages were compared (Fig. 3).

Fig. 1. Surface representations of apo GroEL (Braig et al., 1994) (gold) and of the apo GroEL complex with a native MDH subunit (Gleason et al., 1994) (blue) manually docked inside one end cavity. (a) Cut-away view of apo GroEL with the MDH subunit in the cavity and (b) end view of apo GroEL with the MDH subunit. Representative projections of the GroEL (c) and GroEL–MDH (d) models, randomly rotated and with added noise.

Fig. 2. Flowchart for preparation and initial alignment of the model data sets. White boxes represent 3D models, light grey boxes represent processing of data, and dark grey boxes represent data sets and references used for alignment.
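The MSA step described at the start of Section 2 can be illustrated in simplified form as a principal component analysis of a stack of aligned images. The sketch below is not the IMAGIC-5 implementation. It is a plain PCA computed with NumPy's singular value decomposition, and the function name `eigenimages` is hypothetical. It returns the eigenvalues in decreasing order, the corresponding eigenimages, and the coordinates of each image in eigenimage space. The later classification steps operate on those coordinates.

```python
import numpy as np


def eigenimages(stack, n_components):
    """Plain PCA of an aligned image stack (a stand-in for MSA).

    stack: array of shape (n_images, ny, nx); images are assumed aligned.
    Returns (eigenvalues, eigenimages, coords), ordered by decreasing eigenvalue.
    """
    n, ny, nx = stack.shape
    X = stack.reshape(n, ny * nx).astype(float)
    X = X - X.mean(axis=0)  # remove the global average image
    # Thin SVD: rows of Vt are orthonormal eigenvectors of the pixel covariance
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, len(S))
    eigvals = S[:k] ** 2 / (n - 1)  # variance captured by each component
    coords = U[:, :k] * S[:k]  # coordinates of each image on the eigenimages
    return eigvals, Vt[:k].reshape(k, ny, nx), coords
```

`np.linalg.svd` returns singular values in decreasing order, so index 0 here is the lowest-order (largest-variance) eigenimage. This assumes the paper numbers eigenimages from 1, so index 0 corresponds to its eigenimage 1.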


Fig. 3. MSA of aligned images. (a) Representative unbound aligned images. (b) Representative ligand-bound aligned images. (c–f) Eigenimages of unbound data set (c), ligand-bound data set (d), evenly mixed data set aligned to ligand-bound references (e), and evenly mixed data set aligned to unbound references (f). (g–j) Ratio between standard deviations of two subregions within eigenimages. Subregions were boxed in all eigenimages as shown in the inset in (g). The highest ratios are highlighted in white.
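The ratio plotted in Fig. 3g–j compares the standard deviation of grey values in a box around a putative ligand peak (σP) with the standard deviation in a reference ("background") box (σB) of the same eigenimage. Below is a minimal sketch of that measurement. It assumes the eigenimages are NumPy arrays and that the two boxes are supplied as hypothetical (row0, row1, col0, col1) half-open pixel ranges chosen by hand, as with the inset in Fig. 3g. The function names are illustrative only.

```python
import numpy as np


def sigma_ratio(eigenimage, peak_box, background_box):
    """Return sigma_P / sigma_B for one eigenimage.

    Each box is a hypothetical (row0, row1, col0, col1) tuple of half-open
    pixel ranges: peak_box around the putative ligand peak, background_box
    over a reference region of the same eigenimage.
    """
    r0, r1, c0, c1 = peak_box
    b0, b1, d0, d1 = background_box
    sigma_p = np.std(eigenimage[r0:r1, c0:c1])
    sigma_b = np.std(eigenimage[b0:b1, d0:d1])
    return sigma_p / sigma_b


def rank_eigenimages(eigs, peak_box, background_box):
    """Ratios for a stack of eigenimages, plus their order (highest ratio first)."""
    ratios = np.array([sigma_ratio(e, peak_box, background_box) for e in eigs])
    return np.argsort(ratios)[::-1], ratios
```

Eigenimages whose ratio stands out from the rest are the candidates for the subsequent classification. The text reports values of about 1.5–2 for these eigenimages.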

The first 10–12 eigenimages mainly show density variations distributed over the area covered by the molecular projection. The first 4–5 eigenimages (Fig. 3c–f) appear similar for all the sets, indicating that these variations are due to different molecular orientations. However, some higher-order eigenimages, such as eigenimage 7 in Fig. 3e and eigenimage 5 in Fig. 3f, show prominent positive or negative peaks (bright or dark spots). The overall pattern of these eigenimages points to two sources of variation. One is due to different orientations. The other is related to the presence or absence of ligand, reflecting the fact that only half of the molecules carry ligand. To evaluate systematically the significance of local peaks relative to other features in the eigenimages, the standard deviation was calculated within a square around the peak (σP) and in a background region (σB) (boxed areas in Fig. 3g inset). The ratio σP/σB of the two eigenimage subregions is plotted in Fig. 3g–j. The plots show a fairly uniform distribution of variations in both the unbound and the ligand-bound sets (Fig. 3c, d, g, and h). The eigenimages of the evenly mixed data sets show variations related to ligand occupancy (Fig. 3e, f, i, and j). This can be seen as a strong local maximum in eigenimage 7 of the evenly mixed set aligned to ligand-bound references (Fig. 3e and i), with a σP/σB ratio close to 1.5. Classification into two groups based on this eigenimage separated the images with about 80% homogeneity in each group (not shown). In the mixed data set aligned to the unbound references, eigenimages 5, 9, and 12 appear to show localised peaks in their distribution of grey values, which indicate variation in ligand binding (Fig. 3f and j). However, since images of ligand-bound molecules could be aligned equally well in an upside-down orientation, the separation was less definitive.

2.2. Analysis of heterogeneous populations by 'double MSA'

2.2.1. Evenly mixed population

Our preliminary analysis demonstrated that the data could be separated by MSA, using selected eigenimages to distinguish global density deviations due to different orientations from local ones caused by ligand binding. However, the experiment presented above showed that this separation of a mixed data set into homogeneous subsets was ambiguous. The variations due to rotation in space (σB) and to ligand binding (σP) are independent, so their variances add. The total variation σt can therefore be considered as σt² = σP² + σB². To resolve the problem of mixed structures, we aimed to emphasise the small variations related to ligand binding (σP) and to reduce the variations caused by differences in orientation (σB). To minimise the variance caused by different molecular orientations (σB), we first classified the images into a small number of groups (20–30) using the first 6–7 eigenimages, which reflect variable orientations. Each group thus contains images with similar orientations. The second MSA reveals the remaining variance within these groups, which can be attributed to ligand binding to specific regions in the images. Because object orientations are restricted to a small angular range, the local variance from ligand binding prevails over density variations arising from differences in orientation. The procedure is shown as a flowchart in Fig. 4.

We tested the method on several data sets, including the evenly mixed data set aligned to unbound references, which had not separated satisfactorily in the previous experiment. Images were initially classified into 24 classes (115 images per class) that differed mainly in orientation. Within each class, the projections were restricted to β ± 10° (out-of-plane tilt of the molecular axis) and γ ± 35° (rotation around the molecular axis) (Fig. 5a). A second round of MSA was then performed on the image set within each class, and the eigenimages were analysed (Fig. 5b). In most classes, prominent positive or negative peaks (bright or dark spots) were seen in the third or fourth eigenimages. These indicate an enhancement of the components related to the minor variations produced by the presence or absence of ligand. Local peaks were investigated systematically, as in the previous section, by analysing the σP/σB ratio using the same subregions (Fig. 5c). The distribution of variations seen in the second eigenimage of each set (Fig. 5b and c) is more uniform, but is still influenced by some variation in orientation within the classes. The appearance of localised peaks with σP/σB in the range 1.5–2 in the third and fourth eigenimages shows that the significance of these variations has increased (Fig. 5c). The images contained in each of the 24 classes were extracted and subjected to further classification into 2–6 subclasses based on the eigenimages with strong local variations, where the σP/σB ratio was highest. Separation into 2 subclasses was insufficient to resolve clear differences. Conversely, a large number of subclasses reduced the number of images per class and therefore the significance of ligand features. Separation was therefore carried out into 4 subclasses, each containing 30 images (Fig. 5d). In some cases, visual inspection alone was enough to decide whether a subclass belonged to the 'full' (containing ligand) or the 'empty' (not containing ligand) group.
However, calculating difference maps between each subclass and the corresponding class average provided a more objective evaluation of the separation and indicated the presence or absence of ligand. This was done by subtracting each orientation class average from its 4 corresponding subclass averages (Fig. 5e). The difference images reveal a maximum (bright spot) if most of the images in a subclass contain ligand, or a minimum (dark spot) if most are empty. The difference maps for the class in Fig. 5e, row 1 are an example of a successful separation, in which the difference images show prominent positive or negative peaks at the expected ligand position. The class corresponding to the view along the 7-fold axis (Fig. 5e, row 3) shows some heterogeneity of in-plane rotational alignment. Two of its difference maps show variance in the background, indicating mixed populations. Images were extracted from the subclasses whose difference maps showed clear maxima or minima and assigned to either the 'full' or the 'empty' group. The final statistics of image separation are given in Table 1. In total, 85% (2358) of the images were assigned to the full or empty groups. The remaining 15% were left unassigned because their difference maps did not permit an unambiguous assignment. Of the assigned images, at least 95% were identified correctly in both groups, as confirmed by tracking the images throughout the procedure. Successful separation was also achieved for the evenly mixed data set aligned to ligand-bound references (Table 1). Successful separation into two homogeneous groups allowed the reconstruction of two reliable 3D maps, with and without ligand. Fig. 5f and g show the 3D model reconstructed from images aligned to ligand-bound class averages, made by angular reconstitution with no external reference and using C7 symmetry. It contains ligand density in one ring, whereas the 3D model made from images categorised as 'empty' shows the GroEL features with no extra density in the binding cavity.

Fig. 4. Flowchart of double MSA.

Fig. 5. Results of double MSA of evenly mixed images aligned to unbound references. (a) Representative class averages of different orientations. (b and c) Eigenimage analysis of images within orientation classes (6 representative classes are shown) and plots of the ratio between standard deviations of two subregions within eigenimages (the same subregions were used as in the Fig. 3g inset). Eigenimages with a high ratio between subregions are highlighted in (b) and their corresponding columns are highlighted in white in (c). These eigenimages were used for classification into subclasses. (d and e) Corresponding subclass averages and difference images between the orientation classes and each of their subclasses (subclass averages and difference maps of the same class are in a single row). (f and g) 3D reconstructions of the extracted groups. Shown is an overlay of the 'empty' map (orange) with a difference map between the 'full' and the 'empty' maps (blue). (f) Cut-away view of the empty map showing the difference density. (g) Top view of the overlay. The mass of the difference is 80% of the mass of the MDH subunit.

2.2.2. Populations with different proportions of ligand occupancy

We checked the performance of the double MSA method in separating heterogeneous data sets with different proportions of unbound and ligand-bound images. These model data sets contained either 75% unbound and 25% ligand-bound images ('mostly unbound') or 25% unbound and 75% ligand-bound images ('mostly ligand-bound'). The results of image separation are given in Table 1. As with the evenly mixed data sets, the majority of the images in both data sets were assigned correctly to either the 'full' or the 'empty' group. Identifying and separating out the minority groups was more difficult. Accordingly, the empty group separated from the mostly ligand-bound data set was 80% homogeneous, compared with 96.5% homogeneity of the full group segregated from the same data set.

Table 1
Results of double MSA for the four mixed data sets analysed

Data set | Group | Proportion of images in data set | Total in data set | Assigned to group | Correctly identified within group (%)
Evenly mixed, aligned to unbound | Full | 50% | 1390 | 1279 | 95.2
Evenly mixed, aligned to unbound | Empty | 50% | 1390 | 1079 | 96.7
Evenly mixed, aligned to ligand-bound | Full | 50% | 1390 | 1078 | 89.1
Evenly mixed, aligned to ligand-bound | Empty | 50% | 1390 | 978 | 89.5
Mostly ligand-bound | Full | 75% | 2085 | 1785 | 96.5
Mostly ligand-bound | Empty | 25% | 695 | 653 | 80.2
Mostly unbound | Full | 25% | 695 | 651 | 90.0
Mostly unbound | Empty | 75% | 2085 | 1621 | 98.6

2.3. Application to real EM data

2.3.1. Analysis of GroEL–GroES–ADP complexes with the GroEL substrate rhodanese

The method was applied to a data set of 3060 side views of negatively stained GroEL–GroES–ADP complexes with the GroEL substrate rhodanese, a 37 kDa enzyme (Fig. 6a). We aimed for high occupancy of rhodanese in the GroEL complex by adding denatured rhodanese in molar excess over the GroEL oligomer. Even so, its actual occupancy in the GroES-bound (cis) ring or the opposite (trans) ring was unknown. Using the double MSA method, we tested whether heterogeneity in binding of non-native rhodanese could be detected and whether the images could be separated according to rhodanese occupancy. Statistical analysis of the data set and analysis of local variations in the eigenimages indicate variations in the orientation of the complex (Fig. 6b, mainly eigenimages 2, 3, and 4) and signs of heterogeneity in the trans ring (Fig. 6b and c, eigenimage 5). Since the data set contained only side views, statistical analysis mainly revealed different orientations of the complex around the 7-fold axis. It also gave a strong indication of heterogeneity in the trans ring related to rhodanese occupancy (Fig. 6b, eigenimage 5). Separation of images based on eigenimage 5 might produce homogeneous groups, but this eigenimage still shows signs of orientation variation, so we applied the double MSA approach. Images were first classified into 12 classes (250 images per class) based on the first 11 eigenimages. Eigenimage 5 was excluded from this classification in order to minimise bias from ligand occupancy at this stage.
Each class was subjected to a second round of MSA and further classified into 3 subclasses based on eigenimages showing local variations in the trans ring (Fig. 6d–f). The parent class averages were subtracted from their respective subclass averages (Fig. 6g). Next, subclasses were selected according to the appearance of significant positive or negative difference peaks in the trans ring cavity, and the images were separated into two groups. 1035 images were extracted from the subclasses that showed positive maxima (rhodanese present), and 1093 images from the subclasses that showed strong minima (rhodanese absent). The remaining 932 images belonged to subclasses whose difference maps did not show unambiguous minima or maxima, and they were therefore omitted from the analysis. The sorted images were used to obtain two independent 3D maps by angular reconstitution with C7 symmetry. The 'empty' map and the difference between the 'full' and 'empty' maps are shown in Fig. 6h–j. Since rhodanese binds asymmetrically to a GroEL ring, 7-fold averaging spreads the rhodanese density into a ring. Nonetheless, the map shows extra density lining the interior of the trans ring at the expected position of the substrate binding sites.

Fig. 6. Double MSA of GroEL–GroES–rhodanese complexes. (a) Representative raw images. (b and c) First 6 eigenimages of the data set and the ratio between standard deviations of two subregions within eigenimages. Eigenimage 5, showing an exceptionally high ratio, is highlighted in (b) and shown in white in (c). (d and e) Eigenimage analysis within orientation classes (eigenimages of 3 representative classes are shown) and standard deviation ratios. Eigenimages used for classification into subclasses are highlighted. (f and g) Corresponding subclass averages and difference images with the overall class average (subclass averages and difference maps of the same class are in a single row). (h) Side view of the 3D map reconstructed from the 'empty' group. (i and j) Overlay of the 'empty' map (beige) with a difference map between the 'full' and the 'empty' maps (blue). The density in blue corresponds to the extra density in the full map. (i) Cut-open view. (j) View from the bottom.
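The double MSA workflow walked through above has four stages. First, orientation classes are formed from low-order eigenimages. Second, a second MSA is run within each class. Third, each class is sub-classified on the eigenimages with local peaks. Fourth, subclasses are assigned from the sign of the subclass-minus-class difference at the ligand site. The sketch below illustrates this sequence under stated assumptions; it is not the authors' IMAGIC-5 procedure. Plain k-means with k-means++ seeding stands in for their classification algorithm. A hypothetical decision rule replaces the inspection of difference maps described in the text: the mean difference inside a hand-chosen `ligand_box` is compared with a hypothetical `threshold`. All function names are hypothetical.

```python
import numpy as np


def pca_coords(X, k):
    """Project rows of X (n_images, n_pixels) onto the first k principal axes."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, len(S))
    return U[:, :k] * S[:k]


def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Lloyd's k-means with k-means++ seeding (stand-in classifier)."""
    rng = np.random.default_rng(seed)
    centres = [Z[rng.integers(len(Z))]]
    for _ in range(1, n_clusters):
        # Squared distance of every point to its nearest chosen centre
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centres], axis=0)
        p = d2 / d2.sum() if d2.sum() > 0 else None
        centres.append(Z[rng.choice(len(Z), p=p)])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centres[c] = Z[labels == c].mean(axis=0)
    return labels


def double_msa(stack, n_classes, n_orient_eigs, sub_eigs, n_sub,
               ligand_box, threshold):
    """Label each image +1 ('full'), -1 ('empty') or 0 (unassigned)."""
    n, ny, nx = stack.shape
    X = stack.reshape(n, ny * nx).astype(float)
    r0, r1, c0, c1 = ligand_box
    # Stage 1: orientation classes from the low-order eigenimages only
    classes = kmeans(pca_coords(X, n_orient_eigs), n_classes)
    calls = np.zeros(n, dtype=int)
    for c in range(n_classes):
        idx = np.flatnonzero(classes == c)
        if len(idx) < max(max(sub_eigs) + 1, n_sub):
            continue  # too few images for a within-class MSA
        class_avg = X[idx].mean(axis=0)
        # Stage 2: MSA within the class; sub-classify on the chosen eigenimages
        sub = kmeans(pca_coords(X[idx], max(sub_eigs) + 1)[:, sub_eigs], n_sub)
        for s in range(n_sub):
            members = idx[sub == s]
            if len(members) == 0:
                continue
            # Difference image: subclass average minus parent class average
            diff = (X[members].mean(axis=0) - class_avg).reshape(ny, nx)
            score = diff[r0:r1, c0:c1].mean()
            if score > threshold:
                calls[members] = 1   # bright spot at the ligand site
            elif score < -threshold:
                calls[members] = -1  # dark spot at the ligand site
    return calls
```

Subclasses whose difference at the ligand site is close to zero keep the label 0. This mirrors the images the authors omitted because their difference maps were ambiguous. Excluding a specific eigenimage from the first stage, as was done with eigenimage 5 for rhodanese, would require selecting coordinate columns explicitly rather than taking the first `n_orient_eigs`.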


2.3.2. Analysis of 70S ribosome with elongation factor G (EF-G)

The double MSA method was also tested on real cryo-EM images of 70S Escherichia coli ribosomes complexed with the ligand EF-G. The EM data were kindly provided by J. Frank and H. Gao (Gao et al., 2003). Under these reaction conditions, the population of complexes contained a mixture of molecules in bound and unbound states. To separate the molecules in different states, we used a procedure similar to that used in the GroEL analysis. Images were CTF-corrected, high-pass filtered to remove uneven background, and normalised. After centring, the set of 10,000 images was subjected to statistical analysis and classified into 100 classes. Ten characteristic views that revealed clear features reflecting different orientations of the ribosome were selected and used for multi-reference alignment. After rotational and translational alignment, the images were grouped into 30 classes (350 images per class). Of these, the 23 with the best contrast were selected for the following sub-classification. Representative classes and their eigenimages are shown in Fig. 7a and b. Analysis of local variations in the eigenimages indicates variations due to heterogeneity in ligand binding. The first 3 eigenimages are mainly related to variations in orientation. Because of the different views, not all groups of images showed strong local variation in one specific eigenimage. For example, class 1 (Fig. 7a) shows local variations in eigenimages 5 and 6 (Fig. 7c), whereas class 2 (Fig. 7a) shows local variations in eigenimages 4 and 5. For this reason, each (parent) class was separated into five subclasses on the basis of the same group of eigenimages, 4–8 inclusive (Fig. 7b and c). Next, the parent class averages were subtracted from their respective subclass averages. Separation of the ribosome images was based on two parameters: the highest σ in the difference map and the average difference density (Fig. 7c and d). In Fig. 7d, σ peaks from classes with a negative average difference are shown in black and those with a positive difference in white. 1651 images were extracted from the subclasses that showed positive average difference density (EF-G present), and 1532 images from the subclasses that showed negative difference density (EF-G absent). The remaining images belonged to subclasses that did not show clear positive or negative differences in their difference maps, and they were therefore not used for this step of the analysis. The sorted images were used to obtain initial 3D maps (A, with EF-G, and B, without EF-G) by angular reconstitution without imposing symmetry. Reprojections from these 3D maps were used for competitive alignment of the complete data set, followed by classification. According to this alignment, 4161 images were assigned to the ribosome without elongation factor and 5839 images to the EF-G-bound state. Reconstructions are shown in Fig. 8. The upper panel shows several sections of the two structures and their differences (Fig. 8a). The white arrow points to the main region of variation in these sections. The difference maps clearly reveal the position of EF-G as additional density in the cleft between the subunits. Therefore, structure A corresponds to the ribosome with bound EF-G and structure B to the ribosome without EF-G (Fig. 8b). The density of EF-G (red arrow) appears as the major difference density, although other differences are also present (Fig. 8c). These are related to the movement of the 30S subunit caused by binding of EF-G (Fig. 8; the shifts are shown by black arrows), which has been described as a ratchet movement (Gao et al., 2003). The difference map reveals the transfer of the tRNA from the E-site to the P-site (Fig. 8d). Comparison of the two maps reveals global conformational changes in the ribosome upon binding of EF-G. Our results are in excellent agreement with results published by Valle et al. (2002).

Fig. 7. Double MSA of 70S ribosome/EF-G complex. (a) Representative classes after the first classification. (b) First 8 eigenimages from MSA within the classes. Regions of high σ are shown in white dashed circles. (c) Subclass averages resulting from classification on the basis of eigenimages 4–8. (d) The ratio between standard deviations of difference images between subclasses and parent class averages. The area used for assessment of standard deviations is shown by the white circle on the bottom left image in (c). Black peaks indicate a negative average difference, and white ones indicate a positive difference.

3. Discussion

The use of two successive MSA steps makes it possible to detect and separate mixed conformations.
The first MSA step aims at sorting images by orientation, reducing variations related to different spatial orientations of the object. The second step, statistical analysis of images within classes of similar molecular orientation, reveals vari-

c Fig. 8. Reconstructions of 70S ribosome/EF-G complex. (a) Several sections of the 3D maps and their differences. 70S structure A showed extra density in the cleft between the large and small subunits, whereas structure B did not show this density. White arrows point to the area of major differences in these sections. (b) In structure A, the small subunit is shown in blue, and the large subunit is in light blue. In structure B the small subunit is shown in orange and the large subunit in gold. (c) The difference B–A (in dark blue) was superimposed on the A map. The difference map between the A and B structures (in red) was superimposed onto structure B. Red arrows point to the EF-G location. Black arrows point to the differences related to the shift of the small subunit. (d) Cutaway views of the 70S ribosome with and without EF-G. Red arrows point to the position of the E-site tRNA, that is filled in the presence of the elongation factor and is empty in its absence. The blue arrows point to the P-site of tRNA which is occupied in the absence of EF-G. Labels: h, head; b, body; sp, spur; st, stalk base.

117

ations due to local structural changes. The structural change due to ligand occupancy is reflected in the variance of each class, observed in its eigenimages. These variations prevailed after reducing angular differences between the images, which is indicated by the higher ranking (larger

118

N. Elad et al. / Journal of Structural Biology 162 (2008) 108–120

eigenvalues) of eigenimages reflecting these features. The images can then be sorted into groups for separate 3D reconstruction on the basis of eigenimages exhibiting localised peaks, followed by competitive projection matching to refine the separation and the 3D models. The number of classes produced in the first classification should reflect the range of characteristic views of the molecule, which will depend on the symmetry and angular distribution of the molecules. An asymmetric particle with an even angular distribution would have a large number of orientations, but the number of classes is limited by the size of the data set. The aim is to distinguish characteristic views with enough of images per class to enable reliable statistical analysis of density variations (usually 100). In the analysis presented here we have modelled three heterogeneous data sets (Table 1). Our approach of a priori analysis of images by double MSA proved to be successful on the model data. It worked well for the even mixture of two states, where we identified subgroups of images with better than 90% accuracy, leading to reliable reconstructions. The separation was successful regardless of the references used for alignment. For the other two populations where the ratio of molecules in the different states was 3:1 or 1:3, a higher level of homogeneity was achieved for the major component. However, even for the minor subpopulation this analysis gave at least 80% homogeneity, which was adequate for a reliable reconstruction of the minor sub-group. These reconstructions provided good quality models for further refinement. Analysis of real data by double MSA revealed two states of the GroEL–GroES– ADP–rhodanese complex that differ in occupancy. One state shows extra density in the cavity of the trans ring which can be attributed to the non-native rhodanese. 
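The second MSA step, analysis of images within one orientation class, can be illustrated with a minimal sketch. It assumes a stack of images from a single class that are already aligned to each other, and it uses plain principal component analysis computed by singular value decomposition (a Euclidean metric), not the modulation-distance MSA of IMAGIC used in this work. The function names and the sorting rule `split_by_eigenimage`, which divides a class by the sign of each image's coordinate along one chosen eigenimage, are hypothetical simplifications of the classification step. Eigenimages are returned in order of decreasing eigenvalue, so a ligand-related feature that persists after orientation sorting would be expected among the top-ranked ones.

```python
import numpy as np


def class_eigenimages(images, n_components=5):
    """Second-stage MSA sketch: PCA of one orientation class.

    images: array (n_images, ny, nx) of images already aligned to each other.
    Returns (eigenimages, eigenvalues, coords). Eigenimages are ranked by
    decreasing eigenvalue; coords[i, k] is the coordinate of image i along
    eigenimage k. Plain Euclidean PCA, not IMAGIC's modulation-distance MSA.
    """
    n, ny, nx = images.shape
    data = images.reshape(n, ny * nx).astype(float)  # copy, input untouched
    data -= data.mean(axis=0)  # subtract the class average
    # Thin SVD of the centred data matrix: rows of vt are unit eigenimages,
    # and the singular values are returned in decreasing order.
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    eigenvalues = s ** 2 / (n - 1)  # variance captured by each eigenimage
    k = min(n_components, s.size)
    eigenimages = vt[:k].reshape(k, ny, nx)
    coords = u[:, :k] * s[:k]  # projection of each centred image
    return eigenimages, eigenvalues[:k], coords


def split_by_eigenimage(coords, index):
    """Hypothetical sorting rule: True/False by the sign of each image's
    coordinate along one chosen eigenimage (e.g. one with a local peak)."""
    return coords[:, index] >= 0
```

For a synthetic class in which half of the images carry an extra blob of density, the blob would typically dominate the first eigenimage and the sign split would recover the two groups. With real data, the eigenimage used for sorting would be chosen because it shows a localised peak rather than by rank alone.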
The principles developed here have been successfully applied to cryo-EM data on GroEL complexes with MDH, in which five different structural states of the non-native ligand were resolved from a data set of 35,000 images (Elad et al., 2007). The technique has also been applied successfully to an asymmetric assembly, the 70S ribosome from E. coli. In this case, large changes in the density distribution were observed in some orientations but not in others. Difference densities between the subclass averages and their parent class averages indicated the presence or absence of the ligand. Separation of the real ribosome data set allowed us to obtain structures of the ribosome with and without EF-G. Moreover, even without extensive refinement we were able to observe a clear difference density corresponding unambiguously to EF-G, changes in the position of the 30S subunit with respect to the 50S subunit, and relocation of the tRNA from the E-site to the P-site upon EF-G binding. Altogether, this indicates that the technique can be used effectively for both symmetric and asymmetric particles.

The method described here provides a framework for the analysis of EM images, in which data sets can be tested for heterogeneity arising from local changes in a macromolecule and from any accompanying global conformational changes. The images can then be separated on the basis of the differences identified. The method is fast, computationally inexpensive and applicable to large data sets. In addition, it avoids model bias because it is applied before 3D reconstruction. Double MSA reduces the influence of variance caused by different molecular orientations and enhances variations induced by small ligands. The procedure can be automated by searching for peaks in the eigenimages and evaluating their significance relative to the background variance. Eigenimages with significant variations can then be presented to the researcher for expert analysis or fed directly into a computational scheme without human intervention. This work focuses on heterogeneity due to partial occupancy of a small ligand, in this case 3.5–4% of the total molecular mass, but the method can potentially reveal even smaller local variations, as long as the changes are statistically significant. It can be applied to structures of unknown conformation with any symmetry. Therefore, any set of projections that contains a range of views of a macromolecular complex can be tested in this way for local variance and, if necessary, classified into more homogeneous subsets.

4. Experimental procedures

4.1. Preparation of model data sets

The atomic coordinates used in this study to generate the model data sets (Fig. 1) are those of GroEL (Braig et al., 1994; PDB accession code 1GRL) and mitochondrial malate dehydrogenase (MDH; Gleason et al., 1994; PDB accession code 1MLD). The GroEL crystal structure is a tetradecamer with D7 symmetry in the unbound state. To model the ligand-bound state, one MDH subunit was manually positioned inside the GroEL cavity using PyMOL (www.pymol.org). Atomic coordinates of the complex were converted to electron density maps using IMAGIC (van Heel et al., 1996).
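The automated search for significant local peaks in the eigenimages, carried out in this work on boxes taken from the upper and lower parts of each eigenimage (50 × 50 pixels for the model data, 30 × 30 pixels for the negative stain data), could be sketched as follows. This is an illustration under stated assumptions, not the procedure actually implemented: the function names, box coordinates, mean-square statistic and cut-off `threshold=3.0` are all hypothetical choices. Each box is scored by the mean-square deviation of its pixels from the background mean, divided by the variance of the background (all pixels outside the boxes), so the score does not depend on the arbitrary scale or sign of an eigenimage.

```python
import numpy as np


def box_mask(shape, boxes):
    """True inside any square box given as (row0, col0, size)."""
    mask = np.zeros(shape, dtype=bool)
    for r0, c0, size in boxes:
        mask[r0:r0 + size, c0:c0 + size] = True
    return mask


def local_variance_ratios(eigenimage, boxes):
    """For each box: mean-square deviation of its pixels from the background
    mean, divided by the background variance (pixels outside all boxes).
    Invariant to the arbitrary scale and sign of the eigenimage."""
    background = eigenimage[~box_mask(eigenimage.shape, boxes)]
    bg_mean = background.mean()
    bg_var = background.var()
    ratios = []
    for r0, c0, size in boxes:
        patch = eigenimage[r0:r0 + size, c0:c0 + size]
        ratios.append(np.mean((patch - bg_mean) ** 2) / bg_var)
    return np.array(ratios)


def flag_eigenimages(eigenimages, boxes, threshold=3.0):
    """Indices of eigenimages whose strongest box exceeds a (hypothetical)
    significance cut-off relative to the background."""
    return [i for i, e in enumerate(eigenimages)
            if local_variance_ratios(e, boxes).max() > threshold]
```

The boxes would be placed where ligand density is expected, as with the upper and lower boxes used here. Flagged eigenimages could then be passed to the sorting step or shown to the user for inspection.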
Densities in the maps were truncated to the range of ±2 standard deviations to avoid extreme values. Projections of these maps were calculated along directions evenly distributed over the unit sphere with an angular increment of 2.5°, yielding 5000 images, from which 2781 images were randomly selected. Each image contained 256 × 256 pixels at a sampling of 1.4 Å/pixel. Gaussian noise was then added to the projections. The intensity of the noise was adjusted to give a signal-to-noise ratio (SNR) of 2:1, determined as the ratio of the spectral powers of the noise-free projection and of the noise. Noisy projections were rotated randomly in-plane to form the two initial homogeneous data sets for analysis. These two data sets were subsequently mixed in different proportions to create the heterogeneous data sets. Initial alignment was done by 'alignment by classification' (Dube et al., 1993). In the statistical analysis we used modulation distances between images (see van Heel et al., 2000 for details). Evaluation of local deviations in the eigenimages was done by extracting boxes of 50 × 50 pixels from the upper and lower parts of the eigenimages (the same coordinates were used for all model images; see inset in Fig. 3g). Image processing was done with the IMAGIC (van Heel et al., 1996) and SPIDER (Frank et al., 1996) software packages, both running under Linux, but equivalent procedures can be implemented with other single particle processing software.

4.2. Data collection and processing of negative stain images

Images of negatively stained GroEL–GroES–ADP–rhodanese complexes were collected on a 200 kV FEG microscope (FEI Tecnai F20) at a magnification of 50,000. The complex adopts either side or end orientations on the carbon film, which can easily be distinguished. Micrographs were digitised at a sampling of 2.8 Å/pixel, and side view images were selected interactively and extracted into boxes of 150 × 150 pixels. Images were then band-pass filtered between 10 and 200 Å and normalised. Initial centring was done against a single projection of the GroEL–GroES–(ADP)7 crystal structure (Xu et al., 1997) filtered to 40 Å. This was followed by MSA and classification to produce class averages that served as references for a second round of alignment. The aligned data set was used for the double MSA procedure. For evaluation of local deviations in the eigenimages, boxes of 30 × 30 pixels were extracted from the upper and lower regions of the eigenimages.

4.3. Ribosome data

The data set was provided by J. Frank and H. Gao. The conditions under which the images of E. coli 70S ribosomes were recorded are given in the supplementary material of the paper by Scheres et al. (2007). We were given a set of 10,000 images, which were thought to consist of 50% ribosomes bound to EF-G and 50% unbound ribosomes. Images of ribosomes were in 130 × 130 pixel frames at 2.82 Å/pixel. Prior to CTF correction, all images were normalised and padded into boxes of 160 × 160 pixels. After phase flipping, the images were band-pass filtered between 10 and 250 Å and the box size was reduced to 140 × 140 pixels. Centring was done against a rotationally averaged total sum of all images. This was followed by MSA and classification to produce class averages that served as references for a second round of alignment. The aligned data set was used for the double MSA procedure. The differences between images were assessed within a circle whose radius corresponds to the outer radius of the rotationally averaged sum of all the images.

Acknowledgments

We thank A. Horwich and G. Farr for providing GroEL, GroES, and rhodanese samples, J. Frank and H.
Gao for the images of the ribosome complex, and D. Houldershaw and R. Westlake for computer support. This work was funded by the European Union 3D-EM Network of Excellence and by the Wellcome Trust.

References

Borland, L., van Heel, M., 1990. Classification of image data in conjugate representation spaces. J. Opt. Soc. Am. A 7, 601–610.
Braig, K., Otwinowski, Z., Hegde, R., Boisvert, D.C., Joachimiak, A., Horwich, A.L., Sigler, P.B.J., 1994. The crystal structure of the bacterial chaperonin GroEL at 2.8 Å. Nature 371, 578–586.
Briggs, J.A.G., Huiskonen, J.T., Fernando, K.V., Gilbert, R.J.C., Scotti, P., Butcher, S.J., Fuller, S.D., 2005. Classification and three-dimensional reconstruction of unevenly distributed or symmetry mismatched features of icosahedral particles. J. Struct. Biol. 150, 332–339.
Cheng, Y., Wolf, E., Larvie, M., Zak, O., Aisen, P., Grigorieff, N., Harrison, S.C., Walz, T., 2006. Single particle reconstructions of the transferrin–transferrin receptor complex obtained with different specimen preparation techniques. J. Mol. Biol. 355, 1048–1065.
Dube, P., Tavares, P., Lurz, R., van Heel, M., 1993. Bacteriophage SPP1 portal protein: a DNA pump with 13-fold symmetry. EMBO J. 15, 1303–1309.
Elad, N., Farr, G.W., Clare, D.K., Orlova, E.V., Horwich, A.L., Saibil, H.R., 2007. Topologies of a substrate protein bound to the chaperonin GroEL. Mol. Cell 26, 415–426.
Falke, S., Tama, F., Brooks 3rd, C.L., Gogol, E.P., Fisher, M.T., 2005. The 13 angstrom structure of a chaperonin GroEL-protein substrate complex by cryo-electron microscopy. J. Mol. Biol. 348, 219–230.
Frank, J., 1990. Classification of macromolecular assemblies studied as 'single particles'. Q. Rev. Biophys. 23, 281–329.
Frank, J., 2006. Three-Dimensional Electron Microscopy of Macromolecular Assemblies. Oxford University Press, New York.
Frank, J., Radermacher, M., Penczek, P., Zhu, J., Li, Y., Ladjadj, M., Leith, A., 1996. SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 116, 190–199.
Fu, J., Gao, H., Frank, J., 2007. Unsupervised classification of single particles by cluster tracking in multi-dimensional space. J. Struct. Biol. 157, 226–239.
Gao, H., Sengupta, J., Valle, M., Korostelev, A., Eswar, N., Stagg, S.M., Van Roey, P., Agrawal, R.K., Harvey, S.C., Sali, A., Chapman, M.S., Frank, J., 2003. Study of the structural dynamics of the E. coli 70S ribosome using real-space refinement. Cell 113, 789–801.
Gleason, W.B., Fu, Z., Birktoft, J., Banaszak, L., 1994. Refined crystal structure of mitochondrial malate dehydrogenase from porcine heart and the consensus structure for dicarboxylic acid oxidoreductases. Biochemistry 33, 2078–2088.
Halic, M., Becker, T., Pool, M.R., Spahn, C.M., Grassucci, R.A., Frank, J., Beckmann, R., 2004. Structure of the signal recognition particle interacting with the elongation-arrested ribosome. Nature 427, 808–814.
Heymann, J.B., Cheng, N., Newcomb, W.W., Trus, B.L., Brown, J.C., Steven, A.C., 2003. Dynamics of herpes simplex virus capsid maturation visualized by time-lapse cryo-electron microscopy. Nat. Struct. Biol. 10, 334–341.
Jolliffe, I.T., 2002. Principal Component Analysis. Springer, New York.
Klaholz, B.P., Myasnikov, A.G., Van Heel, M., 2004. Visualization of release factor 3 on the ribosome during termination of protein synthesis. Nature 427, 862–865.
Ohi, M., Li, Y., Cheng, Y., Walz, T., 2004. Negative staining and image classification – powerful tools in modern electron microscopy. Biol. Proced. Online 6, 23–34.
Orlova, E.V., Saibil, H.R., 2004. Structure determination of macromolecular assemblies by single-particle analysis of cryo-electron micrographs. Curr. Opin. Struct. Biol. 14, 584–590.
Penczek, P.A., Frank, J., Spahn, C.M.T., 2006a. A method of focused classification, based on the bootstrap 3D variance analysis, and its application to EF-G-dependent translocation. J. Struct. Biol. 154, 184–194.
Penczek, P.A., Yang, C., Frank, J., Spahn, C.M.T., 2006b. Estimation of variance in single-particle reconstruction using the bootstrap technique. J. Struct. Biol. 154, 168–183.
Ranson, N.A., Clare, D.K., Farr, G.W., Houldershaw, D., Horwich, A.L., Saibil, H.R., 2006. Allosteric signaling of ATP hydrolysis in GroEL-GroES complexes. Nat. Struct. Mol. Biol. 13, 147–152.
Saibil, H.R., 2000. Conformational changes studied by cryo-electron microscopy. Nat. Struct. Biol. 7, 711–714.
Scheres, S.H., Gao, H., Valle, M., Herman, G.T., Eggermont, P.P., Frank, J., Carazo, J.M., 2007. Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization. Nat. Methods 4, 27–29.
Stark, H., Rodnina, M.V., Wieden, H.-J., van Heel, M., Wintermeyer, W., 2000. Large-scale movement of elongation factor G and extensive conformational change of the ribosome during translocation. Cell 100, 301–309.
Tilley, S.J., Orlova, E.V., Gilbert, R.J.C., Andrew, P.W., Saibil, H.R., 2005. Structural basis of pore formation by the bacterial toxin pneumolysin. Cell 121, 247–256.
Valle, M., Gillet, R., Kaur, S., Henne, A., Ramakrishnan, V., Frank, J., 2003a. Visualizing tmRNA entry into a stalled ribosome. Science 300, 127–130.
Valle, M., Sengupta, J., Swami, N.K., Grassucci, R.A., Burkhardt, N., Nierhaus, K.H., Agrawal, R.K., Frank, J., 2002. Cryo-EM reveals an active role for aminoacyl-tRNA in the accommodation process. EMBO J. 21, 3557–3567.
Valle, M., Zavialov, A., Sengupta, J., Rawat, U., Ehrenberg, M., Frank, J., 2003b. Locking and unlocking of ribosomal motions. Cell 114, 123–134.
van Heel, M., Frank, J., 1981. Use of multivariate statistics in analysing the images of biological macromolecules. Ultramicroscopy 6, 187–194.
van Heel, M., 1984. Multivariate statistical classification of noisy images (randomly oriented biological macromolecules). Ultramicroscopy 13, 165–183.
van Heel, M., Harauz, G., Orlova, E.V., Schmidt, R., Schatz, M., 1996. A new generation of the IMAGIC image processing system. J. Struct. Biol. 116, 17–24.
van Heel, M., Gowen, B., Matadeen, R., Orlova, E.V., Finn, R., Pape, T., Cohen, D., Stark, H., Schmidt, R., Schatz, M., Patwardhan, A., 2000. Single-particle electron cryo-microscopy: towards atomic resolution. Q. Rev. Biophys. 33, 307–369.
White, H.E., Orlova, E.V., Chen, S., Wang, L., Ignatiou, A., Gowen, B., Stromer, T., Franzmann, T.M., Haslbeck, M., Buchner, J., Saibil, H.R., 2006. Multiple distinct assemblies reveal conformational flexibility in the small heat shock protein Hsp26. Structure 14, 1197–1204.
White, H.E., Saibil, H.R., Ignatiou, A., Orlova, E.V., 2004. Recognition and separation of single particles with size variation by statistical analysis of their images. J. Mol. Biol. 336, 453–460.
Xu, Z., Horwich, A.L., Sigler, P.B., 1997. The crystal structure of the asymmetric GroEL-GroES-(ADP)7 chaperonin complex. Nature 388, 741–750.
Zhou, Z.H., Chiu, W., 2003. Determination of icosahedral virus structures by electron cryomicroscopy at subnanometer resolution. Adv. Protein Chem. 64, 93–124.