CHAPTER 25
Localization of Protein Complexes by Pattern Recognition

Christoph Best, Stephan Nickell, and Wolfgang Baumeister
Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
I. Introduction
II. Template Matching
  A. Template Libraries for Visual Proteomics
  B. Principles of Machine Learning and Pattern Recognition
  C. Templates
  D. The Cross-Correlation Function
  E. Normalization of the Correlation Function
  F. Local Correlation Functions
  G. Fourier-Space Representation
  H. Probabilistic Interpretation
  I. Practical Implementation
III. The Missing-Wedge Problem
  A. Tomographic Imaging
  B. Restricted Cross-Correlation Coefficients
IV. Applications
V. Conclusions
References
As cryo-electron tomography of whole cells or cell sections approaches molecular resolution, it becomes feasible to locate protein complexes in cells by their distinctive three-dimensional (3D) structures alone, without the need for markers. This opens the way to creating a molecular map of the cellular proteome and its interactions under near-life conditions. However, the process requires sophisticated computational methods to create large libraries of molecular templates and to search for them in cellular tomograms, taking into account the low signal-to-noise ratio and limited resolution of cryo-electron tomography (cryo-ET), as well as the problem of the missing wedge.

METHODS IN CELL BIOLOGY, VOL. 79
Copyright 2007, Elsevier Inc. All rights reserved.
0091-679X/07 $35.00 DOI: 10.1016/S0091-679X(06)79025-2
I. Introduction

Cryo-electron tomography (cryo-ET) is unique among biological imaging methods in being able to produce three-dimensional (3D) representations of organelles and cells in near-life conditions with a resolution high enough to identify and locate individual macromolecules. This opens the door to the creation of molecular maps of cellular proteomes (Baumeister, 2004; Nickell et al., 2006).

It is now widely recognized that cells are not just membrane-bound reaction compartments in which macromolecules diffuse freely and interact randomly. Although there is a strong element of stochasticity, macromolecules must form functional modules, at least transiently, to execute cellular processes. Thus, a molecular map of a cell would provide significant insights into the functional associations between macromolecular complexes that make up the operating modules of the cellular "factory."

Drawing such a molecular map must rely on the ability to localize protein complexes systematically and objectively. One could imagine introducing labels that are specific for each macromolecule of interest, but it is a significant challenge to get markers that are visible by electron microscopy (EM) into cells without perturbing them. We are therefore pursuing an alternative approach. Cryo-ET of cells combines an optimal preservation of the sample (see Chapter 1 by Dubochet, this volume) with 3D imaging at molecular resolution (<5 nm), and it provides faithful representations of the density landscapes inside cells. However, interpretation of the tomograms is hampered by the low signal-to-noise ratio, which necessitates the use of powerful statistical pattern recognition techniques to map objects of interest in the images. Template matching, in which the content of the sample volume is compared to templates representing the molecules under consideration, has shown great promise for achieving this goal.
II. Template Matching

A. Template Libraries for Visual Proteomics

The database of high-resolution protein structures has grown to an impressive size in recent years, and structural genomics initiatives have accelerated the pace at which new structures are added to it. We may anticipate that within the next decade this database will eventually become complete, that is, it will contain structures representing most of the proteomic inventory of cells. On the other hand, our knowledge of how, when, and where these proteins interact to perform their functions is limited (for review see Aloy and Russell, 2006; Sali et al., 2003). Genome-scale interaction networks that have been more or less successfully generated by techniques such as the yeast two-hybrid system mainly reveal possible physical interactions between two proteins; less is known about whether these interactions actually take place and have a functional role in living cells. Characterization of such interactions is a challenge for classical biochemical techniques, which rely on isolating molecules from within the cell
and characterizing them in isolation. EM, on the other hand, has the potential to provide a whole-cell picture: a high-resolution cryo-electron tomogram is essentially a picture of the cell's entire proteome, and it should, in principle, allow us to map macromolecular interaction patterns in a fairly comprehensive manner ("visual proteomics," Nickell et al., 2006; see Lucic et al., 2005, for review).

The biggest challenge in achieving this goal, besides the sheer size of the data sets and the high level of noise within them, is that EM cannot take advantage of the labeling techniques that allow easy discrimination between different macromolecules. In light microscopy, fluorescence labeling has provided a powerful route to molecular identification and localization, thanks in particular to the construction and expression of chimeras with fluorescent proteins. EM currently lacks effective clonable labels, though efforts have been undertaken to label specific proteins with gold nanoparticles that are clearly visible in the images. Unfortunately, these techniques are limited, both in their effectiveness for cellular work and in their usefulness for proteomic applications. Not only is it difficult to insert such labels into a molecularly crowded cell without running the risk of introducing artifacts, but nanoparticle labeling is also usually restricted to one or a few molecular species at a time.

Computational identification of macromolecules in cellular tomograms based on template matching is an alternative approach that aims at identifying macromolecules solely by their inherent structural features as seen in their 3D density distribution. Central to this approach is the availability of 3D structures to be used as templates. Ideally, a template library would contain structures from the entire proteomic inventory of the cellular system under scrutiny.
If high-resolution structures from x-ray crystallography or NMR spectroscopy are not available, as is often the case for membrane proteins or large and labile complexes, medium-resolution (1–2 nm) structures can also be used as templates. EM single-particle analysis (Frank, 1996) and hybrid methods that combine high-resolution structures of components (subunits or domains) with the lower-resolution structures of large complexes (Baumeister and Steven, 2000) will undoubtedly contribute to the template libraries in a major way. Currently a slow and specialized craft, EM single-particle analysis is expected to become a high-throughput method in the near future, thanks to automated data acquisition and analysis.

Once the challenge of creating a comprehensive template library is met, it can be used to create a proteomic map of a cell through the comparison of the densities observed in a tomogram to the templates in the library. In practice, several technical problems must be overcome for such an approach to be successful. First, the technique is based on the proposition that different macromolecules appear sufficiently different in electron tomograms that they can be reliably distinguished from each other. It also requires tomograms of cells that have a sufficiently high resolution and acceptable signal-to-noise ratios to allow meaningful comparisons with the template library to be made. Since there are limits as to how far these issues can be pushed with current and near-future electron microscopes, improvements in the computational processing are needed that allow one to identify
template patterns even in the presence of relatively high noise and at borderline resolutions. Luckily, similar problems of pattern recognition in the presence of high noise have been encountered in a variety of fields, from computer vision through remote sensing to bioinformatics. Progress in these fields has led to the development of advanced pattern recognition algorithms that are based on statistical learning techniques.

Another problem is the limitation in specimen thickness, which precludes tomographic imaging of eukaryotic cells in their entirety. Here cryosectioning techniques that physically cut a frozen-hydrated (vitrified) cell into slices of about 400-nm thickness show considerable promise (see Chapter 15 by Dubochet et al., this volume). Tomograms of these slices could then in principle be computationally reassembled into a complete image of the original cell.

Finally, imaging and analyzing single cells is not enough to make general statements about the proteome; sufficient statistics on the molecular distributions across many cells will have to be accumulated. However, advances in automated EM will facilitate the acquisition of large numbers of images and tomograms in the near future. Taken together, these techniques have the potential to make cryo-EM a valuable proteomics tool with the unique capability of providing insight into the actual molecular interaction network in living cells.
B. Principles of Machine Learning and Pattern Recognition

In the theory of machine learning (Duda et al., 2001; Hastie et al., 2003), the term "pattern recognition" loosely refers to techniques for identifying repetitive features in a set of objects and for designing machines (i.e., computer programs) that are capable of recognizing these features. One distinguishes here between the problem of unsupervised learning, in which the feature is not known a priori and has to be identified by the program, and supervised learning, in which the feature is known and can be given to the program as an example. This general theory is agnostic about what constitutes a feature, and it can be applied equally well to computer vision and to mining credit card data.

When trying to identify macromolecules in ET, we can generally assume that we know the shape of the molecule and can therefore produce a "template." The problem of pattern recognition then reduces to finding occurrences of this template in the sample volume, while accounting for noise and imaging artifacts. Machine learning theory offers two alternative approaches to this problem. In the probabilistic approach, physical knowledge of the imaging process is used to produce a noise model of the image, which can then be used to calculate the probability that a subvolume arises from the template under the influence of the noise. While this is the classical approach used in statistics, it depends on a good characterization of the noise, and it is often only feasible for specialized noise distributions such as the Gaussian or the Poisson distribution.
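To make the probabilistic approach concrete, the following sketch (in Python with NumPy; the function name and parameters are illustrative, not from the original text) assumes independent Gaussian noise of known width on every voxel. Under that assumption, the log-probability that a subvolume arises from the template is, up to additive constants, the negative mean-square deviation between the two:

```python
import numpy as np

def log_likelihood(subvolume, template, sigma):
    """Log-probability (up to additive constants) that `subvolume` is a
    copy of `template` corrupted by i.i.d. Gaussian noise of width sigma.
    Maximizing this over candidate positions is the same as minimizing
    the mean-square deviation between subvolume and template."""
    return -np.sum((subvolume - template) ** 2) / (2.0 * sigma ** 2)
```

Maximizing this quantity over positions is therefore equivalent to minimizing the mean-square deviation, which is exactly what the correlation machinery developed in the following sections computes efficiently.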
A more pragmatic alternative is the pure machine learning approach, in which one seeks a mathematical function, the discriminant, that allows empirical discrimination between the occurrence and nonoccurrence of the template. This approach makes use of two training sets of images, one representing the object that is sought and another representing observations of noise or other objects. A function is then devised pragmatically that has a large value on the positive set and a small value on the negative set; thus it serves to discriminate between the two sets. This approach has been shown to be very effective for many pattern recognition problems.

An alternative to template matching is often used in segmentation problems. Here the focus is not on the detection of the overall shape of a known object, but on the recognition of typical features associated with macromolecules, like their surface curvatures and sizes. Methods such as anisotropic diffusion, mean-curvature motion (MCM; Böhm et al., 2000), and scaling indices (Ortiz et al., 2006), which are ordinarily used for segmentation, can be tuned to detect subvolumes that contain macromolecular complexes. This approach is often useful for the initial identification of target volumes. It can in principle be combined with unsupervised learning methods (clustering) to identify macromolecular complexes without having prior knowledge of their structure.

C. Templates

A template is an idealized image of the macromolecular complex under consideration, scaled to the resolution of the target volume and appropriately filtered to account for the physical imaging process. It is usually taken from an atomic-scale model of the complex, derived either by x-ray or electron crystallography, by single-particle EM, or from any combination of these methods.
To construct the idealized image of the complex, as it would appear in an electron microscope, the electrostatic potential is calculated approximately from an atomic map by summing the total atomic numbers Z in each volume element. At the resolution afforded by current EM, this approximation is reasonable. The resulting density is then convolved with the appropriate contrast transfer function of the microscope and low-pass filtered. The resulting template is a 3D volume V_templ, typically with a linear extension on the order of 10² volume elements (voxels).

To identify occurrences of the template in a larger sample volume V, with linear extensions around 10³ voxels, the template is compared to all possible subvolumes having the size of the template. Since the rotational orientation of the particles is also unknown, all possible orientations of the template have to be considered. The comparison itself has to take into account that the absolute magnitude (the scale) and the zero offset (the bias) of the voxel values are often unknown, as they depend on the details of the image acquisition process and the postprocessing. Note that when we refer to "scale" here, it is in reference to the magnitude of the individual voxel values, not the 3D size of the template.
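The template construction described above can be sketched as follows, assuming a voxelized density built by summing atomic numbers Z and a simple spherical low-pass filter standing in for the full contrast-transfer-function convolution; the function name and parameters are illustrative, not from the original text:

```python
import numpy as np

def make_template(coords, atomic_numbers, box=32, voxel_size=1.0, cutoff=0.2):
    """Idealized template density: sum the atomic numbers Z into voxels,
    then low-pass filter in Fourier space. A sharp spherical low-pass
    is used here as a simplification of the CTF convolution."""
    vol = np.zeros((box, box, box))
    idx = np.floor(np.asarray(coords) / voxel_size).astype(int)
    for (i, j, k), z in zip(idx, atomic_numbers):
        if 0 <= i < box and 0 <= j < box and 0 <= k < box:
            vol[i, j, k] += z
    # keep only spatial frequencies below `cutoff` (in cycles per voxel)
    freq = np.fft.fftfreq(box)
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    keep = (kx ** 2 + ky ** 2 + kz ** 2) <= cutoff ** 2
    return np.real(np.fft.ifftn(np.fft.fftn(vol) * keep))
```

Since the zero-frequency (DC) component passes the filter unchanged, the total density of the model is preserved by the low-pass step.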
Mathematically, this comparison can be expressed by the approximate equality:

    f(x) ≈ a (T_y R_{φ,θ,ψ} c)(x) + b,   for x ∈ V_y        (1)
Here c(x) is the template function, R_{φ,θ,ψ} is the rotation operator with Euler angles φ, θ, and ψ, T_y indicates a translation of the template to the position y, and a and b are the scale and bias of the voxel values as mentioned above. Equation (1) expresses that a translated and rotated copy of the template, after approximate rescaling of the voxel values, should be (approximately) identical to the subvolume of the sample volume under consideration (Fig. 1). This is the basic operation in template matching. In the following we will, in the guise of different correlation functions, lay out the mathematical procedure to efficiently test for this approximate equality in large sample volumes.

The comparison is performed at all elements of the subvolume V_y, which has the size and shape of the template and is located at the reference point y. The parameters y, φ, θ, ψ, a, and b are initially unknown. The correct values for some of them, in particular the scale a and bias b, can be intelligently estimated from the characteristics of the sample volume or, as in the case of the offset y, calculated efficiently by special transformations. However, at least the rotation parameters φ, θ, and ψ are usually determined by complete enumeration, that is, by considering all independent combinations of the Euler angles, typically in increments of 5° or 10°, looking for the orientation that provides the best match. This is often followed by a refinement step in which only angles immediately around the coarsely determined maximum are scanned. Even so, on the order of 10⁴ different rotations of the template must be compared to each target volume.

The idea of template matching using the correlation function (Frank, 1972) occurred initially in EM in relation to the problem of picking particles (Frank and Wagenknecht, 1984; Huang and Penczek, 2004) for single-particle analysis (see Frank, 2002 for a review on single-particle methods, and Zhu et al., 2004 for a review of current particle-picking methods).
Its application to 3D objects was introduced by Walz et al. (1997), as a tool to align manually selected particles for averaging, and by Volkmann and Hanein (1999) and Roseman (2000), for the problem of docking high-resolution crystallographic structures into low-resolution EM density maps. Böhm et al. (2000) first made use of a combination of segmentation and cross-correlation techniques to localize macromolecules in tomograms.

D. The Cross-Correlation Function

The most popular discriminant function for identifying the occurrence of templates in subvolumes is the cross-correlation function. It is defined as the voxel-wise sum of the products of the translated template and the subvolume:

    CCF[f, c](y) = Σ_{x∈V_y} f(x) c(x − y)        (2)
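Evaluated at a single offset y, Eq. (2) is just a voxel-wise multiply-and-sum over the template-sized subvolume. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def ccf(volume, template, y):
    """Eq. (2): unnormalized cross-correlation at a single offset y,
    summing f(x) * c(x - y) over the template-sized subvolume V_y.
    No wrap-around: y must keep the subvolume inside the sample volume."""
    s = template.shape
    sub = volume[y[0]:y[0] + s[0], y[1]:y[1] + s[1], y[2]:y[2] + s[2]]
    return float(np.sum(sub * template))
```

If a copy of the template is planted at offset y, the function returns the template's total power Σ c(x)² at that offset, which is its maximum possible value there.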
Fig. 1 Three-dimensional (3D) template matching between a sample volume (gray) and a template (green). The correlation function is determined by the amount of overlap between both (red). In the template matching process, the template is translated and rotated until maximum overlap is achieved.
It is a function of the offset y of the template in the volume. Its significance is most easily explained in terms of the concept of mean-square deviation, which is central to the theory of normal distributions. The deviation between sample volume f and template c can be expressed as the difference image:

    f − T_y c        (3)

where the operator T_y signifies that the template c has been translated to the point y, mathematically expressed as:

    (T_y c)(x) = c(x − y)        (4)
The closer this difference image is to zero, the less f and c differ from each other. A natural measure of this deviation is the square norm of the difference image, that is, the sum over its squared voxel values:

    |f − T_y c|² = Σ_{x∈V_y} [f(x) − c(x − y)]²        (5)
This sum is taken over the subvolume V_y, which has the size and shape of the template volume and is translated to the reference point y. In signal processing, the sum over squared values is often referred to as the total power in an image. Since the difference image may have both positive and negative voxel values, it is important that the voxel values are squared, as this makes all contributions to the sum nonnegative. The norm of the difference image is therefore zero if and only if both images agree completely. It is thus a good measure of the total deviation between the two images. In Section II.H, we will discuss the character of this measure in more detail. Expanding the square on the right-hand side leads to the expression:

    |f − T_y c|² = Σ_{x∈V_y} f(x)² + Σ_{x∈V_templ} c(x)² − 2 Σ_{x∈V_y} f(x) c(x − y)        (6)

In the second term, we made use of the identity:

    Σ_{x∈V_y} c(x − y)² = Σ_{x∈V_templ} c(x)²        (7)
as V_y is just the template volume V_templ translated to the point y. The last term in Eq. (6), the cross-term in the sense that it contains both the template and the subvolume, is the cross-correlation function. The two other terms contain only either the subvolume or the template. In signal-processing terminology, they give the total power of the signal in the subvolume V_y and in the template, and thus set the overall scale of the comparison. As the cross-correlation function enters Eq. (6) with a negative sign, the higher the value of the cross-correlation function, the lower the mean-square deviation.

In general, we evaluate Eq. (6) at all (or a large number of) points y. If the volume V_templ (and correspondingly V_y) is fairly large, the first two terms change only slowly. Since they are sums of squares, and thus each voxel contributes positively, shifting the volume V_y by a small amount only changes a small number of contributing voxels in the sum (namely those at the edges of the volume), and most of the time this will only result in a small change in the result. This approach rests, of course, on certain assumptions about the statistical nature of the voxel value distributions, but these assumptions hold in most cases. The cross-term, that is, the cross-correlation function, however, receives both positive and negative contributions, which may cancel or reinforce each other. Small changes in the contributing voxels can then result in large changes in the cross-correlation function. Therefore, most of the variation of Eq. (6) comes
from the cross-term, so it is generally justified to consider exclusively the cross-correlation function as an indicator of template similarity. This holds especially when only local maxima of the correlation function are sought, and its absolute value is disregarded.

From a pure machine learning point of view, the cross-correlation function is one of the simplest possible discriminant functions, namely a linear combination of the voxel values f(x) with weights given by the template function. In signal processing, such a function is called a finite-response filter operating on the voxel data and producing a detection signal, and techniques from digital filter design can be applied to improve the filter beyond the cross-correlation function.

E. Normalization of the Correlation Function

The mean-square deviation, Eq. (5), does not account for a possible scale and bias difference between the voxel values in the template and the sample volume. One way to deal with such a difference is to normalize both functions to the same scale before calculating the cross-correlation function. We first discuss this concept in the general setting of comparing two functions f(x) and g(x) and will show its application to template matching in the form of the local correlation function in the next section.

In a first step, a bias, that is, an additive constant or a constant background, can be removed from a function f(x) by subtracting the respective average from both functions, with the average defined in general by:

    f̄ = (1/N) Σ_x f(x)        (8)

(N gives the number of voxels over which the function f is defined). Then an overall scale, that is, a multiplicative constant, can be removed by dividing both functions by their square norm, defined as:

    |f| = sqrt( (1/N) Σ_x [f(x) − f̄]² )        (9)

Note that this quantity, up to the factor of 1/N, is just the squared term found in Eq. (6).
The normalized function f̂ corresponding to f is then given by:

    f̂ = (f − f̄) / |f|        (10)
Going back to Eq. (6) for the mean-square deviation of two functions, we then find that the square deviation of two functions f̂ and ĝ normalized in this way is simply:

    |f̂ − ĝ|² = 2 − 2 Σ_x f̂(x) ĝ(x) = 2 − 2r        (11)
with the Pearson correlation coefficient r defined as:

    r = Σ_x [f(x) − f̄][g(x) − ḡ] / ( sqrt(Σ_x [f(x) − f̄]²) · sqrt(Σ_x [g(x) − ḡ]²) )        (12)
While this expression may look fairly complicated, it can be calculated in two simple steps, by first applying the normalization transformation, Eq. (10), and then calculating the cross-correlation function:

    r = Σ_x f̂(x) ĝ(x)        (13)
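The two-step procedure of Eqs. (10) and (13) can be sketched directly in NumPy (function names illustrative; the 1/N factors of Eqs. (8) and (9) cancel in the coefficient, so the plain Euclidean norm is used here):

```python
import numpy as np

def normalize(f):
    """Eq. (10): remove the bias (mean) and the scale (norm) of a volume.
    The 1/N factor of Eq. (9) cancels in the correlation coefficient,
    so the plain Euclidean norm is used."""
    f = f - f.mean()
    return f / np.sqrt(np.sum(f ** 2))

def corr_coeff(f, g):
    """Eqs. (12)/(13): the Pearson correlation coefficient, computed as a
    plain dot product of the two normalized volumes."""
    return float(np.sum(normalize(f) * normalize(g)))
```

A volume correlated with itself gives +1, with its negative −1, and the result agrees with the textbook Pearson coefficient of the flattened voxel vectors.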
This expression measures the similarity or dissimilarity of the functions f(x) and g(x) if bias and scale of the voxel values are ignored. The normalization factors in the denominator make sure that the correlation coefficient is confined to values between +1 and −1, where +1 indicates complete correlation and −1 complete anticorrelation (i.e., the two subvolumes are proportional to each other with a negative proportionality constant). At a value r = 0, the vectors are said to be orthogonal to each other, and thus completely uncorrelated.

The correlation coefficient has a natural interpretation in the setting of high-dimensional vector spaces. In such a vector space, both the template and the subvolume are represented as vectors. Just as in 3D Euclidean geometry, a vector is a composite quantity whose components are (usually) real numbers. The number of components is called the dimension of the vector space. To represent subvolumes as vectors, we choose the dimension to be the number of voxels in each subvolume, and assign vectors to volumes such that each component of the vector represents one voxel in the volume. It must be emphasized that the spatial organization of the voxels is of no import here; they are simply lined up in the vector.

The reason for thinking of the cross-correlation function in terms of vectors lies in the peculiar interpretation of Eq. (12) in geometry. The sum of the componentwise products of two vectors f and g is called the scalar product and frequently written as ⟨f, g⟩. In complete analogy to ordinary geometry, it can be written using the lengths |f| and |g| of the vectors, as given by their square norm, Eq. (9), and the angle φ between them:

    ⟨f, g⟩ = |f| |g| cos φ        (14)
In elementary geometry, the angle φ is a quantity independent of the lengths of the vectors. Rewriting Eq. (14) as:

    cos φ = ⟨f, g⟩ / (|f| |g|)        (15)

we see that we can identify cos φ with the correlation coefficient, Eq. (12), for the case f̄ = ḡ = 0. In a vector space, the correlation coefficient characterizes the angle between the vectors f and g, with +1 corresponding to φ = 0° (two parallel vectors), −1 to φ = 180° (antiparallel vectors), and 0 to φ = 90° (orthogonal vectors).

Fig. 2 Localization and discrimination of objects using the cross-correlation coefficient. The top image shows the sample volume containing two different objects in a noisy environment. Templates matching either object were used to search for the objects in the volume. The result is shown, for two templates matching either object, in the surface plots at the bottom. Peaks signify the locations of the objects in the sample volume. While a signal can be observed for both objects with either template, the signal is strongest when the correct template is used.
F. Local Correlation Functions

In the case of template matching, the correlation coefficient is computed between a subvolume of the sample volume f(x) and a translated copy of the template, given by g(x) = c(x − y) (Fig. 2). The two functions are compared in a small volume V_y, corresponding to the template volume translated to the reference point y. The resulting normalized cross-correlation function then reads:

    r(y) = Σ_{x∈V_y} [f(x) − f̄] c(x − y) / ( sqrt(Σ_{x∈V_y} [f(x) − f̄]²) · sqrt(Σ_{x∈V_templ} c(x)²) )        (16)
We assume that c(x) is already properly normalized, that is, c̄ = 0, which is possible without restriction. One problem arising here, however, is that the normalization of the sample f(x) now depends on the reference point y, that is, f̄ and |f| are derived from the sample in the subvolume V_y, not in the complete volume V. This is called the local correlation function. Unfortunately, this implicit dependence makes it impossible to calculate r(y) by the simple two-step procedure, Eq. (13).

Alternatively, one can introduce a global correlation function in which the normalization of f is performed over the whole volume V, that is, using the average f̄ and square norm |f| calculated over all voxels of the sample volume. Now the normalization can be performed by preprocessing the volume and the template using Eq. (10), and during the matching procedure, only the unnormalized cross-correlation function, that is, the numerator, has to be computed. While this is an approximation, it has a certain justification, as discussed above, in the small variation of the squared terms as compared to the cross-term.

To illustrate the difference between the local and the global correlation function, let us consider the following situation: instead of considering a small template volume V_templ and a large sample volume V, we extend the template to the sample volume by padding it with zeros, creating a function c(x) that is now defined everywhere, but zero outside the original template volume. We further stipulate that, if the value x − y falls outside the volume, it "wraps around," that is, it is periodically continued. This periodic continuation is a technical detail mandated by the use of the Fourier transform (see next section).
The global cross-correlation coefficient of this extended template, shifted to position y, then reads:

    r_glob(y) = Σ_{x∈V} [f(x) − f̄] c(x − y) / ( sqrt(Σ_{x∈V} [f(x) − f̄]²) · sqrt(Σ_{x∈V} c(x)²) )        (17)

Since multiplying any number with zero gives zero, the numerator here, that is, the cross-correlation function, Eq. (2), is unchanged as compared to the local function (16). In the denominator, the second term again remains unchanged whether we sum over V_templ or V, but in the first term, the summation now includes the full volume V rather than the shifted template volume V_y. This slight change vastly simplifies the calculation, as the denominator is now completely independent of the template position y and needs to be calculated only a single time, if at all; the relative magnitude of the correlation coefficient at different y is then completely determined by the correlation function in the numerator.

The difference between the global and the local cross-correlation function can be seen by considering their different interpretations: the global cross-correlation coefficient measures the probability that the full volume is identical, modulo noise, to a volume that contains a single copy of the template at position y, and zeros otherwise. The second part of this statement is, of course, wrong in general, as there will be other objects and noise in the volume. The local cross-correlation function only considers information in a volume of the size of the template volume
around the position y. By choosing the correct, localized normalization in the denominator of the correlation coefficient, we ensure that the comparison between template and volume is only performed over the appropriate subvolume. Note that again an equivalent way of writing Eq. (16) is by applying a mask to the sample volume, zeroing out all components outside a template volume around y:

    r(y) = Σ_{x∈V} [f(x) − f̄] c(x − y) / ( sqrt(Σ_{x∈V} [f(x) − f̄]² w_y(x)) · sqrt(Σ_{x∈V} c(x)²) )        (18)

Here w_y(x) is a function that is unity only if x ∈ V_y, and zero otherwise. After applying the mask, and subtracting the local average f̄, global and local correlation functions coincide. However, the position of the mask depends on y, so it must be applied for every y individually. This general principle, that a locally significant comparison of two functions can be achieved by applying the same mask to both and comparing only the information remaining after masking, will be applied in the section on dealing with the missing wedge below.
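A minimal sketch of the locally normalized comparison of Eqs. (16)/(18), assuming NumPy and a zero-mean template (the function name is illustrative):

```python
import numpy as np

def local_corr(volume, template, y):
    """Locally normalized correlation coefficient, Eqs. (16)/(18): the
    mean and norm of the sample are taken only over the template-sized
    subvolume V_y at position y. The template is normalized to zero mean,
    as assumed in the text."""
    s = template.shape
    sub = volume[y[0]:y[0] + s[0], y[1]:y[1] + s[1], y[2]:y[2] + s[2]]
    c = template - template.mean()   # template normalized: zero mean
    fs = sub - sub.mean()            # local bias removed
    return float(np.sum(fs * c) / np.sqrt(np.sum(fs ** 2) * np.sum(c ** 2)))
```

Because both bias and scale are removed locally, a planted copy of the template scores exactly 1 even after an arbitrary rescaling and a constant offset of the voxel values.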
G. Fourier-Space Representation

An attractive feature of the cross-correlation function is that it can be efficiently computed using the Fourier transform. From Eq. (2), the calculation of the cross-correlation function for a single point y requires N₀ = |V_y| = |V_templ| operations, that is, as many as there are voxels in the template volume. To calculate it for all possible y thus requires N·N₀ operations, where N = |V| is the number of voxels in the sample volume being searched. With volumes of the order of 10⁹ voxels and templates of size 10⁶, the resulting 10¹⁵ operations would be prohibitively large even with today's computer speeds, which approach 10¹⁰ operations per second per processor. Luckily, the convolution theorem of harmonic analysis allows us to turn the calculation of the correlation function into a simpler operation on the Fourier transforms of the volume and the template. It states that the Fourier transform of the correlation function

\[
C_{f,g}(y) = \sum_x f(x)\, g(x - y) \tag{19}
\]
can be computed as the direct product of the Fourier transform of f and the complex conjugate of the Fourier transform of g:

\[
\mathcal{F}\bigl[C_{f,g}\bigr](k) = \mathcal{F}[f](k)\; \overline{\mathcal{F}[g](k)} \tag{20}
\]
where k runs over all Fourier frequencies. For the Fourier transform to be applicable, the functions f and g must be defined in the same volume, and x runs over all voxels in the volume. Since x − y may then fall onto a point outside the volume, the volume must be extended periodically, that is, the coordinates ‘‘wrap
over’’ to the other side at the edge of the volume. Usually, this is a technical detail that can be satisfied by sufficiently padding and extending the sample and template volumes, and/or applying appropriate filters at the edges. The calculation thus requires two Fourier transforms, one multiplication, and one inverse Fourier transform. Since the complexity of a fast Fourier transform (FFT, a specially optimized algorithm to calculate the Fourier transform) is proportional to N log N, and that of the multiplication just to N, the total number of operations grows no faster than N log N, which is significantly better than the N·N₀ operations of the original problem, at least when N and N₀ are large. This is, of course, an estimate, and the total efficiency of the algorithm depends on the size of the volume and the template. In most cases, however, the calculation in Fourier space is vastly more efficient. The calculation of the locally normalized cross-correlation function can also be sped up using the Fourier transform (Roseman, 2003). If we define a function w_y(x) that is unity if x is inside the volume V_y, and zero otherwise, we can write the local power of f(x) as:

\[
\sum_{x \in V_y} f(x)^2 = \sum_x f(x)^2\, w_y(x) = \sum_x f(x)^2\, w_0(x - y) \tag{21}
\]
This again has the form of a correlation function and can be calculated by multiplying the Fourier transform of f(x)² with the (conjugated) Fourier transform of the window function w₀(x):

\[
\sum_{x \in V_y} f(x)^2 = \mathcal{F}^{-1}\!\left[\mathcal{F}\bigl[f^2\bigr]\; \overline{\mathcal{F}[w_0]}\right] \tag{22}
\]
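Both ingredients, the correlation function of Eqs. (19) and (20) and the local power of Eqs. (21) and (22), can be sketched in a few lines of NumPy. This is an illustration of the principle only, not a production implementation: padding and edge filtering are omitted, and the periodic wraparound is implicit in the discrete Fourier transform.

```python
import numpy as np

def cross_correlate_fft(f, g):
    """C_{f,g}(y) = sum_x f(x) g(x - y), via the convolution theorem:
    the Fourier transform of C is F[f] times the complex conjugate of F[g]."""
    G = np.fft.fftn(g, s=f.shape)        # zero-pad the template to volume size
    return np.real(np.fft.ifftn(np.fft.fftn(f) * np.conj(G)))

def local_power_fft(f, template_shape):
    """Sum of f(x)^2 over every template-sized window V_y at once (Eq. 22):
    a correlation of f^2 with the indicator function w_0 of the window."""
    w0 = np.zeros(f.shape)
    w0[tuple(slice(0, s) for s in template_shape)] = 1.0   # indicator of V_0
    return np.real(np.fft.ifftn(np.fft.fftn(f**2) * np.conj(np.fft.fftn(w0))))
```

The peak of the correlation map then marks the most likely template position, and the local-power map supplies the position-dependent normalization of Eq. (16).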
H. Probabilistic Interpretation

Up to this point, we have simply posited that Eqs. (5) and (17) are good measures of the similarity between a template and a subvolume. The theoretical underpinning of this assumption lies in the theory of normally distributed errors and the stochastic model of the imaging process. In this theory, it is assumed that the observed image f can be mathematically expressed as a combination of an ideal value f₀ and some unknown noise e:

\[
f(x) = f_0(x) + e(x) \tag{23}
\]
The actual value of the noise e(x) is of course unknown, but we can assume that its probability distribution is known. In some cases, it may be reasonable to assume that the noise follows a normal (or Gaussian) distribution with known parameters. Knowing this probability distribution allows us to ask how probable it is that a certain observed image f(x) has arisen from an idealized image f₀(x), by calculating the difference image and evaluating it according to the probability distribution of the noise. In the theory of normal distributions, for example, a central role is played by the mean-square deviation, Eq. (5), generally called χ².
It is, in this theory, a direct measure of the probability that the difference between two functions can be attributed to chance. In general, the idealized image f₀(x) is not exactly known. In template matching, it depends on the displacement y of the template, the Euler rotation angles, and also on an overall proportionality factor and a shift in the voxel values. Probabilistic modeling offers a technique to deal with this ignorance systematically, by means of the maximum likelihood principle. The maximum likelihood principle states that the unknown transformation parameters of the image can be estimated by choosing them in such a way that the image they produce is the most probable match for the actual image. It is a very general principle and underlies many parameter estimation problems in statistics, such as linear fits. In principle, it requires us to systematically consider all values of the unknown parameters, apply them to the template, compare the resulting template image to the actual image, and then choose the best correspondence. Luckily, in some cases (such as linear fits) this problem can be solved analytically without actually enumerating all values, but in other cases (such as the rotation and translation parameters), this is not possible, and enumeration is the only viable way to proceed. While the maximum likelihood method provides us with the most likely fit of the unknown parameters (position and orientation of the template) to the observed image, it does not make any statement about whether this fit is the only likely one or one of many similar ones. It is possible to calculate the probability that the observed image arose from the template, but it is much more difficult to make any statement about the probability of this fit among all possible fits in the maximum-likelihood framework. Such questions are especially important when different templates have to be considered under noisy conditions.
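The link between the mean-square deviation and the likelihood can be made explicit in the Gaussian case. If the noise e(x) in Eq. (23) is independent and normally distributed with variance σ² at every voxel, the probability of observing f given the ideal image f₀ is

\[
p(f \mid f_0) = \prod_x \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left(-\frac{\bigl(f(x) - f_0(x)\bigr)^2}{2\sigma^2}\right),
\qquad
-2 \ln p(f \mid f_0) = \frac{\chi^2}{\sigma^2} + \text{const},
\]

so maximizing the likelihood over the unknown shift and rotation parameters is equivalent to minimizing χ², and hence, for templates of fixed power, to maximizing the cross-correlation coefficient.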
Such issues are beyond the scope of this chapter; they often require more advanced statistical techniques such as Bayesian estimation and Monte Carlo analysis.

I. Practical Implementation

In the preceding sections, we have introduced the correlation function, its correct normalization, and its probabilistic interpretation. Putting the pieces together, template matching in this approach requires the calculation of the locally normalized cross-correlation function, Eq. (16), for all (or a sufficiently large number) of Euler rotations (φ, θ, ψ) of the template. The maxima of the resulting cross-correlation coefficient:

\[
\rho_{(\varphi,\theta,\psi)}(y) = \frac{\sum_{x \in V_y} \bigl(f(x) - \bar{f}\bigr)\, R_{(\varphi,\theta,\psi)} c(x - y)}{\sqrt{\sum_{x \in V_y} \bigl(f(x) - \bar{f}\bigr)^2 \;\sum_{x \in V_{\mathrm{templ}}} \bigl(R_{(\varphi,\theta,\psi)} c(x)\bigr)^2}} \tag{24}
\]

give the most probable locations of the template in the volume V. For any given set of Euler angles, both the numerator and the denominator can be efficiently calculated in Fourier space, resulting in one 3D function per Euler angle
[Figure 3: four panels showing the sample volume, the template, the resulting correlation map (coefficient scale from −0.2 to 0.9), and the detected particle positions.]
Fig. 3 The template matching process for the detection of macromolecules. Possible locations of molecules can be identified by looking for peaks in the cross-correlation functions between a set of template volumes and the sample volume.
combination. The local peaks of this function indicate likely locations of the template in the given orientation (Fig. 3). If the template volume V_templ is rotationally symmetric, the local normalization factor in the denominator does not depend on the Euler angles, and one calculation of the localized normalization factors suffices for all Euler angles, significantly speeding up the calculation. Typically, the space of Euler angles is sampled at 10° or 5° increments, resulting in 23,328 or 186,624 different combinations, respectively, and making it infeasible to store the complete function ρ_{(φ,θ,ψ)}(y) in core memory. Rather, one
computes it consecutively for all Euler angles, keeping in memory only the maximum value at each location together with the corresponding Euler angles. Local peaks in the maximum values for each location then identify possible locations of the template in the sample volume in any orientation. As usual in image processing, suitable filtering is applied to the image before the correlation coefficients are calculated. This filtering removes Fourier components that carry little or no relevant information. Usually a band-pass filter is applied that removes both high-frequency components, which are often dominated by noise, and low-frequency components, which correspond to features significantly larger than the objects under consideration.
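The orientation scan described above can be sketched as follows. This illustrative Python/NumPy fragment is our own simplification: it uses an unnormalized correlation map and quarter-turn rotations in place of the full Euler-angle grid and the locally normalized coefficient. The point is the structure of the loop, which keeps only the running per-voxel maximum and the index of the winning orientation, so the complete function ρ over all angles and positions never has to be stored.

```python
import numpy as np

def cross_correlate_fft(f, g):
    """Unnormalized correlation map via the convolution theorem."""
    G = np.fft.fftn(g, s=f.shape)
    return np.real(np.fft.ifftn(np.fft.fftn(f) * np.conj(G)))

def scan_orientations(volume, template, n_rot=4):
    """Correlate rotated copies of the template one after another, keeping
    only the per-voxel maximum value and the index of the best rotation."""
    best_val = np.full(volume.shape, -np.inf)
    best_rot = np.zeros(volume.shape, dtype=int)
    for k in range(n_rot):            # quarter-turns stand in for Euler grid
        c = cross_correlate_fft(volume, np.rot90(template, k, axes=(0, 1)))
        better = c > best_val
        best_val[better] = c[better]
        best_rot[better] = k
    return best_val, best_rot
```

Local peaks of `best_val` then mark candidate particle locations, with `best_rot` giving the orientation in which each candidate was found.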
III. The Missing-Wedge Problem

A. Tomographic Imaging

In tomographic imaging, a series of projection images is acquired by mechanically tilting the specimen around a fixed axis (see Leis et al., 2006, for a recent review of the image-processing problems associated with this technique; Chapter 29 by Förster and Hegerl, this volume). The images acquired in this process show the projections of the electrostatic density in the object; that is, each pixel represents the total density along a straight line, or ray, perpendicular to the imaging plane. Features of the object therefore appear superimposed in the individual images, but a mathematical procedure based on the Radon transform allows one to disentangle the individual features of the object and recover their 3D locations by combining the information from different tilt angles. The biggest artifact in tomographic reconstruction comes from the limited tilt range that is accessible experimentally. This limit is caused simply by the thickness of the vitreous ice layer and of the holder in which the specimen is suspended, which, at some inclination, become impenetrable to the electrons. Tilt angles beyond 70° are rarely accessible, and the information corresponding to these directions is not included in the reconstruction. In Fourier space, this corresponds to a cone- or wedge-like area around the positive and negative z-axis where no information is available, the so-called missing wedge. In ordinary space, it manifests as an elongation or smearing of the object along the z-axis: because these Fourier components are missing, features lack definition in this direction. The missing wedge causes a serious problem for template matching, as it distorts the shape of the observed macromolecules. To understand its effect on the correlation function, one should remember that it does not matter whether the objects are compared in real space or in Fourier space.
The missing wedge is a very pronounced feature in Fourier space, and a pattern-matching algorithm, such as the correlation function, will have a tendency to fit this feature to some similar feature of the template. This process is in competition
to the alignment with the actual distinctive features of the molecule, and if it wins, the molecule will be misaligned and/or misidentified.

B. Restricted Cross-Correlation Coefficients

The missing wedge can be dealt with in a way similar to the local normalization discussed in Section II.F (Förster, 2005). There, we dealt with the problem that the comparison between the template and the volume should only be performed inside a localized subvolume in which the template is defined. This was achieved by choosing as the correct normalization of the correlation coefficient the total power contained in the subvolume under consideration. With the missing wedge, we have the similar situation that the comparison should be limited to the information outside the missing wedge. In the definition of the correlation coefficient, Eq. (16), correct normalization was achieved by summing only over those elements x for which information is available, that is, the shifted subvolume V_y. Another way of including this weight function in the calculation is to set to zero all elements of the sample volume and the template for which no information is available, applying this transformation before the numerator and denominator of Eq. (16) are calculated. In the case of the missing wedge, the unavailable information is a volume in Fourier space. Applying the previous prescription, this can be taken into account by setting the Fourier coefficients of both the template and the sample volume inside the missing wedge to zero. Denoting the Fourier transform of the template by \(\mathcal{F}c(k)\), the template is replaced by a smeared template:

\[
c'(x) = \mathcal{F}^{-1}\bigl[w(k)\, \mathcal{F}c(k)\bigr] \tag{25}
\]
where w(k) is the wedge function, which has the value one inside the experimentally accessible cone and zero outside it. This operation is a multiplication in Fourier space, which translates into a convolution in position space:

\[
c'(x) = \sum_y \mathrm{PSF}(x - y)\, c(y) = (\mathrm{PSF} * c)(x) \tag{26}
\]
The point-spread function PSF(x − y) is the Fourier transform of the wedge function; it describes how a single bright point would appear under the influence of the missing wedge. Applying the previous recipe for accounting for unknown information, it is sufficient to replace the template c(x) by the smeared-out template c′(x), both in the numerator and in the denominator of the locally normalized cross-correlation function, Eq. (24). The smearing operation must be performed after the template has been rotated:

\[
\rho_{(\varphi,\theta,\psi)}(y) = \frac{\sum_{x \in V_y} \bigl(f(x) - \bar{f}\bigr)\, \bigl[\mathrm{PSF} * R_{(\varphi,\theta,\psi)} c\bigr](x - y)}{\sqrt{\sum_{x \in V_y} \bigl(f(x) - \bar{f}\bigr)^2 \;\sum_{x \in V_{\mathrm{templ}}} \bigl[\mathrm{PSF} * R_{(\varphi,\theta,\psi)} c\bigr](x)^2}} \tag{27}
\]
This means that now, even if the template volume is spherical, the second term in the denominator depends on the rotation angles, reflecting the fact that, due to the missing wedge, different amounts of information are available in different viewing orientations.
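The wedge mask and the smearing of Eq. (25) can be sketched as follows. The geometry here is our own simplified assumption (single-axis tilt about the y-axis, beam along z, symmetric tilt range); a real implementation would derive the wedge function from the actual tilt series.

```python
import numpy as np

def wedge_mask(shape, max_tilt_deg=70.0):
    """Binary wedge function w(k): 1 where Fourier information is available,
    0 inside the missing wedge around the k_z axis (simplified geometry)."""
    kx = np.fft.fftfreq(shape[0])[:, None, None]
    kz = np.fft.fftfreq(shape[2])[None, None, :]
    w = (np.abs(kz) <= np.tan(np.deg2rad(max_tilt_deg)) * np.abs(kx))
    w = w.astype(float)
    w[0, :, 0] = 1.0                     # always keep the zero-frequency term
    return np.broadcast_to(w, shape).copy()

def smear_template(template, w):
    """Smeared template c'(x) = F^{-1}[ w(k) F c(k) ], Eq. (25)."""
    return np.real(np.fft.ifftn(w * np.fft.fftn(template)))
```

In the restricted coefficient, Eq. (27), the smeared template replaces c in both the numerator and the denominator; the sample volume needs no explicit masking, since its wedge components are missing from the reconstruction anyway.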
IV. Applications

The usefulness of template matching in 3D for the identification of macromolecules in ET was first explored by Böhm et al. (2000). This study investigated the power of the cross-correlation coefficient to discriminate between different macromolecules, namely the 20S proteasome, the GroEL chaperonin, and the thermosome, using both synthetic data and real electron-tomographic images of isolated macromolecules. Templates for the three molecules were generated either from atomic models available in the Protein Data Bank (for the 20S proteasome and GroEL) or from pseudo-atomic models derived from EM (for the thermosome). Tomographic sample volumes were generated both synthetically, by filtering and adding colored noise, and from tomographic reconstructions of actual tilt series obtained from purified samples embedded in vitreous ice. The different templates were then compared to the sample volumes using the cross-correlation coefficient. Histograms of the coefficient show that it generally takes on a larger value whenever the correct template is used. The quantitative accuracy of the particle identification depends on the combination of template and sample molecule in use, as well as on the resolution of the images. However, it was found that, in this setting, the cross-correlation function discriminated between the different objects with high significance, at the level of several standard deviations. Performance was especially good for the 20S proteasome, which differs markedly in size from the other two species, while the discrimination between GroEL and the thermosome required a resolution better than 4 nm. That study also considered an alternative approach to particle identification based on segmentation of the volume using an anisotropic diffusion technique called MCM. In this approach, particles are not identified according to a template; instead, the method picks out subvolumes bounded by surfaces of low curvature, such as would be typical for macromolecules or cellular compartments.
This is used to find interesting regions of the tomogram that contain macromolecules before the correlation function is calculated, thus saving computing time. To demonstrate the performance of the method in a cellular context, Frangakis et al. (2002) have applied template matching to the detection of macromolecules in ‘‘phantom cells’’ constructed from liposome vesicles that mimic real prokaryotic cells in size and shape, but whose content is known (Fig. 4). Again, the thermosome and the 20S proteasome were used as sample macromolecules. A solution of purified proteins was encapsulated in lipid vesicles with an average diameter of 400 nm
Fig. 4 Detection of macromolecules in a phantom cell. The image shows a liposome vesicle (white) in which the locations of macromolecules have been identified by cross-correlation with templates. The molecule locations and orientations are marked by color-coded templates pasted into the volume. Adapted from Frangakis et al. (2002).
and embedded in vitreous ice. The resulting tomograms were investigated using the local correlation coefficient. The template was constructed from atomic-scale models of the macromolecules. To optimize the template, different band-pass filters were applied and the cross-correlation function inspected manually. Locally normalized cross-correlation coefficients were calculated with both the correct and the incorrect template for the tomogram under consideration. The authors found a clear separation in the values of the correlation coefficients between the two templates, leading to identification accuracies of 96% for the 20S proteasomes and 87% for the thermosomes, at a resolution of about 4 nm. Applying the algorithm to phantom cells that contain a mixture of both species, the molar ratio was retrieved with a deviation of less than 10%. This led to the conclusion that macromolecular complexes in the size range of 0.5–1 MDa can be identified with satisfactory fidelity. There remains, however, the question of how this identification algorithm is affected by molecular crowding and the presence of other cellular structures in the tomogram. Recent applications of the template matching technique have focused on particle averaging, which is described elsewhere in this book (see Chapter 29 by Förster and Hegerl, this volume). Here template matching is used to locate and align subvolumes containing the macromolecule under consideration, which are then averaged to produce a density map with improved resolution. This is similar
to the single-particle method, without the problem of assigning projection angles to the images. Förster et al. (2005) studied the structure of envelope glycoprotein (Env) complexes on intact Moloney murine leukemia retrovirus particles using the restricted cross-correlation coefficient. These trimeric transmembrane complexes stud the surface of the retrovirus. They were located manually on the surface of the virions, and the restricted cross-correlation function was used to align the particles to a common template. The authors could take advantage of the fact that the complexes are oriented radially on the spherical surface of the virus, so that only one rotational degree of freedom had to be considered initially. Averaging several thousand particles resulted in a reconstruction with a resolution of about 2.5 nm. Beck et al. (2004) applied a similar approach to the nuclear pore complex in Dictyostelium discoideum. Again, the restricted cross-correlation function was used to align manually identified copies of the complex, and an improved density map was obtained by averaging. Rath et al. (2003) explored the application of the locally normalized cross-correlation function to the identification of specific ribosomal proteins in a tomogram of the Escherichia coli ribosome. They also looked at ryanodine receptors in a suspension of vesicles derived from skeletal muscle sarcoplasmic reticulum and transverse tubules. In the first case, the results could be compared against the results of real-space refinement, in which the known structures of the ribosomal proteins had been fitted into the cryo-EM density. Here, the cross-correlation function was able to identify the location of the proteins to within two pixels, and the angular orientation to within 8°. In the second case, a template volume was picked manually from the tomogram and used to identify other occurrences of this motif. Most particles were found in plausible locations on the vesicle membranes. Ortiz et al.
(2006) applied the restricted cross-correlation function to identify 70S ribosomes in Spiroplasma melliferum (Fig. 5). The authors were able to identify several hundred of these macromolecular complexes in a single tomogram and to average the aligned subtomograms into a density map with a resolution of 4.7 nm that matched the structure of an E. coli 70S ribosome. As in Böhm et al. (2000), an alternative method of particle identification, based on the scaling index segmentation method, was also considered. This method identifies possible particles based on fractal dimensional properties of the subvolumes; since it does not make use of a template, it cannot discriminate between different particles. Still, an overlap of 70% was observed between the two methods.
V. Conclusions Improved methods for recognizing patterns in highly noisy 3D tomographic reconstructions are helping to develop cryo-ET into a useful tool for mapping the macromolecular content and interactions of cells in near-life conditions. At the
Fig. 5 Identification of ribosomes in Spiroplasma melliferum. The left-hand side shows a reconstructed tomographic slice with macromolecular complexes visible as dark regions. In the 3D view on the right-hand side, locations and orientations of 70S ribosomes identified by template matching are indicated by copies of the templates (in blue and yellow) (image by J. Ortiz, see Ortiz et al., 2006).
resolutions attainable today, large macromolecular complexes can already be reliably identified and discriminated from each other, both in isolation and in phantom cells. Recent progress indicates that, at least in relatively simple cells, satisfactory results can be obtained with intact cells as well. Further improvements are under way throughout the cryo-EM pipeline, ranging from sample preparation and sectioning to the automation of image acquisition and postprocessing and to intelligent pattern recognition algorithms. Meanwhile, large libraries of templates are becoming available from, among others, the various structural genomics efforts. Given the progress we may reasonably expect from all this work, the method described here holds the promise of becoming a viable tool for the study of biologically significant protein interactions that underlie the processes of life.
References

Aloy, P., and Russell, R. B. (2006). Structural systems biology: Modelling protein interactions. Nat. Rev. Mol. Cell. Biol. 7, 188.
Baumeister, W. (2004). Mapping molecular landscapes inside cells. Biol. Chem. 385(10), 865–872.
Baumeister, W., and Steven, A. C. (2000). Macromolecular electron microscopy in the era of structural genomics. Trends Biochem. Sci. 25(12), 624–631.
Beck, M., Förster, F., Ecke, M., Plitzko, J. M., Melchior, F., Gerisch, G., Baumeister, W., and Medalia, O. (2004). Nuclear pore complex structure and dynamics revealed by cryoelectron tomography. Science 306(5700), 1387–1390.
Böhm, J., Frangakis, A. S., Hegerl, R., Nickell, S., Typke, D., and Baumeister, W. (2000). Toward detecting and identifying macromolecules in a cellular context: Template matching applied to electron tomograms. Proc. Natl. Acad. Sci. USA 97(26), 14245–14250.
Duda, R. O., Hart, P. E., and Stork, D. G. (2001). ‘‘Pattern Classification.’’ John Wiley & Sons, New York.
Förster, F. (2005). Quantitative Analyse von Makromolekülen in Kryoelektronentomogrammen mittels Korrelationsmethoden. Ph.D. Thesis, Technische Universität München.
Förster, F., Medalia, O., Zauberman, N., Baumeister, W., and Fass, D. (2005). Retrovirus envelope protein complex structure in situ studied by cryo-electron tomography. Proc. Natl. Acad. Sci. USA 102(13), 4729–4734.
Frangakis, A. S., Böhm, J., Förster, F., Nickell, S., Nicastro, D., Typke, D., Hegerl, R., and Baumeister, W. (2002). Identification of macromolecular complexes in cryoelectron tomograms of phantom cells. Proc. Natl. Acad. Sci. USA 99(22), 14153–14158.
Frank, J. (1972). Two-dimensional correlation functions in electron microscope image analysis. In ‘‘Proceedings of the Fifth European Congress on Electron Microscopy,’’ pp. 622–623. Institute of Physics, London, UK.
Frank, J. (1996). ‘‘Three-Dimensional Electron Microscopy of Macromolecular Assemblies.’’ Academic Press, San Diego, CA.
Frank, J. (2002).
Single-particle imaging of macromolecules by cryo-electron microscopy. Annu. Rev. Biophys. Biomol. Struct. 31, 303–319.
Frank, J., and Wagenknecht, T. (1984). Automatic selection of molecular images from electron micrographs. Ultramicroscopy 12, 169–176.
Hastie, T., Tibshirani, R., and Friedman, J. H. (2003). ‘‘The Elements of Statistical Learning.’’ Springer-Verlag, New York, Heidelberg, Berlin.
Huang, Z., and Penczek, P. A. (2004). Application of template matching technique to particle detection in electron micrographs. J. Struct. Biol. 145(1–2), 29–40.
Leis, A. P., Beck, M., Gruska, M., Best, C., Hegerl, R., Baumeister, W., and Leis, J. W. (2006). Cryo-electron tomography of biological specimens. IEEE Signal Process. Mag. 23(3), 95.
Lucic, V., Förster, F., and Baumeister, W. (2005). Structural studies by electron tomography: From cells to molecules. Annu. Rev. Biochem. 74, 833–865.
Nickell, S., Kofler, C., Leis, A. P., and Baumeister, W. (2006). A visual approach to proteomics. Nat. Rev. Mol. Cell. Biol. 7(3), 225–230.
Ortiz, J., Förster, F., Kurner, J., Linaroudis, A., and Baumeister, W. (2006). Mapping 70S ribosomes in intact cells by cryo-electron tomography and pattern recognition. (In press).
Rath, B. K., Hegerl, R., Leith, A., Shaikh, T. R., Wagenknecht, T., and Frank, J. (2003). Fast 3D motif search of EM density maps using a locally normalized cross-correlation function. J. Struct. Biol. 144(1–2), 95–103.
Roseman, A. M. (2000). Docking structures of domains into maps from cryo-electron microscopy using local correlation. Acta Crystallogr. D Biol. Crystallogr. 56, 1332–1340.
Roseman, A. M. (2003). Particle finding in electron micrographs using a fast local correlation algorithm. Ultramicroscopy 94(3–4), 225–236.
Sali, A., Glaeser, R., Earnest, T., and Baumeister, W. (2003). From words to literature in structural proteomics. Nature 422, 216–225.
Volkmann, N., and Hanein, D. (1999). Quantitative fitting of atomic models into observed densities derived by electron microscopy. J. Struct. Biol. 125, 176–184.
Walz, J., Typke, D., Nitsch, M., Koster, A. J., Hegerl, R., and Baumeister, W. (1997). Electron tomography of single ice-embedded macromolecules: Three-dimensional alignment and classification. J. Struct. Biol. 120(3), 387–395.
Zhu, Y., Carragher, B., Glaeser, R. M., Fellmann, D., Bajaj, C., Bern, M., Mouche, F., de Haas, F., Hall, R. J., Kriegman, D. J., Ludtke, S. J., Mallick, S. P., et al. (2004). Automatic particle selection: Results of a comparative study. J. Struct. Biol. 145, 3–14.