Interactive map projection algorithm for illustrating protein surfaces
D J Barlow and J M Thornton
Crystallography Department, Birkbeck College, London University, Malet Street, London WC1E 7HX, UK
An interactive map projection algorithm and cluster analysis program are described which can be used in the display and analysis of protein surfaces. The application of the techniques to the analysis of protein charge distributions is described, and a brief discussion is presented on various other applications.
Keywords: charge distributions, cluster analysis, map projection, protein surfaces
Received 5 November 1985, accepted 18 November 1985
With the development of various ‘dot-surface’ and space-filling representations1-4, extensive use has been made of computer graphics for studying the surfaces of proteins and other macromolecules5-8. However, while these models are easy to use and interpret in 3D (i.e. on the graphics screen), they are less valuable when transcribed into 2D. In particular, since one half of a molecule is usually hidden or obscured by foreground detail, it is impossible to view the complete surface in a single figure. One solution to this problem is to simplify the surface representation using a suitable form of map projection. This technique has already been used in a study of the spatial distribution of amino acids in proteins9 and as a preliminary step in the analysis of their surface topography10. In this paper, the development of an interactive map projection algorithm is described that has been used to produce ‘surface maps’ for the display and analysis of protein charged group distributions. The maps are ideally suited to this kind of analysis and may generally be applied in the study of any surface group distribution in globular proteins.

CHOICE OF PROJECTION
When choosing from the many map projections available11,12 the criteria used were that:
• the map should allow the entire charge distribution of a protein to be visualized in a single figure,
• there should be the minimum of distortion during the projection process, so that the relative positions of charges would be conserved between protein and map, and
• (most important of all) the map should provide an equal-area projection of the protein’s surface, since it is the distribution of surface groups which is of interest.

Volume 4 Number 2 June 1986
The two map projections found to satisfy these criteria were the Mollweide and Hammer projections12. Although both are equal-area projections, the latter was considered the more suitable for present purposes. This is because it gives less distortion when used to represent the surface of a globe on a single continuous map. Since the parallels are represented as curved (rather than straight) lines, distortion in the polar regions is less evident than in maps produced with the Mollweide projection. Also, because the meridians and parallels meet at less oblique angles than in the Mollweide case, the Hammer projection gives less distortion at the ‘east’/‘west’ boundaries13. In the case of the Mollweide projection there is significant distortion here unless the map is interrupted (or recentred), a feature which makes it less easy to appreciate the surface displayed.

HAMMER PROJECTION
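The map coordinates for a Hammer projection (x, y) are readily calculated from the protein’s polar coordinates (r, θ, λ) using the relationships12:

\[ x = \frac{2\sqrt{2}\,R\,\cos\theta\,\sin(\lambda/2)}{\sqrt{1 + \cos\theta\,\cos(\lambda/2)}} \qquad (1) \]

\[ y = \frac{\sqrt{2}\,R\,\sin\theta}{\sqrt{1 + \cos\theta\,\cos(\lambda/2)}} \qquad (2) \]
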
where R (the radius of the ‘globe’ used to generate the map) is taken as the mean of the distances between all charges and the centroid of the protein. Implicit in this approximation is the need to assume a spherical distribution of the charged groups. The extent of distortions arising in cases where there is imperfect spherical symmetry can be judged from the standard deviations from the mean for all charge to centroid distances (see Table 1). It should be noted that the errors incurred in this way are more significant with regard to the value of a map than those produced during the actual projection process. The general form of the Hammer projection, together with the relationship between map and protein surface, is shown in Figure 1. Map projections for bovine ribonuclease14 and ferredoxin15 show how they can be used to illustrate the distributions of charged groups in proteins (see Colour Plate 1). The map projections were calculated using a PDP 11/60 and were drawn using a Trilog Colourplot device.

0263-7855/86/020097-04 $03.00 © 1986 Butterworth & Co (Publishers) Ltd

Table 1. Sample data to illustrate the variation in charged group to centroid distances for a number of proteins of different shapes and sizes. For each protein the table gives the mean distance between charged groups and the centroid, the corresponding standard deviation (σ) and the total number of charged groups present in the molecule (N). LADH is used as an abbreviation for liver alcohol dehydrogenase.

Protein      Mean distance   σ     N    Molecular dimensions (Å)
Subtilisin   19.6            3.6   30   22 x 23 x 19
Ferredoxin   11.8            1.5   10   11 x 14 x 11
Crambin       8.7            3.0    6   20 x 9 x 14
LADH         23.8            6.5   82   21 x 23 x 40
Flavodoxin   16.8            2.5   42   19 x 19 x 14

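Equations (1) and (2), together with the choice of R as the mean charge to centroid distance, can be illustrated with a short sketch. This is not the original PDP 11/60 program: the Python function names (generating_radius, hammer) are hypothetical, θ is assumed to be latitude and λ longitude (both in radians), and the standard deviation is taken as a population value, which the text does not specify.

```python
import math


def generating_radius(charge_coords, centre):
    """Mean and standard deviation of the charge-to-centroid distances.

    `centre` is the centroid of the protein, supplied by the caller.
    The population form of the standard deviation is an assumption.
    """
    dists = [math.dist(c, centre) for c in charge_coords]
    mean = sum(dists) / len(dists)
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return mean, sd


def hammer(theta, lam, R=1.0):
    """Hammer map coordinates from equations (1) and (2).

    theta: latitude in radians (-pi/2 .. pi/2), assumed convention.
    lam:   longitude in radians (-pi .. pi), assumed convention.
    R:     radius of the generating globe.
    """
    denom = math.sqrt(1.0 + math.cos(theta) * math.cos(lam / 2.0))
    x = 2.0 * math.sqrt(2.0) * R * math.cos(theta) * math.sin(lam / 2.0) / denom
    y = math.sqrt(2.0) * R * math.sin(theta) / denom
    return x, y
```

Under these conventions the point (θ = 0, λ = π) maps to x = 2√2·R and the north pole to y = √2·R, so the map outline is an ellipse of area π(2√2·R)(√2·R) = 4πR², the surface area of the generating globe, as required of an equal-area projection.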
INTERACTIVE PROJECTION ALGORITHM
The disadvantage of plotting the map projections directly is that the views obtained are (by necessity) standard and frequently do not highlight the features of interest. To overcome this problem, and to allow the user to reorient a protein to optimize the view seen in projection (and also to minimize distortion), the program was developed to allow simultaneous display of both map and protein model on an Evans and Sutherland PS 2 colour computer graphics system. The protein is displayed in the centre of the screen with α-carbon atoms linked by virtual bonds and labelled by means of residue number. The corresponding map projection, which illustrates the protein’s charge distribution, is displayed below the molecular model. As the molecule is rotated on the screen (by means of a tracker ball) the map coordinates for its charged groups are modified and the view seen in projection is updated. (Cartesian coordinates for the charged groups are transformed using the current picture system rotation matrix, and map coordinates for the groups are recalculated as these are converted to new polar coordinates.) Options which are selected from a screen menu allow for enlarged views of the protein or map to be displayed independently. The program also displays the protein’s electric dipole moment, calculated according to the relationship:

\[ \boldsymbol{\mu} = \sum_{i=1}^{N} q_i\,\mathbf{r}_i \quad \text{(Debye)} \]

where r_i is a vector drawn from the centroid of the protein to the ith charged group (charge q_i = ±1), and N is the total number of charged groups in the molecule. The dipole moment is shown as a vector drawn in the appropriate direction on the protein molecular model, with its positive and negative ends labelled as + and - respectively. The points at which the ends of the vector intersect the ‘generating globe’ are labelled on the map as X and O respectively (see Colour Plate 1). Data made available to the user at the terminal include the magnitude of a protein’s dipole moment, together with the mean and standard deviation for all charge to centroid distances.
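The update step and the dipole calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors’ program: the function names are hypothetical, coordinates are assumed to be in Å with the screen axes x to the right, y upwards and z towards the viewer, and the factor 4.803 (Debye per electron-Ångström) is a conversion supplied here, since the text gives only the unit.

```python
import math

# Assumed conversion: a unit charge displaced by 1 Å has a moment of about 4.803 D.
DEBYE_PER_E_ANGSTROM = 4.803


def rotate(matrix, v):
    """Apply a 3x3 rotation matrix (list of rows) to a Cartesian vector."""
    return tuple(sum(matrix[row][k] * v[k] for k in range(3)) for row in range(3))


def to_polar(v):
    """Convert a centroid-relative vector to (r, theta, lam).

    theta is latitude and lam is longitude, with lam = 0 pointing at the
    viewer along +z (an assumed convention, not stated in the text).
    """
    x, y, z = v
    r = math.sqrt(x * x + y * y + z * z)
    return r, math.asin(y / r), math.atan2(x, z)


def dipole_moment(charge_coords, charges, centre):
    """Sum of q_i * r_i, with r_i measured from the protein centroid `centre`."""
    mu = [0.0, 0.0, 0.0]
    for pos, q in zip(charge_coords, charges):
        for k in range(3):
            mu[k] += q * (pos[k] - centre[k])
    return tuple(mu)


def magnitude_debye(mu):
    """Magnitude of the dipole vector, converted from e.Å to Debye."""
    return DEBYE_PER_E_ANGSTROM * math.sqrt(sum(c * c for c in mu))
```

After each rotation the polar angles returned by to_polar would be passed to the Hammer relationships (1) and (2), and the same conversion applied to the dipole vector and its negative gives the two points at which its ends intersect the generating globe.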
Figure 1. A schematic illustration of the stages involved in producing a Hammer map projection to show the distribution of charged groups on the surface of a protein. (a) The positions of charged groups (labelled w-z) are projected onto the surface of a sphere with radius r (referred to as the generating globe). r is calculated as the mean distance between all charged groups and the centroid of the protein (labelled O). In the Figure, the protein is illustrated as a solid ‘thread’ and the projected locations of charged groups are shown by means of crosses. Crosses drawn with solid lines refer to points on the front hemisphere of the generating globe, those drawn with dashed lines refer to points on the rear hemisphere. N and S mark the poles of the globe; E and W mark its east-west boundaries. (b) Segments of the generating globe (labelled 1-8) are displayed here so as to demonstrate the relation between globe and map. The rear hemisphere has been bisected along a line joining the poles (N and S) and the two quarter-spheres produced then folded forwards as shown by the arrows. Labels for other details shown are as described in Figure 1 (a). (c) A Hammer map projection illustrating the distribution of charged groups shown in Figure 1 (a). Regions of the map corresponding to the segments of the globe shown in Figure 1 (b) are appropriately labelled 1 to 8 and are alternately shaded to improve clarity. N, S, E, W and O show the positions corresponding to those similarly labelled in Figure 1 (a) and (b). Lines of ‘latitude’ and ‘longitude’ are shown by means of dashed lines and are drawn at intervals of 45°
Journal of Molecular Graphics
CLUSTER ANALYSIS

Although the analysis of a protein’s charge distribution can be made simply by inspecting the corresponding map projection, in most cases it is safer to use an automatic ‘cluster analysis’ program. The cluster analysis program employed in the present work uses a modified form of the Kmeans algorithm16. The M charged groups in a protein are initially partitioned into N clusters, where for practical reasons it is suggested that N must usually be less than M/5 (Reference 16). This assignment is arbitrary but is best made so that all clusters contain roughly equal numbers of charges. The sum of the squares of the distances (E(i)) between all charges in the ith cluster and their centroid is calculated for all N clusters, and the N values of E(i) are summed to give the function D. By repeated exchanges of cluster members a minimum value of D is obtained so that an optimal partitioning of the M charges into N clusters is found. The optimum is achieved with no reduction or increase in the number of clusters, so that the algorithm detects only a local optimum which is dependent upon the value chosen for N. In order to find a ‘global’ minimum it is necessary to make repeated minimizations (or runs) using different values of N. After each run (j) the values of D(j) and σ(j) are recorded, where D(j) is as defined above and σ(j) is the standard deviation of the final E(i) values. For all runs except the first, the fractional changes in D(j) and σ(j) are then calculated, and the partitioning (j) selected as the ‘global’ optimum is taken as the one which gives the largest values of these two parameters. Sample data to illustrate the method are shown in Figure 2.

For proteins where the number of charged groups (M) < 15 the cluster analysis as described has little practical use since only one partitioning is determined (i.e. with N = 2). By taking values of N > M/5 the likelihood of obtaining clusters with only one charge increases, and under these conditions the algorithm ceases to function as required; E(i) for a cluster with only one member is zero and such clusters are ignored during the exchange process. However, the criterion N < M/5 is entirely arbitrary, since the value of N which produces single-charge clusters varies from one distribution to another. In the analysis of protein charge distributions described here it was sufficient to try values of N in the range 2 to M/3 and then to ignore any runs which gave clusters with only one member.

[Figure 2: fractional change plotted against number of clusters (N); the peak is annotated ‘Optimal partitioning’]
Figure 2. Sample data illustrating the progress of cluster analysis of the charge distribution of ferredoxin. The molecule contains 34 charged groups (including the N- and C-termini) which are partitioned into 2, 3, 4, ... 10 clusters. After each run with N clusters (N > 2) the values of the functions D and σ are compared to their values for the preceding (N - 1) run. The fractional changes in these functions are plotted here, with D shown as a broken line (- - - - -) and σ as the second curve. As shown in the Figure, the charged groups are optimally partitioned into 8 clusters. The optimal partitioning is identified by peaks which are coincident in the two curves

CLUSTER ‘CONTOURING’

The results of cluster analysis can be conveniently summarized on a Hammer map projection. Figure 3 shows the clustering of charged groups in the bacterial protein ferredoxin15. The molecule contains 34 charged groups which are optimally partitioned into eight clusters. The members of these eight clusters are ringed by broken lines and other partitionings are indicated by dotted lines. (All cluster boundaries are drawn by hand.) It is noted that the Hammer map projection provides an accurate summary of the molecule’s charge distribution since the charged groups which are clustered in space also appear clustered on the map. Moreover, from the pattern of cluster contours, it is possible to tell how evenly the charged groups are spread over the protein’s surface.
In regions where the groups are close together, they retain the same cluster partners during several different runs, e.g. Asp 63 and Glu 62 and 66. Where the charged groups are spread more sparsely they often exchange their cluster partners, e.g. Lys 98 and the C-terminus.
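The exchange procedure described under CLUSTER ANALYSIS can be sketched for a single value of N as follows. This is a minimal illustration, not the modified Kmeans program of Reference 16: the function names are hypothetical, the initial partition is a simple round-robin assignment (one way of giving roughly equal cluster sizes), each trial move is accepted only if it lowers D, and charges are never moved out of a single-member cluster, which is one reading of the statement that such clusters are ignored.

```python
def centroid(points):
    """Arithmetic mean position of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def cluster_errors(points, labels, n_clusters):
    """E(i) for every cluster: summed squared distances to the cluster centroid."""
    errors = []
    for i in range(n_clusters):
        members = [p for p, lab in zip(points, labels) if lab == i]
        c = centroid(members)
        errors.append(sum(sum((p[k] - c[k]) ** 2 for k in range(3)) for p in members))
    return errors


def exchange_kmeans(points, n_clusters):
    """Minimize D = sum of E(i) by moving one charge at a time between clusters."""
    labels = [i % n_clusters for i in range(len(points))]  # arbitrary, near-equal sizes
    best = sum(cluster_errors(points, labels, n_clusters))
    improved = True
    while improved:
        improved = False
        for idx in range(len(points)):
            home = labels[idx]
            if labels.count(home) == 1:
                continue  # leave single-member clusters alone (assumed reading)
            for target in range(n_clusters):
                if target == home:
                    continue
                labels[idx] = target  # trial move
                d = sum(cluster_errors(points, labels, n_clusters))
                if d < best - 1e-12:
                    best, home, improved = d, target, True  # accept the move
                labels[idx] = home  # keep the accepted move or undo the trial
    return labels, best
```

Because D decreases strictly with every accepted move, the loop terminates, but only at a local optimum for the chosen N; repeating the call for each N of interest and recording D and the standard deviation of the E(i) values gives the quantities used to select the ‘global’ optimum.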
Figure 3. Hammer map projection illustrating the distribution of charged groups in ferredoxin. Clusters of charged groups are those identified by the Kmeans algorithm. Solid lines (_____) show how the charged groups are partitioned into 3, 6 and 7 clusters, dotted lines (........) show how they are partitioned into (the optimum) 8 clusters. Other details of the presentation are as described in the legend to Colour Plate 1
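The rule used to pick the optimal number of clusters (coincident peaks in the fractional changes of D and σ between successive runs) can be sketched as below. The definition of the fractional change as |v(N) - v(N-1)| / v(N-1) is an assumption, since the text does not give the formula, and the function names are hypothetical.

```python
def fractional_changes(values):
    """Fractional change between consecutive runs (assumed definition)."""
    return [abs(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def select_partitioning(n_values, d_values, sigma_values):
    """Return the N at which the peaks in both curves coincide, else None.

    n_values:     cluster counts tried, in increasing order.
    d_values:     D recorded after each run.
    sigma_values: standard deviation of the final E(i) values after each run.
    """
    d_change = fractional_changes(d_values)
    s_change = fractional_changes(sigma_values)
    candidates = n_values[1:]  # the first run has no predecessor to compare with
    peak_d = candidates[d_change.index(max(d_change))]
    peak_s = candidates[s_change.index(max(s_change))]
    return peak_d if peak_d == peak_s else None
```

With purely illustrative numbers (not data from this work), D = 100, 80, 30, 27 and σ = 10, 9, 4, 3.8 for N = 2, 3, 4, 5 give the largest fractional change in both curves at N = 4, which the sketch returns; where the two peaks do not coincide it returns None, leaving the choice to inspection.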
APPLICATIONS OF SURFACE MAPS IN THE ANALYSIS OF PROTEIN CHARGE DISTRIBUTIONS

Since the Hammer map projections allow all of a protein’s charged groups to be displayed in a single figure, they give a clear impression of how the groups are distributed over the whole of the molecule’s surface. Any clusters of charged groups which might be functionally important can thus be readily identified by noting the positions of groups in relation to the dipole vector, and/or by studying the pattern of cluster ‘contours’. In the case of ribonuclease, the surface map shows that there is a marked clustering of charged groups around the enzyme’s active site (see Colour Plate 1). Most of these groups have a positive charge and several of them bind directly to the enzyme’s (negatively charged) RNA substrate (e.g. K1, K7 and K41, Reference 14). As might be expected, the positive end of the dipole vector passes through the middle of the active site cleft. The clustering of charged groups in ferredoxin has already been considered by Ghosh et al.15. They identified three clusters, which correspond well with those identified using the Kmeans algorithm. Two of these clusters contain the charged groups which surround the two [Fe-S] centres, and the third contains the groups surrounding the ‘cleft’ region. Since the former two clusters have different charge compositions (one [Fe-S] centre being surrounded entirely by negative charges, the other by a mixture of positive and negative charges), it is interesting to consider how this might influence their redox potentials. From an analysis of the surface maps obtained for several other proteins17 it can be noted that protein dipoles frequently act in the correct sense either to aid the binding of metal ion cofactors (e.g. Ca2+/parvalbumin) or to aid in the formation of protein-protein complexes (e.g. cytochrome c-cytochrome b5).
CONCLUSIONS

Surface maps have proved useful in the study of protein charge distributions and it is considered that they may be valuable in a number of related areas. Preliminary work shows that they could be used to visualize the distribution of polar/apolar groups on a protein’s surface, providing a convenient way to identify surface ‘hydrophobic’ patches. Also, since the map symbols can be colour-coded according to the heights of surface groups above or below the surface of the generating globe, the maps can be used to summarize details of surface topography, enabling simple predictions to be made about the conformations of molecular complexes (c.f. Reference 10). These and/or related maps might also be used to study protein antigenic determinants.

ACKNOWLEDGEMENTS

The authors acknowledge the support of the SERC and DJB is indebted to Dr I Tickle for help with graphics programming. The protein dimensions for Table 1 were kindly provided by Dr W R Taylor.
REFERENCES

1 Pearl, L H and Honegger, A J. Mol. Graph. Vol 1 (1983) pp 9-12
2 Tickle, I J, Borkakoti, N, Moss, D S and Palmer, R A J. Mol. Graph. Vol 1 (1983) pp 68-70
3 Henry, D R Comput. Chem. Vol 7 (1983) pp 119-135
4 Langridge, R, Ferrin, T E, Kuntz, I D and Connolly, M L Science Vol 211 (1981) pp 661-666
5 Weiner, P K, Langridge, R, Blaney, J M, Schaefer, R and Kollman, P A Proc. Natl. Acad. Sci. (U.S.A.) Vol 79 (1982) pp 3754-3758
6 Getzoff, E D, Tainer, J A, Weiner, P K, Kollman, P A, Richardson, J S and Richardson, D C Nature Vol 306 (1983) pp 287-290
7 Matthew, J B, Weber, P C, Salemme, F R and Richards, F M Nature Vol 301 (1983) pp 169-171
8 Nakamura, H, Kusunoki, M and Yasuoka, N J. Mol. Graph. Vol 2 (1984) pp 14-17
9 Prabhakaran, M and Ponnuswamy, P Macromolecules Vol 15 (1982) pp 314-320
10 Wodak, S J and Janin, J J. Mol. Biol. Vol 124 (1978) pp 323-342
11 Melluish, R K An introduction to the use of map projections Cambridge University Press, UK (1931)
12 McDonnell, P W Jr Introduction to map projections Dekker, USA (1979)
13 Deetz, C H and Adams, O S U.S. coast and geodesic survey Government Printing Office, USA (1921)
14 Borkakoti, N, Moss, D S, Stanford, M J and Palmer, R A J. Cryst. Spect. Res. Vol 14 (1984) pp 467-494
15 Ghosh, D, O’Donnell, S, Furey, W Jr., Robbins, A H and Stout, C D J. Mol. Biol. Vol 158 (1982) pp 73-109
16 Spath, H Cluster analysis algorithms for data reduction and classification of objects Horwood, USA (1980)
17 Barlow, D J and Thornton, J M unpublished results