J. Mol. Biol. (1976) 102, 221-235
Similarity of Three-dimensional Structure Between the Immunoglobulin Domain and the Copper, Zinc Superoxide Dismutase Subunit

JANE S. RICHARDSON, DAVID C. RICHARDSON, KENNETH A. THOMAS
Department of Anatomy and Department of Biochemistry, Duke University, Durham, N.C. 27710, U.S.A.

ENID W. SILVERTON AND DAVID R. DAVIES
Laboratory of Molecular Biology, NIAMDD, National Institutes of Health, Bethesda, Md 20014, U.S.A.

(Received 11 August 1975, and in revised form 8 December 1975)

A striking similarity in three-dimensional structure has been observed in two functionally unrelated, sequentially non-homologous proteins: the immunoglobulin domain and the copper, zinc superoxide dismutase subunit. The immunoglobulin molecule contains several structurally similar domains composed of antiparallel β strands forming a bilayer structure or flattened cylinder. The same topological folding pattern of the antiparallel β strands into a bilayer structure of the same overall shape is found in superoxide dismutase, and external loops occur in places equivalent to hypervariable region loops. Quantitative comparisons of the various structures have been made and are discussed in detail.

1. Introduction
In the past few years many examples have been noted of significant similarities in basic three-dimensional folding pattern among the proteins whose conformations have been determined by X-ray crystallography. So far, these similarities in three-dimensional structure have fallen into three general categories. (1) Similarity of tertiary structure among protein families which share clear amino acid sequence homologies, such as the globins (Hendrickson & Love, 1971), the cytochromes c (Timkovich & Dickerson, 1973), and the trypsin-like serine proteases (Shotton & Watson, 1970). (2) Similarity of folding pattern (with or without clear sequence homology) for structural domains within a single protein, such as are found in the immunoglobulins (Schiffer et al., 1973; Poljak et al., 1973) and in the carp muscle calcium-binding protein (Kretsinger, 1972). (3) Similarity of three-dimensional folding pattern for functionally related domains in separate proteins which differ widely in the remainder of their backbone conformations, such as the nucleotide-binding domain found in lactate dehydrogenase, liver alcohol dehydrogenase, flavodoxin, and adenylate kinase (Rossmann et al., 1974; Schulz & Schirmer, 1974).
The current paper describes a fourth such type of similarity, a close resemblance of folding pattern between two functionally unrelated proteins with no evident sequence homology: the immunoglobulin structural domain and the Cu,Zn superoxide dismutase subunit.
2. Description and Comparison of the Two Structures

An IgG immunoglobulin molecule contains 12 domains (4 in the 2 light chains and 8 in the 2 heavy chains), all about the same size and with an equivalent internal disulfide bridge. The Fab fragment, the largest portion of an immunoglobulin for
FIG. 1. Stereo drawings of the α-carbon backbone for (a) an immunoglobulin variable domain (VH of McPC603) and (b) a copper, zinc superoxide dismutase subunit. Both are viewed from the same direction as the schematic drawings in Fig. 4. The hypervariable residues are indicated by solid circles in (a).
which a detailed three-dimensional structure is known (Poljak et al., 1974; Segal et al., 1974) contains four such domains (VL, VH, CL, and CH1); these have now been shown to be similar in conformation as well as sequence. This similarity extends across species, human and mouse structures being closely related (Padlan & Davies, 1975). Figure 1(a) is a stereo drawing of the α-carbon backbone of an immunoglobulin variable domain. Figure 2(a) is a diagram of its topology. The basic folding pattern common to all the immunoglobulin domains contains seven strands of β structure (strands A through G in Fig. 2(a)), all antiparallel except for the first and last strands. In the variable domains (VL and VH) there are in addition three hypervariable regions, between β strands B and C, C and D, and F and G. These
FIG. 2. Topology diagrams of the β structures of (a) an immunoglobulin variable domain and (b) a copper, zinc superoxide dismutase subunit. Each horizontal line represents a strand of the β-sheet, labeled as described in the text. The cylinders are shown as though opened between the N and C terminal strands, laid flat on the page, and viewed from the outside. The dotted sections in (a) are the hypervariable regions; the topology of a constant domain would be the same, but with the dotted sections left out.
hypervariable regions extend as loops to form the antigen-binding site, and they vary greatly in length and amino acid sequence from one immunoglobulin molecule to another. The constant domains (CL and CH1) are approximately the same total length as variable domains (about 110 residues) but are missing the long loop between strands C and D; each β strand is longer and the entire constant domain is somewhat longer and flatter than a variable domain. The seven β strands of each domain are arranged in two layers which curve around to form a flattened cylinder. Strands A, B, E and D are in one layer and strands C, F and G in the other. For variable domains the curve of the β sheet is fairly smoothly continuous, with three or four parallel-type hydrogen bonds, on the side between strands A and G, but strands C and D are rather widely separated. For constant domains the layers are quite distinct. In almost all variable domains the hypervariable region loop between strands C and D forms two more extended chains which approximately continue the C,F,G layer of β sheet, forming a nine-stranded cylinder. In each domain a disulfide bridge crosses the cylinder between strands B and F. Further description of immunoglobulin three-dimensional structures is available in the literature, including the antigen-binding sites, binding of small antigens, domain interactions, α-carbon co-ordinates, hydrogen-bonding schemes, positions of invariant residues, etc. (Schiffer et al., 1973; Poljak et al., 1974; Epp et al., 1974; Segal et al., 1974; Davies et al., 1975a,b).

The superoxide dismutases are enzymes which scavenge the highly reactive superoxide radical (O2·−) by dismuting it to O2 and H2O2, a protection that seems to be essential for any organism capable of metabolizing oxygen (Fridovich, 1974). The form of this enzyme found in the cytoplasm of eukaryotes has two identical subunits each with about 150 amino acid residues, one zinc, and one catalytic copper.
The principal feature of the three-dimensional structure of bovine Cu2+, Zn2+ superoxide dismutase (Richardson et al., 1975a,b) is a somewhat flattened cylinder (or "barrel") consisting of eight strands of β sheet, all with antiparallel-type hydrogen bonding between neighbouring strands. Figure 1(b) is a stereo drawing of the α-carbon backbone of the superoxide dismutase subunit and Figure 2(b) is a diagram showing the topology of the β structure. If the first β strand in the sequence is labeled N, then the remaining strands, A through G, have the same topological connectivity and occupy the same positions on the cylinder as the corresponding strands in an immunoglobulin domain. The superoxide dismutase cylinder can also be thought of as two layers of β sheet, with β strands N, A, B and E in one layer and D, C, F and G in the other. The superoxide dismutase β sheet is somewhat more smoothly cylindrical than that in the immunoglobulin domains (especially the constant domains). It also has a relatively wide separation next to strand D, but in this case between strands D and E. The D strand is within hydrogen-bonding distance of E next to the D,E bend, but most of its length lies next to C.

The similarity in shape and arrangement of the various β cylinders can be seen in Figure 3, which shows end-on and side views of just the β strands for a copper, zinc superoxide dismutase subunit, a variable immunoglobulin domain, and a constant immunoglobulin domain. The cylinders are all somewhat flattened in the direction perpendicular to the two "layers"; the minor and major cross-sectional axes (measured between main chains) and the length of the cylinders are about 12 Å × 16 Å × 28 Å for superoxide dismutase, about 10 Å × 16 Å × 28 Å for variable domains, and about 10 Å × 16 Å × 33 Å for constant domains. All of these cylinders have the
FIG. 3. End-on and side views of the β cylinder for (a) copper, zinc superoxide dismutase, (b) VH from McPC603, and (c) CH1 from McPC603. For the variable domain, β strands contributed by hypervariable regions are shown with open lines.
usual direction of β-sheet twist (right-handed if defined along the chains, left-handed if defined perpendicular to the chains). An imaginary spiral path going once around the cylinder and remaining locally perpendicular to the chain direction would have its beginning and end offset from each other by four or five residues. This represents much less twist than, for instance, in the two six-stranded chymotrypsin β cylinders, where the offset would be eight or nine residues.

In the superoxide dismutase structure, bends CD and FG extend out from the cylinder as long loops which help form the copper and zinc binding sites (the metals are only 6 Å apart). These bends are in the same places as two of the three bends which form hypervariable-region (antigen-binding) loops in immunoglobulin variable domains. In Figure 4 the correspondence of both β structure and loops between a superoxide dismutase subunit and an immunoglobulin VH domain is shown by means
FIG. 4. Comparison of step-by-step build-up of backbone configurations for superoxide dismutase (down the left side) and an immunoglobulin variable domain (down the right side). New backbone added at each step is shown by heavy arrows.
of a sequence of schematic drawings in which the two structures are built up side-by-side. Superoxide dismutase does not have a disulfide bridge across the β cylinder, and the residues in those positions are not cysteines (the single disulfide in superoxide dismutase is between β strand G and one of the long external loops). Also, the subunit contact within the superoxide dismutase dimer (Richardson et al., 1975b) does not have the same geometry as either of the two types of domain contact found in Fab structures.
3. Superposition of α-carbon Co-ordinates

Within the Fab fragment there is very close conformational (and also sequence) homology between the two variable domains VL and VH, and between the two constant domains CL and CH1. Variable and constant domains are almost certainly related to each other also, but much more distantly. For the purposes of the current study, the superoxide dismutase co-ordinates were compared to VH, VL, CL and CH1 co-ordinates, and for reference, comparisons were made between VH and CL, VH and VL, and between VL and CH1.

As a quantitative measure of the similarity between these structures, their α-carbon co-ordinates were superimposed and the square of the distances between equivalent α-carbons was refined to an optimal least-squares fit (Rao & Rossmann, 1973). The superoxide dismutase co-ordinates were measured from the 3 Å resolution electron density map of the bovine erythrocyte enzyme (Richardson et al., 1975a,b) and the immunoglobulin co-ordinates were from the 3.1 Å resolution map of the mouse myeloma protein McPC603 Fab fragment (Segal et al., 1974). Initial equivalencing of residues was achieved by aligning the corresponding β strands, taking into account which residues were internal and which external, and aligning the known or presumed hydrogen bonds. A relative shift along the cylinder axis by two residues in either direction allowed fewer residues to be superimposed. The initial transformation matrix (defined by 3 Eulerian angles and 3 translational parameters) which would approximately superimpose the two structures was then refined by least-squares. α-Carbons which were more than 4 Å apart were omitted from the following refinement cycle; this had the effect of including only the "framework" of β structure and a few of the most equivalent loops. Where needed, the assignment of equivalent residues was changed. Cycles of refinement and equivalencing were repeated until there were no further changes in the equivalencing.
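The bookkeeping part of one such cycle can be sketched in a few lines of Python. This is a minimal illustration, not the refinement program used in the study: it assumes the two α-carbon sets have already been brought into a common frame, it does not perform the least-squares fit itself, and the function names (`separations`, `rms`, `framework_pairs`) are hypothetical.

```python
import math


def separations(coords_a, coords_b):
    """Distance between each pair of equivalenced alpha-carbons (same index)."""
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rms(distances):
    """Root-mean-square of a list of separation distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def framework_pairs(coords_a, coords_b, cutoff=4.0):
    """Indices of pairs kept for the next cycle: those no more than `cutoff` apart."""
    return [i for i, d in enumerate(separations(coords_a, coords_b)) if d <= cutoff]


# Toy, made-up co-ordinates (in Angstroms), for illustration only.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0), (0.0, 0.0, 0.0)]

kept = framework_pairs(set_a, set_b)
kept_distances = [separations(set_a, set_b)[i] for i in kept]
print(kept, rms(kept_distances))
print(sorted(separations(set_a, set_b)))  # sorted distribution of separations
```

Sorting the separations, as in the last line, gives the kind of monotonically increasing distribution that is plotted for each structure comparison.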
Table 1 lists the α-carbon pairs which were equivalenced for each of the comparisons and gives the overall root-mean-square separation distances obtained. Figure 5 shows the complete (sorted) distribution of separation distances for each of these comparisons. Also illustrated is a similarly obtained distribution for the comparison of the McPC603 VL with VL REI, an analogous domain but in an independently determined structure from a different species (Epp et al., 1974); that overall r.m.s. distance was 1.18 Å for 80 equivalenced α-carbon pairs. VH and VL are the most similar of the non-identical domains, giving an r.m.s. distance of 1.72 Å for 77 α-carbon pairs. Comparisons of variable with constant domains show somewhat greater differences, even though the hypervariable loops are not included. All four comparisons with superoxide dismutase show greater differences than any of the intra-immunoglobulin comparisons, but both variable and constant domains are nearly as similar to superoxide dismutase as they are to each other.
[TABLE 1: equivalenced α-carbon pairs and overall root-mean-square separation distances for each structure comparison.]
[Plot: α-carbon pair separation distance (vertical axis, ticks from 0.5 to 4.0) against α-carbon pair number, sorted by increasing separation distance (horizontal axis, ticks from 10 to 80). Curve labels include SOD-CH, SOD-VH, SOD-CL and VL-VL REI.]

FIG. 5. Distributions of α-carbon pair separation distances for the structure superpositions described in Table 1; for each structure comparison the distances are sorted so that they increase monotonically from left to right. All immunoglobulin domains are from McPC603 except for VL REI, which is also a κ-type light chain. SOD refers to the bovine copper, zinc superoxide dismutase.
4. Probability Analysis of the Topological Similarity

In order to make a rough estimate of the likelihood that the degree of similarity shown by the immunoglobulin and superoxide dismutase structures occurred by chance, one may calculate the number of possible topologies for a tertiary structure of this general type, as was done by Schulz & Schirmer (1974) for the nucleotide-binding domains. For this purpose we will consider the structure as a complete cylinder, although (except for VL of IgG RHE (B. C. Wang, C. S. Yoo and M. Sax, personal communication)) immunoglobulin domains are always missing the hydrogen bonds on one side of strand D. We will allow either parallel or antiparallel β sheet (after all, the immunoglobulin case itself has two parallel strands). It is arguable
whether or not to allow connections which "cross" (using a helix or some other structure) from one end of the cylinder to the other; seven "cross" connections occur in the triose phosphate isomerase β cylinder (Banner et al., 1975), but they are undoubtedly somewhat disfavored in a structure of only 100 to 150 residues. We will try the calculation both with and without "cross" connections, but in any case will not permit a connecting chain to go down the inside of the cylinder.

For a cylinder with $n$ strands of β structure (parallel or antiparallel) there are $(n-1)!$ possible topologies if "cross" connections are not permitted and $2^{n-1} \times (n-1)!$ possible topologies if they are permitted. If $n = 8$ this gives $7!$ = 5040 possibilities ($2^{7} \times 7!$ = 645,120 possibilities under the less conservative assumption), or a probability of 1/5040 that two such structures will match by chance. The probability of such a structure matching the immunoglobulin domain structure is a factor of two greater than that, since deletion to make seven rather than eight strands could be on either the N or C terminal side. The overall probability of matching must be multiplied by another factor of 1/7, because there are at least $7 \times 6 = 42$ ways of placing the external loops of superoxide dismutase relative to the topology of the cylinder, and only $3 \times 2 = 6$ of these ways match the placement of immunoglobulin hypervariable region loops. From this analysis, the probability of the topological similarity between these two molecules is $(1/7!) \times 2 \times (1/7)$ = 1/17,640 (or 1/2,257,920 if "cross" connections are permitted).

There are two very serious problems with the above sort of analysis. The first is that, as we have seen, a fairly minor change in our definition of what constitutes "the same general type of structure" changed the result by more than two orders of magnitude.
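The counting argument above can be checked with a short Python sketch. The function and parameter names are illustrative only (they do not come from the paper); the factor of 2 for the extra strand and the 6-out-of-42 loop placements are taken directly from the text.

```python
from fractions import Fraction
from math import factorial


def topology_counts(n):
    """Number of cylinder topologies for n strands, without and with "cross" connections."""
    without_cross = factorial(n - 1)
    with_cross = 2 ** (n - 1) * without_cross
    return without_cross, with_cross


def chance_match_probability(n_topologies, strand_factor=2, loop_ways=42, loop_matches=6):
    """Probability of a chance match, including the strand-deletion and loop factors."""
    return Fraction(strand_factor, n_topologies) * Fraction(loop_matches, loop_ways)


without_cross, with_cross = topology_counts(8)
print(without_cross, with_cross)                  # 5040 645120
print(chance_match_probability(without_cross))    # 1/17640
print(chance_match_probability(with_cross))       # 1/2257920
```

The ratio of the two results is $2^{7} = 128$, which is the "more than two orders of magnitude" sensitivity to the initial assumption.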
The second, related problem is that we have assumed that all local connectivity types are equally probable; although there is no way to calculate the relative probabilities a priori, a simple count of what patterns occur in the known protein structures shows an extremely strong preference for simple connection between nearest-neighbor strands. This is presumably a result of the fact (Wetlaufer, 1973; Ptitsyn & Rashin, 1975) that the statistics of interactions during the folding process very strongly favor interaction of secondary structure pairs which are adjacent in the sequence.

In order to greatly alleviate both of the above problems, we will perform a topological analysis which builds in empirical estimates of the relative probabilities of the various local connectivity types. Table 2 shows the observed occurrence frequencies of the different strand connectivities in β structures of at least four strands, classified according to the separation of the two connected strands in the β sheet and according to whether the connection stays at one end of the sheet (types ±n) or crosses to the opposite end of the sheet (types ±nX, or "cross" connections). For ±n types the two strands connected are antiparallel to one another and for ±nX they are parallel; however, except for n = 1 that does not directly correlate with whether they each are parallel or antiparallel to their nearest neighbors in the sheet. Type X connections possess a handedness; however, right and left-handed cases are not listed separately in the table because out of 66 total observed cases of type X connections 64 are right-handed and only two left-handed (a ±1X in subtilisin and one in hexokinase). In the following analysis we will simply assume that only right-handed "cross" connections are permissible.

Referring to the topology in Figure 2(b), let us proceed through the superoxide dismutase cylinder one strand at a time in sequence, using the empirical occurrence
TABLE 2
A summary of how often each local connectivity type has been observed in 18 proteins with β-sheet structures of at least four strands

Type   Antiparallel†   Parallel‡   Mixed§   Total occurrences   Smoothed relative occurrence frequencies
±1     23                          29       52                  52
±1X                    15          5        20                  20
±2                                 8        8                   9
±2X    …               …           4        10                  9
±3     4                           4        8                   9
±3X    …               …           4        5                   5
±4     …               …           …        …                   2
±4X    …               …           …        …                   2
±5     …               …           …        …                   2
±5X    …               …           …        …                   1
±6     …               …           …        …                   1
±6X    …               …           …        …                   1
±7     …               …           …        …                   1
±7X    …               …           …        …                   1
±8     …               …           …        …                   1
±8X    …               …           …        …                   1
No structures are included which have identical topologies or which differ in only one strand. The first 3 columns give separate tabulations for β-sheets with all antiparallel†, all parallel‡, and with mixed§ hydrogen bonding. The connectivity types are defined in the text. Note that one reason for small numbers near the bottom of the table is that only a few of the β-sheets are large enough to make those connectivity types possible; however, all sheets included allow for differences at least up to ±3.
† Concanavalin A (Reeke et al., 1975), superoxide dismutase (Richardson et al., 1975a), chymotrypsin (Birktoft & Blow, 1972), papain (Drenth et al., 1971b), rubredoxin (Watenpaugh et al., 1973), and T4 phage lysozyme (Matthews & Remington, 1974).
‡ Subtilisin (Drenth et al., 1971a), lactate dehydrogenase (Rossmann et al., 1974), and triose phosphate isomerase (Banner et al., 1975).
§ Carbonic anhydrase (Kannan et al., 1971), hexokinase (Fletterick et al., 1975), carboxypeptidase A (Lipscomb et al., 1968), thermolysin (Colman et al., 1972), prealbumin (Blake et al., 1974), glyceraldehyde 3-phosphate dehydrogenase (Buehner et al., 1974), staphylococcal nuclease (Arnone et al., 1971), cytochrome b5 (Mathews et al., 1971), and thioredoxin (Holmgren et al., 1975).

frequencies to evaluate among the possibilities available at each step the relative probability of the actual connection made. The probability at each step is defined as the smoothed relative occurrence frequency for the type of connection actually made, divided by the sum of the smoothed relative occurrence frequencies for all permissible connections still available at that step. At the first connection (which is actually of type +1 out of the possibilities ±1, ±1X, ±2, ±2X, ..., omitting left-handed X types) the probability is $52 \div (52+20+9+9+9+5+2+2+9+9+52) = 52/178$. At the next step (also +1) the probability is $52 \div (52+20+9+9+9+5+2+2+9+9) = 52/126$, since one of the nearest-neighbor chains has already been used. Continuing in this manner, one calculates $52/178 \times 52/126 \times 9/88 \times 52/151 \times 52/90 \times 9/18 \times 52/52 = 1/815$ as the overall probability that an eight-stranded β cylinder would have the superoxide dismutase topology. Although this number is clearly a very rough approximation, it is a much fairer estimate than assuming that all connectivities are equally likely to occur. We have also diminished the problem of sensitivity to initial assumptions: with this empirical method the
estimated probability would increase only by a factor of five (rather than a factor of 128) if "cross" connections were ruled out altogether. Applying to our empirical probability estimate the additional factors derived above to allow for the extra strand and for the loop placement, one obtains $1/815 \times 2 \times 1/7 \approx 1/2853$ as an overall estimate of the probability that the immunoglobulin and superoxide dismutase structures would match by chance.

5. Discussion
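The empirical estimate from Section 4, which the discussion draws on, can be re-derived with a short Python sketch. Two things here are assumptions rather than statements from the paper: the order of the strands around the cylinder (N, A, B, E, D, C, F, G, inferred from the two-layer description) and the bookkeeping rule that the single permitted right-handed "cross" direction alternates sides after each plain connection. With those assumptions the step-by-step denominators printed in Section 4 are reproduced. All names in the code are hypothetical.

```python
from fractions import Fraction

# Smoothed relative occurrence frequencies (Table 2), keyed by the separation n
# between the connected strands; on an 8-strand cylinder n runs from 1 to 4.
PLAIN = {1: 52, 2: 9, 3: 9, 4: 2}   # types +/-n
CROSS = {1: 20, 2: 9, 3: 5, 4: 2}   # types +/-nX

# ASSUMPTION: strand order around the cylinder, inferred from the layer description.
CYLINDER = ["N", "A", "B", "E", "D", "C", "F", "G"]
CHAIN = ["N", "A", "B", "C", "D", "E", "F", "G"]   # order along the sequence


def step_ratios(allow_cross=True):
    """Return (numerator, denominator) for each connection along the chain."""
    n = len(CYLINDER)
    slot = {name: i for i, name in enumerate(CYLINDER)}
    used = {slot[CHAIN[0]]}
    side = -1   # ASSUMPTION: side reached by a right-handed cross connection
    ratios = []
    for here, nxt in zip(CHAIN, CHAIN[1:]):
        p = slot[here]
        # Plain connections: any unused strand, on either side.
        total = sum(PLAIN[min((q - p) % n, (p - q) % n)]
                    for q in range(n) if q not in used)
        # Cross connections: one (right-handed) side only.
        if allow_cross:
            total += sum(CROSS[k] for k in range(1, n // 2 + 1)
                         if (p + side * k) % n not in used)
        q = slot[nxt]
        ratios.append((PLAIN[min((q - p) % n, (p - q) % n)], total))
        used.add(q)
        side = -side   # a plain connection reverses the strand direction
    return ratios


def overall(ratios):
    """Multiply the per-step probabilities exactly."""
    result = Fraction(1)
    for num, den in ratios:
        result *= Fraction(num, den)
    return result


ratios = step_ratios()
p_topology = overall(ratios)
p_match = p_topology * 2 * Fraction(1, 7)
print([den for _, den in ratios])                    # [178, 126, 88, 151, 90, 18, 52]
print(round(1 / p_topology), round(1 / p_match))     # 815 2853
print(float(overall(step_ratios(allow_cross=False)) / p_topology))  # close to 5
```

Ruling out "cross" connections in the same sketch raises the probability by a factor of about five, in line with the sensitivity quoted in Section 4.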
The basic tertiary structures of the immunoglobulin domain and the superoxide dismutase subunit are strikingly similar. That is, the "basic immunoglobulin fold" (Poljak et al., 1973) has now been found to occur in copper, zinc superoxide dismutase. This fact provides us with a more memorable way of classifying and describing each of these three-dimensional structures, and it provides one more example to contribute toward an eventual understanding of the types and degrees of diversity present in the total population of proteins. It also inevitably arouses speculation about the possibility of an evolutionary relationship between superoxide dismutase and immunoglobulins.

The only definitive proof of such an evolutionary relationship would be to show clear amino acid sequence homology either directly between the two proteins or through a series of intermediate forms. Some preliminary comparisons have been done between the bovine copper, zinc superoxide dismutase sequence (Steinman et al., 1974) and immunoglobulin sequences (Segal et al., 1974; Dayhoff, 1972). For the residues which were equivalenced in the superposition of α-carbon co-ordinates (see above) between superoxide dismutase and the various domains of McPC603, the minimum number of base changes per residue lies between 1.34 and 1.60 (essentially random, especially considering constraints due to the general structural similarity (Dickerson, 1971)). A computer search comparing variable-length segments from the complete sequences of bovine copper, zinc superoxide dismutase and human Eu immunoglobulin located partially matching segments only slightly more than random expectation†. These negative results do not disprove the existence of a relationship; nor are they particularly surprising, since except for the internal disulfide the level of homology is essentially random between the variable and the constant immunoglobulin domains themselves (Barker et al., 1972).
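The "minimum number of base changes per residue" measure can be illustrated with a short Python sketch. It assumes the standard genetic code and one-letter amino acid symbols; the function names and the toy sequences are hypothetical, and this is not the program used for the comparisons reported above.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of codons that encode it.
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def min_base_changes(aa1, aa2):
    """Fewest single-base substitutions needed to turn a codon for aa1 into one for aa2."""
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in CODONS[aa1] for c2 in CODONS[aa2])


def mean_min_base_changes(seq1, seq2):
    """Average minimum base change per residue over two equal-length aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(min_base_changes(a, b) for a, b in zip(seq1, seq2)) / len(seq1)


# Toy aligned segments, made up for illustration only.
print(min_base_changes("M", "I"))            # 1
print(mean_min_base_changes("MFW", "IFD"))   # (1 + 0 + 3) / 3
```

Identical residues score 0 and the maximum per residue is 3, so averages well above 1 over many equivalenced positions indicate little detectable sequence relationship.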
† The computer search was done by Howard M. Steinman and Robert L. Hill of the Duke University Biochemistry Department.

However, if we are left with no independent evidence to support the hypothesis of an evolutionary relationship, we should examine in detail whether or not the observed conformational resemblance between immunoglobulins and superoxide dismutase is indeed of the sort to be expected between distantly related proteins. From both sequencing and crystallographic work, there is now a considerable body of evidence available for cases of proteins with known evolutionary relationships from which we can infer empirical rules about what kinds of conformational features often differ between related proteins and what kinds of features are highly conserved. (1) In general, three-dimensional structure is more highly conserved than amino acid sequence: there are no known examples with clearly related sequences and significantly different tertiary structures, but there are a number of instances of distantly related proteins with extremely similar three-dimensional structures and
widely different sequences (Perutz et al., 1968; Salemme et al., 1973; Hendrickson & Love, 1971; Timkovich & Dickerson, 1973). (2) Insertions and deletions (even quite long ones) are common at the ends of the polypeptide chain (Dayhoff, 1969). (3) Disulfide bridges are often either added or deleted but never interchange their connectivities, and there is no enhanced tendency for cysteine to occur in positions where there are disulfide bridges in homologous proteins†. (4) Short insertions and deletions readily occur on the "loops" connecting adjacent elements of secondary structure (α-helices or β strands) but only very rarely within those elements (Hartley, 1970; Stroud et al., 1971). (5) Secondary structure elements can become longer or shorter or even disappear altogether (Perutz et al., 1968; Salemme et al., 1973) but they do not interchange their relative positions; that is, the three-dimensional topology connecting elements of secondary structure is apparently invariant. (6) Related proteins often have different numbers of subunits, and even when the number of subunits is the same they may utilize entirely different contact surfaces (Liljas & Rossmann, 1974).

The pattern of conformational similarities and differences between immunoglobulin domains and superoxide dismutase does indeed fit all of these empirical generalizations: there is change in length at one end of the chain and deletion of one β strand, addition of one disulfide and deletion of another but no interchange, some changes in the external loops, and different subunit (or domain) contact geometry, but there are no interchanges in relative positions of secondary structure elements and the overall topology is the same.

By analyzing the probabilities involved in this degree of topological similarity it has been shown to be extremely unlikely that two such similar three-dimensional structures occurred purely by chance; there is a real need, therefore, for some explanation of the close resemblance.
† In comparing 7 widely different trypsin-like serine proteinases from mammals and bacteria (Dayhoff, 1972), there are a total of 34 amino acids at positions where other members of the homology family have disulfide bridges; only one of those 34 residues is a CysH.

Postulating a common ancestor for copper, zinc superoxide dismutase and immunoglobulins is a very reasonable explanation, but it is not the only possible one. Convergent evolution is possible but extremely unlikely, since we have no hint of any functional similarity between the two molecules. Another alternative explanation is that constraints on the folding process actually limit the range of allowable topologies so greatly that chance matches among them are reasonably likely. The process of protein folding is not well enough understood to judge this hypothesis, but we can set some limits at either end. If folding restrictions are completely ignored, the probability of this being a chance match is 1/17,640. Our empirical probability analysis included what could be considered "first-order" folding constraints, with the result of raising the probability to about 1/3000. If there exist sufficiently strong "second-order" folding constraints to raise the probability by another two orders of magnitude, then one could certainly claim to have explained the similarity of these two proteins. On the other hand, we can be sure that the viable possibilities are fairly numerous, since the seven or eight known β cylinders
(chymotrypsin, staphylococcal nuclease, immunoglobulins, superoxide dismutase, prealbumin, triose phosphate isomerase, soybean trypsin inhibitor, and perhaps papain) all have different topologies except for the pair we are analysing.

In summary, then, none of the considerations discussed here in any way constitute a proof of an evolutionary relationship between copper, zinc superoxide dismutase and the immunoglobulins; however, that possibility remains a plausible and intriguing hypothesis.

We thank Richard J. Feldmann of the National Institutes of Health Division of Computer Research and Technology for the use of his interactive molecular display system. The portion of this work done at Duke University was supported by National Institutes of Health grant GM-15000, and by a National Institutes of Health Research Career Award to one of us (D.C.R.).

REFERENCES

Arnone, A., Bier, C. J., Cotton, F. A., Day, V. W., Hazen, E. E. Jr, Richardson, D. C., Richardson, J. S. & Yonath, A. (1971). J. Biol. Chem. 246, 2303-2316.
Banner, D. W., Bloomer, A. C., Petsko, G. A., Phillips, D. C., Pogson, C. I. & Wilson, I. A. (1975). Nature (London), 255, 609-614.
Barker, W. C., McLaughlin, P. J. & Dayhoff, M. O. (1972). In Atlas of Protein Sequence and Structure, 5, 31-40 and 106.
Birktoft, J. J. & Blow, D. M. (1972). J. Mol. Biol. 68, 187-239.
Blake, C. C. F., Geisow, M. J., Swan, I. D. A., Rerat, C. & Rerat, B. (1974). J. Mol. Biol. 88, 1-12.
Buehner, M., Ford, G. C., Moras, D., Olsen, K. W. & Rossmann, M. G. (1974). J. Mol. Biol. 90, 25-49.
Colman, P. M., Jansonius, J. N. & Matthews, B. W. (1972). J. Mol. Biol. 70, 701-724.
Davies, D. R., Padlan, E. A. & Segal, D. M. (1975a). Annu. Rev. Biochem. 44, 639-667.
Davies, D. R., Padlan, E. A. & Segal, D. M. (1975b). In Contemporary Topics in Molecular Immunology (Inman, F. P., ed.), pp. 127-155, Plenum Press, New York.
Dayhoff, M. O. (1969). In Atlas of Protein Sequence and Structure, 4, D-213, D-215, D-217 and D-224.
Dayhoff, M. O. (1972). In Atlas of Protein Sequence and Structure, 5, D-372, D-375 to D-378.
Dickerson, R. E. (1971). J. Mol. Biol. 57, 1-15.
Drenth, J., Hol, W. G. J., Jansonius, J. N. & Koekoek, R. (1971a). Cold Spring Harbor Symp. Quant. Biol. 36, 107-116.
Drenth, J., Jansonius, J. N., Koekoek, R. & Wolthers, B. G. (1971b). Advan. Protein Chem. 25, 79-115.
Epp, O., Colman, P., Fehlhammer, H., Bode, W., Schiffer, M., Huber, R. & Palm, W. (1974). Eur. J. Biochem. 45, 513-524.
Fletterick, R. J., Bates, D. J. & Steitz, T. A. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 38-42.
Fridovich, I. (1974). Advan. Enzymol. 41, 35-97.
Hartley, B. S. (1970). Phil. Trans. Roy. Soc. ser. B, 257, 77-87.
Hendrickson, W. A. & Love, W. E. (1971). Nature New Biol. 232, 197-203.
Holmgren, A., Söderberg, B.-O., Eklund, H. & Brändén, C.-I. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 2305-2309.
Kannan, K. K., Liljas, A., Waara, I., Bergstén, P. C., Lövgren, S., Strandberg, B., Bengtsson, U., Carlbom, U., Fridborg, K., Järup, L. & Petef, M. (1971). Cold Spring Harbor Symp. Quant. Biol. 36, 221-231.
Kretsinger, R. H. (1972). Nature New Biol. 240, 85-88.
Liljas, A. & Rossmann, M. G. (1974). Annu. Rev. Biochem. 43, 475-507.
Lipscomb, W. N., Hartsuck, J. A., Reeke, G. N. Jr, Quiocho, F. A., Bethge, P. H., Ludwig, M. L., Steitz, T. A., Muirhead, H. & Coppola, J. C. (1968). Brookhaven Symp. Biol. 21, 24-90.
Mathews, F. S., Argos, P. & Levine, M. (1971). Cold Spring Harbor Symp. Quant. Biol. 36, 387-395.
Matthews, B. W. & Remington, S. J. (1974). Proc. Nat. Acad. Sci., U.S.A. 71, 4178-4182.
Padlan, E. A. & Davies, D. R. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 819-823.
Perutz, M. F., Muirhead, H., Cox, J. M. & Goaman, L. C. G. (1968). Nature (London), 219, 131-139.
Poljak, R. J., Amzel, L. M., Avey, H. P., Chen, B. L., Phizackerley, R. P. & Saul, F. (1973). Proc. Nat. Acad. Sci., U.S.A. 70, 3305-3310.
Poljak, R. J., Amzel, L. M., Chen, B. L., Phizackerley, R. P. & Saul, F. (1974). Proc. Nat. Acad. Sci., U.S.A. 71, 3440-3444.
Ptitsyn, O. B. & Rashin, A. A. (1975). Biophys. Chem. 3, 1-20.
Rao, S. T. & Rossmann, M. G. (1973). J. Mol. Biol. 76, 241-256.
Reeke, G. N. Jr, Becker, J. W. & Edelman, G. M. (1975). J. Biol. Chem. 250, 1525-1547.
Richardson, J. S., Thomas, K. A. & Richardson, D. C. (1975a). Biochem. Biophys. Res. Commun. 63, 986-992.
Richardson, J. S., Thomas, K. A., Rubin, B. H. & Richardson, D. C. (1975b). Proc. Nat. Acad. Sci., U.S.A. 72, 1349-1353.
Rossmann, M. G., Moras, D. & Olsen, K. W. (1974). Nature (London), 250, 194-199.
Salemme, F. R., Kraut, J. & Kamen, M. D. (1973). J. Biol. Chem. 248, 7701-7716.
Schiffer, M., Girling, R. L., Ely, K. R. & Edmundson, A. B. (1973). Biochemistry, 12, 4620-4631.
Schulz, G. E. & Schirmer, R. H. (1974). Nature (London), 250, 142-144.
Segal, D. M., Padlan, E. A., Cohen, G. C., Rudikoff, S., Potter, M. & Davies, D. R. (1974). Proc. Nat. Acad. Sci., U.S.A. 71, 4298-4302.
Shotton, D. M. & Watson, H. C. (1970). Phil. Trans. Roy. Soc. ser. B, 257, 111-118.
Steinman, H. M., Naik, V. R., Abernethy, J. L. & Hill, R. L. (1974). J. Biol. Chem. 249, 7326-7338.
Stroud, R. M., Kay, L. M. & Dickerson, R. E. (1971). Cold Spring Harbor Symp. Quant. Biol. 36, 125-140.
Timkovich, R. & Dickerson, R. E. (1973). J. Mol. Biol. 79, 39-56.
Watenpaugh, K. D., Sieker, L. C., Herriott, J. R. & Jensen, L. H. (1973). Acta Crystallogr. sect. B, 29, 943-955.
Wetlaufer, D. B. (1973). Proc. Nat. Acad. Sci., U.S.A. 70, 697-701.