Gene duplication in glutathione reductase


J. Mol. Biol. (1980) 135, 335-347

GEORG E. SCHULZ

Max-Planck-Institut für medizinische Forschung
Jahnstr. 29, 6900 Heidelberg, West Germany

(Received 3 October 1979)

The two nucleotide-binding domains of the flavo-enzyme glutathione reductase have similar chain folds. In order to evaluate whether the observed similarity is significant or not, a mean distance between both chains after best overlay was calculated. Insertions and deletions were taken into account. The significance of the observed similarity was then derived from the corresponding mean distance by evaluating the probability that such a distance is found by chance. This probability was determined from the distribution of mean distances between randomly generated chain folds. Care was taken to ensure that the simulated chain folds fit natural ones as well as possible. The resulting significance for the two domains of glutathione reductase is of the order of 10^6, which indicates an evolutionary relationship; that is, a gene duplication.

0022-2836/80/100335-13 $02.00/0 © 1980 Academic Press Inc. (London) Ltd.

1. Introduction

Many of the known proteins resemble each other structurally. Close resemblance is usually detected between proteins of identical functions found in different species (Dayhoff, 1976). These similarities reflect the evolutionary process described as protein speciation (Smith & Margoliash, 1964; Dickerson, 1977). But there exist also similarities between proteins with different functions, which reveal a more basic evolutionary process: protein differentiation. It is generally accepted that protein differentiation is caused by a gene duplication and subsequent separate evolution of both genes (Ingram, 1961; Hartley et al., 1972). Sometimes a gene duplication is followed by gene fusion so that the particular gene product becomes a part of a larger protein.

In a number of cases the resemblance between differentiated proteins is not very obvious. This applies especially to similarities between parts of a single larger protein or between parts of several larger proteins. Such a situation has been encountered in the enzyme glutathione reductase (Schulz et al., 1978).

Whenever a resemblance is weak the question arises of whether it is significant enough to indicate an evolutionary relationship, or whether it is merely spurious. If amino acid sequences can be fitted to each other, the significance is high and there exists almost invariably an historical connection (Schulz & Schirmer, 1979). The situation is much more ambiguous in cases like glutathione reductase, where no sequence homology can be established. Here the significance has to be determined by a purely geometric comparison, which is less informative. But, on the other hand, geometric comparisons have the advantage that they may reveal very ancient evolutionary relationships, because the geometry of a protein changes at a much smaller rate than does its amino acid sequence (Dickerson & Timkovich, 1975).

The significance of a geometric similarity between proteins is difficult to assess. The calculation of such significances has been attempted on the basis of β-sheet topologies (Schulz & Schirmer, 1974; Richardson et al., 1976; Sternberg & Thornton, 1976). But this method is restricted to subgroups of proteins and it is, to some extent, subjective because there remains a certain degree of freedom in defining a β-sheet. A more exact and also universally applicable measure of similarity is the average distance between corresponding Cα atoms in two proteins (McLachlan, 1972a; Rossmann & Argos, 1977). Such an average distance between Cα atoms is here evaluated for the two resembling domains of glutathione reductase. Moreover, a procedure is presented that allows the derivation of approximate significances from average Cα distances.
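To make the notion of an average Cα distance at best overlay concrete, the following minimal sketch computes the r.m.s. distance between two equally long lists of Cα coordinates after optimal rigid superposition. It is an illustration only and not the program used in the paper: it relies on the quaternion formulation of the superposition problem with a simple power iteration, and the function names `centred` and `rms_ca_distance` are hypothetical.

```python
import math

def centred(points):
    """Return the points shifted so that their centre of mass is at the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]

def rms_ca_distance(chain_a, chain_b, iterations=1000):
    """r.m.s. distance between paired positions after the best rigid overlay."""
    if len(chain_a) != len(chain_b):
        raise ValueError("a 1 to 1 correspondence between atoms is required")
    a, b = centred(chain_a), centred(chain_b)
    n = len(a)
    # Correlation sums s[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Symmetric 4x4 matrix of the quaternion formulation; its largest
    # eigenvalue is the maximal overlap sum over all proper rotations.
    m = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Gershgorin bound: adding it to the diagonal makes all eigenvalues
    # non-negative, so plain power iteration finds the largest one.
    shift = max(sum(abs(v) for v in row) for row in m)
    vec = [1.0, 0.9, 0.8, 0.7]
    for _ in range(iterations):
        new = [sum(m[i][j] * vec[j] for j in range(4)) + shift * vec[i]
               for i in range(4)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            break
        vec = [v / norm for v in new]
    norm2 = sum(v * v for v in vec)
    overlap = sum(vec[i] * m[i][j] * vec[j]
                  for i in range(4) for j in range(4)) / norm2
    total = sum(x * x for p in a for x in p) + sum(x * x for q in b for x in q)
    return math.sqrt(max((total - 2.0 * overlap) / n, 0.0))
```

A rotated and translated copy of a chain should give a distance close to zero, which is a convenient sanity check before applying such a routine to real coordinates.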

2. Methods

(a) Average Cα distances

The geometric similarity between 2 proteins is expressed by a mean residual distance between both polypeptide chain backbones after establishing the best overlay. In order to calculate this distance, all backbone geometries, that is all "chain folds", are represented merely by the co-ordinates of their Cα atoms (Schulz & Schirmer, 1979). The most easily evaluated mean distance between 2 chain folds with a 1 to 1 correspondence between their Cα atoms is the residual root-mean-square distance at the best overlay, the r.m.s.-Cα distance. Following the proposals of McLachlan (1972b, 1979) and Kabsch (1978), such a distance can be computed with about 3000 arithmetic operations per 100 residues chain length, which amounts to 70 ms central processor time on the machine used in the present analysis (Nord-10/S, Norsk Data, Oslo).

Very distantly related proteins usually have not only lost all sequence homology but have also undergone insertions and deletions of residues. This destroys the 1 to 1 correspondence required for calculating the r.m.s.-Cα distance. But insertions and deletions express the evolutionary distance as well as other geometric deviations and should therefore be included in all comparisons. This can be achieved by calculating the residual minimal area suspended between 2 chains at best overlay (Schulz, 1977). Dividing this area by the average chain length results in the area-Cα distance. This measure requires no further assumptions and accounts for insertions and deletions in a rather natural manner. Therefore, it will be used in the comparison of the domains of glutathione reductase. However, using the available algorithm it takes about 40 times longer to compute an area-Cα distance as compared to an r.m.s.-Cα distance.

(b) Structurally known proteins as statistical base

In order to derive a significance from this distance one needs the distribution of area-Cα distances between all proteins that are not evolutionarily connected with each other. Furthermore, complete domains and not only parts of them should be compared with each other for establishing this distribution, because the distribution will be applied to comparisons between complete domains. Moreover, complete domains or folding units are the units very likely to be duplicated and transferred during evolution (Sakano et al., 1979).

These requirements are difficult to fulfil. It seems impossible to detect distant evolutionary relationships beforehand and eliminate them. If not removed, related proteins spoil the distribution in its most interesting region at low Cα distances, which in turn prohibits any conclusion about an evolutionary connection. On the other hand, if one is overcautious and eliminates everything that may possibly be connected by evolution, the distribution is also spoiled in the most critical region and cannot be used. In addition to these intrinsic problems, the number of structurally known protein domains in a given size-group is still rather small, so that the distributions of each size group comprise too few entries. Consequently, these distributions remain unknown in the critical region of low Cα distances.

(c) Simulated chain folds

Considering the deficiencies of distance distributions derived from structurally known proteins, the use of simulated chain folds becomes a viable option; there are no evolutionary relationships and the number of generated chain folds is unlimited. Therefore, I used such chain folds for evaluating significances. The simulation procedure is illustrated in Figure 1. For a given chain length n a sphere of radius R = (32n)^(1/3) is established, which approximates the volume of an equal-sized folded polypeptide chain (protein density = 1.35 g cm^-3). The simulation starts at residue n/2, which is placed at a random position within the sphere. The chain is then grown to the N- and C-termini by adding single residues, that means Cα atoms, alternately to both ends.
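The factor 32 in the sphere radius can be checked from the quoted density. The short sketch below does this arithmetic under two assumptions that are not stated in the text: a mean residue mass of 110 g/mol, and R measured in Å. The function names are hypothetical.

```python
import math

AVOGADRO = 6.022e23        # molecules per mole
DENSITY = 1.35             # g per cm^3, protein density quoted in the text
MEAN_RESIDUE_MASS = 110.0  # g per mol; assumed typical value, not from the text
CM3_PER_A3 = 1.0e-24       # one cubic angstrom expressed in cm^3

def volume_per_residue():
    """Average volume occupied by one residue, in cubic angstroms."""
    return MEAN_RESIDUE_MASS / (AVOGADRO * DENSITY * CM3_PER_A3)

def radius_coefficient():
    """Coefficient c in R = (c * n)^(1/3) for a sphere holding n residues."""
    return 3.0 * volume_per_residue() / (4.0 * math.pi)

def sphere_radius(n, coefficient=32.0):
    """Radius (in angstroms, by assumption) of the sphere for chain length n."""
    return (coefficient * n) ** (1.0 / 3.0)
```

Under these assumptions the coefficient comes out close to 32, and a chain of 120 residues corresponds to a sphere radius of roughly 15.7 Å.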

FIG. 1. Simulation of chain folds, as represented by Cα atoms and virtual Cα-Cα bonds. The origin of chain growth at residue n/2 and the centre of the sphere of radius R are marked by dots.

On addition of a residue, the virtual dihedral angle α, together with the virtual bond angle τ (Fig. 1), was selected randomly, but following the distribution of (α, τ) values observed in proteins (Levitt, 1976). The resulting new Cα position was rejected whenever the distance to any of the previously generated positions more than 5 residues apart was less than 5 Å, that is, whenever geometric collision occurred. Furthermore, a simple rejection probability function was applied depending on the distance r to the centre and on the angle θ between the newly added Cα-Cα bond and the radius vector (Fig. 1); rejection probability = G(1 + cos θ)r/R. The chosen θ-dependence stipulates isotropy. The globularity parameter G used in these simulations varied between 0.70 for n = 20 and 0.58 for n = 120. In order to decide whether a new Cα position should be rejected, the value of this function at the respective (r, θ) value was compared to an output of the random number generator, normalized to the interval between 0 and 1. If the random number was lower than the rejection probability function, the choice of the Cα position was rejected. This feature ensured globularity of the resulting chain. It turned out that there exists a rather sharp upper limit for G, beyond which structure generation is virtually inhibited by too many collisions. The G values used here were just below this limit. Accordingly, the simulated chain folds are adjusted to optimal globularity.
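The two acceptance tests described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original program: the radius vector is taken to point from the sphere centre to the new Cα position, the already placed residues are held in a dictionary keyed by residue index, and all function names are hypothetical. The sampling of (α, τ) values is not shown.

```python
import math
import random

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def collides(new_pos, new_index, chain, min_separation=5, min_distance=5.0):
    """True if new_pos is closer than min_distance to any placed residue
    that is more than min_separation residues away along the chain."""
    for index, pos in chain.items():
        if abs(index - new_index) > min_separation and distance(new_pos, pos) < min_distance:
            return True
    return False

def rejection_probability(new_pos, prev_pos, centre, radius, g):
    """G * (1 + cos(theta)) * r / R for the newly added virtual bond."""
    r_vec = [a - c for a, c in zip(new_pos, centre)]
    bond = [a - b for a, b in zip(new_pos, prev_pos)]
    r = math.sqrt(sum(x * x for x in r_vec))
    bond_length = math.sqrt(sum(x * x for x in bond))
    if r == 0.0 or bond_length == 0.0:
        return 0.0
    cos_theta = sum(x * y for x, y in zip(r_vec, bond)) / (r * bond_length)
    return g * (1.0 + cos_theta) * r / radius

def accept_position(new_pos, new_index, prev_pos, chain, centre, radius, g, rng=random):
    """Apply the collision test, then the globularity test."""
    if collides(new_pos, new_index, chain):
        return False
    # Reject when the random number is lower than the rejection probability.
    return rng.random() >= rejection_probability(new_pos, prev_pos, centre, radius, g)
```

With this form, a bond pointing outward near the surface of the sphere (r close to R, θ close to zero) is almost always rejected, whereas a bond pointing toward the centre is never rejected by the globularity test.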


In addition, the simulation procedure accounted for α-helices. Whenever two successive accepted (α, τ) values were in the α-helix region, the helix was continued by using exclusively (α, τ) values from the α-helix region for 8 more adjacent residues. Helix growth was discontinued, however, if collision or (r, θ)-check for any of these residues caused more than three rejections in a row. For n = 60, the average helix content turned out to be 32%, the average helix length was 6 residues. At n = 120 the corresponding values were 37% and 7. A similar simulation procedure is described in the accompanying paper (Cohen & Sternberg, 1980).

Simulated chain folds are useful only if they approximate natural chain folds well. This could be achieved with respect to the distribution of (α, τ) angles, the α-helix contents, and the general size and globularity of chain folds as represented by the radius of gyration. The radius of gyration is defined as the r.m.s. distance of all Cα atoms from their common centre of mass. The data for simulated chain folds are given in Fig. 2. As shown in this Figure, globularity and size of simulated chain folds agree well with those of proteins.
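The radius of gyration as defined above translates directly into a few lines of code. The sketch below assumes the chain is given as a list of (x, y, z) Cα co-ordinates; the function name is hypothetical.

```python
import math

def radius_of_gyration(ca_positions):
    """r.m.s. distance of all C-alpha atoms from their common centre of mass."""
    n = len(ca_positions)
    centre = [sum(p[k] for p in ca_positions) / n for k in range(3)]
    mean_square = sum(
        sum((p[k] - centre[k]) ** 2 for k in range(3)) for p in ca_positions
    ) / n
    return math.sqrt(mean_square)
```

All atoms are weighted equally here, which matches a representation of the chain fold by Cα atoms only.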

[Plot; abscissa: chain length (residues).]

FIG. 2. The variation of the average radius of gyration with chain length. The curve with optimal globularity parameter G denotes the chain folds used in this analysis, whereas the curve for G = zero is inserted as a reference. In addition, the average radius of gyration for fragments of natural polypeptide chains is given. This curve was derived from 9 proteins with an average chain length of 171 residues by cutting all possible fragments of given length from the respective polypeptide chains. The dots show the radii of gyration of proteins, the co-ordinates of which were available; from left to right: rubredoxin, bovine pancreatic trypsin inhibitor, cytochrome b5 hbf, cytochrome c, pancreatic ribonuclease S.

In contrast, fragments of proteins, which were used by McLachlan (1979) and Remington & Matthews (1980) for deriving significances, are much more extended and much less globular. At n < 20 the radii of gyration of protein fragments even correspond to the rather extended chain folds, which were simulated with globularity parameter G = zero, that is, which were simulated without the (r, θ)-check that more or less confines the chain to the sphere of radius R (Fig. 1). Since domains are usually globular, these protein fragments seem to be less suitable for comparisons between complete folding units or domains.

The main deficiency of the simulated chain folds, on the other hand, is their lack of β-sheet structure and their lack of more complicated structural rules as, for instance, packing requirements for α-helices (Chothia et al., 1977). These features are very difficult to include, however, because they involve interactions far apart along the chain.


3. Results

(a) The structure of glutathione reductase

The enzyme glutathione reductase is a dimer of two identical, strongly intertwined subunits of molecular weight 50,000 each (Schulz et al., 1978). As indicated in Figure 3(a), each subunit can be divided into four domains that are consecutive along the polypeptide chain. Here, domains are defined as globular structures containing only one contiguous part of the polypeptide chain, so that they can be independently folding units. This definition is different from the one we had used earlier. In the former interpretation we had been guided by the criterion that domains should be globular and that they should be separated from each other by a region of low electron density. Since the electron density between domains 1 and 3 is continuous, these two domains had been taken as one.

FIG. 3. Sketch of the structure of glutathione reductase. (a) Sketch of the dimer viewed along the 2-fold axis. Domains 1 through 4 are consecutive along the polypeptide chain from N- to C-terminus. Domains 1 and 2 bind FAD and NADPH, respectively. Domain 4 forms the interface. The active centre region is indicated by a broken circle. (b) Common β-sheet topology of domains 1 and 2. The β-strands and α-helices are denoted by triangles and circles, respectively. The helix between β-strands B and C is drawn using a broken line; it is a single helix in domain 2, and 2 helices in domain 1. Amino acid residue insertions (+) and deletions (-) in domain 2 (NADPH) as related to domain 1 (FAD) are given as the number of residues at the appropriate place. All residue numbers are given in Fig. 4. The binding positions of both dinucleotides across the carbonyl ends of the parallel β-sheets are illustrated, indicating the deviation between flavine and nicotinamide (Table 2). Main chain parts which approach closest to the ad-, rib-, PP-, rib-, flav-moieties of FAD are centered at residues 113 + 33, 39 + 138 + 10, 12 + 138 + 40, 40 + 320, 45, respectively. The corresponding residues for the ad-, rib-, PP-, rib-, nic-moieties of NADPH are 198, 175, 176 + 270, 318, 178, respectively. For all quoted residues the distance between the Cα atom and the centre of the respective nucleotide moiety is less than 6 Å, for underscored numbers less than 5 Å. In both domains the pyrophosphates (PP) attach to the loop between β-strand A and the following helix.

The structure analysis showed that both nucleotide-binding domains 1 and 2 contain a parallel sheet with connecting α-helices and a sandwiched, triple-stranded antiparallel sheet, a β-meander. With this combination of secondary structures they do not belong to any of the five classes of the usual protein structure classification scheme (Levitt & Chothia, 1976; Schulz & Schirmer, 1979). These sheet assemblies have identical topology, so that they can be represented by a single sketch, which is shown in Figure 3(b). Moreover, the prosthetic group flavine-adenine-dinucleotide (FAD) and the substrate nicotinamide-adenine-dinucleotide-phosphate (NADPH) bind at equivalent positions across the carbonyl ends of the parallel β-sheet with the adenine moieties pointing to the β-meander. The same sheet topology and the same mode of FAD binding is observed in the two other structurally known flavo-enzymes (Wierenga et al., 1979; Sheriff & Herriott, 1980).

(b) Area-Cα distance between the two nucleotide-binding domains of glutathione reductase

In order to calculate the area-Cα distance between the NADPH- and the FAD-binding domain, the following procedure was adopted. First, the seven β-sheet strands were selected, overlaid, and an initial r.m.s.-Cα distance was calculated. Subsequently, the best equivalencing scheme between Cα atoms, that is the scheme with the minimum r.m.s.-Cα distance, was established by a trial and error method. The result is shown in Figure 4 and Table 1. At this point the number of inserted (or deleted) residues in each connection between sheet strands was fixed. These numbers are given in Figure 3(b). As the next step the equivalencing scheme was extended into the six connections between the β-sheet strands in such a way that all inserted residues within a connection were contiguous. In this procedure the strings of inserted residues within each connection were moved to the position where the r.m.s.-Cα distance between equivalenced Cα atoms became minimal. The resulting scheme is given in Figure 4 and Table 1.

Since an evolutionary distance is manifested in geometric deviations between equivalenced atoms and in deletions and insertions, the latter have to be accounted for. By using the area-Cα distance (for definition see above) as a measure, this is accomplished in a rather natural manner and without further assumptions. The area-Cα distance was calculated using a triangle approximation for the area (Schulz, 1977). The accuracy was improved somewhat by subdividing the area per residue into eight instead of the two triangles proposed earlier. The area was then minimized by adjusting the relative orientation and translation of chain folds. This refinement started at the overlay corresponding to the r.m.s.-Cα distance of 88 residues mentioned above (Table 1) and followed the steepest gradient. The procedure converged to an area-Cα distance of 5.3 Å, as given in Table 1.

TABLE 1
Chain fold comparisons

Protein                      Compared part of                  Number of  Fraction of  Overlay                r.m.s.-Cα  Area-Cα   Significance
                             polypeptide chain                 residues   inserted     Rotation  Translation  distance   distance
                                                                          residues†    (deg.)    (Å)          (Å)        (Å)
Glutathione reductase        7 strands of β-sheets‡            32         -            145       -4.3         2.0        -         -
                             All residues, except insertions‡  88         -            138       -4.9         3.8        -         -
                             Complete domains 1 versus 2‡      120.5§     0.27         137       -3.9         -          5.3       5 × 10^6
Chymotrypsin                 Complete domains 1 versus 2       89.5§      0.21         76        -16.9        -          4.7       0.7 × 10^6
Trypsin                      Complete domains 1 versus 2       89§        0.27         75        -17.0        -          4.2       7 × 10^6
Chymotrypsin versus trypsin  Complete domains 2 versus 2       96§        0.06         -         -            -          1.2       ~10^15

† Defined as the total number of inserted residues divided by the total number of residues in both chains.
‡ Residue positions are given in Fig. 4.
§ Chain length averages of the compared domains.

[Figure labels recovered for domain 2: insertions at residues 177, 178; 229; 236, 237; 246-255; 260.]

FIG. 4. Best polypeptide chain overlay of equivalenced parts of domains 1 and 2 of glutathione reductase. The β-sheet overlay of 32 residues and the overlay of 88 residues without insertions and deletions (Table 1) are indicated by black and black plus grey, respectively. Insertions are white. The positions of insertions and of the first and last residues in each domain are given. Since insertions in one domain can be called deletions in the other, I refer here, and at several places in the text, to insertions only. The 32 equivalenced β-sheet residues (black) are 6-10, 27-31, 107-110, 114-116, 121-125, 127-130, 132-137, and 169-173, 192-196, 223-226, 231-233, 240-244, 256-259, 262-267 in domains 1 and 2, respectively. The 56 equivalenced non-β-sheet residues (grey) are 4-5, 11-13, 14-26, 32-41, 91-106, 111-113, 117-120, 126, 131, 138-140, and 167-168, 174-176, 179-191, 197-206, 207-222, 227-228, 230, 234-235, 238-239, 245, 261, 268-270 in domains 1 and 2, respectively. The α-helical residues in the equivalenced regions are 14-22, 45-62, 79-104, 178-189, and 208-221.

The relative orientation and translation at the best overlay is expressed by a single rotation with a coupled translation along the rotation axis. As shown in Table 1, the rotation and translation changed only slightly between β-sheet overlay and the overlay of the complete chain folds. There exists no approximate 2-fold symmetry in glutathione reductase, as observed in several other cases (Adman et al., 1973; Bergsma et al., 1975; Tang et al., 1978). The best overlay of complete chain folds was then applied to the bound dinucleotides. The residual distances between the dinucleotide moieties are given in Table 2. The average distance amounts to 5 Å.

TABLE 2
Overlay of dinucleotides bound to domains 1 and 2 of glutathione reductase

Dinucleotide moiety                    Distance (Å)
Adenine                                5
Adenine ribose                         4
Pyrophosphate                          4
Ribose of nicotinamide and flavine     3
Nicotinamide and flavine               8
Average                                5
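The idea behind the area-Cα distance used in this section, an area suspended between two chains and divided by the average chain length, can be illustrated with a small sketch at a fixed overlay. This is not the algorithm of Schulz (1977): the triangulation below is a generic dynamic-programming strip between two polylines, the average chain length is taken as the mean contour length in Å (an assumption), and the minimization over orientation and translation is omitted. All function names are hypothetical.

```python
import math

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def triangle_area(p, q, r):
    """Area of the triangle with corners p, q, r (half the cross-product norm)."""
    u = [q[k] - p[k] for k in range(3)]
    v = [r[k] - p[k] for k in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(c * c for c in cross))

def minimal_strip_area(a, b):
    """Smallest total area of a triangle strip spanned between chains a and b.
    Each step advances by one atom along a or along b, so chains of unequal
    length (insertions) need no explicit 1 to 1 correspondence."""
    m, n = len(a), len(b)
    best = [[float("inf")] * n for _ in range(m)]
    best[0][0] = 0.0
    for i in range(m):
        for j in range(n):
            if i > 0:
                step = best[i - 1][j] + triangle_area(a[i - 1], a[i], b[j])
                best[i][j] = min(best[i][j], step)
            if j > 0:
                step = best[i][j - 1] + triangle_area(a[i], b[j - 1], b[j])
                best[i][j] = min(best[i][j], step)
    return best[m - 1][n - 1]

def contour_length(chain):
    """Sum of the virtual bond lengths along the chain."""
    return sum(distance(chain[i], chain[i + 1]) for i in range(len(chain) - 1))

def area_distance(a, b):
    """Strip area divided by the average contour length of both chains."""
    return minimal_strip_area(a, b) / (0.5 * (contour_length(a) + contour_length(b)))
```

For two straight, parallel chains a distance d apart, this measure returns d regardless of how many atoms each chain contains, which shows how an inserted residue is absorbed without a special rule.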

(c) Serine proteases as a reference

In order to establish a reference, the same overlay procedure was applied to the domains of chymotrypsin and trypsin. The Cα atom co-ordinates, as well as the alignment of the amino acid residues of both enzymes, were taken from the Protein Data Bank (Brookhaven, New York, 1977). For equivalencing Cα atoms of the first domain to Cα atoms of the second domain of chymotrypsin, I used the scheme worked out by McLachlan (1979) as the starting point. McLachlan's scheme was extended to the complete chain by applying the method described above for glutathione reductase. Since its alignment to chymotrypsin is known, trypsin could be treated along the same lines. The resulting area-Cα distances are given in Table 1.

(d) Distribution of mean Cα distances between simulated chain folds

In order to derive the significance of a structural similarity with a given area-Cα distance, I generated chain folds using the method described above. As the next step the distributions of r.m.s.-Cα distances between these chain folds were calculated. The results are depicted in Figure 5. The simulated chain folds were further used to determine the relation between r.m.s.-Cα and area-Cα distance. This relation allowed the conversion of the abscissa of Figure 5 to area-Cα distances. The distributions of area-Cα distances were not calculated directly, because the computations would have been too time-consuming.

[Plot; abscissae: r.m.s.-Cα distance and area-Cα distance; readings A, B, C and D are marked.]

FIG. 5. The distribution of 499,500 r.m.s.-Cα distances between 1000 randomly generated chain folds (globularity parameter G = optimal) of length 20, 40, 60, 90, 120 residues. The abscissa is also given as area-Cα distance using simulated chain folds for the interconversion. The respective distributions were integrated up to points A (glutathione reductase, domain 1 versus 2), B (chymotrypsin, domain 1 versus 2), C (trypsin, domain 1 versus 2), and D (trypsin domain 2 versus chymotrypsin domain 2). All distributions are non-symmetrical and therefore non-Gaussian. But they resemble normal distributions, which are symmetric parabolae in this plot. Therefore parabolae were used for the required extrapolation, which is indicated by broken lines. More than a month of central processor time on our computer would have been necessary to avoid this extrapolation for readings A, B and C.

The distributions of Figure 5 show that the average r.m.s.-Cα distance changes appreciably with the chain length. This variation is plotted in Figure 6. As a reference, corresponding average values were calculated for chain folds that were generated with the globularity parameter G = zero, giving rise to rather extended structures. The average values for comparisons between fragments of proteins as derived by Remington & Matthews (1980) lie approximately midway between these two curves.
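The parabolic extrapolation mentioned in the legend to Figure 5 rests on the fact that a normal distribution becomes a parabola when the logarithm of the frequency is plotted against distance. A minimal sketch of such an extrapolation is given below; it assumes the histogram is available as bin centres with log10 counts, fits the parabola by ordinary least squares, and uses hypothetical function names. How the original fit was actually performed is not stated in the text.

```python
def fit_log_parabola(distances, log_counts):
    """Least-squares fit of log_count = c0 + c1*u + c2*u**2 with u = d - x0,
    where x0 is the mean of the sampled distances (centring keeps the
    normal equations well conditioned)."""
    x0 = sum(distances) / len(distances)
    u = [d - x0 for d in distances]
    s = [sum(v ** k for v in u) for k in range(5)]
    t = [sum(y * v ** k for v, y in zip(u, log_counts)) for k in range(3)]
    rows = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    c0, c1, c2 = (rows[i][3] / rows[i][i] for i in range(3))
    return x0, c0, c1, c2

def log_count_at(distance, fit):
    """Extrapolated log10 count at the given distance."""
    x0, c0, c1, c2 = fit
    u = distance - x0
    return c0 + c1 * u + c2 * u * u

def expected_count_below(limit, fit, bin_width):
    """Sum of extrapolated counts over all bins centred below the limit."""
    total = 0.0
    k = 0
    while (k + 0.5) * bin_width < limit:
        total += 10.0 ** log_count_at((k + 0.5) * bin_width, fit)
        k += 1
    return total
```

The value returned by `expected_count_below` plays the role of the integral from zero up to one of the readings A to D; it can be far below one count, which is why an extrapolation is needed at all.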

[Plot; abscissa: chain length (residues).]

FIG. 6. The average of all r.m.s.-Cα distances within the ensemble of simulated chain folds (G = optimal) as a function of chain length. These are the mean values of the distributions given in Fig. 5. As a reference, the averages for the rather extended chains generated with globularity parameter G = zero are given as a dotted line. The cross-averages between proteins and the ensemble of simulated chain folds (G = optimal) of equal lengths are given as dots; from left to right: rubredoxin, bovine pancreatic trypsin inhibitor, cytochrome b5 hbf, cytochrome c, ribonuclease S.

(e) Significances for glutathione reductase and serine proteases

The nucleotide-binding domains of glutathione reductase have an area-Cα distance of 5.3 Å at an average length of 120.5 residues. The probability that such a distance is found by chance can be derived from the appropriate distribution of Figure 5. This probability equals the integral from zero to 5.3 Å over the distribution for 120 residues divided by the integral from zero to infinity, that is 0.11/499,500 = 2 × 10^-7. Since the significance is the inverse of the probability for a random coincidence (Schulz & Schirmer, 1974), in glutathione reductase the significance of the domain similarity turns out to be 5 × 10^6 (Table 1). It should be noted that significances also yield estimates of the number of distinct chain folds; as derived from the theory of random coincidences, the significance is half this number (Schulz, 1977). Accordingly, there exist about 10^7 chain folds of 120 residues which have area-Cα distances of more than 5.3 Å from each other.

The corresponding significances for the similarities between first and second domains of trypsin and chymotrypsin (Table 1) were determined in the same manner, using the distribution calculated for 90 residues. The area-Cα distance between the second domain of trypsin and the second domain of chymotrypsin is very small. The distribution had to be extrapolated rather far to determine the integral from zero to 1.2 Å, and the resulting significance of 10^15 is only a very rough estimate. This high value clearly reveals an evolutionary relationship. However, such a connection is already established, because the amino acid sequences of trypsin and chymotrypsin could be fitted to each other.
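The arithmetic behind the significance quoted for glutathione reductase can be retraced in a few lines. The sketch uses only numbers given in the text (1000 simulated folds, an extrapolated integral of 0.11 below 5.3 Å); the function names are hypothetical.

```python
def number_of_pairs(n_folds):
    """Number of distinct pairwise comparisons among n_folds chain folds."""
    return n_folds * (n_folds - 1) // 2

def chance_probability(expected_below, n_pairs):
    """Fraction of all comparisons expected below the observed distance."""
    return expected_below / n_pairs

def significance(probability):
    """Inverse of the probability for a random coincidence."""
    return 1.0 / probability

pairs = number_of_pairs(1000)                 # all pairs among 1000 folds
probability = chance_probability(0.11, pairs)
glutathione_significance = significance(probability)
# The significance is stated to be half the number of distinct chain folds.
distinct_folds = 2.0 * glutathione_significance
```

The number of pairs reproduces the 499,500 comparisons of Figure 5, and the unrounded results (a probability near 2.2 × 10^-7, a significance near 4.5 × 10^6 and about 9 × 10^6 distinct folds) agree with the rounded values in the text.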

4. Discussion

The significance for the similarity between the chain folds of the first and second domain of glutathione reductase (Fig. 3, Table 1) turns out to be higher than 10^6. Although some reduction of this value has to be expected when accounting for the deficiencies of the simulated chain folds discussed below, this number is so high that the similarity is very likely a non-random event. In biology such an event reveals an evolutionary connection; that is, a gene duplication.

For glutathione reductase the significance of Table 1 accounts only for the chain folds and not for the similarity of the dinucleotide binding positions. Earlier, the contribution of the dinucleotides to the significance was estimated as an additional factor of ten (Schulz & Schirmer, 1974). Now it is known that the binding of dinucleotides across the carbonyl ends of a parallel sheet is for some reason physically favoured. Accordingly, finding dinucleotides that bind in equivalent regions of two domains does not increase the significance. In glutathione reductase, however, the dinucleotides bind in the same region and at the same position with the same polarity; the average distance is as small as 5 Å and both adenine residues point in the same direction (Table 2). Therefore, the significance increases to some extent.

When compared to serine proteases (Table 1), the significance for the similarity within glutathione reductase is of the same order of magnitude, or higher, than the corresponding values for the similarities between first and second domains of trypsin, or chymotrypsin, respectively. Therefore, gene duplication in glutathione reductase can be considered as established if it is accepted for the proteases (McLachlan, 1979). Gene duplication is also corroborated by the observation of similar chain folds in the other two structurally known flavo-enzymes (Wierenga et al., 1979; Sheriff & Herriott, 1980).
It is tempting to speculate that all these FAD-binding domains have the same genetic origin. According to this hypothesis, enzymes are built using a modular system, the genes of these modules being duplicated and fused as needed. Glutathione reductase contains an FAD-binding module. Possibly, the module was duplicated, fused and modified to form the NADPH-binding module of the same enzyme. This lineage seems likely, because other NADPH-binding domains (Matthews et al., 1978; Sheriff & Herriott, 1980) are dissimilar to that found in glutathione reductase.

One must keep in mind that the derived significances are based on simulated chain folds. To some degree this procedure can be checked by comparing average r.m.s.-Cα distances. If the generation procedure were optimal, structurally known proteins would be members of the ensemble of simulated folds, and the average r.m.s.-Cα distance between a protein and this ensemble would be equal to the average r.m.s.-Cα distance within the ensemble. These cross-averages were calculated for five available proteins; they are stated in Figure 6. The cross-averages are consistently higher than the averages within the ensemble, which reflects a residual deviation of the simulated chain folds from natural ones. However, the mean increase is less than one standard deviation of such a comparison. This observation indicates that the incorporation of further structural knowledge like β-sheets, etc., may shift the average r.m.s.-Cα distance by about 1 Å to higher values. But it does not mean that the distributions of Figure 5 will only be shifted to the right. Most likely their shapes will change; in particular, one has to expect that including further structural rules will make the ensemble more uniform and will lift the low Cα sides of the distributions to some extent. As a result, the significances of Table 1 will drop. But, since the significance derived for glutathione reductase is rather high, this reduction has to be very dramatic to affect the conclusion about gene duplication.

I thank Drs S. J. Remington, B. W. Matthews, S. Sheriff and J. R. Herriott for communicating their results before publication. Discussions with Drs W. Kabsch, E. F. Pai, R. H. Schirmer, F. E. Cohen and M. J. E. Sternberg are much appreciated. The atomic co-ordinates of all molecules used as references were provided by the Protein Data Bank in Brookhaven.

REFERENCES

Adman, E. T., Sieker, L. C. & Jensen, L. H. (1973). J. Biol. Chem. 248, 3987-3996.
Bergsma, J., Hol, W. G. J., Jansonius, J. N., Kalk, K. H., Ploegman, J. H. & Smit, J. D. G. (1975). J. Mol. Biol. 98, 637-643.
Chothia, C., Levitt, M. & Richardson, D. (1977). Proc. Nat. Acad. Sci., U.S.A. 74, 4130-4134.
Cohen, F. & Sternberg, M. J. E. (1980). J. Mol. Biol. 138, 321-333.
Dayhoff, M. O. (1976). Editor of Atlas of Protein Sequence and Structure, vol. 5, suppl. 2, and earlier volumes, Nat. Biomed. Res. Foundation, Washington, D.C.
Dickerson, R. E. (1977). In Molecular Evolution and Polymorphism (Kimura, M., ed.), Nat. Institute of Genetics, Mishima, Japan.
Dickerson, R. E. & Timkovich, R. (1975). In The Enzymes (Boyer, P. D., ed.), vol. 11, 3rd edit., pp. 397-547, Academic Press, New York.
Hartley, B. S., Burleigh, B. D., Midwinter, G. G., Moore, C. H., Morris, H. R., Rigby, P. W. J., Smith, M. J. & Taylor, S. S. (1972). FEBS Proc. 29, 151-176.
Ingram, V. M. (1961). Nature (London), 189, 704-708.
Kabsch, W. (1978). Acta Crystallogr. sect. A, 34, 827-828.
Levitt, M. (1976). J. Mol. Biol. 104, 59-116.
Levitt, M. & Chothia, C. (1976). Nature (London), 261, 552-557.
McLachlan, A. D. (1972a). Nature New Biol. 240, 83-85.
McLachlan, A. D. (1972b). Acta Crystallogr. sect. A, 28, 656-657.
McLachlan, A. D. (1979). J. Mol. Biol. 128, 49-79.
Matthews, D. A., Alden, R. A., Bolin, J. T., Filman, D. J., Freer, S. T., Hamlin, R., Hol, W. G. J., Kisliuk, R. L., Pastore, E. J., Plante, L. T., Xuong, N. H. & Kraut, J. (1978). J. Biol. Chem. 253, 6946-6954.
Remington, S. J. & Matthews, B. W. (1980). In the press.
Richardson, J. S., Richardson, D. C., Thomas, K. A., Silverton, E. W. & Davies, D. R. (1976). J. Mol. Biol. 102, 221-235.
Rossmann, M. G. & Argos, P. (1977). J. Mol. Biol. 109, 99-129.
Sakano, H., Rogers, J. H., Hüppi, K., Brack, C., Traunecker, A., Maki, R., Wall, R. & Tonegawa, S. (1979). Nature (London), 277, 627-633.
Schulz, G. E. (1977). J. Mol. Evol. 9, 339-342.
Schulz, G. E. & Schirmer, R. H. (1974). Nature (London), 250, 142-144.
Schulz, G. E. & Schirmer, R. H. (1979). Principles of Protein Structure, Springer Verlag, New York.
Schulz, G. E., Schirmer, R. H., Sachsenheimer, W. & Pai, E. F. (1978). Nature (London), 273, 120-124.
Sheriff, S. & Herriott, J. R. (1980). J. Mol. Biol. in the press.
Smith, E. L. & Margoliash, E. (1964). Fed. Proc. Fed. Amer. Soc. Exp. Biol. 23, 1243-1247.
Sternberg, M. J. E. & Thornton, J. M. (1976). J. Mol. Biol. 105, 367-382.
Tang, J., James, M. N. G., Hsu, I. N., Jenkins, J. A. & Blundell, T. L. (1978). Nature (London), 271, 618-621.
Wierenga, R. K., De Jong, R. J., Kalk, K. H., Hol, W. G. J. & Drenth, J. (1979). J. Mol. Biol. 131, 55-73.