ARCHIVES OF BIOCHEMISTRY AND BIOPHYSICS Vol. 196, No. 2, September, pp. 598-610, 1979
Invited Paper
Aspects of Fraction 1 Protein Evolution
S. G. WILDMAN Department of Biology,
Molecular Biology Institute, University of California, Los Angeles, California 90024
Received March 12, 1979; revised April 20, 1979
The poet writes, “All flesh is grass.” A modern bard might say, “All green grass contains Fraction 1 protein, and lots of it.” Or, “Without F-1-protein,¹ no flesh!” My involvement with F-1-protein began in 1944 and, except for some minor diversions into tobacco mosaic virus and chloroplasts, interest in F-1-protein has remained pretty steady. The distractions were enough, however, to let poachers into the field and make their kill. Weissbach et al. (1) showed that the “carboxylation enzyme” catalyzed CO2 fixation during photosynthesis, but they did not know that this enzyme was the F-1-protein. Lyttleton and Ts'o (2) showed that F-1-protein (they did call it that) was the major soluble protein of chloroplasts and Rutner and Lane (3) demonstrated the enzyme (they called it ribulose diphosphate [RuDP] carboxylase²) to be composed of two distinct subunits. Recovery from the shock of these missed opportunities occurred when Nobumaro Kawashima came to my laboratory in 1969 and crystallized F-1-protein from tobacco leaves, which opened up new opportunities for investigation. One opportunity has been to study the evolution of F-1-protein. It is the purpose of this essay to describe a form of protein evolution observed with F-1-protein not previously evident in work on animal and bacterial proteins. The amino acid composition of F-1-protein changes when new species of plants are created by the process of amphiploidy. However, the amino acid changes do not require simultaneous mutations in the genetic code.

AMPHIPLOIDY
Amphiploidy is the process authorities believe produced about 40% of the 500,000 or more species of plants which now inhabit the globe. The first step in evolution by amphiploidy occurs when two different species of plants chance to hybridize. The two kinds of plants must be closely similar in genetic makeup for fertilization to occur at all. With few exceptions, species must belong to the same genus for interspecific hybridization to be successful. Usually, the interspecific hybrid lacks fertility and has no means to produce seeds to perpetuate itself. The lack of fertility has the same cause as that which afflicts mules or hinnies, which are also sterile, interspecific hybrids between donkeys and horses. The chromosomes of the horse are just enough different from those of the donkey that they are unable to pair correctly during meiosis. Imperfect pairing prevents the proper distribution of crucial genetic information into sperm and egg cells as reduction in chromosome number proceeds. As a consequence, a new generation of mules or hinnies cannot be created. With plants, however, this barrier to fertility is overcome by spontaneous doubling of the somatic chromosomes of the interspecific hybrid. The phenomenon is known as amphiploidy and it is a rare event in nature. If and when amphiploidy happens, the set of chromosomes of one of the species can now pair with themselves, and the set of chromosomes of the other species likewise can pair, and symmetrical distribution of genetic information will be accomplished during meiosis. And thus, a new species of plant capable of self-perpetuation by seeds appears on the evolutionary scene. In the ensuing paragraphs, the effect of amphiploidy on F-1-protein will be examined in some detail. But first, let us consider the basic construction of the F-1-protein molecule, its catalytic function, and inquire as to how long the protein may have been evolving.

¹ Abbreviations used: F-1-protein, Fraction-1-protein; RuDP, ribulose diphosphate; RuBP, ribulose 1,5-bisphosphate; PGA, phosphoglyceric acid; SS, small subunit; LS, large subunit.

² Those of more ecclesiastical bent would call it: 3-phospho-D-glycerate carboxylase (dimerizing) EC 4.1.1.39; but patience: The name will have to be changed again because the enzyme is also an oxygenase. Also RuDP carboxylase has now become RuBP carboxylase. In current literature, the enzyme is frequently called “carboxydismutase,” a name coined by the Calvin laboratory before the mechanisms of the reactions it catalyzed were understood in detail.

0003-9861/79/100598-13$02.00/0 Copyright © 1979 by Academic Press, Inc. All rights of reproduction in any form reserved.

STRUCTURE OF F-1-PROTEIN AFTER A FEW BILLION YEARS OF EVOLUTION
F-1-protein catalyzes the combination of carbon dioxide with ribulose 1,5-bisphosphate (RuBP) during photosynthesis to form phosphoglyceric acid (PGA). Because it can also catalyze combination of O2 with RuBP to produce PGA and phosphoglycolate, the enzyme is now known to biochemists as ribulose 1,5-bisphosphate carboxylase-oxygenase. It is the world’s most abundant protein because it is found in all plants which contain chlorophyll a. In many plants, F-1-protein may constitute 25% of the total protein. Although of ubiquitous distribution throughout the Plant Kingdom, the basic construction of the F-1-protein macromolecule wherever found is closely similar to its structure in higher plants, where it has been investigated in most detail. F-1-protein as a physical entity first came to attention as a soluble protein having a sedimentation constant of 18 S and as a ponderable particle of around 100 Å in diameter as seen in the electron microscope (4). The protein is now known to consist of two types of subunits, and numerous analyses concur in showing the F-1-protein macromolecule to be composed of 8 large
subunits, each about 56,000 daltons, combined with 8 small subunits, each in the neighborhood of 12 to 14 thousand daltons, to produce a macromolecule of approximately one-half million in molecular weight. In the case of crystalline F-1-protein obtained from N. tabacum and other species belonging to the genus Nicotiana, the protein is composed entirely of amino acids. A comparison between F-1-protein isolated from spinach and tobacco leaves showed 9 differences in amino acid composition but no differences in quaternary structure (5). Both proteins are composed of 8 large subunits combined with 8 small subunits. No disulfide bonds have been detected in a macromolecule otherwise containing more than 90 sulfhydryl groups (6). The tobacco macromolecule contains one magnesium atom but the metal is absent from the spinach macromolecule (7). No other metals are found in crystalline F-1-protein (8). There are no visible chromophores (9). The pure protein absorbs light only in the ultraviolet region of the spectrum below 300 nm with a 280/240 ratio of 1.5. X-ray diffraction and electron microscopic studies of crystals of F-1-protein from tobacco have been performed by members of Eisenberg’s laboratory (10). The results have produced a view of the macromolecule depicted in Fig. 1. The eight large subunits are organized as a cube. The actual arrangement of the eight small subunits in relation to the cube of large subunits is not known with certainty although their placement shown in Fig. 1 is plausible. In this model, pairs of small subunits occupy four out of six faces of the cube of large subunits leaving a macromolecule with sixfold symmetry and a hole down the center of one axis. Proteins fitting the size and subunit structure of F-1-protein of higher plants have been identified in all photoautotrophic eucaryotes so far analyzed and also in some, but not all, procaryotes.
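The subunit arithmetic behind the "approximately one-half million" figure can be checked with a short sketch. It uses only the subunit masses quoted above (8 large subunits of about 56,000 daltons, 8 small subunits of 12,000 to 14,000 daltons); the function and constant names are illustrative and not part of the original work.

```python
# Rough check of the macromolecule mass from the subunit masses in the text.
# All names here are hypothetical helpers chosen for illustration.

LARGE_SUBUNIT_DALTONS = 56_000                    # "each about 56,000 daltons"
SMALL_SUBUNIT_RANGE_DALTONS = (12_000, 14_000)    # "12 to 14 thousand daltons"
SUBUNITS_PER_KIND = 8                             # 8 large + 8 small subunits


def macromolecule_mass(small_subunit_daltons: int) -> int:
    """Total mass of 8 large plus 8 small subunits, in daltons."""
    return SUBUNITS_PER_KIND * (LARGE_SUBUNIT_DALTONS + small_subunit_daltons)


if __name__ == "__main__":
    for small in SMALL_SUBUNIT_RANGE_DALTONS:
        total = macromolecule_mass(small)
        print(f"small subunit {small:,} Da -> macromolecule {total:,} Da")
```

Both ends of the quoted small-subunit range give a total a little above 500,000 daltons, in line with the rounded value stated in the text.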

FIG. 1. A schematic model of Fraction 1 protein adapted from concepts of Eisenberg and co-workers (10). [Figure labels: large subunits, 56,000 daltons, 75% of mass; small subunits, 25% of mass.]

Blue-green algae contain macromolecules which fit the F-1-protein description and these procaryotes are of special interest in regard to estimating the time that F-1-protein has had to evolve. The now classical studies of Barghoorn and Schopf and their later independent endeavors have made clear that fossil remains of blue-green algae are present in rocks at least 3.5 billion years old. Furthermore, the fossils have the appearance of modern day blue-green algae. Thus, it takes no great stretch of the imagination to presume that F-1-protein made photoautotrophic life possible for these fossils. We can therefore believe that F-1-protein has had the opportunity to evolve over a span of several billion years and can wonder about what changes have occurred in the macromolecule during such an immense amount of time. Adopting the conventional view that photoautotrophic bacteria were the progenitors of the more advanced blue-green algae, we can see the glimmering of the beginning of F-1-protein evolution. In several instances, the molecule which possesses RuBP carboxylase activity obtained from photosynthetic bacteria has a smaller sedimentation constant than 18 S (11). These enzymes also seem to be composed of only a single kind of subunit. In this regard, it is of importance that the Akazawa laboratory has been able to physically separate the two kinds of subunits of spinach F-1-protein and demonstrate that the large subunit (LS) retains RuBP carboxylase activity while the small subunit (SS) does not. To be active the LS must have a minimum molecular weight of around 400,000, which suggests that a protein equivalent to eight large subunits is the catalytic entity. The molecular weight of the active LS is thus in a size class of some of the RuBP carboxylases found in photosynthetic bacteria. We can deduce, therefore, that the small subunits of F-1-proteins are an embellishment that
evolved after the LS had been in existence perhaps a billion years or more before blue-green algae appeared on the evolutionary scene. In summary, we can imagine that evolution of F-1-protein has been going on for at least 3.5 billion years without change in the basic physical configuration of eight large subunits combined with eight small subunits as now found in present day F-1-protein. However, somewhere in this vast span of time, the coding information for the large and small subunits became separated into two distinct kinds of DNA. In eucaryotes, chloroplast DNA codes for the primary structure of the LS whereas nuclear DNA contains the genetic information for the SS sequence of amino acids. Whether separation of the coding information has occurred in procaryotes which synthesize F-1-proteins of the eucaryotic type is an unsolved but tantalizing mystery. Identification of chloroplast DNA genes coding for F-1-protein, mechanisms of in vitro synthesis, etc., is a rapidly burgeoning field with many participants but none from my laboratory in recent times. The new information is too extensive for me to even attempt to assign proper credit for important discoveries. So, for the purposes of completing the basic outlines of F-1-protein evolution, I will summarize the subject by saying that the LS of F-1-protein is synthesized in chloroplasts which contain enzymes for replication and transcription of the coding DNA as well as means for translation of the gene product utilizing 70 S ribosomes unique to chloroplasts. The SS is synthesized on 80 S ribosomes outside of chloroplasts. Final assembly of eight large subunits with eight small subunits to produce the finished F-1-protein macromolecule is thought to occur in the chloroplast. What selective advantage has occurred as the result of eucaryotes having evolved two systems within the same cell for manufacture of F-1-protein? My view is that the two-system scheme is a necessary corollary to plants having opted to evolve by amphiploidy. Separation is a means for allowing flexibility with respect to accommodating changes in F-1-protein amino acid composition without altering significantly the enzymatic activity of the protein when F-1-protein evolution is concomitant with evolution of a new plant species. In addition to flexibility of this nature, F-1-protein has evolved other flexible properties in regard to quaternary structure worth noting at this point.

FLEXIBILITY IN SOLUBILITY AND THERMAL PROPERTIES
F-1-protein changes configuration when presented with either one or the other of its two substrates. When pure, crystals of F-1-protein suspended in water will dissolve when monovalent cations are added, Na+ being most effective. Dialysis or even simple dilution of the solution with water will produce F-1-protein crystals again (12). Crystals free of cations will also almost instantly dissolve when trace amounts of RuBP are added, about 8 mol of RuBP/mol of F-1-protein being sufficient (13). The same effect is produced by fructose 1,6-bisphosphate but compounds such as PGA, ATP, ADP, ribose 5-phosphate, etc., are ineffective. Removal of excess RuBP by prolonged dialysis does not induce recrystallization. However, addition of bicarbonate and Mg2+ will transform the protein into its insoluble, crystalline form, evidently as the result of converting those few RuBP molecules bound to F-1-protein into PGA. Rabin and Trown (14) showed that addition of RuBP to a solution of purified spinach F-1-protein induces a difference spectrum in the region where tyrosine and tryptophan absorb, a result confirmed for tobacco F-1-protein (15). Chollet and Anderson (16) have shown that binding of RuBP produces a
conformational change in the F-1-protein macromolecule. I am still mystified by the fact that the difference spectrum is stable even when no precautions are taken to eliminate dissolved O2 in equilibrium with air. When CO2 is increased over that normally present in the atmosphere, the difference spectrum disappears. Evidently, for F-1-protein to act as an RuBP oxygenase, the partial pressure of O2 must be increased significantly over that in the atmosphere. In the presence of RuBP, F-1-protein is soluble in water to more than 100 mg of protein/ml. In bicarbonate, less than 0.1 mg F-1-protein/ml dissolves. With NaCl, a 10% F-1-protein solution can be made but degree of solubility is proportional to concentration of salt. I have long advocated that F-1-protein performs the dual function of being an enzyme and also constituting the firmament of the stroma of the chloroplast. F-1-protein may compose 50% of the proteins of the stroma. Lyttleton and Ts'o (2) found up to 5 mg of F-1-protein/mg of chlorophyll to compose the structure of chloroplasts. This translates into about 130 mg of F-1-protein/ml of chloroplasts, or a F-1-protein concentration of 2 mM in the chloroplast stroma! This concentration is about 1000-fold greater than the maximum amount of RuBP to accumulate in chloroplasts. The stroma undergoes remarkable ameboid movements and interactions with mitochondria in living cells (17). It is my view that the unusual solubility properties of F-1-protein in the presence of one or the other of its substrates, or change in ionic environment, could form the basis for an approach to understanding the ameboid movements of the stroma. The specific RuBPcase activity does not change when recrystallized F-1-protein is heated at 50°C for 20 min (18). In fact, the specific enzymatic activity declines when F-1-protein is placed at temperatures below 20°C. When put in ice, two-thirds of the specific RuBPcase activity will disappear within 18 h at 0°C (19).
However, the specific activity will be regained by elevating the temperature of F-1-protein. The time to regain the entire enzymatic activity is temperature dependent. Twenty minutes
at 50°C will restore maximum activity. At 20°C many hours must go by before complete reactivation. The phenomenon is a first-order reaction with an activation energy of around 1 kcal/mol. The shift from least active to most active state is an all-or-none situation involving a very slight change in configuration of the macromolecule (16). Apparently eons of exposure to drastic changes in environments have caused photoautotrophs to select for a crucial enzyme which can endure large changes in temperature without losing catalytic activity. Perhaps examination of highly purified RuBPcase from procaryotes in regard to the cold-inactivation, heat-reactivation phenomenon would provide a clue as to whether acquisition of flexibility toward temperature occurred early or late in F-1-protein evolution.

FLEXIBILITY IN AMINO ACID COMPOSITION OF F-1-PROTEIN WITHOUT SIGNIFICANT CHANGE IN CATALYTIC ACTIVITY
A comparison showed the specific RuBP carboxylase activity of F-1-protein from spinach and tobacco leaves to be closely similar (5). As plants, spinach and tobacco are very widely separated in a phylogenetic sense and have had a long time to evolve the characters which make them so different in phenotypes. F-1-protein from spinach differs from tobacco F-1-protein in the amounts of 9 amino acids (20). Most of these differences are located in the SS. Whereas 17 out of 22 tryptic peptides of the LS are the same for both F-1-proteins, only 5 out of 16 tryptic peptides of the SS are alike (5). F-1-protein from different species of plants within the same genus also exhibits flexibility in amino acid composition without altering specific enzymatic activity. Table I contains a summary of data which illustrates this situation. The four kinds of F-1-proteins had been recrystallized to constant enzymatic activity (21) and all display the same specific RuBPcase activity. Compared to the amino acid composition of N. tabacum F-1-protein, the three other F-1-proteins exhibited up to seven differences in amino acid composition. Most of the differences are confined to the SS portion of the F-1-protein macromolecule (22).
The F-1-protein from N. gossei displayed a small but reproducibly higher RuBP carboxylase activity than six other F-1-proteins which were alike in activity (21). Genetic experiments demonstrated that the coding information which resulted in an F-1-protein with higher enzymatic activity was inherited only by the maternal line. We infer therefore that the change in enzymatic activity has something to do with a change in the LS coded by chloroplast DNA. Flexibility is further illustrated by the fact that the SS can be composed of different kinds of polypeptides without altering the enzymatic activity of F-1-protein.

COMPARISON OF POLYPEPTIDE COMPOSITION OF F-1-PROTEIN AMONG PLANTS REPRESENTING DIFFERENT PHYLA OF THE PLANT KINGDOM
Electrofocusing carboxymethylated F-1-protein in 8 M urea resolves the LS polypeptides from those of the SS. The LS polypeptides usually exhibit less acidic isoelectric points than the SS polypeptides. In numerous examples involving the F-1-protein isolated from more than 100 different plant species, the LS resolves into a cluster of three polypeptides separated from each other by about 0.1 pH unit. The SS resolves into one to four polypeptides depending on the species of the plant which was the source of F-1-protein (23). Why the LS should resolve into a cluster of three polypeptides has been an intractable problem. The LS polypeptides of N. tabacum F-1-protein have been separated and analyzed for amino acids and tryptic and chymotryptic peptides without revealing significant differences (24). The problem is compounded by the fact that each polypeptide has a molecular weight in excess of 50,000, which has been an impediment toward sequencing. It is my prejudice that the polypeptides are not separate gene products but more likely posttranscriptional entities. The possibility that they could be artifacts arising from incomplete carboxymethylation cannot be eliminated. But the remarkable fact is that resolution and precise position of the three polypeptides in a pH gradient are highly reproducible. This reproducibility has made the isoelectric point of the middle polypeptide of the LS cluster an extremely useful tool for study of F-1-protein evolution. Depending upon the plant source, the SS of F-1-protein resolves into one to four different kinds of polypeptides. In contrast to uniform differences in isoelectric points separating the cluster of the three LS polypeptides, the SS polypeptides may be close, or very different, in isoelectric points when the SS is composed of more than one kind of polypeptide. The electrofocusing composition of carboxymethylated F-1-protein subunits among different species of plants belonging to the same genus has provided insight into several aspects of F-1-protein evolution.

TABLE I
SUMMARY OF DATA SHOWING THAT A CONSIDERABLE CHANGE IN AMINO ACID COMPOSITION CAN BE WITHOUT EFFECT ON THE SPECIFIC RuBP CARBOXYLASE ACTIVITY OF Nicotiana FRACTION 1 PROTEIN (a)

Amino acid    N. tabacum L    N. tabacum S
Lys           5.04            7.90
His           2.78            0.63
Arg           6.41            4.52
Asp           8.92            7.12
Thr           5.59            4.06
Ser           3.17            3.58
Glu           10.06           14.63
Pro           4.32            6.52
Gly           9.95            7.17
Ala           8.92            5.72
Val           7.46            6.40
Met           1.65            1.61
Ile           4.33            4.24
Leu           8.97            8.26
Tyr           3.89            7.76
Phe           4.42            4.24

Entries for the L and S columns of N. glutinosa, N. sylvestris, and N. glauca (row and column positions not recoverable from this copy): 5.92, 5.50, 7.90, 3.53, 4.97, 16.70, 7.53, 4.98, 4.40, 15.80, 16.10, 6.06, 8.00, 2.41, 4.83, 4.70, 8.26, 6.02, 3.70.

(a) Data are from Kawashima et al. (22); the specific enzymatic activities of F-1-protein from these plants were indistinguishable (19). L, large subunit; S, small subunit. Data are percentages of total amino acids, and only those significantly different from the composition of N. tabacum F-1-protein are shown.

EVOLUTION OF F-1-PROTEIN AMONG SPECIES BELONGING TO THE SAME GENERA OF PLANTS
In the genus Nicotiana composed of 66 species (tobacco being one of the species), 20 different kinds of F-1-protein molecules have arisen. With respect to the LS, four different clusters of polypeptides which differ in isoelectric point from each other have evolved. They are shown in diagrammatic form in Fig. 2, which also depicts the range of differences in isoelectric points of the 13 SS polypeptides which have also evolved. Even though the F-1-proteins have been isolated from plants grown from different sources of seeds for a given Nicotiana species, the electrofocusing compositions have been remarkably constant. Because of the peculiar geographic distribution of Nicotianas, an argument has been made that F-1-protein has been evolving in this genus for at least 75 million years (25). In this long span of time, surviving mutations affecting the isoelectric points of the LS polypeptides have been exceedingly rare, whereas evolution of SS polypeptides of different isoelectric points has been much faster. A similar course of F-1-protein evolution is seen among the 30 species composing the genus Gossypium (of which cotton is a member) which is of comparable age to Nicotiana (26). With the much younger genus Lycopersicon to which tomato belongs, only three kinds of SS polypeptides have evolved and there are no differences in isoelectric points of the LS polypeptide clusters. The same general picture of F-1-protein evolution is seen in other genera (27): ultraconservative evolution of the LS and much less conservative evolution of the SS. But what is most striking is that mixtures of LS polypeptide clusters of different isoelectric points have never been encountered, whereas mixtures of different kinds of SS polypeptides are more the rule than the exception.

FIG. 2. Diagrammatic representation of the isoelectric point positions of the 4 clusters of carboxymethylated large subunit polypeptides and 13 carboxymethylated small subunit polypeptides to have evolved among the Fraction 1 proteins contained in 66 species of Nicotiana. [Diagram panels: large subunit cluster types; numbered small subunit polypeptides.]

We had previously seen in Table I that as many as 11 differences in amounts of amino acids had not affected the specific RuBP carboxylase activity of Nicotiana F-1-proteins. From electrofocusing analyses it becomes apparent that the enzymatic activity of F-1-protein is also not obviously affected in regard to the kind of LS or numbers of different kinds of polypeptides comprising the SS. N. sylvestris and N. glauca have F-1-proteins whose LS polypeptides are different but whose small subunits contain a single kind of polypeptide; N. tabacum and N. glutinosa are different in LS polypeptides and each has two kinds of SS polypeptides, but the polypeptides have different isoelectric points in the two F-1-proteins. N. suaveolens has a small subunit with three kinds of polypeptides; N. excelsior, four. The question can be asked, therefore, as to the manner by which such changes in F-1-protein composition are brought about. Coding information controlling the isoelectric points of the cluster of three LS polypeptides is inherited exclusively by the maternal line as has been shown for F-1-protein inheritance in tobacco and Nicotiana relatives (28), oats (29), wheat (30),
cotton (26), and other plants. Thus, the coding information is contained in extranuclear DNA. The gene for the LS has been located in chloroplast DNA by restriction endonuclease mapping (31). Studies by different laboratories agree in showing that multiple kinds of F-1-protein SS polypeptides differ in terms of sequence, tryptic and chymotryptic peptides, and amino acid composition (32, 33). Many examples are now available to show that the coding information for the primary structure of the SS is contained in nuclear DNA. Since these genes are subject to manipulation in genetic experiments, it has been possible to demonstrate how F-1-protein acquires multiple kinds of SS polypeptides.

HOW MULTIPLE KINDS OF SS POLYPEPTIDES OF DIFFERENT ISOELECTRIC POINTS ORIGINATE IN F-1-PROTEIN
The SS of N. tabacum F-1-protein consists of two kinds of polypeptides. N. tabacum arose by interspecific hybridization. N. sylvestris was the female parent that provided the coding information for the LS and also contributed coding information for the less acidic of the two SS polypeptides in N. tabacum F-1-protein (34). N. tomentosiformis was the male parent and therefore denied the chance to code for the LS but provided the nuclear genetic information for the more acidic of the two SS polypeptides in N. tabacum F-1-protein. The hybrid, N. sylvestris × N. tomentosiformis, was infertile. The arrangements of genes on the chromosomes of the two different plant species are too dissimilar to permit pairing and successful meiosis.
This barrier to fertility was overcome when a spontaneous doubling of both sets of somatic chromosomes occurred. That is, the phenomenon of amphidiploidy occurred and consequently a new species of plant capable of sexual reproduction was created as well as a new kind of F-1-protein. Amino acid analyses show four differences in the composition of the N. sylvestris SS polypeptide of F-1-protein compared to that of N. tomentosiformis. The same four differences are present in the SS of N. tabacum. Sequence analysis shows that residue 7 in the N. tomentosiformis SS polypeptide is tyrosine compared to isoleucine at residue 7 in N. sylvestris. Prior to knowledge that the SS of N. tabacum F-1-protein is composed of two kinds of polypeptides, sequencing of the SS had produced an ambiguity at residue 7 (54). Either tyrosine or isoleucine could occupy this position. The reason for the ambiguity is now apparent: The more acidic of the two kinds of polypeptides contains tyrosine at residue 7 while the less acidic polypeptide contains isoleucine (32, 33). Substitution of tyrosine for isoleucine requires more than one base change in a coding triplet so that it is clear that the change in composition of the F-1-protein created in N. tabacum did not require a simultaneous mutation in the genetic code. Ambiguities in sequences have been uncovered in other plant proteins. One example is the ferredoxin of spinach leaves composed of a single polypeptide of 97 amino acids containing two ambiguities in the sequence (55). Genetic studies have shown that two kinds of ferredoxin molecules can exist in hybrids between two species of plants (56). Perpetuation of the two kinds of ferredoxins would only require the act of amphiploidy. The important point is that whenever an ambiguity in the sequence of a plant protein is encountered, the possibility exists that more than a single species of protein may be present.
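The statement above that substituting tyrosine for isoleucine needs more than one base change in a coding triplet can be checked against the standard genetic code. The following is a minimal sketch: the codon assignments for the two residues are the standard ones, while the function names are illustrative only.

```python
from itertools import product

# Standard genetic code (RNA codons) for the two residues found at position 7.
TYR_CODONS = ("UAU", "UAC")
ILE_CODONS = ("AUU", "AUC", "AUA")


def base_differences(codon_a: str, codon_b: str) -> int:
    """Count the positions at which two triplets differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


def min_base_changes(codons_a, codons_b) -> int:
    """Smallest number of base changes converting any codon of one set into any codon of the other."""
    return min(base_differences(a, b) for a, b in product(codons_a, codons_b))


if __name__ == "__main__":
    print("minimum base changes, Ile <-> Tyr:", min_base_changes(ILE_CODONS, TYR_CODONS))
```

Every isoleucine codon differs from every tyrosine codon in at least the first two positions, so no single point mutation converts one residue into the other.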
Another example of the evolution of F-1-protein composition by amphiploidy concerns N. digluta, whose SS is composed of four kinds of polypeptides. Two polypeptides came from N. tabacum, which was the male parent, and two SS polypeptides came from N. glutinosa, the female parent which was also responsible for the composition of the LS (35).
Particularly elegant examples of bringing two kinds of nuclear genomes together to create new species of plants and new kinds of F-1-proteins are those involving fusion of protoplasts in test tubes to create parasexual hybrid plants. Carlson et al. (36) first showed that fertile plants possessing hybrid characters could be produced by fusing protoplasts derived from mesophyll cells of N. glauca and N. langsdorffii leaves. N. glauca F-1-protein has a small subunit composed of a single polypeptide different in isoelectric point from the two kinds of polypeptides in the SS of N. langsdorffii F-1-protein. The fertile new species of plant (N. glauca + N. langsdorffii) had an F-1-protein composed of either the N. glauca or N. langsdorffii LS cluster of polypeptides and three kinds of SS polypeptides (37). In 16 N. glauca + N. langsdorffii parasexual hybrid plants created by Smith and co-workers (38), the F-1-proteins have small subunits composed of three kinds of polypeptides and the same composition has been found in F-1-proteins isolated from F2 and F3 generations of self-fertilized parasexual hybrids (39). So it is evident that a mechanism exists to insure stability in composition after a new species of F-1-protein has been created by amphiploidy. A most striking extension of the protoplast fusion technique has been that of Melchers et al. (40). They created intergeneric hybrids by fusion of tomato and potato protoplasts. The LS polypeptides were either of the tomato or potato type of F-1-protein. The SS of tomato F-1-protein has three kinds of polypeptides different in isoelectric points from the two kinds in potato. The tomato-potato hybrids have F-1-proteins with five kinds of SS polypeptides.

HOW STABILITY IN COMPOSITION OF A NEW KIND OF F-1-PROTEIN IS MAINTAINED FOLLOWING AMPHIPLOIDY
Besides the stability in composition of the new species of F-1-protein created in parasexual hybrids, other examples of stability can be cited. The F-1-proteins in more than 100 individual N. tabacum plants have yielded the same electrofocusing pattern of large and small subunits. The same can be said for the F-1-proteins contained in individuals of other plant species. F-1-protein composition is not affected by the developmental stage of the plants or their nutritional status, and it is unchanged in leaves which had never been exposed to light (9). Gene dose does not affect composition of F-1-protein. The composition and apparent amounts of the two SS polypeptides in tobacco F-1-protein were the same in haploids, diploids, triploids, and tetraploids (41). No change was observed in the F-1-proteins from more than 20 self-fertile tobacco cultivars whose nuclear genomes have been under constant manipulation by plant breeders for hundreds of years and which, consequently, exhibit significant differences in morphological characters. Hirai (42) showed that the constancy in F-1-protein composed of different kinds of SS polypeptides was of a unique nature. When the SS was composed of just one kind of polypeptide, the electrophoretic mobility of the F-1-protein macromolecule was that of a homogeneous species of protein. With two kinds of SS polypeptides, electrophoretic heterogeneity was found. He concluded that a mixture of nine kinds of F-1-protein macromolecules differing from each other very slightly in electrophoretic mobilities existed when the SS was composed of two polypeptides of different charge. His explanation was that the coding information for each kind of polypeptide was at separate locations within the cell, these locations being also separated from the site of LS synthesis in the chloroplast. Hirai visualized the possibility that separation of the two sites of coding information for two kinds of SS polypeptides would necessitate randomizing the opportunity for one or the other, or both, polypeptides to be included in the assembly of a F-1-protein macromolecule. In combining eight small subunits with eight large subunits the F-1-protein might, in rare instances (0.37%), be composed of a SS polypeptide of a single kind; one, or the other, of the single parental types.
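The counting behind the randomization hypothesis can be sketched as follows. The sketch assumes, as a simplification not spelled out in the text, that each of the eight SS positions is filled independently and that the two kinds of polypeptide are equally available; the function names are illustrative.

```python
from math import comb

SS_SLOTS = 8  # small subunits per F-1-protein macromolecule, as stated in the text


def count_compositions(kinds: int, slots: int = SS_SLOTS) -> int:
    """Number of distinct SS compositions when `slots` positions are filled
    from `kinds` polypeptide types and order is ignored (stars and bars)."""
    return comb(slots + kinds - 1, kinds - 1)


def two_kind_distribution(slots: int = SS_SLOTS, p: float = 0.5) -> list:
    """Binomial probability of finding k copies of one polypeptide (k = 0..slots),
    assuming independent filling with probability p for that polypeptide."""
    return [comb(slots, k) * p**k * (1 - p) ** (slots - k) for k in range(slots + 1)]


if __name__ == "__main__":
    dist = two_kind_distribution()
    print("kinds of macromolecule with two SS polypeptides:", count_compositions(2))
    print(f"all eight of one parental type: {dist[0]:.2%}")
    print(f"equal mixture (4 + 4): {dist[4]:.2%}")
```

Under these assumptions two SS polypeptides give nine possible compositions, each pure parental type is expected in 1/256 of the molecules (about 0.4%, close to the 0.37% quoted), and the same counting applied to three kinds of polypeptide gives 45 compositions.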
In more than 99% of the F-l-proteins, all possible combinations resulting from random sorting of the two kinds of SS polypeptides would be present, the most frequent (27%) being an equal mixture of both kinds of polypeptides. Hirai’s randomization hypothesis is attractive because it helps to explain another feature of F-l-protein inheritance. When amphiploidy occurs, the number of genes coding for the SS of F-l-protein doubles. However, when genetic information for two kinds of SS polypeptides is combined in the new species of plant, the amount of each of the gene products in the newly evolved F-l-protein is reduced by half. That is, the sum of the staining intensities of the two SS polypeptides, say in N. tabacum F-l-protein, appears to be about one-half the intensity exhibited by the single polypeptide in the F-l-protein from either one of the two parents of tobacco. The apparent reduction in gene product in the face of an increase in the number of genes coding for the SS product is readily explained as the consequence of the need to randomize the two kinds of products. The LS constitutes three-quarters of the mass of the F-l-protein macromolecule. The LS can accept only eight SS polypeptides, of whatever kind, before a F-l-protein is completed and comes off the assembly line. Even though all genes presumably function with equal capacity to make SS polypeptides, the laws of chance dictate that their products will have to share in the opportunity of becoming part of the F-l-protein macromolecule. If a F-l-protein such as that in N. debneyi contains an SS composed of three kinds of polypeptides, the random hypothesis predicts that 45 kinds of F-l-protein could be present.
HOW PHYSICAL SEPARATION OF GENETIC INFORMATION FOR MORE THAN ONE KIND OF SS POLYPEPTIDE IS ACCOMPLISHED
Physical separation of the coding information for different kinds of SS polypeptides most likely occurs because the genes are sequestered on heterologous chromosomes which never pair with each other during meiosis. Evidence for this interpretation comes from several examples where it has been shown that the coding information for different kinds of SS polypeptides does not segregate in F2 progeny. Two examples have already been given: the three kinds of SS polypeptides in the F-l-protein of N. glauca + N. langsdorffii parasexual hybrids and the two kinds in N. tabacum F-l-protein. More recent experiments (43) have shown no segregation of genetic information for the three kinds of SS polypeptides in tomato F-l-protein, nor for the four kinds in N. excelsior and Lemna perpusilla F-l-proteins. Chen and Sand (44) have been able to identify a specific chromosome in a male sterile N. tabacum that codes for a specific kind of SS polypeptide in the F-l-protein from this plant.
From the aforementioned examples, amphiploidy appears to have been a major factor in F-l-protein evolution. The presence of more than one kind of SS polypeptide in F-l-protein is a pretty certain sign that amphiploidy was responsible for mixing up two or more different kinds of proteins. It is important to point out that F-l-proteins with single SS polypeptides could also have undergone changes in composition by amphiploidy without the change being reflected by a difference in isoelectric points of the different polypeptides. An ambiguity in the sequence of SS amino acids would be suggestive of a previous combination of two kinds of polypeptides which were not different in isoelectric points. Kung et al. (45) provide an example of single kinds of polypeptides of different composition but the same isoelectric points among Nicotiana F-l-proteins. A situation that needs careful further investigation concerns the cereals (wheat, oats, barley, and rye), which are amphiploids but whose F-l-proteins have a single SS polypeptide of the same isoelectric point. Wheat is a hexaploid. Therefore, amino acid and sequence analysis of the SS of its F-l-protein, compared to the SS in the diploid and tetraploid progenitors of hexaploid wheat, should be highly revealing in regard to the evolution of F-l-protein among wind pollinated grasses.
WHAT IS THE SIGNIFICANCE TO BE ATTACHED TO THE MATERNAL INHERITANCE OF THE CODE FOR THE LS OF F-l-PROTEIN?
In numerous investigations, there have been no exceptions to the rule that the coding information which determines the isoelectric position of the cluster of LS polypeptides is inherited only through the maternal line. The analyses extend to amino acid composition (32) as well as tryptic (46) and chymotryptic (32) peptide fingerprints. But how and why the chloroplast DNA coding for the LS of F-l-protein in the male parent is excluded from transmitting its information during sexual reproduction is a deep mystery, at least for higher plants. There is evidence to suggest that specific restriction endonucleases destroy one kind of chloroplast DNA during Chlamydomonas mating (47), but I doubt that a similar mechanism could operate in higher plants. In the sexual cycle of flowering plants, pollen containing a haploid complement of chromosomes germinates on the stigma of a flower and forms a pollen tube which grows down the style attached to the stigma; by this means three nuclei are deposited in the ovule. One nucleus fertilizes the haploid egg cell and the next generation of plant is derived from this act. The coding DNA for the LS of F-l-protein is contained in pollen. This is shown by the fact that haploid plants can be derived from pollen grown in a culture medium. The haploid plants are green and their chloroplasts contain F-l-protein of the same composition as that in the diploid tissue from which the pollen cells had been differentiated. Evidently, the mechanism that excludes transmission of the genetic information for the LS to the next generation operates after pollen tube growth has commenced within the style of the flower.
What turns me away from the idea that the egg cell has a specific enzyme for recognizing and destroying a foreign chloroplast DNA is that a similar exclusion phenomenon operates during fusion of plant protoplasts, but it is not specific. In the 16 N. glauca + N. langsdorffii parasexual hybrids created by fusion of protoplasts, alluded to previously, none was found to contain a F-l-protein with a mixture of LS polypeptides. Of more significance, the type of LS was almost equally divided between the N. glauca and the N. langsdorffii type of F-l-protein, although the SS in all cases was a mixture of three kinds of polypeptides derived from the nuclear DNA of both plant species.
In the potato-tomato hybrids produced by protoplast fusion, a similar condition prevails: some plants have F-l-protein with the tomato type LS, others the potato type, but there are no mixtures, whereas all the F-l-proteins have the kinds of SS polypeptides found in both of the species of plants. The question of why no mixtures of the LS appear in parasexual hybrids seems to me not entirely separate from the question of why no transmission of LS coding information occurs via the male during sexual reproduction. My former colleagues, Hirai and Uchimiya, came up with a speculation that might be a start toward answering the question. They suspect that if two sources of genetic information for the LS were being simultaneously translated, an opportunity for randomizing the gene products might exist. In sharp contrast to the absence of any effect on RuBP carboxylase activity of randomizing two or more kinds of SS polypeptides, randomizing two LS gene products might be a disaster. Competent F-l-protein enzyme molecules might appear at a frequency of less than 1% of the total molecules. This speculation is predicated on the idea that a very precise order in the assembly of eight monomeric LS polypeptides is required to create an active enzymatic site. Any deviation in order, such as insertion of a large subunit polypeptide from another species of plant, would produce an inactive F-l-protein. At first there seemed to be a way to test the idea of my colleagues. Using Akazawa’s methods (6), the octamers of eight large subunits from two F-l-proteins would be dissociated, mixed, and reassociated. If the F-l-proteins were alike in LS composition, RuBP carboxylase activity would reappear when octamers were reformed; if unlike, RuBP carboxylase activity would appear at a very much lower level or not at all. Alas, the test foundered on the obstacle that the octamer of the control could not be dissociated and enzymatic activity regained after reaggregation. Perhaps, however, this was a test of sorts. Maybe activity was not regained because the eight subunits did randomize upon reassembly and disorder precluded formation of an enzyme.
REGULATION OF F-l-PROTEIN BIOSYNTHESIS
Physical separation of coding information for different kinds of SS polypeptides and their randomization into the F-l-protein macromolecule poses a question as to what role regulatory genes play in F-l-protein biosynthesis. That the amount of F-l-protein is subject to modulation is illustrated in several ways. In a leaf of the type represented by tobacco or spinach, the content of F-l-protein may reach 50% of the soluble protein and more than 25% of the total protein. The amount of F-l-protein per leaf is proportional to the number and size of the leaf cells and the number and size of the chloroplasts contained in those leaf cells. This proportionality is maintained as long as the leaf grows by expansion in area. F-l-protein can accumulate at a faster rate than other soluble proteins during rapid growth of leaves (48). F-l-protein can also disappear faster than other proteins when leaves senesce. So there is some means for leaf cells to control the rate of F-l-protein synthesis. Mesophyll cells of leaves are totipotent because new plants capable of sexual reproduction can be regenerated from single mesophyll cells. At the beginning of the process, a tobacco mesophyll cell starts with F-l-protein constituting 50% of its soluble protein. The mesophyll cell is caused to undergo mitosis and to form a callus of largely undifferentiated cells which contain only traces of chlorophyll. The callus cells are also almost devoid of F-l-protein. Application of growth hormones causes the callus cells to differentiate roots and shoots with leaves and flowers. Chlorophyll is synthesized and so is F-l-protein, which will again reach the 25% level of total leaf protein. So plant cells also have some kind of switch to turn F-l-protein synthesis on and off, or a valve to regulate its flow.
The LS of F-l-protein is synthesized by 70 S ribosomes in the chloroplast with the coding DNA in close proximity to the transcribing and translating machinery. Only one kind of LS is manufactured. However, the SS is made on 80 S ribosomes outside the chloroplast and there may be as many as four different kinds of SS polypeptides being synthesized simultaneously. The sites of transcription for each kind of SS polypeptide can be physically separated from each other when the coding DNA is located on separate chromosomes. Therefore, a basic question about F-l-protein regulation is how a coordination is achieved which results in a constancy in F-l-protein composition when the SS is composed of more than one kind of SS polypeptide. Does a signal have to be transmitted to the chloroplast DNA genes and also to genes located on different chromosomes to commence synthesis? My prejudice is that the signal needs to be sent only to the chloroplast DNA to commence synthesis of the LS. The reasoning behind this prejudice derives from the unusual nature of F-l-protein isozymes compared to other plant isozymes. With two kinds of SS polypeptides, two nonallelic structural genes produce a mixture of 9 F-l-protein isozymes; with three genes, 45 isozymes could be present, etc. The invariable constancy in proportion of F-l-protein isozymes seems to be a very different situation from that seen with other plant isozymes. With maize endosperm catalase, Scandalios (49) has shown that two alleles for monomeric subunits can result in five isozymes of different electrophoretic mobility as a tetrameric catalase molecule is assembled. Sheen (50) has identified 12 isozymes of peroxidase in tobacco leaves. Whereas the composition of F-l-protein, and hence the ratio of one isozyme to another, remains constant throughout growth and development, the ratio of one catalase or peroxidase isozyme to another can change markedly during growth and development. The change in proportion must be a consequence of regulatory genes acting to modulate the synthesis of one isozyme relative to another. Since F-l-protein isozymes do not undergo these changes in proportion, my surmise is that regulatory genes of the kind that control the proportions of catalase and peroxidase electrophoretic types do not exist to modulate synthesis of the SS of F-l-protein. Rather, the SS is always being made but, unless the LS is simultaneously synthesized, the amount of SS is too small to detect. I think this idea is worthy of further cogitation because pools of SS have been detected (51, 52) and, more importantly, conditions have been established by Feierabend (53) where SS will accumulate in significant quantities where LS, and therefore F-l-protein, is not synthesized.
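The isozyme numbers quoted above (9, 45, and the five catalase isozymes) are what one obtains by counting the distinct subunit compositions of an oligomer. The sketch below is an illustration added here: it assumes that isozymes are distinguished only by how many subunits of each kind they contain, not by the positions those subunits occupy. The function name `count_isozymes` is hypothetical.

```python
from math import comb

def count_isozymes(n_subunits, n_kinds):
    """Number of distinct compositions when n_subunits positions are filled
    from n_kinds interchangeable polypeptide types (multiset count).
    Assumes only the composition matters, not the arrangement."""
    return comb(n_subunits + n_kinds - 1, n_kinds - 1)

# Eight SS positions per F-l-protein macromolecule.
two_kinds = count_isozymes(8, 2)    # two kinds of SS polypeptide
three_kinds = count_isozymes(8, 3)  # three kinds of SS polypeptide

# Tetrameric catalase assembled from the products of two alleles.
catalase = count_isozymes(4, 2)

print(two_kinds, three_kinds, catalase)
```

The same counting rule would give 165 compositions for four kinds of SS polypeptide, although that figure is an extrapolation of this sketch and is not stated in the text.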
ACKNOWLEDGMENTS
I am grateful for the longstanding financial support I have received since my arrival at UCLA from the Department of Energy, nee Energy Resource and Development Commission, nee Atomic Energy Commission. Gratitude is also due to the National Science Foundation and National Institutes of Health. The subject matter of this paper is the result of the research efforts of Nobumaro Kawashima, Shalini Singh, Shiu-yuen Kwok, Pak-Hoo Chan, Katsuhiro Sakano, Shain-dow Kung, Atsushi Hirai, John Gray, Kevin Chen, Prachuab Kwanyuen, Charlene Jope, Sarjit Johal, and Hirofumi Uchimiya. The paper is dedicated to the memory of my friend, the late John Lyttleton.
REFERENCES
1. WEISSBACH, A., HORECKER, B. L., AND HURWITZ, J. (1956) J. Biol. Chem. 218, 795.
2. LYTTLETON, J. W., AND T’so, P. O. P. (1958) Arch. Biochem. Biophys. 73, 120.
3. RUTNER, A. C., AND LANE, M. D. (1967) Biochem. Biophys. Res. Commun. 39, 923.
4. KAWASHIMA, N., AND WILDMAN, S. G. (1970) Annu. Rev. Plant Physiol. 21, 325.
5. KAWASHIMA, N., AND WILDMAN, S. G. (1971) Biochim. Biophys. Acta 262, 42.
6. NISHIMURA, M., TAKEBE, T., SUGIYAMA, T., AND AKAZAWA, T. (1973) J. Biochem. 75, 945.
7. KAWASHIMA, N., TANABE, Y., AND IWAI, S. (1976) Biochim. Biophys. Acta 427, 70.
8. CHOLLET, R., ANDERSON, L. L., AND HOVSEPIAN, L. C. (1975) Biochem. Biophys. Res. Commun. 64, 97.
9. SAKANO, K., AND WILDMAN, S. G. (1974) Plant Sci. Lett. 2, 273.
10. BAKER, T. S., EISENBERG, D., EISERLING, F. A., AND WEISSMAN, L. (1975) J. Mol. Biol. 91, 391.
11. MCFADDEN, B. A., AND PUROHIT, K. (1978) in Photosynthetic Carbon Assimilation (Siegelman, H. W., and Hind, G., eds.), p. 179, Plenum, New York/London.
12. SAKANO, K., KUNG, S. D., AND WILDMAN, S. G. (1974) Plant Cell Physiol. 15, 611.
13. KWOK, S. Y., KAWASHIMA, N., AND WILDMAN, S. G. (1971) Biochim. Biophys. Acta 234, 293.
14. RABIN, B. R., AND TROWN, P. W. (1964) Nature (London) 202, 1290.
15. KWOK, S. Y., AND WILDMAN, S. G. (1974) Arch. Biochem. Biophys. 161, 354.
16. CHOLLET, R., AND ANDERSON, L. L. (1976) Arch. Biochem. Biophys. 176, 344.
17. WILDMAN, S. G. (1966) in Biochemistry of Chloroplasts (Goodwin, T. W., ed.), Vol. 2, p. 295, Academic Press, London.
18. KAWASHIMA, N., SINGH, S., AND WILDMAN, S. G. (1971) Biochem. Biophys. Res. Commun. 42, 664.
19. SINGH, S., AND WILDMAN, S. G. (1974) Plant Cell Physiol. 15, 373.
20. KAWASHIMA, N. (1969) Plant Cell Physiol. 10, 31.
21. SINGH, S., AND WILDMAN, S. G. (1973) Mol. Gen. Genet. 124, 187.
22. KAWASHIMA, N., KWOK, S. Y., AND WILDMAN, S. G. (1971) Biochim. Biophys. Acta 236, 578.
23. CHEN, K., KUNG, S. D., GRAY, J. C., AND WILDMAN, S. G. (1976) Plant Sci. Lett. 7, 429.
24. GRAY, J. C., KUNG, S. D., AND WILDMAN, S. G. (1978) Arch. Biochem. Biophys. 185, 272.
25. CHEN, K., JOHAL, S., AND WILDMAN, S. G. (1976) in Genetics and Biogenesis of Chloroplasts and Mitochondria (Bücher, Th., et al., eds.), p. 3, Elsevier/North-Holland, Amsterdam.
26. CHEN, K., AND WILDMAN, S. G. (1979) Unpublished manuscript.
27. UCHIMIYA, H., CHEN, K., AND WILDMAN, S. G. (1977) Stadler Symp. 9, 83.
28. SAKANO, K., KUNG, S. D., AND WILDMAN, S. G. (1974) Mol. Gen. Genet. 130, 91.
29. STEER, M. W. (1975) Canad. J. Genet. Cytol. 17, 337.
30. CHEN, K., GRAY, J. C., AND WILDMAN, S. G. (1975) Science 190, 1364.
31. COEN, D. M., BEDBROOK, J. R., BOGORAD, L., AND RICH, A. (1977) Proc. Nat. Acad. Sci. USA 74, 5487.
32. KAWASHIMA, N., TANABE, Y., AND IWAI, S. (1976) Biochim. Biophys. Acta 427, 70.
33. STRØBAEK, S., GIBBONS, G. C., HASLETT, B., BOULTER, D., AND WILDMAN, S. G. (1976) Carlsberg Res. Commun. 41, 335.
34. GRAY, J. C., KUNG, S. D., WILDMAN, S. G., AND SHEEN, S. J. (1974) Nature (London) 252, 226.
35. KUNG, S. D., SAKANO, K., GRAY, J. C., AND WILDMAN, S. G. (1975) J. Mol. Evol. 7, 59.
36. CARLSON, P. S., SMITH, H. H., AND DEARING, R. D. (1972) Proc. Nat. Acad. Sci. USA 69, 2292.
37. KUNG, S. D., GRAY, J. C., WILDMAN, S. G., AND CARLSON, P. (1975) Science 187, 353.
38. SMITH, H. H., KAO, K. N., AND COMBATTI, N. C. (1976) J. Hered. 67, 123.
39. CHEN, K., WILDMAN, S. G., AND SMITH, H. H. (1977) Proc. Nat. Acad. Sci. USA 74, 5109.
40. MELCHERS, G., SACRISTAN, M. D., AND HOLDER, A. A. (1978) Carlsberg Res. Commun. 43, 203.
41. CHEN, K., KUNG, S. D., GRAY, J. C., AND WILDMAN, S. G. (1975) Biochem. Genet. 13, 771.
42. HIRAI, A. (1977) Proc. Nat. Acad. Sci. USA 74, 3443.
43. CHEN, K., UCHIMIYA, H., AND WILDMAN, S. G. (1979) Unpublished manuscript.
44. CHEN, K., AND SAND, S. A. (1979) Science 204, 179.
45. KUNG, S. D., LEE, C. I., WOOD, D. D., AND MOSCARELLO, M. A. (1977) Plant Physiol. 60, 89.
46. CHAN, P. H., AND WILDMAN, S. G. (1972) Biochim. Biophys. Acta 277, 677.
47. SAGER, R., AND KITCHIN, R. (1975) Science 189, 426.
48. DORNER, R. W., KAHN, A., AND WILDMAN, S. G. (1958) J. Biol. Chem. 205, 969.
49. SCANDALIOS, J. G. (1969) Biochem. Genet. 3, 37.
50. SHEEN, S. J. (1970) Theor. Appl. Genet. 40, 18.
51. KAWASHIMA, N. (1970) Biochem. Biophys. Res. Commun. 38, 119.
52. HIRAI, A., AND WILDMAN, S. G. (1977) Biochim. Biophys. Acta 479, 39.
53. FEIERABEND, J. (1976) in Genetics and Biogenesis of Chloroplasts and Mitochondria (Bücher, Th., et al., eds.), p. 99, Elsevier/North-Holland, Amsterdam.
54. GIBBONS, G. C., STRØBAEK, S., HASLETT, B., AND BOULTER, D. (1975) Experientia 31, 1640.
55. MATSUBARA, H., AND SASAKI, R. M. (1968) J. Biol. Chem. 243, 1732.
56. KWANYUEN, P., AND WILDMAN, S. G. (1975) Biochim. Biophys. Acta 405, 167.