TIG--July 1988, Vol. 4, no. 7
D. Schümperli is in the Institut für Molekularbiologie II der Universität Zürich, Hönggerberg, 8093 Zürich, Switzerland.
Evolution of the genetic code as affected by anticodon content
Syozo Osawa and Thomas H. Jukes

A new approach to understanding the evolution of the genetic code has come from information about mitochondrial codes, directional mutation pressure and stop codon capture. The codes for various classes of organisms and organelles are characterized by different patterns of anticodon content and usage.

© 1988, Elsevier Publications, Cambridge 0168-9525/88/$02.00

The nature of the genetic code, including the assignments for 61 codons to 20 amino acids, together with three termination (stop) codons, was revealed around 1966. The general conclusion was reached that this form of the genetic code was 'universal' for living organisms. This conclusion changed in 1979, when it was discovered that the genetic code in human mitochondria (Ref. 1) differed from the universal code by using AUA as a codon for methionine rather than for isoleucine, and UGA for tryptophan rather than as a stop codon. On the basis of these differences, it was proposed (Ref. 2) that the mammalian mitochondrial code may represent an evolutionary retrogression towards a more primitive type of code. Furthermore, mitochondria contain a much smaller number of anticodons than is present in codes of intact organisms, indicating that the number of anticodons in the latter may have increased during evolution and that mitochondria may have reversed the process by genomic economization.

The proposal we review in this article had its beginnings in findings by Yamao and co-workers (Ref. 3) that UGA, a stop codon in the universal code, is read as tryptophan in Mycoplasma capricolum. The reassignment was deduced to be the result of high AT content (75%) in Mycoplasma DNA so that, during evolution, UGA stop codons were all converted to UAA, tryptophan anticodon CCA was duplicated, and one duplicate mutated to UCA; meanwhile some UGG codons mutated to UGA, pairing with anticodon UCA (Ref. 4; see later discussion and Fig. 4). UGA was thus reassigned, without disruption, from stop to tryptophan. These changes were concluded to result from directional mutation pressure on DNA towards AT predominating over GC. We next proposed (Ref. 5) that AT pressure in ciliated protozoa had changed stop codons UAA and UAG to glutamine codons by an analogous series of events, to be discussed below.

It has long been known that bacterial species differ in the GC content of their DNA (for references, see Ref. 6). Sueoka examined the amino acid distribution in total proteins of bacteria containing DNA ranging in GC content between 35% and 72%. Amino acids whose codons were subsequently shown to be high in G and C, such as alanine and glycine, were higher in bacteria with a high GC content, and lower in bacteria with a low GC content.

Most of the response to high and low GC levels, however, occurs in the silent positions of codons, which are usually the third positions. This response follows Crick's wobble rules of codon-anticodon pairing. Crick (Ref. 7) noted that in the pairing between the first anticodon nucleotide and the third codon nucleotide, a certain amount of play, or 'wobble', often occurs. As a result, NNU and NNC codons always designate the same amino acid because G in anticodon GNN pairs with both U and C. Similarly, NNA and NNG codons are, for the most part, synonymous. Therefore, under GC pressure, codons NNU can be converted to NNC, and NNA to NNG, without changes in the amino acid sequence of the protein coded by a gene. The result is that, under GC or AT pressure, silent substitutions in codons occur more readily than substitutions that produce amino acid replacements, but no effect of GC/AT pressure on the code itself had been noted in intact organisms until the
TIG--July 1988, Vol. 4, no. 7
Table 1. Wobble pairing in codon-anticodon binding

Nucleoside in first position of anticodon    Pairing with nucleosides in third position of codon
U, unmodified                                U, C, A, G
U, modified                                  A, G
U, 2-thiolated or its equivalent             A (G)
Uridine 5-oxyacetic acid                     U, A, G
G                                            C, U
C                                            G
Lysyl-C in *CAU                              A
I                                            U, C, A
A (rare)                                     U

Modifications of G, such as queuosine, pair with C and U; modifications of C, such as 2'-O-methyl cytidine, pair only with G. Wobble pairing between the third position of anticodons and the first position of codons sometimes occurs (see text and Ref. 13).
results of Yamao and co-workers (Ref. 3), described above, appeared.

A more detailed description of the wobble rules is needed to clarify our explanation of evolution of the code. The rules (Ref. 7) state that U in the first position of anticodons pairs with A or G in the third codon position, G with C or U, and C only with G. Adenosine (A) rarely appears in the first position of anticodons in tRNA; its place is taken by I (inosine), which has the nucleotide base hypoxanthine. A specific enzyme, tRNA-hypoxanthine ribosyl transferase (HRT) (Ref. 8), removes A from the first anticodon position and replaces it with I, which pairs with U, C and A (Ref. 7). The wobble rules of codon-anticodon pairing have expanded since 1966 as a result of various experimental findings, and are shown in Table 1.

'Family boxes' contain four codons that code for a single amino acid. Unmodified U, found in family box tRNAs in mitochondria, wobble-pairs with all four bases, U, C, A and G, in the third positions of codons (Refs 9, 10). Modifications of U, brought about by post-transcriptional enzymatic reactions, change its pairing properties. For example, modified U of one type (unidentified), present in tRNAs of vertebrate mitochondria, pairs with A and G in two-codon sets (Refs 9, 10). (Two-codon sets are pairs of codons, not in family boxes, that code for a single amino acid.) Other modifications of U, found in eubacteria and elsewhere (Ref. 11), show still other pairing properties (see Table 1). There is also a modification of C, found in the first position of one of the bacterial isoleucine anticodons, that allows the anticodon *CAU (*C = 2-lysyl-C) to pair with AUA, isoleucine (Ref. 12) (see below).

Table 1 shows wobble pairing for nucleotides in the first anticodon position. Wobble in the third anticodon position has been described (Ref. 13) between glutamine anticodon CUG and codons CAG and UAG, and it is well known in the case of GUG as a 'second' initiation codon pairing with CAU, the initiator methionine anticodon.

The known anticodons are listed in Table 2. All of them, except *CAU, were predictable from the wobble rules. Only one predicted anticodon, ICC for glycine, is missing, and it may not exist. The anticodons in Table 2 have evolved from a much shorter list through gene duplication and differentiation, influenced by directional mutation pressure. Different types of organisms, or organelles, contain different numbers of the anticodons in Table 2. The smallest number of anticodons so far found or predictable in any code is 22 (vertebrate mitochondria) and the largest is 45 (eukaryotes).

Table 2. Anticodons in 'universal' genetic code

GAA Phe           GGA or IGA Ser    GUA Tyr    GCA Cys
UAA Leu           UGA Ser
CAA Leu           CGA Ser                      CCA Trp

GAG or IAG Leu    GGG or IGG Pro    GUG His    GCG or ICG Arg
UAG Leu           UGG Pro           UUG Gln    UCG Arg
CAG Leu           CGG Pro           CUG Gln    CCG Arg

GAU or IAU Ile    GGU or IGU Thr    GUU Asn    GCU Ser
*CAU Ile (a)      UGU Thr           UUU Lys    UCU Arg
CAU Met           CGU Thr           CUU Lys    CCU Arg

GAC or IAC Val    GGC or IGC Ala    GUC Asp    GCC Gly
UAC Val           UGC Ala           UUC Glu    UCC Gly
CAC Val           CGC Ala           CUC Glu    CCC Gly

The universal code in eubacteria, halobacteria and eukaryotes has a maximum of 45 anticodons in any kingdom, not including differences from the universal code found in mitochondria, in Mycoplasma and ciliated protozoa, and some 'exceptional' ANN anticodons. (a) *CAU is not present in eukaryotes. Except for *CAU, modifications to the first anticodon nucleotide are not indicated in Tables 2-7, and are described in Table 1 and in the text.

Evolution of the genetic code
To make deductions concerning the evolution of the code, we postulate that a code once existed (Table 3) containing 23 anticodons, the minimum number needed to pair with 62 codons for 20 amino acids. This 'minimal code' is almost identical to the present mammalian mitochondrial code, except that in the mitochondrial code CAU is the anticodon for methionine, and anticodon UCU (arginine) is missing, so that its corresponding codons, AGA and AGG, have become stop codons. In the minimal code, there are eight family boxes occupied by a single anticodon, such as UAG (leucine; U unmodified). In addition to the eight family boxes, there are 15 two-codon sets and two stop codons, UAA and UAG.

Our scheme for the evolution of the genetic code is illustrated diagrammatically in Fig. 1. We propose that during evolution there have been alternating periods of GC and AT pressure, which brought about expansion of the minimal code, first to an 'early code', and then to present codes. We shall first describe how the features characteristic of the codes of present-day kingdoms arose, and then discuss general implications.
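The wobble rules of Table 1 and the anticodon count of the minimal code can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the dictionary keys and the function name `codons_read` are labels invented for this example, and only a subset of the Table 1 entries is encoded.

```python
# Illustrative sketch only: names below are assumptions, not from the article.
# First anticodon nucleoside -> codon third-position nucleosides it can read,
# following a subset of Table 1.
WOBBLE = {
    "U_unmodified": "UCAG",  # four-way wobble (family boxes in mitochondria)
    "U_modified": "AG",      # two-codon sets
    "G": "CU",
    "C": "G",
    "I": "UCA",
}

# Watson-Crick partners for the second and third anticodon positions.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon, first_kind):
    """Return the codons (5'->3') read by an anticodon written 5'->3'.

    `first_kind` is a key of WOBBLE and selects the pairing behaviour of
    the first anticodon nucleoside.
    """
    # Anticodon position 3 pairs with codon position 1, position 2 with 2.
    first = COMPLEMENT[anticodon[2]]
    second = COMPLEMENT[anticodon[1]]
    return sorted(first + second + third for third in WOBBLE[first_kind])


# Minimal code arithmetic from the text: 8 family boxes and 15 two-codon
# sets, each read by a single anticodon.
FAMILY_BOXES, TWO_CODON_SETS = 8, 15
minimal_anticodons = FAMILY_BOXES + TWO_CODON_SETS
sense_codons = 4 * FAMILY_BOXES + 2 * TWO_CODON_SETS

print(codons_read("UAG", "U_unmodified"))  # the leucine family box
print(codons_read("ICG", "I"))             # arginine codons read by ICG
print(minimal_anticodons, sense_codons)
```

Under these assumptions the totals come out as 23 anticodons and 62 sense codons, matching the counts given for the minimal code; together with the two stop codons UAA and UAG this accounts for all 64 codons.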
Early code
The first step, under GC pressure, is to add GNN anticodons in family boxes: after duplication of a tRNA gene, one duplicate is free to mutate, and GNN anticodons would have been the result of U to G mutation in the UNN anticodon of one of the duplicates. This results in the early code, shown in Table 4, so that each box (except one) contained one GNN and one UNN anticodon, with U presumably modified to pair only with A and G. All 31 anticodons shown in this table are found in existing codes. The early code is similar to present codes, especially the eubacterial code, but in the early code: (1) UGA and UGG were both tryptophan codons translated by anticodon UCA; (2) both AUA and AUG were codons for methionine translated by anticodon UAU; and (3) in the arginine four-codon box, codons CGU and CGC were translated by anticodon GCG, and codons CGA and CGG by anticodon UCG.

Figure 2 sets out the series of events postulated to explain how UGA became a stop codon, how AUA became an isoleucine codon, and how anticodons GCG and UCG were replaced by ICG and CCG in the arginine four-codon box during the evolution of the early code to present codes. The events shown in Fig. 2 illustrate how directional mutation pressures, together with the range of pairing possibilities under the wobble rules, can bring about changes in codon and anticodon assignments. For example, a codon or an anticodon can disappear altogether (in
Fig. 2. Schematic representation of changed codon/anticodon assignments during evolution from the early code to present codes. Corresponding codons, anticodons and their assignments are aligned vertically. Arrows of different styles in the original figure denote GC pressure, AT pressure, and changes not directly related to GC/AT pressure. Throughout (a) to (c), changes of anticodon UNN in the early code to CNN under GC pressure would include a series of intermediate steps: UNN was duplicated, and one duplicate mutated to CNN. One of the duplicated UNN anticodons was discarded as unnecessary upon disappearance of NNA codons. In (b) and (c), codons are translated by the correspondingly colored anticodons shown below the codon column.
(a) Reassignment of codon UGA from Trp to stop. GC pressure changed all UGA Trp codons in the early code to UGG, and Trp anticodon UCA to CCA, pairing only with UGG. At this point, codon UGA disappeared. Continued GC pressure changed some UAA stop codons to UGA, which has become reassigned as an additional stop codon.
(b) Reallocation of codon AUA from Met to Ile. GC pressure changed all AUA Met codons in the early code to AUG, and anticodon UAU to CAU, pairing only with Met codon AUG. At a later stage, codon AUA reappeared with an assignment changed to Ile. This resulted from duplication of Met tRNA, with anticodon CAU, and from mutations taking place in one of the duplicates. These mutations changed the duplicated tRNA, so that C in the first position of the anticodon CAU became modified to pair with A instead of G, and this tRNA accepted Ile instead of Met. This apparently unique modification of cytosine, *C, consists of the attachment of lysine to the 2-position (Ref. 12). Under subsequent AT pressure, some Ile AUC codons mutated to AUA, pairing with *CAU in the new Ile tRNA. Later, in the eukaryotic line, Ile anticodon IAU emerged under AT pressure from GAU, pairing with AUA codons as well as AUU and AUC. Anticodon *CAU was removed by mutating to UAU (presumably *UAU), thus providing a redundant anticodon for AUA codons.
(c) Change in Arg anticodons. GC pressure changed all CGA codons to CGG and CGC, and anticodon UCG to CCG, pairing only with codon CGG. Later, GC pressure changed to AT pressure, and anticodon GCG mutated to ACG, which was changed to ICG by HRT (Ref. 8). ICG pairs with CGA as well as with CGU and CGC. Therefore, it was possible for CGA to reappear as an Arg codon, being produced by mutation of some CGC and CGG codons to CGA and CGU. Conversions of CGC and CGG to CGU codons are not indicated in the figure. Arg anticodon UCG was reported from yeast. This was presumably formed from CCG as illustrated.
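The idea behind panel (a) of Fig. 2, that a codon can change meaning without disruption if it first disappears from coding sequences, can be checked mechanically. The Python sketch below is only illustrative: the three stages paraphrase the caption of Fig. 2a, and the data layout and the function name `find_captures` are assumptions made for this example.

```python
# Illustrative sketch only: the stage descriptions paraphrase Fig. 2a, and the
# data layout and function name are assumptions made for this example.
STAGES = [
    # (label, codons present in coding sequences, meaning of readable codons)
    ("early code",
     {"UGA", "UGG", "UAA"},
     {"UGA": "Trp", "UGG": "Trp", "UAA": "Stop"}),
    ("GC pressure: UGA codons become UGG, anticodon UCA becomes CCA",
     {"UGG", "UAA"},
     {"UGG": "Trp", "UAA": "Stop"}),
    ("continued GC pressure: some UAA codons become UGA",
     {"UGG", "UAA", "UGA"},
     {"UGG": "Trp", "UAA": "Stop", "UGA": "Stop"}),
]


def find_captures(stages):
    """Return (codon, old meaning, new meaning) for each reassignment.

    Raises ValueError if a codon is in use but unreadable, or if a codon
    changes meaning while it is still present in coding sequences.
    """
    last_meaning = {}
    previous_present = set()
    captures = []
    for label, present, meaning in stages:
        for codon in sorted(present):
            if codon not in meaning:
                raise ValueError(f"{codon} is used but unreadable at: {label}")
            old = last_meaning.get(codon)
            if old is not None and old != meaning[codon]:
                if codon in previous_present:
                    raise ValueError(f"{codon} changed meaning in use at: {label}")
                captures.append((codon, old, meaning[codon]))
            last_meaning[codon] = meaning[codon]
        previous_present = present
    return captures


print(find_captures(STAGES))
```

With the stages as written, the only reassignment found is UGA from Trp to Stop, and it passes the check because UGA is absent from the intermediate stage.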
carries 15 CNN anticodons. These CNN anticodons are needed to pair with NNG codons, because in eukaryotes modified U in UNN anticodons presumably pairs only very weakly, or not at all, with G in NNG codons. INN anticodons would produce intolerable ambiguity in pairs of two-codon sets by simultaneously recognizing codons for different amino acids. Hence, the eukaryotic code contains INN anticodons only in family boxes and never in two-codon sets (Table 5). Why do *UNN anticodons, in which *U is modified by thiolation (Ref. 11), accompany INN in eight cases, presumably furnishing redundant pairing for NNA? Guthrie and Abelson (Ref. 14) proposed that in yeast (a eukaryote), INN may not pair efficiently with NNA, so that *UNN anticodons are needed for this function. If the pairing of INN were restricted in this way, it seems unlikely that INN would have been so rigorously excluded from two-codon sets. Differences such as those described between eukaryotes and eubacteria may be due to more complex eukaryotic translation systems.

Stop codon capture in ciliated protozoa
In ciliated protozoa (70-75% AT), codons UAA and UAG are no longer stop codons but are used for glutamine, in addition to the universal glutamine codons CAA and CAG (Refs 15-18). UGA is thus the sole stop codon. Possibly, the polypeptide-releasing factor in ciliated protozoa is specific for UGA. How were the other two stop codons reassigned? Figure 3 illustrates the proposed sequence of events, which involved duplication of tRNA genes (leaving one gene free to mutate) and mutation of both codons and anticodons under AT pressure. The scheme implies that there should be a close relationship between Gln tRNA(UUA) and Gln tRNA(CUA) (see Fig. 3c and d), and indeed their sequences differ by only four nucleotides (Ref. 19), indicating recent divergence from a common origin. Gln tRNA(UUG) is not quite as closely related to either of these two tRNAs as they are to each other, but is still similar enough for the divergence of all these tRNAs to have been well within the time of existence of the eukaryotes.

Fig. 3. Stop codon capture in ciliated protozoa. Between (a) and (b), the gene for Gln tRNA UUG duplicated and one duplicate mutated under AT pressure to Gln tRNA UUA. Some Gln codons CAA and CAG mutated to UAA and UAG; thus these stop codons were captured by Gln by pairing with anticodon UUA. Since anticodon UUA predominantly pairs with codon UAA and only very poorly with UAG, Gln tRNA UUA duplicated between (c) and (d), and one duplicate mutated to Gln tRNA CUA pairing exclusively with codon UAG. Solid arrows, AT pressure; broken arrows, changes not directly related to AT pressure. Colors indicate corresponding codons and anticodons.

Eubacterial code
Bacteria evolved into a series of families differing in genomic GC contents. In most species, UNN (U modified) anticodons are used as explained above under wobble rules. CNN anticodons translate only NNG codons, and their number depends on the GC content of DNA in bacterial families, since increased GC content also increased the numbers of NNG codons. Thirteen CNN anticodons have been found or are presumed to be present in Thermus thermophilus (69% GC), ten in E. coli (50% GC), six in Bacillus subtilis (43% GC) and only three in Mycoplasma capricolum (25% GC) (Ref. 20). Eubacteria, including Mycoplasma, use INN anticodon for only one amino acid, arginine (Table 6), probably because eubacterial HRT seems to be specific for ACG.

As described above, AT pressure in Mycoplasma led to the capture of UGA by tryptophan (Fig. 4). This is supported by the tandem arrangement of the tRNA genes for UCA and CCA in a single operon in Mycoplasma, and by the close similarity of the sequences of these two tRNAs, indicating a recent duplication event. Anticodon CCA is no longer needed, and is apparently disappearing. Reduction of GNN as well as CNN anticodons has occurred in Mycoplasma (Ref. 21) as in mitochondria. All family boxes, except for those of arginine and threonine, contain a single anticodon, UNN (U unmodified). *UGU is replaced by unmodified UGU, so that all the four codons in the threonine box can be translated by two anticodons UGU and AGU in the absence of anticodon GNN (Table 6).

Fig. 4. Stop codon capture in Mycoplasma. Solid arrows, AT pressure; broken arrows, changes not directly related to AT pressure. For explanations, see the text.

Mitochondrial and chloroplast codes
The mitochondrial codes have evolved from the eubacterial code by elimination of many tRNA genes: in consequence, there are only 22 anticodons in the vertebrate mitochondrial code (Table 6). To reduce the anticodon list, all CNN anticodons had to be discarded except the obligate CAU, which pairs with methionine codons AUA and AUG. Each family box contains only one anticodon UNN (U unmodified), and two-codon NNR sets pair with *UNN anticodons (Refs 9, 10). The vertebrate mitochondrial code has become a minimal code (Table 3) under the influence of AT pressure and genomic
Table 4. Hypothetical early code and tRNA anticodons

First                    Second codon base                        Third
codon base   U           C           A           G                codon base
U            Phe GAA     Ser GGA     Tyr GUA     Cys GCA          U, C
             Leu UAA     Ser UGA     Stop -      Trp UCA          A, G
C            Leu GAG     Pro GGG     His GUG     Arg GCG          U, C
             Leu UAG     Pro UGG     Gln UUG     Arg UCG          A, G
A            Ile GAU     Thr GGU     Asn GUU     Ser GCU          U, C
             Met UAU     Thr UGU     Lys UUU     Arg UCU          A, G
G            Val GAC     Ala GGC     Asp GUC     Gly GCC          U, C
             Val UAC     Ala UGC     Glu UUC     Gly UCC          A, G

From Ref. 2.
economization. AT pressure has led to the loss of stop codon UGA, converted to UAA, and to the assignment of UGA to tryptophan as in Mycoplasma. However, switching of AT pressure to GC pressure in the mitochondrial genome may occur, as is evident during the evolution of animals, where the GC contents of silent codon positions increase from 6% in flies to 50% in humans. There is also evidence that genomic GC content increased at one stage during the evolution of plant mitochondria; thus, in mitochondria of green plants (maize, watermelon and evening primrose), CGG does not code for arginine, and codons UGG and CGG both code for tryptophan. The mitochondrial code is less stable than the other codes; perhaps their smaller genome enables mitochondria to tolerate some changes in the code that would not be acceptable to intact organisms.

There are 31 anticodons in the chloroplast code (Table 6). One of these, proline anticodon GGG, is present in a pseudogene, so the only anticodon for proline is presumably UGG (Ref. 22). Most of the CNN anticodons in the ancestral eubacterial code have disappeared during evolution of the chloroplast code, presumably under AT pressure. Four-way wobble is used in three family boxes in the chloroplast code, which has apparently evolved from an ancestral eubacterial code along a pathway similar to that of the vertebrate mitochondrial code, but the evolutionary process has not proceeded as far.
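The GC content of silent codon positions mentioned above is commonly approximated by the GC content of third codon positions. The following Python sketch shows one way such a figure could be computed. It is an assumption-laden illustration: the function name `gc3` and the example sequence are invented here, and treating every third position as silent is only an approximation, since the text notes that silent positions are usually, not always, the third positions.

```python
# Illustrative sketch only: `gc3` and the example sequence are assumptions
# made for this example, not data from the article.
def gc3(coding_sequence):
    """Fraction of G or C at third codon positions of an in-frame sequence."""
    seq = coding_sequence.upper().replace("T", "U")  # accept DNA or RNA
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must be a non-empty whole number of codons")
    thirds = seq[2::3]  # every third nucleotide, starting at codon position 3
    return sum(base in "GC" for base in thirds) / len(thirds)


# Codons AUG GCC AAA UAA have third positions G, C, A, A.
print(gc3("ATGGCCAAATAA"))
```

Applied to real coding sequences from genomes under AT or GC pressure, a measure of this kind would be expected to shift more strongly than overall GC content, because third-position changes are mostly silent.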
Halobacterial/methanococcal code
The halobacterial/methanococcal groups emerged after the eubacterial separation (see Fig. 1), and have a common ancestor with eukaryotes. The halobacterial and methanococcal groups seem to share a common ancestor and have evolved under GC and AT pressure as shown by their high (67%) and low (31%) genomic GC contents, respectively. The methanococcal code is incomplete: presumably it also contains CAU methionine and CCA tryptophan, both of which are obligate anticodons. The halobacterial/methanococcal code (Table 7) contains no INN anticodons.

'CNN Rule'
Thirteen CNN anticodons have been found in halobacteria (67% GC) and only one in methanobacteria (31% GC), although one more (CCU) may exist in halobacteria and two more (CAU and CCA, Table 7) may be presumed to exist in methanobacteria. Apparently, under AT pressure most CNN anticodons have disappeared from the chloroplast and mitochondrial codes (Tables 6, 7). A 'CNN rule' may be proposed, stating that, in prokaryotes and in organelles derived from prokaryotes, CNN anticodons tend to increase under GC pressure and decrease under AT pressure (Ref. 20).

Table 5. Anticodons in eukaryotic code

GAA Phe      IGA Ser      GUA Tyr    GCA Cys
(UAA Leu)    UGA Ser
CAA Leu      CGA Ser                 CCA Trp

IAG Leu      IGG Pro      GUG His    ICG Arg
UAG Leu      UGG Pro      UUG Gln    UCG Arg
CAG Leu      CGG Pro      CUG Gln    CCG Arg

IAU Ile      IGU Thr      GUU Asn    GCU Ser
UAU Ile      (UGU Thr)    UUU Lys    UCU Arg
CAU Met      CGU Thr      CUU Lys    CCU Arg

IAC Val      IGC Ala      GUC Asp    GCC Gly
UAC Val      UGC Ala      UUC Glu    UCC Gly
CAC Val      (CGC Ala)    CUC Glu    CCC Gly

Anticodons in parentheses have not been described but are presumed to exist. UUA and CUA are anticodons for glutamine in ciliated protozoa only.

Discussion
GC/AT pressure affects the base composition of codons and anticodons in evolution. In addition, GC pressure increases the number of anticodons by adding GNN and CNN anticodons. Such an increase would tend to improve fidelity of translation, which would be evolutionarily advantageous. AT pressure has the opposite tendency in eliminating GNN and CNN anticodons, thus leading to genomic economization, as seen in the vertebrate mitochondrial and chloroplast codes and the code of Mycoplasma.

We propose that evolution of the code has been affected by directional mutation pressure (GC/AT pressure) which changes the base composition of anticodons without deleterious consequences. Figure 1 summarizes various steps of evolution of the genetic code. In the simplest code for 20 amino acids (minimal code), codons in family boxes are read by a single anticodon (UNN) by four-way wobble, and codons in two-codon sets are read by GNN and *UNN anticodons. GC pressure led to addition of GNN anticodons and formed the early code, so that codons in family boxes were translated by GNN and *UNN anticodons, by two-way wobble. Continuing GC pressure, followed by switching of GC pressure to AT pressure, resulted in reassignment of certain codons: UGA to a stop codon under GC pressure and AUA to an isoleucine codon under AT pressure. The replacement of arginine family box anticodons GCG and UCG by ICG and CCG also took place. These changes established the progenitor of the present genetic codes. The eukaryotic code was formed under AT pressure, resulting in the acquisition of seven more INN anticodons in place of
Table 6. Anticodons in eubacterial code, vertebrate mitochondrial code and chloroplast code

GAA Phe       M, C    GGA Ser       C       GUA Tyr    M, C    GCA Cys       M, C
UAA Leu       M, C    UGA Ser       M, C
CAA Leu       C       CGA Ser                                  CCA Trp (a)   C

GAG Leu               GGG Pro       (C)     GUG His    M, C    ICG Arg (a)   C
UAG Leu       M, C    UGG Pro       M, C    UUG Gln    M, C
CAG Leu               CGG Pro               CUG Gln            CCG Arg       C

GAU Ile       M, C    GGU Thr (b)   C       GUU Asn    M, C    GCU Ser       M, C
*CAU Ile      C       UGU Thr       M, C    UUU Lys    M, C    UCU Arg       C
CAU Met       M, C                          CUU Lys            CCU Arg

GAC Val       C       GGC Ala               GUC Asp    M, C    GCC Gly       C
UAC Val       M, C    UGC Ala       M, C    UUC Glu    M, C    UCC Gly       M, C
CAC Val                                     CUC Glu            CCC Gly

The mitochondrial and chloroplast codes are evidently derived from the eubacterial code; M, vertebrate mitochondria; C, chloroplast. ICG (from ACG) and *CAU (from CAU) are inferred to exist in chloroplasts. (C) The chloroplast gene containing anticodon GGG is apparently inactive (Ref. 22). *C, modified C, probably lysidine (Ref. 12). (a) UCA is the sole vertebrate mitochondrial anticodon for tryptophan and UCG for arginine. UCA is the anticodon for tryptophan in Mycoplasma. (b) AGU is an anticodon for threonine in Mycoplasma.

Table 7. Known and predicted anticodons in halobacterial (H) and Methanococcus (M) codes

GAA Phe    H, M    GGA Ser    H       GUA Tyr    H, M    GCA Cys    H(M)
UAA Leu    H(M)    UGA Ser    H(M)
CAA Leu    H       CGA Ser    H                          CCA Trp    H(M)

GAG Leu    H       GGG Pro    H       GUG His    H, M    GCG Arg    H
UAG Leu    H, M    UGG Pro    H, M    UUG Gln    H, M    UCG Arg    H(M)
CAG Leu    H       CGG Pro    H       CUG Gln    H       CCG Arg    H

GAU Ile    H(M)    GGU Thr    H, M    GUU Asn    H, M    GCU Ser    H(M)
*CAU Ile   H, M    UGU Thr    (H)M    UUU Lys    H, M    UCU Arg    (H)M
CAU Met    H(M)    CGU Thr    H       CUU Lys    H       CCU Arg    (H)

GAC Val    H       GGC Ala    H       GUC Asp    H, M    GCC Gly    H
UAC Val    H, M    UGC Ala    H, M    UUC Glu    H, M    UCC Gly    H(M)
CAC Val    H       CGC Ala    H       CUC Glu    H       CCC Gly    H

Halobacterial code contains 41 known anticodons and four more anticodons can be predicted. Methanococcus code contains 16 known anticodons and nine more anticodons can be predicted, if Methanococcus uses UNN as sole anticodons in family boxes (except the threonine box). *C, probably modified C. Predictions for anticodons are in parentheses.
GNN. Two stop codons, UAA and UAG, were captured by glutamine in ciliated protozoa under strong AT pressure. The eubacterial and halobacterial/methanococcal codes differentiated in various directions, depending on GC or AT pressure, leading to codes with varying numbers of CNN anticodons. Strong AT pressure in an extreme case occurred in the Mycoplasma line, so that stop codon UGA was converted to UAA, and UGA was captured for tryptophan by the reappearance of anticodon UCA. In Mycoplasma, as in mitochondria, CNN anticodons and also GNN anticodons were removed from family boxes so that codons in family boxes are mostly translated by UNN anticodons. Two offshoots from the eubacterial code led to the mitochondrial and chloroplast codes. The vertebrate mitochondrial code reverted, under AT pressure and genomic economization, essentially to the minimal code. The chloroplast code is assumed to be in the process of approaching the mitochondrial code.

Reassignments of codons occur by 'codon capture'. During evolution, a codon may disappear from coding sequences, and the tRNA carrying the anticodon for its translation may also disappear. At a later period of evolution, the codon reappears together with an anticodon that will translate it. This time, the tRNA containing the anticodon may be assigned to the same amino acid (e.g. anticodon UCA for UGA tryptophan codon in Mycoplasma or mitochondria), or a codon may be reassigned to a different amino acid from that of the tRNA that
disappeared (e.g. codon AUA for isoleucine, or CGG for tryptophan in green plant mitochondria). Reassignment of stop codons is made possible by confining chain termination to one or two of the stop codons, thus setting free the others for reassignment by an amino acid when a corresponding tRNA appears, with an anticodon pairing with the currently unassigned codon, as in the case of UGA captured by tryptophan in Mycoplasma and mitochondria, or UAA and UAG by glutamine in ciliated protozoa. These events, as discussed in the text, can best be explained by directional mutation pressure.

GC and AT pressures presumably arise as a result of mutations in the DNA polymerase system. A tendency towards AT was described for bacteriophage T4 (Ref. 23). In contrast, Treffers' mutator gene in E. coli tends to produce an increase in transversions, AT to CG (Ref. 24), and hence exerts directional mutational pressure in the direction of GC. In multicellular organisms, the pattern is more complex. Filipski points out that chromosomes in vertebrates show regional differences in GC composition and he proposes that these are caused by 'different mutational bias of α and β DNA polymerases' (Ref. 25) as shown by in vitro studies of these polymerases.

References
1 Barrell, B. G., Bankier, A. T. and Drouin, J. (1979) Nature 282, 189-194
2 Jukes, T. H. (1983) J. Mol. Evol. 19, 219-225
3 Yamao, F. et al. (1985) Proc. Natl Acad. Sci. USA 82, 2306-2309
4 Jukes, T. H. (1985) J. Mol. Evol. 22, 361-362
5 Jukes, T. H., Osawa, S. and Muto, A. (1987) Nature 325, 668
6 Sueoka, N. (1962) Proc. Natl Acad. Sci. USA 48, 582-592
7 Crick, F. H. C. (1966) J. Mol. Biol. 19, 548-555
8 Elliott, M. S. and Trewyn, R. M. (1984) J. Biol. Chem. 259, 2407-2418
9 Heckman, J. E. et al. (1980) Proc. Natl Acad. Sci. USA 77, 3159-3163
10 Barrell, B. G. et al. (1980) Proc. Natl Acad. Sci. USA 77, 3164-3166
11 Yokoyama, S. et al. (1985) Proc. Natl Acad. Sci. USA 82, 4905-4909
12 Muramatsu, T. et al. J. Biol. Chem. (in press)
13 Lin, J. P., Aker, M., Sitney, K. C. and Mortimer, R. K. (1986) Gene 49, 383-388
14 Guthrie, C. and Abelson, J. (1982) The Molecular Biology of the Yeast Saccharomyces, Metabolism and Gene Expression, pp. 487-528, Cold Spring Harbor Laboratory
15 Horowitz, S. and Gorovsky, M. A. (1985) Proc. Natl Acad. Sci. USA 82, 2452-2455
16 Helftenbein, E. (1985) Nucleic Acids Res. 13, 415-433
17 Caron, F. and Meyer, E. (1985) Nature 314, 185-188
18 Preer, J. R., Jr, Preer, L. B., Rudman, B. M. and Barnett, A. J. (1985) Nature 314, 188-190
19 Hanyu, N., Kuchino, Y. and Nishimura, S. (1986) EMBO J. 5, 1307-1311
20 Osawa, S. et al. Nature (in press)
21 Andachi, Y., Muto, A., Yamao, F. and Osawa, S. (1987) Proc. Jpn. Acad. 63, 353-356
22 Umesono, K. and Ozeki, H. (1987) Trends Genet. 3, 281-287
23 Speyer, J. (1965) Biochem. Biophys. Res. Commun. 21, 6-10
24 Cox, E. C. and Yanofsky, C. (1967) Proc. Natl Acad. Sci. USA 58, 1895-1902
25 Filipski, J. (1987) FEBS Lett. 217, 184-186

S. Osawa is in the Nagoya University, Laboratory of Molecular Genetics, Department of Biology, Nagoya 464, Japan. T. H. Jukes is in the Space Sciences Laboratory, University of California, Berkeley, CA 94720, USA.
The role of fibronectins in embryonic cell migrations
Sylvie Dufour, Jean-Loup Duband, Alberto R. Kornblihtt and Jean Paul Thiery
Fibronectins (FNs) are dimeric glycoproteins of the extracellular milieu which participate in a variety of cellular events such as attachment, spreading and motility of cells. These functions result from the presence along the molecule of several distinct sites for binding other extracellular macromolecules, and the existence of specific receptors on the surface of cells. This review focuses on recent discoveries concerning the structural and functional properties of FNs and their receptors. These findings are discussed with respect to the role of FNs in the control of embryonic cell migrations.
© 1988, Elsevier Publications, Cambridge 0168-9525/88/$02.00
During embryogenesis, morphogenetic processes involve interactions among cells, and between cells and their immediate environment, the extracellular matrix. The components of the extracellular matrix and their cell-surface receptors that are involved in adhesive systems have been identified and extensively studied in many laboratories. These adhesive processes, encountered during embryogenesis, malignancy, hemostasis, wound healing, host defense and maintenance of connective tissue integrity, involve the participation of glycoproteins with a subunit molecular weight of about 250 kDa. The glycoproteins, which have been isolated and purified, and were originally named according to their sources or their biological activities (reviewed in Ref. 1), form a family now known as fibronectins (FNs). FNs are directly involved in cell attachment and spreading onto a substratum, as well as in cell motility.
Structure of fibronectins and their gene
Individual FN molecules are dimers of similar but not identical polypeptides. Two major forms of FN have been distinguished: a soluble dimeric form that is synthesized by hepatocytes and then secreted into the bloodstream (plasma FN), and a dimeric or cross-linked multimeric form, made by many cell types including fibroblasts and epithelial cells, which after secretion can be deposited as long insoluble fibrils in the extracellular matrix (cellular FN). Primary structural differences exist, not only between plasma and cellular FNs, but also among the subunits of each type. Despite their heterogeneity, all FN subunits show a common modular organization in their primary sequence, which contains a series of homologous repeating units, the so-called homology types I, II and III (about 40, 60 and 90 amino acids in length, respectively; Ref. 2). In addition to these structural similarities, the various forms of FN subunits share common functional properties that correspond to the domains for binding other extracellular matrix macromolecules and the cell surface. Each subunit bears two binding sites for heparin, two for fibrin, one for collagens, and at least one site for binding to the cell surface (Fig. 1). The cell-binding site common to all FN forms involves the sequence Arg-Gly-Asp-Ser (RGDS; reviewed in Ref. 3).
Structure of the FN gene
Despite the existence of multiple forms of FN, there is only one copy of the FN gene in the human and rat genomes. This gene is about 75 kb long, contains around 50 exons, and is transcribed from a single promoter into a single primary transcript which gives rise to the different FN mRNAs through a complicated pattern of alternative splicings occurring in three separate regions. In humans, the gene has been assigned to chromosome 2 (Ref. 4 and A. R. Kornblihtt et al., unpublished).
Isolation and sequencing of several cDNA and genomic clones (Refs 5-10) has allowed the sequence of the longest possible FN mRNA molecule to be deduced. It is composed of 8418 nucleotides, 7431 of which code for the 2477 amino acids of the primary translation product. A 31 amino acid segment comprising the signal and pro-peptides precedes the amino terminus of the mature polypeptide. The same pre- and pro-coding sequences were found in all mRNAs encoding the different subunits of plasma and cellular FNs (Ref. 5). This rules out control at the membrane translocation step as a mechanism to explain the differential localization and behavior of the two forms.
Most of the type I and the only type II units are encoded by a single exon each while 14 of the 17 type III units are coded for by two exons each. The other three type III repeats are encoded by a longer (fused?) single exon each. This strict correlation between the intron/