The recent origins of introns

Jeffrey D. Palmer and John M. Logsdon, Jr.
Indiana University, Bloomington, Indiana, USA

Accumulating evidence that introns are highly restricted in their phylogenetic distribution strongly supports the view that introns were inserted late in eukaryotic evolution into preformed genes and, hence, that exon-shuffling played no role in the assembly of primordial genes. Potential mechanisms of intron insertion and the possible evolution of nuclear introns and their splicing machinery from self-splicing group II introns are also discussed.

Current Opinion in Genetics and Development 1991, 1:470-477
Introduction

Ever since their unexpected discovery in 1977, introns have provoked intense debate about the timing, mechanism and significance of their origins. This debate has focused primarily on the spliceosome-dependent, pre-mRNA introns found exclusively in nuclear genes and animal viruses (herein termed 'nuclear' introns), although the arguments have been extended to group I and group II self-splicing introns. (The protein-spliced tRNA and rRNA introns of archaebacteria and nuclei have generally not figured in these arguments and will not be discussed here.)

The dominant, 'introns-early' view is that the genome of the progenote was 'replete with introns' [1]. According to this view, introns were subsequently extinguished from bacterial genomes due to growth pressures favoring a streamlined genome, but suffered less dramatic, differential loss from eukaryotic genomes. Furthermore, these primordial introns are hypothesized to have played a crucial role in the early assembly of proteins by accelerating the rate of exon-shuffling [1,2,3••,4••]. In contrast, the 'introns-late' view holds that the progenote contained few, if any, introns and hence that exon-shuffling played no role in the assembly of early genes [5••,6•,7••]. Instead, introns are posited to have been inserted late in evolution into pre-assembled genes of certain nuclear, organellar, and viral lineages.

In this article we review recent findings that, in our opinion, provide strong support for the introns-late view. Our perspective is primarily phylogenetic: the current revolution of molecular phylogenetics paints a picture of organismal relationships which, when overlaid with data on intron distribution, makes increasingly untenable the view that the progenote contained many introns.
Abbreviations
AAT: aspartate aminotransferase; AFP: animals, fungi, plants; GAPDH: glyceraldehyde-3-phosphate dehydrogenase; MDH: malate dehydrogenase; SL RNAs: spliced leader RNAs; snRNAs: small nuclear RNAs.

© Current Science Ltd ISSN 0959-437X

[Figure 1: cladogram, image not reproduced. Tip labels recoverable from the original: Eukaryotes (Animals, Fungi, Plants, Ciliates, Plasmodium, Dictyostelium, Entamoeba, Naegleria, Trypanosomes, Trichomonas, Vairimorpha, Giardia); Archaebacteria; Eubacteria (Thermotoga, Green non-sulfurs, Deinococci, 5 Phyla, δ purples, β and γ purples, Agrobacterium, Gram positives, Pseudanabaena, Gloeobacter, Anacystis, Oscillatoria, Spirulina, Microcoleus, chloroplast). Mitochondrial and chloroplast branches are each marked 'Nuclear encoded' versus 'Mitochondrial encoded' or 'Chloroplast encoded'.]

Fig. 1. Sporadic distribution of nuclear introns on the tree of life. The cladogram is based primarily on rRNA gene sequences of eukaryotes [8•] and eubacteria [61,62], but is rooted using sequences of duplicated protein genes ([26••] and references therein). The width of the shaded boxes in the right column indicates the approximate number of nuclear-type pre-mRNA introns per kb of coding sequence (JM Logsdon Jr and JD Palmer, unpublished data), with animals containing on average 6 introns per kb of coding sequence and Dictyostelium 1 intron per kb. The animal estimate is for vertebrates only, and the fungal estimate excludes yeast. Physarum contains introns, but has been excluded from the tree because its phylogenetic position is unclear. The absence of a shaded box means that no spliceosome-dependent introns have been found in that organism. In some cases the data for protists are based on relatively few genes (Giardia, 8 genes; Trichomonas, Naegleria, and Entamoeba, 1-2 genes); however, some of these lineages have been moderately well sampled (trypanosomes, approximately 40 genes; Dictyostelium and Plasmodium, approximately 30 genes each; ciliates, approximately 15 genes).

Restricted phylogenetic distribution of introns in nuclear genes

Molecular sequence data have established that protistan lineages represent the great diversity and depth of eukaryotic evolution, whereas the well-studied, complex multicellular eukaryotes (animals, fungi and plants, herein termed AFP) appear to comprise but a single late-arising eukaryotic lineage (Fig. 1; [8•]). A literature survey (Fig. 1; JM Logsdon Jr and JD Palmer, unpublished data) highlights a complete absence of nuclear introns from all examined genes of the earliest protistan lineages (see the legend of Fig. 1 for the extent of gene sampling in protists). This finding strongly suggests that nuclear introns are relatively recently derived features of eukaryotes. In addition, this survey suggests that introns have increased markedly in number in the AFP lineages relative to those (late) protistan lineages that do contain introns. To explain the intron distribution of Fig. 1, the introns-early view would have to postulate the parallel loss of tens of thousands of introns from many different protistan lineages, including complete intron extinction from the several earliest lineages.

Analysis of individual genes that have been sequenced in a diversity of taxa buttresses this notion of the relative novelty and late explosion of nuclear introns. For example, Dibb and Newman [9] showed that only a few of the dozens of introns in genes for actin and α- and β-tubulin are shared widely by AFP and even late-arising protists. They concluded that most intron distributions could be explained by a single recent intron gain without subsequent loss. Additional compelling examples involve genes lacking introns in some, if not all, protists but containing numerous introns in AFP. These include genes for glyceraldehyde-3-phosphate dehydrogenase (GAPDH) (MW Smith and R Doolittle, personal communication) [10••,11•], RNA polymerase II large subunit [12•,13,14], and ribosomal protein S14 [15•].

The two most celebrated cases of supposedly ancient introns are the two globin introns [1] and four triosephosphate isomerase introns [2] shared by plants and animals. Viewed from the 'top-down' phylogenetic perspective of many molecular biologists, these shared introns are indeed ancient. However, viewed from the more appropriate 'bottom-up' view of the molecular evolutionist (Fig. 1), it is entirely possible that these introns are relatively recent acquisitions within the eukaryotic kingdom (the only sequenced homolog of these genes from protists lacks introns). More cogently, these examples of shared nuclear introns between even plants and animals are the exception, rather than the rule. Only a small percentage of the many introns found in AFP genes for RNA polymerase II [12•], GAPDH [10••], actin [9], and tubulin [9] are shared by members of at least two of the three AFP lineages. Most dramatically, four recent studies provide examples of genes that have numerous introns in both plants and animals, with none of these intron positions being shared between the two groups [16•-19•]. Thus, it appears that most of the hundreds of thousands of introns present in animal and plant nuclear genes were acquired after the relatively late divergence of these two lineages from a common ancestor.

Following from this conclusion, there are now many examples where phylogenetic reasoning leads to the inference [5••] that introns have been inserted into genes and gene families within specific lineages of animals, the best studied group with respect to intron content and distribution. Some illuminating examples involve genes for serine proteases [20], laminins [21••], myosin heavy chains [22], vitellogenins [23], and calcium-binding proteins [24••].
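The loss-counting side of this argument can be made concrete with a small sketch. It assumes a deliberately simplified, ladder-like toy tree (the name `TOY_TREE` and its branching order are illustrative assumptions, not a transcription of Fig. 1) and a hypothetical helper, `losses_if_ancestral`, that counts how many independent losses are needed if an intron was already present at the root. Under the introns-late alternative, the same distribution needs only a single gain on the branch leading to the carriers.

```python
# Toy illustration only: the topology below is a simplified, hypothetical ladder,
# not the actual cladogram of Fig. 1.
TOY_TREE = ("Giardia", ("Trichomonas", ("Trypanosomes", ("Dictyostelium",
            ("Ciliates", ("Animals", ("Fungi", "Plants")))))))


def leaves(tree):
    """Return the set of tip names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def losses_if_ancestral(tree, carriers):
    """Minimum number of independent losses if the intron was present at the root.

    A clade in which no tip carries the intron is explained by one loss on its stem.
    """
    if not leaves(tree) & carriers:
        return 1  # the whole clade lacks the intron: one loss suffices
    if isinstance(tree, str):
        return 0  # a tip that still carries the intron
    return sum(losses_if_ancestral(child, carriers) for child in tree)


# An intron position found only in animals.
print(losses_if_ancestral(TOY_TREE, {"Animals"}))
```

In this toy case a single animal-specific intron would require six separate losses if it were ancestral, against one insertion if it arose within animals. The count is per intron position, so it scales with the number of lineage-specific introns.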
Gene organization and evolution

Introns in nuclear genes for organelle proteins: a key to the puzzle

A critical test [10••,25] of the introns-early/introns-late debate is the distribution of introns in a subset of nuclear genes that encode organellar proteins, that is, those nuclear genes that are of indisputable eubacterial origin (their location in the nucleus reflecting post-endosymbiotic migration of organellar genes to the nucleus) and which also have eukaryotic counterparts present in the nucleus. These 'eubacterial' nuclear genes possess, in the lineage-dependent fashion described above, just as many spliceosome-dependent introns as nuclear genes of eukaryotic/archaebacterial origin (JM Logsdon Jr and JD Palmer, unpublished data). For three pairs of housekeeping genes, encoding GAPDH, malate dehydrogenase (MDH), and aspartate aminotransferase (AAT), the claim has been made [10••,25,26••,27] that shared intron positions between nuclear genes of eubacterial and eukaryotic origin reflect the common descent of primordial introns present in the progenote. However, scrutiny of this claim reveals little or no foundation for these putative positional homologies.

Two of the eight introns present in each of the genes for mouse mitochondrial and cytoplasmic MDH are claimed to be present in the same locations [26••]. However, the first of these 'shared' introns is located in the middle of a 21-22 amino acid stretch of complete dissimilarity between the two genes that includes a single amino acid gap. The second 'shared' intron is simply not present at 'identical' positions, being located before a conserved lysine for the mitochondrial gene and after it for the cytoplasmic one. Only two of the four putatively shared introns in genes for cytoplasmic and chloroplast GAPDH are actually present in identical positions, intron sliding of between three and eight nucleotides being invoked to explain small differences in intron position [10••,25]. We reject the invocation of intron sliding as wholly arbitrary for a gene of only 552 codons that possesses a total of over 30 known intron positions. Five intron positions are indeed identical between genes for vertebrate cytoplasmic and mitochondrial AAT [26••,27]. However, the high sequence similarity between the two AAT genes relative to a eubacterial gene [26••] suggests that the mitochondrial gene is not of eubacterial origin, as claimed [27], but rather represents a duplication event that occurred within eukaryotic evolution. Overall then, only two of the 11 cases of putatively shared intron positions between eubacterial and eukaryotic nuclear genes stand up to scrutiny.

How correct is the claim [10••,25] that the two unambiguously shared intron positions between genes for chloroplast and cytoplasmic GAPDH actually represent positional homology, that is, introns present in the progenote? Because chloroplasts emerged so late in eubacterial evolution (approximately a billion years ago; Fig. 1), this hypothesis would force one to postulate retention of these introns for two-thirds of eubacterial evolution, as well as their parallel and independent loss in many separate eubacterial lineages (no introns have been found in the approximately 50 eubacterial GAPDH genes sequenced thus far (H Ochman, personal communication) [28•,29•]). The mitochondrion also arose relatively late in eubacterial evolution, and hence for both organelles the introns-early view would postulate that many of the estimated thousands of introns in nuclear genes of eubacterial origin survived for most of eubacterial evolution, only to be extinguished recently and completely in many separate and well established lineages. We propose to the contrary that eubacteria, including those that gave rise to the organelles, had no nuclear-type introns, and that the two cases of shared intron positions in GAPDH genes represent the parallel insertion of different introns. Consistent with this idea, each of the two shared cytoplasmic GAPDH introns is present in only a single lineage of animals [10••]. In other words, these putatively primordial introns are actually highly restricted in distribution on both the eubacterial and eukaryotic sides of the tree of life (Fig. 1).

The introns-late arguments made in this and the preceding section are based largely on the topology of the 'tree-of-life' cladogram shown in Fig. 1. As a hypothesis of phylogenetic relationships based on current molecular data, this cladogram will undoubtedly undergo some revision as further sequences are described. However, the terminal and widely separated positions of the three nuclear intron-containing lineages (late-arising eukaryotes, nuclear genes of mitochondrial origin, and nuclear genes of chloroplast origin; Fig. 1) that are central to the focus of this paper seem firmly established and unlikely to change significantly.
The chances are essentially nil that the tree will ever be revolutionized to the extent that any of these three lineages is moved to its base, that is, to a position more consistent with the introns-early view than the introns-late view advocated herein.
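The tally of putatively shared intron positions in this section, and the reason chance coincidence is plausible in a densely interrupted gene, can be sketched as follows. The per-gene counts come from the text (MDH 2, GAPDH 4, AAT 5 claimed; only the two identical GAPDH positions upheld). The chance calculation is an added illustration, not the authors' analysis: it assumes 30 intron positions scattered uniformly and independently along the 552-codon gene and asks how often a fixed site would have a neighbor within 8 nucleotides. The names `CLAIMS` and `chance_of_near_match` are hypothetical.

```python
# Claimed versus upheld shared intron positions, as tallied in the text.
CLAIMS = {
    "MDH": {"claimed": 2, "upheld": 0},
    "GAPDH": {"claimed": 4, "upheld": 2},
    "AAT": {"claimed": 5, "upheld": 0},
}

total_claimed = sum(gene["claimed"] for gene in CLAIMS.values())
total_upheld = sum(gene["upheld"] for gene in CLAIMS.values())


def chance_of_near_match(codons, n_positions, max_slide):
    """Probability that at least one of n uniformly placed intron positions falls
    within max_slide nucleotides of a fixed site (assumed model; edge effects ignored)."""
    length_nt = 3 * codons
    window = 2 * max_slide + 1  # the site itself plus max_slide nucleotides either side
    return 1.0 - (1.0 - window / length_nt) ** n_positions


print(total_claimed, total_upheld)
print(round(chance_of_near_match(codons=552, n_positions=30, max_slide=8), 2))
```

Under these assumptions a near-coincidence within eight nucleotides is far from rare, which is the sense in which a sliding allowance of that size weakens a claim of positional homology.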
Group I and II introns: mostly late arrivals

Group II introns are known only in mitochondrial and chloroplast genomes, and within these, their phylogenetic distribution is restricted. For mitochondria, these introns are known only in fungi and plants, two late-arising lineages, and even within these groups their distribution is highly idiosyncratic [30•,31•]. The same pattern also holds for chloroplast group II introns, which are known only in green algae and their descendants, and which are also highly lineage-specific [32•,33•]. The recent discovery [34•] of a group II twintron (one group II intron inserted within another) lends further support to the idea that introns are readily propagated genetic elements.

Although group I introns are present in a greater variety of organisms and genomes than group II introns and nuclear introns, they too appear to be of recent vintage, with the singular exception noted below. Like group II introns, the group I introns of organelles have a restricted and idiosyncratic distribution [30•,31•,33•,35]. Group I introns have been found in nuclear rRNA genes of Tetrahymena and Physarum, with the introns of Tetrahymena having been shown to result from several independent insertions [36]. A few bacteriophage group I introns have been found; their restricted distribution, close similarity to mitochondrial introns, and evident genetic mobility all suggest that they are evolutionarily mobile elements of recent insertion [35,37•].

A group I intron is the singular exception to the rule promulgated in this and the preceding sections that all introns are relatively late arrivals within genes. A group I intron of evident positional and sequence homology is present in tRNA-Leu genes of most chloroplast and all cyanobacterial lineages examined [38•,39•] and is sporadically present in other eubacterial phyla (MG Kuhsel and JD Palmer, unpublished data). This most ancient intron is thus at least 2-3.5 billion years old (the age of cyanobacteria) and possibly traces back to the progenote and even the 'RNA World'. The presence of this exceptional intron in many eubacteria shows that introns can and do survive in the supposedly intensely streamlined genomes of bacteria. If bacteria once had many introns but lost most due to streamlining, one would expect a different survivorship pattern than that observed, that is, a moderate number of different bacterial introns, each restricted to one or a few limited lineages, rather than this single widely distributed intron.
Exon shuffling: late but not early

A central tenet of the 'introns-early' school is that introns played a critical role in the assembly of early proteins from small modules by vastly accelerating the rate of recombinational fusion and exchange of protein domains [2,3••,4••,40]. Applying this tenet as a fundamental assumption, Gilbert's group [3••] recently estimated the size of the hypothetical exon universe needed to account for the extant diversity of proteins. Rebuttals by Patthy [7••] and Doolittle [41••] highlight a number of flaws in this exercise in numerology. These include the assumption that early protein genes were assembled by intron-mediated recombination, the assumption that present-day intron positions mark the boundaries of primordial exons, and the criteria chosen to ascertain exon homology. Patthy [7••] and Doolittle [41••] argue that the end result is that Dorit et al. [3••] fail to identify most of the unquestionable cases of (modern) exon-shuffling, while at the same time identifying several highly questionable cases of putative exon homology.

Each of the several dozen clear-cut cases of exon-shuffling identified to date involves nuclear genes in animals, almost exclusively vertebrates [4••,42••]. Attempts to correlate exons with domains of protein structure and function have become increasingly tenuous [40,42••], both because such correlation is not supported by newly described genes (for example, see [17•,21••]) and because previously correlated exons can become split by introns in newly sequenced forms of a gene [10••]. In summary, although exon-shuffling has been of some importance in vertebrate gene evolution, there is little or no basis for ascribing a role to it in the assembly of primordial proteins.
Mechanisms of intron insertion

How do introns become inserted into preformed genes? Recent demonstrations of reverse self-splicing in vitro [43•,44•,45] suggest a ready mechanism for the insertion of group I and group II introns. Importantly, this insertional reverse splicing seems to have only a limited target site sequence requirement, ensuring that a large number of potential insertion sites exist within genes. An intron that is added to an RNA by reversal of self-splicing could subsequently be incorporated into the genome by reverse transcription followed by recombination, analogous to mechanisms proposed for intron removal from genes and for processed pseudogene formation. This insertion mechanism would, by definition, create a 'perfect intron', one that would not be deleterious because it would be precisely excised from the gene. Although there is no direct in vivo evidence of this process for introns, it is exciting that RNA editing in kinetoplasts is likely to proceed via a transesterification mechanism analogous (homologous?) to reversal of self-splicing [46•,47•]. This similarity of RNA editing to splicing has led Cech [46•] to speculate that the mono- to oligonucleotide editing insertions are the 'world's smallest introns'.

Several mechanisms have been proposed for the insertion of nuclear introns, but which, if any, of these is actually responsible for most intron gains is presently unclear. First, the probable relationship between nuclear and group II introns (see next section) raises the possibility that nuclear genes are continually invaded by group II introns by the reverse-self-splicing mechanism described above. Rapid intron degeneration to a spliceosome-dependent form could then occur, perhaps stemming from a single nucleotide substitution that would establish a nuclear-intron consensus splice-site [6•,48]. Second, a number of transposable elements have been shown to function as essentially novel 'introns' by utilizing cryptic splice sites located either within themselves or within the genes into which they have been inserted [49,50•,51•]. The analogy with standard introns is weak because none of these transposon 'introns' represent clean insertions (codons are either added to or deleted from the inserted gene). However, many of these insertions are also extremely recent (within the last 20 years), and such a transpositional mechanism could be potent when applied on an evolutionary time-scale. A third possible mechanism, albeit one which has not received any experimental support, involves the tandem duplication of exons and subsequent activation of internal cryptic splice sites [5••,48]. Finally, a mechanism presumably restricted to spliceosomal small nuclear RNA (snRNA) genes is represented by the recent insertion of introns into fungal snRNA U6 genes via putatively aberrant splicing reactions [52•].

Regardless of the actual insertion mechanism, analyses by Dibb and Newman [9] and Lee et al. [24••] suggest that nuclear introns are inserted at a restricted set of sequences conforming to a 'proto-splice' site, a sequence that is virtually identical to the consensus sequence flanking nuclear introns.
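The idea that insertion is confined to proto-splice sites can be illustrated by scanning a coding sequence for candidate positions. The text does not spell out the motif, so this sketch assumes the commonly cited form (C/A)AG|R, where the intron would sit between the G and the purine. The motif, the function name `proto_splice_sites`, and the example sequence are all illustrative assumptions.

```python
import re

# Assumed proto-splice motif: (C/A)AG followed by a purine (G or A).
# The lookahead lets overlapping candidate sites be reported.
PROTO_SPLICE = re.compile(r"(?=[CA]AG[GA])")


def proto_splice_sites(seq):
    """Return 0-based indices of the base immediately after each candidate
    insertion point, i.e. the intron would sit between index-1 and index."""
    seq = seq.upper()
    return [match.start() + 3 for match in PROTO_SPLICE.finditer(seq)]


# Hypothetical sequence: candidate sites follow 'CAG' (before G) and 'AAG' (before A).
print(proto_splice_sites("ATGCAGGTTAAGACC"))  # [6, 12]
```

A four-base motif of this kind recurs often in any gene, which fits the picture of many potential insertion sites, of which only a restricted subset is actually occupied in a given lineage.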
Evolutionary relationships of introns and splicing machinery

Whereas group I introns appear to be unrelated to other types of introns, an evolutionary connection between group II introns and nuclear introns is suggested by the striking similarities in their splicing mechanisms and molecular structures. The products and intermediates of splicing, including the lariat-like excised intron, are essentially identical for group II and nuclear introns and are presumably formed by similar mechanisms [53•,54•,55•]. Even more striking are the similar secondary structures formed in cis by group II introns and in trans by the snRNA-nuclear intron complex (spliceosome) [54•]. Also, the spliceosomal U6 snRNA has recently been shown to play a catalytic role in nuclear splicing [53•]. These similarities have led to the proposition [54•,55•] that the spliceosome is a degenerate form of a self-splicing group II intron, and, accordingly, that nuclear introns originated from group II introns.

Additional support for this hypothesis comes from two independent examples of apparently degenerate group II introns in chloroplasts. The trans-spliced psaA gene of Chlamydomonas chloroplasts contains a degenerate group II intron composed of at least three separately encoded elements: two cis-acting RNA domains still attached to exons and a middle trans-acting RNA element [56•]. This intron, whose splicing is dependent on both cis- and trans-acting intronic RNA sequences, provides an excellent model of the type of intermediate intron that is envisaged to have existed within the nucleus during the putative transition from group II introns to nuclear pre-mRNA introns. In the second example, the group III introns uniquely found in euglenoid chloroplast genomes contain some features of group II introns, but are probably too small (100 bp) and simple (90% A-T) to be self-splicing [57]. It seems likely that they are spliced by a largely trans-acting process, analogous to that mediated by the spliceosome.

Consideration of structural and phylogenetic aspects of trans-splicing of spliced leader RNAs (SL RNAs) in trypanosomes further elaborates the hypothesis that nuclear introns originated from group II introns. These organisms appear to lack any cis-spliced introns [11•,12•,14,15•]; indeed, such introns seem to have arisen after the divergence of trypanosomes from the main eukaryotic lineage (Fig. 1; see discussion above). It has been proposed [58] that trypanosomal trans-splicing gave rise to cis-splicing. Indeed, the spliceosomes that function in trans-splicing and in cis-splicing are clearly homologous, containing U2, U4 and U6 snRNA [59•]. A major difference is that trypanosomes lack U1 snRNA, which interacts with the 5' splice site of nuclear introns, and U5 snRNA, whose function is less clear (K Watkins and N Agabian, personal communication) [59•]. Structural and functional similarities among group II introns, SL RNAs, and U1 snRNAs have led to the proposal [59•] that SL RNAs may represent 'evolutionary intermediates' between self-splicing group II introns and cis-splicing spliceosomes. It may perhaps be more useful to think of trypanosomal trans-spliceosomes not as direct evolutionary intermediates but as derivatives of a group II-derived, ancestral trans-spliceosome that played an intermediate role.

If the spliceosome and the 'original' nuclear intron are derived forms of group II introns, and if group II introns also continue to provide a source of new nuclear introns, then one must ask where these group II introns come from. Since many chloroplast and mitochondrial genomes do contain group II introns, and since organelle sequences are frequently transferred into nuclear genomes [60], it seems reasonable to imagine that the organelles have been seeding the nucleus with introns [6•]. The failure to find functional group II introns, regardless of their origin, in nuclear genomes can be rationalized by hypothesizing their relatively rapid degeneration into standard nuclear introns ([6•,48] and preceding section).
Conclusions

The phylogenetic distribution of introns suggests that all nuclear introns and virtually all group I and II introns were inserted relatively recently into preformed genes. Accordingly, the role of exon-shuffling in the assembly of primordial genes can be disregarded. It remains to be seen whether modern cases of exon-shuffling are limited to animals (principally vertebrates), or will be discovered in other intron-containing eukaryotic lineages. Detailed study of the number, size, and distribution of introns in those eukaryotes that contain them should considerably extend our understanding of the patterns and pressures of intron gain, loss, movement, and size change. Further research is also essential to elucidate the mechanisms of intron gain and loss in nuclear genes. Finally, continued study of splicing mechanisms should critically test the hypothesis of spliceosome and nuclear intron evolution from group II introns.
Acknowledgements

We thank Sandie Baldauf for re-analyzing the MDH sequences; Sandie Baldauf, Tom Blumenthal, and Ken Wolfe for critical reading of the manuscript; and H Ochman, MW Smith, K Watkins, N Agabian, and R Doolittle for unpublished results. Preparation of this review and work on introns in this laboratory are supported in part by a grant from the National Institutes of Health to JD Palmer (GM-35087) and by an NIH training grant position to JM Logsdon Jr (GM-7757).
References and r e c o m m e n d e d reading Papers of special interest, published within the annual period of review, have been highlighted as: • of interest •• of outstanding interest DARNELLJE, Doot.rI'taE WF: Speculations on t h e Early Course of Evolution. Proc Natl Acad Sci USA 1986, 83:1271-1275. GILBERT W, MARCHIONNI M, McKNIGHT G: On the Antiquity of lntrons. Cell 1986, 46:151-154. DORIT RI~ SCHOENBACHL, GILBERTW: HOW Big is the Universe of Exons? Science 1990, 250:1377-1382. "~ae example p a r excellence of introns-early and exon-shuffling thinking carried to its extreme. Using ms its assumption the hypothesis that genes were assembled from exon subunits, this paper performs statistical comparisons of the available exon database to estimate that only 100(~7000 exons were needed to construct all proteins.
genes from Trypanosoma [11] and Giardia ( MW Smith and R Doolitfie, personal communication) lack introns. II. •
MICHELS PAM, MARCHAND M, KOHL L, ALLERT S, WIERENGA RK, OPPERDOEW FR: The Cytosoltc and Glycosomal lsoenzymes of Glyceraldehyde-3-Pbosphate Dehydrogenase in T r y p a n o s o m a brucet Have a Distant Evolutionary Relationship. Fur J Biochem 1991, 198:421-428. The two GAPDH genes in Trypanosoma lack introns and are so dissim. ilar to each other as to suggest that they were acquired independently by a trypanosomal ancestor. NAWRATHC, SCHELLJ, KONCZ C: H o m o l o g o u s Domains of the Largest Subunit of Eucaryotic RNA Polymerase II are Conserved in Plants. Mol Gen Genet 1990, 223:65-75. This paper and the references therein show that the gene encoding the largest subunit of RNA polymerase II contains large and variable numbers of introns in AFP (e.g. 27 introns in mouse). This gene lacks introns in the protists Plasmodium [13] and Trypanosoma [14]. 12. •
13.
It WB, BZIK DJ, Gu H, TANAKA M, FOX BA, INSELBURGJ: An Enlarged Largest Subunit of P l a s m o d i u m f a l i c i p a r u m RNA Polymerase II Defines Conserved and Variable RNA Polymerase Domains. Nucleic Acids Res 1989, 17:9621-9636.
14.
EVERS R, HAbIMER A, KOCK J, JESS W, BORST P, MEMET S, CORNELISSEN AWCA: T r y p a n o s o m a brucei Contains Two RNA Polymerase II Largest Subunit G e n e s with an Altered C-terminal Domain. Ce// 1989, 56:585-597.
3.
4. •°
STONEEM, SCHWAKrZRJ, (EDS): Intervening Sequences in Evc, lution a n d Development. Oxford: Oxford University Press, 1990. This book present a clear picture of the introns-early view, including forceful chapters by leading architects of this view, such as WF Doolitfle and CCF Blake.
15. •
PERELMAND, BOOTHROYDJC: Lack of Introt,.s in the Ribosomal Protein Gene S14 of Trypanosomes, Mol Cell Biol 1990, 10:3284-3288. The ribosomal protein gene S14 contains several variably present introns in AFP, including yeast, but lack introns in Trypanosoma bruce~ The authors also point out that no c/s-spliced introns have been found in any trypanosomes.
5. ROGERSJH: The Role of lntrons in Evolution. FEBS Lelt 1990, o• 268:339-343. A review from a leading proponent of the introns-late school on the role of exon-shuffling during evolution and on the origin of nuclear introns.
16. •
6. CAVALIER-SMrrHT: Intron Phylogeny: a New Hypothesis. • Trends Genet 1991, 7:145-148. A proponent of the introns-late school presents a 'grand-unification' scheme for the evolution of "all known intron types.
17. •
7. PATI'HYix Exons-Original Building Blocks of Proteins? Bioes. •. says 1991, 13:187-192. A detailed rebuttal (see also [41 *•] ) by another introns-late advocate of the assumptions, methods, and results of Dorit et al. [3**], who attempted to estimate the size of the exon assembly universe. 8. •
SOGtN Mix The Phyiogenetic Significance of Sequence Diversity and Length Variations in Eukaryotic Small Subunit Ribosomal RNA Coding Regions. in N e w Perspectives in Et,c> lution edited by Warren L, Koprowski H [book]. New York: Wiley Liss, 1991, pp 175-188. A recent review of eukaryotic phylogeny as based upon rRNA sequence data. 9. 10. ••
DraB NH, NEWMANAJ: Evidence that lntrons Arose at ProtoSplice Sites. EMBO J 1989, 8:2015-2021.
LIAUDME, ZHANG DX, CERFF R: Differential lntron Loss and Endosymbiotic Transfer of Chloroplast Glyceraldehyde-3Phosphate Dehydrogenase G e n e s to t h e Nucleus. Proc Nail Acad Sci USA 1990, 87:8918-8922. A classic example of introns-early thinking applied to one of the best test cases of the origins-of-introns debate. Comparison of intron positions in various GAPDH genes leads to the disputed conclusions (see text) that four introns are shared between nuclear genes encoding cytoplasmic and chloroplast GAPDH genes and that the primordial GAPDH gene was assembled by exon shuffling. Few intron positions are shared even by animal and plant cytoplasmic GAPDH genes; also, GAPDH
VAN DER STRAETEN D, RODRIGUES-POUSADA RA, GOODMAN HM, VAN MONTAGU M: Plant Enolase: Gene Structure, Expression, and Evolution. Plant Cell 1991, 3:719-735. No positions are shared between the 12 or more introns present in plant genes for the glycolytic enzyme enolase and the 10 introns in the animal gene.
CAMIRAND A, ST-PIERRE B, MARINEAU C, BRISSON N: Occurrence of a Copia-Like Transposable Element in One of the Introns of the Potato Starch Phosphorylase Gene. Mol Gen Genet 1990, 224:33-39. None of the 14 introns in the starch (glycogen) phosphorylase gene from plants is found at the same position as any of the 19 introns in the animal gene. There is also no obvious correlation between exons and protein domains.
18. • FUJIWARA S, FUKUZAWA H, TACHIKI A, MIYACHI S: Structure and Differential Expression of Two Genes Encoding Carbonic Anhydrase in Chlamydomonas reinhardtii. Proc Natl Acad Sci USA 1990, 87:9779-9783. The ten introns in the plant carbonic anhydrase gene share no positional homology with the six introns found in the animal gene.
19. • MOTOJIMA K, GOTO S: Organization of Rat Uricase Chromosomal Gene Differs Greatly From that of the Corresponding Plant Gene. FEBS Lett 1990, 264:156-158. There are no shared intron positions between the uricase genes of animals and plants, each of which contains seven introns.
20. ROGERS J: Exon Shuffling and Intron Insertion in Serine Protease Genes. Nature 1985, 315:458-459.
21. •• KALLUNKI T, IKONEN J, CHOW LT, KALLUNKI P, TRYGGVASON K: Structure of the Human Laminin B2 Chain Gene Reveals Extensive Divergence From the Laminin B1 Chain Gene. J Biol Chem 1991, 266:221-228. One of the best examples of intron differences most readily explained by massive, relatively recent intron gains. The genes for laminin B1 and B2 chains result from a gene duplication and have highly similar patterns of domains and internal repeats. Yet, only 3 of the 34 and 28 introns present in the two genes, respectively, are found in the same locations, and there is a lack of correlation between exons and protein domains and internal repeats.
22. DIBB NJ, MARUYAMA IN, KRAUSE M, KARN J: Sequence Analysis of the Complete Caenorhabditis elegans Myosin Heavy Chain Gene Family. J Mol Biol 1989, 205:603-613.
23. SPIETH J, NETTLETON M, ZUCKER-APRISON E, LEA K, BLUMENTHAL T: Vitellogenin Motifs Conserved in Nematodes and Vertebrates. J Mol Evol 1991, 32:429-438.
24. •• LEE VD, STAPLETON M, HUANG B: Genomic Structure of Chlamydomonas Caltractin. Evidence for Intron Insertion Suggests a Probable Genealogy for the EF-Hand Superfamily of Proteins. J Mol Biol 1991, 221:175-191. A detailed and clear example of the use of phylogenetic reasoning to infer the late insertion of introns at the proto-splice sites within many different members of the EF-hand superfamily of calcium-modulated proteins.
25. SHIH MC, HEINRICH P, GOODMAN HM: Intron Existence Predated the Divergence of Eukaryotes and Prokaryotes. Science 1988, 242:1164-1166.
26. •• IWABE N, KUMA K, KISHINO H, HASEGAWA M, MIYATA T: Compartmentalized Isozyme Genes and the Origin of Introns. J Mol Evol 1990, 31:205-210. Together with the papers on GAPDH genes [10••,25], this paper and the references therein constitute the strongest evidence that specific introns were present in the progenote. Comparison of nuclear homologs for cytoplasmic and mitochondrial isozymes leads to the disputed claim (see text) that two introns in MDH genes predate the divergence of eukaryotes and prokaryotes and the phylogenetically ambiguous statement (again, see text) that five introns in AAT genes could be of equally ancient vintage.
27. JURETIC N, MATTES U, ZIAK M, CHRISTEN P, JAUSSI R: Structure of the Genes of Two Homologous Intracellularly Heterotopic Isoenzymes. Eur J Biochem 1990, 192:119-126.
28. • LAWRENCE JG, HARTL DL, OCHMAN H: Molecular Considerations in the Evolution of Bacterial Genes. J Mol Evol 1991, 33:241-250. An analysis of substitution patterns in two genes, including GAPDH, from 12 species of enteric bacteria. Includes new sequence data for 11 of the 12 GAPDH genes.
29. • NELSON K, WHITTAM TS, SELANDER RK: Nucleotide Polymorphism and Evolution in the Glyceraldehyde-3-Phosphate Dehydrogenase Gene (gapA) in Natural Populations of Salmonella and Escherichia coli. Proc Natl Acad Sci USA 1991, 88:6667-6671. An analysis of the sequence evolution of 29 newly sequenced GAPDH genes from two bacterial taxa. Includes references to sources of other bacterial GAPDH gene sequences.
30. • CUMMINGS DJ, MCNALLY KL, DOMENICO JM, MATSUURA ET: The Complete DNA Sequence of the Mitochondrial Genome of Podospora anserina. Curr Genet 1990, 17:375-402. This paper together with [31•] highlights the highly variable and idiosyncratic distribution of group I and group II introns in fungal mitochondrial genomes.
31. • SKELLY PJ, MALESZKA R: Distribution of Mitochondrial Intron Sequences Among 21 Yeast Species. Curr Genet 1991, 19:89-94. This paper together with [30•] highlights the highly variable and idiosyncratic distribution of group I and group II introns in fungal mitochondrial genomes.
32. • MANHART JR, PALMER JD: The Gain of Two Chloroplast tRNA Introns Marks the Green Algal Ancestors of Land Plants. Nature 1990, 345:268-270. A clear application of phylogenetic analysis leading to the conclusion that two chloroplast group II introns were inserted into green algal genomes close to the evolutionary point where land plants originated.
33. • PALMER JD: Plastid Chromosomes: Structure and Evolution. In Cell Culture and Somatic Cell Genetics of Plants, Vol 7A. The Molecular Biology of Plastids edited by Vasil IK, Bogorad L [book]. San Diego: Academic Press, 1991, pp 5-53. A compilation of the distribution of introns in chloroplast genes is used to argue for a recent, certainly postendosymbiotic, insertional gain for virtually all of these introns.
34. • COPERTINO DW, HALLICK RB: Group II Twintron: an Intron Within an Intron in a Chloroplast Cytochrome b-559 Gene. EMBO J 1991, 10:433-442. The first discovery of an intron within an intron, in this case, both group II introns. Further evidence for the mobility of group II introns.
35. DUJON B: Group I Introns as Mobile Genetic Elements: Facts and Mechanistic Speculations - a Minireview. Gene 1989, 82:91-114.
36. SOGIN ML, INGOLD A, KARLOK M, NIELSEN H, ENGBERG J: Phylogenetic Evidence for the Acquisition of Ribosomal RNA Introns Subsequent to the Divergence of Some of the Major Tetrahymena Groups. EMBO J 1986, 5:3625-3630.
37. • BELL-PEDERSEN D, QUIRK S, CLYMAN J, BELFORT M: Intron Mobility in Phage T4 is Dependent upon a Distinctive Class of Endonucleases and Independent of DNA Sequences Encoding the Intron Core: Mechanistic and Evolutionary Implications. Nucleic Acids Res 1990, 18:3763-3770. Presents results and a model for the mobility of the group I introns found in bacteriophage genes.
38. • KUHSEL MG, STRICKLAND R, PALMER JD: An Ancient Group I Intron Shared by Eubacteria and Chloroplasts. Science 1990, 250:1570-1573. This and the next paper report the first and only discovery of an intron in a eubacterial gene. A group I intron in a tRNA gene is present in all examined cyanobacteria and in most chloroplast lineages; hence, this is also the oldest known intron at > 2 billion years old. Xu et al. [39•] also show that this intron is self-splicing in cyanobacteria.
39. • XU MQ, KATHE SD, GOODRICH-BLAIR H, NIERZWICKI-BAUER SA, SHUB DA: Bacterial Origin of a Chloroplast Intron: Conserved Self-Splicing Group I Introns in Cyanobacteria. Science 1990, 250:1566-1570. This and the previous paper report the first and only discovery of an intron in a eubacterial gene. A group I self-splicing intron in a tRNA gene is present in all examined cyanobacteria and in most chloroplast lineages; hence, this is also the oldest known intron at > 2 billion years old.
40. TRAUT TW: Do Exons Code for Structural or Functional Units in Proteins? Proc Natl Acad Sci USA 1988, 85:2944-2948.
41. •• DOOLITTLE RF, Response by DORIT RL, SCHOENBACH L, GILBERT W: Counting and Discounting the Universe of Exons. Science 1991, 253:677-680. A rebuttal by Doolittle (see also [7••]), followed by a response from Dorit et al., of the latter's attempt [3••] to estimate the size of the 'exon universe'.
42. •• PATTHY L: Modular Exchange Principles in Proteins. Curr Opinion Struct Biol 1991, 1:351-361. A detailed review of examples and principles of the evolutionary exchange of modules between proteins and the role of introns in facilitating this process.
43. • AUGUSTIN S, MULLER MW, SCHWEYEN RJ: Reverse Self-Splicing of Group II Intron RNAs In Vitro. Nature 1990, 343:383-386. This paper together with [44•,45] provides the first experimental evidence that group I and group II introns can insert themselves in vitro into intronless genes via a reversal of the standard self-splicing reaction. Woodson and Cech [45] provide an excellent discussion of the evolutionary implications of this work.
44. • MORL M, SCHMELZER C: Integration of Group II Intron bI1 Into a Foreign RNA by Reversal of the Self-Splicing Reaction In Vitro. Cell 1990, 60:629-636. This paper together with [43•,45] provides the first experimental evidence that group I and group II introns can insert themselves in vitro into intronless genes via a reversal of the standard self-splicing reaction. Woodson and Cech [45] provide an excellent discussion of the evolutionary implications of this work.
45. WOODSON SA, CECH TR: Reverse Self-Splicing of the Tetrahymena Group I Intron: Implication for the Directionality of Splicing and for Intron Transposition. Cell 1989, 57:335-345.
46. • CECH TR: RNA Editing: World's Smallest Introns? Cell 1991, 64:667-669. Develops the hypothesis that RNA editing of kinetoplast RNAs, which involves the addition of one to a few nucleotides, may occur by a transesterification mechanism analogous to the reversal of the self-splicing mechanism of group I and group II introns.
47. • BLUM B, STURM NR, SIMPSON AM, SIMPSON L: Chimeric gRNA-mRNA Molecules with Oligo(U) Tails Covalently Linked at Sites of RNA Editing Suggest that U Addition Occurs by Transesterification. Cell 1991, 65:543-550. Provides experimental evidence in support of Cech's hypothesis [46•] that RNA editing insertions in kinetoplasts are the 'World's smallest introns'.
48. ROGERS J: How Were Introns Inserted Into Nuclear Genes? Trends Genet 1989, 5:213-216.
49. WESSLER SR: The Splicing of Maize Transposable Elements from pre-mRNA - a Mini-review. Gene 1989, 82:127-133.
50. • MENSSEN A, HOHMANN S, MARTIN W, SCHNABLE PS, PETERSON PA, SAEDLER H, GIERL A: The En/Spm Transposable Element of Zea mays Contains Splice Sites at the Termini Generating a Novel Intron from a dSpm Element in the A2 Gene. EMBO J 1990, 9:3051-3057. One of the most clear-cut examples of a transposon insertion that behaves like an intron.
51. • FRIDELL RA, PRET AM, SEARLES LL: A Retrotransposon 412 Insertion within an Exon of the Drosophila melanogaster Vermilion Gene is Spliced from the Precursor RNA. Genes Dev 1990, 4:559-566. A retrovirus-like transposable element behaves as an intron by virtue of being spliced from RNA using splice donor and acceptor sites located near the ends of the transposon.
52. • TANI T, OHSHIMA Y: mRNA-type Introns in U6 Small Nuclear RNA Genes: Implications for the Catalysis in pre-mRNA Splicing. Genes Dev 1991, 5:1022-1031. This survey of 52 organisms shows that each of the six mRNA-type introns present in U6 RNA genes is restricted to a single fungal lineage, consistent with the hypothesis that these introns originated via insertion of an excised intron during pre-mRNA splicing.
53. • FABRIZIO P, ABELSON J: Two Domains of Yeast U6 Small Nuclear RNA Required for Both Steps of Nuclear Precursor Messenger RNA Splicing. Science 1990, 250:404-409. The first evidence that any of the spliceosomal snRNAs are in any sense catalytic.
54. •• JACQUIER A: Self-Splicing Group II and Nuclear pre-mRNA Introns: How Similar are They? Trends Biochem Sci 1990, 15:351-354. An excellent, balanced review of the hypothesis that group II introns gave rise to nuclear pre-mRNA introns.
55. • GUTHRIE C: Messenger RNA Splicing in Yeast: Clues to Why the Spliceosome is a Ribonucleoprotein. Science 1991, 253:157-163. A recent review summarizing evidence consistent with the hypothesis that nuclear introns and spliceosomes are degenerate forms of group II introns.
56. •• GOLDSCHMIDT-CLERMONT M, CHOQUET Y, GIRARD-BASCOU J, MICHEL F, SCHIRMER-RAHIRE M, ROCHAIX J-D: A Small Chloroplast RNA May be Required for Trans-Splicing in Chlamydomonas reinhardtii. Cell 1991, 65:135-143. An important discovery of a unique, tripartite group II intron in chloroplasts that consists of both cis- and trans-acting sequences. This provides an excellent example of the kind of intermediate that may have existed during the hypothetical evolution of nuclear introns from group II introns.
57. CHRISTOPHER DA, HALLICK RB: Euglena gracilis Chloroplast Ribosomal Protein Operon: a New Chloroplast Gene for Ribosomal Protein L5 and Description of a Novel Organelle Intron Category Designated Group III. Nucleic Acids Res 1989, 17:7591-7608.
58. BOOTHROYD JC: Trans-Splicing of RNA. In Nucleic Acids and Molecular Biology vol 3 edited by Eckstein F, Lilley DMJ [book]. Berlin: Springer-Verlag, 1989, pp 216-230.
59. • BRUZIK JP, STEITZ JA: Spliced Leader RNA Sequences Can Substitute For the Essential 5' End of U1 RNA During Splicing in a Mammalian In Vitro System. Cell 1990, 62:889-899. Presents the strongest experimental evidence supporting the hypothesis that SL RNA-mediated splicing might be intermediate between group II self-splicing and spliceosome-dependent cis-splicing.
60. TIMMIS JN, SCOTT NS: Promiscuous DNA: Sequence Homologies Between DNA of Separate Organelles. Trends Biochem Sci 1984, 10:271-273.
61. WOESE CR: Bacterial Evolution. Microbiol Rev 1987, 51:221-271.
62. GIOVANNONI SJ, TURNER S, OLSEN GJ, BARNS S, LANE DJ, PACE NR: Evolutionary Relationships Among Cyanobacteria and Green Chloroplasts. J Bacteriol 1988, 170:3584-3592.
JD Palmer, JM Logsdon Jr, Department of Biology, Indiana University, Bloomington, Indiana 47405, USA.