Isoenzymes, isoproteins and introns


TIBS - December 1984


Reviews

E. James Milner-White

Families of genes coding for closely related proteins are characteristic of eukaryotes. This review discusses the ways in which introns are thought to encourage evolution of such families, and stresses the possible evolutionary consequences of the observation that the introns in most related genes are more variable than the exons.

Isoenzymes [1] are genetically coded multiple forms of enzymes. The term isoproteins, though less widely used, is useful because it encompasses all proteins. Isoproteins are of two types, allelic or non-allelic, depending on whether the multiple forms are coded by a single genetic locus or by more than one. Many protein families show both kinds of variation.

Allelic isoproteins are the variants that can probably be found for any protein if a large enough population of organisms is searched. One example is given by the abnormal haemoglobins; adult forms are of two types, those affecting either the alpha or the beta chains. Often such variants form a small proportion of the proteins expressed at corresponding loci in a population, but one group of proteins for which an exceptional amount of such variation has been found is the mammalian class I MHC (major histocompatibility complex) antigen family. There is so much variation that, for certain loci, the two protein products of the corresponding maternally and paternally derived genes of any organism are rarely found to be immunologically identical.

Non-allelic isoproteins include the lactate dehydrogenase, hexokinase, actin and globin families of proteins. For mammalian haemoglobins, for example, there are families of both α-type and β-type; the members of each family are synthesized successively during development of the organism. Non-allelic isoproteins are very common [2,3] in higher eukaryotes. This review compiles evidence that this is related to the 'split' nature of their genes.

E. J. Milner-White is at the Department of Biochemistry, University of Glasgow, Glasgow, Scotland, G12 8QQ, UK.

Evolutionary advantages of split genes

Most of the genes of higher eukaryotes that code for proteins are split. Their coding regions are called exons and the intervening non-coding sequences are called introns; frequently more of the gene sequence is occupied by introns than by exons. When a gene is transcribed the initial RNA transcript includes both exons and introns; the intron RNA then has to be excised by the process called splicing. Eubacterial genes, on the other hand, are not split. Many explanations for this divergence have been suggested [4]. An important one is that exons often code for discrete folding units and may readily undergo rearrangements, duplications or deletions to give rise to new genes during evolution. Others are that introns provide extra sites for the control of gene expression or that they represent 'selfish DNA', i.e. DNA that persists within the genome but has no phenotypic function. These ideas will not be considered further because they have been well reviewed already.

Another phenomenon relevant to intron function is the alternative processing of RNA transcripts, where a single gene codes for one or more isoproteins because different sets of splice junctions or poly(A) sites are used during RNA processing. This has been observed in immunoglobulins, α-crystallins, calcitonins, fibrinogens and several eukaryotic viral proteins. The myosin light-chain gene has [7] two alternative promoters; this, combined with the use of different splice junctions, gives rise to isoproteins. It is not yet clear how common such events are but they probably do not apply to more than 20% of genes in higher eukaryotes.

It is the purpose of this review to point out that introns seem to have another evolutionary function that has not received enough attention. First, two matters will be discussed: (1) intron homologies, and (2) concerted evolution.

Introns show more variability than exons

When homologous genes are compared it is usually found that the introns show more variability than the exons. Table 1 gives data for the beta-like human globin genes. It is evident that there are small differences between the lengths of corresponding introns. Variation in intron length between related genes is in general very common and it has been suggested [9] that this is frequently due to slipped mispairing, during recombination, of the short direct-repeat sequences commonly found in introns. However, in the β-globin family, sequence differences between introns are more striking than length differences.

Concerted evolution

Members of gene families appear to evolve together by exchanging genetic information. It is observed that [10,11], within gene families, the sequences of non-allelic genes of one species show more homology to each other than do the corresponding genes of different species, even though the members of the gene families may have diverged earlier

Table 1. Sizes of exons and introns of globin genes (bp)

             ε      γ      δ      β
1st exon     92     92     92     92
1st intron   122    122    128    130
2nd exon     223    223    223    223
2nd intron   850    866    889    850
3rd exon     129    129    129    129

The human beta-like globin genes exist as a cluster in the genome. The data show that size variation between genes occurs exclusively in the non-coding regions. There is also very much less sequence homology between corresponding introns than between exons. This is especially pronounced for the second intron, where, for much of its sequence, there is no recognizable homology between different globins (see Ref. 8).

© 1984, Elsevier Science Publishers B.V., Amsterdam 0376-5067/84/$02.00
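The claim in the legend to Table 1, that size variation is confined to the non-coding regions, can be checked with a little arithmetic. The following is a minimal sketch in Python, not part of the original review. The sizes are those read from Table 1, taking the second-intron entry of the fourth gene as 850 bp (an assumption about the printed value). The keys `gene_1` to `gene_4` and the helper `size_spread` are hypothetical names introduced only for this illustration.

```python
# Exon and intron sizes (bp) as read from Table 1.
# Assumption: the fourth gene's second intron is taken as 850 bp.
# Gene keys are placeholder labels, not names taken from the table.
REGIONS = ["1st exon", "1st intron", "2nd exon", "2nd intron", "3rd exon"]
SIZES = {
    "gene_1": [92, 122, 223, 850, 129],
    "gene_2": [92, 122, 223, 866, 129],
    "gene_3": [92, 128, 223, 889, 129],
    "gene_4": [92, 130, 223, 850, 129],
}


def size_spread(sizes):
    """Return {region: largest size minus smallest size across genes}."""
    columns = zip(*sizes.values())  # one tuple of sizes per region
    return {region: max(col) - min(col) for region, col in zip(REGIONS, columns)}


spread = size_spread(SIZES)
for region, delta in spread.items():
    print(f"{region:<11} spread = {delta} bp")
```

Under these assumptions the spread is zero for all three exons and non-zero only for the two introns, which is the pattern the legend describes.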


in evolutionary time than the species. This is called concerted, or coincidental, evolution. Furthermore, within gene families discrete medium-sized lengths of sequence (say 1 kb) are sometimes found that are noticeably more homologous than the other parts. This phenomenon is often considered to result from gene conversion, but unequal crossing-over events between tandemly arranged homologous genes can give rise to the same effect, provided the genes are so organized. One definition of gene conversion [12] is that it is the non-reciprocal transfer of genetic information between partially homologous genes, whereas crossing-over is mediated by recombination in which the transfer is reciprocal. Only in the last five years has the importance of gene conversion been generally appreciated. However, both mechanisms invoked to explain the greater-than-expected homologies within gene families are considered to involve the matching of partially homologous sequences.
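The comparison that defines concerted evolution can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the review. It assumes the sequences have already been aligned to equal length, uses simple percentage identity as a stand-in for 'homology', and the three toy sequences are invented for the example rather than real globin data. The function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of matching positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def looks_concerted(paralog_1, paralog_2, ortholog_of_1):
    """True when two non-allelic genes of one species resemble each other
    more than gene 1 resembles its counterpart in another species."""
    within = percent_identity(paralog_1, paralog_2)
    between = percent_identity(paralog_1, ortholog_of_1)
    return within > between


# Hypothetical toy sequences, invented for illustration only.
species_a_gene_1 = "ATGGTGCACCTGACT"
species_a_gene_2 = "ATGGTGCACCTGACA"
species_b_gene_1 = "ATGGTTCATCTGACC"

print(looks_concerted(species_a_gene_1, species_a_gene_2, species_b_gene_1))
```

A real analysis would need proper alignment and a model of substitution rates. The sketch only shows the direction of the inequality that the text describes.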

The evolutionary importance of mismatch between related genes

In 1978 Tiemeier et al. [13] sequenced a pair of beta-like globin genes as in Table 1 and found that the introns appeared to have a higher rate of evolutionary change than the exons. They then suggested that areas of mismatch so generated between repeated genes reduce recombination between non-allelic but related loci in eukaryotes. This idea has also been proposed by Kourilsky [14] (1983), who pointed out that the existence of relatively large divergent introns (and other non-coding nucleotides) is expected to discourage recombination and conversion between partially homologous genes and that this gives rise to the evolutionary stabilization of a multigene family.

Evolution of a new isoprotein in any organism can be considered to occur in two stages: (1) gene duplication, and (2) accumulation of differences between the two copies. Recombination and conversion often occur during stage 1, but are less desirable at stage 2 because they tend to have the net effect of making both sequences identical by means of concerted evolution. These two stages are illustrated in Figure 1. Intron variability may be a means of introducing sufficient diversity between newly replicated genes that they can be stably inherited. Thus genes whose coding regions are very similar are more likely to be stable than they would otherwise be. The strategy is also likely to hasten protein evolution, which is especially desirable in slowly-reproducing organisms such as the eukaryotes.

Fig. 1. Hypothetical evolution of eukaryotic genes. This diagram shows how the evolution of two genes that differ by only a few amino acids may occur. There are two stages: (1) gene duplication, and (2) accumulation of differences, mainly in the non-coding regions; the lengths of the exons, which are shaded, remain approximately constant. Note that, as mentioned in the legend to Table 1, variation in sequence between corresponding exons frequently occurs without any substantial changes in length.

Eubacterial isoproteins

According to the above argument, gene evolution in organisms without introns is not easy. It has often been noted that, in eubacteria, gene families are infrequent. However, several bacterial isoproteins coded by different genetic loci are known and sequence information is available for pairs of: 3-deoxy-D-arabinoheptulosonate-7-phosphate synthetase [15], aspartate kinase-homoserine dehydrogenase [16], β-ketoadipate enol-lactone hydrolase [17], β-ketoadipate succinyl CoA transferase [17] and EF-Tu [18] (a protein responsible for catalysing peptide chain elongation in E. coli). Despite their functional similarity, the amino acid or nucleotide sequence homology for any pair, except one, is less than 50%. The exception is EF-Tu, where the two 393 amino acid proteins are identical except for the C-terminal amino acid. On the whole, therefore, a fairly drastic divergence between duplicated genes is needed to stabilize them as separate genetic loci. Of course there are plenty of examples of gene duplication in eubacteria but they are usually found to be unstable. Hence protein evolution is expected to be slow. However, for small organisms, their large numbers and high rates of reproduction could compensate.

Introns in MHC antigen genes

The introns of class I MHC antigen genes are not found [14,19] to be more variable than the exons. It appears that, during the evolution of these genes, there has been selection for polymorphism at single loci, although the reasons for this are not fully understood. Sequencing results provide evidence for frequent gene conversion events between the members of this family and it has been argued [14] that polymorphism at individual loci is most easily maintained over the long term when recombination and conversion between a large family of partially homologous genes can readily occur. Hence, there is a selective advantage for the introns not to vary more than the exons because this would discourage such genetic exchanges. At first sight this appears to conflict with the idea presented earlier that recombination and conversion between partly homologous genes have the effect of making them identical. However, selection for polymorphic genes alters the situation.

Exon length

It has been assumed that, if two related genes from different loci include non-homologous introns, recombination and conversion between them is less likely. This is an assumption and it is not known how short the exons need to be (or how long the introns) to be effective. Nevertheless it seems worthwhile to enquire about their sizes. In higher eukaryotes the mean length of an exon is about 130 bp [20]. In lower eukaryotes there is a tendency for the exons to be longer and the introns to be shorter. However, the three members of the ovalbumin family of genes each have an exon of 1040 bp, one of the longest known in higher eukaryotes. There are two families of mammalian isoproteins, the histones and the α-interferons [21],


whose genes, of 275-525 and 570 bp respectively, lack introns. In yeast, two functional intron-less genes for both cytochrome c (330 bp) [22] and alcohol dehydrogenase (1050 bp) [23] exist. In an insect [24] even the globin genes (450 bp) have no introns. These observations suggest that if, as proposed, variable introns do encourage the evolution of isoproteins, they are not essential.

Flanking regions

When gene duplication occurs, the part that is duplicated is not necessarily confined to the coding regions; it often extends well into the flanking regions and may even include other genes. The idea that some mismatch is desirable for the evolution of gene families would be expected to apply to non-functional parts of the flanking sequences as well as to the introns. The available evidence [25] from such sequences confirms that there is considerable variability here too.

Conclusion

It has been observed [5] that the genomes of eukaryotes seem to be more 'plastic' in an evolutionary sense than those of eubacteria. One of the reasons [26] for this appears to be that diploidy, the normal state of higher eukaryotes, often enables new variants to appear with little or no harmful effect. Another reason is that the possession of introns allows the proteins to evolve more easily. There are several ways by which introns may bring this about. This review has emphasized the somewhat neglected idea that introns allow a much greater mismatch between related non-allelic genes, resulting in decreased recombination and conversion between them. This especially permits the evolution of genes coding for closely related proteins. In the genome of a bacterium, families of genes coding for very closely related, but slightly different, proteins rarely occur because they are not inherited reliably. This failure may have prevented them from evolving to give differentiated multicellular organisms because they would be unable to express tissue-specific multiple forms of proteins. Most higher eukaryotes, on the other hand, do accumulate such families. It may be the fine shades of diversity of their isoproteins that underlie the characteristic morphology, metabolism and regulatory properties of their different tissues, cells and organelles.

References

1 IUB-IUPAC Recommendations (1977) J. Biol. Chem. 252, 5939-5941
2 Rider, C. C. and Taylor, C. B. (1980) Isoenzymes, Chapman and Hall
3 Li, W.-H. (1982) Isoenzymes: Current Topics in Biological and Medical Research 6, 55-92
4 Greer, C. L. and Abelson, J. (1984) Trends Biochem. Sci. 9, 139-141
5 Doolittle, W. F. (1982) in Genome Evolution (Dover, G. A. and Flavell, R. B., eds), pp. 3-28, Academic Press
6 King, C. R. and Piatigorsky, J. (1983) Cell 32, 707-712
7 Nabeshima, Y., Fujii-Kuriyama, Y., Muramatsu, M. and Ogata, K. (1984) Nature 308, 333-337
8 Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C. and Proudfoot, N. J. (1980) Cell 21, 653-668
9 Moore, G. P. (1983) Trends Biochem. Sci. 8, 411-414
10 Hayashida, H., Miyata, T., Yamawaki-Kataoka, Y., Honjo, T., Wels, J. and Blattner, F. (1984) EMBO J. 3, 2047-2053
11 Dover, G. A. (1982) Nature 299, 111-117
12 Baltimore, D. (1981) Cell 24, 592-594
13 Tiemeier, D. C., Tilghman, S. M., Polsky, J., Seidman, J. G., Leder, A., Edgell, M. H. and Leder, P. (1978) Cell 14, 237-245
14 Kourilsky, P. (1983) Biochimie 65, 85-93
15 Davies, W. D. and Davidson, B. E. (1982) Nucleic Acids Res. 10, 4045-4048
16 Ferrara, P., Duchange, N., Zakin, H. M. and Cohen, G. N. (1984) Proc. Natl Acad. Sci. USA 81, 3019-3023
17 Yeh, W. K. and Ornston, L. N. (1980) Proc. Natl Acad. Sci. USA 77, 5365-5369
18 Bosch, L., Kraal, B., van der Meide, J., Duisterwinkel, F. J. and van Noort, J. M. (1983) Prog. Nucleic Acid Res. & Mol. Biol. 30, 91-126
19 Strachan, T., Sodoyer, R., Damotte, M. and Jordan, B. R. (1984) EMBO J. 3, 887-894
20 Blake, C. C. F. (1983) Nature 306, 535-537
21 Goeddel, D. V., Leung, D. W., Dull, T. J., Gross, M., Lawn, R. M., McCandliss, R., Seeburg, P. H., Ullrich, A., Yelverton, E. and Gray, P. (1981) Nature 290, 20-26
22 Montgomery, D. C., Leung, D. W., Smith, M., Shalit, P., Faye, G. and Hall, B. D. (1980) Proc. Natl Acad. Sci. USA 77, 541-545
23 Bennetzen, J. L. and Hall, B. D. (1982) J. Biol. Chem. 257, 3018-3025
24 Antoine, M. and Niessing, J. (1984) Nature 310, 795-798
25 Jeffreys, A. J. (1982) in Genome Evolution (Dover, G. A. and Flavell, R. B., eds), pp. 157-176, Academic Press
26 Lewis, J. and Wolpert, L. J. (1981) J. Theoret. Biol. 78, 425-438

How does a legume nodule work?

Michael Dilworth and Andrew Glenn

Nitrogen fixation in legume root nodules requires biochemical cooperation between the plant and Rhizobium cells. Bacteroids contribute the N2-fixing system and haem for leghaemoglobin, but apart from the production of the globin moiety of leghaemoglobin and the assimilation and export of the NH3 produced, little is known about the contributions of the plant. It now appears that the plant cell may regulate the type and/or quantity of carbon compounds supplied to the Rhizobium bacteroids.

The legume root nodule is an intracellular symbiotic association between legume root cells and Rhizobium which fixes atmospheric N2. The contribution made by legume nodule N2 fixation is fundamental to the continuation of agriculture and food production, either directly as grain or green material, or for maintenance of soil fertility. Nodule formation is a complex process resulting in a plant cell containing pleomorphic rhizobial cells termed 'bacteroids', which may be biochemically differentiated. They are enclosed, singly or in packets, by a peri-bacteroid membrane derived from the plant plasmalemma (Fig. 1).

From a biochemical viewpoint, this arrangement is important for several reasons:
(1) Significant problems of diffusion must be solved for the entry of O2, inorganic nutrients and carbon compounds and for the exit of fixed NH3.
(2) There are potential controls on the flow of materials between plant and bacteroid by two membrane systems rather than one.
(3) The bacteroid is totally dependent for its supplies of all types of nutrients on what the plant releases to it.
(4) The inverted orientation of the peri-bacteroid membrane is such that the bacteroid faces the outer surface of the plant membrane.

Nitrogen fixation is the process by which free atmospheric dinitrogen is converted into ammonia; it requires the two component proteins (MoFe protein

M. Dilworth and A. Glenn are at the Nitrogen Fixation Research Group, School of Environmental and Life Sciences, Murdoch University, Murdoch, W. Australia 6150.
