J. theor. Biol. (1991) 152, 427-428
LETTER TO THE EDITOR

Evolution of Split Genes

The introns of split genes could have arisen in two ways. They could have been inserted into existing genes (Crick, 1979; Rogers, 1985), or they could have been present from the beginning (Doolittle, 1978; Darnell, 1978; Blake, 1983; Gilbert, 1985; Elder, 1987). On the latter view introns were lost from prokaryotes under selection for minimal genome size.

Evidence has accumulated that many introns were indeed present from a very early stage. For example, the exon-intron patterns in the triose phosphate isomerase gene from chicken and maize are strikingly similar (Gilbert, 1985). Also, the exons of this gene correspond to domains of protein tertiary structure, a not uncommon finding. This agrees with the proposal of Darnell (1978) that the exons of a gene were once small, primitive genes co-transcribed as a polycistronic mRNA. We could call these primitive genes "protogenes". The evolution of splicing would allow the production of a single large, sophisticated protein of the modern type.

Here I would like to give two further arguments in support of these views.

(1) The genetic code has 64 triplets of which three are terminators (UAA, UAG and UGA). If we take a random nucleotide sequence and translate it in a single reading frame, we will expect to encounter a termination triplet on average about once every 20 amino acids (64 ÷ 3 = 21⅓). This is of the same order of magnitude as the average exon coding capacity (40-45 amino acids according to Blake, 1983), but is smaller by an order of magnitude than the average size of a modern protein at 300-400 amino acids. This suggests that the earliest organisms could have had considerable difficulty in evolving open reading frames sufficiently long to encode a modern protein, while exon-sized protogenes would pose no such problem.
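The arithmetic in argument (1) can be made explicit with a short calculation. The following is a minimal sketch, not part of the original letter. It assumes that every codon in a random sequence is drawn independently and with equal probability from the 64 triplets, which is a simplification. The function names are illustrative only.

```python
# Sketch of the random-sequence argument in point (1).
# Assumption (not from the letter): codons are independent and
# equally likely, so each codon is a stop with probability 3/64.

STOP_CODONS = 3      # UAA, UAG, UGA
TOTAL_CODONS = 64    # all possible triplets

P_STOP = STOP_CODONS / TOTAL_CODONS


def mean_codons_to_stop(p_stop=P_STOP):
    """Mean number of codons read up to and including the first stop.

    Under the independence assumption this is the mean of a
    geometric distribution, 1 / p_stop.
    """
    return 1.0 / p_stop


def prob_open(n_codons, p_stop=P_STOP):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1.0 - p_stop) ** n_codons


if __name__ == "__main__":
    print("mean spacing of stops (codons):", mean_codons_to_stop())
    # Exon-sized and protein-sized stretches, using the sizes quoted in the text.
    for length in (40, 300):
        print(length, "codons open with probability", prob_open(length))
```

Under these assumptions a 40-codon stretch is free of terminators roughly 15% of the time, whereas a 300-codon stretch is open less than once in a million trials, which is the contrast the argument relies on.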
(2) Breathnach & Chambon (1981) give the following consensus for the splice site at the 5' end of introns:

AG↓GTPuAGT

The intron lies to the right of the arrow with the upstream exon to its left. The sequence TPuA (the second to fourth nucleotides of the intron) would be expressed in the RNA transcript as either UAA or UGA. Both are termination codons. One could suggest that this sequence was, in fact, a terminator for a primitive protogene corresponding to the contemporary upstream exon. The terminator could then have been converted into part of the splicing consensus. (The splicing system might also have tolerated a UAG terminator in this position, since splicing involves RNA "snurps" and these might accept both UAA and UAG through wobble pairing.) In this scenario, the erstwhile terminator codon should be spliced out as part of the intron, thus allowing translational readthrough into the next exon. Indeed, the TPuA part of the splicing consensus does lie within the intron.

These two considerations support the idea that introns were present in genes from the outset.

Department of Biochemistry, University of Adelaide, GPO Box 498, Adelaide, SA 5001, Australia

(Received on 11 October 1990, Accepted on 5 March 1991)

0022-5193/91/190427+02 $03.00/0 © 1991 Academic Press Limited
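The reading of the splice consensus in argument (2) can be checked mechanically. The sketch below is an illustration added here, not part of the original letter. It expands the purine position Pu into A or G, transcribes the sense-strand DNA triplet into RNA, and tests each result against the three terminators. All names in it are hypothetical.

```python
# Sketch: which RNA triplets does the consensus segment T-Pu-A encode?
# Pu stands for a purine, i.e. A or G.

STOP_CODONS = {"UAA", "UAG", "UGA"}
PURINES = ("A", "G")


def expand_tpua():
    """Return the DNA triplets matched by the pattern T-Pu-A."""
    return ["T" + purine + "A" for purine in PURINES]


def transcribe(dna):
    """Convert a sense-strand DNA sequence to RNA by replacing T with U."""
    return dna.replace("T", "U")


def consensus_codons():
    """RNA triplets corresponding to the T-Pu-A part of the consensus."""
    return [transcribe(triplet) for triplet in expand_tpua()]


if __name__ == "__main__":
    for codon in consensus_codons():
        print(codon, "is a terminator:", codon in STOP_CODONS)
```

Both expansions, UAA and UGA, fall in the terminator set, while UAG cannot be produced from TPuA because the third position is fixed as A.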
REFERENCES
BLAKE, C. C. F. (1983). Nature, Lond. 306, 535-537.
BREATHNACH, R. & CHAMBON, P. (1981). Ann. Rev. Biochem. 50, 349-383.
CRICK, F. (1979). Science 204, 264-271.
DARNELL, J. E. (1978). Science 202, 1257-1260.
DOOLITTLE, W. F. (1978). Nature, Lond. 272, 581-582.
ELDER, D. (1987). Riv. Biol.-Biol. Forum 80, 499-511.
GILBERT, W. (1985). Science 228, 823-824.
ROGERS, J. (1985). Nature, Lond. 315, 458-459.
DAVID ELDER