Editorial
TRENDS in Biochemical Sciences
Vol.30 No.6 June 2005
279
0968-0004/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.tibs.2005.04.003
The discovery of split genes and RNA splicing
Phillip A. Sharp
Massachusetts Institute of Technology, Center for Cancer Research, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
Fifty years ago – two years after the discovery of the structure of DNA – everyone assumed that a gene was a contiguous string of base pairs from which information was transferred for the synthesis of a protein. This assumption was largely correct for prokaryotes, but years later there were hints that these simple concepts might not explain some observations in eukaryotic organisms. In particular, the genomes of many animals contained such large amounts of DNA that it seemed unlikely that all of it encoded simple, bacteria-type genes. Even if the repetitive sequences were discounted, there was still too much unique-sequence DNA for the number of anticipated genes. In the 1970s, another puzzling observation was the detection of unusually long RNA in the nucleus of vertebrate cells, compared with the shorter mRNA that emerged in the cytoplasm. Surprisingly, this nuclear RNA had both a cap structure at its 5′ end and a polyadenosine [poly(A)] tract at its 3′ end, just like the shorter cytoplasmic mRNA. These paradoxes were resolved by the discovery of split genes and RNA splicing. A simple comparison of the sequence of an mRNA with that of its corresponding nuclear DNA revealed sequences that were removed by splicing during processing of the longer precursor [1,2]. This immediately explained how the long nuclear RNA and the shorter cytoplasmic mRNA could share the same termini (a cap and a poly(A) tail), whereas the difference in length was due to removal of intron sequences from the middle. It was soon shown that almost all genes in multicellular organisms contained introns, with vertebrate genes averaging approximately ten introns each. Furthermore, introns represented a genetic liability: ~25% of all mutations in the globin genes that cause β-thalassemia in humans resulted from defects in splicing [3].
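The comparison of an mRNA with its gene can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of the original experiments: it assumes the gene contains exactly one intron, writes both sequences in the RNA alphabet, and, where the intron boundary is ambiguous, breaks ties with the well-known GU-AG rule (introns usually begin with GU and end with AG). The function names and sequences are hypothetical.

```python
from typing import List, Tuple


def candidate_introns(gene: str, mrna: str) -> List[Tuple[int, int]]:
    """Every (start, end) such that deleting gene[start:end] yields mrna."""
    gap = len(gene) - len(mrna)
    if gap <= 0:
        return []
    hits = []
    for start in range(len(mrna) + 1):
        # The 5' part before the gap and the 3' part after it must both match.
        if gene[:start] == mrna[:start] and gene[start + gap:] == mrna[start:]:
            hits.append((start, start + gap))
    return hits


def find_intron(gene: str, mrna: str) -> Tuple[int, int]:
    """Locate a single intron, preferring a placement that obeys GU...AG."""
    hits = candidate_introns(gene, mrna)
    if not hits:
        raise ValueError("mRNA is not the gene minus one contiguous segment")
    for start, end in hits:
        if gene[start:start + 2] == "GU" and gene[end - 2:end] == "AG":
            return start, end
    # No placement fits the consensus; fall back to the 5'-most one.
    return hits[0]


# Toy example (sequences invented for illustration).
exon1, intron, exon2 = "AUGGCC", "GUAAGUCUUUAG", "GAUUAA"
gene = exon1 + intron + exon2
mrna = exon1 + exon2

start, end = find_intron(gene, mrna)
print(start, end, gene[start:end])                # 6 18 GUAAGUCUUUAG
# Both ends are shared, just as the nuclear RNA and the mRNA share cap and poly(A).
print(gene[0] == mrna[0], gene[-1] == mrna[-1])   # True True
```

In this toy case the sequence comparison alone allows two intron placements, (6, 18) and (7, 19), because the exon 2 G can be read on either side of the junction; the boundary consensus is what picks the first.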
Comparison of the sequences of introns and exons (the names suggested by Wally Gilbert) revealed the presence of consensus sequences at their boundaries [4], and mutation of these sequences inactivated splicing.
Corresponding author: Sharp, P.A. ([email protected]). Available online 26 April 2005. www.sciencedirect.com
The development of soluble reactions that precisely spliced precursor RNA was the next crucial step in elucidating the mechanism of RNA splicing. This biochemistry quickly led to the realization that the intron was excised as a branched lariat RNA (for review, see Ref. [5]). Even before this observation, insightful hypotheses based on complementarity between small nuclear RNAs (snRNAs; U1, U2, U4/U6 and U5) and intron consensus sequences indicated that these abundant RNAs were important for splicing [6]. Shortly thereafter, it was found that particles containing these snRNAs formed part of the spliceosome, which executed intron removal via an intermediate consisting of a lariat RNA and the 5′ exon RNA [5]. The spliceosome is a complex machine, and mass-spectrometric analysis of purified complexes has detected hundreds of polypeptides and the five major ribonucleoproteins [7].
Major insights from recent research on RNA splicing have illustrated the centrality of this process in cell biology and evolution, and some of these advances provide the basis for future insights. For example, most eukaryotic organisms have two types of spliceosomes and introns: the major U2 spliceosome (described earlier) and the minor U12 spliceosome, which processes U12-type introns that account for 0.1% of total introns [8]. Further analysis of these two types of introns might, in time, reveal the evolutionary origin of split genes. In most organisms, exons or introns are recognized as a unit, by ‘exon definition’ for short exons or ‘intron definition’ for short introns [9]. In these cases, components cooperatively recognize both the 5′ splice site and the 3′ splice site. Enhancer sequences within exons promote exon recognition and, thus, RNA splicing. Mutational analysis has also identified suppressor elements in both exons and introns that inhibit splicing of flanking sequences, as well as enhancer elements within introns.
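Exon definition can be caricatured in Python as a check that accepts a candidate exon only when both of its flanking splice sites are present and exonic enhancers outweigh silencers. This is a toy sketch, not a real splice-site predictor: the motif sets, the two-nucleotide AG and GU splice-site checks and the threshold are all illustrative assumptions, and the function names are hypothetical.

```python
# Hypothetical motif sets, chosen only to make the example work; real
# exonic enhancer and silencer sets contain hundreds of hexamers.
ENHANCERS = {"GAAGAA", "AGAAGA"}
SILENCERS = {"UAGUUA", "CCUCCU"}


def hexamers(seq: str) -> list:
    """All overlapping 6-nt windows of seq."""
    return [seq[i:i + 6] for i in range(len(seq) - 5)]


def exon_score(exon: str) -> int:
    """Enhancer hits minus silencer hits within the candidate exon."""
    score = 0
    for h in hexamers(exon):
        if h in ENHANCERS:
            score += 1
        elif h in SILENCERS:
            score -= 1
    return score


def looks_like_exon(pre_mrna: str, start: int, end: int, threshold: int = 1) -> bool:
    """Toy exon-definition test for the candidate exon pre_mrna[start:end].

    The exon is accepted only if both flanking splice sites are present
    (an AG just upstream for the 3' splice site, a GU just downstream for
    the 5' splice site) and its net enhancer score reaches the threshold,
    mimicking recognition of the exon as a single unit.
    """
    has_3ss = start >= 2 and pre_mrna[start - 2:start] == "AG"
    has_5ss = pre_mrna[end:end + 2] == "GU"
    return has_3ss and has_5ss and exon_score(pre_mrna[start:end]) >= threshold


pre = "CCAG" + "GAAGAAUCC" + "GUAAGU"   # invented sequence
print(looks_like_exon(pre, 4, 13))      # True
```

Requiring both flanking sites together, rather than scoring each site in isolation, is the point of the sketch; real predictors score splice sites with probabilistic models and use far larger motif sets.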
The exon enhancers are commonly bound by proteins with Ser-Arg repeats; phosphorylation of the serine residues of these repeats is important in regulating splicing activity [10]. More recently, bioinformatic analysis of multiple genome sequences has identified sets of sequences for both exon enhancers and suppressors, in addition to intron enhancers and silencers [11,12]. Although there is much more to be learned before the ‘splicing code’ is complete, current computational programs designed with known sets of enhancers, silencers and splice-site sequences can, in silico, accurately identify ~60% of all vertebrate exons [12].
[Figure 1 schematic: RNA polymerase II and its CTD with transcription factors (TF) at a TATA promoter; transcription initiation and capping; spliceosome assembly and splicing; polyadenylation and cleavage with EJC deposition on the mRNA; nuclear export; NMD surveillance and translation.]
Figure 1. Coupling of transcription to translation. Every step in gene expression, from transcription factor activation to translation in the cytoplasm, is potentially linked by well-characterized interactions. Thus, the nature of transcriptional events could regulate splicing, polyadenylation and translation. For example, transcription factors influence the phosphorylation of the CTD of the polymerase and the composition of the elongation complex, which comprises capping factors (orange), splicing factors (turquoise) and polyadenylation factors (pink). These factors stimulate capping, which, together with other factors bound to the CTD, stimulates recognition of the intron and formation of a spliceosome on it. The splicing process deposits the exon junction complex (EJC), which influences transport from the nucleus and the efficiency of translation in the cytoplasm. Components of the spliceosome and factors bound to the CTD also stimulate cleavage and polyadenylation. The EJC signals nonsense-mediated decay (NMD) of the bound mRNA if the first ribosome to translate it in the cytoplasm encounters a stop codon upstream of the EJC.
The pathway of gene expression in vertebrates is highly coupled (Figure 1); RNA splicing and polyadenylation depend on factors bound to the carboxy-terminal domain (CTD) of RNA polymerase II as it undergoes initiation and elongation [13]. This coupling explains how different transcription factors stimulating a promoter can influence the pattern of splicing of the transcribed RNA [14]. Excision of an intron by the spliceosome deposits a complex of proteins, the exon junction complex (EJC), 20–24 nucleotides upstream of the splice junction [15]. This complex is important in mRNA transport from the nucleus, and it also signals nonsense-mediated decay (NMD) of an mRNA if the ribosome encounters a stop codon upstream of the EJC; this provides a mechanism for NMD of mRNAs in eukaryotic cells. The EJC also couples RNA splicing in the nucleus to the efficiency of translation in the cytoplasm [16].
The similarity of branch formation by group II self-splicing introns and by the spliceosome strongly suggested that the latter was also RNA catalyzed. The recent demonstration that an RNA catalyst formed of U2 and U6 snRNAs can carry out the branch reaction indicates that this speculation was correct, and that both RNA splicing by the spliceosome and translation by the ribosome are ‘RNA world’ reactions [17]. Given the simplicity of this reaction and the current procedures for spliceosome purification, it will probably not be long before an atomic structure of the spliceosome is available, allowing direct visualization of the RNA-catalyzed splicing reaction.
It has been estimated that >50% of all human genes are, at some stage, expressed by alternative splicing. Interestingly, a subset of ~2000 of these alternatively spliced exons can be computationally identified on the basis of high evolutionary conservation of their flanking sequences. This suggests that, for these genes, the different isoforms have crucial functions [18]. Although there are many documented examples of regulation of alternative splicing, few have been analyzed in detail.
For example, the cell-surface protein CD44 is alternatively spliced by inclusion of different combinations of ten variable exons. Activation of the Ras pathway promotes inclusion of the variable exons, producing CD44 isoforms that mediate cellular invasion of surrounding tissue and recognition of certain growth or scatter factors [19]. This theme of alternative splicing of a set of exons, with the various isoforms having different functions, probably explains the paradox of the obvious difference in complexity between humans and worms, despite the fact that both organisms contain approximately the same total number of genes. It has been estimated that, on average, three times more alternatively spliced forms are expressed per human gene than per nematode or Drosophila gene [20].
References
1 Berget, S.M. et al. (1977) Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc. Natl. Acad. Sci. U. S. A. 74, 3171–3175
2 Chow, L.T. et al. (1977) An amazing sequence arrangement at the 5′ ends of adenovirus 2 messenger RNA. Cell 12, 1–8
3 Treisman, R. et al. (1983) Specific transcription and RNA splicing defects in five cloned β-thalassaemia genes. Nature 302, 591–596
4 Breathnach, R. and Chambon, P. (1981) Organization and expression of eucaryotic split genes coding for proteins. Annu. Rev. Biochem. 50, 349–383
5 Padgett, R.A. et al. (1986) Splicing of messenger RNA precursors. Annu. Rev. Biochem. 55, 1119–1150
6 Lerner, M.R. et al. (1980) Are snRNPs involved in splicing? Nature 283, 220–224
7 Rappsilber, J. et al. (2002) Large-scale proteomic analysis of the human spliceosome. Genome Res. 12, 1231–1245
8 Hall, S.L. and Padgett, R.A. (1994) Conserved sequences in a class of rare eukaryotic nuclear introns with non-consensus splice sites. J. Mol. Biol. 239, 357–365
9 Robberson, B.L. et al. (1990) Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol. 10, 84–94
10 Black, D.L. (2003) Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 72, 291–336
11 Fairbrother, W.G. et al. (2002) Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013
12 Wang, Z. et al. (2004) Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845
13 Maniatis, T. and Reed, R. (2002) An extensive network of coupling among gene expression machines. Nature 416, 499–506
14 Kornblihtt, A.R. et al. (2004) Multiple links between transcription and splicing. RNA 10, 1489–1498
15 Le Hir, H. et al. (2000) The spliceosome deposits multiple proteins 20–24 nucleotides upstream of mRNA exon–exon junctions. EMBO J. 19, 6860–6869
16 Nott, A. et al. (2004) Splicing enhances translation in mammalian cells: an additional function of the exon junction complex. Genes Dev. 18, 210–222
17 Valadkhan, S. (2005) Young Scientist Award essay winner. Construction of a minimal, protein-free spliceosome. Science 307, 863–864
18 Yeo, G.W. et al. (2005) Identification and analysis of alternative splicing events conserved in human and mouse. Proc. Natl. Acad. Sci. U. S. A. 102, 2850–2855
19 Ponta, H. et al. (2003) CD44: from adhesion molecules to signalling regulators. Nat. Rev. Mol. Cell Biol. 4, 33–45
20 Kim, H. et al. (2004) Estimating rates of alternative splicing in mammals and invertebrates. Nat. Genet. 36, 915–916
0968-0004/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.tibs.2005.04.002
The ribosome revealed
Peter B. Moore (1,2) and Thomas A. Steitz (1,2,3)
1 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA
2 Department of Chemistry, Yale University, New Haven, CT 06520-8117, USA
3 Howard Hughes Medical Institute, New Haven, CT 06520-8114, USA
Corresponding author: Steitz, T.A. ([email protected]). Available online 30 April 2005.
Fifty years is a long time. In 1955 – the year the IUB (later renamed the IUBMB) was established – it was known that protein synthesis is associated with ribonucleoprotein particles, which are abundant in the cytoplasm, and that small RNAs – now called tRNAs – are also involved, but no one knew what a ‘ribosome’ was because the word had not yet been coined [1]. Macromolecular crystallography was also in a primitive state. In the previous year, Perutz and coworkers had shown that phases for protein diffraction patterns can be determined by heavy-atom isomorphous replacement [2], but more than a decade was to elapse before the promise of this discovery was first fully realized and an atomic-resolution structure of a protein obtained. Thus, if you had told someone in 1955 that the 3D structure of the ribosome would be determined at atomic resolution by the end of the century, you would have gotten a puzzled look. Nevertheless, in 2000, atomic-resolution structures were published for both ribosomal subunits [3–5] (Figure 1).
This success depended on the advances in ribosome biochemistry and macromolecular crystallography that had been made in the intervening decades. By the mid-1960s, molecular biologists knew how to prepare active ribosomes from many organisms. Evidence was accumulating that the populations of ribosomes obtained were homogeneous to a first approximation and, thus, that ribosomes might be crystallizable. However, for years there was little interest in ribosome crystallization; the first potentially useful crystals were not reported until 1980 [6], and only in the late 1980s were ribosome crystals obtained that diffracted to moderately high resolution (see, for example, Ref. [7]).
It would have made little difference if high-quality ribosome crystals had been produced in 1965 or 1970,
Figure 1. A space-filling model of the 70S ribosome generated using the structures of the Haloarcula marismortui large subunit [3] and the Thermus thermophilus small subunit [4], docked by superposition on the rRNA model for the T. thermophilus 70S ribosome [20], together with the A-, P- and E-site tRNAs from that model. The 23S and 5S rRNAs are purple and white, respectively; the 16S rRNA is yellow. Ribosomal proteins of the large subunit are blue and those of the small subunit are green. The A-site tRNA, with its 3′ end extending into the peptidyl-transferase cavity, is red and the P-site tRNA is yellow. Reproduced, with permission, from Ref. [21].